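【Question title】: How to read large NetCDF data sets without using a for loop - Python
【Posted】: 2021-04-15 06:38:38
【Question】:

Good morning. I'm having trouble reading a large netCDF file with meteorological data in Python. I have to walk through the file to assemble the information and then insert it into a database, but assembling it takes far too long. I know there must be a more efficient way to do the same thing. At the moment I access the data with for loops, as in the code below: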

    import netCDF4 as nc

    content = nc.Dataset(pathFile+file)
    XLONG, XLAT = content.variables["XLONG"], content.variables["XLAT"]
    Times = content.variables["Times"]  # Hours (format b'...')
    RAINC = content.variables["RAINC"]  # Rain
    Q2 = content.variables["Q2"]  # Specific humidity
    T2 = content.variables["T2"]  # Temperature
    U10 = content.variables["U10"]  # Zonal wind
    V10 = content.variables["V10"]  # Meridional wind
    SWDOWN = content.variables["SWDOWN"]  # Incident radiation
    PSFC = content.variables["PSFC"]  # Surface pressure
    SST = content.variables["SST"]  # Sea surface temperature
    CLDFRA = content.variables["CLDFRA"]  # Cloud fraction

    for c2 in range(len(XLONG[0])):
        for c3 in range(len(XLONG[0][c2])):
            position += 1
            for hour in range(len(Times)):
                dateH = getDatetimeInit(dateFormatFile.hour) if hour == 0 else getDatetimeForHour(hour, dateFormatFile.hour)
                hourUTC = getHourUTC(hour)

                RAINH = str(RAINC[hour][0][c2][c3])
                Q2H = str(Q2[hour][0][c2][c3])
                T2H = str(convertKelvinToCelsius(T2[hour][0][c2][c3]))
                U10H = str(U10[hour][0][c2][c3])
                V10H = str(V10[hour][0][c2][c3])
                SWDOWNH = str(SWDOWN[hour][0][c2][c3])
                PSFCH = str(PSFC[hour][0][c2][c3])
                SSTH = str(SST[hour][0][c2][c3])
                CLDFRAH = str(CLDFRA[hour][0][c2][c3])

                rowData = [idRun, functions.IDMODEL, idTime, position, dateH.year, dateH.month, dateH.day, dateH.hour, RAINH, Q2H, T2H, U10H, V10H, SWDOWNH, PSFCH, SSTH, CLDFRAH]
                dataProcess.append(rowData)

【Discussion】:

    Tags: python python-3.x dataset netcdf4


    【Solution 1】:

    I would use NumPy. Let's assume you have a netCDF file with 2 variables, "t2" and "slp". Then you can use the following code to vectorize your data:

    #!/usr/bin/env ipython
    # ---------------------
    import numpy as np
    from netCDF4 import Dataset
    # ---------------------
    filein = 'test.nc'
    ncin = Dataset(filein)
    tair = ncin.variables['t2'][:]
    slp = ncin.variables['slp'][:]
    ncin.close()
    # -------------------------
    # flatten each variable into a single column
    tairseries = np.reshape(tair, (np.size(tair), 1))
    slpseries = np.reshape(slp, (np.size(slp), 1))
    # --------------------------
    ## if you want characters:
    #tairseries = np.array([str(val) for val in tairseries])
    #slpseries = np.array([str(val) for val in slpseries])
    # --------------------------
    # one row per data point, one column per variable
    rowdata = np.concatenate((tairseries, slpseries), axis=1)
    # if you want characters, do this in the end:
    row_asstrings = [[str(vv) for vv in val] for val in rowdata]
    # ---------------------------
    

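    That said, I feel that using strings is not a good idea. In my example, converting the numeric arrays to strings took a long time, which is why I did not do it before the concatenation.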
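
    If strings really are required, one option is to let NumPy do the conversion in a single call rather than in a Python-level list comprehension. The following is only a minimal sketch: it uses a small synthetic array with made-up values in place of the `rowdata` array built above, and it has not been timed against the file from the question.

```python
import numpy as np

# Synthetic stand-in (made-up values) for the (n_rows, 2) array built above:
# first column "t2", second column "slp".
rowdata = np.array([[280.5, 1013.2],
                    [281.0, 1012.8],
                    [279.75, 1014.0]])

# Convert every element to a string in one vectorized call.
row_asstrings = rowdata.astype(str)

# Plain nested Python lists, in case the database driver expects them.
rows = row_asstrings.tolist()
```

    If the database driver accepts numbers, `rowdata.tolist()` gives nested lists of Python floats and skips the string step entirely.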

    If you also want some time/location information, you can do this:

    #!/usr/bin/env ipython
    # ---------------------
    import numpy as np
    from netCDF4 import Dataset
    # ---------------------
    filein = 'test.nc'
    ncin = Dataset(filein)
    xin = ncin.variables['lon'][:]
    yin = ncin.variables['lat'][:]
    timein = ncin.variables['time'][:]
    tair = ncin.variables['t2'][:]
    slp = ncin.variables['slp'][:]
    ncin.close()
    # -------------------------
    tairseries = np.reshape(tair, (np.size(tair), 1))
    slpseries = np.reshape(slp, (np.size(slp), 1))
    # --------------------------
    ## if you want characters:
    #tairseries = np.array([str(val) for val in tairseries])
    #slpseries = np.array([str(val) for val in slpseries])
    # --------------------------
    rowdata = np.concatenate((tairseries, slpseries), axis=1)
    # if you want characters, do this in the end:
    #row_asstrings = [[str(vv) for vv in val] for val in rowdata]
    # ---------------------------
    # =========================================================
    nx = np.size(xin)
    ny = np.size(yin)
    ntime = np.size(timein)
    # expand the 1-D coordinates to the full (time, lat, lon) shape
    xm, ym = np.meshgrid(xin, yin)
    xmt = np.tile(xm, (ntime, 1, 1))
    ymt = np.tile(ym, (ntime, 1, 1))
    timem = np.tile(timein[:, np.newaxis, np.newaxis], (1, ny, nx))
    # to make sure that the array sizes match, I am using the size of one of the variables
    xvec = np.reshape(xmt, (np.size(tair), 1))
    yvec = np.reshape(ymt, (np.size(tair), 1))
    timevec = np.reshape(timem, (np.size(tair), 1))
    rowdata = np.concatenate((xvec, yvec, timevec, tairseries, slpseries), axis=1)
    

    In any case, for variables of size (744, 150, 150), vectorizing 2 variables takes less than 2 seconds.
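
    The loop in the question visits every grid cell first and every hour inside it, so its rows are ordered by position with time varying fastest. The same ordering can be reproduced without loops by moving the time axis last before flattening. This is only a sketch under stated assumptions: the arrays are synthetic stand-ins shaped (time, y, x), the `position` counter is assumed to start at 1, the helper name `flatten_position_major` is made up for illustration, and the Kelvin conversion is assumed to be a plain subtraction of 273.15.

```python
import numpy as np

# Synthetic stand-ins for arrays read from the file, shaped (time, y, x).
ntime, ny, nx = 3, 2, 4
rng = np.random.default_rng(0)
t2 = 270.0 + 30.0 * rng.random((ntime, ny, nx))  # temperature in kelvin
q2 = 0.02 * rng.random((ntime, ny, nx))          # specific humidity


def flatten_position_major(field):
    """Flatten (time, y, x) so time varies fastest within each grid cell."""
    return np.transpose(field, (1, 2, 0)).reshape(-1)


# Index columns that match the flattened order.
npos = ny * nx
position = np.repeat(np.arange(1, npos + 1), ntime)  # 1-based grid-cell counter
hour = np.tile(np.arange(ntime), npos)               # time index of each row

# Whole-array unit conversion instead of one function call per value.
t2_celsius = flatten_position_major(t2) - 273.15
q2_flat = flatten_position_major(q2)

# One row per (position, hour) pair; all columns become floats here.
rows = np.column_stack((position, hour, t2_celsius, q2_flat))
```

    Constant columns such as run or model identifiers can be added the same way with `np.full(rows.shape[0], value)`.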

    【Comments】:

    • Thanks @msi_gerva, I will try your solution and see how it goes. I hope it works for me.