如何在不使用 for 的情况下读取大型 NetCDF 数据集 - Python答案

【问题标题】：How to read large NetCDF data sets without using a for - Python如何在不使用 for 的情况下读取大型 NetCDF 数据集 - Python
【发布时间】：2021-04-15 06:38:38
【问题描述】：

早上好，我在python中读取一个包含气象信息的大型netCDF文件时遇到问题，该信息必须通过它来组装信息然后将其插入数据库，但是它需要花费时间并且组装信息太多了，我知道必须有其他方法可以更有效地执行相同的过程，目前我通过代码下方的for循环访问信息

 content = nc.Dataset(pathFile+file)
 XLONG, XLAT = content.variables["XLONG"], content.variables["XLAT"]
 Times = content.variables["Times"]  #Horas formar b 'b
 RAINC  =  content.variables["RAINC"] #Lluvia
 Q2 = content.variables["Q2"] #Humedad especifica
 T2 = content.variables["T2"] #Temperatura
 U10 = content.variables["U10"] #Viento zonal
 V10 = content.variables["V10"] #Viento meridional
 SWDOWN = content.variables["SWDOWN"] #Radiacion incidente
 PSFC = content.variables["PSFC"] #Presion de la superficie
 SST = content.variables["SST"] #Temperatura de la superficie del mar
CLDFRA = content.variables["CLDFRA"] #Fraccion de nubes

 for c2 in range(len(XLONG[0])):
    for c3 in range(len(XLONG[0][c2])):
    position += 1  
    for hour in range(len(Times)):
        dateH = getDatetimeInit(dateFormatFile.hour) if hour == 0 else getDatetimeForHour(hour, dateFormatFile.hour)
        hourUTC = getHourUTC(hour)        

        RAINH = str(RAINC[hour][0][c2][c3])
        Q2H = str(Q2[hour][0][c2][c3])
        T2H = str(convertKelvinToCelsius(T2[hour][0][c2][c3]))
        U10H = str(U10[hour][0][c2][c3])
        V10H = str(V10[hour][0][c2][c3])
        SWDOWNH = str(SWDOWN[hour][0][c2][c3])
        PSFCH = str(PSFC[hour][0][c2][c3])
        SSTH = str(SST[hour][0][c2][c3])
        CLDFRAH = str(CLDFRA[hour][0][c2][c3] )


        rowData = [idRun, functions.IDMODEL, idTime, position, dateH.year, dateH.month, dateH.day, dateH.hour, RAINH, Q2H, T2H, U10H, V10H, SWDOWNH, PSFCH, SSTH, CLDFRAH]           
        dataProcess.append(rowData)

【问题讨论】：

标签： python python-3.x dataset netcdf4

【解决方案1】：

我会使用 NumPy。让我们假设您有带有 2 个变量“t2”和“slp”的 netCDF。然后你可以使用下面的代码来向量化你的数据：

#!//usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein);
tair = ncin.variables['t2'][:];
slp  = ncin.variables['slp'][:];
ncin.close();
# -------------------------
tairseries = np.reshape(tair,(np.size(tair),1));
slpseries =  np.reshape(slp,(np.size(slp),1));
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries]);
#slpseries = np.array([str(val) for val in slpseries]);
# --------------------------
rowdata = np.concatenate((tairseries,slpseries),axis=1);
# if you want characters, do this in the end:
row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------

不过，我感觉使用字符串并不是一个好主意。在我的示例中，从数值数组到字符串的转换需要很长时间，因此在连接之前我没有实现它。

如果您还想要一些时间/位置信息，您可以这样做：

#!//usr/bin/env ipython
# ---------------------
import numpy as np
from netCDF4 import Dataset
# ---------------------
filein = 'test.nc'
ncin = Dataset(filein);
xin = ncin.variables['lon'][:]
yin = ncin.variables['lat'][:]
timein = ncin.variables['time'][:]
tair = ncin.variables['t2'][:];
slp  = ncin.variables['slp'][:];
ncin.close();
# -------------------------
tairseries = np.reshape(tair,(np.size(tair),1));
slpseries =  np.reshape(slp,(np.size(slp),1));
# --------------------------
## if you want characters:
#tairseries = np.array([str(val) for val in tairseries]);
#slpseries = np.array([str(val) for val in slpseries]);
# --------------------------
rowdata = np.concatenate((tairseries,slpseries),axis=1);
# if you want characters, do this in the end:
#row_asstrings = [[str(vv) for vv in val] for val in rowdata]
# ---------------------------
# =========================================================
nx = np.size(xin);ny = np.size(yin);ntime = np.size(timein);
xm,ym = np.meshgrid(xin,yin);
xmt = np.tile(xm,(ntime,1,1));ymt = np.tile(ym,(ntime,1,1))
timem = np.tile(timein[:,np.newaxis,np.newaxis],(1,ny,nx));
xvec = np.reshape(xmt,(np.size(tair),1));yvec = np.reshape(ymt,(np.size(tair),1));timevec = np.reshape(timem,(np.size(tair),1)); # to make sure that array's size match, I am using the size of one of the variables
rowdata = np.concatenate((xvec,yvec,timevec,tairseries,slpseries),axis=1);

无论如何，对于可变大小 (744,150,150)，向量化 2 个变量只需不到 2 秒。

【讨论】：

谢谢@msi_gerva，我会试试你的解决方案看看效果如何，我希望它对我有用