【Question Title】:Extracting data from netCDF file using python
【Posted】:2020-06-25 20:18:00
【Question Description】:

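Unfortunately I am very new to Python and don't have time to dig deeper at the moment, so I can't understand or fix the error shown in the Python console. I am trying to extract data for several locations from several netCDF files with this code: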

```python
#this is for reading the .nc files in the working folder
import glob
#this is required to read the netCDF4 data
from netCDF4 import Dataset
#required to read and write the csv files
import pandas as pd
#required for using the array functions
import numpy as np


# Record all the years of the netCDF files into a Python list
all_years = []

for file in glob.glob('*.nc'):
    print(file)
    #reading the files
    data = Dataset(file, 'r')
    #saving the data variable time
    time = data.variables['time']
    #saving the year which is written in the file
    year = time.units[11:15]
    #the for loop appends the year of every file, so all the years are collected
    all_years.append(year)

# Creating an empty Pandas DataFrame covering the whole range of data; the required data is written into it later
year_start = min(all_years)
end_year = max(all_years)
date_range = pd.date_range(start = str(year_start) + '-01-01',
                           end = str(end_year) + '-12-31',
                           freq = 'D')

#a DataFrame filled with 0.0, indexed by date_range, with a single column 'Precipitation'
df = pd.DataFrame(0.0, columns = ['Precipitation'], index = date_range)


# Defining the names, lat, lon for the locations of your interest in a csv file
#this will read the file locations
locations = pd.read_csv('stations_locations.csv')

#we use a for loop because we want to acquire the information from the rows one by one
for index, row in locations.iterrows():
    # one by one we extract the information from the csv and put it into temp. variables
    location = row['names']
    location_lat = row['latitude']
    location_lon = row['longitude']

    # Sorting all_years just to be sure that the data is written in the correct order
    all_years.sort()

    #now we read the netCDF files; here I have used netCDF files from the CNRM-CM5 model
    for yr in all_years:
        # Reading-in the data
        data = Dataset('pr_day_CNRM-CM5_historical_r1i1p1_%s0101-%s1231.nc'%(yr,yr), 'r')

        # Storing the lat and lon data of the netCDF file into variables
        lat = data.variables['lat'][:]
        lon = data.variables['lon'][:]

        #we already have the coordinates of the point that needs to be extracted;
        #to find the closest grid point we subtract the coordinates
        #and check which one has the minimum distance
        # Squared difference between the specified lat,lon and the lat,lon of the netCDF
        sq_diff_lat = (lat - location_lat)**2
        sq_diff_lon = (lon - location_lon)**2

        # Identify the index of the min value for lat and lon
        min_index_lat = sq_diff_lat.argmin()
        min_index_lon = sq_diff_lon.argmin()

        # Accessing the precipitation data
        temp = data.variables['pr']

        # Creating the date range for each year during each iteration
        start = str(yr) + '-01-01'
        end = str(yr) + '-12-31'
        d_range = pd.date_range(start = start,
                                end = end,
                                freq = 'D')

        for t_index in np.arange(0, len(d_range)):
            print('Recording the value for: ' + str(location)+'_'+ str(d_range[t_index]))
            # single .loc[row, column] call: a chained df.loc[...][...] may write to a copy, not to df
            df.loc[d_range[t_index], 'Precipitation'] = temp[t_index, min_index_lat, min_index_lon]

    df.to_csv(str(location) + '.csv')
```
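The comments in the script describe a nearest-grid-point search: take the latitude and longitude indices with the smallest squared difference. One pitfall worth checking with `lon.min()` and `lon.max()`: many model grids store longitude as 0 to 360, while station tables often use -180 to 180, so a western-hemisphere station would be matched to the wrong column. Below is a minimal pure-Python sketch, assuming that mismatch; `nearest_index` and `nearest_lon_index` are hypothetical helpers, not part of netCDF4. It normalises both longitudes to 0..360 and measures the distance around the circle.

```python
def nearest_index(values, target):
    """Index of the value closest to target (same result as argmin of squared differences)."""
    return min(range(len(values)), key=lambda i: abs(values[i] - target))


def nearest_lon_index(lons, target_lon):
    """Index of the grid longitude closest to target_lon, whichever convention each uses."""
    target = target_lon % 360.0  # e.g. -1.5 becomes 358.5

    def circular_distance(i):
        d = abs(lons[i] % 360.0 - target)
        return min(d, 360.0 - d)  # shortest way around the globe

    return min(range(len(lons)), key=circular_distance)
```

In the script, `min_index_lat = nearest_index(lat, location_lat)` and `min_index_lon = nearest_lon_index(lon, location_lon)` would replace the two `argmin()` lines; with NumPy arrays, `np.abs(lat - location_lat).argmin()` does the same for latitude.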

This is the error that is displayed:

```
File "G:\Selection Cannon\Historical\CNRM-CM5_r1i1p1\pr\extracting data_CNRM-CM5_pr.py", line 62, in <module>
    data = Dataset('pr_day_CNRM-CM5_historical_r1i1p1_%s0101-%s1231.nc'%(yr,yr), 'r')
  File "netCDF4\_netCDF4.pyx", line 2321, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4\_netCDF4.pyx", line 1885, in netCDF4._netCDF4._ensure_nc_success
FileNotFoundError: [Errno 2] No such file or directory: b'pr_day_CNRM-CM5_historical_r1i1p1_18500101-18501231.nc'
```

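When I check the variable/attribute "time.units", it shows "days since 1850-1-1", but my folder only contains files for 1975-2005. If I check "all_years", it just shows "1850" seven times. I think this is related to the line "year = time.units[11:15]", but that is how the person in the YouTube video did it. Can someone help me fix this so that the code extracts the files from 1975 onwards?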
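A likely reason, given the output described above: `time.units` holds the reference date of the time axis ("days since 1850-1-1"), not the year a file covers, so `time.units[11:15]` returns "1850" for every file and the reconstructed name `..._18500101-18501231.nc` does not exist. The file name itself carries the period. The sketch below assumes CMIP5-style names ending in `_YYYYMMDD-YYYYMMDD.nc`; `years_in_name` and `file_periods` are hypothetical helpers. Returning the real names also avoids rebuilding them from the `%s0101-%s1231` template, which may not match if each file spans several years (seven files for 1975-2005 suggests they might).

```python
import re
from pathlib import Path

# Assumed CMIP5-style name ending: ..._<YYYYMMDD>-<YYYYMMDD>.nc
_PERIOD = re.compile(r"_(\d{4})\d{4}-(\d{4})\d{4}\.nc$")


def years_in_name(filename):
    """Return (start_year, end_year) taken from the date range in the file name."""
    match = _PERIOD.search(Path(filename).name)
    if match is None:
        raise ValueError(f"no _YYYYMMDD-YYYYMMDD.nc period in {filename!r}")
    return int(match.group(1)), int(match.group(2))


def file_periods(filenames):
    """Return [(start_year, end_year, filename), ...] sorted by start year."""
    return sorted((*years_in_name(name), name) for name in filenames)
```

In the script, `for start, end, path in file_periods(glob.glob('*.nc')):` could replace the `all_years` loop, and `Dataset(path, 'r')` would then open each file by its actual name.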

Best regards, Alex

PS: This is my first post, so please let me know if you need any additional information or data :)

【Question Discussion】:

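  • Edit: I managed to get a CSV file, but it starts at "01-01-1850" and ends at "12-31-1850". It should start at "01-01-1979" and end at "12-31-2005", but I can't get Python to extract the years 1975-2005 from the files in the folder. Is it possible that I have to give Python the whole dataset starting from 1850 in order to use the method I want to use?
  • I tried your script and got the error ValueError: could not convert string to Timestamp, related to this part: date_range = pd.date_range(start = str(year_start) + '-01-01', end = str(end_year) + '-12-31', freq = 'D'). Did you solve it?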
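The fixed slice `[11:15]` only lands on the year when exactly 11 characters precede it, as in "days since 1850-1-1". With another unit, for example "hours since 1900-01-01", the slice returns " 190", and building a date from that may be one cause of the `could not convert string to Timestamp` error (an assumption; the commenter's file is not shown). Even when the slice works, it yields the reference date, not the file's year. Below is a minimal sketch that parses the units with a regular expression and converts a time value to a date, assuming a standard Gregorian calendar; `parse_time_units` and `to_datetime` are hypothetical helpers. For "noleap", "360_day" and other CF calendars, `netCDF4.num2date(times, units, calendar)` is the safer choice.

```python
import re
from datetime import datetime, timedelta

# CF-style units, e.g. "days since 1850-1-1" or "hours since 1900-01-01 00:00:00"
_UNITS = re.compile(r"^\s*(\w+)\s+since\s+(\d{1,4})-(\d{1,2})-(\d{1,2})")
_STEPS = {
    "days": timedelta(days=1),
    "hours": timedelta(hours=1),
    "minutes": timedelta(minutes=1),
    "seconds": timedelta(seconds=1),
}


def parse_time_units(units):
    """Split a units string into (length of one unit, reference datetime)."""
    match = _UNITS.match(units)
    if match is None or match.group(1).lower() not in _STEPS:
        raise ValueError(f"unsupported time units: {units!r}")
    unit, year, month, day = match.groups()
    return _STEPS[unit.lower()], datetime(int(year), int(month), int(day))


def to_datetime(value, units):
    """Convert one time value to a datetime; valid for the standard Gregorian calendar only."""
    step, origin = parse_time_units(units)
    return origin + step * float(value)
```

With an open file, `to_datetime(time[0], time.units).year` would give the first year actually stored in it.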

Tags: python pandas netcdf


【Solution 1】:

First, it looks like you are not giving the correct path. It should be something like "G:/path/to/pr_day_CNRM-CM5_historical_r1i1p1_18500101-18501231.nc".
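The answer's point can be made independent of the working directory: relative names such as `'*.nc'` are resolved against the folder Python is started from, which is not always the script's folder (an IDE may start it elsewhere). A minimal sketch with `pathlib` that lists the files of an explicit folder; `find_nc_files` and its default pattern are hypothetical.

```python
from pathlib import Path


def find_nc_files(data_dir, pattern="pr_day_*.nc"):
    """Return the sorted matching files in data_dir, regardless of the working directory."""
    folder = Path(data_dir).expanduser().resolve()
    if not folder.is_dir():
        raise FileNotFoundError(f"data folder does not exist: {folder}")
    return sorted(str(p) for p in folder.glob(pattern))
```

For example, `find_nc_files(r"G:\Selection Cannon\Historical\CNRM-CM5_r1i1p1\pr")`, or `find_nc_files(Path(__file__).parent)` for files stored next to the script.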

【Discussion】:

  • Thanks for your reply! It seems to "work" without giving the path, because the script is in the same folder as the files I want to extract. But I still can't get it to start at 1979.
  • @BierBaron This worked until the following error appeared in the loop over len(d_range), at line 31, df.loc[d_range[t_index]]['Temparature '] = temp[t_index, min_index_lat, min_index_lon], raised from netCDF4._netCDF4.Variable.__getitem__(): IndexError: index exceeds dimension bounds
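netCDF4 raises `IndexError: index exceeds dimension bounds` when `t_index` reaches the length of the time dimension. In the script, `t_index` runs over every calendar day of `yr`, so the error is expected whenever a file holds fewer daily values than that. Possible causes, which are assumptions since the comment does not show the file, are a 365-day "noleap" calendar in a leap year or a file whose period differs from the one assumed. A minimal sketch that counts the calendar days in a period and compares them with the file's time length before the loop; `daily_dates` and `check_time_length` are hypothetical helpers.

```python
from datetime import date, timedelta


def daily_dates(start_year, end_year):
    """Every calendar day from 1 Jan start_year to 31 Dec end_year (standard calendar)."""
    first, last = date(start_year, 1, 1), date(end_year, 12, 31)
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


def check_time_length(start_year, end_year, n_time_steps):
    """Raise a clear error if the file's time axis is not one value per calendar day."""
    expected = len(daily_dates(start_year, end_year))
    if n_time_steps != expected:
        raise ValueError(
            f"{start_year}-{end_year}: expected {expected} daily values, "
            f"file has {n_time_steps} (other calendar or period?)"
        )
    return expected
```

Called as `check_time_length(start, end, len(data.variables['time']))`, it fails with a readable message before the loop; alternatively, loop over `range(len(data.variables['time']))` and derive the dates from the time values.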