【Title】: xarray - Use groupby to group by every day over a year's climatological hourly netCDF data
【Posted】: 2020-08-11 19:46:04
【Question】:

I have hourly netCDF climate data covering a geographic region for one full year, e.g. from 2017-01-01T00:00:00 to 2017-12-31T23:00:00:

<xarray.Dataset>
Dimensions:    (latitude: 106, longitude: 193, time: 8760)
Coordinates:
  * latitude   (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9
  * longitude  (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0
  * time       (time) datetime64[ns] 2017-01-01 ... 2017-12-31T23:00:00
Data variables:
    T_SFC      (time, latitude, longitude) float32 dask.array<shape=(8760, 106, 193), chunksize=(744, 106, 193)>
Attributes:
    creationTime:        1525708833
    creationTimeString:  Mon May  7 09:00:32 PDT 2018
    Conventions:         COARDS

As the summary shows, the data has three coordinates (latitude, longitude and time) and one variable, the hourly temperature T_SFC.

My code:

import xarray as xr
mds_temp_path = '../Archive/*/IDV71000_VIC_T_SFC.nc'    # netCDF
mds_temp = xr.open_mfdataset(mds_temp_path)    # open netCDF and read into a dataset object

print(mds_temp.groupby('time.dayofyear').mean('time'))

What I get:

<xarray.Dataset>
Dimensions:    (dayofyear: 365, latitude: 106, longitude: 193)
Coordinates:
  * latitude   (latitude) float32 -39.2 -39.149525 ... -33.950478 -33.9
  * longitude  (longitude) float32 140.8 140.84792 140.89584 ... 149.95209 150.0
  * dayofyear  (dayofyear) int64 1 2 3 4 5 6 7 8 ... 359 360 361 362 363 364 365
Data variables:
    T_SFC   (dayofyear, latitude, longitude) float64 dask.array<shape=(365, 106, 193), chunksize=(1, 106, 193)>

I want the mean temperature for each calendar day, so that the time coordinate of the resulting dataset is "2017-01-01", "2017-01-02", "2017-01-03", ..., "2017-12-31" rather than 1, 2, 3, ..., 365.

【Comments】:

    Tags: python pandas netcdf python-xarray


    【Solution 1】:

    You should use the resample method instead of groupby:

    mds_temp.resample(time='1D').mean()
    

    These concepts are described more fully in the time series data section of the documentation: http://xarray.pydata.org/en/stable/time-series.html#resampling-and-grouped-operations
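To make the difference concrete, here is a minimal sketch on synthetic data (the 2x2 grid and the temperature values are made up for illustration; only the variable and coordinate names follow the question). It contrasts the integer `dayofyear` labels that `groupby` produces with the datetime labels that `resample` keeps.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the hourly data: 3 full days on a 2x2 grid (made-up values)
time = pd.date_range("2017-01-01", periods=72, freq=pd.Timedelta(hours=1))
temps = np.arange(72, dtype="float32").reshape(72, 1, 1) * np.ones((1, 2, 2), dtype="float32")
ds = xr.Dataset(
    {"T_SFC": (("time", "latitude", "longitude"), temps)},
    coords={"time": time, "latitude": [-39.2, -39.1], "longitude": [140.8, 140.9]},
)

# groupby on the datetime component: labels become integers (day of year)
by_doy = ds.groupby("time.dayofyear").mean("time")

# resample to daily bins: labels stay datetime64, one per calendar day
daily = ds.resample(time="1D").mean()

print(by_doy.dayofyear.values)
print(daily.time.values)
```

Note that with data spanning several years, `groupby('time.dayofyear')` would also pool the same day of year across all years, while `resample` keeps each calendar day separate.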

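    【Comments】:

      【Solution 2】:

      @jhamman's answer is useful, but if a whole day is missing from your measurements, the resample(...) method adds that day back to the result (as NaN after taking the mean). See this example: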

      import numpy as np
      import pandas as pd
      import xarray as xr

      # create a dataset with data on days 1 & 3
      t = ['2000-01-01T00:00:00.000000000', '2000-01-01T01:00:00.000000000', '2000-01-01T02:00:00.000000000', '2000-01-03T00:00:00.000000000', '2000-01-03T01:00:00.000000000', '2000-01-03T02:00:00.000000000']
      t = pd.to_datetime(t)
      ds = xr.Dataset({"foo": ("time", np.arange(len(t))), "time": t})
      
      # reduce to days (day 2 is added to the result and filled with NaN)
      # stored under a new name so the original ds stays available below
      daily = ds.resample(time='1D').mean()
      print(daily.time)
      
      <xarray.DataArray 'time' (time: 3)>
      array(['2000-01-01T00:00:00.000000000', '2000-01-02T00:00:00.000000000',
             '2000-01-03T00:00:00.000000000'], dtype='datetime64[ns]')
      Coordinates:
        * time     (time) datetime64[ns] 2000-01-01 2000-01-02 2000-01-03
      

      In my case I do not want this behavior: days with no data must still be absent after aggregating by day, so I use this approach instead:

      # set all dates to have time at 00h so multiple measurements in a day have the same label
      ds.coords['time'] = ds.time.dt.floor('1D')
      
      # group by 'date' using an average (mean)
      ds = ds.groupby('time').mean()
      
      print(ds)
      <xarray.Dataset>
      Dimensions:  (time: 2)
      Coordinates:
        * time     (time) datetime64[ns] 2000-01-01 2000-01-03
      Data variables:
          foo      (time) float64 1.0 4.0
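As a cross-check, the sketch below compares the floor-and-groupby approach with an alternative that is not part of this answer: resampling first and then dropping the empty days with `dropna`. The names `resampled`, `dropped` and `floored` are only illustrative. On this small example both routes should give the same two days.

```python
import numpy as np
import pandas as pd
import xarray as xr

# same toy data as above: three hourly samples on day 1 and on day 3
t = pd.to_datetime([
    "2000-01-01T00:00", "2000-01-01T01:00", "2000-01-01T02:00",
    "2000-01-03T00:00", "2000-01-03T01:00", "2000-01-03T02:00",
])
ds = xr.Dataset({"foo": ("time", np.arange(len(t)))}, coords={"time": t})

# route 1 (alternative, assumption): resample, then drop days that are entirely NaN
resampled = ds.resample(time="1D").mean()
dropped = resampled.dropna(dim="time", how="all")

# route 2 (this answer): floor timestamps to midnight, then group by the new labels
floored = ds.assign_coords(time=ds.time.dt.floor("1D")).groupby("time").mean()

print(resampled.foo.values)
print(dropped.time.values)
print(floored.time.values)
```

One caveat with the `dropna` route: it also removes days that were present in the input but contained only NaN values, which the floor-and-groupby approach would keep.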
      

      I hope this helps someone :)

      【Comments】:
