[Question title]: Is there a function for finding the differences between the last time stamp of one netCDF file and the first time stamp of the next netCDF file?
[Posted]: 2021-05-03 02:49:08
[Question]:

I have a list of netCDF files. I open each netCDF file in xarray like this:

import xarray as xr

files = ['file_1.nc', 'file_2.nc', 'file_3.nc', 'file_4.nc']

for file in files:
    xarray_object = xr.open_dataset(file)

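Next I want to take the last time stamp from file_1.nc and subtract it from the first time stamp in file_2.nc, and continue this pattern through the whole list of files (so file_2.nc[first time stamp] - file_1.nc[last time stamp], file_3.nc[first time stamp] - file_2.nc[last time stamp], and so on).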

I started tackling this with the following:

    time_diff = xarray_object['time'][-1] - xarray_object['time'][0]

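But this only subtracts the first time stamp of file_1.nc from the last time stamp of that same file, then does the same within file_2.nc, and so on.

I'm not sure of the best way to have the loop look at the time stamps of two separate files at once.

Any help would be greatly appreciated!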

[Comments]:

Tags: python numpy netcdf python-xarray

[Solution 1]:

I think you're mainly looking for zip? https://docs.python.org/3.9/library/functions.html#zip

# glob is convenient for getting multiple paths with a wildcard e.g.
import glob
import xarray as xr

paths = sorted(glob.glob("*.nc"))
datasets = [xr.open_dataset(path) for path in paths]

# Get the first time for the second dataset onward
first_times = [ds["time"][0] for ds in datasets[1:]]

# Get the last time, except for the last dataset
last_times = [ds["time"][-1] for ds in datasets[:-1]]

# Use zip to go through both lists at once:
# t0 is the last time of one file, t1 is the first time of the next file
time_deltas = [t1 - t0 for t0, t1 in zip(last_times, first_times)]
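The pairing can be tried without any files on disk. What follows is a minimal sketch, not part of the original answer: the helper name `gaps_between`, the `time_name` parameter, and the synthetic hourly datasets are all assumptions made for illustration. It pairs each dataset with its successor via zip over two offset slices, and pulls the time stamps out as plain NumPy values so the result is a list of `numpy.timedelta64` objects.

```python
import numpy as np
import xarray as xr


def gaps_between(datasets, time_name="time"):
    """Hypothetical helper: first time of each dataset minus last time of the previous one."""
    gaps = []
    # zip over two offset slices yields (file_1, file_2), (file_2, file_3), ...
    for prev_ds, next_ds in zip(datasets[:-1], datasets[1:]):
        last_time = prev_ds[time_name].values[-1]
        first_time = next_ds[time_name].values[0]
        gaps.append(first_time - last_time)
    return gaps


# Synthetic stand-ins for three files, each holding four hourly time stamps.
starts = ["2021-01-01T00:00", "2021-01-01T06:00", "2021-01-01T10:00"]
datasets = [
    xr.Dataset(
        coords={"time": np.datetime64(start) + np.arange(4) * np.timedelta64(1, "h")}
    )
    for start in starts
]

gaps = gaps_between(datasets)
# Express each gap in hours as a plain float.
print([float(gap / np.timedelta64(1, "h")) for gap in gaps])  # [3.0, 1.0]
```

With real data, the list comprehension over `xr.open_dataset(path)` from the answer would take the place of the synthetic `datasets` list.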

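P.S.

With glob, you may want to check that the sorting comes out as expected. You can always pass a function as the key argument to make sure you are sorting, for example, by the number in the file name. The default sort is lexicographic: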

unsorted = ["file_02", "file_1"]
print(sorted(unsorted))  # ['file_02', 'file_1'], because "0" sorts before "1"

More robustly, we can split on the underscore and convert whatever comes after it to an integer:

print(sorted(unsorted, key=lambda x: int(x.split("_")[-1])))  # ['file_1', 'file_02']

Of course, if you're just supplying the list yourself, you have full control over the order anyway...
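One caveat with splitting on the underscore: for names that carry an extension, such as file_1.nc, the piece after the underscore is "1.nc", which int() cannot parse. A possible workaround, sketched here with a hypothetical helper `numeric_suffix` and made-up file names, is to strip the extension first and pick out the trailing digits with a regular expression:

```python
import re
from pathlib import Path


def numeric_suffix(path):
    """Hypothetical helper: integer after the last underscore in the file stem."""
    # Path(...).stem drops the final extension: "file_10.nc" -> "file_10"
    match = re.search(r"_(\d+)$", Path(path).stem)
    if match is None:
        raise ValueError(f"no numeric suffix in {path!r}")
    return int(match.group(1))


paths = ["file_10.nc", "file_2.nc", "file_1.nc"]
print(sorted(paths))                      # ['file_1.nc', 'file_10.nc', 'file_2.nc']
print(sorted(paths, key=numeric_suffix))  # ['file_1.nc', 'file_2.nc', 'file_10.nc']
```

Raising on names without a numeric suffix is a design choice for this sketch: a stray file fails loudly instead of silently landing somewhere in the sequence.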

[Comments]: