【Posted】:2019-12-16 13:47:47
【Question】:
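For clarity, I have substantially rewritten my earlier question. Following Ryan's suggestion on a separate channel, numpy.digitize looks like the right tool for what I am trying to do.
I have an xarray.DataArray with dimensions x, y, and time. I am trying to work out which values to pass to the input_core_dims and output_core_dims arguments of apply_ufunc so that numpy.digitize is applied to every image in the time series.
Intuitively, I want the output dimensions to be ['time', 'x', 'y']. I assumed the input core dimensions should be x and y, since I want to broadcast numpy.digitize along the time dimension. However, this does not work. Applying numpy.digitize to the first numpy array in my time series gives the correct result: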
[84]
blues
<xarray.DataArray 'reflectance' (time: 44, y: 1082, x: 1084)>
dask.array<shape=(44, 1082, 1084), dtype=uint16, chunksize=(44, 1082, 1084)>
Coordinates:
band int64 1
* y (y) float64 9.705e+05 9.705e+05 9.705e+05 ... 9.673e+05 9.672e+05
* x (x) float64 4.889e+05 4.889e+05 4.889e+05 ... 4.922e+05 4.922e+05
* time (time) datetime64[ns] 2018-10-12 2018-10-16 ... 2019-05-26
Attributes:
transform: (3.0, 0.0, 488907.0, 0.0, -3.0, 970494.0)
crs: +init=epsg:32630
res: (3.0, 3.0)
is_tiled: 1
nodatavals: (1.0, 1.0, 1.0, 1.0)
scales: (1.0, 1.0, 1.0, 1.0)
offsets: (0.0, 0.0, 0.0, 0.0)
[79]
#correct result
np.digitize(np.array(blues[0]), bin_arr)
array([[14, 15, 15, ..., 16, 17, 16],
[14, 13, 14, ..., 16, 16, 15],
[15, 14, 15, ..., 16, 16, 15],
...,
[16, 18, 18, ..., 15, 16, 15],
[17, 18, 18, ..., 16, 17, 16],
[17, 17, 17, ..., 17, 18, 17]])
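One property worth checking here is that np.digitize works element by element and returns an array with the same shape as its input, so digitizing the whole (time, y, x) stack in one call should match digitizing each image separately. A minimal sketch with made-up data (the array `stack` and the bin range are assumptions, not the real reflectance values):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the reflectance stack, ordered (time, y, x)
stack = rng.integers(0, 1000, size=(4, 3, 5)).astype(np.uint16)
bins = np.linspace(0, 1000, 50)

# Digitize the whole stack in a single call
all_at_once = np.digitize(stack, bins)

# Digitize one image at a time and restack along the first axis
per_image = np.stack([np.digitize(img, bins) for img in stack])

print(all_at_once.shape)
print(np.array_equal(all_at_once, per_image))
```

If the two results agree, no axis needs special treatment, which suggests that core dimensions may not be needed for this function at all.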
But my understanding of apply_ufunc is incorrect. Changing input_core_dims to [['x','y']] or [['time']] does not produce the correct digitized result:
bin_arr = np.linspace(configs.rmin, configs.rmax, 50)
blues = t_series['reflectance'].sel(band=1).chunk({'time':-1})
result = xr.apply_ufunc(partial(np.digitize, bins=bin_arr), blues, input_core_dims=[['time']], dask="parallelized", output_dtypes=[blues.dtype])
#wrong values, correct shape
np.array(result)[0]
array([[14, 16, 15, ..., 48, 18, 15],
[15, 16, 16, ..., 49, 18, 15],
[15, 16, 16, ..., 49, 18, 14],
...,
[16, 21, 17, ..., 50, 19, 15],
[17, 21, 17, ..., 50, 19, 16],
[16, 21, 18, ..., 50, 20, 17]])
bin_arr = np.linspace(configs.rmin, configs.rmax, 50)
blues = t_series['reflectance'].sel(band=1).chunk({'time':-1})
result = xr.apply_ufunc(partial(np.digitize, bins=bin_arr), blues, input_core_dims=[['x','y']], dask="parallelized", output_dtypes=[blues.dtype])
#wrong values, correct shape
np.array(result)[0]
array([[14, 14, 15, ..., 16, 17, 17],
[15, 13, 14, ..., 18, 18, 17],
[15, 14, 15, ..., 18, 18, 17],
...,
[16, 16, 16, ..., 15, 16, 17],
[17, 16, 16, ..., 16, 17, 18],
[16, 15, 15, ..., 15, 16, 17]])
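Each of these results has the correct shape but the wrong values, which suggests that digitize is being applied along the wrong axis and the result is then reshaped to the shape of the input.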
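It is also odd that the result of apply_ufunc drops the input core dimension when displayed as an xarray object. Internally, though, the dimension is still there once you convert it to a numpy array: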
[85]
result
<xarray.DataArray 'reflectance' (y: 1082, x: 1084)>
dask.array<shape=(1082, 1084), dtype=uint16, chunksize=(1082, 1084)>
Coordinates:
band int64 1
* y (y) float64 9.705e+05 9.705e+05 9.705e+05 ... 9.673e+05 9.672e+05
* x (x) float64 4.889e+05 4.889e+05 4.889e+05 ... 4.922e+05 4.922e+05
[87]
# the shape of the xarray and numpy array do not match after apply_ufunc
np.array(result).shape
(1082, 1084, 44)
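The shape (1082, 1084, 44) reads as (y, x, time), which matches the documented behaviour of apply_ufunc: core dimensions are moved to the last axes of the input before the function is called. With no output_core_dims given, xarray assumes the function consumes those dimensions, so the labelled result only lists (y, x). Indexing the numpy array with [0] then selects along y rather than time, which may explain why the first column of the "wrong" arrays above matches the first row of the correct one. A small sketch on in-memory toy data (`da` and `record_shape` are hypothetical names) shows the axis move:

```python
import numpy as np
import xarray as xr

# Hypothetical toy array with the same dimension order as `blues`
da = xr.DataArray(np.zeros((4, 3, 5)), dims=("time", "y", "x"))

seen_shapes = []

def record_shape(arr):
    # Record the shape of the raw array that apply_ufunc passes in,
    # then return the data unchanged
    seen_shapes.append(arr.shape)
    return arr

out = xr.apply_ufunc(
    record_shape,
    da,
    input_core_dims=[["time"]],
    output_core_dims=[["time"]],  # declare that 'time' survives the call
)

print(seen_shapes[0])  # shape seen by the function: 'time' is now last
print(out.dims)        # broadcast dims first, core dims last
```

Declaring 'time' in output_core_dims as well keeps the labelled dimensions in step with the underlying array.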
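In addition, when I try to fix this by setting the output_core_dims argument to [['time', 'x', 'y']], I get an error. It looks like a dimension cannot be both an input core dimension and an output core dimension: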
[67]
bin_arr = np.linspace(configs.rmin, configs.rmax, 50)
blues = t_series['reflectance'].sel(band=1).chunk({'time':-1})
result = xr.apply_ufunc(partial(np.digitize, bins=bin_arr), blues, input_core_dims=[['time']], output_core_dims=[['time','x','y']], dask="parallelized", output_dtypes=[blues.dtype])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
in
5 bin_arr = np.linspace(configs.rmin, configs.rmax, 50)
6 blues = t_series['reflectance'].sel(band=1).chunk({'time':-1})
----> 7 result = xr.apply_ufunc(partial(np.digitize, bins=bin_arr), blues, input_core_dims=[['time']], output_core_dims=[['time','x','y']], dask="parallelized", output_dtypes=[blues.dtype])
~/miniconda3/envs/pyatsa/lib/python3.7/site-packages/xarray/core/computation.py in apply_ufunc(func, input_core_dims, output_core_dims, exclude_dims, vectorize, join, dataset_join, dataset_fill_value, keep_attrs, kwargs, dask, output_dtypes, output_sizes, *args)
967 join=join,
968 exclude_dims=exclude_dims,
--> 969 keep_attrs=keep_attrs)
970 elif any(isinstance(a, Variable) for a in args):
971 return variables_vfunc(*args)
~/miniconda3/envs/pyatsa/lib/python3.7/site-packages/xarray/core/computation.py in apply_dataarray_vfunc(func, signature, join, exclude_dims, keep_attrs, *args)
215
216 data_vars = [getattr(a, 'variable', a) for a in args]
--> 217 result_var = func(*data_vars)
218
219 if signature.num_outputs > 1:
~/miniconda3/envs/pyatsa/lib/python3.7/site-packages/xarray/core/computation.py in apply_variable_ufunc(func, signature, exclude_dims, dask, output_dtypes, output_sizes, keep_attrs, *args)
539 if isinstance(arg, Variable)
540 else arg
--> 541 for arg, core_dims in zip(args, signature.input_core_dims)]
542
543 if any(isinstance(array, dask_array_type) for array in input_data):
~/miniconda3/envs/pyatsa/lib/python3.7/site-packages/xarray/core/computation.py in (.0)
539 if isinstance(arg, Variable)
540 else arg
--> 541 for arg, core_dims in zip(args, signature.input_core_dims)]
542
543 if any(isinstance(array, dask_array_type) for array in input_data):
~/miniconda3/envs/pyatsa/lib/python3.7/site-packages/xarray/core/computation.py in broadcast_compat_data(variable, broadcast_dims, core_dims)
493 'dimensions %r on an input variable: these are core '
494 'dimensions on other input or output variables'
--> 495 % unexpected_dims)
496
497 # for consistency with numpy, keep broadcast dimensions to the left
ValueError: operand to apply_ufunc encountered unexpected dimensions ['y', 'x'] on an input variable: these are core dimensions on other input or output variables
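Reading the message, the complaint seems to be about 'y' and 'x': they are named as output core dimensions but are not listed in input_core_dims for the input that carries them. One way around this, sketched below on toy in-memory data, is to list the same core dimension on both sides and transpose afterwards. The names `da` and `bins` are assumptions for illustration, and this is only one possible arrangement, not necessarily the recommended one:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
bins = np.linspace(0, 1000, 50)

# Hypothetical toy data ordered (time, y, x), like `blues`
da = xr.DataArray(
    rng.integers(0, 1000, size=(4, 3, 5)),
    dims=("time", "y", "x"),
)

result = xr.apply_ufunc(
    np.digitize,
    da,
    kwargs={"bins": bins},
    input_core_dims=[["time"]],   # 'time' is moved to the last axis
    output_core_dims=[["time"]],  # and is still present in the output
).transpose("time", "y", "x")     # restore the original dimension order

# Reference result computed directly on the underlying numpy array
expected = np.digitize(da.values, bins)

print(result.dims)
print(np.array_equal(result.values, expected))
```

With a dask-backed array the call would presumably also need dask="parallelized" and output_dtypes, as in the attempts above. Note that np.digitize returns integer bin indices rather than the input's uint16 dtype.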
Any help is greatly appreciated. I would like to understand how I am misusing the input_core_dims and output_core_dims arguments.
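【Comments】:
- Could you provide a toy example with fake data that we can copy and paste into a notebook and experiment with? Please also include the expected output.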
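A toy version along those lines might look like the sketch below. The data are random and the names `toy` and `bins` are made up; the real `t_series` and `configs` objects are not reproduced. It applies np.digitize through apply_ufunc with no core dimensions at all and compares the first time step against a direct numpy call, which is the expected output:

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
bins = np.linspace(0, 1000, 50)

# Fake reflectance stack: 4 time steps of 3 x 5 images
toy = xr.DataArray(
    rng.integers(0, 1000, size=(4, 3, 5)).astype(np.uint16),
    dims=("time", "y", "x"),
    name="reflectance",
)

# No input_core_dims / output_core_dims: every dimension is broadcast
result = xr.apply_ufunc(np.digitize, toy, kwargs={"bins": bins})

# Expected output for the first image, computed with plain numpy
expected_first = np.digitize(toy.isel(time=0).values, bins)

print(result.dims)
print(np.array_equal(result.isel(time=0).values, expected_first))
```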
Tags: image-processing time-series python-xarray