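【Posted】: 2021-11-04 07:54:08
【Problem description】:
I want to use xarray functionality to reduce a dataset with a custom/external function across a named dimension.
Create a dataset to demonstrate the problem: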
import xarray as xr
import numpy as np
import pandas as pd
time = pd.date_range("2000-01-01", "2001-01-01", freq="D")
sids = np.arange(4)
obs = np.random.random(size=(len(time), len(sids)))
sim = np.random.random(size=(len(time), len(sids)))
original = xr.Dataset({"obs": (("time", "station_id"), obs), "sim": (("time", "station_id"), sim)}, coords={"time": time, "station_id": sids})
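I want to compute mean_squared_error from the two variables in original, collapsing the "time" dimension to produce the metric. This should return an xr.Dataset like the following: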
<xarray.Dataset>
Dimensions:             (station_id: 4)
Coordinates:
  * station_id          (station_id) int64 0 1 2 3
Data variables:
    mean_squared_error  (station_id) float64 0.4411 0.183 0.06754 0.9662
I tried using the reduce function:
from sklearn.metrics import mean_squared_error
original.reduce(mean_squared_error, dim="time")
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-243-51111f05437b> in <module>
----> 1 original.reduce(mean_squared_error, dim="time")
~/miniconda3/envs/ml/lib/python3.8/site-packages/xarray/core/dataset.py in reduce(self, func, dim, keep_attrs, keepdims, numeric_only, **kwargs)
4915 # the former is often more efficient
4916 reduce_dims = None # type: ignore[assignment]
-> 4917 variables[name] = var.reduce(
4918 func,
4919 dim=reduce_dims,
~/miniconda3/envs/ml/lib/python3.8/site-packages/xarray/core/variable.py in reduce(self, func, dim, axis, keep_attrs, keepdims, **kwargs)
1721 )
1722 if axis is not None:
-> 1723 data = func(self.data, axis=axis, **kwargs)
1724 else:
1725 data = func(self.data, **kwargs)
~/miniconda3/envs/ml/lib/python3.8/site-packages/sklearn/utils/validation.py in inner_f(*args, **kwargs)
70 FutureWarning)
71 kwargs.update({k: arg for k, arg in zip(sig.parameters, args)})
---> 72 return f(**kwargs)
73 return inner_f
74
TypeError: mean_squared_error() got an unexpected keyword argument 'axis'
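A note on the traceback: it shows that Dataset.reduce calls the function once per variable as func(self.data, axis=axis, **kwargs), so the function receives a single array plus an axis keyword. mean_squared_error instead expects two arrays (y_true, y_pred) and no axis argument, which is why the call fails. One possible way to apply a two-argument metric along a named dimension is xr.apply_ufunc with input_core_dims and vectorize=True. The following is a minimal sketch under that assumption, not necessarily the approach the asker settled on; the names mse and result are illustrative only.

```python
import numpy as np
import pandas as pd
import xarray as xr
from sklearn.metrics import mean_squared_error

# Same layout as the dataset in the question (values are random).
time = pd.date_range("2000-01-01", "2001-01-01", freq="D")
sids = np.arange(4)
shape = (len(time), len(sids))
original = xr.Dataset(
    {
        "obs": (("time", "station_id"), np.random.random(size=shape)),
        "sim": (("time", "station_id"), np.random.random(size=shape)),
    },
    coords={"time": time, "station_id": sids},
)

# input_core_dims marks "time" as the dimension consumed by the function.
# vectorize=True loops over the remaining dimension (station_id), so
# mean_squared_error receives two 1-D arrays per station and returns a scalar.
mse = xr.apply_ufunc(
    mean_squared_error,
    original["obs"],
    original["sim"],
    input_core_dims=[["time"], ["time"]],
    vectorize=True,
)

# Wrap the resulting DataArray in a Dataset with the desired variable name.
result = mse.to_dataset(name="mean_squared_error")
print(result)
```

Because vectorize=True runs a Python-level loop over stations, this is convenient rather than fast. Any other metric with the same (y_true, y_pred) -> scalar call signature should be usable in place of mean_squared_error.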
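【Discussion】:
-
Remember to accept an answer if you find one, so that people who come later have an easier time :)
-
Thanks @Saverio! I appreciate your comments and your answer. I should have made it clearer that RMSE is just an example function. Ideally, I would like to be able to apply many sklearn metrics from within xarray, rather than using an external package or doing xarray arithmetic as in @cyril's answer, i.e. using .apply or .reduce. I don't like my own solution either!
-
OK! Have you already tried posting this question in xarray's GitHub Discussions? I am sure the xarray community is more active there. If you post there and find an answer, please repost it here for future reference.
Tags: python python-xarray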