Python Pandas - Evenly distribute numeric values to nearest rows: answers

【Question title】: Python Pandas - Evenly distribute numeric values to nearest rows
【Posted】: 2018-10-18 19:02:42
【Question】:

Suppose I have a dataset like this:

> NaN NaN NaN 12 NaN NaN NaN NaN 10 NaN NaN NaN NaN 8 NaN 6 NaN

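I want to distribute each value as evenly as possible over the NaNs around it. For example, the value 12 should take the NaNs around it into account and spread itself evenly over them, up to the point where it reaches the NaNs that belong to the next non-NaN value.

For example, the first value, 12, should only consider the NaNs closest to it: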

> NaN NaN NaN 12 NaN NaN

The output should be:

2 2 2 2 2 (Distributed by the 12)

2 2 2 2 2 (Distributed by the 10)

2 2 2 2 (Distributed by the 8)

2 2 2 (Distributed by the 6)

> NaN NaN NaN 12 NaN NaN NaN NaN 10 NaN NaN NaN NaN 8 NaN 6 NaN

> 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2

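I was initially thinking of using a smoother, such as the interpolate function in Pandas. It doesn't have to be lossless, meaning the result may end up with less or more than the original total. Is there any library that can perform this kind of distribution, as opposed to using a lossy smoother?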
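The rule described above can also be written directly, without a smoother: assign every position to its closest non-NaN value, then divide that value by the number of positions it owns. Below is a minimal NumPy sketch of that idea. `distribute_nearest` is a hypothetical helper name, and it assumes a tie in distance goes to the earlier value (which produces the `2 2 3 3` ending agreed in the comments).

```python
import numpy as np


def distribute_nearest(values):
    """Hypothetical helper: spread each non-NaN value evenly over the
    positions that are closer to it than to any other non-NaN value.
    Assumption: on a tie, the position goes to the earlier value."""
    x = np.asarray(values, dtype=float)
    idx = np.arange(len(x))
    anchors = np.flatnonzero(~np.isnan(x))  # positions holding real values
    if anchors.size == 0:
        return x.copy()  # nothing to distribute

    # Index (into `anchors`) of the first anchor at or after each position
    right = np.clip(np.searchsorted(anchors, idx, side="left"), 0, anchors.size - 1)
    # The anchor just before it (clipped so it stays a valid index)
    left = np.clip(right - 1, 0, anchors.size - 1)

    # Choose the closer anchor; `<=` sends ties to the earlier one
    use_left = np.abs(idx - anchors[left]) <= np.abs(anchors[right] - idx)
    owner = np.where(use_left, left, right)

    # Each value is divided by the number of positions it owns
    counts = np.bincount(owner, minlength=anchors.size)
    return x[anchors][owner] / counts[owner]


data = [np.nan] * 3 + [12] + [np.nan] * 4 + [10] + [np.nan] * 4 + [8, np.nan, 6, np.nan]
print(distribute_nearest(data))  # fifteen 2.0 values followed by 3.0, 3.0
```

Because each value is divided by exactly the number of positions assigned to it, this particular rule is lossless: the output sums to 36, the same as 12 + 10 + 8 + 6.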

【Discussion】:

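What happens when the distances are equal? For example, if the end were 7 and 6, how would you distribute the values?
Then you don't distribute the 7; you distribute the 6 instead.
What do you mean by "an algorithmic approach"? You would prefer to write the procedure yourself rather than use a package, is that right?
I think that was a poor way to put it. Let me remove it.
Yes, you're right, it should be 2 2 3 3.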

Tags: python pandas numpy dataframe scipy

【Solution 1】:

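You can use interpolate(method='nearest'), then ffill() and bfill(), and finally groupby(). Note that method='nearest' is handed off to SciPy, so SciPy must be installed.

Short version: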

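>>> import numpy as np
>>> import pandas as pd
>>> x = [np.nan]*3 + [12] + [np.nan]*4 + [10] + [np.nan]*4 + [8, np.nan, 6, np.nan]
>>> series = pd.Series(x).interpolate(method='nearest').ffill().bfill()  # 'nearest' needs SciPy
>>> series.groupby(series).transform(lambda k: k / len(k)).tolist()  # divide by group size
[2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0]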
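One caveat: `groupby(series)` groups rows by the filled *value*, so two different non-NaN values that happen to be equal would be pooled into one group. Below is a minimal sketch that groups by the position of the nearest non-NaN value instead, using only `ffill`/`bfill` on positions (so it does not need SciPy). `distribute_by_anchor` is a hypothetical name, and ties are assumed to go to the earlier value, which matches the output shown above.

```python
import numpy as np
import pandas as pd


def distribute_by_anchor(s: pd.Series) -> pd.Series:
    """Hypothetical helper: rows are grouped by the *position* of their
    nearest non-NaN value rather than by the value itself.
    Assumption: ties go to the earlier value."""
    if s.notna().sum() == 0:
        return s.copy()  # nothing to distribute

    pos = pd.Series(np.arange(len(s), dtype=float), index=s.index)
    anchor_pos = pos.where(s.notna())  # own position if the row has a value, else NaN
    prev_a = anchor_pos.ffill()        # nearest anchor at or before each row
    next_a = anchor_pos.bfill()        # nearest anchor at or after each row

    # Prefer the earlier anchor unless the later one is strictly closer.
    # Rows before the first anchor have prev_a = NaN, so the comparison is
    # False and they fall through to next_a.
    take_prev = next_a.isna() | ((pos - prev_a) <= (next_a - pos))
    owner = prev_a.where(take_prev, next_a).astype(int)

    value = pd.Series(s.to_numpy()[owner.to_numpy()], index=s.index)
    group_size = owner.map(owner.value_counts())  # rows owned by each anchor
    return value / group_size


# Two separate 12s: each one is split over its own three rows
print(distribute_by_anchor(pd.Series([np.nan, 12, np.nan, np.nan, 12, np.nan])).tolist())
```

For that input this gives six values of 4.0 (total 24). The value-based `groupby(series)` would instead form a single group of six rows and assign 12 / 6 = 2.0 to each, losing half of the total.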

To illustrate what is happening, create your df:

>>> df = pd.DataFrame({"x": x})

where x is the series you provided. Now:

>>> df["inter"] = df.x.interpolate(method='nearest').ffill().bfill()  # fill from the nearest value
>>> df["inter"] = df.groupby("inter").inter.transform(lambda k: k / len(k))  # split evenly per group

>>> df

    x     inter
0   NaN   2.0
1   NaN   2.0
2   NaN   2.0
3   12.0  2.0
4   NaN   2.0
5   NaN   2.0
6   NaN   2.0
7   NaN   2.0
8   10.0  2.0
9   NaN   2.0
10  NaN   2.0
11  NaN   2.0
12  NaN   2.0
13  8.0   2.0
14  NaN   2.0
15  6.0   3.0
16  NaN   3.0

【Discussion】: