Secondary row value of highest rolling sums pandas: answers

【Question title】: Secondary row value of highest rolling sums pandas
【Posted】: 2022-01-06 23:05:06
【Question】:

I'm trying to get the max value of one row according to the cumulative sum of a different row. My dataframe looks like this:

df = pd.DataFrame({'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a'], 'value': [1, 3, 1, 5, 1, 9, 2]})

indx  constant  value
0        a        1
1        b        3
2        b        1
3        c        5
4        c        1
5        d        9
6        a        2

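I'm trying to add a new field holding the constant that has the highest cumulative sum of value so far in the dataframe. The final dataframe would look like this: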

indx constant   value   new_field
0      a          1         NaN
1      b          3          a
2      b          1          b
3      c          5          b
4      c          1          c
5      d          9          c
6      a          2          d

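As you can see, at index 1, a has the highest cumulative sum of value over all previous rows. At index 2, b has the highest cumulative sum of value over all previous rows, and so on.

Does anyone have a solution?

【Comments】:

I think this is just a shift.
I've tried using shift, but I still can't seem to get the constant with the highest cumulative value for each row. The way I set up my input/output dfs may be confusing; the fact that the output lines up with df.constant.shift() is a coincidence.

Tags: python pandas dataframe cumulative-sum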

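【Solution 1】:

As mentioned, all you need here is a shift. However, try the following for other cases.

Steps: find the cumulative maximum.

Where the cumulative maximum equals df['value'], copy 'constant'; otherwise set NaN.

The NaNs leave room to forward-fill the constant that corresponds to the maximum value.

Result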

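import numpy as np
# keep 'constant' on rows where 'value' sets a new running maximum, forward-fill the gaps, then shift down one row
df = df.assign(new_field=np.where(df['value'] == df['value'].cummax(), df['constant'], np.nan)).ffill()
df = df.assign(new_field=df['new_field'].shift())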



   constant  value new_field
0        a      1       NaN
1        b      3         a
2        b      1         b
3        c      5         b
4        c      1         c
5        d      9         c
6        a      2         d

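【Comments】:

This can go wrong, for example with: df = pd.DataFrame({'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a','a'], 'value': [1, 3, -1, -5, 4, 9, 2,3]})

【Solution 2】:

I think you should try approaching this as a pivot table, which lets you use np.argmax over the column axis.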
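
The difference shows up when Solution 1's rule is run next to per-`constant` running totals on that DataFrame. This is a minimal sketch, assuming the question means the `constant` whose own running total of `value` is largest; that reading, and the names `sol1`, `totals` and `leader`, are assumptions and not from the thread.

```python
import pandas as pd

df = pd.DataFrame({
    'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a', 'a'],
    'value': [1, 3, -1, -5, 4, 9, 2, 3],
})

# Solution 1's rule: remember the constant of the largest single value seen so far
is_new_max = df['value'] == df['value'].cummax()
sol1 = df['constant'].where(is_new_max).ffill().shift()

# Assumed reading: running total of value per constant, leader picked row by row
totals = df.pivot(columns='constant', values='value').fillna(0).cumsum()
leader = totals.idxmax(axis=1).shift()

print(pd.DataFrame({'sol1': sol1, 'leader': leader}))
```

The two columns agree except at index 5, where Solution 1 reports c (its 4 was a new maximum of `value`) while b still has the largest running total (2).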

import numpy as np
# accumulate `value` down the index separately for each level of `constant`
X = df.pivot_table(
    index=df.index,
    columns=['constant'],
    values='value'
).fillna(0.0).cumsum(axis=0)

# position of the column with the largest cumulative value in each row, i.e. the "winner"
colix = np.argmax(X.values, axis=1)

# you can fetch corresponding column names using this argmax index
df['winner'] = np.r_[[np.nan], X.columns[colix].values[:-1]]

# and there you go
df

  constant  value winner
0        a      1    NaN
1        b      3      a
2        b      1      b
3        c      5      b
4        c      1      c
5        d      9      c
6        a      2      d
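
The argmax lookup and the manual one-row offset can probably also be written with `DataFrame.idxmax` and `Series.shift`. This is a sketch on the question's data that builds `X` the same way as the answer does; the swap to `idxmax`/`shift` is a substitution of mine, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a'],
    'value': [1, 3, 1, 5, 1, 9, 2],
})

# one column per constant, holding that constant's running total of value
X = df.pivot_table(
    index=df.index,
    columns=['constant'],
    values='value'
).fillna(0.0).cumsum(axis=0)

# idxmax returns the column label of the row maximum (first label on ties);
# shift moves each winner down one row so a row only sees earlier rows
df['winner'] = X.idxmax(axis=1).shift()
print(df)
```

Like `np.argmax`, `idxmax` resolves ties in favour of the first column, so the two versions should pick the same winner.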

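【Comments】:

I think you should accept @wwnde's answer. It's cleaner and more Pythonic.

【Solution 3】:

You should be more careful (values can be negative, which decreases the cumsum). This is what you probably need to do: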

df["cumsum"] = df["value"].cumsum()
df["cummax"] = df["cumsum"].cummax()
df["new"] = np.where(df["cumsum"] == df["cummax"], df['constant'], np.nan)
df["new"] = df.ffill()["new"].shift()
df
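
Below is a sketch of the same steps on the DataFrame from the comment under Solution 1. It is written with `Series.where` instead of `np.where` and without the helper columns, which is a substitution and not the answerer's exact code. The names `running` and `at_peak` are illustrative. Note that the running total here is taken over the whole `value` column, not per `constant`.

```python
import pandas as pd

df = pd.DataFrame({
    'constant': ['a', 'b', 'b', 'c', 'c', 'd', 'a', 'a'],
    'value': [1, 3, -1, -5, 4, 9, 2, 3],
})

running = df['value'].cumsum()          # running total of the whole column
at_peak = running == running.cummax()   # rows where the total hits a new high

# keep the constant on peak rows, carry it forward, then shift down one row
df['new'] = df['constant'].where(at_peak).ffill().shift()
print(df)
```

With the negative values, rows 2 to 4 do not reach a new high, so b is carried forward until d lifts the total at index 5.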

【Comments】: