Reassigning the calculated group-by column to the original dataframe

【Question title】: Reassigning the calculated group-by column to the original dataframe
【Posted】: 2021-03-10 10:31:31
【Question】:

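I hope I'm asking this question the right way this time. Thanks to those who pointed out my earlier mistakes.

I have a dataframe (dft) of stock codes with prices, for example: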

            Date    Open    High    Low Close   Volume  AdjClose    StockCode
37563   2020-08-03  4.63    4.63    4.50    4.51    9602    4.51    ABA
38002   2020-08-04  4.52    4.54    4.51    4.51    4254    4.51    ABA
38374   2020-08-05  4.52    4.52    4.40    4.40    27307   4.40    ABA
38568   2020-08-06  4.41    4.58    4.41    4.58    3412    4.58    ABA
38772   2020-08-07  4.57    4.57    4.45    4.50    16260   4.50    ABA
... ... ... ... ... ... ... ... ...
77232   2021-02-15  11.06   12.76   11.06   12.66   27607862    12.66   Z1P
77632   2021-02-16  13.02   14.53   12.97   13.92   42833861    13.92   Z1P
77929   2021-02-17  13.65   13.66   11.27   11.97   29813500    11.97   Z1P
78103   2021-02-18  11.43   12.37   10.51   11.70   20602054    11.70   Z1P
78424   2021-02-19  12.10   12.59   11.87   12.35   14345435    12.35   Z1P
39741 rows × 8 columns

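I'm trying to calculate technical indicators per stock code. Here is what I did for MA_14 (moving average over 14 periods): split the data by stock code, then apply the moving-average calculation to each group: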

dft.groupby(["StockCode"]).apply(lambda x: (ta.MA(x["Close"],timeperiod=14, matype=0)))

Output:

StockCode       
ABA        37563          NaN
           38002          NaN
           38374          NaN
           38568          NaN
           38772          NaN
                      ...    
Z1P        77232     9.058571
           77632     9.498571
           77929     9.832143
           78103    10.148571
           78424    10.484286
Length: 39741, dtype: float64

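The output is what I expected: it has the same number of rows as the original dataframe dft.

Now I'm trying to assign this MA_14 back to the original dataframe (dft).

What I have tried:

Using transform, but I got the following error message: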

dft.groupby(["StockCode"]).transform.apply(lambda x: (ta.MA(x["Close"],timeperiod=14, matype=0)))

AttributeError: 'function' object has no attribute 'apply'
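
The AttributeError above is consistent with transform being referenced without being called: groupby(...).transform is a method, and a method object has no .apply attribute. As a minimal sketch of calling it instead, the example below uses a small made-up frame (dft_demo, a hypothetical stand-in for dft) and a 2-period pandas rolling mean in place of ta.MA, which is assumed not to be installed here.

```python
import pandas as pd

# Hypothetical miniature stand-in for dft (values borrowed from the sample rows)
dft_demo = pd.DataFrame(
    {
        "StockCode": ["ABA", "ABA", "ABA", "Z1P", "Z1P", "Z1P"],
        "Close": [4.51, 4.51, 4.40, 12.66, 13.92, 11.97],
    },
    index=[37563, 38002, 38374, 77232, 77632, 77929],
)

# transform is called with the function as its argument;
# the result keeps the original row labels of dft_demo
ma = dft_demo.groupby("StockCode")["Close"].transform(
    lambda s: s.rolling(window=2).mean()  # stand-in for the ta.MA call
)

# Because the index matches, plain column assignment works
dft_demo["MA_2"] = ma
print(dft_demo)
```

With the ta package from the question available, the lambda body could presumably call ta.MA(s, timeperiod=14, matype=0) instead; that substitution is untested here.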

Using concat to join the two row by row:

grouped=dft.groupby(["StockCode"]).apply(lambda x: (ta.MA(x["Close"],timeperiod=14, matype=0)))
concatenated = pd.concat([dft, grouped], axis=1)

Somehow this gives twice the number of rows (dft = 39741 rows, concatenated = 79482). Does it have something to do with the index?

    Date    Open    High    Low Close   Volume  AdjClose    StockCode   0
37563   2020-08-03  4.63    4.63    4.50    4.51    9602.0  4.51    ABA NaN
38002   2020-08-04  4.52    4.54    4.51    4.51    4254.0  4.51    ABA NaN
38374   2020-08-05  4.52    4.52    4.40    4.40    27307.0 4.40    ABA NaN
38568   2020-08-06  4.41    4.58    4.41    4.58    3412.0  4.58    ABA NaN
38772   2020-08-07  4.57    4.57    4.45    4.50    16260.0 4.50    ABA NaN
... ... ... ... ... ... ... ... ... ...
(Z1P, 77232)    NaN NaN NaN NaN NaN NaN NaN NaN 9.058571
(Z1P, 77632)    NaN NaN NaN NaN NaN NaN NaN NaN 9.498571
(Z1P, 77929)    NaN NaN NaN NaN NaN NaN NaN NaN 9.832143
(Z1P, 78103)    NaN NaN NaN NaN NaN NaN NaN NaN 10.148571
(Z1P, 78424)    NaN NaN NaN NaN NaN NaN NaN NaN 10.484286
79482 rows × 9 columns

Simply assigning the result back to dft, but that also gives an error message:

dft['test'] = (dft.groupby(["StockCode"]).apply(lambda x: (ta.MA(x["Close"],timeperiod=14, matype=0))))

TypeError: incompatible index of inserted column with frame index

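How can I align the indexes of 'grouped' and 'dft' so that the concatenation works correctly?

I also thought about joining on StockCode, but that would be wrong, because each row in dft would then be joined to the 70K rows in grouped. Is there a way to keep both StockCode and Date in 'grouped'?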
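
Keeping both StockCode and Date in the grouped result is possible if Date is made the index before grouping; the result can then be merged back on the two keys together, which avoids the many-to-many join. The sketch below assumes that each (StockCode, Date) pair occurs once, uses a hypothetical miniature frame (dft_demo), and again substitutes a pandas rolling mean for ta.MA.

```python
import pandas as pd

# Hypothetical miniature stand-in for dft
dft_demo = pd.DataFrame(
    {
        "Date": ["2020-08-03", "2020-08-04", "2020-08-05",
                 "2021-02-15", "2021-02-16", "2021-02-17"],
        "Close": [4.51, 4.51, 4.40, 12.66, 13.92, 11.97],
        "StockCode": ["ABA", "ABA", "ABA", "Z1P", "Z1P", "Z1P"],
    },
    index=[37563, 38002, 38374, 77232, 77632, 77929],
)

# Group-wise rolling mean indexed by (StockCode, Date),
# then turned into ordinary columns with reset_index()
ma = (
    dft_demo.set_index("Date")
    .groupby("StockCode")["Close"]
    .rolling(window=2)
    .mean()
    .rename("MA_2")
    .reset_index()
)

# Merge on both keys; validate raises if a key pair is duplicated
merged = dft_demo.merge(
    ma, on=["StockCode", "Date"], how="left", validate="one_to_one"
)

# merge returns a fresh 0..n-1 index; restore the original row labels
merged.index = dft_demo.index
print(merged)
```

For this particular task a direct assignment is likely simpler; the merge is mainly useful when the computed values live in a separate table keyed by StockCode and Date.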

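Thanks in advance for any suggestions on how to do this. I've searched several topics on Stack Overflow but can't find a solution that applies here (I may not be using the right keywords). If one exists, please point me to the relevant post.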

【Comments】:

Tags: python pandas group-by pandas-groupby data-manipulation

【Solution 1】:

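You can assign the new column within each group, as shown below. The key part is .apply(lambda g: g.assign(...)), which assigns the correct values to each group g. Note that I don't have the ta.MA package installed, so I use the standard pandas rolling function instead. I also set min_periods = 1, so this example produces no NaN values.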

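# reset_index() keeps the original row labels in an 'index' column
(dft.reset_index()
    .groupby("StockCode", as_index=False)
    # add the 14-period rolling mean of Close as a new column within each group
    .apply(lambda g: g.assign(test=g["Close"].rolling(window=14, min_periods=1).mean()))
    # restore the original row labels as the index
    .set_index("index")
)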

You get:

  index  Date          Open    High    Low    Close    Volume    AdjClose  StockCode        test
-------  ----------  ------  ------  -----  -------  --------  ----------  -----------  --------
  37563  2020-08-03    4.63    4.63   4.5      4.51      9602        4.51  ABA           4.51
  38002  2020-08-04    4.52    4.54   4.51     4.51      4254        4.51  ABA           4.51
  38374  2020-08-05    4.52    4.52   4.4      4.4      27307        4.4   ABA           4.47333
  38568  2020-08-06    4.41    4.58   4.41     4.58      3412        4.58  ABA           4.5
  38772  2020-08-07    4.57    4.57   4.45     4.5      16260        4.5   ABA           4.5
  77232  2021-02-15   11.06   12.76  11.06    12.66  27607862       12.66  Z1P          12.66
  77632  2021-02-16   13.02   14.53  12.97    13.92  42833861       13.92  Z1P          13.29
  77929  2021-02-17   13.65   13.66  11.27    11.97  29813500       11.97  Z1P          12.85
  78103  2021-02-18   11.43   12.37  10.51    11.7   20602054       11.7   Z1P          12.5625
  78424  2021-02-19   12.1    12.59  11.87    12.35  14345435       12.35  Z1P          12.52

【Comments】: