[Question title]: Get values from current till last column values in pandas groupby
[Posted]: 2023-01-20 18:33:01
[Question]:

Picture the following pandas DataFrame:

+----+------+-------+
| ID | Name | Value |
+----+------+-------+
| 1  | John | 1     |
+----+------+-------+
| 1  | John | 4     |
+----+------+-------+
| 1  | John | 10    |
+----+------+-------+
| 1  | John | 50    |
+----+------+-------+
| 1  | Adam | 6     |
+----+------+-------+
| 1  | Adam | 3     |
+----+------+-------+
| 2  | Jen  | 9     |
+----+------+-------+
| 2  | Jen  | 6     |
+----+------+-------+
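
To experiment with this, the frame above can be rebuilt as follows. This is a minimal sketch; the variable name `df` is an assumption chosen to match the answer's code.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "Name": ["John", "John", "John", "John", "Adam", "Adam", "Jen", "Jen"],
    "Value": [1, 4, 10, 50, 6, 3, 9, 6],
})
print(df)
```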

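I want to apply a groupby and create a new column that stores, for each row, the Value entries from the current row to the last row of its group as a list.

Like this: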

+----+------+-------+----------------+
| ID | Name | Value | NewCol         |
+----+------+-------+----------------+
| 1  | John | 1     | [1, 4, 10, 50] |
+----+------+-------+----------------+
| 1  | John | 4     | [4, 10, 50]    |
+----+------+-------+----------------+
| 1  | John | 10    | [10, 50]       |
+----+------+-------+----------------+
| 1  | John | 50    | [50]           |
+----+------+-------+----------------+
| 1  | Adam | 6     | [6, 3]         |
+----+------+-------+----------------+
| 1  | Adam | 3     | [3]            |
+----+------+-------+----------------+
| 2  | Jen  | 9     | [9, 6]         |
+----+------+-------+----------------+
| 2  | Jen  | 6     | [6]            |
+----+------+-------+----------------+

Is there any way to do this with the pandas groupby function?

[Comments]:

    Tags: python pandas group-by


    [Solution 1]:

    Use GroupBy.transform with a custom lambda function:

    # For each position i in the group, take the values from i to the end of the group.
    f = lambda x: [x.iloc[i:].tolist() for i in range(len(x))]
    df['new'] = df.groupby(['Name', 'ID'])['Value'].transform(f)
    print(df)
       ID  Name  Value             new
    0   1  John      1  [1, 4, 10, 50]
    1   1  John      4     [4, 10, 50]
    2   1  John     10        [10, 50]
    3   1  John     50            [50]
    4   1  Adam      6          [6, 3]
    5   1  Adam      3             [3]
    6   2   Jen      9          [9, 6]
    7   2   Jen      6             [6]
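
As a self-contained sketch of the transform approach, the snippet below rebuilds the sample frame and slices each group from the current position to its end with `iloc[i:]`. The helper name `tail_lists` is an illustrative assumption, not something from the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": [1, 1, 1, 1, 1, 1, 2, 2],
    "Name": ["John", "John", "John", "John", "Adam", "Adam", "Jen", "Jen"],
    "Value": [1, 4, 10, 50, 6, 3, 9, 6],
})

def tail_lists(values):
    """Hypothetical helper: for each position i, list the group's values from i to the end."""
    return [values.iloc[i:].tolist() for i in range(len(values))]

# transform calls tail_lists once per (Name, ID) group and keeps the original row order.
df["new"] = df.groupby(["Name", "ID"])["Value"].transform(tail_lists)
print(df)
```

Each group yields one list per row, so the result has the same length as the group, which is what `transform` expects.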
    

    Or:

    # Reverse the rows, take expanding windows within each group, then flip each window back.
    f = lambda x: [y[::-1].tolist() for y in x.expanding()]
    df['new'] = df.iloc[::-1].groupby(['Name', 'ID'])['Value'].transform(f)
    print(df)
       ID  Name  Value             new
    0   1  John      1  [1, 4, 10, 50]
    1   1  John      4     [4, 10, 50]
    2   1  John     10        [10, 50]
    3   1  John     50            [50]
    4   1  Adam      6          [6, 3]
    5   1  Adam      3             [3]
    6   2   Jen      9          [9, 6]
    7   2   Jen      6             [6]
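
To see why the reversed `expanding()` variant works, it can help to look at a single group in isolation. The sketch below (variable names are illustrative assumptions) reverses one Series, iterates over its expanding windows, and flips each window back. Iterating over `expanding()` needs a reasonably recent pandas; to my knowledge it was added in version 1.1.

```python
import pandas as pd

# One group's values, in their original order.
s = pd.Series([1, 4, 10, 50])

# Reverse the rows so that expanding windows grow from the last value backwards.
rev = s.iloc[::-1]
windows = [w.tolist() for w in rev.expanding()]

# Flip each window back so the values read in the original order.
tails = [w[::-1] for w in windows]
print(windows)
print(tails)
```

The first window belongs to the last row of the group, so when the result is assigned back to the frame, index alignment puts each list on the row it starts from.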
    

    [Comments]:
