【Posted】: 2018-12-26 21:48:36
【Question】:
I have a DataFrame with the columns col1, col2 and col3. I want to add another column, col4, containing col2[n+3]/col2-1, computed separately for each group in col1:
+-----+------+-----+
|col1 | col2 | col3|
+-----+------+-----+
| A | 2 | 4 |
+-----+------+-----+
| A | 4 | 5 |
+-----+------+-----+
| A | 7 | 7 |
+-----+------+-----+
| A | 3 | 8 |
+-----+------+-----+
| A | 7 | 3 |
+-----+------+-----+
| B | 8 | 9 |
+-----+------+-----+
| B | 10 | 10 |
+-----+------+-----+
| B | 8 | 9 |
+-----+------+-----+
| B | 20 | 15 |
+-----+------+-----+
The output should be:
+-----+------+-----+-----+
|col1 | col2 | col3| col4|
+-----+------+-----+-----+
| A | 2 | 4 | 0.5| # (3/2-1)
+-----+------+-----+-----+
| A | 4 | 5 | 0.75| # (7/4-1)
+-----+------+-----+-----+
| A | 7 | 7 | NA |
+-----+------+-----+-----+
| A | 3 | 8 | NA |
+-----+------+-----+-----+
| A | 7 | 3 | NA |
+-----+------+-----+-----+
| B | 8 | 9 | 1.5 |
+-----+------+-----+-----+
| B | 10 | 10 | NA |
+-----+------+-----+-----+
| B | 8 | 9 | NA |
+-----+------+-----+-----+
| B | 20 | 15 | NA |
+-----+------+-----+-----+
My code is
df['col4']= df.groupby('col1').apply(lambda x: x['col2'].shift(-3)/x['col2']-1)
which results in every entry of col4 being "NA".
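One possible explanation is index misalignment. The sketch below rebuilds the sample data (the variable name `ratio` is only illustrative) and compares the index of the `apply` result with the index of `df`. With default settings the result appears to carry an extra index level for the group key, so it would not line up with the frame's row index on assignment. Depending on the pandas version, that mismatch may show up as NaN values or as an error, so treat this as a diagnostic rather than a definitive account.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": list("AAAAABBBB"),
    "col2": [2, 4, 7, 3, 7, 8, 10, 8, 20],
    "col3": [4, 5, 7, 8, 3, 9, 10, 9, 15],
})

# Same expression as in the question, but kept in a separate variable
# (hypothetical name) instead of being assigned to df['col4'].
ratio = df.groupby("col1").apply(lambda g: g["col2"].shift(-3) / g["col2"] - 1)

# The apply result is indexed by (group key, original row label),
# whereas the frame itself has a single-level row index.
print(ratio.index.nlevels)
print(df.index.nlevels)
```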
I also tried:
df['col4']= df.groupby('col1').pipe(lambda x: x['col2'].shift(-3)/x['col2']-1)
which ignores the groups "A" and "B" and results in:
+-----+------+-----+-------+
|col1 | col2 | col3| col4 |
+-----+------+-----+-------+
| A | 2 | 4 | 0.5 |
+-----+------+-----+-------+
| A | 4 | 5 | 0.75 |
+-----+------+-----+-------+
| A | 7 | 7 | 0.1428|
+-----+------+-----+-------+
| A | 3 | 8 | 2.33 |
+-----+------+-----+-------+
| A | 7 | 3 | 0.1428|
+-----+------+-----+-------+
| B | 8 | 9 | 1.5 |
+-----+------+-----+-------+
| B | 10 | 10 | NA |
+-----+------+-----+-------+
| B | 8 | 9 | NA |
+-----+------+-----+-------+
| B | 20 | 15 | NA |
+-----+------+-----+-------+
Does anyone know how to accomplish this task or fix my code?
【Comments】:
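For reference, a minimal sketch of one way to get the desired output, assuming the rows are already in the intended order within each group. It uses the groupby object's own `shift`, which returns a Series aligned to the original row index, so the division and the assignment line up row by row. The intermediate name `ahead` is only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "col1": list("AAAAABBBB"),
    "col2": [2, 4, 7, 3, 7, 8, 10, 8, 20],
    "col3": [4, 5, 7, 8, 3, 9, 10, 9, 15],
})

# Value of col2 three rows ahead *within the same col1 group*.
# Rows with no such value in their own group get NaN.
ahead = df.groupby("col1")["col2"].shift(-3)

# col2[n+3] / col2 - 1, aligned on the original row index.
df["col4"] = ahead / df["col2"] - 1

print(df)
```

Because the shift never crosses a group boundary, the last three rows of each group stay NaN instead of borrowing values from the next group.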
- pandas issue #31063: "groupby() apply() gets the shape wrong" in the case where i) the groupby keys happen to have unique values and ii) the applied function takes a DataFrame and returns a Series
Tags: python dataframe pipe apply pandas-groupby