Python: copy values from one row into other rows based on another column (answers)

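【Question title】: Python copy values from one row into other based on other column
【Posted】: 2023-03-17 19:25:01
【Question description】:

I've been trying this all day, so I really hope someone can help me.

I have a dataframe that looks like this (sorry about the formatting; it would not let me format it as a table, because it then claimed there was an error in my code...):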

| ColumnA  | ColumnB |otherColumn    |
| -------- | ------- |-------------- |
| 1        | n.a.    |row x          |
| 1        | n.a.    |row b          |
| 1        | n.a.    |row x          |
| 2        | 23467   |row x          |
| 2        | n.a.    |row y          |
| 3        | n.a.    |row x          |
| 3        | 768345  |row  y         |
| 3        | n.a.    |row  y         |
| 3        | 768345  |row   x        |
| 4        | 95634511|row  x         |
| 4        | n.a.    |row    r       |
| 5        | n.a.    |row    d       |

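I now need to fill in the same ColumnB value in all rows that share the same number in ColumnA. (otherColumn isn't needed here; I only added it to show that there are several other columns.) If none of the rows belonging to the same number in ColumnA has a value in ColumnB, it should stay n.a. (for example "1" and "5"). So the desired output should be: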

| ColumnA  | ColumnB |otherColumn    |
| -------- | --------|-------------- |
| 1        | n.a.    |row x          |
| 1        | n.a.    |row b          |
| 1        | n.a.    |row x          |
| 2        | 23467   |row x          |
| 2        | 23467   |row y          |
| 3        | 768345  |row x          |
| 3        | 768345  |row  y         |
| 3        | 768345  |row  y         |
| 3        | 768345  |row   x        |
| 4        | 95634511|row  x         |
| 4        | 95634511|row    r       |
| 5        | n.a.    |row    d       |

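I tried converting the two columns into a dictionary, but couldn't work with the keys and values separately (to build an if statement). I also tried creating a second dataframe consisting of ColumnA and ColumnB and merging it back (but that added more rows). I tried update() and combine_first() as well, without success.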
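For reference, the dictionary-style lookup described above can be made to work with `Series.map`. The following is only a minimal sketch on a shortened, made-up sample, assuming ColumnB holds strings and that each ColumnA number has at most one distinct real value. The extra rows from the merge attempt were probably caused by duplicate ColumnA keys in the second dataframe; `drop_duplicates` avoids that here.

```python
import pandas as pd

# Shortened sample in the shape of the question's table (assumed string values).
df = pd.DataFrame({
    "ColumnA": [1, 2, 2, 3, 3, 3],
    "ColumnB": ["n.a.", "23467", "n.a.", "n.a.", "768345", "768345"],
})

# Keep only rows with a real value, then one row per ColumnA key.
known = df[df["ColumnB"] != "n.a."].drop_duplicates("ColumnA")

# A Series indexed by ColumnA behaves like a dictionary: key -> ColumnB value.
lookup = known.set_index("ColumnA")["ColumnB"]

# Keys missing from the lookup map to NaN; restore the "n.a." placeholder.
df["ColumnB"] = df["ColumnA"].map(lookup).fillna("n.a.")
print(df)
```

The names `known` and `lookup` are illustrative only and do not come from the thread.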

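Any suggestion is much appreciated!

【Question discussion】:

Tags: python pandas

【Solution 1】:

You can use groupby with transform("first") to pick the first valid value in each group: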

import numpy as np  # np.nan replaces the np.NaN alias, which NumPy 2.0 removed
print(df.replace('n.a.', np.nan).groupby("ColumnA")["ColumnB"].transform("first"))

0         None
1         None
2         None
3        23467
4        23467
5       768345
6       768345
7       768345
8       768345
9     95634511
10    95634511
11        None
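The snippet above only prints the result. A possible end-to-end version, assigning the values back and restoring the "n.a." placeholder for groups without any value, could look like the sketch below. It assumes ColumnB is stored as strings; the variable names `col_b` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question's table (otherColumn omitted).
df = pd.DataFrame({
    "ColumnA": [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5],
    "ColumnB": ["n.a.", "n.a.", "n.a.", "23467", "n.a.", "n.a.",
                "768345", "n.a.", "768345", "95634511", "n.a.", "n.a."],
})

# Treat the "n.a." placeholder as missing so that "first" skips it.
col_b = df["ColumnB"].replace("n.a.", np.nan)

# "first" picks the first non-null ColumnB value within each ColumnA group
# and transform broadcasts it back to every row of that group.
filled = col_b.groupby(df["ColumnA"]).transform("first")

# Groups with no value at all stay missing; put the placeholder back.
df["ColumnB"] = filled.fillna("n.a.")
print(df)
```

If a group contained several different real values, this sketch would silently spread the first one, so it is worth checking that assumption on real data.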

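【Discussion】:

Perfect and very concise! Thank you!

【Solution 2】:

Try replace() + groupby() + apply():

Replace the string 'n.a.' with an actual NaN, group by 'ColumnA', forward-fill and then back-fill the 'ColumnB' values within each group, and finally assign the result back to 'ColumnB'.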

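# group_keys=False keeps the original row index, so the result aligns on assignment
# (newer pandas versions otherwise prepend the group key to the index in apply).
df['ColumnB'] = (df.replace('n.a.', float('nan'))
                   .groupby('ColumnA', group_keys=False)['ColumnB']
                   .apply(lambda x: x.ffill().bfill()))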

Note: you can also use transform instead of apply().
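A minimal sketch of that transform variant, on a shortened made-up sample and assuming ColumnB holds strings, might look like this. Because transform always returns a result aligned to the original rows, the assignment works directly; the final `fillna` restores the placeholder for groups that never had a value.

```python
import numpy as np
import pandas as pd

# Shortened sample; group 3 starts with a missing value, so ffill alone is not enough.
df = pd.DataFrame({
    "ColumnA": [2, 2, 3, 3, 3, 5],
    "ColumnB": ["23467", "n.a.", "n.a.", "768345", "n.a.", "n.a."],
})

# Turn the placeholder into real missing values.
col_b = df["ColumnB"].replace("n.a.", np.nan)

# Within each ColumnA group, fill forward and then backward.
filled = col_b.groupby(df["ColumnA"]).transform(lambda s: s.ffill().bfill())

# Group 5 has no value at all, so it goes back to "n.a.".
df["ColumnB"] = filled.fillna("n.a.")
print(df)
```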

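【Discussion】:

This is exactly what I wanted! And I had been trying such complicated things... sometimes the solution can be simple. Thank you very much!
This fails for the first value of ColumnA 3; it needs an additional bfill.
It did work correctly on my original dataset. But I'll take a look at your solution as well, HenryYik!
Thanks for your answer, it works well, but Henry's answer is more concise, which is why I chose his.
@athelas np, I like Henry's approach too ;)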