Update a pandas column with another column's values using .loc (answers)

[Question title]: update pandas column with another column's values using .loc
[Posted]: 2019-04-03 15:03:39
[Question]:

I need to conditionally update ColY below wherever ColX != 0. What differs from other examples is that I need to replace ColY with the value from ColX, not with a string.

With the following code, I can use .loc to replace the values with a string:

df1.loc[df1.ColX != 0, 'ColY'] = 'Example'

How do I replace the corresponding ColY values with the values from ColX? I tried the following, to no avail:

df1.loc[df1.ColX != 0, 'ColY'] = df1.ColX

My original DataFrame df1 is:

ID  ColX   ColY
A   2024   0
B   0      2023
C   2019   0
D   2023   2024

My desired output is:

ID  ColX   ColY
A   2024   2024
B   0      2023
C   2019   2019
D   2023   2023

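[Discussion]:

Do you want to use .loc, or are you open to other suggestions?
Your code df1.loc[df1.ColX != 0, 'ColY'] = df1.ColX works fine, unless you have duplicate index values or the data is categorical.
@BENY What if the index has duplicates? Wouldn't .loc still update the values by simply applying the same boolean mask df1.ColX != 0 to df.ColX as well?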
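The point made in this discussion, that the question's one-liner already works with a unique index, comes from pandas aligning a right-hand Series on its index labels: only the values at the labels selected by the mask are written. A minimal sketch, assuming the question's df1 is rebuilt with a default RangeIndex (the construction below is an assumption, not the asker's code):

```python
import pandas as pd

# Rebuild the question's sample frame (default RangeIndex 0..3, all labels unique).
df1 = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "ColX": [2024, 0, 2019, 2023],
    "ColY": [0, 2023, 0, 2024],
})

# The right-hand side is the full ColX Series; pandas aligns it on the index,
# so only the ColX values at the labels selected by the mask (0, 2, 3) are written.
df1.loc[df1.ColX != 0, "ColY"] = df1.ColX

print(df1)  # ColY is now [2024, 2023, 2019, 2023]
```

Because the alignment is label-based, the result depends on the index being unique, which is the caveat raised in the comment about duplicate index values.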

Tags: python python-3.x pandas dataframe pandas-loc

[Answer 1]:

For your convenience, here is what I think is a cleaner approach, using np.where and .ne:

import numpy as np

# Take ColX where it is non-zero, otherwise keep the existing ColY value
df['ColY'] = np.where(df['ColX'].ne(0), df['ColX'], df['ColY'])

print(df)
  ID  ColX  ColY
0  A  2024  2024
1  B     0  2023
2  C  2019  2019
3  D  2023  2023

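[Comments]:

Whenever you reach for np.where, you can always use df.where in pandas instead :-)
I've seen it; I'm just not sure whether it uses the same vectorized code. Is it the same? @文本 I use np.where because I can be confident about its speed.
In pandas, .where can apply a change across whole DataFrame rows; for example, if you want to set every column to the last column of the df, where is the better fit :-)
Thanks Erfan. I've tried it, and I had also considered .where, but I get the following error: TypeError: 'DataFrame' object is not callable. I assume this refers to the df['ColY'] reference inside the .where call.
Are you using np.where or df.where? There is a difference. If you apply it correctly, the code I provided runs correctly. @user2845013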
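The np.where vs df.where difference mentioned in these comments is mainly one of calling convention. A minimal sketch on the question's data (the frame construction is an assumption), using the Series methods `where` and `mask`, which apply the same idea as `DataFrame.where` to a single column:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "ColX": [2024, 0, 2019, 2023],
    "ColY": [0, 2023, 0, 2024],
})
mask = df["ColX"].ne(0)

# np.where is a free function: np.where(cond, value_if_true, value_if_false).
# It returns a plain NumPy array, which is assigned back positionally.
via_numpy = np.where(mask, df["ColX"], df["ColY"])

# Series.where is a method: it keeps the caller's values where cond is True
# and takes `other` where cond is False, aligning on the index.
via_where = df["ColX"].where(mask, df["ColY"])

# Series.mask is the inverse: it replaces the caller's values where cond is True.
via_mask = df["ColY"].mask(mask, df["ColX"])

df["ColY"] = via_where
print(df)
```

With the method form, the values that are kept come from the object the method is called on, so only the condition and the replacement are passed; np.where instead takes both value sources as arguments.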

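[Answer 2]:

The problem with df1.loc[df1.ColX != 0, 'ColY'] = df1.ColX is that you are trying to replace a subset of df1.ColY (i.e. the rows where df1.ColX != 0) with the whole of df1.ColX, which has more values.

To copy the correct values conditionally, you also have to apply the same filter to df1.ColX: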

import pandas as pd

# Same ColX/ColY values as in the question
df1 = pd.DataFrame(data=[[2024, 0], [0, 2023], [2019, 0], [2023, 2024]], columns=['ColX', 'ColY'])

relevant_cols = (df1.ColX != 0)
df1.loc[relevant_cols, 'ColY'] = df1.loc[relevant_cols, 'ColX']
df1
#   ColX  ColY
# 0  2024  2024
# 1     0  2023
# 2  2019  2019
# 3  2023  2023
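Whether the right-hand side must be filtered depends on its type. A pandas Series is aligned on its index, so the unfiltered df1.ColX works as long as the index is unique; a plain NumPy array has no index, so its length must equal the number of selected rows. A minimal sketch on the question's data (the `.to_numpy()` conversion is only there to illustrate the difference and is not part of the answer above):

```python
import pandas as pd

df1 = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "ColX": [2024, 0, 2019, 2023],
    "ColY": [0, 2023, 0, 2024],
})
relevant_rows = df1.ColX != 0  # selects 3 of the 4 rows

# A raw array carries no index, so pandas cannot align it:
# 4 values for 3 selected rows raises ValueError.
try:
    df1.loc[relevant_rows, "ColY"] = df1.ColX.to_numpy()
    raised = False
except ValueError:
    raised = True

# Filtering the array with the same mask makes the lengths match (3 == 3).
df1.loc[relevant_rows, "ColY"] = df1.ColX.to_numpy()[relevant_rows.to_numpy()]
print(df1)
```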

[Comments]:

He doesn't need to write the condition twice.