pandas: removing duplicate values in rows with same index in two columns (answers)

【Question title】: pandas: removing duplicate values in rows with same index in two columns
【Posted】: 2022-01-11 20:42:52
【Question description】:

I have a dataframe like this:

import numpy as np
import pandas as pd
df = pd.DataFrame({'text':['she is good', 'she is bad'], 'label':['she is good', 'she is good']})

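I want to compare the two columns row by row: wherever 'text' and 'label' hold the same value at the same index, the duplicate in the 'label' column should be replaced with the word 'same'.

Desired output: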

          text        label
0  she is good         same
1   she is bad  she is good

So far I have tried the following, but it raises an error:

ValueError: Length of values (1) does not match length of index (2)

df['label'] =np.where(df.query("text == label"), df['label']== ' ',df['label']==df['label'] )

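【Question comments】:

Tags: python-3.x pandas duplicates rowwise

【Solution 1】:

Your syntax is incorrect; have a look at the documentation for numpy.where. Its first argument must be a boolean condition, followed by the value to use where the condition is true and the value to use where it is false. Check the two columns for equality and replace the matching values in the label column: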

import numpy as np
df['label'] = np.where(df['text'].eq(df['label']),'same',df['label'])

Output:

          text        label
0  she is good         same
1   she is bad  she is good
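
As a rough sketch of why the original attempt fails and how the same replacement could be written without numpy: `df.query("text == label")` returns the matching rows (here a one-row DataFrame) rather than a boolean mask, which is the likely source of the length mismatch in the error message. The variable names `matching_rows`, `is_same`, `masked`, and `result` below are made up for illustration and do not appear in the question.

```python
import pandas as pd

df = pd.DataFrame({'text': ['she is good', 'she is bad'],
                   'label': ['she is good', 'she is good']})

# query() filters rows; it does not produce a True/False value per row
matching_rows = df.query("text == label")
print(matching_rows.shape)  # (1, 2): one matching row, two columns

# A boolean Series with one entry per row, aligned with df's index
is_same = df['text'].eq(df['label'])

# Option A: Series.mask replaces values where the condition is True
masked = df['label'].mask(is_same, 'same')

# Option B: boolean indexing with .loc, done on a copy so df stays unchanged
result = df.copy()
result.loc[is_same, 'label'] = 'same'

print(result)
```

Both options should give the same 'label' column as the numpy.where version above; which one to use is mostly a matter of taste.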

【Comments】: