Remove duplicate rows from dataframe if: answers

【Question title】: Remove duplicate rows from dataframe if
【Posted】: 2021-02-24 21:11:40
【Question description】:

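I am working with a pandas dataframe and want to remove duplicate rows based on the ID column. Among the rows that share an ID, I want to keep the one whose Value column is filled in.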

I know about .drop_duplicates(subset="ID", keep="first"), but that would keep the duplicate rows if the Value cells differ.

Input table:

ID	Value
A	qwer
B	asdf
A
C

Output table:

ID	Value
A	qwer
B	asdf
C

Thanks
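
For reference, the example above can be rebuilt as a small DataFrame. This is only a sketch: it assumes the blank Value cells are missing values (NaN) rather than empty strings, which the question does not state. It shows that `drop_duplicates(subset="ID", keep="first")` keeps whichever row comes first for each ID, so the result depends on row order and not on whether Value is filled in.

```python
import numpy as np
import pandas as pd

# Assumption: blank Value cells are NaN, not empty strings.
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C"],
    "Value": ["qwer", "asdf", np.nan, np.nan],
})

# keep="first" keeps the first row seen for each ID, whatever its Value is.
first_kept = df.drop_duplicates(subset="ID", keep="first")
print(first_kept)

# With the rows in reverse order, the empty "A" row is seen first,
# so it is the one that survives and "qwer" is lost.
reversed_kept = df.iloc[::-1].drop_duplicates(subset="ID", keep="first")
print(reversed_kept)
```

In the original row order this happens to give the desired output, but only because the filled-in "A" row comes before the empty one.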

【Question comments】:

df.drop_duplicates(subset=['A','qwer'])?
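@wwnde I can't do that for every row though... :/
Why can't you do that? Is there something I'm missing?
@wwnde Wouldn't that only work for the first row of my example input table? What about the other rows that have a duplicate ID but no duplicate Value?
Updated after the question was edited: df.drop_duplicates(subset=['ID','Value'])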

Tags: mysql pandas dataframe python-3.7

【Solution 1】:

I believe this is the logic you are after:

# mark the duplicated rows
duplicated = df['ID'].duplicated()

# rows whose Value is not NaN
# consider `.ne('')` instead if blank values are empty strings
not_empty = df['Value'].notna()

# keep rows that are either not duplicated or have a non-NaN Value,
# i.e. remove only the rows that are **both** duplicates and empty
df[(~duplicated) | not_empty]
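
As a runnable illustration, the sketch below applies the mask above to the question's example, again assuming that blank Value cells are NaN. Because `duplicated()` only flags the second and later occurrences of an ID, the mask relies on the filled-in row coming first within each ID. The second half shows one possible order-independent variant. It is not part of the original answer, and the helper column `_missing` is a hypothetical name used only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: blank Value cells are NaN, not empty strings.
df = pd.DataFrame({
    "ID": ["A", "B", "A", "C"],
    "Value": ["qwer", "asdf", np.nan, np.nan],
})

# The mask from the answer: drop rows that are both a repeated ID and empty.
duplicated = df["ID"].duplicated()
not_empty = df["Value"].notna()
result = df[(~duplicated) | not_empty]
print(result)

# Same data, but the empty "A" row now comes before the filled-in one.
df2 = pd.DataFrame({
    "ID": ["A", "B", "A", "C"],
    "Value": [np.nan, "asdf", "qwer", np.nan],
})

# Here the mask keeps both "A" rows: the first is not flagged as a
# duplicate, and the second has a value.
mask_on_df2 = df2[(~df2["ID"].duplicated()) | df2["Value"].notna()]
print(mask_on_df2)

# Order-independent variant (hypothetical helper column `_missing`):
# move rows with a value ahead of empty ones using a stable sort,
# keep the first row per ID, then restore the original index order.
alt = (
    df2.assign(_missing=df2["Value"].isna())
    .sort_values("_missing", kind="mergesort")
    .drop_duplicates(subset="ID", keep="first")
    .drop(columns="_missing")
    .sort_index()
)
print(alt)
```

If several rows of the same ID all have a value, the mask keeps every one of them, while the sorted variant keeps only the first. Which behaviour is wanted depends on the data.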

【Comments】: