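【Posted】: 2021-12-02 22:59:19
【Question】:
I am trying to remove duplicate entries from a Pandas DataFrame in Python. The DataFrame consists of the contents of several *.csv files concatenated vertically. Here is the DataFrame: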
print(df)
file TestA TestB
One.csv 7513 -643.1
One.csv 15347 NaN
One.csv NaN 22.7
One.csv 46321 NaN
One.csv NaN 156.1
One.csv 2477 52.7
Two.csv 417 1473.5
Two.csv 7513 -643.1
Two.csv 15347 NaN
Two.csv NaN 22.7
Two.csv 46321 NaN
Two.csv NaN 156.1
Three.csv -4341 NaN
Three.csv 34473 437
Three.csv 1349 NaN
Four.csv 17 NaN
Four.csv 107 NaN
Four.csv -931 44536
Four.csv 6285 NaN
Four.csv 119 34722
I would like to do the following:
a. Print a message similar to:
print(
    f"Rows {[1,2,3,4,5]} of {'One.csv'} are duplicated in rows {[2,3,4,5,6]} of "
    f"{'Two.csv'}. Rows from {'One.csv'} will now be removed from the DataFrame."
)
This is the output I want from the print statement:
Rows [1,2,3,4,5] of One.csv are duplicated in rows [2,3,4,5,6] of Two.csv. Rows from One.csv will now be removed from the DataFrame.
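I am not sure how to identify the rows and put them into the print statement.
Is there a way to identify the duplicated rows by their row number within each file listed in column 1 (the file name)?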
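One possible direction, as a rough sketch rather than a definitive answer: number the rows within each file with groupby(...).cumcount(), then merge the frame with itself on the value columns so each repeated row is paired with its later copy. The helper column names row and pos, the temporary frame tmp, and the messages list are made up for illustration. The sketch assumes that "duplicate" means identical TestA and TestB values in two different files, and it relies on pandas merge matching NaN keys with each other.

```python
import numpy as np
import pandas as pd

nan = np.nan
# The DataFrame from the question, rebuilt by hand so the sketch is self-contained.
df = pd.DataFrame({
    "file": ["One.csv"] * 6 + ["Two.csv"] * 6 + ["Three.csv"] * 3 + ["Four.csv"] * 5,
    "TestA": [7513, 15347, nan, 46321, nan, 2477,
              417, 7513, 15347, nan, 46321, nan,
              -4341, 34473, 1349,
              17, 107, -931, 6285, 119],
    "TestB": [-643.1, nan, 22.7, nan, 156.1, 52.7,
              1473.5, -643.1, nan, 22.7, nan, 156.1,
              nan, 437, nan,
              nan, nan, 44536, nan, 34722],
})
value_cols = ["TestA", "TestB"]

# Hypothetical helper columns: 1-based row number within each file, and the
# position of the row in the concatenated frame.
tmp = df.assign(row=df.groupby("file").cumcount() + 1, pos=np.arange(len(df)))

# Self-merge on the value columns; merge treats NaN keys as equal to each other.
pairs = tmp.merge(tmp, on=value_cols, suffixes=("_first", "_later"))
# Keep only pairs where an earlier row is repeated in a different, later file.
pairs = pairs[(pairs["pos_first"] < pairs["pos_later"])
              & (pairs["file_first"] != pairs["file_later"])]
pairs = pairs.sort_values(["pos_first", "pos_later"])

messages = []
for (first, later), g in pairs.groupby(["file_first", "file_later"], sort=False):
    messages.append(
        f"Rows {g['row_first'].tolist()} of {first} are duplicated in rows "
        f"{g['row_later'].tolist()} of {later}. Rows from {first} will now be "
        f"removed from the DataFrame."
    )
print("\n".join(messages))

# Drop the earlier copies (here: the rows from One.csv) and keep the later ones.
deduped = df.drop(index=df.index[pairs["pos_first"].unique()])
print(deduped)
```

If only the removal matters and no message is needed, df.drop_duplicates(subset=value_cols, keep="last") should give the same remaining rows, since it also treats NaN values as equal.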
Edit:
To create the DataFrame df, select the DataFrame printed above and copy it to the clipboard. Then use this:
import pandas as pd
df = pd.read_clipboard()
print(df)
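For anyone who would rather not depend on the clipboard, the same frame can probably be built from an inline string. This is only a sketch: the variable name raw is made up, and it assumes whitespace-separated columns, which is also how pd.read_clipboard splits by default.

```python
import io

import pandas as pd

# The table from the question, pasted as a whitespace-separated string.
raw = """file TestA TestB
One.csv 7513 -643.1
One.csv 15347 NaN
One.csv NaN 22.7
One.csv 46321 NaN
One.csv NaN 156.1
One.csv 2477 52.7
Two.csv 417 1473.5
Two.csv 7513 -643.1
Two.csv 15347 NaN
Two.csv NaN 22.7
Two.csv 46321 NaN
Two.csv NaN 156.1
Three.csv -4341 NaN
Three.csv 34473 437
Three.csv 1349 NaN
Four.csv 17 NaN
Four.csv 107 NaN
Four.csv -931 44536
Four.csv 6285 NaN
Four.csv 119 34722
"""

# sep=r"\s+" splits on runs of whitespace; the literal "NaN" is parsed as missing.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df)
```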
【Discussion】:
Tags: python pandas duplicates dataframe