How to remove duplicates in Python? Answers

【Question title】: How to remove duplicates in Python?
【Posted】: 2020-09-08 16:59:12
【Question description】:

I have a DataFrame like the following:

print(df)

   Product  Color   Weight  
0     A      Red     13.01
1     A      Red     13.04
2     A      Red     13.10
3     A      Red     13.11

I want to remove the duplicates and keep only the row with the max() Weight for each product, like this:

print(df)

   Product  Color   Weight  
0     A      Red     13.11

Thanks

【Question comments】:

Tags: python-3.x pandas duplicates

【Solution 1】:

You can use groupby together with .max:

# If you don't care about Color, remove it from the groupby keys:
# df.groupby(['Product'])['Weight'].max().reset_index()
df1 = df.groupby(['Product','Color'])['Weight'].max().reset_index()

print(df1)

  Product Color  Weight
0       A   Red   13.11
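
For anyone who wants to run the answer end to end, here is a minimal sketch that rebuilds the DataFrame from the question and applies the same groupby call. The `idxmax` variant at the end is an addition that does not appear in the answer. It is shown on the assumption that you may have extra columns you want to carry along with the winning row.

```python
import pandas as pd

# Rebuild the DataFrame shown in the question.
df = pd.DataFrame({
    "Product": ["A", "A", "A", "A"],
    "Color": ["Red", "Red", "Red", "Red"],
    "Weight": [13.01, 13.04, 13.10, 13.11],
})

# One row per (Product, Color) pair, holding the largest Weight in that group.
df1 = df.groupby(["Product", "Color"])["Weight"].max().reset_index()
print(df1)

# Variant (not from the answer): idxmax returns the index label of the
# max-Weight row in each group, so .loc keeps every column of that row.
df2 = df.loc[df.groupby(["Product", "Color"])["Weight"].idxmax()]
df2 = df2.reset_index(drop=True)
print(df2)
```

With only the three columns from the question, both results contain the same single row. The difference only matters when the frame has further columns, since `.max()` on `Weight` alone returns just the group keys and the maximum.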

【Comments】:

Or df.sort_values('Weight').drop_duplicates(['Product', 'Color'], keep='last')
@ansev I did think of that, but I figured groupby would perform better.
@ansev Given a DataFrame with 1,000,000 rows containing 6 products, 4 color options, and random weights from 5 to 14: drop_duplicates performance = 197 ms ± 1.77 ms per loop (mean ± std. dev. of 7 runs, 1 loop each) and groupby performance = 108 ms ± 1.23 ms per loop (mean ± std. dev. of 7 runs, 10 loops each).
Yes, the number of groups would probably have to be very large for the drop_duplicates option to come out ahead :) @Trenton McKinney
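
The comparison in the comments can be roughly reproduced with the sketch below. The comment does not show its test code, so the product letters, color names, uniform weight distribution, seed, and reduced row count here are all assumptions. The timings it prints depend on your machine and pandas version and should not be expected to match the figures quoted above.

```python
import timeit

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily
n_rows = 100_000  # the comment used 1,000,000; smaller here to keep the run short
products = list("ABCDEF")                   # 6 hypothetical products
colors = ["Red", "Green", "Blue", "Black"]  # 4 hypothetical colors

df = pd.DataFrame({
    "Product": rng.choice(products, size=n_rows),
    "Color": rng.choice(colors, size=n_rows),
    "Weight": rng.uniform(5, 14, size=n_rows),
})

def with_groupby(frame):
    # Approach from the answer: max Weight per (Product, Color) group.
    return frame.groupby(["Product", "Color"])["Weight"].max().reset_index()

def with_drop_duplicates(frame):
    # Approach from the first comment: sort ascending, keep the last
    # (largest) row of each (Product, Color) pair.
    return frame.sort_values("Weight").drop_duplicates(
        ["Product", "Color"], keep="last"
    )

a = with_groupby(df)
# Sort by the keys so the rows line up with the groupby result.
b = with_drop_duplicates(df).sort_values(["Product", "Color"]).reset_index(drop=True)

same_keys = a[["Product", "Color"]].values.tolist() == b[["Product", "Color"]].values.tolist()
same_weights = a["Weight"].tolist() == b["Weight"].tolist()
print("same result:", same_keys and same_weights)

# Rough timings: total seconds for 5 calls of each approach.
print("groupby:        ", timeit.timeit(lambda: with_groupby(df), number=5))
print("drop_duplicates:", timeit.timeit(lambda: with_drop_duplicates(df), number=5))
```

Both approaches select the same maximum per group. They differ in the order of the output rows and in whether the original index is preserved, which is why the sketch sorts and resets the index before comparing.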