【发布时间】:2017-05-02 01:54:13
【问题描述】:
我需要在 Pandas 数据框中找到重复的行,然后添加一个带有计数的额外列。假设我们有一个数据框:
>>print(df)
+----+-----+-----+-----+-----+-----+-----+-----+-----+
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|----+-----+-----+-----+-----+-----+-----+-----+-----|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 2 | 2 | 4 | 3 | 4 | 1 | 1 | 4 | 4 |
| 3 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
| 4 | 2 | 3 | 4 | 3 | 4 | 0 | 0 | 0 |
| 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 |
| 7 | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 0 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 |
| 10 | 3 | 3 | 4 | 3 | 5 | 5 | 5 | 0 |
| 11 | 5 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 12 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 13 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
| 14 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 15 | 1 | 3 | 5 | 0 | 0 | 0 | 0 | 0 |
| 16 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 17 | 3 | 3 | 4 | 4 | 0 | 0 | 0 | 0 |
| 18 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+
然后上面的帧将变成下面的帧,并带有一个带有计数的附加列。可以看到我们还在保留索引列。
+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
| | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|----+-----+-----+-----+-----+-----+-----+-----+-----|-----|
| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 1 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 |
| 2 | 2 | 4 | 3 | 4 | 1 | 1 | 4 | 4 | 1 |
| 3 | 4 | 3 | 4 | 0 | 0 | 0 | 0 | 0 | 2 |
| 4 | 2 | 3 | 4 | 3 | 4 | 0 | 0 | 0 | 1 |
| 5 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 |
| 6 | 4 | 5 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 7 | 1 | 1 | 4 | 0 | 0 | 0 | 0 | 0 | 1 |
| 10 | 3 | 3 | 4 | 3 | 5 | 5 | 5 | 0 | 1 |
| 11 | 5 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 13 | 0 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 15 | 1 | 3 | 5 | 0 | 0 | 0 | 0 | 0 | 1 |
| 16 | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| 17 | 3 | 3 | 4 | 4 | 0 | 0 | 0 | 0 | 1 |
+----+-----+-----+-----+-----+-----+-----+-----+-----+-----+
我已经看到了其他解决方案,例如:
df.groupby(list(df.columns.values)).size()
但这会返回一个有间隙且没有初始索引的矩阵。
【问题讨论】:
标签: python pandas group-by aggregate multiple-columns