How to select rows based on above rows | Python - Pandas

【Question title】: How to select rows based on above rows | Python - Pandas
【Posted】: 2020-12-08 01:50:42
【Question】:

0| name1 | name2 | tot |
 +-------+-------+-----+
1|   A   |   B   |  3  |
2|   C   |   A   |  3  |
3|   B   |   D   |  4  |
4|   A   |   E   |  2  |
5|   B   |   C   |  5  |
 +-------+-------+-----+

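I want to select rows based on the rows above them: a row should be selected when one of its "letters" (from either name1 or name2) already appears in at least 2 earlier rows (again in either name1 or name2), and each of those earlier rows has tot >= 3.

In this example I want to select: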

A    E   2
B    C   5

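Row 4 (A E 2) is selected because A (name1) appears in rows 1 and 2, both with tot >= 3. Row 5 (B C 5) is selected because B appears in rows 1 and 3, both with tot >= 3.

P.S. I want to create another dataset from these new results.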

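【Comments】:

Tags: python pandas dataset rows

【Solution 1】:

You can use collections.defaultdict to build a cache that counts, for each letter, how many earlier rows with tot >= 3 contained it: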

from collections import defaultdict

import pandas as pd

df = pd.DataFrame({'name1': list('ACBAB'), 'name2': list('BADEC'), 'tot': [3, 3, 4, 2, 5]})

seen = defaultdict(int)  # every new key will be initialized with 0
keep = []
for row in df.itertuples():
    # keep the row if either letter was already counted at least twice
    keep.append(seen[row.name1] > 1 or seen[row.name2] > 1)
    if row.tot >= 3:
        # we can do this safely without risk of KeyError because `seen` is a default dict
        seen[row.name1] += 1
        seen[row.name2] += 1

# boolean mask: one True/False per row, in row order
out = df[keep]

Output

  name1 name2  tot
3     A     E    2
4     B     C    5
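
Since the question also asks for a separate dataset built from the result, the loop above could be wrapped in a small helper. The following is only a sketch: the function name `select_by_prior_rows` and its parameters (`cols`, `value_col`, `min_value`, `min_count`) are hypothetical names chosen for illustration and are not part of pandas. It also assumes that a row in which the same letter fills both columns should count only once for that letter, which differs slightly from the loop above.

```python
from collections import defaultdict

import pandas as pd


def select_by_prior_rows(df, cols=("name1", "name2"), value_col="tot",
                         min_value=3, min_count=2):
    """Keep rows where any value in `cols` already appeared in at least
    `min_count` earlier rows whose `value_col` is >= `min_value`.

    Hypothetical helper for illustration, not a pandas API.
    """
    seen = defaultdict(int)  # letter -> number of earlier qualifying rows
    keep = []
    rows = df[list(cols)].itertuples(index=False)
    for values, total in zip(rows, df[value_col]):
        # decide using only the rows that came before this one
        keep.append(any(seen[v] >= min_count for v in values))
        if total >= min_value:
            # assumption: count a row once per distinct letter
            for v in set(values):
                seen[v] += 1
    mask = pd.Series(keep, index=df.index, dtype=bool)
    # .copy() detaches the result from df; reset_index renumbers it from 0
    return df[mask].copy().reset_index(drop=True)


df = pd.DataFrame({"name1": list("ACBAB"), "name2": list("BADEC"),
                   "tot": [3, 3, 4, 2, 5]})
new_df = select_by_prior_rows(df)
print(new_df)
```

Here `new_df` holds the same two rows as `out`, renumbered from 0, and changes made to it do not affect `df`.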

【Comments】: