How to find duplicate elements in a list and remove them? Answers

【Question title】: How to find duplicate element and remove them in the list?
【Posted】: 2017-09-29 10:21:22
【Question】:

I have three lists as follows:

name  = ['A', 'B', 'C', 'D', 'E', 'F']
cls   = [1,   2,   3,   2,   4,   1]
score = [0.1, 0.2, 0.5, 0.3, 1,   0.8]

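This means A belongs to class 1 with a score of 0.1, B belongs to class 2 with a score of 0.2, and so on.

I am looking for a way to find the objects that share the same class (cls) and remove an object if its score is lower than that of another object in the same class. So my expected result is: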

name  = ['C', 'D', 'E', 'F']
cls   = [3,   2,   4,   1]
score = [0.5, 0.3, 1,   0.8]

name, cls and score are lists. How can I do this in Python? Thanks.

This is what I have done:

name_clean=[]
cls_clean=[]
score_clean=[]
for i in range(len(cls)-1):
    cls_i=cls[i]
    max_index = -1
    for j in range(i+1,len(cls)):
        cls_j = cls[j]
        if (cls_i==cls_j):
            if (score[i]<=score[j]):
                max_index=j
            else:
                max_index=i
    if (max_index>=0):
        name_clean.append(name[max_index])
        cls_clean.append(cls[max_index])
        score_clean.append(score[max_index])
    else:
        name_clean.append(name[i])
        cls_clean.append(cls[i])
        score_clean.append(score[i])
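For comparison, the two-loop idea can be made to work. A minimal sketch, assuming the rule stated in the question (drop an item when another item of the same class has a strictly higher score): compare each item against all other items, not only the ones after it, and append it once, only when nothing in its class beats it. This is O(n²), so it only suits short lists, and tied items are all kept.

```python
name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

name_clean = []
cls_clean = []
score_clean = []
for i in range(len(cls)):  # every index, including the last one
    # True if some other item of the same class has a strictly higher score
    beaten = any(cls[j] == cls[i] and score[j] > score[i]
                 for j in range(len(cls)) if j != i)
    if not beaten:
        name_clean.append(name[i])
        cls_clean.append(cls[i])
        score_clean.append(score[i])

print(name_clean, cls_clean, score_clean)
# Python 3 prints: ['C', 'D', 'E', 'F'] [3, 2, 4, 1] [0.5, 0.3, 1, 0.8]
```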

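【Discussion】:

Hi, this is not a code-writing service. Read How to Ask: describe the problem and what has been done so far to solve it.
I'm voting to close this question because the OP is asking others to do their work.
I tried using two for loops, but it didn't work. That's why I'm asking here. Why close it?
Please show your effort... The first step is to use the right data structure.
Don't worry, it takes 5 votes to close a question. And of course, people eager to reach 1k reputation will do your work for some upvotes.

Tags: python python-2.7 numpy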

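【Solution 1】:

Note that you can't use class as a variable name, because it is a reserved keyword in Python.

Instead of 3 lists, I would consider using a list of namedtuples, or a table such as a pandas.DataFrame.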
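A minimal sketch of the namedtuple idea, assuming a record type called Item with fields name, cls and score (these names are only illustrative): zip the three lists into records, find the best score per class, and keep the winning records in their original order.

```python
from collections import namedtuple

# Hypothetical record type; the field names mirror the three lists.
Item = namedtuple('Item', ['name', 'cls', 'score'])

name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

items = [Item(n, c, s) for n, c, s in zip(name, cls, score)]

# Highest score seen for each class.
best = {}
for item in items:
    if item.cls not in best or item.score > best[item.cls]:
        best[item.cls] = item.score

# Keep the records that reach their class maximum, in original order.
kept = [item for item in items if item.score == best[item.cls]]
print([item.name for item in kept])  # ['C', 'D', 'E', 'F']
```

The attributes stay readable (item.score instead of score[i]), which is the main benefit over three parallel lists.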

However, since you have 3 lists, this is how I would do it:

Get the highest score for each class and store it in a dictionary:

highest_scores = {}
for c, s in zip(cls, score):
    current_max = highest_scores.get(c, None)
    if current_max is None or current_max < s:  # not present or smaller
        highest_scores[c] = s

Then iterate over the lists again and keep only the items whose score equals the stored score for their class:

new_name = []
new_cls = []
new_score = []
for n, c, s in zip(name, cls, score):
    if s == highest_scores[c]:
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)

This gives:

>>> new_name
['C', 'D', 'E', 'F']
>>> new_cls
[3, 2, 4, 1]
>>> new_score
[0.5, 0.3, 1, 0.8]

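Note that this keeps every "highest score" of a class, so if two items have the same class and the same score, both are kept. To avoid this, you can delete the key from the dictionary as soon as the first match is found: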

new_name = []
new_cls = []
new_score = []
for n, c, s in zip(name, cls, score):
    if c in highest_scores and s == highest_scores[c]:
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)
        del highest_scores[c]  # later items with the same class and score are skipped
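Both passes can be wrapped in one reusable function. A minimal sketch, with a hypothetical name keep_best_per_class; like the loop above, it keeps only the first item of a class when several items tie for the highest score, but it deletes keys from a copy of the dictionary so the computed maxima are not destroyed.

```python
def keep_best_per_class(names, classes, scores):
    """Return three new lists with, for each class, the first item having the highest score."""
    # Pass 1: highest score per class.
    highest = {}
    for c, s in zip(classes, scores):
        if c not in highest or s > highest[c]:
            highest[c] = s

    # Pass 2: keep the first item that reaches its class maximum.
    remaining = dict(highest)  # copy, so `highest` stays intact
    new_names, new_classes, new_scores = [], [], []
    for n, c, s in zip(names, classes, scores):
        if c in remaining and s == remaining[c]:
            new_names.append(n)
            new_classes.append(c)
            new_scores.append(s)
            del remaining[c]  # later ties in this class are skipped
    return new_names, new_classes, new_scores


# 'G' ties with 'F' in class 1; only the first of the two ('F') is kept.
print(keep_best_per_class(['A', 'B', 'C', 'D', 'E', 'F', 'G'],
                          [1, 2, 3, 2, 4, 1, 1],
                          [0.1, 0.2, 0.5, 0.3, 1, 0.8, 0.8]))
# (['C', 'D', 'E', 'F'], [3, 2, 4, 1], [0.5, 0.3, 1, 0.8])
```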

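【Discussion】:

Perfect! I got it. Thanks a lot.
"Instead of 3 lists, I would consider using a list of namedtuples or a table" => or just plain tuples...
@brunodesthuilliers Yes, or lists/dicts/custom classes. I wanted to show the simplest alternative where you don't lose the attribute names.

【Solution 2】:

Using the right data structure helps a lot. In your case, you want to regroup your data by class: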

from collections import defaultdict

names = ["A", "B", "C", "D", "E", "F"]
classes = [1, 2, 3, 2, 4, 1]
scores = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

byclasses = defaultdict(list)
for name, class_, score in zip(names, classes, scores):
    byclasses[class_].append((score, name))  # score first, so tuples sort by score

print(dict(byclasses))

At this stage, you get:

{1: [(0.1, 'A'), (0.8, 'F')], 
 2: [(0.2, 'B'), (0.3, 'D')], 
 3: [(0.5, 'C')], 
 4: [(1, 'E')]
}

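Now you just have to sort each list (they will be sorted by ascending score) and keep the last item of each one (which will be the one with the highest score):

cleaned = [((k,) + sorted(v)[-1]) for k, v in byclasses.items()]
print(cleaned)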

This gives you a list of (class, score, name) tuples:

[(1, 0.8, 'F'), (2, 0.3, 'D'), (3, 0.5, 'C'), (4, 1, 'E')]

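And, if you insist on having three lists instead of a list of tuples, unzip the result into three new lists (in the same order as the tuple fields: class, score, name):

cclasses, cscores, cnames = (list(c) for c in zip(*cleaned))
print(cnames, cclasses, cscores)

And here we are:

['F', 'D', 'C', 'E'] [1, 2, 3, 4] [0.8, 0.3, 0.5, 1]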
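Two possible refinements of this approach: since only the last element of each sorted group is used, max() finds it in a single pass without sorting; and the result above comes out in class order rather than input order. A sketch of both, assuming it is acceptable to store each item's original position (index) in the tuple, which the answer itself does not do:

```python
from collections import defaultdict

names = ["A", "B", "C", "D", "E", "F"]
classes = [1, 2, 3, 2, 4, 1]
scores = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

byclasses = defaultdict(list)
for index, (n, c, s) in enumerate(zip(names, classes, scores)):
    # Score first, so tuples compare by score, then by name, then by index.
    byclasses[c].append((s, n, index))

# max() picks the same (score, name) as sorted(v)[-1], without sorting each group.
best = [(k,) + max(v) for k, v in byclasses.items()]  # (class, score, name, index)

# Put the winners back in their original input order.
best.sort(key=lambda t: t[3])

cclasses, cscores, cnames, _ = (list(col) for col in zip(*best))
print(cnames, cclasses, cscores)
# ['C', 'D', 'E', 'F'] [3, 2, 4, 1] [0.5, 0.3, 1, 0.8]
```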

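【Discussion】:

【Solution 3】:

The following solution splits the problem into 2 separate steps:

1. Find the highest score for each class and store it in a map (a dict).
2. Create new lists containing only the items that match that highest score.

Note the use of set() to find the distinct classes: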

name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

# find largest score for each class
max_class_scores = {}  # key is class, value is max score
for c in set(cls):
    # contains max score for a class
    max_class_scores[c] = max(s for (i, s) in enumerate(score) if cls[i] == c)

new_name = []
new_cls = []
new_score = []
for n, c, s in zip(name, cls, score):
    max_score = max_class_scores[c]
    if s == max_score:  # only process where the current record is max for the class
        new_name.append(n)
        new_cls.append(c)
        new_score.append(s)

print(new_name, new_cls, new_score)

【Discussion】:

【Solution 4】:

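from itertools import groupby
from operator import itemgetter

name = ['A', 'B', 'C', 'D', 'E', 'F']
cls = [1, 2, 3, 2, 4, 1]
score = [0.1, 0.2, 0.5, 0.3, 1, 0.8]

by_cls = itemgetter(1)    # key: the class of a (name, cls, score) record
by_score = itemgetter(2)  # key: the score of a (name, cls, score) record

# Sort the records by class so that groupby sees each class as one run.
groups = groupby(sorted(zip(name, cls, score), key=by_cls), key=by_cls)

# Keep the record with the highest score in each group, then unzip into lists.
best = [max(data, key=by_score) for _, data in groups]
name, cls, score = (list(column) for column in zip(*best))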
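itertools.groupby only groups consecutive items with equal keys, which is why this answer sorts the records by class before grouping. A small demonstration on the class list alone:

```python
from itertools import groupby

cls = [1, 2, 3, 2, 4, 1]

# Without sorting, equal keys that are not adjacent end up in separate groups.
unsorted_keys = [key for key, _ in groupby(cls)]
# After sorting, each class forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(cls))]

print(unsorted_keys)  # [1, 2, 3, 2, 4, 1]
print(sorted_keys)    # [1, 2, 3, 4]
```

A side effect of the sort is that this answer returns the winners ordered by class (F, D, C, E) rather than by their original position.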

【Discussion】: