Python3: Trying to check for a duplicate within a list of lists, and then check if another element of that list of lists is also a duplicate

【Question title】: Python3: Trying to check for a duplicate within a list of lists, and then check if another element of that list of lists is also a duplicate
【Posted】: 2021-08-02 16:33:16
【Question description】:

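I have converted a csv file into a list of lists (each row is a list), and I am trying to see whether the element at index x of a row ('2' in my script) is duplicated in any other row. If it is a duplicate, I need to check whether index y ('5' in my script) is also duplicated. I wrote the following nested for loop: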

def duplicate_twice(list_of_lists):
    temp = []
    for i in row_list:
        for j in row_list:
            if row_list[i][2] == row_list[j][2]:
                if row_list[i][5] != row_list[j][5]:
                    diff_part.append(row_list[j])
    return diff_part

This makes sense to me logically, but I am getting TypeError: list indices must be integers or slices, not list

Is there a more Pythonic way to do what I am trying to achieve?
What can I change to get around the TypeError?

【Question discussion】:

This is Python. Here i and j are elements of the list, not indices. Consider using pandas to solve your problem. pandas.pydata.org/docs/reference/api/…
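
To make the point about elements versus indices concrete, here is a minimal sketch of the loop with that one change applied. It assumes the intent of the posted code (collect rows that match another row at index 2 but differ at index 5). The parameters x and y and the sample rows are illustrative and not from the question.

```python
def duplicate_twice(list_of_lists, x=2, y=5):
    """Collect rows that share the value at index x with another row
    but have a different value at index y."""
    diff_part = []
    # Iterating a list yields the rows themselves, so index into the
    # rows directly instead of writing list_of_lists[row]
    for row_i in list_of_lists:
        for row_j in list_of_lists:
            if row_i[x] == row_j[x] and row_i[y] != row_j[y]:
                diff_part.append(row_j)
    return diff_part


# Made-up sample data for illustration only
rows = [
    ['a', 'b', 'xx', 'd', 'e', 'yy'],
    ['a', 'b', 'xx', 'd', 'e', 'zz'],
    ['a', 'b', 'cc', 'd', 'e', 'yy'],
]
result = duplicate_twice(rows)
print(result)
```

Note that, as in the posted loop, a row is appended once for every partner row it conflicts with, so it can appear in the result more than once.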

Tags: python python-3.x error-handling

【Solution 1】:

Consider using the pandas library for this problem. Here is an example of how it can be done.

Suppose we have the following DataFrame.

import pandas as pd

df = pd.DataFrame([['John', 'Black', 25], 
                   ['Jack', 'White', 23], 
                   ['Alice', 'Smith', 31], 
                   ['John', 'Black', 44]], 
                  columns=['Name', 'Surname', 'Age'])
print(df)

Output:

    Name Surname  Age
0   John   Black   25
1   Jack   White   23
2  Alice   Smith   31
3   John   Black   44

Now, suppose we want to find the people who share both a first name and a surname.

df[df.duplicated(['Name', 'Surname'], keep=False)]  # Returns every row whose (Name, Surname) pair occurs more than once.

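Adjust the second parameter of the duplicated method (keep) as needed. keep='first' marks every occurrence except the first as a duplicate, keep='last' marks every occurrence except the last, and keep=False marks all occurrences.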

【Solution 2】:

Here is the most Pythonic way I can think of:

array = [
    ['a1', 'b1', 'xx', 'd1', 'e1', 'yy'],
    ['a2', 'b2', 'c2', 'd2', 'e2', 'yy'],
    ['a3', 'b3', 'xx', 'd3', 'e3', 'yy'],
    ['a4', 'b4', 'xx', 'd4', 'e4', 'f4'],
    ['a5', 'b5', 'xx', 'd5', 'e5', 'yy'],
    ['a6', 'b2', 'c2', 'd2', 'e2', 'yy'],
]

# group the rows of the 2d array in a dict whose keys are
# the (index 2, index 5) value pair of each row

obj = {}
for row in array:
    # a tuple key cannot collide the way concatenated strings can
    # (e.g. 'ab' + 'c' and 'a' + 'bc' would both give 'abc')
    obj.setdefault((row[2], row[5]), []).append(row)


# keep only the groups that contain more than one row

duplicated_rows = [rows for rows in obj.values() if len(rows) > 1]

print(duplicated_rows)

Output (reformatted for readability):

[
    [
        ['a1', 'b1', 'xx', 'd1', 'e1', 'yy'],
        ['a3', 'b3', 'xx', 'd3', 'e3', 'yy'],
        ['a5', 'b5', 'xx', 'd5', 'e5', 'yy']
    ],
    [
        ['a2', 'b2', 'c2', 'd2', 'e2', 'yy'],
        ['a6', 'b2', 'c2', 'd2', 'e2', 'yy']
    ]
]
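
The same grouping idea can be wrapped in a small reusable function. This is only a sketch: group_duplicates and its key_indices parameter are hypothetical names, and the sample rows are made up to show why a tuple key is safer than joining the two strings ('ab' + 'c' and 'a' + 'bc' would both produce 'abc').

```python
from collections import defaultdict


def group_duplicates(rows, key_indices=(2, 5)):
    """Group rows by the values at key_indices and return only the
    groups that contain more than one row."""
    groups = defaultdict(list)
    for row in rows:
        # Build a tuple key so distinct value pairs never collide
        key = tuple(row[i] for i in key_indices)
        groups[key].append(row)
    return [group for group in groups.values() if len(group) > 1]


# Made-up sample data: rows 0 and 2 match at both indices; row 1 would
# only "match" them if the two values were concatenated into one string
rows = [
    ['a1', 'b1', 'ab', 'd1', 'e1', 'c'],
    ['a2', 'b2', 'a', 'd2', 'e2', 'bc'],
    ['a3', 'b3', 'ab', 'd3', 'e3', 'c'],
]
result = group_duplicates(rows)
print(result)
```

Passing key_indices=(2,) would group on index 2 alone, which is the first half of the check described in the question.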
