Comparing two CSV files in Python and outputting the result

【Question title】: Comparing two CSV files in python and output
【Posted】: 2022-01-24 04:35:38
【Question】:

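I have two CSV files. I need to compare file1 to file2 and output any lines that are in file1 but not in file2. The problem arises when file1 contains two lines with exactly the same values. The output should show the value that is not in file2, but instead both values are removed as duplicates. Is there a way to accomplish this?

The code I'm using now is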


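import sys

# Read every line of both files into memory
with open('file1.csv', 'r') as t1, open('file2.csv', 'r') as t2:
    fileone = t1.readlines()
    filetwo = t2.readlines()

# Write each line of file1 that does not appear anywhere in file2
with open('addressList.csv', 'w') as outFile:
    for line in fileone:
        if line not in filetwo:
            outFile.write(line)

sys.exit()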

File 1:

address, value
2ce8e,200
fb0d7,350
fb0d7,225
fb0d7,250
fb0d7,361
fb0d7,175
fb0d7,450

File 2:

address, value
2ce8e,200
fb0d7,350
fb0d7,250
fb0d7,225
fb0d7,175
fb0d7,361
fb0d7,450

The output should be

address, value
2ce8e,200

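【Comments】:

You can do this with the command-line diff tool, without any programming.
I need the output saved in CSV format, because another script will consume the data.
diff prints the lines that differ. I'm just pointing out that using or adapting an existing tool is usually better than writing a new one.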

Tags: python python-3.x

【Solution 1】:

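Try using Python's set type. Every element of a set must be distinct, so a set automatically deduplicates its elements. The operations available on the set type make comparing elements very easy. See Python's documentation and mathematical set theory for more information.

Example: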

# load each file's lines into a set (duplicates within a file collapse)
with open('file1.csv') as t1, open('file2.csv') as t2:
    fileone = set(t1.readlines())
    filetwo = set(t2.readlines())

# get lines from fileone that are not in filetwo (set difference)
diff12 = fileone - filetwo

# get lines from filetwo that are not in fileone (set difference)
diff21 = filetwo - fileone

# get lines in common between fileone and filetwo (set intersection)
common = fileone & filetwo

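It looks like you need to preserve the order of the lines, which a set will not do on its own. However, you can still use sets to speed up the process.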
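One way to keep the original order while still getting fast lookups is sketched below. The function names are made up for illustration and are not part of the answer. The first function iterates over file1's lines in order and uses a set built from file2 only for membership tests. The second rests on an assumption about what the question wants: that each line in file2 should cancel only one matching line in file1, so surplus duplicates in file1 are still reported. It uses `collections.Counter` as a multiset for that.

```python
from collections import Counter


def ordered_difference(lines_a, lines_b):
    """Return the lines of lines_a that never occur in lines_b, in order."""
    seen_in_b = set(lines_b)  # set membership tests are O(1) on average
    return [line for line in lines_a if line not in seen_in_b]


def ordered_multiset_difference(lines_a, lines_b):
    """Like ordered_difference, but each line of lines_b cancels only one
    matching line of lines_a, so extra duplicates in lines_a are kept."""
    remaining = Counter(lines_b)  # how many unmatched copies lines_b still has
    result = []
    for line in lines_a:
        if remaining[line] > 0:
            remaining[line] -= 1  # consume one matching copy from lines_b
        else:
            result.append(line)
    return result


if __name__ == "__main__":
    # Hypothetical in-memory data standing in for f.readlines() output
    a = ["x,1\n", "y,2\n", "x,1\n", "z,3\n"]
    b = ["x,1\n", "y,2\n"]
    print(ordered_difference(a, b))
    print(ordered_multiset_difference(a, b))
```

With real files, the two lists would come from `readlines()` on each file, and the result could be written out with `outFile.writelines(result)`. Comparing raw lines is sensitive to trailing whitespace and line endings, so stripping each line first may be worthwhile.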

【Comments】: