How to sort different CSV files by a column and merge them into one, using Python?

【Question title】: How to sort different CSV files by a column and merge them into one, using Python?
【Posted】: 2021-03-17 14:19:26
【Question description】:

I have many CSV files, each made up of 3 columns, like this:

Example file names: file_1, file_4, file_5, file_7, etc.
(They all share the same name and differ only in the number at the end. As the example shows, the numbers are not consecutive.)

The contents:

['357', '29384', '0.0031545741324921135']
['357', '29389', '0.0031545741324921135']
['357', '29526', '0.0368574903844921735']
['357', '35516', '0.0036775741324564665']
['357', '35551', '0.0023554341325646453']
['357', '35639', '0.0064467781324766535']
['357', '36238', '0.0067543874132467543']
['357', '37162', '0.0031545746577921135']

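Let's call the 3 columns [a, b, c]. I want to sort the rows by c, the last column. I have to read all the files and sort everything into one huge file. I could use pickle, for example.

My first idea was: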

import csv
from operator import itemgetter
fn = 1
# N as the max number in the really last file
while fn < N:
   newfile = open("file_{fn}.csv","r")
   reader = csv.reader(newfile)

   file = open("BigSortedFile.csv","w")

   for line in sorted(reader, key=itemgetter(2)):
   file.write(line)

   fn = fn +1
file.close()

#after the loop I think I have to sort again the BigSortedFile.

But it doesn't work, because a string is needed, not a row. How should the whole process be done?

【Question comments】:

Tags: python python-3.x csv file sorting

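【Solution 1】:

To sort all the rows, you need to read them all into a single data structure, sort it, and then write it out again.

The csv module needs the files to be opened with newline="" to work correctly. Since you read the data with csv.reader, you can write it with csv.writer as well: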

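import csv
import os

fn = 1  # first file has number 1 in its filename
N = 42  # last number in the filenames is 42

data = []
while fn <= N:  # <= so that the last file is included
    name = f"file_{fn}.csv"  # f-string, otherwise {fn} is not substituted
    fn += 1  # advance the counter, otherwise the loop never ends
    if not os.path.exists(name):
        continue  # the file numbers are not consecutive
    with open(name, "r", newline="") as newfile:
        reader = csv.reader(newfile)
        data.extend(reader)

# csv.reader returns strings, so compare column c as numbers
data.sort(key=lambda row: float(row[2]))

with open("BigSortedFile.csv", "w", newline="") as bf:
    writer = csv.writer(bf)
    writer.writerows(data)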
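
Since the question says the file numbers are not consecutive, one possible variation is to discover the files with glob rather than counting up to a known maximum. The following is only a minimal sketch under stated assumptions: all files sit in one directory and match a pattern such as file_*.csv, column c always parses as a number, and everything fits in memory. The function name merge_sorted_csv and its parameters are hypothetical, not part of the original answer.

```python
import csv
import glob
import os


def merge_sorted_csv(pattern, out_path, column=2):
    """Read every CSV matching `pattern`, sort all rows numerically by
    `column`, write them to `out_path`, and return the number of rows.
    (Hypothetical helper; assumes the data fits in memory.)"""
    out_abs = os.path.abspath(out_path)
    rows = []
    for name in sorted(glob.glob(pattern)):
        if os.path.abspath(name) == out_abs:
            continue  # never read a previous output file back in
        with open(name, "r", newline="") as f:
            # skip blank lines, which csv.reader yields as empty lists
            rows.extend(row for row in csv.reader(f) if row)
    # values are read as strings, so convert before comparing
    rows.sort(key=lambda row: float(row[column]))
    with open(out_path, "w", newline="") as out:
        csv.writer(out).writerows(rows)
    return len(rows)
```

It could be called as merge_sorted_csv("file_*.csv", "BigSortedFile.csv"). The order in which the files are read does not matter here, because all rows are sorted together afterwards.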

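【Comments】:

OK, thanks. I'm now trying to see whether this works, even if it takes time. I also have several GB of data, and I really don't know whether this will work for that much.
@Hugo You should have mentioned that. I very much doubt it will work: several GB sounds like it won't fit into memory. You will probably need to sort the content in parts, and you should definitely look into pandas or something similar to handle that much data.
@HugoB: see how-do-i-read-a-large-csv-file-with-pandas and python-pandas-merge-multiple-csv-files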
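
The partial sorting suggested in the comments can also be sketched with the standard library alone, as an external merge sort: sort chunks small enough to fit in memory, write each chunk to a temporary file, then merge the sorted chunks with heapq.merge, which reads them lazily. This is a sketch under assumptions, not the commenter's method: the names external_sort, _iter_rows, _write_run and the chunk_rows parameter are hypothetical, and column c is assumed to always parse as a number.

```python
import csv
import heapq
import os
import tempfile
from itertools import islice


def _sort_key(row):
    return float(row[2])  # assumes column c always parses as a number


def _iter_rows(in_paths):
    """Yield the non-empty rows of every input file, one file at a time."""
    for name in in_paths:
        with open(name, "r", newline="") as f:
            for row in csv.reader(f):
                if row:
                    yield row


def _write_run(rows, tmp_dir):
    """Sort one in-memory chunk and save it as a temporary sorted run."""
    rows.sort(key=_sort_key)
    fd, path = tempfile.mkstemp(suffix=".csv", dir=tmp_dir)
    with os.fdopen(fd, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def external_sort(in_paths, out_path, chunk_rows=1_000_000):
    """Sort the rows of many CSV files into out_path while sorting
    at most chunk_rows rows in memory at a time."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        rows = _iter_rows(in_paths)
        runs = []
        while True:
            chunk = list(islice(rows, chunk_rows))
            if not chunk:
                break
            runs.append(_write_run(chunk, tmp_dir))
        files = [open(p, "r", newline="") for p in runs]
        try:
            # heapq.merge keeps only one pending row per run in memory
            merged = heapq.merge(*(csv.reader(f) for f in files), key=_sort_key)
            with open(out_path, "w", newline="") as out:
                csv.writer(out).writerows(merged)
        finally:
            for f in files:
                f.close()
```

All runs are open at once during the merge, so chunk_rows should be large enough to keep the number of runs below the operating system's limit on open files.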