Simultaneous line-by-line reading of n files in Python

【Question Title】: Simultaneous line-by-line reading of n-number of files in python
【Posted】: 2014-04-29 19:34:35
【Question】:

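I have a folder containing an unknown number of CSV files of measurement data (the number may change over time) on which I want to run statistics. Each CSV has 5 columns of data. I want to be able to run a statistical analysis on each row separately (mean over the multiple measurements, stdev, etc.). At the moment I list the files in the folder, store them in a list, and try to open the files from that list. Things get very confusing when I try to iterate over the lines of the files. For now I am just trying to append the contents to a list and write them out to another file. No luck. The code may not be very clean, since I am a programming beginner, but here we go: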

import re
import os

lines_to_skip = 25
workingdir = os.path.dirname(os.path.realpath(__file__))
file_list = []
templine = []
lineNo = 0

print ("Working in %s" %workingdir)
os.chdir(workingdir)
for file in os.listdir(workingdir):
        if file.endswith('.csv'):
                #list only file name without extension (to be able to use filename as variable later)
                file_list.append(file[0:-4])
#open all files in the folder
print (file_list)
for i, value in enumerate(file_list):
    exec "%s = open (file_list[i] + '.csv', 'r')" % (value)

#open output stats file
fileout = open ('zoutput.csv', 'w')

#assuming that all files are of equal length (as they should be)
exec "for x in len(%s + '.csv'):" % (file_list[0])
for i in xrange(lines_to_skip):
        exec "%s.next()" % (file_list[0])
        for j, value in enumerate(file_list):
                templine[:]=[]
                #exec "filename%s=value" % (j)
                exec "line = %s.readline(x)" % (value)
                templine.extend(line)
        fileout.write(templine)

fileout.close()
#close all files in the folder
for i, value in enumerate(file_list):
    #exec "filename%s=value" % (i)
    exec "%s.close()" % (value)

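Any suggestions for doing this another way, or for improving the existing approach? The first 25 lines are just info fields, which are useless for my purposes. I could delete the first 25 lines from each file separately (instead of trying to skip them), but I guess that doesn't matter much. Please don't recommend spreadsheets or other statistical software; none of the ones I have tried so far can digest the amount of data I have. Thanks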

【Question Comments】:

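Have a look at the standard csv module; it will do a lot of the work for you. Another suggestion: work through an existing Python tutorial before you start programming. It can save you a lot of "reinventing the wheel".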

Tags: python file statistics

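【Solution 1】:

If I understand your question correctly, you want to paste the columns of the files next to each other, so that from N files with C columns and R rows each you can process one row at a time, where each row has N*C columns?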

$ cat rowproc.py
import sys

for l in sys.stdin:
    # one float per whitespace-separated field of the pasted row
    row = [float(x) for x in l.split()]
    # process row

$ paste *.csv | tail -n +26 | python rowproc.py

Or, if you are unlucky enough not to have a Unix-like environment at hand and have to do everything in Python:

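import sys

# Open every file named on the command line.
filehandles = [open(fn) for fn in sys.argv[1:]]

# Python 3: zip() is lazy (in Python 2, use itertools.izip instead).
# Each iteration yields one tuple holding the same line from every file.
for i, rows in enumerate(zip(*filehandles)):
    if i < 25:  # skip the 25 info lines
        continue

    cols = [[float(x) for x in row.split()] for row in rows]
    print(cols)

for fh in filehandles:
    fh.close()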

Result:

[[150.0, 26.0], [6.0, 8.0], [14.0, 10.0]]
[[160.0, 27.0], [7.0, 9.0], [16.0, 11.0]]
[[170.0, 28.0], [8.0, 10.0], [18.0, 12.0]]
...

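Both methods keep only one line per file in memory at a time, so they can handle arbitrarily large amounts of data, as long as you are able to open enough files simultaneously.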
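
To get from the lockstep loop to the per-row statistics the question asks for, something along these lines might work. This is only a sketch: the name row_stats and its skip parameter are made up for illustration, and it assumes the files are comma-separated, of equal length, and purely numeric after the skipped lines. It uses the standard csv module mentioned in the comments together with the statistics module.

```python
import csv
import statistics
from contextlib import ExitStack


def row_stats(filenames, skip=25):
    """Yield (means, stdevs) for each data row, computed across all files.

    Hypothetical helper: assumes comma-separated files of equal length
    whose rows after the first `skip` lines contain only numbers.
    """
    with ExitStack() as stack:
        # ExitStack closes every opened file when the block is left.
        handles = [stack.enter_context(open(fn, newline="")) for fn in filenames]
        readers = [csv.reader(fh) for fh in handles]
        # zip() advances all readers in lockstep: one row per file at a time.
        for i, rows in enumerate(zip(*readers)):
            if i < skip:
                continue
            # Transpose so each tuple holds one column's values across files.
            columns = list(zip(*[[float(x) for x in row] for row in rows]))
            means = [statistics.mean(col) for col in columns]
            # The sample stdev needs at least two measurements (two files).
            stdevs = [statistics.stdev(col) if len(col) > 1 else 0.0
                      for col in columns]
            yield means, stdevs
```

Each yielded pair holds one mean and one sample standard deviation per column, so a caller can write them straight to an output file without ever holding more than one row per input file in memory.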

If you cannot pass the filenames via argv, use glob.
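
A minimal sketch of the glob variant, assuming all CSV files sit in one directory; the function name csv_paths and its parameters are hypothetical.

```python
import glob
import os


def csv_paths(directory=".", pattern="*.csv"):
    """Return the sorted paths in `directory` that match `pattern`."""
    # sorted() gives a stable file order; glob itself does not promise one.
    return sorted(glob.glob(os.path.join(directory, pattern)))
```

The returned list can stand in for sys.argv[1:] when opening the file handles. Note that an output file written to the same folder with a .csv extension (such as zoutput.csv in the question) would also match the pattern, so it may need to be excluded or written elsewhere.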

【Comments】: