How can I make file parsing and I/O faster in Python when working with huge files (20GB+)?

【Question title】: How can I make file parsing and I/O faster in python when working with huge files (20GB+)
【Posted】: 2020-08-16 15:30:59
【Question description】:

Here is my basic sample code:

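def process(line):
    # Fields are separated by the literal string "-|-"
    data = line.split("-|-")
    try:
        data1, data2 = data[2], data[3]
        finalline = f"{data1} some text here {data2}\n"
        # The output file is reopened for every single input line
        with open("parsed.txt", 'a', encoding="utf-8") as wf:
            wf.write(finalline)
    except IndexError:
        # Skip lines that have fewer than four fields
        pass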

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)

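This works fine, but is there any way to make it run faster by using multiple threads or cores?

Or can I somehow reach my SSD's read and write speeds during this operation? Any help would be greatly appreciated!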

【Comments】:

Tags: python parsing file-io bigdata python-multiprocessing

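【Solution 1】:

Function calls carry noticeable overhead in Python. Instead of calling a function for every line of the file, inline the logic in the loop. Also, don't reopen the same output file over and over; open it once and keep it open.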

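# Open the input and the output exactly once, for the whole run
with open("file.txt", "r", encoding="utf-8") as f, \
     open("parsed.txt", "a", encoding="utf-8") as outh:
    for line in f:
        data = line.split("-|-")
        try:
            # print() appends the newline and writes to the open output file
            print(f"{data[2]} some text here {data[3]}", file=outh)
        except Exception:
            # Skip lines that cannot be parsed (e.g. too few fields)
            pass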
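
As a rough, self-contained sketch of the same idea (open both files once, do the work inline in the loop), the variant below can be run against a tiny sample file. The helper name `parse_file`, the temporary file paths, the 1 MiB buffer size, and the choice to strip the trailing newline and skip short lines with a length check are all assumptions made for illustration; they are not part of the original answer.

```python
import os
import tempfile


def parse_file(src_path, dst_path, sep="-|-"):
    """Write fields 2 and 3 of every well-formed line to dst_path.

    Hypothetical helper: returns the number of lines written.
    """
    written = 0
    # Both files are opened exactly once; the 1 MiB buffer is an arbitrary choice.
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8", buffering=1024 * 1024) as dst:
        for line in src:
            data = line.rstrip("\n").split(sep)
            if len(data) < 4:
                continue  # skip malformed lines instead of catching IndexError
            dst.write(f"{data[2]} some text here {data[3]}\n")
            written += 1
    return written


# Tiny demonstration on a throwaway sample file.
tmp_dir = tempfile.mkdtemp()
src_path = os.path.join(tmp_dir, "file.txt")
dst_path = os.path.join(tmp_dir, "parsed.txt")

with open(src_path, "w", encoding="utf-8") as f:
    f.write("a-|-b-|-c-|-d\nbroken line\ne-|-f-|-g-|-h\n")

count = parse_file(src_path, dst_path)

with open(dst_path, encoding="utf-8") as f:
    result = f.read()
```

Whether a larger write buffer helps at all depends on the disk and the operating system, so it is worth timing on a slice of the real file before relying on it.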

【Discussion】: