【Question title】: How can I make file parsing and I/O faster in Python when working with huge files (20GB+)?
【Posted】: 2020-08-16 15:30:59
【Question】:

Here is my basic example code:

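def process(line):
    # Split the record on the "-|-" field separator.
    data = line.split("-|-")
    try:
        data1, data2 = data[2], data[3]
        finalline = f"{data1} some text here {data2}\n"
        # Append the reformatted record to the output file.
        with open("parsed.txt", 'a', encoding="utf-8") as wf:
            wf.write(finalline)
    except Exception:
        # Skip lines that cannot be processed (e.g. fewer than four fields).
        pass

with open("file.txt", "r", encoding="utf-8") as f:
    for line in f:
        process(line)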

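This works fine, but is there a way to make it run faster by using multiple threads or cores?

Or is there some way to reach my SSD's read/write speed while doing this? Any help would be greatly appreciated!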

【Comments on the question】:

    Tags: python parsing file-io bigdata python-multiprocessing


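    【Solution 1】:

    Function calls carry a lot of overhead in Python. Instead of calling a function for every line of the file, do the work inline in the loop. Also, don't reopen the same output file over and over; open it once and keep it open.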

    with open("file.txt", "r", encoding="utf-8") as f, \
         open("parsed.txt", "a", encoding="utf-8") as outh:
        for line in f:
            data = line.split("-|-")
            try:
                print(f"{data[2]} some text here {data[3]}", file=outh)
            except Exception:
                pass
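
To check how much the two changes above matter on a given machine, a small timing sketch like the following may help. It is only an illustration under stated assumptions: the helper names `reopen_per_line`, `open_once` and `compare`, the synthetic four-field input, and the default of 5,000 lines are all made up for this example. It also strips the trailing newline before splitting, which the original code does not do.

```python
import os
import tempfile
import time

SEP = "-|-"


def reopen_per_line(src, dst):
    """Question's pattern: reopen the output file for every input line."""
    with open(src, "r", encoding="utf-8") as f:
        for line in f:
            # Assumption: strip the newline so the last field is clean.
            data = line.rstrip("\n").split(SEP)
            try:
                out = f"{data[2]} some text here {data[3]}\n"
            except IndexError:
                continue  # fewer than four fields
            with open(dst, "a", encoding="utf-8") as wf:
                wf.write(out)


def open_once(src, dst):
    """Answer's pattern: open both files once and do the work inline."""
    with open(src, "r", encoding="utf-8") as f, \
         open(dst, "a", encoding="utf-8") as outh:
        for line in f:
            data = line.rstrip("\n").split(SEP)
            try:
                outh.write(f"{data[2]} some text here {data[3]}\n")
            except IndexError:
                continue  # fewer than four fields


def compare(n_lines=5_000):
    """Time both variants on a synthetic file.

    Returns (seconds_reopen, seconds_open_once, outputs_identical).
    """
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "file.txt")
        with open(src, "w", encoding="utf-8") as f:
            for i in range(n_lines):
                f.write(f"a{i}-|-b{i}-|-c{i}-|-d{i}\n")
            f.write("short-|-line\n")  # malformed line, skipped by both
        timings, outputs = [], []
        for func in (reopen_per_line, open_once):
            dst = os.path.join(tmp, func.__name__ + ".txt")
            start = time.perf_counter()
            func(src, dst)
            timings.append(time.perf_counter() - start)
            with open(dst, "r", encoding="utf-8") as f:
                outputs.append(f.read())
        return timings[0], timings[1], outputs[0] == outputs[1]


if __name__ == "__main__":
    slow, fast, same = compare()
    print(f"reopen per line: {slow:.3f}s, open once: {fast:.3f}s, "
          f"identical output: {same}")
```

The absolute numbers depend on the operating system, the disk and the file-system cache, so treat the result as a rough indication and not as a benchmark of the real 20GB workload.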
    

    【Comments】:
