【Question title】: Python string concatenation slow when using join
【Posted】: 2018-03-09 10:12:50
【Question】:

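I have the following code, which is supposed to concatenate all the text files in a directory into a single file. Even though I use join for the string concatenation, it keeps getting slower (60 seconds instead of 3 seconds after 14,000 files). What am I doing wrong?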

# -*- coding: utf-8 -*-
import os
from datetime import datetime

t1 = datetime.now()

directory_in_str = "E:\\Downloads\\WikipediaAF\\Extracted\\"
directory = os.fsencode(directory_in_str)

c = 1
af = ''
for file in os.listdir(directory):
    c = c + 1
    if c % 1000 == 0:
        t2 = datetime.now()
        print('Time now: ' + str(t2 - t1))
        print(str(c) + ' out of 67062')
    #    break
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        af = ''.join([af, '== ', filename, ' ==\n', f_in.read().replace(" 'n ", " ’n ")])

【Comments】:

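  • Calling join inside a loop is no better than calling + inside a loop. The idea is to collect all the strings you want to join and join them in a single call, rather than calling join over and over. Each call in the loop copies everything accumulated so far, so the total work grows quadratically with the amount of text.
  • So I collect them in a list inside the loop?
  • data = set();with open(os.path.join(whatever, whatever)) as file:;for line in file.readlines():;data.add(line.strip());return list(data) Something like that will give you a list of the unique elements in a file.
  • Why would I use for line (and produce even more strings to join) when I can read the whole file at once?
  • Is a deque faster than a list for appending? Can I pass a deque to join?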
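
To illustrate the point made in the comments, here is a minimal sketch comparing the two patterns. The function names build_slow and build_fast are made up for this example and do not appear in the question. It also shows that str.join accepts any iterable of strings, so a deque works just as well as a list.

```python
from collections import deque


def build_slow(pieces):
    """Anti-pattern: join inside the loop re-copies the whole result each time."""
    out = ''
    for p in pieces:
        out = ''.join([out, p])  # copies all of `out` on every iteration
    return out


def build_fast(pieces):
    """Collect the pieces first, then join exactly once."""
    parts = []
    for p in pieces:
        parts.append(p)  # amortized O(1), no copying of earlier text
    return ''.join(parts)


def build_with_deque(pieces):
    """Same idea with a deque; join only needs an iterable of str."""
    parts = deque()
    for p in pieces:
        parts.append(p)
    return ''.join(parts)
```

All three functions return the same string; only the amount of copying differs. For append-only use, a plain list is usually sufficient, and a deque mainly pays off when items are also added or removed at the left end.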

Tags: python string concatenation processing-efficiency


【Solution 1】:
# -*- coding: utf-8 -*-
import os
from datetime import datetime
from collections import deque

t1 = datetime.now()

directory_in_str = "E:\\Downloads\\WikipediaAF\\Extracted\\"
directory = os.fsencode(directory_in_str)

c = 0  # number of files seen so far
af = deque()  # append-only buffer of pieces; joined once after the loop
for file in os.listdir(directory):
    c = c + 1
    if c % 1000 == 0:
        t2 = datetime.now()
        print('Time now: ' + str(t2 - t1))
        print(str(c) + ' out of 67062')
    #    break
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        af.append('== ')
        af.append(filename)
        af.append(' ==\n')
        af.append(f_in.read().replace(" 'n ", " ’n "))

t2 = datetime.now()
print('After read af: ' + str(t2 - t1))

af = ''.join(af)

t2 = datetime.now()
print('After join af: ' + str(t2 - t1))

with open(os.path.join(directory_in_str, 'af_out2.txt'), 'w', encoding='utf-8') as f_out:
    f_out.write(af)
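
As a variation on the answer above (not part of the original answer), the pieces can also be written straight to the output file, so the combined text never has to be held in memory at all. This is a sketch under assumptions: the function name concat_files and its parameters are hypothetical, files are processed in sorted order, and the output file is skipped if it happens to live in the source directory.

```python
import os


def concat_files(src_dir, out_path):
    """Write every file in src_dir into out_path under a '== name ==' header.

    Returns the number of files that were copied.
    """
    out_abs = os.path.abspath(out_path)
    count = 0
    with open(out_path, 'w', encoding='utf-8') as f_out:
        for filename in sorted(os.listdir(src_dir)):
            path = os.path.join(src_dir, filename)
            # Skip sub-directories and the output file itself.
            if not os.path.isfile(path) or os.path.abspath(path) == out_abs:
                continue
            with open(path, encoding='utf-8') as f_in:
                f_out.write('== ' + filename + ' ==\n')
                # Same apostrophe replacement as in the question's code.
                f_out.write(f_in.read().replace(" 'n ", " ’n "))
            count += 1
    return count
```

A call such as concat_files(directory_in_str, os.path.join(directory_in_str, 'af_out2.txt')) would stand in for the loop, the join, and the final write. The file object buffers its writes, so many small write calls are not a problem here.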

【Comments】:
