Python string concatenation slow when using join: answers

【Question title】: Python string concatenation slow when using join
【Posted】: 2018-03-09 10:12:50
【Question description】:

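I have the following code, which is supposed to concatenate all the text files in a directory into a single file. Even though I use join for the string concatenation, it keeps getting slower (60 seconds after 14,000 files instead of 3 seconds). What am I doing wrong?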

# -*- coding: utf-8 -*-
import os
from datetime import datetime

t1 = datetime.now()

directory_in_str = "E:\\Downloads\\WikipediaAF\\Extracted\\"
directory = os.fsencode(directory_in_str)

c = 1
af = ''
for file in os.listdir(directory):
    c = c + 1
    if c % 1000 == 0:
        t2 = datetime.now()
        print('Time now: ' + str(t2 - t1))
        print(str(c) + ' out of 67062')
    #    break
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        af = ''.join([af, '== ', filename, ' ==\n', f_in.read().replace(" 'n ", " ’n ")])

【Comments on the question】:

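Calling join in a loop is no better than calling + in a loop. The idea is that you collect all the strings you want to join and join them in a single call, rather than calling join over and over.
So I collect them into a list in the loop?
data = set();with open(os.path.join(whatever, whatever)) as file:;for line in file.readlines():;data.add(line.strip());return list(data) Something like that would give you a list of the unique elements in a file
Why would I use for line (and produce even more strings to join) when I can read the whole file at once?
Is a deque faster than appending to a list? Can I use a deque in join?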
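
The point made in the first comment can be sketched roughly as follows. The function names and the `sample` data are made up for illustration and do not come from the thread. Joining inside the loop copies the whole accumulated string on every pass, whereas collecting the pieces and joining once copies each piece a single time. On the deque question: `str.join` accepts any iterable of strings, so a list and a deque both work.

```python
from collections import deque


def concat_in_loop(parts):
    """The pattern from the question: rebuild the full result on every pass."""
    result = ''
    for part in parts:
        # Each join copies everything accumulated so far plus the new piece,
        # so the total amount of copying grows quadratically.
        result = ''.join([result, part])
    return result


def concat_once(parts):
    """Collect the pieces in a list, then join them in a single call."""
    pieces = []
    for part in parts:
        pieces.append(part)
    return ''.join(pieces)


def concat_once_deque(parts):
    """Same idea with a deque; join only needs an iterable of strings."""
    pieces = deque()
    for part in parts:
        pieces.append(part)
    return ''.join(pieces)


# Hypothetical sample data standing in for file headers and contents.
sample = ['== a.txt ==\n', 'alpha\n', '== b.txt ==\n', 'beta\n']
assert concat_in_loop(sample) == concat_once(sample) == concat_once_deque(sample)
```

All three functions return the same string; only the amount of copying differs.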

Tags: python string concatenation processing-efficiency

【Solution 1】:

# -*- coding: utf-8 -*-
import os
from datetime import datetime
from collections import deque

t1 = datetime.now()

directory_in_str = "E:\\Downloads\\WikipediaAF\\Extracted\\"
directory = os.fsencode(directory_in_str)

c = 1
af = deque()
for file in os.listdir(directory):
    c = c + 1
    if c % 1000 == 0:
        t2 = datetime.now()
        print('Time now: ' + str(t2 - t1))
        print(str(c) + ' out of 67062')
    #    break
    filename = os.fsdecode(file)
    with open(os.path.join(directory_in_str, filename), encoding="utf8") as f_in:
        af.append('== ')
        af.append(filename)
        af.append(' ==\n')
        af.append(f_in.read().replace(" 'n ", " ’n "))

t2 = datetime.now()
print('After read af: ' + str(t2 - t1))

af = ''.join(af)

t2 = datetime.now()
print('After join af: ' + str(t2 - t1))

with open(os.path.join(directory_in_str, 'af_out2.txt'), 'w', encoding='utf-8') as f_out:
    f_out.write(af)
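
A different approach, not taken from the answer above: if the combined text is only needed on disk, each file can be written to the output as soon as it is read, so nothing accumulates in memory. This is a minimal sketch under that assumption. The function `concatenate_files` and its parameters are hypothetical names; the header format and the `" 'n "` replacement are copied from the code in the thread.

```python
import os


def concatenate_files(input_dir, output_path):
    """Write every file in input_dir to output_path, each after a '== name ==' header.

    Returns the number of files written. Hypothetical helper, for illustration only.
    """
    out_abs = os.path.abspath(output_path)
    count = 0
    with open(output_path, 'w', encoding='utf-8') as f_out:
        for filename in sorted(os.listdir(input_dir)):
            path = os.path.join(input_dir, filename)
            # Skip subdirectories, and the output file itself if it lives in input_dir.
            if not os.path.isfile(path) or os.path.abspath(path) == out_abs:
                continue
            with open(path, encoding='utf-8') as f_in:
                f_out.write('== ' + filename + ' ==\n')
                # Same apostrophe replacement as in the original code.
                f_out.write(f_in.read().replace(" 'n ", " ’n "))
            count += 1
    return count
```

It could be called as `concatenate_files(directory_in_str, os.path.join(directory_in_str, 'af_out2.txt'))`. Unlike the code above, this sketch processes the files in sorted order and skips the output file when it sits in the input directory.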

【Discussion】: