python read() from stdout much slower than reading line by line (slurping?): answers

【Question title】: python read() from stdout much slower than reading line by line (slurping?)
【Posted】: 2014-01-27 16:42:14
【Question】:

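I have a Python subprocess call that runs an executable and pipes its output to my subprocess stdout.

When the stdout data is relatively small (~2k lines), reading line by line and reading as a single chunk (stdout.read()) perform comparably, with stdout.read() being slightly faster.

Once the data gets larger (say 30k+ lines), reading line by line performs significantly better.

This is my comparison script: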

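import subprocess
import time

# `executable` is the command being run (not shown in the question).
# Note: time.clock() was removed in Python 3.8; use time.perf_counter() there.
tmp=[]
proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
for line in (iter(proc.stdout.readline,b'')):
    tmp.append(line)
print("line by line = %.2f"%(time.clock()-tic))

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
fullFile=proc.stdout.read()
print("slurped = %.2f"%(time.clock()-tic))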

These are the results for a read of ~96k lines (or about 50 MB on disk):

line by line = 5.48
slurped = 153.03

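I am not clear why the performance difference is so large. My expectation was that the read() version would be faster than storing the results line by line. Of course, I did expect line-by-line reading to win in the practical case, where a significant amount of per-line processing can be done during the read.

Can anyone give me some insight into the performance cost of read()?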

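【Comments】:

Does the subprocess always take the same amount of time to execute? (For example, is there no effect from caching on repeated runs?)
No appreciable gains were observed from repeated runs.
Can't reproduce this using seq 30000 (linux.die.net/man/1/seq). I think we're going to need an SSCCE (sscce.org).
I suspect memory pressure. Have you compared the memory usage patterns in the two cases?
@NPE: This is a Windows system... in case Python behaves differently there than on Linux. Also, I'm not sure seq 30000 would trigger the behaviour I am seeing, since I think this is a data volume problem, not a line count problem. Each of my lines has about 400 characters.

Tags: python performance subprocess readline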

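【Answer 1】:

This is not just Python: reading characters without buffering is always slower than reading in lines or large chunks.

Consider these two simple C programs: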

[readchars.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

        int c;
        c = fgetc(fh);
        while (c != EOF) {
                c = fgetc(fh);
        }

        return 0;
}

[readlines.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

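        char* buf = (char*) malloc(120);
        if (buf == NULL) {
                perror("Failed to allocate line buffer");
                fclose(fh);
                exit(1);
        }
        /* fgets returns NULL at EOF or on error, so the buffer pointer is
           kept separate; overwriting it with NULL would leak the buffer. */
        while (fgets(buf, 120, fh) != NULL) {
        }

        free(buf);
        fclose(fh);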

        return 0;
}

Their results (YMMV; largefile.txt is a ~200 MB text file):

$ gcc readchars.c -o readchars
$ time ./readchars            
./readchars  1.32s user 0.03s system 99% cpu 1.350 total
$ gcc readlines.c -o readlines
$ time ./readlines            
./readlines  0.27s user 0.03s system 99% cpu 0.300 total
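The same effect can be made visible in Python without timing anything. The following is only a sketch: CountingRaw and count_raw_reads are hypothetical names, and an in-memory stream stands in for a real file or pipe. It counts how often the underlying raw stream is hit when the data is consumed one byte at a time, with and without a buffer in front of it.

```python
import io


class CountingRaw(io.RawIOBase):
    """In-memory raw stream that counts calls to its low-level read."""

    def __init__(self, data):
        super().__init__()
        self._src = io.BytesIO(data)
        self.calls = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Every call here stands in for one trip to the operating system.
        self.calls += 1
        return self._src.readinto(b)


def count_raw_reads(data, buffered):
    """Consume data one byte at a time and return the number of raw reads."""
    raw = CountingRaw(data)
    stream = io.BufferedReader(raw, buffer_size=4096) if buffered else raw
    while stream.read(1):
        pass
    return raw.calls


if __name__ == "__main__":
    payload = b"x" * 100000
    print("unbuffered raw reads:", count_raw_reads(payload, buffered=False))
    print("buffered raw reads:  ", count_raw_reads(payload, buffered=True))
```

Without the buffer, every one-byte read reaches the raw stream; with it, the raw stream is only hit once per buffer refill, which is where the time difference in the C programs comes from as well.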

【Comments】:

Indeed, my apologies.

【Answer 2】:

Try adding a bufsize option to your Popen call and see whether it makes a difference:

proc=subprocess.Popen(executable, bufsize=-1, stdout=subprocess.PIPE)

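Popen has an option to set the buffer size used for reading input. In Python 2, bufsize defaults to 0, which means unbuffered input; since Python 3.3.1 the default is -1. A value of 1 means line buffered, and any other positive value means a buffer of approximately that size. A negative value means to use the system default, which usually means fully buffered input.

The Python docs include this note:

Note: if you experience performance issues, it is recommended that you try to enable buffering by setting bufsize to either -1 or a large enough positive value (such as 4096).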
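To check whether bufsize matters on a given system, the two settings can be timed side by side. This is a minimal sketch, not the asker's setup: CHILD is a hypothetical stand-in for the real executable (a child Python process that prints 2000 lines of about 400 characters), and slurp is an illustrative helper name. The timings will depend heavily on the platform and Python version.

```python
import subprocess
import sys
import timeit

# Hypothetical stand-in for the real executable: prints 2000 lines of ~400 chars.
CHILD = [sys.executable, "-c",
         "import sys; sys.stdout.write(('x' * 399 + '\\n') * 2000)"]


def slurp(bufsize):
    """Read the child's whole stdout in one call; return (data, seconds)."""
    start = timeit.default_timer()
    with subprocess.Popen(CHILD, bufsize=bufsize, stdout=subprocess.PIPE) as proc:
        data = proc.stdout.read()
    return data, timeit.default_timer() - start


if __name__ == "__main__":
    for bufsize in (0, -1):
        data, elapsed = slurp(bufsize)
        print("bufsize=%d: %d bytes in %.3f s" % (bufsize, len(data), elapsed))
```

Both calls should return the same bytes; only the elapsed time is expected to differ, and on some systems it may not differ much at all.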

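【Comments】:

Results with the buffer: line by line = 2.41, no buffer = 30.90, with buffer = 31.78
FWIW this gave me an 8x speedup for my particular use case, thanks a lot!

【Answer 3】:

I don't see this behaviour at all.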

import subprocess
import time


executable = ["cat", "data"]

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
tmp = []
for line in (iter(proc.stdout.readline,b'')):
    tmp.append(line)
print("line by line = %.2f"%(time.clock()-tic))

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
fullFile=proc.stdout.read()
print("slurped = %.2f"%(time.clock()-tic))

The data is text.

pts/0$ ll data
-rw-r--r-- 1 javier users 18M feb 21 20:53 data

pts/0$ wc -l data
169866 data

Results:

pts/0$ python3 a.py 
line by line = 0.08
slurped = 0.01

Python 2 is much slower than Python 3!

pts/0$ python2 a.py 
line by line = 4.45
slurped = 0.01

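Maybe it depends on the subprocess?

【Comments】:

bufsize has different defaults in Python 2 and 3. If you set it explicitly, you should get similar results in both versions. Use timeit.default_timer instead of time.clock() for portability.
Good to know. Thanks!

【Answer 4】:

I got mixed results with bufsize. I run a continuous ping script that logs the replies, and I need it to run without interruption. It would hang every few days, so my solution was to write a separate script that watches the tasklist and kills any ping task that takes longer than 10 seconds. See below.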

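import subprocess
import time

CREATE_NO_WINDOW = 0x08000000
previous_id = ''

while True:
    command = subprocess.Popen(['tasklist'], stdout=subprocess.PIPE,
              shell=False, creationflags = CREATE_NO_WINDOW)
    # Splitting on 'Ko' relies on a tasklist locale that reports memory in "Ko".
    reply = str(command.communicate()[0]).split('Ko')
    for item in reply:
        # Only PING.EXE rows are guaranteed to have the fields used below.
        if 'PING.EXE' not in item:
            continue
        fields = item.split(' ')
        # Index 22 is where the process id landed in the author's tasklist output.
        print(fields[0][4:]+' '+fields[22])
        if fields[22] != previous_id:
            previous_id = fields[22]
            print('New ping detected, system is healthy')
        else:
            print('Same ping active for 10 seconds, killing')
            command = subprocess.Popen(['taskkill','/f','/im','PING.EXE'], stdout=subprocess.PIPE, shell=False, creationflags = CREATE_NO_WINDOW)
            command.communicate()
            # Assumption: the original opened errors.txt but never wrote to it;
            # here the kill is recorded in that file instead.
            with open('errors.txt','a') as err_log:
                err_log.write('killed hung PING.EXE\n')
    time.sleep(10)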

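This runs in parallel, and the chance of both processes hanging at the same time is small. All you need to do is catch any errors caused by the lost pipe in your main script.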
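What "catching errors from the lost pipe" might look like in the main script is sketched below. This is an assumption about the main script, which the answer does not show: run_and_log is a hypothetical helper, and in many cases a killed child simply shows up as end-of-stream plus a non-zero exit code rather than as an exception, so both paths are handled.

```python
import subprocess
import sys


def run_and_log(cmd, log):
    """Run cmd, append each output line to log, and return the exit code.

    Returns None if starting the child or reading its pipe raised OSError.
    """
    try:
        with subprocess.Popen(cmd, stdout=subprocess.PIPE) as proc:
            for line in proc.stdout:
                log.append(line)
    except OSError as exc:
        # OSError covers BrokenPipeError and similar pipe failures.
        log.append(("pipe error: %s\n" % exc).encode())
        return None
    # The with block has already waited for the child, so returncode is set.
    return proc.returncode


if __name__ == "__main__":
    lines = []
    # Hypothetical child that prints one line and then fails.
    code = run_and_log(
        [sys.executable, "-c", "print('reply'); raise SystemExit(1)"], lines)
    if code != 0:
        print("child ended abnormally (exit code %r); restart it here" % code)
```

A main loop could call such a helper repeatedly and restart the ping command whenever the return value is not 0.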

【Comments】: