Answers: Fastest method in Python to detect a zero-filled bytearray

【Question title】: Fastest method in Python to detect zero-filled bytearray
【Posted】: 2018-08-19 06:06:49
【Question】:

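My code reads a binary disk image file in 2MB chunks and saves each chunk as a separate file.

My only special requirement is to skip saving a chunk if it contains nothing but zeros; this is all about speed and efficiency. I'm concerned that my current approach, which uses .count(), may not be the most efficient: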

with open("source.img", "rb") as src:
  for addr in range(0, sourcesize, chunksize):
    buf = src.read(chunksize)
    with open("imgdir/"+hex(addr), "wb") as dest:
      if len(buf) > buf.count(b"\x00"): # <---this concerns me
        dest.write(buf)

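Performance in practice is lackluster. I know Python isn't designed for speed, but does it offer a better option? Perhaps a function that looks for "anything other than \x00" in the buffer, which on average should return earlier, after fewer iterations?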

【Comments on the question】:

Tags: arrays string search binary zero

【Solution 1】:

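In the test loop below, I was able to cut execution time by about 25% by comparing the working buffer directly against a buffer of zeros. I chose this approach because the comparison can stop at the first non-zero byte, so in many iterations Python stops checking before it reaches the end of the buffer: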

sourcesize = 2**31 # 2GB
chunksize = 2**21 # 2MB
zeros=bytes(chunksize)

with open("source.img","rb") as source:
  for addr in range(0,sourcesize,chunksize):
    with open("/dev/null", "wb") as dest:
      buf=source.read(chunksize)
      #if len(buf) > buf.count(b"\x00"): # old comparison
      if buf != zeros: # <-faster comparison
        dest.write(buf)

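This gives almost the same timing as the test command dd if=source.img of=/dev/null bs=2M conv=sparse, which behaves very similarly, including the check that skips blocks consisting entirely of zeros. Since I assume dd is written in C, I consider this a good result.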
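The comparison above can be wrapped in a small helper, sketched below. It is only an illustration, not the answer author's code: the names is_zero_filled and nonzero_chunks are hypothetical, and the handling of a final chunk shorter than chunksize is an added assumption (the 2GB test image divides evenly into 2MB chunks, so the original loop never hits that case).

```python
import io

CHUNK_SIZE = 2**21          # 2 MB, as in the answer above
ZEROS = bytes(CHUNK_SIZE)   # reusable all-zero reference buffer


def is_zero_filled(buf, zeros=ZEROS):
    """Return True if buf holds only zero bytes (hypothetical helper)."""
    if len(buf) == len(zeros):
        # Direct comparison; it can stop at the first differing byte.
        return buf == zeros
    # Assumption: a short final chunk is compared against a zero
    # buffer of its own length instead of the full-size one.
    return buf == bytes(len(buf))


def nonzero_chunks(stream, chunk_size=CHUNK_SIZE):
    """Yield (offset, chunk) for each chunk that is not entirely zero."""
    zeros = bytes(chunk_size)
    offset = 0
    while True:
        buf = stream.read(chunk_size)
        if not buf:
            break
        if not is_zero_filled(buf, zeros):
            yield offset, buf
        offset += len(buf)


# Small in-memory demonstration with 4-byte chunks:
# one all-zero chunk, one chunk with a non-zero byte, one short zero tail.
demo = io.BytesIO(b"\x00" * 4 + b"\x00\x01\x00\x00" + b"\x00" * 2)
found = list(nonzero_chunks(demo, chunk_size=4))
```

With a generator like this, an output file would only need to be opened for the chunks that are yielded, so all-zero chunks would not leave empty files behind.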

【Comments】: