Averaging over multipage TIFF pages in Python

【Question title】: Averaging over multipage TIFF pages in Python
【Posted】: 2014-06-30 10:43:20
【Question】:

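What is the fastest and most memory-efficient way to compute the average of a multi-frame 16-bit TIFF image as a numpy array?

What I have come up with so far is the code below. To my surprise, method 2 is faster than method 1.

But when it comes to profiling, never assume, test it! So I want to test more. Is Wand worth trying? I did not include it here because it still would not import after I installed ImageMagick-6.8.9-Q16 and set the MAGICK_HOME environment variable... Are there any other libraries for multipage TIFF in Python? GDAL may be overkill.

(Edit) I included libtiff. Method 2 is still the fastest and quite memory efficient.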

from time import time

#import cv2  ## no multi page tiff support
import numpy as np
from PIL import Image
#from scipy.misc import imread  ## no multi page tiff support
import tifffile # http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html
from libtiff import TIFF # https://code.google.com/p/pylibtiff/

fp = r"path/2/1000frames-timelapse-image.tif"

def method1(fp):
    '''
    using tifffile.py by Christoph (Version: 2014.02.05)
    (http://www.lfd.uci.edu/~gohlke/code/tifffile.py.html)
    '''
    with tifffile.TIFFfile(fp) as imfile:
        return imfile.asarray().mean(axis=0)


def method2(fp):
    'primitive peak memory friendly way with tifffile.py'
    with tifffile.TIFFfile(fp) as imfile:

        nframe, h, w = imfile.series[0]['shape']
        temp = np.zeros( (h,w), dtype=np.float64 )

        for n in range(nframe):
            curframe = imfile.asarray(n)
            temp += curframe

        return (temp / nframe)


def method3(fp):
    ' like method2 but using pillow 2.3.0 '
    im = Image.open(fp)

    w, h = im.size
    temp = np.zeros( (h,w), dtype=np.float64 )

    n = 0
    while True:
        curframe = np.array(im.getdata()).reshape(h,w)
        temp += curframe
        n += 1
        try:
            im.seek(n)
        except EOFError:  # Pillow raises EOFError when seeking past the last frame
            break

    return (temp / n)


def method4(fp):
    '''
    https://code.google.com/p/pylibtiff/
    documentation seems outdated.
    '''

    tif = TIFF.open(fp)
    header = tif.info()

    meta = dict()  # extracting meta
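    for line in header.splitlines():
        # header lines look like "key: value" or "key=value"; skip anything else
        if line.find(':') > 0:
            key, value = line.split(':', 1)
        elif line.find('=') > 0:
            key, value = line.split('=', 1)
        else:
            continue  # avoids reusing a stale key from a previous line
        meta[key] = value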

    nframes = int(meta['frames'])
    h = int(meta['ImageLength'])
    w = int(meta['ImageWidth'])

    temp = np.zeros( (h,w), dtype=np.float64 )

    for frame in tif.iter_images():
        temp += frame

    return (temp / nframes)

t0 = time()
avgimg1 = method1(fp)
print(time() - t0)
# 1.17-1.33 s

t0 = time()
avgimg2 = method2(fp)
print(time() - t0)
# 0.90-1.53 s  usually faster than method1 by 20%

t0 = time()
avgimg3 = method3(fp)
print(time() - t0)
# 21 s

t0 = time()
avgimg4 = method4(fp)
print(time() - t0)
# 1.96 - 2.21 s  # may not be accurate. I got warning for every frame with the tiff file I tested.

np.testing.assert_allclose(avgimg1, avgimg2)
np.testing.assert_allclose(avgimg1, avgimg3)
np.testing.assert_allclose(avgimg1, avgimg4)
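
The difference between method1 and method2 can be reproduced without any TIFF library. The following is a minimal sketch, assuming Python 3 and a recent numpy; `streaming_mean` is a hypothetical helper name, and the random array is only a synthetic stand-in for a real TIFF stack. It adds one frame at a time into a float64 accumulator, so only a single frame plus the accumulator needs to be in memory, whereas the all-at-once version needs the whole stack loaded first.

```python
import numpy as np


def streaming_mean(frames):
    """Average an iterable of equally shaped 2-D frames, one frame at a time."""
    total = None
    count = 0
    for frame in frames:
        if total is None:
            # float64 accumulator: summing many uint16 frames cannot overflow it
            total = np.zeros(frame.shape, dtype=np.float64)
        total += frame
        count += 1
    if count == 0:
        raise ValueError("no frames to average")
    return total / count


# Synthetic stand-in for a TIFF stack: 50 frames of 64 x 48 uint16 pixels.
rng = np.random.default_rng(0)
stack = rng.integers(0, 2**16, size=(50, 64, 48), dtype=np.uint16)

mean_all_at_once = stack.mean(axis=0)        # method1 style: whole stack in memory
mean_streamed = streaming_mean(iter(stack))  # method2 style: one frame at a time
np.testing.assert_allclose(mean_all_at_once, mean_streamed)
```

Any iterable of frames can be passed in, for example a generator that reads one page per iteration from a file.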

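【Comments】:

pylibtiff also lets you iterate over the pages in a multipage TIFF file, as does PIL.ImageSequence.
Unless you really have a lot of frames, and they are very small, looping over the frames in Python is not going to be a significant part of your runtime. As you have learned from method 1, loading all frames into memory at once ends up slower, even though the loop then happens in C. I don't think you will find anything that works better than your method 2.
I should try pylibtiff. So method2 seems to be good enough. That is good news, but I was not sure about it until I heard it from someone else. Thanks!
I found that pylibtiff has small problems with my files. The documentation and the project are outdated (last update 4 years ago?) and the API does not match the documentation. Also, I get "unknown field with tag 51123 (0xc7b3) encountered" for every frame of TIFF files collected with MicroManager (an ImageJ-based application).
The simplest way is to use ImageMagick from the command line, like this: magick image.tif -evaluate-sequence mean result.tif. If you want the fastest method, or the most memory-efficient method, you need to provide a representative sample.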
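
The PIL.ImageSequence suggestion above could look roughly like the following. This is a minimal sketch, assuming a reasonably recent Pillow and numpy under Python 3; `mean_with_imagesequence` is a hypothetical name. It converts each page with `np.asarray` instead of `getdata()`, which may avoid much of the per-pixel overhead seen in method3, though that has not been measured here.

```python
import numpy as np
from PIL import Image, ImageSequence


def mean_with_imagesequence(path):
    """Average all pages of a multipage TIFF, reading one page at a time."""
    with Image.open(path) as im:
        total = np.zeros((im.height, im.width), dtype=np.float64)
        count = 0
        # The iterator seeks through the pages and stops after the last one.
        for page in ImageSequence.Iterator(im):
            total += np.asarray(page)
            count += 1
    return total / count
```

It would be called as `avg = mean_with_imagesequence(fp)` with the same `fp` as in the question.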

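Tags: python image-processing numpy tiff

【Answer 1】:

Simple logic would make me put my money on method 1 or 3, because methods 2 and 4 have for-loops in them. For-loops always make your code slower as the input grows.

I would definitely go with method 1: neat, clear, readable...

To be sure, I would say just test them. If you don't want to test, I would go with method 1.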
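
A minimal sketch of how such a test could be set up, assuming Python 3; `best_time` and `compare` are hypothetical helper names, not part of any library. Taking the best of several runs reduces noise from disk caching and other processes, which matters when single runs vary as much as the timings reported in the question.

```python
from time import perf_counter


def best_time(func, *args, repeats=5):
    """Return the fastest wall-clock time (seconds) of `repeats` calls to func(*args)."""
    best = float("inf")
    for _ in range(repeats):
        start = perf_counter()
        func(*args)
        best = min(best, perf_counter() - start)
    return best


def compare(candidates, *args, repeats=5):
    """Time each named callable; return (name, seconds) pairs, fastest first."""
    results = [
        (name, best_time(func, *args, repeats=repeats))
        for name, func in candidates.items()
    ]
    return sorted(results, key=lambda item: item[1])
```

With the functions from the question this would be used as `compare({"method1": method1, "method2": method2}, fp)`.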

Kind regards,

【Comments】: