How to save images as an h5py file? (Answers)

【Question Title】: How to save images as h5py file?
【Posted】: 2019-01-04 06:51:48
【Question】:

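I have a train folder containing 2000 images of different sizes, and I also have a labels.csv file. Loading and resizing these images while training the network is very time-consuming, so I read some material about h5py, which is meant as a solution for this situation. I tried the following code: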
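For context, the read side of this approach might look like the following minimal sketch. It assumes the file layout the question's code creates (datasets named `train_img` and `train_labels`); `iter_batches` is a hypothetical helper, not part of h5py. Once the images are resized and stored, each training batch becomes a plain slice read instead of decoding and resizing JPEGs every epoch.

```python
import os
import tempfile

import h5py
import numpy as np


def iter_batches(h5_path, batch_size, img_key="train_img", label_key="train_labels"):
    """Yield (images, labels) NumPy batches read as contiguous slices of an HDF5 file."""
    with h5py.File(h5_path, "r") as hf:
        images, labels = hf[img_key], hf[label_key]
        n = images.shape[0]
        for start in range(0, n, batch_size):
            stop = min(start + batch_size, n)
            # Each slice comes back as an already-resized array: no JPEG decoding.
            yield images[start:stop], labels[start:stop]


# Tiny demo: 5 fake 2x2 RGB "images" with labels, read back in batches of 2.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.hdf5")
    with h5py.File(demo_path, "w") as hf:
        hf["train_img"] = np.zeros((5, 2, 2, 3), dtype=np.uint8)
        hf["train_labels"] = np.array([1, 0, 0, 1, 1], dtype=np.float32)
    batch_sizes = [imgs.shape[0] for imgs, _ in iter_batches(demo_path, batch_size=2)]

print(batch_sizes)  # [2, 2, 1]
```

Contiguous slices along the first axis are the cheapest reads when each chunk holds whole images; shuffling can then be done at the batch level rather than per image.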

import datetime as dt
import os
from glob import glob

import cv2
import h5py
import pandas as pd

PATH = os.path.abspath(os.path.join('Data'))
SOURCE_IMAGES = os.path.join(PATH, "Train")
print("[INFO] images paths reading")
images = glob(os.path.join(SOURCE_IMAGES, "*.jpg"))
images.sort()
print("[INFO] image labels reading")
labels = pd.read_csv('Data/labels.csv')

# Binary label: 1.0 where the "car" column equals 1.0, otherwise 0.0
train_labels = []
for i in range(len(labels["car"])):
    if labels["car"][i] == 1.0:
        train_labels.append(1.0)
    else:
        train_labels.append(0.0)

data_order = 'tf'  # 'th' = channels-first, 'tf' = channels-last

if data_order == 'th':
    train_shape = (len(images), 3, 224, 224)
else:
    train_shape = (len(images), 224, 224, 3)

hf = h5py.File('data.hdf5', 'w')
print("[INFO] h5py file created")

hf.create_dataset("train_img",
                  shape=train_shape,
                  maxshape=train_shape,
                  compression="gzip",
                  compression_opts=9)

hf.create_dataset("train_labels",
                  shape=(len(train_labels),),
                  maxshape=(None,),
                  compression="gzip",
                  compression_opts=9)

hf["train_labels"][...] = train_labels

print("[INFO] read and size images")
for i, addr in enumerate(images):
    s = dt.datetime.now()
    img = cv2.imread(addr)
    img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_CUBIC)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    if data_order == 'th':
        img = img.transpose(2, 0, 1)  # HWC -> CHW to match train_shape

    hf["train_img"][i, ...] = img[None]
    e = dt.datetime.now()
    print("[INFO] image", i, "is saved, time:", (e - s).total_seconds(), "seconds")

hf.close()

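But when I run this code, it is very fast at first and then becomes very slow, especially at the line hf["train_img"][i, ...] = img[None]. The per-image times printed by the program keep increasing. Where did I go wrong? Thanks for any suggestions.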

【Comments】:

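Your timing includes the time to read and convert the images. If the images differ in size, that could be the time-consuming part. Perhaps the later images are simply larger and therefore take longer to read and convert?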
No, I'm sure it has nothing to do with image size. I tested it and found that the line hf["train_img"][i, ...] = img[None] is where it waits.
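One way to settle this is to time the load step and the HDF5 write separately for every image. A minimal sketch follows; `fill_with_timing` and the fake loader are hypothetical names. In the question's setting, `load_fn` would wrap the `cv2.imread`/`cv2.resize`/`cv2.cvtColor` calls and `dset` would be `hf["train_img"]`. The helper accepts any object supporting `dset[i, ...] = img`, so the demo uses a NumPy array to stay runnable without OpenCV or an HDF5 file.

```python
import time

import numpy as np


def fill_with_timing(dset, paths, load_fn):
    """Store load_fn(path) at dset[i] for each path, timing load and write separately.

    Returns two lists of per-image durations in seconds: (load_times, write_times).
    """
    load_times, write_times = [], []
    for i, path in enumerate(paths):
        t0 = time.perf_counter()
        img = load_fn(path)   # e.g. read + resize + colour conversion
        t1 = time.perf_counter()
        dset[i, ...] = img    # the storage write on its own
        t2 = time.perf_counter()
        load_times.append(t1 - t0)
        write_times.append(t2 - t1)
    return load_times, write_times


# Demo: a NumPy array stands in for the HDF5 dataset, and the "loader" fakes an image.
buf = np.zeros((3, 4, 4, 3), dtype=np.uint8)
load_t, write_t = fill_with_timing(
    buf, [0, 1, 2], lambda p: np.full((4, 4, 3), p, dtype=np.uint8)
)
print(f"load total: {sum(load_t):.6f}s  write total: {sum(write_t):.6f}s")
```

If the write times grow from image to image while the load times stay roughly flat, the storage layout rather than JPEG decoding is the bottleneck.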
This is typical behavior if you use an unsuitable chunk_shape (choose, e.g., (1,3,224,224)) together with an insufficient chunk_cache_size. Take a look at stackoverflow.com/a/48405220/4045774 and stackoverflow.com/a/44961222/4045774
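A minimal sketch of what this comment suggests, assuming h5py 2.9 or later (needed for the `rdcc_nbytes` argument of `h5py.File`). When `chunks=` is not passed, as in the question, h5py guesses a chunk shape on its own; inspecting `.chunks` shows whether one image maps to one chunk. For the question's channels-last ('tf') layout, the per-image chunk is `(1, 224, 224, 3)` rather than `(1, 3, 224, 224)`. The 64 MiB cache size and the `uint8` dtype are illustrative choices, not taken from the thread (without `dtype=`, h5py creates a float32 dataset).

```python
import os
import tempfile

import h5py

N, H, W, C = 2000, 224, 224, 3  # sizes from the question, channels-last ('tf') order

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.hdf5")
    # rdcc_nbytes sets the raw-data chunk cache size per dataset (HDF5 default: 1 MiB).
    with h5py.File(path, "w", rdcc_nbytes=64 * 1024**2) as hf:
        # Created like the question's dataset: no chunks=, so h5py picks the chunk shape.
        auto = hf.create_dataset("auto", shape=(N, H, W, C), maxshape=(N, H, W, C),
                                 compression="gzip", compression_opts=9)
        # One chunk per image: writing dset[i] then touches exactly one chunk.
        per_image = hf.create_dataset("train_img", shape=(N, H, W, C), dtype="uint8",
                                      chunks=(1, H, W, C),
                                      compression="gzip", compression_opts=4)
        auto_chunks, per_image_chunks = auto.chunks, per_image.chunks

print("h5py-guessed chunks:", auto_chunks)
print("per-image chunks:   ", per_image_chunks)
```

With a guessed layout, a chunk can span many images and only part of each one, so a single `dset[i]` write may have to read, decompress, modify and recompress many chunks; if the cache cannot hold them, that work repeats for every image. This is consistent with the slowdown in the question, although the thread does not measure it directly.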

Tags: python image-processing deep-learning hdf5 h5py

【Solution 1】:

train_img is created with compression_opts=9. That is the highest compression level and takes the most work to compress and decompress.

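If compression time is the bottleneck and you can afford to trade some disk space for it, use a lower compression level such as the default (4), or disable compression entirely.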
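To see this trade-off on your own data, a sketch like the following writes the same images three ways: gzip level 9 (as in the question), gzip at h5py's default level 4, and uncompressed. The per-image chunks and the synthetic gradient images are assumptions made for the example, and `write_images` is a hypothetical helper. Timings depend heavily on the machine and the image content, so no numbers are quoted here.

```python
import os
import tempfile
import time

import h5py
import numpy as np


def write_images(path, images, **compression):
    """Write a (N, H, W, C) uint8 array image by image; return (seconds, file_bytes)."""
    n, h, w, c = images.shape
    start = time.perf_counter()
    with h5py.File(path, "w") as hf:
        dset = hf.create_dataset("train_img", shape=images.shape, dtype="uint8",
                                 chunks=(1, h, w, c), **compression)
        for i in range(n):
            dset[i, ...] = images[i]  # one chunk written per image
    return time.perf_counter() - start, os.path.getsize(path)


# 50 synthetic 224x224 RGB gradient images (compressible, unlike random noise).
ramp = np.add.outer(np.arange(224), np.arange(224))
images = np.stack([np.dstack([((ramp + k) % 256).astype(np.uint8)] * 3) for k in range(50)])

settings = {
    "gzip 9": {"compression": "gzip", "compression_opts": 9},
    "gzip 4 (default level)": {"compression": "gzip"},
    "none": {},
}
results = {}
with tempfile.TemporaryDirectory() as tmp:
    for idx, (name, opts) in enumerate(settings.items()):
        path = os.path.join(tmp, f"images_{idx}.hdf5")
        seconds, size = write_images(path, images, **opts)
        with h5py.File(path, "r") as hf:
            roundtrip_ok = np.array_equal(hf["train_img"][...], images)
        results[name] = (seconds, size, roundtrip_ok)

for name, (seconds, size, ok) in results.items():
    print(f"{name:>22}: {seconds:.3f}s, {size / 1e6:.2f} MB, round-trip ok={ok}")
```

gzip is lossless at every level, so the level only trades write/read time against file size; the stored pixels are identical in all three files.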
