How do I create a prefetch dataset from a folder of images? Answers

【Question title】: How do I create a prefetch dataset from a folder of images?
【Posted】: 2021-11-03 21:32:11
【Question description】:

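I am trying to feed a dataset from Kaggle into a notebook from the TensorFlow documentation in order to train a CycleGAN model. My current approach is to download the folder into my notebook, iterate over the path of each image, and use cv2.imread(path) to append the uint8 image data to a list. This does not work, and I know my current approach is wrong because the code provided by Google expects a PrefetchDataset.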

Here is my current code (excluding the OpenCV part):

import os

# specify the img directory path
art_path = "/content/abstract-art-gallery/Abstract_gallery/Abstract_gallery/"
land_path = "/content/landscape-pictures/"

def grab_path(folder, i_count=100):
  # Collect up to i_count image paths from the folder.
  # (Indexing os.listdir(folder)[0] on every pass would return the same file each time.)
  res = []
  for name in sorted(os.listdir(folder)):
      if name.lower().endswith(('.jpg', '.png', '.jpeg')):
          res.append(os.path.join(folder, name))
      if len(res) >= i_count:
          break
  return res

art_path, land_path = grab_path(art_path), grab_path(land_path)
print(art_path)
print(land_path)

The error in the code occurs here:

train_horses = train_horses.cache().map(
    preprocess_image_train, num_parallel_calls=AUTOTUNE).shuffle(
    BUFFER_SIZE).batch(BATCH_SIZE)

Is there a simpler way to solve this?

【Comments】:

Tags: image tensorflow image-processing tensorflow-datasets kaggle

【Solution 1】:

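      import pathlib
      import tensorflow as tf

      # Placeholder values (not given in the original answer); adjust for your data.
      IMAGE_PATHS_DIR = "/content/landscape-pictures/"
      IMAGE_SIZE = (256, 256)

      AUTOTUNE = tf.data.experimental.AUTOTUNE

      @tf.autograph.experimental.do_not_convert
      def read_image(path):
         # Read the raw bytes and decode them into a 3-channel image tensor.
         image_string = tf.io.read_file(path)
         # The original called an undefined helper (DataUtils.decode_image);
         # tf.io.decode_jpeg followed by tf.image.resize is substituted here.
         image = tf.io.decode_jpeg(image_string, channels=3)
         image = tf.image.resize(image, IMAGE_SIZE)
         return image

      # Gather every .jpg under the directory (recursively) as plain strings.
      paths = [str(p) for p in pathlib.Path(IMAGE_PATHS_DIR).rglob('*.jpg')]
      dataset = tf.data.Dataset.from_tensor_slices(paths)
      dataset = dataset.map(read_image, num_parallel_calls=AUTOTUNE)
      dataset = dataset.shuffle(2048)
      dataset = dataset.prefetch(AUTOTUNE)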
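
For readers who want the whole path-to-batches pipeline in one place, here is a minimal sketch of the same idea wrapped in a function. The names `load_image` and `make_dataset`, the 256x256 target size, and the scaling to [-1, 1] are assumptions made for illustration, not part of the answer above or of the tutorial. It uses `tf.io.decode_image` so that both JPEG and PNG files are handled, and it ends with `.prefetch()`, the call that turns the result into a prefetch dataset.

```python
import pathlib

import tensorflow as tf

AUTOTUNE = tf.data.AUTOTUNE  # available in TensorFlow 2.4 and later


def load_image(path, image_size=(256, 256)):
    """Read one image file and return a float32 tensor scaled to [-1, 1]."""
    data = tf.io.read_file(path)
    # decode_image handles JPEG and PNG; expand_animations=False keeps the
    # result a static-rank (height, width, 3) tensor so resize can be applied.
    image = tf.io.decode_image(data, channels=3, expand_animations=False)
    image = tf.image.resize(image, image_size)  # returns float32
    return image / 127.5 - 1.0


def make_dataset(folder, batch_size=1, buffer_size=1000):
    """Build a shuffled, batched, prefetched dataset from images under `folder`."""
    patterns = ("*.jpg", "*.jpeg", "*.png")
    paths = sorted(
        str(p) for pattern in patterns for p in pathlib.Path(folder).rglob(pattern)
    )
    if not paths:
        raise ValueError(f"no images found under {folder}")
    ds = tf.data.Dataset.from_tensor_slices(paths)
    ds = ds.map(load_image, num_parallel_calls=AUTOTUNE)
    return ds.cache().shuffle(buffer_size).batch(batch_size).prefetch(AUTOTUNE)
```

Calling `make_dataset("/content/landscape-pictures/")` would then yield batches of images only. The tutorial's own preprocessing functions appear to expect an (image, label) pair, so they would likely need adapting (or replacing, as above) before being mapped over a dataset built this way.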

【Comments】: