How to include files with tf.data.Dataset

【Question title】: How to include files with tf.data.Dataset
【Posted】: 2020-08-04 07:17:32
【Question description】:

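I am training a face recognition model with Triplet Loss, so every batch has to contain a fixed number of images from each label. For example, each training batch should take 8 images from each of 3 randomly chosen labels (24 images in total), as suggested in a GitHub issue.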

In my dataset folder, each subfolder is named after a label and contains the images for that label.
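To make the layout concrete: with one subfolder per label, the label can be recovered from an image path alone. This is a minimal sketch, assuming the folder names are integers (as the `{:d}` paths in the code further down suggest). `label_from_path` and the example path are hypothetical names used only for illustration.

```python
import os

def label_from_path(path):
    # The parent directory name is the label,
    # e.g. ".../train/7/img_001.jpg" -> 7.
    # Assumes every label folder is named with an integer.
    return int(os.path.basename(os.path.dirname(path)))

print(label_from_path("/data/train/7/img_001.jpg"))  # prints 7
```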

The issue proposes the following solution, which I adapted as shown:

import os
import numpy as np
import cv2
import tensorflow as tf

num_labels = len(path_list)
num_classes_per_batch = 3
num_images_per_class = 8

image_dirs = ["/content/drive/My Drive/smalld_processed/train/{:d}".format(i) for i in 
range(num_labels)]

## Create the list of datasets creating filenames

#datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg" for image_dir in image_dirs)]
datasets = [tf.data.Dataset.list_files(f"{image_dir}/*.jpg") for image_dir in image_dirs]
adk = ["{}/*.jpg".format(image_dir) for image_dir in image_dirs]
print(adk)

def generator():
    while True:
     # Sample the labels that will compose the batch
      labels = np.random.choice(range(num_labels),
                               num_classes_per_batch,
                               replace=False)
      for label in labels:
          for _ in range(num_images_per_class):
              yield label

choice_dataset = tf.data.Dataset.from_generator(generator, tf.int64)
dataset = tf.data.experimental.choose_from_datasets(datasets, choice_dataset)

## Now you read the image content
def load_image(filename):
    image = cv2.imread(filename,1)
    image = dataset.map(image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
    image = image[...,::-1]
    label = int(os.path.split(os.path.dirname(filename))[1])
    image=dataset1.append()
    label=dataset2.append
    return image, label
   
dataset = dataset.map(load_image, num_parallel_calls=tf.data.experimental.AUTOTUNE)
batch_size = num_classes_per_batch * num_images_per_class
dataset = dataset.batch(batch_size)
dataset = dataset.prefetch(None)

With this code I cannot load the images. It fails with this error:

    SystemError: <built-in function imread> returned NULL without setting an error

Could you help me fix the error, or suggest another way to load the images? Thanks in advance!

【Question comments】:

Tags: python tensorflow opencv deep-learning path

【Solution 1】:

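I think cv2.imread is the part that is misbehaving here. I would first build a simple program that does not read the images "on the fly", but loads them up front, and train on a small dataset that way.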

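It also looks like you are misusing the dataset.map function. I would recommend this tutorial on the tf.data.Dataset functions: http://tensorexamples.com/2020/07/27/Using-the-tf.data.Dataset.html, and perhaps this tutorial on augmentation, so you can see how the map function is meant to be used: http://tensorexamples.com/2020/07/28/Augmentation.html.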
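For reference, a function passed to `Dataset.map` is traced into a TensorFlow graph, so its `filename` argument is a string tensor rather than a Python `str`. That is a likely reason `cv2.imread` fails inside it. The sketch below is one possible rewrite that uses only TensorFlow ops. It assumes JPEG files, `/` as the path separator, and integer folder names as labels; `make_dataset` is a hypothetical helper name.

```python
import tensorflow as tf

def load_image(filename):
    # filename is a scalar string tensor, so stay within TensorFlow ops.
    raw = tf.io.read_file(filename)
    image = tf.io.decode_jpeg(raw, channels=3)  # uint8, already RGB
    # The label is the parent folder name: ".../<label>/<file>.jpg".
    parts = tf.strings.split(filename, "/")
    label = tf.strings.to_number(parts[-2], out_type=tf.int32)
    return image, label

def make_dataset(file_pattern):
    # shuffle=False keeps the file order deterministic for this sketch.
    files = tf.data.Dataset.list_files(file_pattern, shuffle=False)
    # tf.data.AUTOTUNE needs TF 2.4+; older versions expose it as
    # tf.data.experimental.AUTOTUNE.
    return files.map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
```

Note that `map` is called on the dataset of filenames, not inside `load_image`, and that the function returns its results instead of appending them to outside lists.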

Good luck!

【Comments】: