TensorFlow cifar10 code modification for reading images

【Question title】: tensorflow cifar10 code modification for reading images
【Posted】: 2016-04-11 18:06:27
【Question】:

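I am trying to modify the code of cifar10.py so that I can feed my own images to the network.

I can actually run the code and start the training process, but after a while, if I run TensorBoard, I always see the same image under the "Images" section. In addition, the cross-entropy drops to zero. I think I am loading the images incorrectly.

Here is the code: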

    def distorted_inputs():
        # Read the file that lists the directories where the images are stored
        filedirs = [line.rstrip('\n') for line in open('image_dirs.txt')]

        # Create a list of files
        filenames = []
        i = 0

        for f in filedirs:
            png_files_path = glob.glob(os.path.join(f, '*.[pP][nN][gG]'))
            print('found ' + str(len(png_files_path)) + ' files in ' + f)
            for filename in png_files_path:
                # Store "file_name label"
                s = filename + " " + str(i)
                filenames.append(s)
            i = i + 1

        # Create a queue that produces the filenames to read and the labels
        filename_queue = tf.train.string_input_producer(filenames)

        my_img, label = read_my_file_format(filename_queue.dequeue())
        label = tf.string_to_number(label, tf.int32)
        init_op = tf.initialize_all_variables()
        with tf.Session() as sess:
            sess.run(init_op)

            # Start populating the filename queue.
            coord = tf.train.Coordinator()
            threads = tf.train.start_queue_runners(coord=coord)

            image = my_img.eval()

            coord.request_stop()
            coord.join(threads)

        reshaped_image = tf.cast(image, tf.float32)

        resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image, IMAGE_SIZE, IMAGE_SIZE)

        distorted_image = tf.image.random_crop(reshaped_image, [24, 24])

        # Randomly flip the image horizontally.
        distorted_image = tf.image.random_flip_left_right(distorted_image)

        # Because these operations are not commutative, consider randomizing
        # the order of their operation.
        distorted_image = tf.image.random_brightness(distorted_image, max_delta=63)
        distorted_image = tf.image.random_contrast(distorted_image, lower=0.2, upper=1.8)

        # Subtract off the mean and divide by the variance of the pixels.
        float_image = tf.image.per_image_whitening(distorted_image)

        # Ensure that the random shuffling has good mixing properties.
        min_fraction_of_examples_in_queue = 0.4
        min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN * min_fraction_of_examples_in_queue)
        print('Filling queue with ITSD images before starting to train. This will take a few minutes.')

        # Generate a batch of images and labels by building up a queue of examples.
        return _generate_image_and_label_batch(float_image, label, min_queue_examples)

The image-reading part comes from https://github.com/HamedMP/ImageFlow, and the custom reader comes from "Tensorflow read images with labels". The relevant function is implemented as follows:

    def read_my_file_format(filename_and_label_tensor):
        """Consumes a single filename and label as a ' '-delimited string.

        Args:
          filename_and_label_tensor: A scalar string tensor.

        Returns:
          Two tensors: the decoded image, and the string label.
        """
        filename, label = tf.decode_csv(filename_and_label_tensor, [[""], [""]], " ")

        file_contents = tf.read_file(filename)
        example = tf.image.decode_png(file_contents)
        return example, label
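
The reader above relies on each queue entry being a single string of the form `path label`, split on a space. A plain-Python sketch of that convention may help when checking the list before it reaches the queue. It does not use TensorFlow, and `make_entries`, `parse_entry`, and the directory listing are hypothetical names invented for this illustration. One caveat it makes visible: if a path itself contains a space, splitting on every space yields more than two fields, so the sketch splits only on the last space.

```python
def make_entries(files_by_dir):
    """Build 'path label' strings; the label is the index of the directory."""
    entries = []
    for label, (directory, names) in enumerate(files_by_dir):
        for name in names:
            entries.append("%s/%s %d" % (directory, name, label))
    return entries


def parse_entry(entry):
    """Split an entry on its last space into (path, integer label)."""
    path, label = entry.rsplit(" ", 1)
    return path, int(label)


# Hypothetical directory listing, used only for illustration.
listing = [("cats", ["a.png", "b.png"]), ("my dogs", ["c.png"])]
entries = make_entries(listing)
parsed = [parse_entry(e) for e in entries]
```

Printing a few parsed entries is a cheap way to confirm that every file is paired with the label of the directory it came from.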

Thanks.

【Comments】:

When you say "after a while", how long is that? Note that cifar10_train only updates the images every 100 steps.
@jkschin I know, but thank you anyway.

Tags: machine-learning classification deep-learning tensorflow

【Solution 1】:

You can use this code, which I wrote for my own classification problem:

            resized_image = cv2.resize(image, (WIDTH, HEIGHT))
            label = np.uint8(nclass)

            arr = np.zeros(image_bytes, dtype=np.uint8)
            # Fill the label:
            arr[0] = label
            arr_cnt = 1

            # Fill the image (row-major order): first R values, then G values, then B values.
            # OpenCV arrays are indexed [row, column], i.e. [y, x], with channels in BGR order.
            for y in range(0, HEIGHT):
                for x in range(0, WIDTH):
                    arr[arr_cnt] = np.uint8(resized_image[y, x, 2])  # R
                    arr[arr_cnt + 1024] = np.uint8(resized_image[y, x, 1])  # G
                    arr[arr_cnt + 2048] = np.uint8(resized_image[y, x, 0])  # B

                    arr_cnt += 1

            print("train arr:", arr[0], arr[3072])
            train_arr = np.append(train_arr, arr)
            # print(train_arr[file_in_dir * 3073])
        else:
            invalids_cnt += 1
            # print("image", files_in_dir[file_in_dir], "is invalid")

    # Write the array to the binary batch file (the with block closes it automatically):
    with open('data_batch_%d.bin' % nclass, 'wb') as f:
        f.write(train_arr)

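Here, resized_image is a resized version of the input image `image`. Next, I create a 3073-byte array: the first byte is the label, the next 1024 bytes are the red values of the image, the next 1024 bytes are the green values, and the final 1024 bytes are the blue values.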
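
The record layout described above can be sketched in plain Python. This is a minimal illustration rather than the answerer's script: `pack_record` and the nested-list image representation are hypothetical, and the sketch assumes a 32x32 RGB image, which gives 1024 values per channel.

```python
WIDTH = 32
HEIGHT = 32
RECORD_BYTES = 1 + 3 * WIDTH * HEIGHT  # 1 label byte + 3072 pixel bytes = 3073


def pack_record(label, rgb_image):
    """Pack one image into a CIFAR-10-style binary record.

    rgb_image is a list of HEIGHT rows, each a list of WIDTH (r, g, b)
    tuples with values in 0..255 (a hypothetical in-memory format).
    """
    if len(rgb_image) != HEIGHT or any(len(row) != WIDTH for row in rgb_image):
        raise ValueError("expected a %dx%d image" % (WIDTH, HEIGHT))

    record = bytearray([label])
    # One full plane per channel, each in row-major order: R, then G, then B.
    for channel in range(3):
        for row in rgb_image:
            for pixel in row:
                record.append(pixel[channel])
    return bytes(record)


# Example: a solid image whose every pixel is (10, 20, 30), with label 7.
image = [[(10, 20, 30)] * WIDTH for _ in range(HEIGHT)]
record = pack_record(7, image)
```

With OpenCV, which loads images in BGR order, the channel indices are reversed, which is why the answer's code reads index 2 for red and index 0 for blue.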

I do this for every input image and then concatenate the results into one large binary array, which is written to the binary file `data_batch_%d.bin`.
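
Because every record has the same fixed length, a file produced this way can be sanity-checked by reading it back in 3073-byte chunks. The sketch below is an assumption-laden illustration: `read_records` and the demo file name are hypothetical, and the dummy records stand in for real image data.

```python
import os
import tempfile

RECORD_BYTES = 3073  # 1 label byte + 32 * 32 * 3 pixel bytes


def read_records(path):
    """Yield (label, pixel_bytes) for each fixed-length record in the file."""
    size = os.path.getsize(path)
    if size % RECORD_BYTES != 0:
        raise ValueError("file size %d is not a multiple of %d" % (size, RECORD_BYTES))
    with open(path, "rb") as f:
        while True:
            record = f.read(RECORD_BYTES)
            if not record:
                break
            # Indexing bytes in Python 3 gives an int, so record[0] is the label.
            yield record[0], record[1:]


# Write two dummy records back to back, then read them again.
records = [bytes([label]) + bytes([label * 10]) * 3072 for label in (1, 2)]
path = os.path.join(tempfile.mkdtemp(), "data_batch_demo.bin")
with open(path, "wb") as f:
    f.write(b"".join(records))

labels = [label for label, _ in read_records(path)]
```

A file whose size is not a multiple of the record length usually means that at least one image was written with the wrong dimensions.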

I have posted my full script in this gist (it is written for general use, so it may be harder to follow): gist

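【Comments】:

Thank you for your answer. I was looking for a way to load the images in their original format, rather than converting them to binary before feeding them to the network. For my project, though, I ended up using the same solution as you.
I believe converting them to a binary format improves the model's performance, since it is a more "raw" format.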