[Question title]: How to use the dataset.map() method with Tensorflow 2 and nested lists?
[Posted]: 2020-01-05 19:31:36
[Question description]:

I am trying to apply the tf.data.Dataset.map method to my dataset, in the context of image captioning with Tensorflow 2.

Running the following code

def map_func(img_name, cap_train, target):
  img_tensor = np.load(img_name+'.npy')
  return img_tensor, cap_train, target

train = [
  [
    'image_1.jpg',
    [[0,1,2], [1,2,3], [2,3,4]],
    [[3], [4], [5]]
  ],
  [
    'image_2.jpg',
    [[5,6,7], [6,7,8], [7,8,9]],
    [[8], [9], [10]]
  ],
  ...
]

dataset = tf.data.Dataset.from_tensor_slices(train)

# Use map to load the numpy files in parallel
dataset = dataset.map(lambda img_name, cap_train, target: tf.numpy_function(
          map_func, [img_name, cap_train, target], [tf.float32, tf.int32, tf.int32]),
          num_parallel_calls=tf.data.experimental.AUTOTUNE)

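returns ValueError: Can't convert Python sequence with mixed types to Tensor. I guess this is because each element of the dataset has the form [image, list of integers, list of integers], whereas I map it with tf.float32, tf.int32, tf.int32, which does not account for the nested lists.

How should I modify the dataset.map() call so that it works with the train dataset above?

Thanks

[Question discussion]:

Tags: python tensorflow neural-network tensorflow2.0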


[Solution 1]:

You can do this with from_generator():

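import numpy as np
import tensorflow as tf

def map_func(img_name, cap_train, target):
  # tf.numpy_function passes the string as bytes, so decode it before building the path
  img_tensor = np.load(img_name.decode('ascii')+'.npy').astype(np.float32)
  return img_tensor, cap_train, target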


def gen():

  train = [
    [
      'image_1.jpg',
      [[0,1,2], [1,2,3], [2,3,4]],
      [[3], [4], [5]]
    ],
    [
      'image_2.jpg',
      [[5,6,7], [6,7,8], [7,8,9]],
      [[8], [9], [10]]
    ],

  ]  
  for item in train:
    yield item[0], item[1], item[2]

dataset = tf.data.Dataset.from_generator(gen, (tf.string, tf.int32, tf.int32))#, (tf.TensorShape([]), tf.TensorShape([3,3]), tf.TensorShape([3,1])))

# Use map to load the numpy files in parallel
dataset = dataset.map(lambda img_name, cap_train, target: tf.numpy_function(
          map_func, [img_name, cap_train, target], [tf.float32, tf.int32, tf.int32])).batch(1)

for item in iter(dataset):
  print(item)
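
As an alternative to the generator, here is a minimal sketch that is not part of the original answer. It assumes the error comes from mixing a string and integer lists in one row, and that every caption and target has the same shape. Under those assumptions the rows can be split into three per-column lists and passed to from_tensor_slices as a tuple. The temporary directory, the (4, 8) feature shape and the helper name load_features are hypothetical stand-ins used only to keep the example self-contained.

```python
import os
import tempfile

import numpy as np
import tensorflow as tf

# Hypothetical stand-in data: write two small feature files so np.load has something to read.
tmp_dir = tempfile.mkdtemp()
names = [os.path.join(tmp_dir, "image_1.jpg"), os.path.join(tmp_dir, "image_2.jpg")]
for name in names:
    np.save(name + ".npy", np.zeros((4, 8), dtype=np.float32))

# One list per column instead of one mixed-type row per example.
captions = [[[0, 1, 2], [1, 2, 3], [2, 3, 4]], [[5, 6, 7], [6, 7, 8], [7, 8, 9]]]
targets = [[[3], [4], [5]], [[8], [9], [10]]]


def load_features(img_name):
    # tf.numpy_function hands the string over as bytes.
    return np.load(img_name.decode("utf-8") + ".npy").astype(np.float32)


def map_func(img_name, cap, target):
    # Only the file name goes through NumPy; the integer tensors pass through untouched.
    img = tf.numpy_function(load_features, [img_name], tf.float32)
    img.set_shape([4, 8])  # assumed feature shape, restored because numpy_function drops it
    return img, cap, target


dataset = tf.data.Dataset.from_tensor_slices((names, captions, targets))
dataset = dataset.map(map_func, num_parallel_calls=tf.data.experimental.AUTOTUNE).batch(2)

img_batch, cap_batch, target_batch = next(iter(dataset))
```

Each column becomes its own tensor with a single dtype, so no mixed-type conversion is attempted. This only applies when the nested lists are rectangular; ragged captions would need padding first.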

Edit: to produce batched data,

def map_func(img_name, cap_train, target):  
  img_tensor = np.stack([np.load(img.decode('ascii')+'.npy').astype(np.float32) for img in img_name])
  return img_tensor, cap_train, target


def gen(batch_size):

  train = [
    [
      'image_1.jpg',
      [[0,1,2], [1,2,3], [2,3,4]],
      [[3], [4], [5]]
    ],
    [
      'image_2.jpg',
      [[5,6,7], [6,7,8], [7,8,9]],
      [[8], [9], [10]]
    ],

  ]  
  grp1, grp2, grp3 = zip(*train)
  # range() already steps by batch_size, so i is the start index of each batch
  for i in range(0, len(train), batch_size):
    yield grp1[i:i+batch_size], grp2[i:i+batch_size], grp3[i:i+batch_size]

dataset = tf.data.Dataset.from_generator(gen, (tf.string, tf.int32, tf.int32), 
                                         (tf.TensorShape([None]), tf.TensorShape([None, 3,3]), tf.TensorShape([None, 3,1])), args=[2])

# Use map to load the numpy files in parallel
dataset = dataset.map(lambda img_name, cap_train, target: tf.numpy_function(
          map_func, [img_name, cap_train, target], [tf.float32, tf.int32, tf.int32]))
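
The index arithmetic of a batching generator can be checked without TensorFlow. The sketch below uses hypothetical names (iter_batches, rows) and made-up rows. It illustrates that when range() steps by batch_size, each slice should run from the start index to start + batch_size, and that the last batch may be shorter.

```python
def iter_batches(rows, batch_size):
    """Yield (names, captions, targets) tuples of at most batch_size rows each."""
    names, captions, targets = zip(*rows)
    for start in range(0, len(rows), batch_size):
        stop = start + batch_size
        yield names[start:stop], captions[start:stop], targets[start:stop]


# Five made-up rows so that the final batch is incomplete.
rows = [("image_%d.jpg" % i, [[i, i + 1, i + 2]], [[i + 3]]) for i in range(5)]

batches = list(iter_batches(rows, 2))
batch_sizes = [len(names) for names, _, _ in batches]
seen_names = [name for names, _, _ in batches for name in names]
```

With five rows and a batch size of 2, every row appears exactly once across the batches, which is a quick way to confirm that no rows are skipped.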

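[Discussion]:

  • Thanks for the feedback. Is there a way to make this work without using a batch size of 1?
  • @crash, I edited my answer. Unfortunately, you can't use .batch() for this purpose.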