How to create a bi-input TPU model for images?

【Question title】: How to create a bi-input TPU model for images?
【Posted】: 2020-06-27 08:13:08
【Question】:

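I want to convert my GPU model to a TPU model. My GPU model takes two input images, and both images share the same output. I use a custom data generator for this. There are two parallel networks, one for each input.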

Following this StackOverflow question, I tried to solve the problem, but I failed. Here is what I tried:

dataset_12 = tf.data.Dataset.from_tensor_slices((left_train_paths, right_train_paths))
dataset_label = tf.data.Dataset.from_tensor_slices(train_labels) 
dataset = tf.data.Dataset.zip((dataset_12, dataset_label)).batch(2).repeat()

The problem I am facing is that I cannot decode the two input images. Here is the decoder function:

def decode_image(filename, label=None, image_size=(IMG_SIZE_h, IMG_SIZE_w)):
    bits = tf.io.read_file(filename)
    image = tf.image.decode_jpeg(bits, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, image_size)
    
    # convert to numpy and do some cv2 stuff maybe?
    
    if label is None:
        return image
    else:
        return image, label

The problem is that I cannot pass two images to the decoder function at the same time. How can I solve this?

I also tried decoding the images in the following way:

def decode(img, image_size=(IMG_SIZE_h, IMG_SIZE_w)):
    bits = tf.io.read_file(img)
    image = tf.image.decode_jpeg(bits, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, image_size)
    return image

def decode_image(left, right, labels=None):
    if labels is None:
        return decode(left), decode(right)
    else:
        return decode(left), decode(right), labels

image = tf.data.Dataset.from_tensor_slices((left_train_paths, right_train_paths, train_labels))
dataset = image.map(decode_image, num_parallel_calls=AUTO).repeat().shuffle(512).batch(BATCH_SIZE).prefetch(AUTO)
dataset

The dataset variable now prints as <PrefetchDataset shapes: ((None, 760, 760, 3), (None, 760, 760, 3), (None, 8)), types: (tf.float32, tf.float32, tf.int64)>

How do I pass it to the model now?

Model

def get_model():
    
    left_tensor = Input(shape=(IMG_SIZE_h,IMG_SIZE_w,3))
    right_tensor = Input(shape=(IMG_SIZE_h,IMG_SIZE_w,3))

    left_model =  EfficientNetB3(input_shape =  (img_shape,img_shape,3), include_top = False, weights = 'imagenet',input_tensor=left_tensor)
    right_model = EfficientNetB3(input_shape =  (img_shape,img_shape,3), include_top = False, weights = 'imagenet',input_tensor=right_tensor)
    con = concatenate([left_model.output, right_model.output])
    GAP= GlobalAveragePooling2D()(con)
    out = Dense(8, activation = 'sigmoid')(GAP)
    # Use the Input tensors defined above (left_input/right_input are not defined in this function)
    model = Model(inputs=[left_tensor, right_tensor], outputs=out)

    return model

【Comments】:

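I understand you have already solved decoding the two images, haven't you? At least the dataset shapes look fine. You mention that you want to feed two batches of images to the model at once, but unless you show your current model, or at least its input layers, we cannot help you.
@Guillem I have updated the question; please check the model.
I think the problem is that model.fit(dataset, ...) expects one batch of training images and one batch of corresponding labels, but I have two batches of training data.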

Tags: tensorflow keras tpu

【Solution 1】:

I found a rather elegant solution. I will explain it step by step, because it may be a bit different from what you had in mind:

When decoding the images, stack the two images into a single tensor, so that the input tensor has shape [2, IMAGE_H, IMAGE_W, 3].

def decode_single(im_path, image_size):
    bits = tf.io.read_file(im_path)
    image = tf.image.decode_jpeg(bits, channels=3)
    image = tf.cast(image, tf.float32) / 255.0
    image = tf.image.resize(image, image_size)
    return image

# Note that the image paths are packed in a tuple, and we unpack them inside the function
def decode(paths, label=None, image_size=(128, 128)):
    image_path1, image_path2 = paths
    im1 = decode_single(image_path1, image_size)
    im2 = decode_single(image_path2, image_size)
    images = tf.stack([im1, im2])

    if label is not None:
        return images, label

    return images

I declare the data pipeline so that the paths are packed in a tuple.

label_ds = ...
ds = tf.data.Dataset.from_tensor_slices((left_paths, right_paths))
ds = tf.data.Dataset.zip((ds, label_ds))  # yields ((im_path1, im_path2), label), not (im_path1, im_path2, label)
ds = ds.map(decode).batch(4)
print(ds)
# Out: <BatchDataset shapes: ((None, 2, 128, 128, 3), ((None,),)), types: (tf.float32, (tf.int32,))>
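
The element structure can be checked without reading any files. The following is a minimal sketch that replaces the decoded images with synthetic tensors; the 4x4 image size, the sample count, and the helper name stack_pair are assumptions made for illustration and are not part of the answer. It shows that zipping yields ((left, right), label) elements and that stacking gives batches of shape (None, 2, H, W, 3).

```python
import tensorflow as tf

# Synthetic stand-ins for decoded images: 6 samples of 4x4 RGB (assumed sizes).
left_images = tf.zeros([6, 4, 4, 3])
right_images = tf.ones([6, 4, 4, 3])
labels = tf.zeros([6, 8], dtype=tf.int64)

def stack_pair(pair, label):
    # `pair` arrives as the inner tuple (left, right) produced by zip.
    left, right = pair
    return tf.stack([left, right]), label

pair_ds = tf.data.Dataset.from_tensor_slices((left_images, right_images))
label_ds = tf.data.Dataset.from_tensor_slices(labels)
ds = tf.data.Dataset.zip((pair_ds, label_ds)).map(stack_pair).batch(4)

# Inspect the static structure of one batch.
images_spec, label_spec = ds.element_spec
print(images_spec.shape, label_spec.shape)
```

Because the mapped function receives the inner tuple as its first argument, it has to unpack the pair itself, which is the same thing decode does with paths.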

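Since we feed batches of image pairs with shape (None, 2, 128, 128, 3), declare the model with a single input of shape (2, HEIGHT, WIDTH, 3), and then split that input into the two images: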

def get_model():
    input_layer = Input(shape=(2, IMAGE_H,IMAGE_W,3))
    # Split into two images; index 0 is the image stacked first (the left one)
    left_image, right_image = Lambda(lambda x: tf.split(x, 2, axis=1))(input_layer)
    
    right_image = Reshape([IMAGE_H, IMAGE_W, 3])(right_image)
    left_image = Reshape([IMAGE_H, IMAGE_W, 3])(left_image)
    # Replace by EfficientNets
    left_model =  Conv2D(64, 3)(left_image)
    right_model = Conv2D(64, 3)(right_image)
    con = Concatenate(-1)([left_model, right_model])
    GAP = GlobalAveragePooling2D()(con)
    out = Dense(8, activation = 'sigmoid')(GAP)
    model = tf.keras.Model(inputs=input_layer, outputs=out)

    return model
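
The split-and-reshape step can be tried eagerly on a plain tensor. This is only a sketch with assumed sizes (a batch of 2 pairs of 4x4 images) and illustrative variable names; it shows that tf.split keeps the split axis with size 1, which is why a reshape follows, and that the first piece returned is the image that was stacked first.

```python
import tensorflow as tf

# One batch of 2 stacked pairs; slot 0 is filled with 0s, slot 1 with 1s.
first = tf.zeros([2, 4, 4, 3])
second = tf.ones([2, 4, 4, 3])
batch = tf.stack([first, second], axis=1)      # shape (2, 2, 4, 4, 3)

# tf.split keeps the split axis with size 1, hence the follow-up reshape.
part0, part1 = tf.split(batch, 2, axis=1)      # each (2, 1, 4, 4, 3)
image0 = tf.reshape(part0, [-1, 4, 4, 3])      # back to (2, 4, 4, 3)
image1 = tf.reshape(part1, [-1, 4, 4, 3])

print(part0.shape, image0.shape)
```

With the pipeline above, where the paths are passed as (left_paths, right_paths), slot 0 therefore holds the left image and slot 1 the right image.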

Finally, compile and train the model as usual:

model = get_model()
model.compile(...)
model.fit(ds, epochs=10)
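
A possible alternative, not taken from this answer, keeps the two separate Input layers from the question. Keras fit reads a dataset element as (inputs, targets), or as (inputs, targets, sample_weights) when it has three parts, so the flat (left, right, labels) dataset from the question would be misread. The sketch below regroups it as ((left, right), labels); the helper name nest_inputs and the synthetic 8x8 tensors are assumptions for illustration.

```python
import tensorflow as tf

def nest_inputs(left, right, labels):
    # Group both images as the `inputs` part of an (inputs, targets) pair.
    return (left, right), labels

# Synthetic stand-ins for the decoded images and labels (assumed 8x8 size).
left = tf.zeros([6, 8, 8, 3])
right = tf.ones([6, 8, 8, 3])
labels = tf.zeros([6, 8], dtype=tf.int64)

# Flat three-element dataset, as in the question's second attempt.
flat_ds = tf.data.Dataset.from_tensor_slices((left, right, labels)).batch(2)
fit_ds = flat_ds.map(nest_inputs)

inputs_spec, labels_spec = fit_ds.element_spec
print(len(inputs_spec), labels_spec.shape)
```

A dataset shaped like fit_ds should then be usable with a model built as Model(inputs=[left_tensor, right_tensor], ...), although this has not been verified on a TPU here.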

【Comments】:

Just adding the Lambda layer and enclosing decode(left), decode(right) in a list solved the problem. Thanks a lot.