How to implement on-the-fly augmentation in TensorFlow? (Answers)

【Question title】: How to implement on-the-fly augmentation in TensorFlow?
【Posted】: 2021-05-09 19:34:25
【Question】:

I want to apply augmentation to a 3D dataset while training a TensorFlow model.

The augmentation function looks like this:

    def augmentation(img, label):
        p = .5
        print('augmentation')

        if random.random() > p:
            img = tf.numpy_function(augment_noise, [img], tf.double)

        if random.random() > p:
            img = tf.numpy_function(flip_x, [img], tf.double)

        if random.random() > p:
            img = tf.numpy_function(augment_scale, [img], tf.double)

        if random.random() > p:
            img = tf.numpy_function(distort_elastic_cv2, [img], tf.double)

        img = tf.image.convert_image_dtype(img, tf.float32)

        return img, label

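The individual augmentation functions are not implemented in TensorFlow, which is why they are wrapped in tf.numpy_function.

The TensorFlow code that uses this function is as follows: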

ds_train = tf.data.Dataset.from_tensor_slices((image_train, label_train))
ds_valid = tf.data.Dataset.from_tensor_slices((image_val, label_val))


batch_size = 16
repeat_count = int((1000 * batch_size)/len(image_train))
# AUTOTUNE =  tf.data.experimental.AUTOTUNE # tf.data.AUTOTUNE
AUTOTUNE = 16

# Augment on the fly during training.
ds_train = (
    ds_train.shuffle(len(ds_train)).repeat(repeat_count)
    .map(augmentation, num_parallel_calls=AUTOTUNE)
    .batch(batch_size)
    .prefetch(buffer_size=AUTOTUNE)
)


ds_valid = (
    ds_valid.batch(batch_size)
    .prefetch(buffer_size=AUTOTUNE)
)

initial_epoch = 0
epochs = 1000
H = model.fit(ds_train, validation_data=ds_valid,initial_epoch=initial_epoch,
               epochs = epochs,
              callbacks = chkpts, use_multiprocessing=False, workers=1, verbose=2)

In each epoch I want to randomly pick about 1000 batches from the dataset and augment them. I compute repeat_count so that the pipeline yields 1000 batches of size batch_size.

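The problem is that I don't know whether the model calls the augmentation function in every epoch and applies it to every image of every batch (that is, 16 × 1000 images per epoch). To check, I added a print to the augmentation function, but it prints only once, not once per epoch or once per image. Is the augmentation function called 16 × 1000 times in each epoch?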

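In addition, CPU and GPU utilization differ every time I run the code. Sometimes CPU utilization is about 25% and GPU utilization about 30%, but in almost every run they are 100% and 5% respectively.

How can I solve these two problems?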

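【Comments】:

Tags: python tensorflow keras image-augmentation

【Answer 1】:

Your string is printed once because the function is called only once, when tf.data traces it to build a TensorFlow graph. A Python print runs during that tracing step and is not part of the graph. If you print with tf.print instead, the print becomes a graph operation, so it runs every time an element is processed.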
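To see the trace-once behavior directly, here is a minimal sketch. The name `python_calls` and the toy dataset are illustrative assumptions, not part of the question. It counts how often the Python body runs and checks whether a Python-level `random.random()` value changes between elements. If the function is traced once, both counts should come out as 1, which would also mean that the `if random.random() > p` checks in the question are decided once rather than per image.

```python
import random

import tensorflow as tf

python_calls = []  # grows by one each time the Python body actually executes


def augmentation(x):
    python_calls.append(1)   # Python side effect: happens only while tracing
    coin = random.random()   # drawn in Python, then baked into the graph as a constant
    return x + tf.constant(coin, dtype=tf.float32)


ds = tf.data.Dataset.from_tensor_slices(tf.zeros([8], tf.float32)).map(augmentation)
values = [float(v) for v in ds]

print("Python-level calls:", len(python_calls))
print("distinct 'random' values:", len(set(values)))
```
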

Copy/paste example:

import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.datasets import load_sample_image
import numpy as np
import random

imgs = np.stack([load_sample_image('flower.jpg') for i in range(4*4)], axis=0)

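def augmentation(img):
    # tf.print is a graph op, so it runs for every element;
    # a plain Python print would run only once, during tracing.
    tf.print('augmentation successful!')
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img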
    

ds_train = tf.data.Dataset.from_tensor_slices(imgs)


batch_size = 16
repeat_count = 10
AUTOTUNE = 16

ds_train = (
    ds_train.shuffle(len(ds_train)).repeat(repeat_count)
    .map(augmentation, num_parallel_calls=AUTOTUNE)
    .batch(batch_size)
    .prefetch(buffer_size=AUTOTUNE)
)

for i in ds_train:
    pass

Output (truncated; one line is printed per mapped element):

augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
augmentation successful!
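The same tracing behavior affects the `if random.random() > p` checks in the question: they are Python code, so they are evaluated during tracing rather than for each image. One possible remedy, sketched below under stated assumptions, is to draw the coin with `tf.random.uniform` and branch with `tf.cond`, both of which are graph operations. The helper `maybe_apply`, the stand-in `flip_x`, and the toy array shapes are hypothetical and only illustrate the idea; the question's real NumPy functions would take their place.

```python
import numpy as np
import tensorflow as tf


def flip_x(img):
    # Hypothetical stand-in for the question's NumPy augmentation: reverse the first axis.
    return np.ascontiguousarray(img[::-1])


def maybe_apply(img, np_func, p=0.5):
    # Draw the coin inside the graph so it is re-sampled for every element.
    coin = tf.random.uniform([], 0.0, 1.0)
    out = tf.cond(
        coin > p,
        lambda: tf.numpy_function(np_func, [img], img.dtype),
        lambda: img,
    )
    out.set_shape(img.shape)  # tf.numpy_function loses static shape information
    return out


def augmentation(img, label):
    img = maybe_apply(img, flip_x)
    img = tf.image.convert_image_dtype(img, tf.float32)
    return img, label


# Toy data (assumed shapes): 64 random volumes of size 4x4x4.
images = np.random.rand(64, 4, 4, 4)
labels = np.zeros(64, dtype=np.int64)
ds = (
    tf.data.Dataset.from_tensor_slices((images, labels))
    .map(augmentation)
    .batch(16)
)
```

With this structure, some elements should come out flipped and others unchanged, and the split should differ from one pass over `ds` to the next.
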

【Comments】: