Memory overflow during inference in TensorFlow: answers

【Question title】: Memory overflow during inference in tensorflow
【Posted】: 2019-09-23 09:45:21
【Question】:

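I wrote these functions to run inference with the saved weights of a trained binary classifier. I have about 120k images to classify, but the GPU freezes after about 82k images. Is there anything I need to fix in my code to resolve this memory problem? Could the model be saving checkpoints of the forward-pass nodes on every inference call? Any help is appreciated, as I have a large number of files that urgently need sorting.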

import pathlib
import shutil

import numpy as np
import tensorflow as tf


def fully_frozen_MobileNetV2(inference = False, n_class=2):
    image_size = 192
    image_channels = 3
    IMG_SHAPE = (image_size, image_size, image_channels)

    # Create the base model from the pre-trained model MobileNet V2
    base_model = tf.keras.applications.MobileNetV2(input_shape=IMG_SHAPE,
                                               include_top=False,
                                               weights='imagenet')

    base_model.trainable = False

    #create the top layers
    global_average_layer = tf.keras.layers.GlobalAveragePooling2D()
    prediction_layer = tf.keras.layers.Dense(n_class)

    #add the top layers 
    model_fully_frozen = tf.keras.Sequential([
      base_model,
      global_average_layer,
      prediction_layer
    ])

    if inference:
        optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
        compute_loss = tf.keras.losses.CategoricalCrossentropy(from_logits=True)
        compute_accuracy = tf.keras.metrics.CategoricalAccuracy()
        model_fully_frozen.compile(optimizer, loss=compute_loss,
                                   metrics=[compute_accuracy])

    return model_fully_frozen



def read_and_preprocess_single_image_from_path(single_path, 
    image_size=192, image_channels=3):
    #first read the image file
    img_raw = tf.io.read_file(single_path)
    image = tf.io.decode_jpeg(img_raw, channels=image_channels)
    image = tf.image.resize(image, [image_size, image_size])
    image /= 255.0  # normalize to [0,1] range
    return image


def get_path_list(path_to_image_folder):
    data_root = pathlib.Path(path_to_image_folder)
    #merge the folders, create a list of image paths and randomize
    image_paths = list(data_root.glob('*/*'))
    image_paths = [str(path) for path in image_paths]
    return image_paths


def classify_and_collect_images_with_bags(model, path_to_image_folder, 
    destination_folder, prnt_progr=True):
    path_list = get_path_list(path_to_image_folder)
    counter = 0
    paths_of_images_with_bags = []
    for path in path_list:
        #create a single batch from the path
        dataset = tf.data.Dataset.from_tensor_slices([path])
        dataset = dataset.map(read_and_preprocess_single_image_from_path).batch(1)
        image_class = np.argmax(model.predict(dataset))
        print(image_class) ### test only
        if int(image_class) > 0:
            copy_files(path, destination_folder)
            paths_of_images_with_bags.append(path)
        #print progress after each 1k steps if prnt_progr is true   
        counter+=1
        if prnt_progr and counter%1000==0:
            print(counter)
    return paths_of_images_with_bags


def copy_files(path, destination_folder):
    #import shutil
    shutil.copy(path, destination_folder)
    return None

Run inference on the images in in_folder and copy the files that belong to class 1 to out_folder.

classify_and_collect_images_with_bags(classifier, 'in_folder', 'out_folder')

【Question comments】:

Tags: python-3.x tensorflow tf.keras

【Solution 1】:

I finally found a way to solve this problem. I had to break classify_and_collect_images_with_bags down into these 3 simpler functions:

import time

def create_inference_dataset(image_paths):
    inference_dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    inference_dataset = inference_dataset.map(
        read_and_preprocess_single_image_from_path).batch(1)
    return inference_dataset

def classify_single_image(model, single_batch):
    image_class = int(np.argmax(model.predict(single_batch)))
    return image_class

def classify_and_collect_images_with_bags(model, path_to_image_folder,
                                          destination_folder, prnt_progr=True):
    counter = 0
    # collect image paths into a list and create the inference dataset
    path_list = get_path_list(path_to_image_folder)
    inference_dataset = create_inference_dataset(path_list)
    start_time = time.time()
    for batch, path in zip(inference_dataset, path_list) :
        image_class = classify_single_image(model, batch)
        if image_class > 0:
            #print(path)
            copy_files(path, destination_folder)
        counter+=1
        if prnt_progr and counter%1000==0:
            duration = time.time() - start_time
            print(counter, duration)

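Instead of creating a new tf.data.Dataset for every image, I created a single dataset with a batch size of 1 and iterated over it. With this solution, each batch runs inference on one instance at a time.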
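The loop above pairs batches with paths through zip, which relies on the dataset preserving the order of path_list. As a minimal sketch of one alternative (not part of the original answer), the dataset can carry each path alongside its image, and the model can be called directly instead of through model.predict, which the Keras documentation suggests for small inputs processed in a loop. The names make_labeled_dataset, iterate_predictions, and load_fn are hypothetical; load_fn stands in for read_and_preprocess_single_image_from_path.

```python
import tensorflow as tf


def make_labeled_dataset(image_paths, load_fn, batch_size=1):
    """Build ONE dataset that yields (image_batch, path_batch) pairs.

    load_fn is a hypothetical stand-in for the page's
    read_and_preprocess_single_image_from_path function.
    """
    dataset = tf.data.Dataset.from_tensor_slices(image_paths)
    # Keep the path next to the decoded image so they cannot get out of sync.
    dataset = dataset.map(lambda path: (load_fn(path), path))
    return dataset.batch(batch_size)


def iterate_predictions(model, dataset):
    """Yield (path, class_index) for every image in the dataset."""
    for images, paths in dataset:
        # Direct call instead of model.predict: no per-call predict setup.
        logits = model(images, training=False)
        classes = tf.argmax(logits, axis=1).numpy()
        for path, image_class in zip(paths.numpy(), classes):
            yield path.decode("utf-8"), int(image_class)
```

Each yielded pair can then be passed to copy_files when the class index is greater than 0, exactly as in the loop above.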

Besides solving the memory problem, this also cut inference time from 75 s per 1000 images to 20 s per 1000 images.
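To put those two rates in perspective, here is a small back-of-the-envelope calculation. It assumes exactly 120,000 images (the question says "about 120k") and that the per-1000 rates stay constant over the whole run; total_seconds is a hypothetical helper.

```python
def total_seconds(n_images, seconds_per_thousand):
    """Total run time in seconds at a constant per-1000-images rate."""
    return n_images / 1000 * seconds_per_thousand


n_images = 120_000  # assumption: the question only says "about 120k"
before = total_seconds(n_images, 75)  # one new dataset per image
after = total_seconds(n_images, 20)   # one dataset, iterated

print(f"before: {before / 3600:.1f} h")    # before: 2.5 h
print(f"after:  {after / 60:.0f} min")     # after:  40 min
print(f"speedup: {before / after:.2f}x")   # speedup: 3.75x
```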

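I am sure there are better solutions to this problem, but I would not have expected that creating many tf.data.Dataset objects inside a loop could grind the GPU to a halt.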
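One candidate for such a better solution, offered only as a sketch and not tested against the original 120k-image workload, is to batch more than one image at a time and let a single model.predict call consume the whole dataset. The names load_image and predict_classes are hypothetical, batch_size=32 is an arbitrary starting value, and tf.data.AUTOTUNE assumes TensorFlow 2.4 or later (older releases expose it as tf.data.experimental.AUTOTUNE).

```python
import numpy as np
import tensorflow as tf


def load_image(path, image_size=192, image_channels=3):
    """Read one JPEG, resize it, and scale pixel values to [0, 1]."""
    raw = tf.io.read_file(path)
    image = tf.io.decode_jpeg(raw, channels=image_channels)
    image = tf.image.resize(image, [image_size, image_size])
    return image / 255.0


def predict_classes(model, image_paths, batch_size=32):
    """Return one predicted class index per path, in input order."""
    dataset = (
        tf.data.Dataset.from_tensor_slices(image_paths)
        # Decode images in parallel; output order stays deterministic.
        .map(load_image, num_parallel_calls=tf.data.AUTOTUNE)
        .batch(batch_size)
        # Overlap image loading with model execution.
        .prefetch(tf.data.AUTOTUNE)
    )
    logits = model.predict(dataset, verbose=0)  # shape: (n_images, n_class)
    # axis=1 gives one class per image rather than one index overall.
    return np.argmax(logits, axis=1)
```

With the functions from this page, the result could be combined with get_path_list and copy_files by looping over zip(path_list, predict_classes(classifier, path_list)) and copying every path whose class is greater than 0. If GPU memory is tight, a smaller batch_size is the first thing to try.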

【Comments】: