【Question title】: Use multiple GPUs in TensorFlow for inference with a pb model
【Posted】: 2019-06-12 07:30:03
【Question】:

I am using a server with 8 Titan X GPUs and want to predict images faster than a single GPU allows. I load the PB model like this:

    import json
    import os
    import time

    import numpy as np
    import tensorflow as tf  # TensorFlow 1.x API

    model_dir = "./model"
    model = "nasnet_large_v1.pb"
    model_path = os.path.join(model_dir, model)
    model_graph = tf.Graph()
    with model_graph.as_default():
        # Parse the frozen graph and import it into model_graph
        with tf.gfile.GFile(model_path, 'rb') as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
            _ = tf.import_graph_def(graph_def, name='')
            input_layer = model_graph.get_tensor_by_name("input:0")
            output_layer = model_graph.get_tensor_by_name('final_layer/predictions:0')

Then I iterate over the files in the ./data_input directory, as follows:

    with tf.Session(graph=model_graph, config=config) as inference_session:
        # Initialize session by running one dummy batch
        initializer = np.zeros([1, 331, 331, 3])
        print("Initializing session...")
        inference_session.run(output_layer, feed_dict={input_layer: initializer})
        print("Done initializing.")

        # Prediction
        file_list = []
        processed_files = []

        for path, dir, files in os.walk('./model_output/processed_files'):
            for file in files:
                processed_files.append(file.split('_')[0]+'.tfrecord')

        print("Processed files: ")
        for f in processed_files:
            print('\t', f)

        while True:
            for path, dir, files in os.walk("./data_input"):
                for file in files:
                    if file == '.DS_Store': continue
                    if file in processed_files: continue
                    print("Reading file {}".format(file))
                    file_path = os.path.join('./data_input', file)
                    file_list.append(file_path)
                    res = predict(file_path)
                    processed_files.append(file)

                    with open('./model_output/processed_files/{}_{}_processed_files.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                        f.write(json.dumps(processed_files))

                    with open('./model_output/classify_result/{}_{}_classify_result.json'.format(file.split('.')[0], model.split('.')[0]), 'w') as f:
                        f.write(json.dumps(res, indent=4, separators=(',',':')))

            time.sleep(1)

In the predict() function, I wrote the following code:

    label_map = get_label()
    # read tfrecord file by tf.data
    dataset = get_dataset(filename)
    # dataset.apply(tf.contrib.data.prefetch_to_device("/gpu:0"))
    # load data
    iterator = dataset.make_one_shot_iterator()
    features = iterator.get_next()

    result = []
    content = {}
    count = 0
    # session
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())  # the init op has no effect unless it is run
        t1 = time.time()
        try:
            while True:
                [_image, _label, _filepath] = sess.run(fetches=features)
                _image = np.asarray([_image])
                _image = _image.reshape(-1, 331, 331, 3)

                predictions = inference_session.run(output_layer, feed_dict={input_layer: _image})
                predictions = np.squeeze(predictions)

                # res = []
                for i, pred in enumerate(predictions):
                    count += 1
                    overall_result = np.argmax(pred)
                    predict_result = label_map[overall_result].split(":")[-1]

                    if predict_result == 'unknown': continue

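                    # Build a new dict per image; appending one shared dict would
                    # leave every entry of result holding the last image's values.
                    content = {
                        'prob': str(np.max(pred)),
                        'label': predict_result,
                        'filepath': str(_filepath[i], encoding='utf-8'),
                    }
                    result.append(content)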

        except tf.errors.OutOfRangeError:
            t2 = time.time()
            print("{} images processed, average time: {}s".format(count, (t2-t1)/count))
    return result

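I tried using with tf.device('/gpu:{}'.format(i)) around the model-loading part, the inference session part, and the session part. nvidia-smi shows that only GPU 0 reaches 100% utilization, while the other GPUs do no work even though memory is allocated on them.

How can I make all GPUs run at the same time to speed up prediction?

My code is at https://github.com/tzattack/image_classification_algorithms.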

【Comments】:

    Tags: python-3.x tensorflow gpu


    【Solution 1】:

    You can do it like this:

    def get_frozen_graph(graph_file):
        """Read Frozen Graph file from disk."""
        with tf.gfile.GFile(graph_file, "rb") as f:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(f.read())
        return graph_def
    
    trt_graph1 = get_frozen_graph('/home/ved/ved_1/frozen_inference_graph.pb')
    
    with tf.device('/gpu:1'):
        [tf_input_l1, tf_scores_l1, tf_boxes_l1, tf_classes_l1, tf_num_detections_l1, tf_masks_l1] = tf.import_graph_def(trt_graph1, 
                        return_elements=['image_tensor:0', 'detection_scores:0', 
                        'detection_boxes:0', 'detection_classes:0','num_detections:0', 'detection_masks:0'])
        
    tf_sess1 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    
    trt_graph2 = get_frozen_graph('/home/ved/ved_2/frozen_inference_graph.pb')
    
    with tf.device('/gpu:0'):
        [tf_input_l2, tf_scores_l2, tf_boxes_l2, tf_classes_l2, tf_num_detections_l2] = tf.import_graph_def(trt_graph2, 
                        return_elements=['image_tensor:0', 'detection_scores:0', 
                        'detection_boxes:0', 'detection_classes:0','num_detections:0'])
        
    tf_sess2 = tf.Session(config=tf.ConfigProto(allow_soft_placement=True))
    

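    【Comments】:

    • This doesn't look like it runs anything in parallel. It looks like you are running the sessions on the GPUs sequentially, one after another.
    • @jjschuh The code above creates two separate sessions. Use those sessions in different threads.
    • Thanks for the tip. By threads, do you mean adding something to the TF graph, or using a separate parallel-computing toolbox such as "multiprocessing"? I thought TF handled parallel computation itself.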
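The suggestion in the comments above (use each session from its own thread) might look roughly like the sketch below. It is not part of the original answer: run_in_parallel and its arguments are hypothetical names, and each entry of run_fns is assumed to wrap one session's run call. The sketch also assumes that Session.run is thread-safe and releases Python's global interpreter lock while the graph executes, which is my understanding of TensorFlow 1.x.

```python
import queue
import threading


def run_in_parallel(run_fns, batches):
    """Feed batches to several inference callables, one thread per callable.

    run_fns: list of callables (hypothetical), e.g. one wrapper around
             sess.run per GPU session.
    Returns the results in the same order as batches.
    """
    batches = list(batches)
    work = queue.Queue()
    for index, batch in enumerate(batches):
        work.put((index, batch))

    results = [None] * len(batches)

    def worker(run_fn):
        # Each thread keeps pulling batches until the shared queue is empty.
        while True:
            try:
                index, batch = work.get_nowait()
            except queue.Empty:
                return
            results[index] = run_fn(batch)

    threads = [threading.Thread(target=worker, args=(fn,)) for fn in run_fns]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With the two sessions from Solution 1, run_fns could be [lambda x: tf_sess1.run(tf_scores_l1, feed_dict={tf_input_l1: x}), lambda x: tf_sess2.run(tf_scores_l2, feed_dict={tf_input_l2: x})]. Because the threads share one queue, a faster GPU simply picks up more batches than a slower one. Error handling is left out: if a callable raises, its batch keeps the result None.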
    【Solution 2】:

    You can force the device of every node in the graph like this:

    def load_network(graph, i):
        od_graph_def = tf.GraphDef()
        with tf.gfile.GFile(graph, 'rb') as fid:
            serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        for node in od_graph_def.node:
            node.device = '/gpu:{}'.format(i) if i >= 0 else '/cpu:0'
        return {"od_graph_def": od_graph_def}
    

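    Then you can merge the resulting graphs (one per GPU) into a single graph.
    If you use the same model on all GPUs, also change the tensor names,
    and run everything in one session.

    Works great for me.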
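The merge step described above is not shown in the answer. One way it might be done is sketched below, with several assumptions: build_towers, run_towers, split_evenly, and the gpu{i} name prefix are hypothetical, tensorflow.compat.v1 is assumed to be importable (TensorFlow 1.14+ or 2.x), and input_name / output_name are your model's own tensor names (for example "input:0" in the question). Importing each copy under its own prefix is what "changes the tensor names", so the copies can live in one graph.

```python
def split_evenly(items, parts):
    """Split a sequence into `parts` contiguous slices of near-equal length."""
    base, extra = divmod(len(items), parts)
    slices, start = [], 0
    for p in range(parts):
        end = start + base + (1 if p < extra else 0)
        slices.append(items[start:end])
        start = end
    return slices


def build_towers(graph_def, num_gpus, input_name, output_name):
    """Import one copy of graph_def per GPU into a single graph.

    Returns the graph plus the prefixed input and output tensor names.
    """
    import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or 2.x

    graph = tf.Graph()
    input_names, output_names = [], []
    with graph.as_default():
        for i in range(num_gpus):
            tower_def = tf.GraphDef()
            tower_def.CopyFrom(graph_def)
            for node in tower_def.node:
                node.device = "/gpu:{}".format(i)  # pin every node, as in load_network
            prefix = "gpu{}".format(i)  # hypothetical per-GPU name scope
            tf.import_graph_def(tower_def, name=prefix)
            input_names.append("{}/{}".format(prefix, input_name))
            output_names.append("{}/{}".format(prefix, output_name))
    return graph, input_names, output_names


def run_towers(sess, input_names, output_names, batch):
    """Split one batch across the towers and evaluate them in one sess.run.

    Returns a list with one output per tower that received data.
    """
    chunks = split_evenly(batch, len(input_names))
    used = [i for i, chunk in enumerate(chunks) if len(chunk) > 0]
    feed = {input_names[i]: chunks[i] for i in used}
    return sess.run([output_names[i] for i in used], feed_dict=feed)
```

A session for this graph would probably be created with tf.ConfigProto(allow_soft_placement=True), as in Solution 1, so that ops without a GPU kernel can fall back to the CPU. The per-tower outputs come back in input order, so concatenating them along the first axis restores a single batch of predictions.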

    【Comments】:
