【Question title】: What is the proper way to benchmark part of a TensorFlow graph?
【Posted】: 2020-07-25 11:01:05
【Question description】:

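I want to benchmark some part of a graph. For simplicity, I use conv_block, which is just a 3x3 convolution.

  1. Is it OK for the x_np used in the loop to be the same each time, or do I need to regenerate it every time?
  2. Do I need some "warm-up" runs before the actual benchmark (it seems this is needed for benchmarking on a GPU)? How do I do that properly? Is sess.run(tf.global_variables_initializer()) enough?
  3. What is the proper, i.e. more precise, way to measure time in Python?
  4. Do I need to reset some system caches on Linux before running the script (or is disabling np.random.seed enough)?

Example code: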

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.contrib.layers.xavier_initializer())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))


if __name__ == '__main__':
    run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

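【Comments】:

  • I think your approach is reasonable, although you could also consider using the timeit module. I would only change a couple of things: 1) run the computation at least once before you start measuring time, because TF usually takes longer on the first evaluation; 2) before the loop, save [z_tf] and {x_tf: x_np} into variables and reuse them in every call, to save the time spent creating the list and the dictionary.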
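A minimal sketch of the two suggestions in this comment, written without TensorFlow so that it runs anywhere. The helper name benchmark_callable and its parameters are hypothetical; `step` stands in for a call such as `lambda: sess.run(fetches, feed_dict=feed)`, where `fetches` and `feed` would be built once before the loop. It relies on timeit, whose default timer is time.perf_counter.

```python
import timeit


def benchmark_callable(step, n_iter=100, n_warmup=1, n_repeat=3):
    """Return the best average time per call of `step`, in seconds.

    `step` is a zero-argument callable (hypothetical stand-in for one
    sess.run call with pre-built fetches and feed_dict).
    """
    # Warm-up: the first evaluations are often slower, so run them untimed.
    for _ in range(n_warmup):
        step()

    # timeit.Timer accepts a callable; repeat() returns one total per repeat.
    timer = timeit.Timer(step)
    totals = timer.repeat(repeat=n_repeat, number=n_iter)

    # Take the fastest repeat and divide by the number of calls in it.
    return min(totals) / n_iter


if __name__ == "__main__":
    # Stand-in workload; replace with the real graph evaluation.
    data = list(range(10_000))
    avg = benchmark_callable(lambda: sum(data), n_iter=50)
    print(f"average seconds per call: {avg:.6f}")
```

Taking the minimum over repeats is one common convention for timeit results. It is a design choice of this sketch, not something the comment prescribes.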

Tags: python linux tensorflow time benchmarking


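【Answer 1】:

To answer your main question, "What is the proper way to benchmark part of a TensorFlow graph?":

TensorFlow includes an abstract class that provides helpers for TensorFlow benchmarks: Benchmark (tf.test.Benchmark).

So a Benchmark object can be created and used to benchmark part of a TensorFlow graph. In the code below, a benchmark object is instantiated and then its run_op_benchmark method is called. run_op_benchmark is passed the session, the conv_block tensor (in this case), a feed_dict, the number of burn-in iterations, the desired minimum number of iterations, a boolean flag that prevents the benchmark from also computing memory usage, and a convenient name. The method returns a dictionary containing the benchmark results: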

benchmark = tf.test.Benchmark()
results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                     feed_dict={x_tf: x_np}, burn_iters=2, 
                                     min_iters=n_iter, 
                                     store_memory_usage=False, name='example')

This snippet can be inserted into your code to compare the two benchmarks:

import os
import time

import numpy as np
import tensorflow as tf

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)

np.random.seed(2020)


def conv_block(x, kernel_size=3):
    # Define some part of graph here

    bs, h, w, c = x.shape
    in_channels = c
    out_channels = c

    with tf.compat.v1.variable_scope('var_scope'):
        w_0 = tf.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.keras.initializers.glorot_normal())
        x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')

    return x


def get_data_batch(spatial_size, n_channels):
    bs = 1
    h = spatial_size
    w = spatial_size
    c = n_channels

    x_np = np.random.rand(bs, h, w, c)
    x_np = x_np.astype(np.float32)
    #print('x_np.shape', x_np.shape)

    return x_np


def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
    print('=' * 60)
    print(f_name.__name__)

    tf.reset_default_graph()
    with tf.Session() as sess:
        x_tf = tf.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
        z_tf = f_name(x_tf)
        sess.run(tf.global_variables_initializer())

        x_np = get_data_batch(spatial_size, n_channels)
        start_time = time.time()
        for _ in range(n_iter):
            z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
        avr_time = (time.time() - start_time) / n_iter
        print('z_np.shape', z_np.shape)
        print('avr_time', round(avr_time, 3))

        n_total_params = 0
        for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
            n_total_params += np.prod(v.get_shape().as_list())
        print('Number of parameters:', format(n_total_params, ',d'))

        # USING TENSORFLOW BENCHMARK
        benchmark = tf.test.Benchmark()
        results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                             feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                             store_memory_usage=False, name='example')

        return results


if __name__ == '__main__':
    results = run_graph_part(conv_block, spatial_size=128, n_channels=32, n_iter=100)

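The Benchmark class implemented in the TensorFlow library itself gives hints about the answers to your other questions. Since the TensorFlow implementation does not require a new feed_dict for each benchmark iteration, the answer to question 1), "Is it OK for the x_np used in the loop to be the same, or do I need to regenerate it every time?", appears to be that it is fine to use the same x_np in every iteration. Regarding question 2), some "warm-up" does seem to be necessary: the number of burn-in iterations suggested by default in the TensorFlow implementation is 2. Regarding question 3), timeit is an excellent tool for measuring the execution time of small code snippets. However, the TensorFlow library itself uses time.time() in a way similar to what you did: see run_op_benchmark (source). Interestingly, the TensorFlow benchmark implementation reports the median rather than the mean of the operation's wall times (presumably to make the benchmark more robust to outliers).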
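To illustrate the burn-in and median points, here is a small TensorFlow-free sketch. It is not the library's implementation: the function median_wall_time and the "mean_time" key are hypothetical, `step` stands in for one sess.run call, and time.perf_counter is swapped in for time.time() because it is a monotonic, high-resolution clock.

```python
import statistics
import time


def median_wall_time(step, burn_iters=2, min_iters=10):
    """Time `step` per iteration and summarize with the median.

    `step` is a zero-argument callable (hypothetical stand-in for one
    sess.run call that reuses the same feed_dict every time).
    """
    # Burn-in iterations are executed but not timed.
    for _ in range(burn_iters):
        step()

    # Record one wall-time sample per timed iteration.
    deltas = []
    for _ in range(min_iters):
        start = time.perf_counter()
        step()
        deltas.append(time.perf_counter() - start)

    return {
        "iters": min_iters,
        "wall_time": statistics.median(deltas),  # robust to outliers
        "mean_time": statistics.mean(deltas),    # shown for comparison
    }


if __name__ == "__main__":
    data = list(range(10_000))
    print(median_wall_time(lambda: sum(data), burn_iters=2, min_iters=20))
```

A single slow iteration (for example, one delayed by an unrelated process) pulls the mean up noticeably but leaves the median almost unchanged. That may explain why averaging with time.time() over the whole loop and the median reported by the benchmark can disagree.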


    【Answer 2】:

    In addition to Steve's wonderfully explained answer, the following worked for me on TensorFlow-GPU v2.3:

    import tensorflow as tf
    
    tf.config.experimental.set_memory_growth(tf.config.experimental.list_physical_devices('GPU')[0], True)
    
    import os
    import time
    
    import numpy as np
    
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
    tf.compat.v1.logging.set_verbosity(tf.compat.v1.logging.ERROR)
    
    np.random.seed(2020)
    
    
    
    def conv_block(x, kernel_size=3):
        # Define some part of graph here
    
        bs, h, w, c = x.shape
        in_channels = c
        out_channels = c
    
        with tf.compat.v1.variable_scope('var_scope'):
            w_0 = tf.compat.v1.get_variable('w_0', [kernel_size, kernel_size, in_channels, out_channels], initializer=tf.keras.initializers.glorot_normal())
            x = tf.nn.conv2d(x, w_0, [1, 1, 1, 1], 'SAME')
    
        return x
    
    
    def get_data_batch(spatial_size, n_channels):
        bs = 1
        h = spatial_size
        w = spatial_size
        c = n_channels
    
        x_np = np.random.rand(bs, h, w, c)
        x_np = x_np.astype(np.float32)
        #print('x_np.shape', x_np.shape)
    
        return x_np
    
    
    def run_graph_part(f_name, spatial_size, n_channels, n_iter=100):
        print('=' * 60)
        print(f_name.__name__)
    
    #     tf.reset_default_graph()
        tf.compat.v1.reset_default_graph()
        
        
        with tf.compat.v1.Session() as sess:
            x_tf = tf.compat.v1.placeholder(tf.float32, [1, spatial_size, spatial_size, n_channels], name='input')
            z_tf = f_name(x_tf)
            
            sess.run(tf.compat.v1.global_variables_initializer())
    
            x_np = get_data_batch(spatial_size, n_channels)
            
            start_time = time.time()
            
            for _ in range(n_iter):
                z_np = sess.run(fetches=[z_tf], feed_dict={x_tf: x_np})[0]
            avr_time = (time.time() - start_time) / n_iter
            
            print('z_np.shape', z_np.shape)
            print('avr_time', round(avr_time, 3))
    
            n_total_params = 0
            
            for v in tf.compat.v1.get_collection(tf.compat.v1.GraphKeys.TRAINABLE_VARIABLES, scope='var_scope'):
                n_total_params += np.prod(v.get_shape().as_list())
            
            print('Number of parameters:', format(n_total_params, ',d'))
    
            # USING TENSORFLOW BENCHMARK
            benchmark = tf.test.Benchmark()
            results = benchmark.run_op_benchmark(sess=sess, op_or_tensor=z_tf, 
                                                 feed_dict={x_tf: x_np}, burn_iters=2, min_iters=n_iter,
                                                 store_memory_usage=False, name='example')
    
            return results
    
    
    if __name__ == '__main__':
        results = run_graph_part(conv_block, spatial_size=512, n_channels=32, n_iter=100)
    
    

    In my case this outputs something like:

    ============================================================
    conv_block
    z_np.shape (1, 512, 512, 32)
    avr_time 0.072
    Number of parameters: 9,216
    entry {
      name: "TensorFlowBenchmark.example"
      iters: 100
      wall_time: 0.049364686012268066
    }
    
