Tensorflow 对象检测 API - 高 RAM/CPU 使用率 - 无 GPU 使用率答案

【问题标题】：Tensorflow Object Detection API - High RAM/CPU usage - no GPU usageTensorflow 对象检测 API - 高 RAM/CPU 使用率 - 无 GPU 使用率
【发布时间】：2021-07-20 14:55:54
【问题描述】：

我正在使用 tensorflow 对象检测 API，但在模型训练方面遇到了一些问题。尤其是CPU和Ram使用率很高，而GPU基本不用（根据Windows任务管理器）：

我已经按照this guide安装了TF对象检测API，验证GPU识别成功：

python -c "import tensorflow as tf;print(tf.reduce_sum(tf.random.normal([1000, 1000])))"

2021-07-20 15:36:37.630320: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-20 15:36:49.683811: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library nvcuda.dll
2021-07-20 15:36:49.990907: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.605GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-07-20 15:36:50.017685: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudart64_110.dll
2021-07-20 15:36:50.142257: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublas64_11.dll
2021-07-20 15:36:50.158525: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cublasLt64_11.dll
2021-07-20 15:36:50.173970: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cufft64_10.dll
2021-07-20 15:36:50.183516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library curand64_10.dll
2021-07-20 15:36:50.196516: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusolver64_11.dll
2021-07-20 15:36:50.213625: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cusparse64_11.dll
2021-07-20 15:36:50.231417: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library cudnn64_8.dll
2021-07-20 15:36:50.234253: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-20 15:36:50.238133: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  AVX AVX2
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2021-07-20 15:36:50.245602: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1733] Found device 0 with properties:
pciBusID: 0000:01:00.0 name: NVIDIA GeForce RTX 3080 Laptop GPU computeCapability: 8.6
coreClock: 1.605GHz coreCount: 48 deviceMemorySize: 8.00GiB deviceMemoryBandwidth: 417.29GiB/s
2021-07-20 15:36:50.265550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1871] Adding visible gpu devices: 0
2021-07-20 15:36:54.162700: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1258] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-07-20 15:36:54.168506: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1264]      0
2021-07-20 15:36:54.169910: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1277] 0:   N
2021-07-20 15:36:54.176538: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1418] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 5177 MB memory) -> physical GPU (device: 0, name: NVIDIA GeForce RTX 3080 Laptop GPU, pci bus id: 0000:01:00.0, compute capability: 8.6)
tf.Tensor(1222.1837, shape=(), dtype=float32)

编辑：我只在 centernet_hg104_512x512_coco17_tpu-8（我使用下面显示的 pipeline.config 文件）遇到这个问题，而其他模型（ssd_resnet 或efficientdet）实际上使用 gpu。

model {
  center_net {
    num_classes: 1
    feature_extractor {
      type: "hourglass_104"
      channel_means: 104.01361846923828
      channel_means: 114.03422546386719
      channel_means: 119.91659545898438
      channel_stds: 73.60276794433594
      channel_stds: 69.89082336425781
      channel_stds: 70.91507720947266
      bgr_ordering: true
    }
    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 512
        max_dimension: 512
        pad_to_max_dimension: true
      }
    }
    object_detection_task {
      task_loss_weight: 1.0
      offset_loss_weight: 1.0
      scale_loss_weight: 0.10000000149011612
      localization_loss {
        l1_localization_loss {
        }
      }
    }
    object_center_params {
      object_center_loss_weight: 1.0
      classification_loss {
        penalty_reduced_logistic_focal_loss {
          alpha: 2.0
          beta: 4.0
        }
      }
      min_box_overlap_iou: 0.6
      max_box_predictions: 50
    }
  }
}
train_config {
  batch_size: 2
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_aspect_ratio: 0.5
      max_aspect_ratio: 1.7000000476837158
      random_coef: 0.25
    }
  }
  data_augmentation_options {
    random_adjust_hue {
    }
  }
  data_augmentation_options {
    random_adjust_contrast {
    }
  }
  data_augmentation_options {
    random_adjust_saturation {
    }
  }
  data_augmentation_options {
    random_adjust_brightness {
    }
  }
  data_augmentation_options {
    random_absolute_pad_image {
      max_height_padding: 200
      max_width_padding: 200
      pad_color: 0.0
      pad_color: 0.0
      pad_color: 0.0
    }
  }
  optimizer {
    adam_optimizer {
      learning_rate {
        manual_step_learning_rate {
          initial_learning_rate: 0.0010000000474974513
          schedule {
            step: 1000
            learning_rate: 9.999999747378752e-05
          }
          schedule {
            step: 5000
            learning_rate: 9.999999747378752e-06
          }
        }
      }
      epsilon: 1.0000000116860974e-07
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "pre-trained-models/centernet_hg104_512x512_coco17_tpu-8/checkpoint/ckpt-0"
  num_steps: 5000
  max_number_of_boxes: 50
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "annotations/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
  batch_size: 1
}
eval_input_reader {
  label_map_path: "annotations/label_map.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "annotations/test.record"
  }
}

需要 2 的小批量大小，因为使用任何更大的数字会使用于训练模型的脚本获得一点堆栈然后退出（这对我来说也很令人惊讶，但我刚刚开始玩这些东西所以也许这实际上是正常的）。

我正在使用：

Windows 10

CPU：i9-10980HK

内存：32GB

GPU：GTX3080 8GB 专用内存

张量流 = 2.5

CUDA = 11.3.1

cuDNN = 8.2.1.32

这是预期的低 GPU/高 CPU 使用率吗？我在这里错过了什么吗？感谢您的帮助，如果我可以提供任何其他有用的信息，请告诉我。

【问题讨论】：

标签： python tensorflow object-detection-api

【解决方案1】：

在\research\object_detection\legacy\train.py中

添加下面的sn-p

import os
os.environ["CUDA_VISIBLE_DEVICES"]="0"

也面临同样的问题，它对我有用。

【讨论】：