Detectron2: speeding up instance segmentation inference

【Question title】: Detectron2 Speed up inference instance segmentation
【Posted】: 2021-07-06 04:40:01
【Question】:

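I have working instance segmentation using the "mask_rcnn_R_101_FPN_3x" model. When I run inference on an image, it takes about 3 seconds per image on the GPU. How can I speed it up?

I am coding in Google Colab.

This is my config setup: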

import os

from detectron2 import model_zoo
from detectron2.config import get_cfg

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1
cfg.OUTPUT_DIR = "/content/drive/MyDrive/TEAM/save/"
# train_name and test_name are defined elsewhere in the notebook
cfg.DATASETS.TRAIN = (train_name,)
cfg.DATASETS.TEST = (test_name,)
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")

This is the inference:

import time

import cv2
import torch
from detectron2.engine import DefaultPredictor

torch.backends.cudnn.benchmark = True
start = time.time()
predictor = DefaultPredictor(cfg)
im = cv2.imread("/content/drive/MyDrive/TEAM/mcocr_val_145114ixmyt.jpg")
outputs = predictor(im)
print(f"Inference time per image is : {(time.time() - start)} s")

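Output:

Inference time per image is : 2.7835421562194824 s

The image I run inference on is 1024 x 1024 pixels. I tried different sizes, but inference still takes about 3 seconds per image. Am I missing something about Detectron2?

More information about the GPU: [image]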

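【Comments】:

Please replace the image link with text, since that helps search engines index S.O. posts and helps readers too.
The K80 is a fairly slow GPU by today's standards. I think this is expected, especially since you are measuring not only inference but also model setup and image loading.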
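
To illustrate the point about what is being measured, here is a minimal sketch of timing only the forward pass. The helper name `time_inference` and its parameters are hypothetical, not part of Detectron2; the idea is to build the predictor and load the image once, run a few warm-up calls, and wait for pending GPU work before reading the clock.

```python
import time


def time_inference(predict_fn, image, warmup=3, runs=10, sync=None):
    """Return the mean seconds per call of predict_fn(image).

    sync is an optional zero-argument callable (for example
    torch.cuda.synchronize) that waits for pending GPU work to finish.
    """
    # Warm-up calls absorb one-time costs such as cuDNN autotuning.
    for _ in range(warmup):
        predict_fn(image)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(runs):
        predict_fn(image)
    if sync is not None:
        sync()  # make sure queued GPU work is done before stopping the clock
    return (time.perf_counter() - start) / runs


# Usage with the objects from the question (assumes detectron2, cv2 and torch):
#   predictor = DefaultPredictor(cfg)   # built once, outside the timed region
#   im = cv2.imread(path)               # loaded once, outside the timed region
#   print(time_inference(predictor, im, sync=torch.cuda.synchronize))
```

Measured this way, the per-image figure excludes weight loading and disk reads, so it is likely to be lower than the 2.78 s reported in the question, though by how much depends on the GPU.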

Tags: performance pytorch detectron

【Solution 1】:

These are the two best ways to reduce inference time:

Use a better GPU
Use a shallower network, for example R50. See the inference times here: https://github.com/facebookresearch/detectron2/blob/master/MODEL_ZOO.md
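
As a rough sketch of the second option, the config from the question can point at the R50 variant from the model zoo instead of R101. The function name `build_r50_cfg` and its arguments are hypothetical. Note that weights fine-tuned with the R101 backbone cannot be loaded into an R50 model, so this assumes the model is retrained, starting here from the COCO-pretrained R50 checkpoint.

```python
R50_CONFIG = "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"


def build_r50_cfg(num_classes=1, weights=None):
    """Build a Mask R-CNN R50-FPN config mirroring the question's setup."""
    # Imported lazily so this module can be loaded without detectron2 installed.
    from detectron2 import model_zoo
    from detectron2.config import get_cfg

    cfg = get_cfg()
    cfg.merge_from_file(model_zoo.get_config_file(R50_CONFIG))
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = num_classes
    # Default to the COCO-pretrained R50 checkpoint as a starting point for
    # fine-tuning; pass the path to your own R50 model_final.pth after retraining.
    cfg.MODEL.WEIGHTS = weights or model_zoo.get_checkpoint_url(R50_CONFIG)
    return cfg


# Usage (assumes detectron2 is installed):
#   cfg = build_r50_cfg(num_classes=1)
#   ... set cfg.DATASETS, cfg.OUTPUT_DIR and train as before ...
```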

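Reducing the image size will not reduce inference time: Mask R-CNN has the same number of parameters regardless of image size, so the inference time does not change.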
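
One possible reason the question saw no change across image sizes, offered here as an assumption rather than something the answer states: Detectron2's `DefaultPredictor` resizes every input according to `cfg.INPUT.MIN_SIZE_TEST` and `cfg.INPUT.MAX_SIZE_TEST` (800 and 1333 in the default config), so the size of the file on disk matters little. Lowering those values reduces the resolution the network actually processes, usually at some cost in accuracy. A minimal sketch follows; the helper name and the example values 512 and 853 are hypothetical.

```python
def set_test_resolution(cfg, min_size=512, max_size=853):
    """Set the test-time resize limits on a Detectron2-style config, in place."""
    # The shortest image edge is resized to min_size ...
    cfg.INPUT.MIN_SIZE_TEST = min_size
    # ... unless that would push the longest edge past max_size.
    cfg.INPUT.MAX_SIZE_TEST = max_size
    return cfg


# Usage (assumes cfg was built as in the question):
#   cfg = set_test_resolution(cfg)
#   predictor = DefaultPredictor(cfg)  # rebuild so the new sizes take effect
```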

【Comments】:

Could you post an example of using a shallower network with Detectron2?