Unable to get AWS SageMaker to read RecordIO files

【Question title】: Unable to get AWS SageMaker to read RecordIO files
【Posted】: 2019-11-23 17:16:02
【Question】:

I am trying to convert an object detection .lst file to a .rec file and use it for training in SageMaker. My list looks like this:

10  2   5   9.0000  1008.0000   1774.0000   1324.0000   1953.0000   3.0000  2697.0000   3340.0000   948.0000    1559.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_1091.JPG
58  2   5   11.0000 1735.0000   2065.0000   1047.0000   1300.0000   6.0000  2444.0000   2806.0000   1194.0000   1482.0000   1.0000  2975.0000   3417.0000   1739.0000   2139.0000   IMG_7000.JPG
60  2   5   12.0000 1243.0000   1861.0000   1222.0000   1710.0000   6.0000  2423.0000   2971.0000   1205.0000   1693.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_7061.JPG
80  2   5   1.0000  1865.0000   2146.0000   818.0000    969.0000    14.0000 1559.0000   1918.0000   1658.0000   1914.0000   6.0000  2638.0000   3042.0000   2125.0000   2490.0000   IMG_9479.JPG
79  2   5   13.0000 1556.0000   1812.0000   1440.0000   1637.0000   7.0000  2216.0000   2452.0000   1595.0000   1816.0000   0.0000  0.0000  0.0000  0.0000  0.0000  IMG_9443.JPG

where the columns are:

index, header length, object length, class id, xmin, ymin, xmax, ymax, (repeat any other ids...), image path
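
To make that layout concrete, here is a minimal sketch that splits one line of the list into its header, its per-object groups, and the image path. The names `LstRecord` and `parse_lst_line` are made up for illustration and are not part of im2rec or SageMaker. It assumes the fields are whitespace-separated, that the image path contains no spaces, and that the header length counts the header fields themselves (so with a header length of 2 the first class id is the fourth field).

```python
from typing import List, NamedTuple


class LstRecord(NamedTuple):
    """One parsed line of a detection .lst file (hypothetical helper type)."""
    index: int
    header_len: int
    object_len: int
    objects: List[List[float]]  # each entry: class id followed by the box values
    image_path: str


def parse_lst_line(line: str) -> LstRecord:
    """Split a .lst line into header, object groups, and image path."""
    fields = line.split()  # assumption: no whitespace inside the image path
    index = int(fields[0])
    header_len = int(float(fields[1]))
    object_len = int(float(fields[2]))
    if object_len <= 0:
        raise ValueError("object length must be positive")
    # The header starts at fields[1] and spans header_len fields;
    # everything after it, up to the image path, is object data.
    label = [float(v) for v in fields[1 + header_len:-1]]
    if len(label) % object_len != 0:
        raise ValueError(
            f"line {index}: {len(label)} label values is not a multiple of {object_len}"
        )
    objects = [label[i:i + object_len] for i in range(0, len(label), object_len)]
    return LstRecord(index, header_len, object_len, objects, fields[-1])
```

Run on the first line of the list above, this yields three groups of five values each, the last of which is the all-zero group.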

Then I run the list through im2rec:

$ /incubator-mxnet/tools/im2rec.py my_lst.lst my_image_folder

Then I upload the resulting .rec file to S3.

Then I pull the necessary pieces from this AWS sample notebook.

I think the only part that really matters is probably this:

def set_hyperparameters(num_epochs, lr_steps):
    num_classes = 16
    num_training_samples = 227
    print('num classes: {}, num training images: {}'.format(num_classes, num_training_samples))

    od_model.set_hyperparameters(base_network='resnet-50',
                                 use_pretrained_model=1,
                                 num_classes=num_classes,
                                 mini_batch_size=16,
                                 epochs=num_epochs,               
                                 learning_rate=0.001, 
                                 lr_scheduler_step=lr_steps,      
                                 lr_scheduler_factor=0.1,
                                 optimizer='sgd',
                                 momentum=0.9,
                                 weight_decay=0.0005,
                                 overlap_threshold=0.5,
                                 nms_threshold=0.45,
                                 image_shape=512,
                                 label_width=350,
                                 num_training_samples=num_training_samples)

set_hyperparameters(100, '33,67')

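In the end I get the error: Not enough label packed in img_list or rec file.

Can someone help me figure out which pieces I'm missing so that I can train correctly with SageMaker and RecordIO files?

Thanks for your help!

Also, if I instead use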

$ /incubator-mxnet/tools/im2rec.py my_lst.lst my_image_folder --pass-through --pack-label

I get the error:

Expected number of batches: 14, did not match the number of batches processed: 5. This may happen when some images or annotations are invalid and cannot be parsed. Please check the dataset and ensure it follows the format in the documentation.


Tags: object-detection mxnet amazon-sagemaker

【Answer 1】:

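This may be a bit late, but are you labeling your classes starting from 0 in the .lst file?

From the link you posted:

The classes should be labeled with consecutive numbers, starting from 0.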
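
One way to check that rule before building the .rec file is sketched below. `find_label_problems` is a hypothetical helper, not something provided by MXNet or SageMaker. It assumes you have already collected every class id from the whole .lst file into one iterable, and that `num_classes` is the same value passed as the `num_classes` hyperparameter.

```python
from typing import Iterable, List


def find_label_problems(class_ids: Iterable[float], num_classes: int) -> List[str]:
    """Return human-readable problems with the class ids; empty list means none found."""
    problems: List[str] = []
    seen = set()
    for cid in class_ids:
        if cid != int(cid):
            problems.append(f"class id {cid} is not an integer")
            continue
        seen.add(int(cid))
    # Ids must fall in 0 .. num_classes - 1.
    out_of_range = sorted(c for c in seen if not 0 <= c < num_classes)
    if out_of_range:
        problems.append(f"class ids outside 0..{num_classes - 1}: {out_of_range}")
    # Gaps mean the ids in the data set are not consecutive.
    missing = sorted(set(range(num_classes)) - seen)
    if missing:
        problems.append(f"class ids never used: {missing}")
    return problems
```

A "never used" report only means something when the ids come from the full data set; on a short excerpt, gaps are expected. Note also that this check would count any all-zero padding group as an object of class 0, so you may want to filter those out first if they are not real annotations.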
