【Question title】: RecordIO: "The header of the MXNet RecordIO record...does not start with a valid magic number"
【Posted】: 2020-11-23 18:52:11
【Question】:

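I am using Linear Learner with MXNet RecordIO in SageMaker. After fit() had been running for 38 minutes, I got "The header of the MXNet RecordIO record at position 5,089,840 in the dataset does not start with a valid magic number".

The file was generated with the code below. Note that I tried two ways of uploading to S3: uploading the BytesIO buffer directly, and uploading a file written to disk, as shown.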

import io
import os

import boto3
import sagemaker.amazon.common as smac

train_file = 'linear_train.data'

# Serialize features and labels to RecordIO-protobuf in memory
f = io.BytesIO()
smac.write_numpy_to_dense_tensor(f, train_X.astype('float32'), train_y.astype('float32'))
f.seek(0)

# Write the buffer to a local file
with open(train_file, "wb") as fl:
    fl.write(f.getvalue())

# Alternative for upload
# boto3.Session().resource('s3').Bucket(bucket).Object(os.path.join(prefix, 'train', train_file)).upload_fileobj(f)
boto3.client('s3').upload_file(train_file,
                               Bucket=bucket,
                               Key=os.path.join(prefix, 'train', train_file),
                               ExtraArgs={'ACL': 'bucket-owner-full-control'})

To check whether the file was corrupted, I downloaded it from S3 and simply read it back as follows.

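from mxnet.recordio import MXRecordIO

record = MXRecordIO(fl, 'r')  # fl: local path of the file downloaded from S3
while True:
    item = record.read()
    if item is None:  # end of file reached
        break
    print(item)  # every record printed cleanly, which confirms that the RecordIO is valid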

So the file itself looks fine.

How can I get Linear Learner to run?

Here is the error message:
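For reference, the same kind of check can be sketched without MXNet by walking the record headers directly. This is a minimal sketch, not the code SageMaker runs: it assumes the usual RecordIO layout (a little-endian 32-bit magic number `0xCED7230A`, a 32-bit length field whose lower 29 bits give the payload size, and a payload padded to 4 bytes). The helper names `first_bad_header` and `make_record` are hypothetical.

```python
import struct

# Magic number that starts every RecordIO record (assumed little-endian on disk).
RECORDIO_MAGIC = 0xCED7230A


def first_bad_header(data: bytes):
    """Return the byte offset of the first invalid record header, or None."""
    pos = 0
    while pos < len(data):
        if pos + 8 > len(data):
            return pos  # not enough bytes left for a full header
        magic, lrecord = struct.unpack_from("<II", data, pos)
        if magic != RECORDIO_MAGIC:
            return pos
        length = lrecord & ((1 << 29) - 1)  # lower 29 bits: payload length
        pos += 8 + ((length + 3) // 4) * 4  # skip header and padded payload
    return None


def make_record(payload: bytes) -> bytes:
    """Build one record for illustration (hypothetical helper)."""
    pad = (-len(payload)) % 4
    header = struct.pack("<II", RECORDIO_MAGIC, len(payload))
    return header + payload + b"\x00" * pad


if __name__ == "__main__":
    valid = make_record(b"abc") + make_record(b"hello")
    print(first_bad_header(valid))                 # no invalid header
    print(first_bad_header(valid + b"a,b\n1,2\n"))  # offset where foreign bytes start
```

A scan like this only says something about the bytes it is given. If the training job reads more than this one file, a clean result here does not rule out the error.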

Failure reason if any: ClientError: Unable to read data channel 'train'. Requested content-type is 'application/x-recordio-protobuf'. Please verify the data matches the requested content-type. (caused by MXNetError)

Caused by: [17:04:49] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.3446.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp:100: (Input Error) The header of the MXNet RecordIO record at position 5,089,840 in the dataset does not start with a valid magic number.

Stack trace returned 10 entries:
[bt] (0) /opt/amazon/lib/libaialgs.so(+0xbca0) [0x7f337885cca0]
[bt] (1) /opt/amazon/lib/libaialgs.so(+0xbffa) [0x7f337885cffa]
[bt] (2) /opt/amazon/lib/libaialgs.so(aialgs::iterator_base::Next()+0x4a6) [0x7f33788675e6]
[bt] (3) /opt/amazon/lib/libmxnet.so(MXDataIterNext+0x21) [0x7f3367272141]
[bt] (4) /opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call_unix64+0x4c) [0x7f3378893958]
[bt] (5) /opt/amazon/python2.7/lib/python2.7/lib-dynload/_ctypes.so(ffi_call+0x15f) [0x7f33

【Comments】:

    Tags: machine-learning amazon-s3 amazon-sagemaker mxnet


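    【Answer 1】:

    The cause was a CSV file sitting in the same S3 folder as the RecordIO file. When a channel points at an S3 prefix, the training job reads every object under that prefix, so the CSV was parsed as RecordIO and failed the magic-number check.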
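One way to catch this before starting a long training job is to list everything under the channel's prefix and flag objects that are not the expected training file. The sketch below assumes a boto3 S3 client is passed in and uses its `list_objects_v2` paginator; the function names `keys_under_prefix` and `unexpected_keys` are hypothetical, and `bucket`, `prefix`, and `train_file` are the variables from the question.

```python
def keys_under_prefix(s3_client, bucket, prefix):
    """Return every object key stored under the given S3 prefix."""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent when a page holds no objects
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


def unexpected_keys(keys, expected_name):
    """Return the keys whose file name differs from the expected one."""
    return [k for k in keys if k.rsplit("/", 1)[-1] != expected_name]


# Usage (assumes boto3 is installed and credentials are configured):
#   import boto3
#   keys = keys_under_prefix(boto3.client("s3"), bucket, prefix + "/train/")
#   print(unexpected_keys(keys, train_file))
```

If the second call returns anything, such as a leftover CSV, moving those objects to a different prefix keeps them out of the 'train' channel.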
