【Posted】:2019-09-19 17:54:30
【Question】:
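I am trying to use the Amazon SageMaker Linear Learner algorithm, which supports the content type "application/x-recordio-protobuf". In the preprocessing stage, I use scikit-learn preprocessing to one-hot encode my features. I then pass the RecordIO-converted data to the Linear Learner estimator.
I used the package below, and the preprocessing transform succeeded: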
from io import BytesIO

from sagemaker.amazon.common import write_spmatrix_to_sparse_tensor
# Assumed import: the post does not show where worker and encoders come from
from sagemaker_containers.beta.framework import encoders, worker


def output_fn(prediction, accept):
    """Format prediction output

    The default accept/content-type between containers for serial inference is JSON.
    We also want to set the ContentType or mimetype as the same value as accept so the next
    container can read the response payload correctly.
    """
    if accept == 'text/csv':
        return worker.Response(encoders.encode(prediction.todense(), accept), mimetype=accept)
    elif accept == 'application/x-recordio-protobuf':
        # Serialize the sparse matrix as RecordIO-protobuf into an in-memory buffer
        buf = BytesIO()
        write_spmatrix_to_sparse_tensor(buf, prediction)
        buf.seek(0)
        return worker.Response(buf, accept, mimetype=accept)
    else:
        raise RuntimeError("{} accept type is not supported by this script.".format(accept))
But when the Linear Learner reads the input records, it fails with the following error:
Caused by: [15:53:30] /opt/brazil-pkg-cache/packages/AIAlgorithmsCppLibs/AIAlgorithmsCppLibs-2.0.774.0/AL2012/generic-flavor/src/src/aialgs/io/iterator_base.cpp: 100:
(Input Error) The header of the MXNet RecordIO record at position 810 in the dataset does not start with a valid magic number.
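The error says that a record header does not begin with the expected magic number. One way to narrow this down is to scan the produced bytes locally before handing them to the algorithm. The following is a minimal sketch, not the SageMaker implementation: it assumes the framing that, as far as I know, the open-source `sagemaker.amazon.common` writer uses (a little-endian uint32 magic `0xCED7230A`, a uint32 length, then the payload zero-padded to a multiple of 4 bytes). The names `write_record` and `first_bad_offset` are hypothetical helpers, and the post does not say whether "position 810" is a byte offset or a record index.

```python
import struct
from io import BytesIO

# Assumption: magic number that opens every RecordIO record.
RECORDIO_MAGIC = 0xCED7230A


def write_record(stream, payload):
    """Frame one payload: magic, length, payload, zero padding to 4 bytes."""
    stream.write(struct.pack("<I", RECORDIO_MAGIC))
    stream.write(struct.pack("<I", len(payload)))
    stream.write(payload)
    stream.write(b"\x00" * (-len(payload) % 4))


def first_bad_offset(data):
    """Return the byte offset of the first malformed record, or None if all are well framed."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < 8:
            return offset  # not enough bytes left for a header
        magic, length_field = struct.unpack_from("<II", data, offset)
        if magic != RECORDIO_MAGIC:
            return offset  # header does not start with the magic number
        length = length_field & ((1 << 29) - 1)  # lower 29 bits carry the length
        padded = (length + 3) & ~3  # payload is padded to a 4-byte boundary
        if offset + 8 + padded > len(data):
            return offset  # payload is truncated
        offset += 8 + padded
    return None


if __name__ == "__main__":
    buf = BytesIO()
    write_record(buf, b"abc")
    write_record(buf, b"hello")
    print(first_bad_offset(buf.getvalue()))
    # Trailing non-RecordIO bytes (for example CSV text) are reported by offset.
    print(first_bad_offset(buf.getvalue() + b"text,csv"))
```

If the scan reports an offset, the bytes at that point were not written by the RecordIO writer (for example a different serialization got mixed in, or the stream was cut short), which is where I would look first.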
【Comments】:
-
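So I ran into a similar problem, but it came down to how I was saving the data to S3. This is the code that worked for me:
import io

import boto3
# Assumed alias: the original comment does not show its imports
import sagemaker.amazon.common as smac

bucket = 'my-bucket-name'
buffer = io.BytesIO()
smac.write_spmatrix_to_sparse_tensor(buffer, testVectors, testLabels)
buffer.seek(0)  # rewind so the upload starts at the first byte
key = 'my-key-name'
boto3.client('s3').upload_fileobj(buffer, Bucket=bucket, Key=key, ExtraArgs={'ACL': 'bucket-owner-full-control'})
-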
@matt Consider updating the answer.
-
Good suggestion, @MikeF
Tags: scikit-learn protocol-buffers linear-regression mxnet amazon-sagemaker