【Question title】: No module named "fastai" when trying to deploy a fastai model on SageMaker
【Posted】: 2022-02-08 21:45:33
【Question】:

I trained and built a fastai (v1) model and exported it as a .pkl file. Now I want to deploy this model on Amazon SageMaker for inference.

I followed the SageMaker documentation for PyTorch models: [https://sagemaker.readthedocs.io/en/stable/frameworks/pytorch/using_pytorch.html#write-an-inference-script][1]

Steps taken
Folder structure:

    sage/
        export.pkl
        code/
            inference.py
            requirements.txt

Contents of requirements.txt:

    spacy==2.3.4
    torch==1.4.0
    torchvision==0.5.0
    fastai==1.0.60
    numpy

The command I used to create the archive:

    cd sage/
    tar -czvf /tmp/model.tar.gz ./export.pkl ./code

This produces a model.tar.gz file, which I then uploaded to an S3 bucket.
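As a cross-check on the packaging step, the same archive can be built and inspected from Python. This is a minimal sketch, assuming the folder layout listed in the question; `build_model_archive` is a hypothetical helper name, not part of any SDK. Listing the members shows whether requirements.txt ended up under code/ next to inference.py.

```python
import tarfile
from pathlib import Path
from typing import List


def build_model_archive(model_root: Path, archive_path: Path) -> List[str]:
    """Pack export.pkl and code/ at the top level of a gzip tarball.

    Returns the sorted member names so the layout can be inspected.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps paths relative, like running tar from inside the folder.
        tar.add(model_root / "export.pkl", arcname="export.pkl")
        tar.add(model_root / "code", arcname="code")  # directories are added recursively
    # Re-open the archive and read back what was actually written.
    with tarfile.open(archive_path, "r:gz") as tar:
        return sorted(tar.getnames())
```

With the layout from the question, the returned list should include `code/inference.py` and `code/requirements.txt` alongside `export.pkl`.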

To deploy it, I used the SageMaker Python SDK:


    from sagemaker.pytorch import PyTorchModel

    role = "sagemaker-role-arn"
    model_path = "s3 key for the model.tar.gz file that i created above"
    pytorch_model = PyTorchModel(model_data=model_path, role=role, entry_point="inference.py",
                                 framework_version="1.4.0", py_version="py3")

    predictor = pytorch_model.deploy(instance_type="ml.c5.large", initial_instance_count=1)

After running the code above, I can see that the model is created and deployed in SageMaker, but I get an error when running inference:


    botocore.errorfactory.ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received server error (500) from primary with message "No module named 'fastai'
    Traceback (most recent call last):
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 110, in transform
        self.validate_and_initialize(model_dir=model_dir)
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 157, in validate_and_initialize
        self._validate_user_module_and_set_functions()
      File "/opt/conda/lib/python3.6/site-packages/sagemaker_inference/transformer.py", line 170, in _validate_user_module_and_set_functions
        user_module = importlib.import_module(user_module_name)
      File "/opt/conda/lib/python3.6/importlib/__init__.py", line 126, in import_module
        return _bootstrap._gcd_import(name[level:], package, level)
      File "<frozen importlib._bootstrap>", line 994, in _gcd_import
      File "<frozen importlib._bootstrap>", line 971, in _find_and_load
      File "<frozen importlib._bootstrap>", line 955, in _find_and_load_unlocked
      File "<frozen importlib._bootstrap>", line 665, in _load_unlocked
      File "<frozen importlib._bootstrap_external>", line 678, in exec_module
      File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
      File "/opt/ml/model/code/inference.py", line 2, in <module>
        from fastai.basic_train import load_learner, DatasetType, Path
    ModuleNotFoundError: No module named 'fastai'

Clearly the fastai module is not being installed. What is the reason for this, and what am I doing wrong here?

【Comments】:

    Tags: python machine-learning amazon-sagemaker fast-ai


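    【Solution 1】:

    To troubleshoot this kind of problem, check the endpoint's CloudWatch logs.

    Look at the logs first to see whether requirements.txt was found and installed, or whether there were any dependency errors.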
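The log check can also be scripted. The sketch below assumes the endpoint logs live in the log group `/aws/sagemaker/Endpoints/<endpoint name>` and that a boto3 CloudWatch Logs client is passed in; `endpoint_log_messages` and the default filter pattern are illustrative choices, not something from the original answer.

```python
from typing import Any, List


def endpoint_log_messages(logs_client: Any, endpoint_name: str,
                          filter_pattern: str = '"requirements.txt"') -> List[str]:
    """Return endpoint log messages that match a CloudWatch filter pattern."""
    # Assumption: SageMaker endpoints log to this group.
    log_group = f"/aws/sagemaker/Endpoints/{endpoint_name}"
    kwargs = {"logGroupName": log_group, "filterPattern": filter_pattern}
    messages: List[str] = []
    while True:
        response = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in response.get("events", []))
        token = response.get("nextToken")
        if not token:
            return messages
        # Keep paging until CloudWatch stops returning a token.
        kwargs["nextToken"] = token


# Usage (needs boto3 and AWS credentials; "my-endpoint" is a placeholder):
# import boto3
# for line in endpoint_log_messages(boto3.client("logs"), "my-endpoint"):
#     print(line)
```

If no message mentions requirements.txt at all, that suggests the file was never picked up, which narrows the problem to packaging rather than to a failed install.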

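    To package the model and the inference script, two archives are recommended:

    1. model.tar.gz, containing the model and model files.
    2. sourcedir.tar.gz, containing the inference code. Set the SageMaker environment variable SAGEMAKER_SUBMIT_DIRECTORY to the archive's S3 location, s3://bucket/prefix/sourcedir.tar.gz, and use SAGEMAKER_PROGRAM to specify the entry-point file name, inference.py.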
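One way to wire up the two-archive layout is through the container definition passed to boto3's `create_model`. This is a sketch under assumptions: `container_definition` is a hypothetical helper, and the image URI and S3 URLs are placeholders that the caller must supply.

```python
from typing import Dict


def container_definition(image_uri: str, model_data_url: str,
                         source_archive_url: str,
                         program: str = "inference.py") -> Dict[str, object]:
    """Build a PrimaryContainer dict that points at both archives."""
    return {
        "Image": image_uri,              # inference container image (placeholder)
        "ModelDataUrl": model_data_url,  # S3 URL of model.tar.gz
        "Environment": {
            # S3 URL of sourcedir.tar.gz with the inference code
            "SAGEMAKER_SUBMIT_DIRECTORY": source_archive_url,
            # entry-point file inside that archive
            "SAGEMAKER_PROGRAM": program,
        },
    }


# Usage (needs boto3 and AWS credentials; all values are placeholders):
# import boto3
# boto3.client("sagemaker").create_model(
#     ModelName="fastai-model",
#     PrimaryContainer=container_definition(
#         image_uri, "s3://bucket/prefix/model.tar.gz",
#         "s3://bucket/prefix/sourcedir.tar.gz"),
#     ExecutionRoleArn=role,
# )
```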

    Note: when you use source_dir in PyTorchModel, the SDK packages source_dir, uploads it to S3, and sets SAGEMAKER_SUBMIT_DIRECTORY for you.
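Applied to the deployment code from the question, the source_dir route might look like the sketch below. It assumes a local folder named code/ that holds inference.py and requirements.txt; `pytorch_model_kwargs` is a hypothetical helper that only collects the constructor arguments.

```python
from typing import Dict


def pytorch_model_kwargs(model_data: str, role: str,
                         source_dir: str = "code") -> Dict[str, str]:
    """Collect PyTorchModel arguments that ship the code via source_dir."""
    return {
        "model_data": model_data,       # S3 URL of model.tar.gz
        "role": role,                   # IAM role ARN used by SageMaker
        "entry_point": "inference.py",  # file name relative to source_dir
        "source_dir": source_dir,       # local folder, packaged by the SDK
        "framework_version": "1.4.0",
        "py_version": "py3",
    }


# Usage (needs the sagemaker package and AWS credentials):
# from sagemaker.pytorch import PyTorchModel
# model = PyTorchModel(**pytorch_model_kwargs("s3://bucket/prefix/model.tar.gz", role))
# predictor = model.deploy(instance_type="ml.c5.large", initial_instance_count=1)
```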

    【Comments】:
