【Question title】: How to parse stepfunction executionId to SageMaker batch transform job name?
【Posted】: 2021-04-19 14:16:59
【Question】:

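I created a Step Functions state machine. The definition below (step-function.json) is used in Terraform (following the syntax on this page: https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_CreateTransformJob.html).

The first time I execute this state machine, it creates a SageMaker batch transform job named example-jobname. However, I need to execute the state machine every day, and every later run fails with "error": "SageMaker.ResourceInUseException", "cause": "Job name must be unique within an AWS account and region, and a job with this name already exists"

The job name is hardcoded as example-jobname, and job names must be unique, so the task fails on every execution after the first. I would like to know how to append a string (something like the ExecutionId) to the end of the job name. Here is what I have tried: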

  1. I added "executionId.$": "States.Format('somestring {}', $$.Execution.Id)" to the Parameters section of the json file, but executing the task gives the error "error": "States.Runtime", "cause": "An error occurred while executing the state 'SageMaker CreateTransformJob' (entered at the event id #2). The Parameters '{\"BatchStrategy\":\"SingleRecord\",..............\"executionId\":\"somestring arn:aws:states:us-east-1:xxxxx:execution:xxxxx-state-machine:xxxxxxxx72950\"}' could not be used to start the Task: [The field \"executionId\" is not supported by Step Functions]"}

  2. I changed the job name in the json file to "TransformJobName": "example-jobname-States.Format('somestring {}', $$.Execution.Id)", and executing the state machine gives the error "error": "SageMaker.AmazonSageMakerException", "cause": "2 validation errors detected: Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}; Value 'example-jobname-States.Format('somestring {}', $$.Execution.Id)' at 'transformJobName' failed to satisfy constraint: Member must have length less than or equal to 63

I have run out of ideas. Can anyone help? Thanks a lot.

【Comments】:

    Tags: amazon-web-services terraform state-machine amazon-sagemaker aws-step-functions


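    【Answer 1】:

    According to the documentation, a parameter that takes a dynamic value must be passed in the following format, with a `.$` suffix on the key: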

            "Parameters": {
                "ModelName.$": "$$.Execution.Name",  
                ....
            },
    

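    If you look closely, this `.$` suffix is what is missing from your definition. Your state machine definition should contain either

          "TransformJobName.$": "$$.Execution.Id",

    or

          "TransformJobName.$": "States.Format('mytransformjob{}', $$.Execution.Id)"

    Note that `$$.Execution.Id` resolves to the full execution ARN, which SageMaker rejects as a job name (see the comments below), so the ID has to be extracted from the ARN first.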
    

    Complete state machine definition:

        {
            "Comment": "Defines the statemachine.",
            "StartAt": "Generate Random String",
            "States": {
                "Generate Random String": {
                    "Type": "Task",
                    "Resource": "arn:aws:lambda:eu-central-1:1234567890:function:randomstring",
                    "ResultPath": "$.executionid",
                    "Parameters": {
                        "executionId.$": "$$.Execution.Id"
                    },
                    "Next": "SageMaker CreateTransformJob"
                },
                "SageMaker CreateTransformJob": {
                    "Type": "Task",
                    "Resource": "arn:aws:states:::sagemaker:createTransformJob.sync",
                    "Parameters": {
                        "BatchStrategy": "SingleRecord",
                        "DataProcessing": {
                            "InputFilter": "$",
                            "JoinSource": "Input",
                            "OutputFilter": "xxx"
                        },
                        "Environment": {
                            "SAGEMAKER_MODEL_SERVER_TIMEOUT": "300"
                        },
                        "MaxConcurrentTransforms": 100,
                        "MaxPayloadInMB": 1,
                        "ModelName": "${model_name}",
                        "TransformInput": {
                            "DataSource": {
                                "S3DataSource": {
                                    "S3DataType": "S3Prefix",
                                    "S3Uri": "${s3_input_path}"
                                }
                            },
                            "ContentType": "application/jsonlines",
                            "CompressionType": "Gzip",
                            "SplitType": "Line"
                        },
                        "TransformJobName.$": "$.executionid",
                        "TransformOutput": {
                            "S3OutputPath": "${s3_output_path}",
                            "Accept": "application/jsonlines",
                            "AssembleWith": "Line"
                        },
                        "TransformResources": {
                            "InstanceType": "xxx",
                            "InstanceCount": 1
                        }
                    },
                    "End": true
                }
            }
        }
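
Since the root cause in the question was a dynamic value placed under a key without the `.$` suffix, a small local check can catch that before deploying. The following is only a rough sketch: the function names are my own, and the heuristic (flagging any string in a state's Parameters that contains `$$.` or `States.` under a key not ending in `.$`) is an assumption, not an official validation rule.

```python
import json


def find_static_dynamic_values(node, path):
    """Collect (path, value) pairs that look dynamic but sit under a key without '.$'."""
    problems = []
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if isinstance(value, str):
                # Heuristic (assumption): context-object paths and intrinsic calls.
                looks_dynamic = "$$." in value or "States." in value
                if looks_dynamic and not key.endswith(".$"):
                    problems.append((child_path, value))
            else:
                problems.extend(find_static_dynamic_values(value, child_path))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            problems.extend(find_static_dynamic_values(item, f"{path}[{index}]"))
    return problems


def check_definition(definition_text):
    """Scan the Parameters block of every state in a state machine definition."""
    definition = json.loads(definition_text)
    problems = []
    for state_name, state in definition.get("States", {}).items():
        parameters = state.get("Parameters", {})
        problems.extend(
            find_static_dynamic_values(parameters, f"{state_name}.Parameters")
        )
    return problems
```

Running `check_definition` on the second attempt from the question would report the `TransformJobName` entry, while the corrected `TransformJobName.$` form passes.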
    

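    In the definition above, the Lambda can be a function that parses the execution ID ARN I pass in through the Parameters section:

        def lambda_handler(event, context):
            # The execution ARN ends with ":<execution name>"; keep only that part.
            return event["executionId"].split(":")[-1]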
    

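    Or, if you do not want to pass the execution ID, it can simply return a random string, for example:

        import random
        import string

        def lambda_handler(event, context):
            # Return 8 random uppercase letters and digits.
            return "".join(random.choices(string.ascii_uppercase + string.digits, k=8))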
    

    You can generate any kind of random string, or anything else, in the Lambda and pass it to the transform job name.
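
Building on the Lambda approach above, here is a minimal sketch of a handler that also guards against the name constraints quoted in the SageMaker error message (the pattern `^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}` and a maximum of 63 characters). The prefix `example-jobname` comes from the question. The function name `build_job_name` and the sanitizing rules (replacing unsupported characters with hyphens, then truncating) are my own assumptions, so adapt them to your naming scheme.

```python
import re

# Constraints quoted in the SageMaker validation error for TransformJobName.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")
MAX_JOB_NAME_LENGTH = 63


def build_job_name(execution_arn, prefix="example-jobname"):
    """Derive a job name that satisfies the constraints from an execution ARN."""
    # The execution name is the last colon-separated segment of the ARN.
    execution_name = execution_arn.split(":")[-1]
    # Assumption: replace anything outside [a-zA-Z0-9-] (such as "_") with "-".
    safe_name = re.sub(r"[^a-zA-Z0-9-]", "-", execution_name)
    # Truncate to the length limit, then strip hyphens so both ends are alphanumeric.
    job_name = f"{prefix}-{safe_name}"[:MAX_JOB_NAME_LENGTH].strip("-")
    if not JOB_NAME_PATTERN.match(job_name):
        raise ValueError(f"Cannot build a valid job name from {execution_arn!r}")
    return job_name


def lambda_handler(event, context):
    # "executionId" is the key set in the Parameters block of the Lambda state.
    return build_job_name(event["executionId"])
```

Keep in mind that truncation can cut off the part of the execution name that makes it unique, so a short prefix is safer when execution names are long.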

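    【Comments】:

    • Hey, I used "TransformJobName.$": "$$.Execution.Id", but when I execute the state machine it gives me an error: `"error": "SageMaker.AmazonSageMakerException", "cause": "2 validation errors detected: Value 'arn:aws:states:us-east-1:xxx:execution:xxx-state-machine:070xxx3-xxxx-xxx-xxxx-xxxxxxxxx_xxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx' at 'transformJobName' failed to satisfy constraint: Member must satisfy regular expression pattern: ^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}; Value 'arn:xxxxxx' at 'transformJobName' failed to satisfy constraint: Member must have length less than or equal to 63`
    • As we discussed earlier in the thread, and as I tried to describe in the gist, you will get an ARN that contains the execution ID, so you need to extract the ID from the ARN.
    • @Cecilia I have added code for both approaches to the answer, please take a look.
    • @Cecilia Glad it worked. Have a nice day!
    • @Cecilia You can use moto to write unit test cases.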
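    【Answer 2】:

    I would like to suggest another idea. If applicable, you can also use an execution ID or another unique identifier from a previous task.

    I trigger the BatchTransform job after a GlueJob succeeds. So I can pick up an output variable from that job in the BatchTransform step and concatenate it into a new TransformJobName.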

    "TransformJobName.$": "States.Format('scoring-titanic-{}', $.CompletedOn)"
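
Whether a template like this produces an acceptable name depends on what the substituted value looks like. As a rough local check, the sketch below mimics the `{}` substitution and tests the result against the constraints from the SageMaker error message in the question. It is not the Step Functions implementation of `States.Format` (escaping is ignored), the helper names are my own, and the sample values are hypothetical.

```python
import re

# Constraints quoted in the SageMaker validation error for TransformJobName.
JOB_NAME_PATTERN = re.compile(r"^[a-zA-Z0-9](-*[a-zA-Z0-9]){0,62}$")
MAX_JOB_NAME_LENGTH = 63


def format_like_states(template, *args):
    """Rough stand-in for States.Format: fill each {} placeholder in order."""
    parts = template.split("{}")
    if len(parts) - 1 != len(args):
        raise ValueError("placeholder count does not match argument count")
    result = parts[0]
    for value, tail in zip(args, parts[1:]):
        result += str(value) + tail
    return result


def is_valid_job_name(name):
    """Check the length limit and the character pattern together."""
    return (
        len(name) <= MAX_JOB_NAME_LENGTH
        and JOB_NAME_PATTERN.match(name) is not None
    )


# Hypothetical values: a numeric timestamp passes, one containing ":" does not.
numeric_name = format_like_states("scoring-titanic-{}", 1618841819000)
colon_name = format_like_states("scoring-titanic-{}", "2021-04-19T14:16:59")
print(numeric_name, is_valid_job_name(numeric_name))
print(colon_name, is_valid_job_name(colon_name))
```

If the value from the previous task contains characters such as `:` or `_`, it would need to be cleaned up (for example in a small Lambda step) before being used in the job name.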
