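[Posted]: 2020-02-25 20:04:38
[Question]:
I created a state machine to run several Glue/ETL jobs in parallel. I am experimenting with the Map state to take advantage of dynamic parallelism. Here is the step function definition: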
{
  "StartAt": "Map",
  "States": {
    "Map": {
      "Type": "Map",
      "InputPath": "$.data",
      "ItemsPath": "$.array",
      "MaxConcurrency": 2,
      "Iterator": {
        "StartAt": "glue Job",
        "States": {
          "glue Job": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "End": true,
            "Parameters": {
              "JobName": "glue-etl-job",
              "Arguments": {
                "--db": "db-dev",
                "--file": "$.file",
                "--bucket": "$.bucket"
              }
            }
          }
        }
      },
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "NotifyError"
        }
      ],
      "Next": "NotifySuccess"
    }
  }
}
The input passed to the step function looks like this:
{
  "data": {
    "array": [
      {"file": "path-to-file1", "bucket": "bucket-name1"},
      {"file": "path-to-file2", "bucket": "bucket-name2"}
    ]
  }
}
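The problem is that the file and bucket job arguments are not resolved: they reach the Glue job as the literal strings $.file and $.bucket. How can I pass the actual values from the input instead?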
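A likely cause, going by how the Amazon States Language handles Parameters: a value is only evaluated as a path when its key ends in .$; without that suffix it is passed through as a literal string. Each Map iteration receives one element of the array as its input, so $.file and $.bucket would resolve against that element. A minimal sketch of the "glue Job" state under that assumption, reusing the job name and argument names from the definition above (untested against this particular job):

```json
{
  "Type": "Task",
  "Resource": "arn:aws:states:::glue:startJobRun.sync",
  "Parameters": {
    "JobName": "glue-etl-job",
    "Arguments": {
      "--db": "db-dev",
      "--file.$": "$.file",
      "--bucket.$": "$.bucket"
    }
  },
  "End": true
}
```

The .$ suffix is stripped from the key when the path is resolved, so the Glue job should still see the arguments as --file and --bucket, while --db stays a static string.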
[Comments]:
Tags: aws-glue aws-step-functions