【Question title】: CloudFormation not able to launch ECS Service
【Posted】: 2021-09-29 00:14:09
【Question】:

During an overnight Buildkite pipeline run, CloudFormation was unable to create the ECS service and timed out.

What could be causing the timeout?

CFN logs

cfn INFO 03:08:51   CREATE_IN_PROGRESS  build-amis-564-registrations: User Initiated
cfn INFO 03:08:56   CREATE_IN_PROGRESS  TaskDefinition
cfn INFO 03:08:59   CREATE_IN_PROGRESS  TaskDefinition: Resource creation Initiated
cfn INFO 03:08:59   CREATE_COMPLETE TaskDefinition
cfn INFO 03:09:02   CREATE_IN_PROGRESS  ECSService
ecs INFO 03:09:10   Waiting for cluster to scale up. Please wait...
Traceback (most recent call last):
  File ".buildkite/scripts/deploy/ecs_deploy.py", line 282, in <module>
    main()
  File ".buildkite/scripts/deploy/ecs_deploy.py", line 232, in main
    ecs.wait_for_service_steady(cluster, stack_name, project_name, desired_count)
  File "/app/ecs/__init__.py", line 680, in wait_for_service_steady
    raise Exception("Timed out waiting for service deployment")
Exception: Timed out waiting for service deployment

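Python script

Excerpt from the Python script that raises the error after 15 minutes (CloudFormation itself goes on to time out after three hours of failing to create the service).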

...
            for event in filter_events_response(response, last_event_id) or []:
                if "insufficient memory" in event["message"]:
                    message = info("Waiting for cluster to scale up. Please wait...")
                else:
                    message = event["message"]

                if log_progress:
                    logger.info(
                        "%s\t%s", event["createdAt"].strftime("%H:%M:%S"), message
                    )

                last_event_id = event["id"]
                waited = 0

                if "steady" in event["message"]:
                    logger.debug(event)
                    return

                if "deregistered" in event["message"]:
                    killed_tasks += 1

                    if killed_tasks > allowed_killed_tasks:
                        raise ServiceUnstableException(
                            "%s-%s service tasks are failing to start"
                            % (stack, service)
                        )

            time.sleep(20)
            waited += 20
            if waited > 900:
                raise Exception("Timed out waiting for service deployment")
...

【Comments】:

    Tags: python amazon-web-services amazon-ec2 amazon-cloudformation


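    【Solution 1】:

    Solution: delete the previously created CloudFormation stacks.

    Once all the stacks previously created by the Buildkite pipeline had been cleaned up, the pipeline ran normally.

    The best advice is to add a step to your pipeline that deletes all stacks when the build fails.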
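
A cleanup step along those lines might look like the following minimal sketch. It is not the original pipeline's code: the helper names `select_stale_stacks` and `delete_stale_stacks` are hypothetical, and the `build-amis-` prefix is only an assumption inferred from the stack name in the log above. The CloudFormation client is passed in as `cfn` (for example a boto3 CloudFormation client), and only the `list_stacks` paginator, `delete_stack`, and the `stack_delete_complete` waiter are used.

```python
def select_stale_stacks(summaries, prefix):
    """Return names of stacks that start with `prefix` and still exist.

    `list_stacks` also reports stacks that were already deleted, so
    entries in DELETE_COMPLETE are skipped.
    """
    return [
        s["StackName"]
        for s in summaries
        if s["StackName"].startswith(prefix)
        and s["StackStatus"] != "DELETE_COMPLETE"
    ]


def delete_stale_stacks(cfn, prefix):
    """Delete every matching stack and wait for each deletion to finish.

    `cfn` is expected to behave like a boto3 CloudFormation client.
    `prefix` is a hypothetical naming convention for pipeline stacks.
    """
    summaries = []
    for page in cfn.get_paginator("list_stacks").paginate():
        summaries.extend(page["StackSummaries"])

    names = select_stale_stacks(summaries, prefix)
    waiter = cfn.get_waiter("stack_delete_complete")
    for name in names:
        cfn.delete_stack(StackName=name)
        # Block until CloudFormation reports the stack as gone.
        waiter.wait(StackName=name)
    return names
```

In a failure-handling pipeline step this could be called as `delete_stale_stacks(boto3.client("cloudformation"), "build-amis-")`. Matching on a prefix is a deliberate restriction so that the step only removes stacks the pipeline itself created.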

    【Comments】: