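[Posted]: 2021-11-26 02:28:46
[Question]:
I'm using golang and Google App Engine for this project. I have a task where I receive a huge file, split it into lines, and send those lines one by one to a queue to be parsed. My initial scaling setup in app.yaml was as follows: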
instance_class: F1
automatic_scaling:
  min_instances: 0
  max_instances: 4
  min_idle_instances: 0
  max_idle_instances: 1
  target_cpu_utilization: 0.8
  min_pending_latency: 15s
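It worked fine, but with one problem: because there are really a lot of tasks, the request fails after 10 minutes (as expected, according to the docs). So I decided to use the B1 instance class instead of F1, and that's where the problems started.
My B1 setup looks like this: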
instance_class: B1
basic_scaling:
  max_instances: 4
Now, I've created a very simple demo to illustrate the idea:
// GET foo enqueues a task on queue "bar" that targets "method".
r.GET("foo", func(c *gin.Context) {
	_, err := tm.CreateTask(&tasks.TaskOptions{
		QueueID:  "bar",
		Method:   "method",
		PostBody: "foooo",
	})
	if err != nil {
		lg.LogErrorAndChill("failed, %v", err)
	}
})

// POST bar/method is invoked by the queued task and prints the body.
r.POST("bar/method", func(c *gin.Context) {
	data, err := c.GetRawData()
	if err != nil {
		lg.LogErrorAndPanic("failed", err)
	}
	fmt.Printf("data is %v \n", string(data))
})
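To explain the logic behind it: I send a request to "foo", which creates a task that is added to the queue along with a body. Inside the task, a POST handler is called based on the queueId and method parameters; it receives some text and, in this simple example, just logs it.
Now, when I run the request, I get a 500 error like this: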
[GIN] 2021/10/05 - 19:38:29 | 500 | 301.289µs | 0.1.0.3 | GET "/_ah/start"
In the logs I can see:
Process terminated because it failed to respond to the start request with an HTTP status code of 200-299 or 404.
And inside the task queue (as the reason for the retry):
INTERNAL(13): Instance Unavailable. HTTP status code 500
Now, I have read the docs and I am aware of the following:
Manual, basic, and automatically scaling instances startup differently. When you start a manual scaling instance, App Engine immediately sends a /_ah/start request to each instance. When you start an instance of a basic scaling service, App Engine allows it to accept traffic, but the /_ah/start request is not sent to an instance until it receives its first user request. Multiple basic scaling instances are only started as necessary, in order to handle increased traffic. Automatically scaling instances do not receive any /_ah/start request.
When an instance responds to the /_ah/start request with an HTTP status code of 200–299 or 404, it is considered to have successfully started and can handle additional requests. Otherwise, App Engine terminates the instance. Manual scaling instances are restarted immediately, while basic scaling instances are restarted only when needed for serving traffic
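But this doesn't really help. I don't understand why the /_ah/start request isn't being answered correctly, and I'm not sure how to debug or fix it, especially since the F1 instances were working fine.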
[Comments]:
Tags: google-app-engine autoscaling