Zero-downtime K8S rollout: wait until probes know before actually stopping the pod? Answers

【Question Title】: Zero-downtime K8S rollout: wait until probes know before actually stopping the pod?
【Posted】: 2021-05-11 02:43:49
【Question】:

I'm trying to achieve zero-downtime deployments on k8s. My deployment has one replica. The pod probes look like this:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
  namespace: ${KUBE_NAMESPACE}
spec:
  selector:
    matchLabels:
      app: app
  replicas: 1
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
        - name: app-container
          imagePullPolicy: IfNotPresent
          image: ${DOCKER_IMAGE}:${IMAGE_TAG}
          ports:
            - containerPort: 80
          livenessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 5
          readinessProbe:
            httpGet:
              path: /health
              port: 80
            initialDelaySeconds: 5
            periodSeconds: 10
      terminationGracePeriodSeconds: 130

However, every time after kubectl rollout status returns and reports that the rollout is complete, I get Bad Gateway errors for a short period.

Then I added a test: make /health return 500 during preStop, and wait at least 20 seconds before actually stopping the pod.

# If the app sees that /tmp/prestop exists, /health returns 500.
          lifecycle:
            preStop:
              exec:
                command: ["/bin/bash", "-c", "touch /tmp/prestop && sleep 20"]
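The question does not show the app side of this check. A minimal sketch of what it could look like, assuming a Python app built on the standard library's http.server; health_response, HealthHandler and serve are hypothetical names, and the JSON body is illustrative only:

```python
import json
import os
from http.server import BaseHTTPRequestHandler, HTTPServer

# File created by the preStop hook: touch /tmp/prestop
PRESTOP_MARKER = "/tmp/prestop"


def health_response(marker_path: str = PRESTOP_MARKER) -> tuple[int, bytes]:
    """Return (status code, JSON body) for GET /health (hypothetical helper)."""
    if os.path.exists(marker_path):
        # preStop has started: fail the probes so the pod is marked unready.
        return 500, json.dumps({"status": "prestop"}).encode()
    return 200, json.dumps({"status": "running"}).encode()


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        if self.path != "/health":
            self.send_error(404)
            return
        code, body = health_response()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 80) -> None:
    # Port 80 matches containerPort in the Deployment above.
    HTTPServer(("", port), HealthHandler).serve_forever()
```

The container's entry point would call serve(); keeping the marker check in a plain function makes it easy to test without starting a server.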

Then I found that after k8s began stopping the pod, traffic could still reach the old pod (requests to /health returned 500).

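So it seems the load balancer decides which pods to use based solely on probe results. Because probes only run periodically, there is always a small window in which the pod has stopped but the load balancer doesn't know yet and can still route traffic to it, so users experience downtime.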
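To put a rough number on that window: a readiness probe only marks a pod unready after failureThreshold consecutive failures (default 3), spaced periodSeconds apart, and a single probe may take up to timeoutSeconds (default 1). A back-of-the-envelope sketch; the helper name is hypothetical, and it ignores kubelet jitter and the time for the change to reach the Service endpoints and the ALB:

```python
def max_seconds_until_unready(period_seconds: float,
                              failure_threshold: int = 3,
                              timeout_seconds: float = 1) -> float:
    """Rough upper bound on how long a pod whose /health just started failing
    can still be reported Ready by its readiness probe.

    Worst case: the failure begins right after a successful probe, and the pod
    is only marked unready after `failure_threshold` consecutive failed probes,
    the last of which may take up to `timeout_seconds`.
    """
    return period_seconds * failure_threshold + timeout_seconds


# readinessProbe from the question: periodSeconds: 10, defaults for the rest.
print(max_seconds_until_unready(10))  # 31
```

Under these assumptions, the 20-second preStop sleep in the question may end before the readiness probe alone has marked the pod unready.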

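So my question is: to achieve a zero-downtime rollout, it seems the probes must learn that the pod is stopping before the pod actually stops. Is that correct, or am I doing something wrong?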

【Discussion】:

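What load balancer are you using?
@Jonas An Application Load Balancer on Amazon EKS.
Usually, to achieve zero downtime, the solution is to run multiple pods serving the same application, so that during a rollout the pods are restarted one at a time and there is no downtime. This is easy if the application is stateless; if it is stateful, it is harder.
@AndD I don't think the number of pods matters in this case. As long as a pod has stopped but the load balancer doesn't know it, clients may hit the stopped pod, which causes downtime. I don't think a blue/green deployment would have this problem.
Does the ALB send traffic to pod IPs or to NodePorts? (Both configurations are possible.)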

Tags: kubernetes

【Answer 1】:

After searching Google and doing some testing, I found that there is no need to manually return 500 to the probe after preStop.

According to the documentation:

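At the same time as the kubelet is starting graceful shutdown, the control plane removes that shutting-down Pod from Endpoints (and, if enabled, EndpointSlice) objects where these represent a Service with a configured selector. ReplicaSets and other workload resources no longer treat the shutting-down Pod as a valid, in-service replica. Pods that shut down slowly cannot continue to serve traffic as load balancers (like the service proxy) remove the Pod from the list of endpoints as soon as the termination grace period begins.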

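So a pod should stop receiving traffic once it starts shutting down. But I also found an issue reporting that there is in fact a delay between the pod starting to shut down and it actually being removed from the endpoints.

So instead of returning 500 to the probe in preStop, I simply sleep for 60 seconds in preStop, while /health keeps returning 200 with a status field that says whether the node is running or in preStop. Then I did a rollout and got the following results: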

b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717529.114602
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717530.59488
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717532.094305
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717533.5859041
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717535.086944
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717536.757241
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717538.57626
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717540.3773062
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717543.2204192
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717544.7196548
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717546.550169
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717548.01408
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717549.471266
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717551.387528
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717553.49984
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717555.404394
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717558.1528351
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717559.64011
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717561.294955
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717563.366436
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717564.972768

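After the preStop hook was called, node a5c387f5df30 still received traffic. After about 10 seconds, it stopped receiving any. So the window has nothing to do with what I do in preStop; it is purely a delay, and sleeping in preStop simply keeps the old pod able to answer those requests until the delay has passed.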
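The "about 10 seconds" can be checked directly from the timestamps. A small sketch, assuming the poll output has exactly the format shown above (a Python bytes literal, then "at", then a Unix timestamp); parse and prestop_window are hypothetical helper names:

```python
import ast
import json
import re

# Each poll line looks like:  b'<JSON body>' at <Unix timestamp>
LINE_RE = re.compile(r"^(b'.*') at ([0-9.]+)$")


def parse(lines):
    """Yield (timestamp, payload dict) for every line in the poll format."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            body = ast.literal_eval(m.group(1))  # the printed bytes literal
            yield float(m.group(2)), json.loads(body)


def prestop_window(lines):
    """Seconds from the first "prestop" response to the last response served
    by that same node, or None if no "prestop" response was seen."""
    records = list(parse(lines))
    prestop = [(ts, p["node_id"]) for ts, p in records if p["status"] == "prestop"]
    if not prestop:
        return None
    start, old_node = prestop[0]
    last_seen = max(ts for ts, p in records if p["node_id"] == old_node)
    return last_seen - start


# Excerpt of the log above: the last "running" line, the first and last
# "prestop" lines, and the first response from the new pod.
SAMPLE = """\
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"running"}' at 1612717538.57626
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717540.3773062
b'{"node_id":"a5c387f5df30","node_start_at":1612706851,"status":"prestop"}' at 1612717549.471266
b'{"node_id":"17733ca118f4","node_start_at":1612717537,"status":"running"}' at 1612717551.387528
"""

print(round(prestop_window(SAMPLE.splitlines()), 2))  # 9.09
```

The excerpt contains both endpoints of the window, so the full log gives the same result: roughly 9.1 seconds between the first "prestop" response and the last response from a5c387f5df30.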

I ran this test on AWS EKS with Fargate. I don't know how other k8s platforms behave.
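To repeat the measurement on another platform, a polling loop along these lines produces output in the same format as the log above. It is a sketch, not the script actually used in this answer; fetch_health and poll are hypothetical names, and the 1.5-second interval only approximates the spacing in the log:

```python
import time
import urllib.request
from typing import Callable


def fetch_health(url: str, timeout: float = 2.0) -> bytes:
    """GET the URL and return the raw response body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def poll(fetch: Callable[[], bytes], count: int, interval: float = 1.5,
         sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Call fetch() `count` times, printing each result with a Unix timestamp.

    Failures (e.g. a 502 Bad Gateway, which urllib raises as HTTPError, a
    subclass of OSError) are recorded instead of aborting the run, so any gap
    during a rollout shows up in the output.
    """
    lines = []
    for _ in range(count):
        try:
            line = f"{fetch()!r} at {time.time()}"
        except OSError as exc:
            line = f"error {exc!r} at {time.time()}"
        print(line)
        lines.append(line)
        sleep(interval)
    return lines
```

For example, run poll(lambda: fetch_health("http://ALB_DNS_NAME/health"), count=100), where ALB_DNS_NAME stands for your load balancer's hostname, while kubectl rollout restart deployment/app runs in another terminal.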

【Discussion】:

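【Answer 2】:

It all depends on what your app does when it receives the SIGTERM signal from Kubernetes. To shut your app down gracefully, you should listen for SIGTERM and drain all connections. In addition, you should start returning 500 from your readiness endpoint, which makes Kubernetes stop sending you new requests.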
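A minimal Python sketch of that pattern, assuming a standard-library HTTPServer whose readiness handler calls readiness_status(); ShutdownState, run and the 15-second drain_seconds default are hypothetical choices, not taken from the linked articles:

```python
import signal
import threading
import time
from http.server import HTTPServer


class ShutdownState:
    """Hypothetical helper: tracks whether SIGTERM has been received."""

    def __init__(self) -> None:
        self.terminating = threading.Event()

    def on_sigterm(self, signum, frame) -> None:
        # Only record the signal; the server keeps handling requests so that
        # traffic still routed here during the drain period succeeds.
        self.terminating.set()

    def readiness_status(self) -> int:
        # An httpGet probe treats any status outside 200-399 as a failure.
        return 500 if self.terminating.is_set() else 200


def run(server: HTTPServer, state: ShutdownState, drain_seconds: float = 15.0) -> None:
    """Serve until SIGTERM, keep serving for drain_seconds, then stop.

    Must be called from the main thread: signal.signal() raises ValueError elsewhere.
    """
    signal.signal(signal.SIGTERM, state.on_sigterm)
    worker = threading.Thread(target=server.serve_forever, daemon=True)
    worker.start()
    # Wake up periodically so the main thread never blocks indefinitely.
    while not state.terminating.wait(timeout=1.0):
        pass
    time.sleep(drain_seconds)  # keep serving while endpoints and the ALB catch up
    server.shutdown()          # returns once serve_forever() has exited
    server.server_close()      # release the listening socket
```

The drain sleep is what keeps the pod serving during the endpoint-removal delay measured in Answer 1. It has to fit inside terminationGracePeriodSeconds together with any preStop hook, because the kubelet sends SIGKILL once the grace period runs out.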

There are many articles on this topic that you can find on Google, for example:

https://www.driftrock.com/blog/kubernetes-zero-downtime-rolling-updates
https://learnk8s.io/graceful-shutdown

【Discussion】: