【Posted】: 2020-01-03 18:00:39
【Question】:
According to the official Kubernetes documentation:
Rolling updates allow Deployments' update to take place with zero downtime by incrementally updating Pods instances with new ones
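I am trying to perform a zero-downtime update using the Rolling Update strategy, which is the recommended way to update an application in a kube cluster.
Official reference: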
https://kubernetes.io/docs/tutorials/kubernetes-basics/update/update-intro/
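But after actually running it I am a little confused about the definition: the application still has downtime. Here is my cluster info at the start: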
liguuudeiMac:~ liguuu$ kubectl get all
NAME                                     READY   STATUS    RESTARTS   AGE
pod/ubuntu-b7d6cb9c6-6bkxz               1/1     Running   0          3h16m
pod/webapp-deployment-6dcf7b88c7-4kpgc   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-4vsch   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-7xzsk   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-jj8vx   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-qz2xq   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-s7rtt   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-s88tb   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-snmw5   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-v287f   1/1     Running   0          3m52s
pod/webapp-deployment-6dcf7b88c7-vd4kb   1/1     Running   0          3m52s

NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)          AGE
service/kubernetes          ClusterIP   10.96.0.1       <none>        443/TCP          3h16m
service/tc-webapp-service   NodePort    10.104.32.134   <none>        1234:31234/TCP   3m52s

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/ubuntu              1/1     1            1           3h16m
deployment.apps/webapp-deployment   10/10   10           10          3m52s

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/ubuntu-b7d6cb9c6               1         1         1       3h16m
replicaset.apps/webapp-deployment-6dcf7b88c7   10        10        10      3m52s
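deployment.apps/webapp-deployment is a Tomcat-based webapp, and the Service tc-webapp-service maps to the Pods that contain the Tomcat containers (the full deployment config files are at the end of this post). deployment.apps/ubuntu is just a standalone app in the cluster that sends HTTP requests to tc-webapp-service in an endless loop, so that I can trace the status of the so-called rolling update of webapp-deployment. The command run in the ubuntu container is roughly the following (an infinite loop that runs curl every 0.01 seconds):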
for ((;;)); do curl -sS -D - http://tc-webapp-service:1234 -o /dev/null | grep HTTP; date +"%Y-%m-%d %H:%M:%S"; echo ; sleep 0.01 ; done;
Output of the ubuntu app (everything is fine):
...
HTTP/1.1 200
2019-08-30 07:27:15
...
HTTP/1.1 200
2019-08-30 07:27:16
...
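Then I change the tag of the tomcat image from 8-jdk8 to 8-jdk11. Note that the rolling update strategy of deployment.apps/webapp-deployment has been configured correctly, with maxSurge: 0 and maxUnavailable: 9. (The result is the same if these two attributes are left at their defaults.)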
...
    spec:
      containers:
      - name: tc-part
        image: tomcat:8-jdk11   # changed from tomcat:8-jdk8
...
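As a rough check of what these two settings permit, the sketch below computes the bounds the Deployment controller maintains during a rollout. The function name `rollout_bounds` is hypothetical; the rounding rules for percentages (maxSurge rounds up, maxUnavailable rounds down) and the 25% defaults are taken from the Kubernetes Deployment documentation.

```shell
#!/usr/bin/env bash
# Sketch: bounds kept during a rolling update.
# Usage: rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# MAX_SURGE and MAX_UNAVAILABLE may be absolute numbers or percentages (e.g. 25%).
rollout_bounds() {
  local replicas=$1 surge=$2 unavailable=$3
  if [[ $surge == *% ]]; then
    # A percentage maxSurge is rounded up.
    surge=$(( (replicas * ${surge%\%} + 99) / 100 ))
  fi
  if [[ $unavailable == *% ]]; then
    # A percentage maxUnavailable is rounded down.
    unavailable=$(( replicas * ${unavailable%\%} / 100 ))
  fi
  echo "min_available=$(( replicas - unavailable )) max_total=$(( replicas + surge ))"
}

rollout_bounds 10 0 9        # settings used in this question
rollout_bounds 10 25% 25%    # default settings
```

With the settings from this question the rollout may drop to a single available Pod and never exceeds 10 Pods in total; with the defaults it keeps at least 8 available and at most 13 in total. These bounds only count Pods. They say nothing about requests that are already in flight on a Pod when it is terminated.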
Then, the output of the ubuntu app:
HTTP/1.1 200
2019-08-30 07:47:43
curl: (56) Recv failure: Connection reset by peer
2019-08-30 07:47:43
HTTP/1.1 200
2019-08-30 07:47:44
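As shown above, some HTTP requests fail, which is without doubt an interruption of the application during a rolling update in a kube cluster.
However, I can also reproduce the same situation (interruption) when scaling down, with the command below (from 10 to 2):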
kubectl scale deployment.apps/webapp-deployment --replicas=2
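After the tests above, I wonder whether the so-called zero downtime really means what it says. Although this way of simulating HTTP requests is a bit artificial, the situation is quite normal for applications designed to handle thousands or millions of requests per second.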
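To put a number on the interruption rather than eyeballing the terminal, the probe output could be tallied. This is a minimal sketch: the function name `tally_probe_log` is hypothetical, and it assumes the loop's stdout and stderr were both captured, because curl writes its error messages to stderr.

```shell
#!/usr/bin/env bash
# Sketch: summarise the probe loop's output, read from stdin.
# Lines starting with "HTTP/<version> 2xx" count as ok;
# lines starting with "curl: (" count as failed.
tally_probe_log() {
  awk '
    /^HTTP\/[0-9.]+ 2[0-9][0-9]/ { ok++ }
    /^curl: \(/                  { failed++ }
    END { printf "ok=%d failed=%d\n", ok, failed }
  '
}

# Example using the three results shown in the question.
tally_probe_log <<'EOF'
HTTP/1.1 200
2019-08-30 07:47:43
curl: (56) Recv failure: Connection reset by peer
2019-08-30 07:47:43
HTTP/1.1 200
2019-08-30 07:47:44
EOF
```

For a real run, the loop's output could be redirected with `> probe.log 2>&1` (a hypothetical file name) and then fed in with `tally_probe_log < probe.log`.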
Environment:
liguuudeiMac:cacheee liguuu$ minikube version
minikube version: v1.3.1
commit: ca60a424ce69a4d79f502650199ca2b52f29e631
liguuudeiMac:cacheee liguuu$ kubectl version
Client Version: version.Info{Major:"1", Minor:"14", GitVersion:"v1.14.3", GitCommit:"5e53fd6bc17c0dec8434817e69b04a25d8ae0ff0", GitTreeState:"clean", BuildDate:"2019-06-06T01:44:30Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.2", GitCommit:"f6278300bebbb750328ac16ee6dd3aa7d3549568", GitTreeState:"clean", BuildDate:"2019-08-05T09:15:22Z", GoVersion:"go1.12.5", Compiler:"gc", Platform:"linux/amd64"}
Deployment and Service configuration:
# Service
apiVersion: v1
kind: Service
metadata:
  name: tc-webapp-service
spec:
  type: NodePort
  selector:
    appName: tc-webapp
  ports:
  - name: tc-svc
    protocol: TCP
    port: 1234
    targetPort: 8080
    nodePort: 31234
---
# Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webapp-deployment
spec:
  replicas: 10
  selector:
    matchLabels:
      appName: tc-webapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 9
  # Pod Templates
  template:
    metadata:
      labels:
        appName: tc-webapp
    spec:
      containers:
      - name: tc-part
        image: tomcat:8-jdk8
        ports:
        - containerPort: 8080
        livenessProbe:
          tcpSocket:
            port: 8080
          initialDelaySeconds: 10
          periodSeconds: 10
        readinessProbe:
          httpGet:
            scheme: HTTP
            port: 8080
            path: /
          initialDelaySeconds: 5
          periodSeconds: 1
【Comments】:
-
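During a rolling update, new Pods are created; once they are ready, traffic is shifted to the new Pods and the old Pods are terminated. If you submit a request that is handled by an old pod, and the traffic is shifted to the new pods and the old pod is killed before it finishes that request, the request will get a connection reset error. I wonder whether that is what you are seeing. Technically there is no downtime there: the system never stops accepting requests.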
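One mitigation often suggested for this kind of race is a preStop hook that delays container shutdown, giving the Service time to stop routing to the Pod before Tomcat exits. Neither the question nor the comment proposes it, so treat the following as a sketch under assumptions: the function name `write_prestop_patch` and the file name are hypothetical, the 10-second sleep and 40-second grace period are illustrative guesses, and it assumes the tomcat image ships a `sleep` binary.

```shell
#!/usr/bin/env bash
# Sketch: write a strategic-merge patch that adds a preStop delay to the
# tc-part container. Durations are illustrative, not measured values.
write_prestop_patch() {
  cat > "$1" <<'EOF'
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 40
      containers:
      - name: tc-part
        lifecycle:
          preStop:
            exec:
              command: ["sleep", "10"]
EOF
}

write_prestop_patch prestop-patch.yaml
cat prestop-patch.yaml

# To apply it to the Deployment from this question (needs cluster access):
#   kubectl patch deployment webapp-deployment --patch "$(cat prestop-patch.yaml)"
```

Whether this removes every connection reset depends on how long endpoint updates take to propagate in a given cluster, so rerunning the curl loop after applying the patch would be the way to check.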
Tags: deployment kubernetes replicaset