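【Posted】: 2021-09-15 10:49:13
【Question】:
I set up a liveness probe for a long-running application in a pod. It fails several times a day, and each time the pod gets restarted. There is no readiness probe.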
livenessProbe:
  httpGet:
    path: /
    port: http
    scheme: HTTP
  initialDelaySeconds: 30
  timeoutSeconds: 20
  periodSeconds: 20
  successThreshold: 1
  failureThreshold: 3
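With failureThreshold: 3, a single slow or failed probe does not restart the container. The kubelet restarts it only after three consecutive failures, and any success resets the count (successThreshold must be 1 for liveness probes). So each restart means three probes in a row, spaced periodSeconds apart, each failed or got no answer within timeoutSeconds. The Python sketch below models that counting on a list of probe outcomes; it is a simplified model, not kubelet code, and restart_points is a hypothetical name.

```python
from typing import Iterable, List


def restart_points(results: Iterable[bool], failure_threshold: int = 3) -> List[int]:
    """Return the indices of probe attempts at which a restart would be triggered.

    Simplified model of liveness-probe semantics (hypothetical helper):
    a restart happens once `failure_threshold` consecutive probes fail,
    and any successful probe resets the failure counter.
    """
    restarts: List[int] = []
    consecutive_failures = 0
    for i, ok in enumerate(results):
        if ok:
            # One success is enough to reset (successThreshold is 1 for liveness).
            consecutive_failures = 0
            continue
        consecutive_failures += 1
        if consecutive_failures >= failure_threshold:
            restarts.append(i)
            # The container is restarted, so counting starts again from zero.
            consecutive_failures = 0
    return restarts
```

For instance, restart_points([True, False, False, True, False, False, False]) returns [6]: the first two failures are cancelled by the success that follows, and only the third failure of the second run triggers a restart.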
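Further checks of the application code and the Docker image turned up nothing unusual. So I disabled the liveness probe and instead used a Python script, running on a PC connected to the network, to probe the NodePort service manually every 10 seconds. Even though the manual probe is more frequent and stricter than the liveness probe, it kept succeeding and never failed. Each ping took about 200 to 400 ms.
The manual probe is roughly equivalent to a liveness probe configured with: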
timeoutSeconds: 500ms
periodSeconds: 10
successThreshold: 1
failureThreshold: 1
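For reference, a manual probe like the one described can be sketched in Python with only the standard library. This is an illustration under assumptions, not the author's actual script: PROBE_URL is a placeholder node address and NodePort, and probe_once and run are hypothetical names. It mirrors the settings above (0.5 s timeout, 10 s period, failureThreshold 1) and, as an httpGet probe does, treats status codes 200 to 399 as success.

```python
import http.client
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Placeholder (hypothetical) node IP and NodePort; substitute your own.
PROBE_URL = "http://203.0.113.10:30080/"


def probe_once(url: str, timeout: float = 0.5) -> Tuple[bool, float]:
    """Send one GET request and return (success, elapsed_seconds).

    A status from 200 to 399 counts as success; other statuses, connection
    errors and timeouts count as failure. Note: urllib's timeout applies to
    each socket operation, not the whole request, so it is slightly looser
    than a total-request timeout.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            ok = 200 <= resp.status < 400
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for error statuses such as 404 or 500.
        ok = 200 <= err.code < 400
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, DNS failures and timeouts end up here.
        ok = False
    return ok, time.monotonic() - start


def run(url: str, period: float = 10.0, timeout: float = 0.5,
        failure_threshold: int = 1, max_checks: Optional[int] = None) -> int:
    """Probe `url` every `period` seconds and return the number of failed probes."""
    failures_in_a_row = 0
    total_failures = 0
    checks = 0
    while max_checks is None or checks < max_checks:
        ok, elapsed = probe_once(url, timeout)
        checks += 1
        print(f"{time.strftime('%H:%M:%S')} ok={ok} {elapsed * 1000:.0f} ms")
        if ok:
            failures_in_a_row = 0
        else:
            failures_in_a_row += 1
            total_failures += 1
            if failures_in_a_row >= failure_threshold:
                print("a liveness probe with these settings would restart the container here")
                failures_in_a_row = 0
        # Keep probe start times roughly `period` seconds apart.
        time.sleep(max(0.0, period - elapsed))
    return total_failures
```

Calling run(PROBE_URL) probes indefinitely; pass max_checks to stop after a fixed number of attempts. Pointing the same code at a URL such as http://localhost:8080/ from inside the pod would be one way to follow the in-pod suggestion from the discussion.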
Why does the manual probe succeed while the liveness probe fails? Does this point to a Kubernetes networking problem?
Pod manifest:
kind: Pod
apiVersion: v1
metadata:
  name: pypi-pypiserver-74b689df7-rh9bm
  namespace: default
  labels:
    app.kubernetes.io/instance: pypi
    app.kubernetes.io/name: pypiserver
spec:
  volumes:
    - name: secrets
      secret:
        secretName: pypi-pypiserver
        defaultMode: 420
    - name: packages
      persistentVolumeClaim:
        claimName: pypi-pypiserver
    - name: default-token-cx7m7
      secret:
        secretName: default-token-cx7m7
        defaultMode: 420
  containers:
    - name: pypiserver
      image: 'registry.digitalocean.com/evergreen/pypiserver:latest'
      args:
        - run
        - '--passwords=.'
        - '--authenticate=.'
        - '--port=8080'
        - '--welcome=/dev/null'
        - '--server=wsgiref'
        - /data/packages
      ports:
        - name: http
          containerPort: 8080
          protocol: TCP
      resources:
        limits:
          cpu: 1600m
          memory: 1Gi
        requests:
          cpu: 400m
          memory: 256Mi
      volumeMounts:
        - name: packages
          mountPath: /data/packages
          mountPropagation: None
        - name: secrets
          readOnly: true
          mountPath: /config
        - name: default-token-cx7m7
          readOnly: true
          mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      livenessProbe:
        httpGet:
          path: /
          port: http
          scheme: HTTP
        initialDelaySeconds: 30
        timeoutSeconds: 10
        periodSeconds: 10
        successThreshold: 1
        failureThreshold: 3
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      imagePullPolicy: IfNotPresent
  restartPolicy: Always
  terminationGracePeriodSeconds: 30
  dnsPolicy: ClusterFirst
  nodeSelector:
    doks.digitalocean.com/node-pool: k8s-node-pool-hive-dev-2
  serviceAccountName: default
  serviceAccount: default
  nodeName: k8s-node-pool-hive-dev-2-8adyc
  securityContext:
    runAsUser: 9898
    runAsGroup: 9898
    fsGroup: 9898
  imagePullSecrets:
    - name: evergreen
  schedulerName: default-scheduler
  tolerations:
    - key: node.kubernetes.io/not-ready
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
    - key: node.kubernetes.io/unreachable
      operator: Exists
      effect: NoExecute
      tolerationSeconds: 300
  priority: 0
  enableServiceLinks: true
  preemptionPolicy: PreemptLowerPriority
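The liveness probe names its port (port: http) rather than giving a number, so the kubelet looks that name up in the container's ports list; in this manifest it resolves to containerPort 8080. Because httpGet.host is not set, the probe is sent by the kubelet on the node to the pod IP, not through the NodePort Service that the manual script used. The Python sketch below reproduces that lookup on a trimmed, hand-copied excerpt of the manifest; resolve_probe_port and probe_url are hypothetical helpers, not Kubernetes APIs.

```python
from typing import Any, Dict, Union

# Trimmed excerpt of the pypiserver container spec from the manifest above.
CONTAINER: Dict[str, Any] = {
    "name": "pypiserver",
    "ports": [{"name": "http", "containerPort": 8080, "protocol": "TCP"}],
    "livenessProbe": {
        "httpGet": {"path": "/", "port": "http", "scheme": "HTTP"},
    },
}


def resolve_probe_port(container: Dict[str, Any]) -> int:
    """Hypothetical helper: turn httpGet.port (a number or a port name) into a number."""
    port: Union[int, str] = container["livenessProbe"]["httpGet"]["port"]
    if isinstance(port, int):
        return port
    # Named port: look it up among the container's declared ports.
    for p in container.get("ports", []):
        if p.get("name") == port:
            return p["containerPort"]
    raise ValueError(f"no container port named {port!r}")


def probe_url(container: Dict[str, Any], pod_ip: str) -> str:
    """Hypothetical helper: URL an httpGet probe targets when host is unset (pod IP)."""
    get = container["livenessProbe"]["httpGet"]
    scheme = get.get("scheme", "HTTP").lower()
    return f"{scheme}://{pod_ip}:{resolve_probe_port(container)}{get.get('path', '/')}"
```

With a hypothetical pod IP of 10.244.0.15, probe_url(CONTAINER, "10.244.0.15") returns "http://10.244.0.15:8080/".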
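【Discussion】:
-
Try running the same script inside a pod, ideally the very pod whose liveness probe is failing, and check whether you get the same results. The network shouldn't be the problem, since the pod is essentially pinging itself.
-
@PawełGrondal If the self-ping from inside the pod fails while the NodePort ping succeeds, what would that mean?
-
What exactly does the log say when the probe fails? Does the pod have a port named "http"? Can you paste the pod YAML here?
-
The usual one: liveness probe failed, context deadline exceeded. You must have seen it a thousand times. I've edited the question and added the pod YAML.
-
.spec.containers.ports.protocol is TCP, but .spec.containers.livenessProbe.httpGet.scheme is HTTP. Are you sure that's correct?
Tags: kubernetes kubernetes-pod livenessprobe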