【发布时间】:2019-09-11 21:49:41
【问题描述】:
我在 K8s 上有一个多部署应用程序,突然 DNS 随机失败,因为其中一个组件(部署程序)。如果我使用另一个组件(网桥)的服务名称或服务 IP 随机运行 curl 命令,则从部署程序 pod 内部得到:
curl -v http://bridge:9998
* Could not resolve host: bridge
* Expire in 200 ms for 1 (transfer 0x555f0636fdd0)
* Closing connection 0
curl: (6) Could not resolve host: bridge
但如果我使用桥接 pod 的 IP,它可以解析并连接:
curl -v http://10.36.0.25:9998
* Expire in 0 ms for 6 (transfer 0x558d6c3eadd0)
* Trying 10.36.0.25...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x558d6c3eadd0)
* Connected to 10.36.0.25 (10.36.0.25) port 9998 (#0)
> GET / HTTP/1.1
> Host: 10.36.0.25:9998
> User-Agent: curl/7.64.0
> Accept: */*
>
< HTTP/1.1 200 OK
< X-Powered-By: Express
< Accept-Ranges: bytes
< Cache-Control: public, max-age=0
< Last-Modified: Mon, 08 Apr 2019 14:06:42 GMT
< ETag: W/"179-169fd45c550"
< Content-Type: text/html; charset=UTF-8
< Content-Length: 377
< Date: Wed, 11 Sep 2019 08:25:24 GMT
< Connection: keep-alive
还有我的部署器 yaml 文件:
---
apiVersion: v1
kind: Service
metadata:
annotations:
Process: deployer
creationTimestamp: null
labels:
io.kompose.service: deployer
name: deployer
spec:
ports:
- name: "8004"
port: 8004
targetPort: 8004
selector:
io.kompose.service: deployer
status:
loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
Process: deployer
creationTimestamp: null
labels:
io.kompose.service: deployer
name: deployer
spec:
replicas: 1
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: deployer
spec:
containers:
- args:
- bash
- -c
- lttng create && python src/rest.py
env:
- name: CONFIG_OVERRIDE
value: {{ .Values.CONFIG_OVERRIDE | quote}}
- name: WWS_RTMP_SERVER_URL
value: {{ .Values.WWS_RTMP_SERVER_URL | quote}}
- name: WWS_DEPLOYER_DEFAULT_SITE
value: {{ .Values.WWS_DEPLOYER_DEFAULT_SITE | quote}}
image: {{ .Values.image }}
name: deployer
readinessProbe:
exec:
command:
- ls
- /tmp
initialDelaySeconds: 5
periodSeconds: 5
ports:
- containerPort: 8004
resources:
requests:
cpu: 0.1
memory: 250Mi
limits:
cpu: 2
memory: 5Gi
restartPolicy: Always
imagePullSecrets:
- name: deployersecret
status: {}
正如我提到的,这仅发生在这个组件上,我从其他 pod 中运行了完全相同的命令,它可以正常工作。知道如何解决这个问题吗?
更新
由于人们犯了这个错误,我更多地描述了这种情况:上面的 yaml 文件属于面临这个问题的组件(其他组件正常工作),而 curl 命令是我从这个问题的 pod 中运行的命令。如果我在另一个 pod 中运行完全相同的命令,它会解决。 以下是目标的部署和服务供您参考:
apiVersion: v1
kind: Service
metadata:
annotations:
Process: bridge
creationTimestamp: null
labels:
io.kompose.service: bridge
name: bridge
spec:
ports:
- name: "9998"
port: 9998
targetPort: 9998
- name: "9226"
port: 9226
targetPort: 9226
- name: 9226-udp
port: 9226
protocol: UDP
targetPort: 9226
selector:
io.kompose.service: bridge
status:
loadBalancer: {}
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
annotations:
Process: bridge
creationTimestamp: null
labels:
io.kompose.service: bridge
name: bridge
spec:
replicas: 1
strategy: {}
template:
metadata:
creationTimestamp: null
labels:
io.kompose.service: bridge
spec:
containers:
- args:
- bash
- -c
- npm run startDebug
env:
- name: NODE_ENV
value: {{ .Values.NODE_ENV | quote }}
image: {{ .Values.image }}
name: bridge
readinessProbe:
httpGet:
port: 9998
initialDelaySeconds: 3
periodSeconds: 15
ports:
- containerPort: 9998
- containerPort: 9226
- containerPort: 9226
protocol: UDP
resources:
requests:
cpu: 0.1
memory: 250Mi
limits:
cpu: 2
memory: 5Gi
restartPolicy: Always
imagePullSecrets:
- name: bridgesecret
status: {}
【问题讨论】:
标签: kubernetes dns