【发布时间】:2020-11-09 00:38:06
【问题描述】:
我正在尝试解决我们的 K8 集群 v1.19 中的 DNS 问题。有 3 个节点(1 个控制器,2 个工作器)都运行 vanilla Ubuntu 20.04,使用带有 Metallb 的 Calico 网络进行入站负载平衡。这一切都是在本地托管的,并且可以完全访问互联网。在它前面还有一个代理服务器 (Traefik),负责处理通往 K8 集群和基础设施中其他服务的 SSL。
当我升级已经/正在连接到 redis pod 的 helm chart 时发生了这个问题,但过去 36 天一直很高兴运行。
在其中一个 pod 的日志中,它显示一个错误,它无法确定 redis pod(s) 在哪里:
2020-11-09 00:00:00 [1] [verbose]: [Cache] Attempting connection to redis.
2020-11-09 00:00:00 [1] [verbose]: [Cache] Successfully connected to redis.
2020-11-09 00:00:00 [1] [verbose]: [PubSub] Attempting connection to redis.
2020-11-09 00:00:00 [1] [verbose]: [PubSub] Successfully connected to redis.
2020-11-09 00:00:00 [1] [warn]: Secret key is weak. Please consider lengthening it for better security.
2020-11-09 00:00:00 [1] [verbose]: [Database] Connecting to database...
2020-11-09 00:00:00 [1] [info]: [Database] Successfully connected .
2020-11-09 00:00:00 [1] [verbose]: [Database] Ran 0 migration(s).
2020-11-09 00:00:00 [1] [verbose]: Sending request for public key.
Error: getaddrinfo EAI_AGAIN oct-2020-redis-master
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26) {
errno: -3001,
code: 'EAI_AGAIN',
syscall: 'getaddrinfo',
hostname: 'oct-2020-redis-master'
}
[ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN oct-2020-redis-master
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26)
Error: connect ETIMEDOUT
at Socket.<anonymous> (/app/node_modules/ioredis/built/redis/index.js:307:37)
at Object.onceWrapper (events.js:421:28)
at Socket.emit (events.js:315:20)
at Socket.EventEmitter.emit (domain.js:486:12)
at Socket._onTimeout (net.js:483:8)
at listOnTimeout (internal/timers.js:554:17)
at processTimers (internal/timers.js:497:7) {
errorno: 'ETIMEDOUT',
code: 'ETIMEDOUT',
syscall: 'connect'
}
我已经完成了https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/中列出的步骤
ubuntu@k8-01:~$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
ubuntu@k8-01:~$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lfm5t 1/1 Running 17 37d
coredns-f9fd979d6-sw2qp 1/1 Running 18 37d
ubuntu@k8-01:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.210.238:34288 - 28733 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.001300712s
[INFO] 10.244.210.238:44532 - 12032 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.001279312s
[INFO] 10.244.210.235:44595 - 65094 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.000163001s
[INFO] 10.244.210.235:55945 - 20758 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.000141202s
ubuntu@k8-01:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default oct-2020-api ClusterIP 10.107.89.213 <none> 80/TCP 37d
default oct-2020-nginx-ingress-controller LoadBalancer 10.110.235.175 192.168.2.150 80:30194/TCP,443:31514/TCP 37d
default oct-2020-nginx-ingress-default-backend ClusterIP 10.98.147.246 <none> 80/TCP 37d
default oct-2020-redis-headless ClusterIP None <none> 6379/TCP 37d
default oct-2020-redis-master ClusterIP 10.109.58.236 <none> 6379/TCP 37d
default oct-2020-webclient ClusterIP 10.111.204.251 <none> 80/TCP 37d
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 37d
kube-system coredns NodePort 10.101.104.114 <none> 53:31245/UDP 15h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 37d
当我进入吊舱时:
/app # grep "nameserver" /etc/resolv.conf
nameserver 10.96.0.10
/app # nslookup
BusyBox v1.31.1 () multi-call binary.
Usage: nslookup [-type=QUERY_TYPE] [-debug] HOST [DNS_SERVER]
Query DNS about HOST
QUERY_TYPE: soa,ns,a,aaaa,cname,mx,txt,ptr,any
/app # ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10): 56 data bytes
^C
--- 10.96.0.10 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
/app # nslookup oct-20-redis-master
;; connection timed out; no servers could be reached
任何有关故障排除的想法将不胜感激。
【问题讨论】:
-
你的 kube-proxy 工作了吗? kubernetes.io/docs/tasks/debug-application-cluster/…
-
@MariuszK。 kube 代理正在工作。从集群外部,我可以访问端口 443 上的 Web 客户端 Pod,它可以正确呈现。
标签: kubernetes redis coredns