Pods are unable to connect to internal Kubernetes service

【Question title】: Pods are unable to connect to internal Kubernetes service
【Posted】: 2019-04-15 18:07:21
【Question】:

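I'm having trouble with CoreDNS on some nodes: the pods are in CrashLoopBackOff state because of errors when trying to reach the internal kubernetes service.

This is a new K8s cluster deployed with Kubespray, with Weave as the network layer, running Kubernetes 1.12.5 on OpenStack. I have already tested connectivity to the endpoints and can reach, for example, 10.2.70.14:6443 without any problem. However, telnet from the pods to 10.233.0.1:443 fails.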
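The telnet check described above can be scripted so that it is easy to repeat from several pods or nodes. This is only a minimal sketch, assuming a Python 3 interpreter is available wherever it runs (the question does not say so). The helper names `can_connect` and `report` are hypothetical; the two addresses are the ones quoted in the question.

```python
import socket
from typing import Iterable, List, Tuple


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False


def report(targets: Iterable[Tuple[str, int]]) -> List[str]:
    """Build one human-readable result line per (host, port) target."""
    lines = []
    for host, port in targets:
        status = "ok" if can_connect(host, port) else "FAILED"
        lines.append(f"{host}:{port} {status}")
    return lines
```

Calling `report([("10.2.70.14", 6443), ("10.233.0.1", 443)])` from inside a pod compares a direct API server endpoint with the ClusterIP of the kubernetes service. A TCP connect only shows whether the port is reachable; it says nothing about TLS or authentication.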

Thanks in advance for your help.

kubectl describe svc kubernetes
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.233.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         10.2.70.14:6443,10.2.70.18:6443,10.2.70.27:6443 + 2 more...
Session Affinity:  None
Events:            <none>

And from the CoreDNS logs:

E0415 17:47:05.453762       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.233.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:05.456909       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.233.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused
E0415 17:47:06.453258       1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.233.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.233.0.1:443: connect: connection refused

In addition, checking the kube-proxy logs on one of the affected nodes showed the following error:

I0415 19:14:32.162909       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.36:6443
I0415 19:14:32.162979       1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.36:6443: 1 ActiveConn, 0 InactiveConn
I0415 19:14:32.162989       1 graceful_termination.go:160] Trying to delete rs: 10.233.0.1:443/TCP/10.2.70.18:6443
I0415 19:14:32.163017       1 graceful_termination.go:171] Not deleting, RS 10.233.0.1:443/TCP/10.2.70.18:6443: 1 ActiveConn, 0 InactiveConn
E0415 19:14:32.215707       1 proxier.go:430] Failed to execute iptables-restore for nat: exit status 1 (iptables-restore: line 7 failed
)

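【Comments】:

Could you check on the master with kubectl get pods --all-namespaces? Please check the status of the coredns pods. If the STATUS is ContainerCreating, you may have to delete them so that new ones are created.
The coredns status is CrashLoopBackOff, and none of my pods are in ContainerCreating.
I have the same problem. How did you solve it?
Got it, adding my solution as an answer.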

Tags: kubernetes kubespray

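【Answer 1】:

I had exactly the same problem, and it turned out that my kubespray configuration was wrong, specifically the nginx ingress setting ingress_nginx_host_network.

You have to set ingress_nginx_host_network: true (it defaults to false).

If you don't want to rerun the whole kubespray playbook, edit the nginx ingress DaemonSet: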

$ kubectl -n ingress-nginx edit ds ingress-nginx-controller

Add --report-node-internal-ip-address to the container's command-line arguments:

spec:
  template:
    spec:
      containers:
        - args: # abridged: other container fields omitted
            - /nginx-ingress-controller
            - --configmap=$(POD_NAMESPACE)/ingress-nginx
            - --tcp-services-configmap=$(POD_NAMESPACE)/tcp-services
            - --udp-services-configmap=$(POD_NAMESPACE)/udp-services
            - --annotations-prefix=nginx.ingress.kubernetes.io
            - --report-node-internal-ip-address # <- new

Set the following two properties in the pod spec, at the same level as the existing serviceAccountName: ingress-nginx:

serviceAccountName: ingress-nginx
hostNetwork: true # <- new
dnsPolicy: ClusterFirstWithHostNet  # <- new

Then save and exit (:wq), and check the pod status with kubectl get pods --all-namespaces.
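On a cluster with many pods, scanning that listing by eye is tedious. The following is a rough sketch, not part of the original answer, that filters the plain-text table printed by kubectl get pods --all-namespaces down to pods that are neither Running nor Completed. The function names `unhealthy_pods` and `current_unhealthy_pods` are hypothetical, and the sketch assumes the default column layout with NAMESPACE, NAME, and STATUS headers.

```python
import subprocess
from typing import List, Tuple


def unhealthy_pods(table: str) -> List[Tuple[str, str, str]]:
    """Return (namespace, name, status) for pods not Running or Completed.

    `table` is the plain-text output of `kubectl get pods --all-namespaces`.
    """
    lines = [line for line in table.splitlines() if line.strip()]
    if not lines:
        return []
    # Locate columns by header name rather than hard-coding positions.
    header = lines[0].split()
    ns_col = header.index("NAMESPACE")
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    bad = []
    for line in lines[1:]:
        fields = line.split()
        status = fields[status_col]
        if status not in ("Running", "Completed"):
            bad.append((fields[ns_col], fields[name_col], status))
    return bad


def current_unhealthy_pods() -> List[Tuple[str, str, str]]:
    """Run kubectl (assumed to be on PATH and configured) and filter its output."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces"],
        capture_output=True, text=True, check=True,
    )
    return unhealthy_pods(result.stdout)
```

After the DaemonSet edit, calling `current_unhealthy_pods()` on a machine with kubectl access should stop listing the coredns pods once they recover. Pods that are Running but not yet ready are not flagged, since only the STATUS column is inspected.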

Source: https://github.com/kubernetes-sigs/kubespray/issues/4357
