Rancher Kube API Health Check Failure via Cluster API Build: Answer

[Question title]: Rancher Kube API Health Check Failure via Cluster API Build
[Posted]: 2019-10-17 16:19:33
[Question]:

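I am trying to create a new cluster through Rancher API calls driven by Ansible, with AWS as the cloud provider. I have already generated the required node templates and started triggering the build.

When I trigger cluster creation from the UI with the node templates I built, the cluster is created successfully, as expected.
When I trigger it from code, most of the cluster is deployed, but the health check fails.

Building through the UI works every time.

I have also tried changing the API call parameters, but none of the changes made a difference.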

      # POST the cluster definition to Rancher and print the ID of the new cluster
      shell: "curl -s 'https://{{ rancher_server }}/v3/cluster' -H 'content-type: application/json' -H 'Authorization: Bearer {{ racherlogintoken.stdout }}' --data-binary '{\"dockerRootDir\":\"/var/lib/docker\",\"enableNetworkPolicy\":false,\"type\":\"cluster\",\"rancherKubernetesEngineConfig\":{\"addonJobTimeout\":30,\"ignoreDockerVersion\":true,\"kubernetesVersion\": \"v1.11.5-rancher1-1\",\"sshAgentAuth\":false,\"type\":\"rancherKubernetesEngineConfig\",\"authentication\":{\"type\":\"authnConfig\",\"strategy\":\"x509\"},\"network\":{\"type\":\"networkConfig\",\"plugin\":\"calico\"}, \"cloudProvider\":{\"awsCloudProvider\":{\"type\":\"/v3/schemas/awsCloudProvider\"}, \"name\":\"aws\", \"type\":\"/v3/schemas/cloudProvider\"},\"monitoring\":{\"type\":\"monitoringConfig\",\"provider\":\"metrics-server\"}, \"services\":{\"type\":\"rkeConfigServices\",\"kubeApi\":{\"podSecurityPolicy\":false,\"type\":\"kubeAPIService\"},\"etcd\":{\"snapshot\":false,\"type\":\"etcdService\",\"extraArgs\":{\"heartbeat-interval\":\"500\",\"election-timeout\":\"5000\"}}}},\"name\":\"{{ mdio_cluster_name }}\"}' --insecure | jq -r .id"

Errors:

      2019/06/01 07:40:28 [ERROR] cluster [c-sgd2w] provisioning: [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [x.x.x.x]: Get https://localhost:6443/healthz: read tcp [::1]:60288->[::1]:6443: read: connection reset by peer, log: I0601 07:40:24.813709       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
      2019/06/01 07:40:28 [ERROR] ClusterController c-sgd2w [cluster-provisioner-controller] failed with : [controlPlane] Failed to bring up Control Plane: Failed to verify healthcheck: Failed to check https://localhost:6443/healthz for service [kube-apiserver] on host [x.x.x.x]: Get https://localhost:6443/healthz: read tcp [::1]:60288->[::1]:6443: read: connection reset by peer, log: I0601 07:40:24.813709       1 plugins.go:161] Loaded 6 validating admission controller(s) successfully in the following order: LimitRanger,ServiceAccount,Priority,PersistentVolumeClaimResize,ValidatingAdmissionWebhook,ResourceQuota.
      2019/06/01 07:40:30 [INFO] 2019/06/01 07:40:30 http: multiple response.WriteHeader calls
      2019/06/01 07:40:40 [INFO] 2019/06/01 07:40:40 http: multiple response.WriteHeader calls
      2019/06/01 07:40:50 [INFO] 2019/06/01 07:40:50 http: multiple response.WriteHeader calls
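The log says Rancher got no usable answer from `https://localhost:6443/healthz` on the control-plane host. One way to look at the same thing by hand is sketched below. It assumes RKE's default container names (`etcd`, `kube-apiserver`) and the default secure port 6443; `probe_healthz` and `show_logs` are hypothetical helper names, not part of Rancher or RKE.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helpers, meant to be sourced on the control-plane
# node named in the error (the [x.x.x.x] host). Assumes RKE's default
# container names "etcd" and "kube-apiserver" and the default port 6443.
set -u

# Print the HTTP status returned by the apiserver health endpoint.
# No client certificate is sent, so 401/403 still means "listening";
# 000 means the connection itself failed (refused, reset, or timed out).
probe_healthz() {
  local url="${1:-https://localhost:6443/healthz}"
  local code
  code=$(curl -sk --max-time 5 -o /dev/null -w '%{http_code}' "$url") || code="000"
  echo "$code"
}

# Show the most recent log lines of the containers the health check depends on.
show_logs() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker not found; run this on the node itself" >&2
    return 1
  fi
  local c
  for c in etcd kube-apiserver; do
    echo "=== last 20 log lines of $c ==="
    docker logs --tail 20 "$c" 2>&1
  done
}
```

After sourcing the file on the node, run `probe_healthz` and then `show_logs`. A `000` matches the reset seen above, and the container logs usually say why the apiserver (or the etcd it depends on) is not serving.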

[Discussion]:

Tags: rancher

[Answer 1]:

It looks like the Calico network plugin was causing the problem. After switching to Canal, everything worked.
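Applying this answer to the request in the question means changing only the `network.plugin` value from `calico` to `canal`. The sketch below restates that request as a standalone script with the plugin as a parameter. `RANCHER_SERVER`, `RANCHER_TOKEN`, `build_cluster_body` and `create_cluster` are hypothetical names standing in for the Ansible variables. The etcd flag is spelled `heartbeat-interval`, and the extra-arg values are passed as strings.

```shell
#!/usr/bin/env bash
# Sketch only: the cluster-create request from the question with the
# network plugin parameterised. RANCHER_SERVER and RANCHER_TOKEN are
# hypothetical environment variables replacing the Ansible variables.
set -u

# Print the JSON request body. $1 = cluster name, $2 = plugin (default canal).
build_cluster_body() {
  local name="$1" plugin="${2:-canal}"
  cat <<EOF
{"type":"cluster","name":"${name}","dockerRootDir":"/var/lib/docker","enableNetworkPolicy":false,
 "rancherKubernetesEngineConfig":{"type":"rancherKubernetesEngineConfig",
  "addonJobTimeout":30,"ignoreDockerVersion":true,"kubernetesVersion":"v1.11.5-rancher1-1",
  "sshAgentAuth":false,
  "authentication":{"type":"authnConfig","strategy":"x509"},
  "network":{"type":"networkConfig","plugin":"${plugin}"},
  "cloudProvider":{"type":"/v3/schemas/cloudProvider","name":"aws",
   "awsCloudProvider":{"type":"/v3/schemas/awsCloudProvider"}},
  "monitoring":{"type":"monitoringConfig","provider":"metrics-server"},
  "services":{"type":"rkeConfigServices",
   "kubeApi":{"type":"kubeAPIService","podSecurityPolicy":false},
   "etcd":{"type":"etcdService","snapshot":false,
    "extraArgs":{"heartbeat-interval":"500","election-timeout":"5000"}}}}}
EOF
}

# POST the body to Rancher and print the new cluster's ID (not run on source).
create_cluster() {
  curl -s "https://${RANCHER_SERVER}/v3/cluster" \
    -H 'content-type: application/json' \
    -H "Authorization: Bearer ${RANCHER_TOKEN}" \
    --data-binary "$(build_cluster_body "$1" "${2:-canal}")" \
    --insecure | jq -r .id
}
```

With the two variables exported, `create_cluster my-cluster canal` sends the same request as the Ansible task but with Canal. Keeping the body in a heredoc avoids the nested YAML and shell escaping of the one-line version.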

[Discussion]: