【Question title】: TLS handshake timeout error while validating cluster on GCP using kops
【Posted】: 2021-10-24 08:48:11
【Question】:

I want to create a cluster on GCP using kops.

To do this, I first created a GCS bucket, then exported KOPS_STATE_STORE as follows:

export KOPS_STATE_STORE=gs://kubernetes-cluster-dev/

Next, I created the cluster object and instance groups in the bucket by running:

kops create cluster simple.k8s.local --zones asia-southeast2-a --state ${KOPS_STATE_STORE}/ --project=${PROJECT}
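
Note that KOPS_STATE_STORE above already ends in a slash, so `${KOPS_STATE_STORE}/` expands to gs://kubernetes-cluster-dev//. Nothing in this post establishes whether that double slash matters to kops, but normalizing the value is a cheap precaution. A minimal sketch, where normalize_state_store is a hypothetical helper name and not part of kops:

```shell
# Hypothetical helper (not part of kops): strip every trailing slash from a
# state-store URL so that appending "/" or a sub-path never produces "//".
normalize_state_store() {
  _url="$1"
  # "${_url%/}" removes one trailing slash; loop until none are left.
  while [ "${_url%/}" != "$_url" ]; do
    _url="${_url%/}"
  done
  printf '%s\n' "$_url"
}
```

For example, `export KOPS_STATE_STORE="$(normalize_state_store gs://kubernetes-cluster-dev/)"` would store the bucket URL without the trailing slash.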

Then I ran the command to create the cluster:

kops update cluster --name simple.k8s.local --yes --admin

It gave me the following output:

I0823 18:21:49.011726 3198907 featureflag.go:165] FeatureFlag "AlphaAllowGCE"=true
I0823 18:21:49.769208 3198907 gce_cloud.go:125] Will load GOOGLE_APPLICATION_CREDENTIALS from siminvest-3473d78328bd.json
I0823 18:21:52.215128 3198907 apply_cluster.go:483] Gossip DNS: skipping DNS validation
W0823 18:21:52.295506 3198907 external_access.go:36] TODO: Harmonize gcemodel ExternalAccessModelBuilder with awsmodel
W0823 18:21:52.295541 3198907 firewall.go:35] TODO: Harmonize gcemodel with awsmodel for firewall - GCE model is way too open
W0823 18:21:52.295554 3198907 firewall.go:64] Adding overlay network for X -> node rule - HACK
W0823 18:21:52.295568 3198907 firewall.go:118] Adding overlay network for X -> master rule - HACK
W0823 18:21:52.950612 3198907 autoscalinggroup.go:117] enabling storage-rw for etcd backups
I0823 18:21:52.950735 3198907 autoscalinggroup.go:153] VMs using Service Account: default
I0823 18:21:52.950765 3198907 autoscalinggroup.go:161] gsa: default
I0823 18:21:52.950848 3198907 autoscalinggroup.go:153] VMs using Service Account: default
I0823 18:21:52.950873 3198907 autoscalinggroup.go:161] gsa: default
I0823 18:21:58.715950 3198907 executor.go:111] Tasks: 0 done / 58 total; 37 can run
I0823 18:21:59.298555 3198907 executor.go:111] Tasks: 37 done / 58 total; 17 can run
I0823 18:22:00.147597 3198907 executor.go:111] Tasks: 54 done / 58 total; 2 can run
I0823 18:22:02.456113 3198907 executor.go:111] Tasks: 56 done / 58 total; 2 can run
I0823 18:22:02.946997 3198907 executor.go:111] Tasks: 58 done / 58 total; 0 can run
I0823 18:22:02.991855 3198907 update_cluster.go:313] Exporting kubecfg for cluster
kOps has set your kubectl context to simple.k8s.local

Cluster is starting.  It should be ready in a few minutes.

Suggestions:
 * validate cluster: kops validate cluster --wait 10m
 * list nodes: kubectl get nodes --show-labels
 * ssh to the master: ssh -i ~/.ssh/id_rsa ubuntu@api.simple.k8s.local
 * the ubuntu user is specific to Ubuntu. If not using Ubuntu please use the appropriate user based on your OS.
 * read about installing addons at: https://kops.sigs.k8s.io/operations/addons.

After a while, I ran the following command to validate the cluster:

kops validate cluster --wait 10m

But it gave me this error:

I0823 18:22:58.748559 3200157 featureflag.go:165] FeatureFlag "AlphaAllowGCE"=true
Using cluster from kubectl context: simple.k8s.local

I0823 18:22:59.607767 3200157 gce_cloud.go:125] Will load GOOGLE_APPLICATION_CREDENTIALS from siminvest-3473d78328bd.json
Validating cluster simple.k8s.local

W0823 18:23:11.030635 3200157 validate_cluster.go:173] (will retry): unexpected error during validation: error listing nodes: Get "https://34.101.133.0/api/v1/nodes": net/http: TLS handshake timeout

Can someone help me resolve this?
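
A TLS handshake timeout means the client gave up before the TLS negotiation with 34.101.133.0 finished. By itself that does not say whether the load balancer, a firewall rule, or the API server on the master is at fault. A minimal sketch for narrowing it down, assuming curl is installed: api_host_from_error and probe_api are hypothetical helper names, and /healthz is the standard kube-apiserver health path. Any HTTP response from the probe, even 401 or 403, would show that the handshake itself completes.

```shell
# Hypothetical helper: pull the API host out of a kops validation error line
# of the form: ... Get "https://HOST/api/v1/nodes": net/http: TLS handshake timeout
api_host_from_error() {
  printf '%s\n' "$1" | sed -n 's#.*Get "https://\([^/"]*\)/.*#\1#p'
}

# Hypothetical helper: probe the API endpoint directly with short timeouts.
# --insecure skips certificate verification, which is acceptable only for
# this reachability check because the endpoint is addressed by bare IP.
probe_api() {
  curl --insecure --silent --show-error \
       --connect-timeout 5 --max-time 15 \
       "https://$1/healthz"
}
```

Running `probe_api 34.101.133.0` from the same machine as kops, and again after a few minutes, can show whether the endpoint is simply still starting up or stays unreachable.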

【Discussion】:

    Tags: kubernetes google-cloud-platform kops


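    【Solution 1】:

    I tried to reproduce this by following the doc, and kops validate cluster --wait 10m threw a similar error.
    In my case, however, the TLS timeout error was followed by "unexpected error during validation: error listing nodes: Unauthorized".
    I then stopped the command and was able to resolve this error with the help of the solution in this stack post. Now when I run kops validate cluster, I get the expected output.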

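    【Comments】:

    • After running kops create cluster cname, I tried running kops export kubecfg --admin. Since --name is required, I passed --name cname, but it failed with: error getting ingress status: error getting ForwardingRule "api-simple-k8s-local": googleapi: Error 404: The resource 'projects/siminvest/regions/asia-southeast2/forwardingRules/api-simple-k8s-local' was not found
    • I got the same error. The command kops create cluster CNAME --zones <ZONE NAME> --state ${KOPS_STATE_STORE} --project=${PROJECT} does not actually create any instances or other cloud objects in GCE. To create them, run kops update cluster CNAME --yes, then export the kubeconfig file with kops export kubecfg CNAME --admin. By default, the configuration is saved to the user's $HOME/.kube/config file. Can you confirm whether this resolves your cluster validation issue?
    • I am still facing the same issue.
    • After running the update command, can you check the newly created load balancer here and the firewall rules here?
    • Yes, I can see the newly created LB and firewall rules, but I still get the TLS handshake timeout error when validating the cluster.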
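
    The order of operations described in the comments above can be sketched as a small helper that only prints the commands, so they can be reviewed before being run by hand. kops_bringup_plan is a hypothetical name; the commands themselves are the ones quoted in this thread, and KOPS_STATE_STORE and PROJECT are assumed to be exported already.

```shell
# Hypothetical helper: print, in order, the kops commands quoted in this
# thread. It prints only; nothing is executed against GCP.
# Usage: kops_bringup_plan CLUSTER_NAME ZONE
kops_bringup_plan() {
  _name="$1"
  _zone="$2"
  # \${...} keeps the variable references literal in the printed commands.
  printf '%s\n' \
    "kops create cluster ${_name} --zones ${_zone} --state \${KOPS_STATE_STORE} --project=\${PROJECT}" \
    "kops update cluster ${_name} --yes" \
    "kops export kubecfg ${_name} --admin" \
    "kops validate cluster --wait 10m"
}
```

    The point of the ordering is that create only writes the cluster spec to the state store, update --yes is what creates the GCE resources, and exporting the kubeconfig before the forwarding rule exists leads to the 404 quoted in the first comment.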