【Question Title】: Unable to create a Windows node pool on a GKE cluster with the Google Terraform GKE module
【Posted】: 2020-12-07 01:30:35
【Question】:

I'm trying to provision a GKE cluster with a Windows node_pool using the Google module. I'm calling the module with:

  source  = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
  version = "9.2.0"

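I have to define two pools: the Linux pool that GKE requires and the Windows pool that we need. Terraform always provisions the Linux node_pool successfully but fails to provision the Windows one, with this error message: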

module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m31s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m41s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m51s elapsed]
module.gke.google_container_cluster.primary: Modifications complete after 24m58s [id=projects/xx-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev]
module.gke.google_container_node_pool.pools["windows-node-pool"]: Creating...

Error: error creating NodePool: googleapi: Error 400: Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA., badRequest

  on .terraform\modules\gke\terraform-google-kubernetes-engine-9.2.0\modules\beta-private-cluster-update-variant\cluster.tf line 341, in resource "google_container_node_pool" "pools":
 341: resource "google_container_node_pool" "pools" {

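I have tried setting this metadata value in many places, but I can't get it to work.

On the Terraform side:

I tried adding this metadata in many places, both within the node_config scope of the module itself and in the main.tf file where I call the module. I also tried adding it to the Windows node_pool entry in the node_pools list, but it was not accepted and I got a message saying that WORKLOAD IDENTITY should not be set there.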

I also tried setting enable_shielded_nodes = false, but that didn't help.
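For context, the API error refers to the node pool's workload metadata configuration. In the Google Terraform provider's `google_container_node_pool` resource this is a dedicated `workload_metadata_config` block inside `node_config`, separate from the `metadata` map of VM instance metadata. The fragment below is a sketch contrasting the two. It assumes a google/google-beta provider 4.x or later, where the field is `mode` (3.x providers exposed `node_metadata` instead, where `EXPOSE` roughly corresponds to `GCE_METADATA`); the values shown are placeholders.

```hcl
# Hypothetical fragment of a google_container_node_pool resource (not a full config).
node_config {
  image_type = "WINDOWS_SAC"

  # VM instance metadata: arbitrary key/value pairs written to the node VMs.
  # A "workload-metadata" key placed here is just another VM metadata entry.
  metadata = {
    disable-legacy-endpoints = "true"
  }

  # Workload metadata config: controls which metadata server pods on the node see.
  # Windows nodes reject Workload Identity (GKE_METADATA), so request GCE_METADATA.
  workload_metadata_config {
    mode = "GCE_METADATA"
  }
}
```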

I tried testing whether it would work from the command line. These are my commands:

C:\>gcloud container node-pools --region europe-west2 list
NAME                    MACHINE_TYPE   DISK_SIZE_GB  NODE_VERSION
default-node-pool-d916  n1-standard-2  100           1.17.9-gke.600

 
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2
WARNING: Starting in 1.12, new node pools will be created with their legacy Compute Engine instance metadata APIs disabled by default. To create a node pool with legacy instance metadata endpoints disabled, run `node-pools create` with the flag `--metadata disable-legacy-endpoints=true`.
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA.

C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2 --workload-metadata=GCE_METADATA --metadata disable-legacy-endpoints=true
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Service account "874988475980-compute@developer.gserviceaccount.com" does not exist.

C:\>gcloud auth list
                       Credentialed Accounts
ACTIVE  ACCOUNT
*       tf-xxx-xxx-xx-xxx@xx-xxx-xx-xxx-xxxx.iam.gserviceaccount.com

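The service account shown by gcloud auth list is the one I run Terraform with, but I don't know where the service account in the error message comes from. Even creating the Windows node pool from the command line, as shown above, did not work. I'm a bit stuck and don't know what to do next.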
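The account in the gcloud error follows the naming of the project's Compute Engine default service account (`<project-number>-compute@developer.gserviceaccount.com`), which a node pool runs as when no service account is specified; the message suggests that account does not exist in this project. With gcloud, `node-pools create` accepts `--service-account=<email>` to use a dedicated account instead. On the Terraform side the pools already pass `google_service_account.gke_cluster_sa.email`. A sketch of such an account with minimal node logging and monitoring roles follows; `var.project_id` and the `gke-nodes` account ID are hypothetical placeholders.

```hcl
# Hypothetical dedicated node service account; names are placeholders.
resource "google_service_account" "gke_cluster_sa" {
  project      = var.project_id
  account_id   = "gke-nodes"
  display_name = "GKE node service account"
}

# Minimal roles commonly recommended for GKE node service accounts.
resource "google_project_iam_member" "gke_node_roles" {
  for_each = toset([
    "roles/logging.logWriter",
    "roles/monitoring.metricWriter",
    "roles/monitoring.viewer",
  ])

  project = var.project_id
  role    = each.value
  member  = "serviceAccount:${google_service_account.gke_cluster_sa.email}"
}
```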

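Since module 9.2.0 is the stable version we used for all of our previous Linux-based clusters, I thought it might be too old for Windows node_pools, so I tried 11.0.0 to see whether it made any difference, but that ends with a different error: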

module.gke.google_container_node_pool.pools["default-node-pool"]: Refreshing state... [id=projects/uk-tix-p1-npe-b821/locations/europe-west2/clusters/gke-nonpci-dev/nodePools/default-node-pool-d916]

Error: failed to execute ".terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh: %1 is not a valid Win32 application.

  on .terraform\modules\gke.gcloud_delete_default_kube_dns_configmap\terraform-google-gcloud-1.4.1\main.tf line 70, in data "external" "env_override":
  70: data "external" "env_override" {

Error: failed to execute ".terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh: %1 is not a valid Win32 application.

  on .terraform\modules\gke.gcloud_wait_for_cluster\terraform-google-gcloud-1.3.0\main.tf line 70, in data "external" "env_override":
  70: data "external" "env_override" {

This is how I set the node_pools parameter:


  node_pools = [
    {
      name               = "linux-node-pool"
      machine_type       = var.nodepool_instance_type
      min_count          = 1
      max_count          = 10
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = "COS"                                  
      auto_repair        = true                                   
      auto_upgrade       = true                                 
      service_account    = google_service_account.gke_cluster_sa.email
      preemptible        = var.preemptible
      initial_node_count = 1
    },
    {
      name               = "windows-node-pool"
      machine_type       = var.nodepool_instance_type
      min_count          = 1
      max_count          = 10
      disk_size_gb       = 100
      disk_type          = "pd-standard"
      image_type         = var.nodepool_image_type                
      auto_repair        = true                                   
      auto_upgrade       = true                                   
      service_account    = google_service_account.gke_cluster_sa.email
      preemptible        = var.preemptible
      initial_node_count = 1
  
    }
  ]

  cluster_resource_labels = var.cluster_resource_labels           

  # health check and webhook firewall rules
  node_pools_tags = {
    all = [
      "xx-xxx-xxx-local-xxx",
    ]
  }

  node_pools_metadata = {
    all = {
//      workload-metadata = "GCE_METADATA"
    }

    linux-node-pool = {
      ssh-keys = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
      block-project-ssh-keys = true
    }

    windows-node-pool = {
      workload-metadata = "GCE_METADATA"
    }

  }
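Note that `node_pools_metadata` feeds each pool's VM instance metadata (`node_config.metadata`), so the `workload-metadata = "GCE_METADATA"` entry above only adds a VM metadata key and does not change the pool's workload metadata mode. One possible workaround, sketched below under assumptions, is to keep the Linux pool in the module (with the Windows entry removed from `node_pools`) and manage the Windows pool as a separate `google_container_node_pool` resource that sets `workload_metadata_config` itself. It assumes google-beta provider 4.x or later (for `mode`) and that the module version in use exposes `name` and `location` outputs; check the module's outputs for the version you pin.

```hcl
# Sketch: Windows pool managed outside the module so it can opt out of
# Workload Identity. Assumes google-beta >= 4.x and module outputs "name"/"location".
resource "google_container_node_pool" "windows" {
  provider = google-beta

  name     = "windows-node-pool"
  cluster  = module.gke.name
  location = module.gke.location

  initial_node_count = 1

  autoscaling {
    min_node_count = 1
    max_node_count = 10
  }

  node_config {
    machine_type    = var.nodepool_instance_type
    image_type      = var.nodepool_image_type # e.g. WINDOWS_SAC
    disk_size_gb    = 100
    disk_type       = "pd-standard"
    preemptible     = var.preemptible
    service_account = google_service_account.gke_cluster_sa.email
    tags            = ["xx-xxx-xxx-local-xxx"]

    # Windows nodes do not support Workload Identity (GKE_METADATA).
    workload_metadata_config {
      mode = "GCE_METADATA"
    }
  }
}
```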

  • This is a shared VPC, and the cluster is configured with cluster version 1.17.9-gke.600.

【Discussion】:

    Tags: kubernetes terraform google-kubernetes-engine


    【Solution 1】:

    Check out https://github.com/terraform-google-modules/terraform-google-kubernetes-engine/issues/632 for the solution.

    The error message is unclear; GKE has an internal bug tracking this issue. We will improve the error message soon.
