【发布时间】:2020-12-07 01:30:35
【问题描述】:
我正在尝试使用 google 模块配置带有 windows node_pool 的 GKE 集群,我正在调用模块
source = "terraform-google-modules/kubernetes-engine/google//modules/beta-private-cluster-update-variant"
version = "9.2.0"
我必须为 GKE 所需的 linux 池和我们需要的 windows 定义两个池,terraform 始终成功配置 linux node_pool 但无法配置 windows 1 和错误消息
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m31s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m41s elapsed]
module.gke.google_container_cluster.primary: Still modifying... [id=projects/uk-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev, 24m51s elapsed]
module.gke.google_container_cluster.primary: Modifications complete after 24m58s [id=projects/xx-xxx-xx-xxx-b821/locations/europe-west2/clusters/gke-nonpci-dev]
module.gke.google_container_node_pool.pools["windows-node-pool"]: Creating...
Error: error creating NodePool: googleapi: Error 400: Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA., badRequest
on .terraform\modules\gke\terraform-google-kubernetes-engine-9.2.0\modules\beta-private-cluster-update-variant\cluster.tf line 341, in resource "google_container_node_pool" "pools":
341: resource "google_container_node_pool" "pools" {
我尝试了很多地方来设置这个元数据值,但我不明白:
从 terraform 方面:
我尝试了很多地方在模块本身的 node_config 范围内或在我调用模块的 main.tf 文件中添加此元数据我尝试将其添加到 node_pools 列表的 windows node_pool 范围但它没有接受它并显示此处不应设置 WORKLOAD IDENTITY 的消息
我也尝试设置enable_shielded_nodes = false,但这并没有太大帮助。
我尝试测试它是否可行,即使通过命令行这是我的命令行
C:\>gcloud container node-pools --region europe-west2 list
NAME MACHINE_TYPE DISK_SIZE_GB NODE_VERSION
default-node-pool-d916 n1-standard-2 100 1.17.9-gke.600
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2
WARNING: Starting in 1.12, new node pools will be created with their legacy Compute Engine instance metadata APIs disabled by default. To create a node pool with legacy instance metadata endpoints disabled, run `node-pools create` with the flag `--metadata disable-legacy-endpoints=true`.
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Workload Identity is not supported on Windows nodes. Create the nodepool without workload identity by specifying --workload-metadata=GCE_METADATA.
C:\>gcloud container node-pools --region europe-west2 create window-node-pool --cluster=gke-nonpci-dev --image-type=WINDOWS_SAC --no-enable-autoupgrade --machine-type=n1-standard-2 --workload-metadata=GCE_METADATA --metadata disable-legacy-endpoints=true
This will disable the autorepair feature for nodes. Please see https://cloud.google.com/kubernetes-engine/docs/node-auto-repair for more information on node autorepairs.
ERROR: (gcloud.container.node-pools.create) ResponseError: code=400, message=Service account "874988475980-compute@developer.gserviceaccount.com" does not exist.
C:\>gcloud auth list
Credentialed Accounts
ACTIVE ACCOUNT
* tf-xxx-xxx-xx-xxx@xx-xxx-xx-xxx-xxxx.iam.gserviceaccount.com
这个来自运行 gcloud auth list 的服务帐户是我正在运行 terraform 的服务帐户,但我不知道错误消息中的这个服务帐户来自哪里,即使尝试通过命令行创建 Windows 节点池,如上所示也没有工作我有点卡住了,我不知道该怎么办。
由于模块 9.2.0 是我们之前设置的所有基于 linux 的集群的稳定模块,因此我认为这可能是 windows node_pool 的旧版本,我使用11.0.0 来查看这是否会它有任何不同,但最终会出现不同的错误
module.gke.google_container_node_pool.pools["default-node-pool"]: Refreshing state... [id=projects/uk-tix-p1-npe-b821/locations/europe-west2/clusters/gke-nonpci-dev/nodePools/default-node-pool-d916]
Error: failed to execute ".terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_delete_default_kube_dns_configmap/terraform-google-gcloud-1.4.1/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_delete_default_kube_dns_configmap\terraform-google-gcloud-1.4.1\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
Error: failed to execute ".terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh": fork/exec .terraform/modules/gke.gcloud_wait_for_cluster/terraform-google-gcloud-1.3.0/scripts/check_env.sh: %1 is not a valid Win32 application.
on .terraform\modules\gke.gcloud_wait_for_cluster\terraform-google-gcloud-1.3.0\main.tf line 70, in data "external" "env_override":
70: data "external" "env_override" {
这就是我设置 node_pools 参数的方式
node_pools = [
{
name = "linux-node-pool"
machine_type = var.nodepool_instance_type
min_count = 1
max_count = 10
disk_size_gb = 100
disk_type = "pd-standard"
image_type = "COS"
auto_repair = true
auto_upgrade = true
service_account = google_service_account.gke_cluster_sa.email
preemptible = var.preemptible
initial_node_count = 1
},
{
name = "windows-node-pool"
machine_type = var.nodepool_instance_type
min_count = 1
max_count = 10
disk_size_gb = 100
disk_type = "pd-standard"
image_type = var.nodepool_image_type
auto_repair = true
auto_upgrade = true
service_account = google_service_account.gke_cluster_sa.email
preemptible = var.preemptible
initial_node_count = 1
}
]
cluster_resource_labels = var.cluster_resource_labels
# health check and webhook firewall rules
node_pools_tags = {
all = [
"xx-xxx-xxx-local-xxx",
]
}
node_pools_metadata = {
all = {
// workload-metadata = "GCE_METADATA"
}
linux-node-pool = {
ssh-keys = join("\n", [for user, key in var.node_ssh_keys : "${user}:${key}"])
block-project-ssh-keys = true
}
windows-node-pool = {
workload-metadata = "GCE_METADATA"
}
}
- 这是一个共享 VPC,我在其中为集群配置了集群版本:1.17.9-gke.600
【问题讨论】:
标签: kubernetes terraform google-kubernetes-engine