【Question title】: Creating AKS cluster with Managed Identity to give it access to a subnet - Error: authorization.RoleAssignmentsClient
【Posted】: 2021-03-26 07:05:07
【Question description】:

I configured my AKS cluster to use a system-assigned managed identity to access other Azure resources:

resource "azurerm_subnet" "aks" {
  name = var.aks_subnet_name
  resource_group_name = azurerm_resource_group.main.name
  virtual_network_name = module.network.vnet_name
  address_prefix = var.aks_subnet
  service_endpoints = ["Microsoft.KeyVault"]
}

resource "azurerm_kubernetes_cluster" "aks_main" {
  name = module.aks_name.result
  depends_on = [azurerm_subnet.aks]
  location = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix = "aks-${local.name}"
  kubernetes_version = var.k8s_version
  addon_profile {
    oms_agent {
      # For monitoring containers
      enabled  = var.addons.oms_agent
      log_analytics_workspace_id = azurerm_log_analytics_workspace.example.id
    }
    kube_dashboard {
      enabled = true
    }
    azure_policy {
      # If we want to enforce policy definitions in the future
      # Check requirements https://docs.microsoft.com/en-ie/azure/governance/policy/concepts/policy-for-kubernetes
      enabled = var.addons.azure_policy
    }
  }
  default_node_pool {
    name = "default"
    orchestrator_version  = var.k8s_version
    node_count            = var.default_node_pool.node_count
    vm_size               = var.default_node_pool.vm_size
    type                  = "VirtualMachineScaleSets"
    availability_zones    = var.default_node_pool.zones
    # availability_zones  = ["1", "2", "3"]
    max_pods              = 250
    os_disk_size_gb       = 128
    vnet_subnet_id        = azurerm_subnet.aks.id
    node_labels           = var.default_node_pool.labels
    enable_auto_scaling   = var.default_node_pool.cluster_auto_scaling
    min_count             = var.default_node_pool.cluster_auto_scaling_min_count
    max_count             = var.default_node_pool.cluster_auto_scaling_max_count
    enable_node_public_ip = false
  }

  # Configuring AKS to use a system-assigned managed identity to access
  # other Azure resources
  identity {
    type = "SystemAssigned"
  }

  network_profile {
    load_balancer_sku  = "standard"
    outbound_type      = "loadBalancer"
    network_plugin     = "azure"
    # if non-azure network policies
    # https://azure.microsoft.com/nl-nl/blog/integrating-azure-cni-and-calico-a-technical-deep-dive/
    network_policy     = "calico"
    dns_service_ip     = "10.0.0.10"
    docker_bridge_cidr = "172.17.0.1/16"
    service_cidr       = "10.0.0.0/16"
  }
  lifecycle {
    ignore_changes = [
      default_node_pool,
      windows_profile,
    ]
  }
}

I want to use that managed identity (the service principal created in the AKS cluster block) and give it a role such as Network Contributor on the subnet:

resource "azurerm_role_assignment" "aks_subnet" {
  # Giving access to AKS SP identity created to akssubnet by assigning it
  # a Network Contributor role
  scope                = azurerm_subnet.aks.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id
  # principal_id = azurerm_kubernetes_cluster.aks_main.kubelet_identity[0].object_id
  # principal_id = data.azurerm_user_assigned_identity.test.principal_id
  # skip_service_principal_aad_check = true
}

But the output I get after terraform apply is:

Error: authorization.RoleAssignmentsClient#Create: Failure responding 
to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. 
Status=403 Code="AuthorizationFailed" 
Message="The client 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' with object id 
'afd5bd09-c294-4597-9c90-e1ee293e5f3a' does not have authorization 
to perform action 'Microsoft.Authorization/roleAssignments/write' 
over scope '/subscriptions/77dfff95-fbd3-4a15-b97a-b7182939e61a/resourceGroups/rhd-spec-prod-main-6loe4lpkr0hd8/providers/Microsoft.Network/virtualNetworks/rhd-spec-prod-main-wdaht6cn7s3s8/subnets/aks-subnet/providers/Microsoft.Authorization/roleAssignments/8733864c-a5f7-a6a9-a61d-6393989f0ad1' 
or the scope is invalid. If access was recently granted, please refresh your credentials."

  on aks.tf line 23, in resource "azurerm_role_assignment" "aks_subnet":
  23: resource "azurerm_role_assignment" "aks_subnet" {

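It seems that the service principal being created does not have enough privileges to perform a role assignment over the subnet, or perhaps my scope attribute is wrong. I am passing the AKS subnet ID there.

What am I doing wrong?

UPDATE

Checking how roles are assigned to managed identities, it looks like we can only assign roles related to the subscription, resource groups, storage services, SQL services, and Key Vault.

Reading here:

Before you can use the managed identity, it has to be configured. There are two steps:

Assign a role for the identity, associating it with the subscription that will be used to run Terraform. This step gives the identity permission to access Azure Resource Manager (ARM) resources.

Configure access control for one or more Azure resources. For example, if you use a key vault and a storage account, you will need to configure the vault and container separately.

Before you can create a resource with a managed identity and then assign an RBAC role, your account needs sufficient permissions. You need to be a member of the account Owner role, or have Contributor plus User Access Administrator roles.

Trying to proceed accordingly, I defined this piece of code: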

resource "null_resource" "wait_for_resource_to_be_ready" {
  provisioner "local-exec" {
    command = "sleep 60"
  }
  depends_on = [
    azurerm_kubernetes_cluster.aks_main
  ]
}

data "azurerm_subscription" "current" {}

# FETCHING THE IDENTITY CREATED ON AKS CLUSTER
data "azurerm_user_assigned_identity" "test" {
  name                = "${azurerm_kubernetes_cluster.aks_main.name}-agentpool"
  resource_group_name = azurerm_kubernetes_cluster.aks_main.node_resource_group
}


data "azurerm_role_definition" "contributor" {
  name = "Network Contributor"
}

resource "azurerm_role_assignment" "aks_subnet" {

  # Giving access to AKS SP identity created to akssubnet by assigning it
  # a Network Contributor role
  # name                 = azurerm_kubernetes_cluster.aks_main.name
  # scope                =  var.aks_subnet_name # azurerm_subnet.aks.id  var.aks_subnet
  scope = data.azurerm_subscription.current.id
  #role_definition_name = "Network Contributor"
  role_definition_id = "${data.azurerm_subscription.current.id}${data.azurerm_role_definition.contributor.id}"
  # principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id
  # principal_id = azurerm_kubernetes_cluster.aks_main.kubelet_identity[0].object_id
  principal_id = data.azurerm_user_assigned_identity.test.principal_id
  skip_service_principal_aad_check = true
  depends_on = [
    null_resource.wait_for_resource_to_be_ready
  ]
}

The terraform workflow tries to create the role...

> terraform_0.12.29 apply "prod_Infrastructure.plan"
null_resource.wait_for_resource_to_be_ready: Creating...
null_resource.wait_for_resource_to_be_ready: Provisioning with 'local-exec'...
null_resource.wait_for_resource_to_be_ready (local-exec): Executing: ["/bin/sh" "-c" "sleep 60"]
null_resource.wait_for_resource_to_be_ready: Still creating... [10s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [20s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [30s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [40s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [50s elapsed]
null_resource.wait_for_resource_to_be_ready: Still creating... [1m0s elapsed]
null_resource.wait_for_resource_to_be_ready: Creation complete after 1m0s [id=8505830187297683728]
azurerm_role_assignment.aks_subnet: Creating... 

This time the subscription is passed as the scope, but I end up with the same AuthorizationFailed error.

Error: authorization.RoleAssignmentsClient#Create: Failure responding to request: StatusCode=403 -- Original Error: autorest/azure: Service returned an error. Status=403 Code="AuthorizationFailed" Message="The client 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' with object id 'afd5bd09-c294-4597-9c90-e1ee293e5f3a' does not have authorization to perform action 'Microsoft.Authorization/roleAssignments/write' over scope '/subscriptions/77dfff95-fbd3-4a15-b97a-b7182939e61a' or the scope is invalid. If access was recently granted, please refresh your credentials."

  on aks.tf line 145, in resource "azurerm_role_assignment" "aks_subnet":
 145: resource "azurerm_role_assignment" "aks_subnet" {

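I am not at all sure how to validate this statement:

Before you can create a resource with a managed identity and then assign an RBAC role, your account needs sufficient permissions. You need to be a member of the account Owner role, or have Contributor plus User Access Administrator roles.

By the way, I have the Owner role on the subscription I am working with.

UPDATE 2

The object ID referenced in both error messages above belongs to a service principal inside my tenant. Here it is: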

az ad sp show --id afd5bd09-c294-4597-9c90-e1ee293e5f3a
{
  "accountEnabled": "True",
  "addIns": [],
  "alternativeNames": [],
  "appDisplayName": "Product-xxxx-ServicePrincipal-Production",
  "appId": "ff9c642c-06b9-47e2-9565-e3f6e782e14f",
  "appOwnerTenantId": "xxxxxxxx",
  "appRoleAssignmentRequired": false,
  "appRoles": [],
  "applicationTemplateId": null,
  "deletionTimestamp": null,
  "displayName": "Product-xxxx-ServicePrincipal-Production",
  "errorUrl": null,
  "homepage": null,
  "informationalUrls": {
    "marketing": null,
    "privacy": null,
    "support": null,
    "termsOfService": null
  },
  "keyCredentials": [],
  "logoutUrl": null,
  "notificationEmailAddresses": [],
  "oauth2Permissions": [],

  # THIS IS THE OBJECT ID
  "objectId": "afd5bd09-c294-4597-9c90-e1ee293e5f3a",
  
  "objectType": "ServicePrincipal",
  "odata.metadata": "https://graph.windows.net/15f996bf-aad1-451c-8d17-9b95d025eafc/$metadata#directoryObjects/@Element",
  "odata.type": "Microsoft.DirectoryServices.ServicePrincipal",
  "passwordCredentials": [],
  "preferredSingleSignOnMode": null,
  "preferredTokenSigningKeyEndDateTime": null,
  "preferredTokenSigningKeyThumbprint": null,
  "publisherName": "xxxxxxx",
  "replyUrls": [],
  "samlMetadataUrl": null,
  "samlSingleSignOnSettings": null,
  "servicePrincipalNames": [
    "ff9c642c-06b9-47e2-9565-e3f6e782e14f"
  ],
  "servicePrincipalType": "Application",
  "signInAudience": "AzureADMyOrg",
  "tags": [
    "WindowsAzureActiveDirectoryIntegratedApp"
  ],
  "tokenEncryptionKeyId": null
}

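As for its permissions, I am not sure whether they are sufficient. I would say yes, because it is used for several things in the subscription.

What about the Users Consent permissions? I have nothing there.

On the other hand, why is the process trying to assign the role by using this service principal? I mean, managed identities are meant to get rid of service principals, but perhaps the workflow uses this SP only to assign the role to the managed identity, and from then on access is handled by the managed identity (?)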

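【Question discussion】:

  • A managed identity is ultimately a service principal. In this case the service principal (called a managed identity) is managed for you by Microsoft Azure AD. The goal is for Azure to manage secrets and identities for developers, so they don't have to worry about tokens, secrets, etc. docs.microsoft.com/en-us/azure/active-directory/…

Tags: terraform azure-aks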


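【Solution 1】:

From the docs: https://docs.microsoft.com/en-us/azure/role-based-access-control/role-assignments-rest#add-a-role-assignment

To call this API, you must have access to the Microsoft.Authorization/roleAssignments/write operation. Of the built-in roles, only Owner and User Access Administrator are granted access to this operation.

So the service principal that runs Terraform must have the Owner or User Access Administrator role. Alternatively, you must create a custom role with sufficient permissions.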
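As a rough illustration of that grant, here is a minimal Terraform sketch. It assumes the azurerm provider is already configured and that it is applied from a session that already holds Owner on the subscription (for example a user account), since the pipeline's service principal cannot grant this role to itself. The resource name `terraform_sp_uaa` is hypothetical; the object ID is the one shown in the AuthorizationFailed message.

```hcl
# Sketch only: run this from an identity that is already Owner on the
# subscription, NOT from the service principal that is being granted access.
data "azurerm_subscription" "current" {}

resource "azurerm_role_assignment" "terraform_sp_uaa" {
  # Subscription scope, matching what the error message reports as the scope
  scope                = data.azurerm_subscription.current.id
  role_definition_name = "User Access Administrator"

  # Object ID of the service principal named in the 403 error
  principal_id = "afd5bd09-c294-4597-9c90-e1ee293e5f3a"
}
```

A narrower scope, such as the resource group or the subnet itself, should also work if the service principal only needs to create role assignments there.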

Regarding the workflow, I agree. It is quite counterintuitive.
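To make the workflow less opaque, it can help to print which identity Terraform itself authenticates as. The following is a minimal sketch using the `azurerm_client_config` data source, assuming the azurerm provider is already configured; the output name `terraform_caller_object_id` is hypothetical.

```hcl
# Exposes details of the identity the azurerm provider is logged in as.
data "azurerm_client_config" "current" {}

output "terraform_caller_object_id" {
  # Object ID of the principal that performs every API call, including
  # the role assignment that failed with 403.
  value = data.azurerm_client_config.current.object_id
}
```

If this value matches the object ID in the AuthorizationFailed message, the missing permission belongs to the principal running Terraform, not to the cluster's managed identity that receives the Network Contributor role.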

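OLD ANSWER

There is a bug (?) where Azure reports a resource as created although not all services can access it yet.

You can make it wait a minute with something like this: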

resource "null_resource" "wait_for_resource_to_be_ready" {
  provisioner "local-exec" {
    command = "sleep 60"
  }

  depends_on = [
    azurerm_kubernetes_cluster.aks_main
  ]
}

Then add a depends_on statement to your "azurerm_role_assignment" "aks_subnet" resource:

  depends_on = [
    null_resource.wait_for_resource_to_be_ready
  ]

Now your cluster will be created first, then Terraform will wait 60 seconds. After that your role_assignment will run and will hopefully be able to grant the role.
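The same delay can probably be expressed without a shell command by using the `time_sleep` resource from the hashicorp/time provider. This is a sketch under assumptions: the time provider must be available to your Terraform version, and the resource name `wait_for_aks` is hypothetical.

```hcl
# Alternative to the null_resource + local-exec "sleep 60" shown earlier.
# Pauses for 60 seconds once the AKS cluster has been created.
resource "time_sleep" "wait_for_aks" {
  depends_on      = [azurerm_kubernetes_cluster.aks_main]
  create_duration = "60s"
}

resource "azurerm_role_assignment" "aks_subnet" {
  scope                = azurerm_subnet.aks.id
  role_definition_name = "Network Contributor"
  principal_id         = azurerm_kubernetes_cluster.aks_main.identity[0].principal_id

  # The role assignment only starts after the delay has elapsed.
  depends_on = [time_sleep.wait_for_aks]
}
```

Unlike `sleep`, this does not depend on a POSIX shell being present on the machine that runs Terraform. A delay of either kind only helps with propagation lag; it cannot fix a missing roleAssignments/write permission.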

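【Discussion】:

  • That makes sense from the point of view of synchronizing resource state, but unfortunately I get the same error, even with sleep = 120. It looks like the local-exec provisioner is not working. What is the difference between the local-exec and remote-exec provisioners? They say we are invoking that sleep = 60 command on the machine running terraform, so perhaps what we need is remote-exec.
  • If 60 seconds doesn't fix it, I don't think remote-exec or more time will either. It must be something else. Let me think about another possible solution... Does the user have sufficient rights/privileges to assign roles? Who is the client afd5bd09-c294-4597-9c90-e1ee293e5f3a?
  • Yes, access to the Microsoft.Authorization/roleAssignments/write operation was the problem. All I did was go to the subscription I own and assign the User Access Administrator role to the service principal referenced by the terraform workflow. I have one question: the null resource defined to make the workflow wait a minute turned out not to be necessary in the end. In which cases should it be considered?
  • I used terraform to create an app registration with specific permissions for RBAC with AKS. Granting admin consent failed because the app registration was not ready yet, even though Azure said it was. Waiting 60 seconds with a null resource between creating the app registration and granting admin consent solved the problem. Thanks for accepting my answer!
  • Yes, that makes sense; in this case the service principal/app registration already existed. It is a good idea to apply anyway. I remember this happening to me when I created a storage account and needed to create a blob container in it right away. The whole process of creating (provisioning) the storage account takes some time (up to 1 minute, as you say), and until the storage account is provisioned, no operation on it is allowed. It really is a similar situation.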