【Question title】: Use a Stackdriver resource group's ID in a GCP Deployment Manager configuration
【Posted】: 2019-08-01 09:24:49
【Question】:

I am trying to create a Stackdriver alert policy with a Deployment Manager configuration. The same configuration first creates a resource group and a notification channel, and then creates a policy based on them:

resources:
- name: test-group
  type: gcp-types/monitoring-v3:projects.groups
  properties:
    displayName: A test group
    filter: >-
        resource.metadata.cloud_account="aproject-id" AND
        resource.type="gce_instance" AND
        resource.metadata.tag."managed"="yes"

- name: test-email-notification
  type: gcp-types/monitoring-v3:projects.notificationChannels
  properties:
    displayName: A test email channel
    type: email
    labels:
      email_address: incidents@example.com

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" group.id="$(ref.test-group.id)"
        trigger:
          count: 1
      displayName: The instance is down

The policy's only condition has a filter based on the resource group, i.e. only members of the group can trigger this alert.

I am trying to use a reference to the group's ID, but it does not work: "The reference 'id' is invalid, reason: The field 'id' does not exists on the reference schema."

When I try $(ref.test-group.selfLink) instead, I get: "The reference 'selfLink' is invalid, reason: The field 'selfLink' does not exists on the reference schema."

I can get the group's name (e.g. "projects/aproject-id/groups/3691870619975147604"), but filters only accept group IDs (e.g. only the "3691870619975147604" part):

'{"ResourceType":"gcp-types/monitoring-v3:projects.alertPolicies","ResourceErrorCode":"400","ResourceErrorMessage":{"code":400,"message":"Field alert_policy.conditions[0].condition_absent.filter had an invalid value of \"metric.type=\"compute.googleapis.com/instance/uptime\" group.id=\"projects/aproject-id/groups/3691870619975147604\"\": must specify a restriction on \"resource.type\" in the filter; see \"https://cloud.google.com/monitoring/api/resources\" for a list of available resource types.","status":"INVALID_ARGUMENT","statusMessage":"Bad Request","requestPath":"https://monitoring.googleapis.com/v3/projects/aproject-id/alertPolicies","httpMethod":"POST"}}'
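To make the name-versus-ID distinction above concrete, here is a minimal Python sketch of how the two strings relate. The helper name `group_id_from_name` is hypothetical, and the sketch assumes the group name always has the `projects/<project>/groups/<id>` shape shown in the question. It only illustrates the relationship between the strings; it does not change what `$(ref.test-group.name)` yields inside the configuration.

```python
def group_id_from_name(name: str) -> str:
    """Return the trailing ID segment of a group resource name.

    Assumes the 'projects/<project>/groups/<id>' shape seen in the question.
    """
    prefix, sep, group_id = name.rpartition("/groups/")
    if not sep or not prefix.startswith("projects/") or not group_id:
        raise ValueError(f"not a group resource name: {name!r}")
    return group_id


# Group name taken verbatim from the question.
name = "projects/aproject-id/groups/3691870619975147604"
print(group_id_from_name(name))
```

The value a `group.id="..."` filter clause expects is the part after `/groups/`, which is why passing the full name does not produce a working policy.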

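【Comments】:

  • Are you sure the group ID is the problem? The error complains that there is no restriction on resource.type (must specify a restriction on "resource.type" in the filter).
  • You are right, my mistake. As Aleksi's answer below shows, that error goes away once resource.type="gce_instance" is added to the condition's filter.

Tags: google-cloud-platform stackdriver google-cloud-stackdriver google-deployment-manager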


【Solution 1】:

Try replacing your alert policy with the following:

- name: test-alert-policy
  type: gcp-types/monitoring-v3:projects.alertPolicies
  properties:
    enabled: true
    displayName: A test alert policy
    documentation:
      mimeType: text/markdown
      content: "Test incident"
    notificationChannels:
      - $(ref.test-email-notification.name)
    combiner: OR
    conditions:
    - conditionAbsent:
        aggregations:
        - alignmentPeriod: 60s
          perSeriesAligner: ALIGN_RATE
        duration: 300s
        filter: metric.type="compute.googleapis.com/instance/uptime" $(ref.test-group.filter)
        trigger:
          count: 1
      displayName: The instance is down
  metadata:
    dependsOn:
    - test-group

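This adds 1) an explicit dependency on test-group through a dependsOn clause, and 2) $(ref.test-group.filter) to the metric filter, so that the policy, while not strictly linked to test-group, ends up covering exactly the same resources as test-group.

Because Deployment Manager creates resources in parallel, dependsOn is needed to ensure that test-group is instantiated before Deployment Manager attempts to create test-alert-policy; apparently Deployment Manager is not smart enough to infer this from the references alone.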
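As a rough illustration of what the policy filter looks like once the reference has been substituted, the following Python sketch joins the metric clause and the group's filter with a single space, the same way they sit next to each other in the configuration. The function name `build_policy_filter` is hypothetical, and the group filter is copied from the question with its folded YAML lines joined by spaces; this is not how Deployment Manager itself performs the substitution.

```python
# Metric clause and group filter, both copied from the configuration above.
METRIC_CLAUSE = 'metric.type="compute.googleapis.com/instance/uptime"'
GROUP_FILTER = (
    'resource.metadata.cloud_account="aproject-id" AND '
    'resource.type="gce_instance" AND '
    'resource.metadata.tag."managed"="yes"'
)


def build_policy_filter(metric_clause: str, group_filter: str) -> str:
    """Join the two parts with a space, mirroring the filter line in the config."""
    return f"{metric_clause} {group_filter}"


policy_filter = build_policy_filter(METRIC_CLAUSE, GROUP_FILTER)
print(policy_filter)

# The combined filter carries the group's own resource.type restriction,
# which is the restriction the 400 error in the question asked for.
print('resource.type="gce_instance"' in policy_filter)
```

The resulting string contains no group.id clause at all, so the policy follows the group only as long as both are generated from the same filter text.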

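【Comments】:

  • One step forward: the deployment now succeeds, but the created policy is still broken, i.e. the filter contains group.id="projects/aproject-id/groups/5310387734849288536" and it does not generate alerts. The condition filter of a working policy contains group.id="5310387734849288536" for the same condition.
  • Great, almost there! Hmm... one workaround is to set the policy filter in the Deployment Manager configuration to metric.type="compute.googleapis.com/instance/uptime" $(ref.test-group.filter); the created policy, while not strictly linked to the group, then ends up covering all the same resources as the group. That is, the resulting policy filter looks like metric.type="compute.googleapis.com/instance/uptime" resource.metadata.cloud_account="..." AND resource.type="gce_instance" AND resource.metadata.tag."managed"="yes"
  • This works: the policy's filter replicates the group's filter. It is not what I wanted (tying the policy to the group), but it achieves the same goal and stays "DRY". Please add this to your answer and I will accept it.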