【Question title】: Alertmanager to send aggregated/consolidated alert to Webhook
【Posted】: 2021-08-13 09:40:10
【Question】:

Prometheus sends alerts to our Alertmanager based on metrics. Currently, Alertmanager posts the following firing alerts to the Slack integration:

AlertNo.1 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:K8308, timestamp:2021-08-11 00:46:18
AlertNo.2 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:D3010, timestamp:2021-08-11 00:46:18
AlertNo.3 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2008, timestamp:2021-08-11 00:46:18
AlertNo.4 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2005, timestamp:2021-08-11 00:46:18
AlertNo.5 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101, device:X2202, timestamp:2021-08-11 00:46:18

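Because Prometheus reports 5 unique device names, our Alertmanager sends 5 separate notifications. Given the firing data above, we would like to know how to post only a single, aggregated alert carrying the cluster_name or site_name label values to a specific webhook. Is there a way to send just one alert to a specific webhook based on particular labels, even when multiple alerts exist because other labels have distinct values?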

Expected:

Slack:

<as-above-posted>

To the third-party webhook:

<only-one-alert-as-below>
AlertNo.1 - alertname:Alert_Offline, alertsrc:prometheus, cluster_name:cc100, site_name:PP101 timestamp:2021-08-11 00:46:18

【Comments】:

    Tags: prometheus-alertmanager


    【Solution 1】:

    This can be achieved with the group_by parameter, together with group_wait and group_interval, in alertmanager.yml.

    From the docs:

    # To aggregate by all possible labels use the special value '...' as the sole label name, for example:
    # group_by: ['...']
    # This effectively disables aggregation entirely, passing through all
    # alerts as-is. This is unlikely to be what you want, unless you have
    # a very low alert volume or your upstream notification system performs
    # its own grouping.
    [ group_by: '[' <labelname>, ... ']' ]
    
    # How long to initially wait to send a notification for a group
    # of alerts. Allows to wait for an inhibiting alert to arrive or collect
    # more initial alerts for the same group. (Usually ~0s to few minutes.)
    [ group_wait: <duration> | default = 30s ]
    
    # How long to wait before sending a notification about new alerts that
    # are added to a group of alerts for which an initial notification has
    # already been sent. (Usually ~5m or more.)
    [ group_interval: <duration> | default = 5m ]
    

    In your case, try the following:

    group_by: ['cluster_name', 'site_name']
    group_wait: 10s
    group_interval: 1m 
    

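    group_by specifies the labels used to aggregate alerts. Alerts that share the same values for these labels end up in the same group.

    group_wait specifies how long to wait for further alerts with matching labels to join a new group before the first notification is sent. In your case the alerts appear to arrive at the same time, so keeping this value low should be fine, but you can experiment to see what works best for you.

    group_interval specifies how long to wait before sending a notification about new alerts added to a group for which a notification has already been sent.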
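
To make the effect of group_by concrete, here is a minimal Python sketch that buckets the five alerts from the question by a list of label names. It only imitates the idea of grouping; it is not Alertmanager's implementation, and `group_alerts` is a hypothetical helper introduced for illustration.

```python
from collections import defaultdict

# The five firing alerts from the question, reduced to their labels.
ALERTS = [
    {"alertname": "Alert_Offline", "alertsrc": "prometheus",
     "cluster_name": "cc100", "site_name": "PP101", "device": device}
    for device in ("K8308", "D3010", "X2008", "X2005", "X2202")
]


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the group_by labels.

    Hypothetical helper: it mimics the idea behind group_by,
    not Alertmanager's actual code.
    """
    groups = defaultdict(list)
    for labels in alerts:
        # Assumption: a missing label is treated like an empty value.
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)


by_site = group_alerts(ALERTS, ["cluster_name", "site_name"])
by_device = group_alerts(ALERTS, ["cluster_name", "site_name", "device"])

print(len(by_site))    # number of groups when device is left out
print(len(by_device))  # number of groups when device is included
```

Leaving `device` out of the grouping labels puts all five alerts into one bucket, which is why a single notification results; including it yields one bucket per device.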

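    Doing so aggregates your alerts by the specified labels cluster_name and site_name, producing a single notification whose payload lists the individual alerts in its alerts section. Because group_by is a route-level setting, it can be applied only to the route that targets the webhook receiver, leaving the grouping used for Slack unchanged.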
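
On the receiving side, the webhook gets one JSON body per group rather than one per device. The sketch below shows how such a body might be collapsed into a single line like the one the question asks for. The sample body is hand-written from the question's data and includes only the fields used here; the field names `groupLabels`, `commonLabels`, and `alerts` reflect my understanding of Alertmanager's webhook format and should be checked against the version you run. `summarize` is a hypothetical helper.

```python
import json

DEVICES = ("K8308", "D3010", "X2008", "X2005", "X2202")

# Labels shared by every alert in the group (values from the question).
COMMON = {
    "alertname": "Alert_Offline",
    "alertsrc": "prometheus",
    "cluster_name": "cc100",
    "site_name": "PP101",
}

# Hand-written sample of a grouped webhook body; not captured output.
SAMPLE_BODY = json.dumps({
    "status": "firing",
    "groupLabels": {"cluster_name": "cc100", "site_name": "PP101"},
    "commonLabels": COMMON,
    "alerts": [
        {"status": "firing", "labels": {**COMMON, "device": device}}
        for device in DEVICES
    ],
})


def summarize(body):
    """Collapse one grouped notification into a single summary line."""
    payload = json.loads(body)
    # commonLabels holds the labels shared by all alerts in the group.
    labels = ", ".join(
        f"{name}:{value}" for name, value in payload["commonLabels"].items()
    )
    # The per-device detail survives in the individual alerts.
    devices = [alert["labels"].get("device", "?") for alert in payload["alerts"]]
    return f"{labels} ({len(devices)} devices: {', '.join(devices)})"


print(summarize(SAMPLE_BODY))
```

The device label does not appear among the common labels because its value differs between alerts, so the summary line matches the shape of the expected webhook alert while the device list remains available if needed.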

    【Comments】:
