【Question title】: Prometheus Operator Alertmanager - Custom rule - field groups not found in type config.plain
【Posted】: 2020-03-24 10:54:34
【Question】:

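I installed prometheus-operator via helm and now want to set up a custom alert rule with email notifications. At the moment I receive a notification for every alert; I want to "silence" those so that I only get emails for my custom alert.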

alertmanager.yaml:

global:
  resolve_timeout: 5m
route:
  receiver: 'email-alert'
  group_by: ['job']


  routes:
  - receiver: 'email-alert'
    match:
      alertname: etcdInsufficientMembers
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h  

receivers:
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    # Your smtp server address
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass

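The file above was applied successfully.

I then appended the following lines to the end of that file, following the example referenced here: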

# Example group with one alert

groups:
- name: example-alert
  rules:
    # Alert about restarts
  - alert: RestartAlerts
    expr: count(kube_pod_container_status_restarts_total) > 0
    for: 1s
    annotations:
      summary: "More than 5 restarts in pod {{ $labels.pod-name }}"
      description: "{{ $labels.container-name }} restarted (current value: {{ $value }}s) times in pod {{ $labels.pod-namespace }}/{{ $labels.pod-name }}"

Then I got this in the pod logs:

="Loading configuration file failed" file=/etc/alertmanager/config/alertmanager.yaml err="yaml: unmarshal errors:\n  line 28: field groups not found in type config.plain"

【Comments】:

    Tags: kubernetes prometheus prometheus-alertmanager


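    【Solution 1】:

    Solved. The groups block is a Prometheus alerting-rule definition, not part of the Alertmanager configuration, which is why Alertmanager rejects it; with prometheus-operator, rules live in PrometheusRule resources. First, list all the available rules: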

    kubectl -n monitoring get prometheusrules
    NAME                                                              AGE
    prometheus-prometheus-oper-alertmanager.rules                     29h
    prometheus-prometheus-oper-etcd                                   29h
    prometheus-prometheus-oper-general.rules                          29h
    prometheus-prometheus-oper-k8s.rules                              29h
    prometheus-prometheus-oper-kube-apiserver-error                   29h
    prometheus-prometheus-oper-kube-apiserver.rules                   29h
    prometheus-prometheus-oper-kube-prometheus-node-recording.rules   29h
    prometheus-prometheus-oper-kube-scheduler.rules                   29h
    prometheus-prometheus-oper-kubernetes-absent                      29h
    prometheus-prometheus-oper-kubernetes-apps                        29h
    prometheus-prometheus-oper-kubernetes-resources                   29h
    prometheus-prometheus-oper-kubernetes-storage                     29h
    prometheus-prometheus-oper-kubernetes-system                      29h
    prometheus-prometheus-oper-kubernetes-system-apiserver            29h
    prometheus-prometheus-oper-kubernetes-system-controller-manager   29h
    prometheus-prometheus-oper-kubernetes-system-kubelet              29h
    prometheus-prometheus-oper-kubernetes-system-scheduler            29h
    prometheus-prometheus-oper-node-exporter                          29h
    prometheus-prometheus-oper-node-exporter.rules                    29h
    prometheus-prometheus-oper-node-network                           29h
    prometheus-prometheus-oper-node-time                              29h
    prometheus-prometheus-oper-node.rules                             29h
    prometheus-prometheus-oper-prometheus                             29h
    prometheus-prometheus-oper-prometheus-operator                    29h
    

    Then pick one to edit, or delete all of them except the default one: prometheus-prometheus-oper-general.rules

    I chose to edit the node-exporter rule:

    kubectl edit prometheusrule prometheus-prometheus-oper-node-exporter -n monitoring
    

    and added these lines at the end of the file:

    - alert: RestartAlerts
      annotations:
        description: '{{ $labels.container }} restarted (current value: {{ $value }})
              times in pod {{ $labels.namespace }}/{{ $labels.pod }}'
        summary: More than 5 restarts in pod {{ $labels.pod }}
      expr: kube_pod_container_status_restarts_total{container="coredns"} > 5
      for: 1m
      labels:
        severity: warning
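As an alternative to appending to a chart-managed rule (which a later helm upgrade may overwrite), the same alert could be kept in its own PrometheusRule resource. The following is only a sketch: the resource name custom-restart-alerts is hypothetical, and the two labels are an assumption about what the chart's Prometheus selects on, so they should be compared with spec.ruleSelector of the Prometheus object in the cluster before relying on them.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: custom-restart-alerts   # hypothetical name, pick your own
  namespace: monitoring
  labels:
    # Assumption: these match the ruleSelector of the Prometheus resource.
    # Check with: kubectl -n monitoring get prometheus -o yaml
    app: prometheus-operator
    release: prometheus
spec:
  groups:
  - name: custom-restart-alerts
    rules:
    - alert: RestartAlerts
      expr: kube_pod_container_status_restarts_total{container="coredns"} > 5
      for: 1m
      labels:
        severity: warning
      annotations:
        summary: 'More than 5 restarts in pod {{ $labels.pod }}'
        description: '{{ $labels.container }} restarted {{ $value }} times in pod {{ $labels.namespace }}/{{ $labels.pod }}'
```

Saved to a file, this would be applied with kubectl apply -f, and it should then show up in the kubectl -n monitoring get prometheusrules listing next to the chart's own rules.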
    

    Shortly afterwards, I received an email for this alert.
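The other half of the question, getting emails only for the custom alert, is a routing matter in alertmanager.yaml. In the question's file the top-level route sends everything to email-alert, so every alert is mailed. A possible sketch, assuming the alert is named RestartAlerts as above: make the default receiver one with no notification configs (named 'null' here, an arbitrary choice) and route only the custom alert to email-alert.

```yaml
global:
  resolve_timeout: 5m
route:
  # Default receiver: alerts that match no child route end up here.
  # Quoted so YAML reads it as the string "null", not a null value.
  receiver: 'null'
  group_by: ['job']
  routes:
  - receiver: 'email-alert'
    match:
      alertname: RestartAlerts
    group_wait: 30s
    group_interval: 5m
    repeat_interval: 12h

receivers:
# A receiver with no *_configs sends nothing.
- name: 'null'
- name: email-alert
  email_configs:
  - to: receiver@example.com
    from: sender@example.com
    smarthost: smtp.office365.com:587
    auth_username: sender@example.com
    auth_identity: sender@example.com
    auth_password: pass
```

Alerts are still evaluated and visible in the Prometheus and Alertmanager UIs with this layout; only the notifications for everything other than RestartAlerts are dropped.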

    【Comments】:
