[Posted]: 2019-11-18 10:10:57
[Question]:
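Hello, I'm new to Kubernetes and Helm charts. A similar question has already been asked and answered here (How to set prometheus rules in stable/prometheus chart values.yaml?),
but I'm looking for a way to define the rules in a separate file and then include that file in values.yaml, to make maintenance easier (I have more than 2000 lines of alerts...).
Here is my current values.yaml: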
serverFiles:
  alerts:
    groups:
    - name: kubernetes-apps
      rules:
      - alert: KubePodCrashLooping
        annotations:
          message: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container
            }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
          runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping
        expr: rate(kube_pod_container_status_restarts_total{component="kube-state-metrics"}[15m])
          * 60 * 5 > 0
        for: 1h
        labels:
          severity: critical
    ...
    <2000 more lines>
    ...
  rules: {}
  prometheus.yml:
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
This is what I would like the new values.yaml to look like:
serverFiles:
  alerts: {{ include from values-alerts.yaml }}
  rules: {}
  prometheus.yml:
    rule_files:
    - /etc/config/rules
    - /etc/config/alerts
And this is the values-alerts.yaml file that I want to include in values.yaml:
alerts:
  groups:
  - name: kubernetes-apps
    rules:
    - alert: KubePodCrashLooping
      annotations:
        message: Pod {{ $labels.namespace }}/{{ $labels.pod }} ({{ $labels.container
          }}) is restarting {{ printf "%.2f" $value }} times / 5 minutes.
        runbook_url: https://github.com/kubernetes-monitoring/kubernetes-mixin/tree/master/runbook.md#alert-name-kubepodcrashlooping
      expr: rate(kube_pod_container_status_restarts_total{component="kube-state-metrics"}[15m])
        * 60 * 5 > 0
      for: 1h
      labels:
        severity: critical
  ...
  <2000 more lines>
  ...
Please let me know whether this is possible, or whether there is a better approach.
Thanks,
[Comments]:
- Did the solution below achieve the result you expected?
Tags: kubernetes prometheus kubernetes-helm