Azure AKS Prometheus-operator double metrics

【Question title】: Azure AKS Prometheus-operator double metrics
【Posted】: 2020-07-10 08:04:15
【Question description】:

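I am running an Azure AKS cluster on version 1.15.11 with prometheus-operator 8.15.6 installed as a Helm chart, and some of the metrics shown by the Kubernetes Dashboard differ from the ones that Prometheus and Grafana report.

One monitored application pod has three containers. The Kubernetes Dashboard shows a memory consumption of about 250 MB for this pod, while the standard prometheus-operator dashboard shows almost exactly double that value, about 500 MB.

At first we thought our monitoring setup might be misconfigured. Since prometheus-operator is installed as a standard Helm chart, the node exporter DaemonSet ensures that one exporter is deployed on each node, so duplicated exporters should not be the cause. However, after migrating our cluster to different node pools, I noticed that when our application runs on the user node pool instead of the system node pool, the metrics in both tools match exactly. I know the system node pool runs CoreDNS and tunnelfront, but I assume they run as separate components. I also know that, in general, running infrastructure and applications in the same node pool is not the best choice.

Still, I would like to know why running the application on the system node pool causes the Prometheus metrics to double.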


Tags: azure kubernetes monitoring prometheus prometheus-operator

【Solution 1】:

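I ran into a similar problem (AKS v1.14.6, prometheus-operator v0.38.1) where all my values were multiplied by a factor of 3. It turns out that, before you remove or reinstall prometheus-operator, you have to remember to delete the extra endpoints named prometheus-operator-kubelet that are created in the kube-system namespace during installation, because Prometheus aggregates the metrics collected for each endpoint.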
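To illustrate why leftover endpoints inflate the numbers, here is a minimal Python sketch. It does not talk to a real cluster or to Prometheus. The pod name, container names, per-container memory values, and the second service name `old-release-kubelet` are all hypothetical. It only models the idea that the same per-container sample is collected once per kubelet endpoint, and that a dashboard summing by pod then counts every copy.

```python
from collections import defaultdict

MIB = 1024 * 1024

# Hypothetical per-container memory usage for one pod with three containers.
CONTAINERS = {"app": 100 * MIB, "sidecar": 90 * MIB, "proxy": 60 * MIB}


def make_samples(kubelet_services):
    """Build one sample per container for each kubelet service that is scraped."""
    samples = []
    for service in kubelet_services:
        for container, value in CONTAINERS.items():
            labels = {"pod": "my-app", "container": container, "service": service}
            samples.append((labels, value))
    return samples


def sum_by_pod(samples):
    """Mimic a dashboard query that sums container memory per pod."""
    totals = defaultdict(int)
    for labels, value in samples:
        totals[labels["pod"]] += value
    return dict(totals)


def dedupe(samples):
    """Keep a single sample per (pod, container), ignoring the source service."""
    seen = {}
    for labels, value in samples:
        key = (labels["pod"], labels["container"])
        seen.setdefault(key, (labels, value))
    return list(seen.values())


if __name__ == "__main__":
    # One kubelet endpoint: the sum matches the real usage.
    clean = make_samples(["prometheus-operator-kubelet"])
    # A stale endpoint from an earlier install is scraped as well.
    stale = make_samples(["prometheus-operator-kubelet", "old-release-kubelet"])

    print(sum_by_pod(clean)["my-app"] // MIB, "MiB with one endpoint")
    print(sum_by_pod(stale)["my-app"] // MIB, "MiB with a stale extra endpoint")
    print(sum_by_pod(dedupe(stale))["my-app"] // MIB, "MiB after de-duplication")
```

On a real cluster, `kubectl get endpoints -n kube-system` lists the endpoints objects in that namespace, which is one way to check whether more than one kubelet entry is present.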
