【Question title】: prometheus monitor container memory [duplicate]
【Posted】: 2022-01-13 03:58:29
【Question】:

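While monitoring the real memory used by containers, I found that the total real memory of all containers is larger than the total real memory of all physical nodes. This is strange.

However, I noticed that some of the monitored series have no container_name label. Only when the series without a container_name label are filtered out does the containers' actual memory look reasonable.

Why does this happen? (PS: container_name != "POD" is already excluded.)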


sum(sum(container_memory_rss{container_name!="POD",container_name=~"[a-z].*"}) by (container_name))/1024^4

sum(sum(container_memory_rss{container_name!="POD"}) by (container_name))/1024^4

【Comments】:

    Tags: prometheus


    【Solution 1】:

    This is what we use to map container memory metrics:

    sum by (container, pod, namespace, node, job) (container_memory_rss{container != "POD", image != "", container != ""})

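    To answer your specific question, why is the value higher? Because it includes the node memory itself.

    kubelet (cadvisor) reports memory metrics for multiple cgroups. For example, id="/" is the metric for the root cgroup (i.e. the entire node).

    For example, in my setup the following metric is the node memory: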

    {endpoint="https-metrics", id="/", instance="10.0.84.2:10250", job="kubelet", metrics_path="/metrics/cadvisor", node="ip-10-xx-x-x.us-west-2.compute.internal", service="kube-prometheus-stack-kubelet"}
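
A rough way to see the effect is to simulate it outside Prometheus. The Python sketch below uses made-up series and values (the cgroup paths, pod names, and sizes are all hypothetical) to show how summing every series counts the same memory at several levels of the cgroup hierarchy, while a filter equivalent to {container != "POD", image != "", container != ""} keeps only the per-container series. It relies on the fact that in a Prometheus selector a missing label behaves like an empty string.

```python
# Hypothetical cAdvisor-style samples: (labels, rss_bytes). All values are made up.
GIB = 1024 ** 3

samples = [
    ({"id": "/"}, 10 * GIB),                           # root cgroup: the whole node
    ({"id": "/kubepods"}, 6 * GIB),                    # parent cgroup of all pods
    ({"id": "/kubepods/pod-a", "pod": "a"}, 4 * GIB),  # pod-level cgroup, no container label
    ({"id": "/kubepods/pod-a/c1", "pod": "a", "container": "app", "image": "app:1"}, 3 * GIB),
    ({"id": "/kubepods/pod-a/c2", "pod": "a", "container": "POD", "image": "pause"}, 1 * GIB),
    ({"id": "/kubepods/pod-b", "pod": "b"}, 2 * GIB),
    ({"id": "/kubepods/pod-b/c1", "pod": "b", "container": "app", "image": "app:1"}, 2 * GIB),
]


def is_real_container(labels):
    """Mimic the selector {container != "POD", image != "", container != ""}.

    A label that is absent is treated as the empty string.
    """
    container = labels.get("container", "")
    image = labels.get("image", "")
    return container not in ("", "POD") and image != ""


# Summing everything counts node, pod, and container cgroups on top of each other.
naive_total = sum(value for _, value in samples)

# Summing only real containers counts each container once.
filtered_total = sum(value for labels, value in samples if is_real_container(labels))

print(naive_total / GIB)     # 28.0, larger than the 10 GiB node itself
print(filtered_total / GIB)  # 5.0
```

In this toy data the unfiltered sum comes out almost three times the size of the node, which is the kind of result described in the question.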
    

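    Also, at www.asserts.ai we use the max of the rss, working set, and usage metrics to arrive at the actual memory used by a container.

    See our recording rules below for reference: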

          
          #
          - record: asserts:container_memory
            expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site)(container_memory_rss{container != "POD", image != "", container != ""})
            labels:
              source: rss
    
          - record: asserts:container_memory
            expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site)(container_memory_working_set_bytes{container != "POD", image != "", container != ""})
            labels:
              source: working
    
          - record: asserts:container_memory
            # why sum ? multiple copies of same container may be running on same pod
            expr: sum by (container, pod, namespace, node, job, asserts_env, asserts_site)
              (
              container_memory_usage_bytes {container != "POD", image != "", container != ""} -
              container_memory_cache {container != "POD", image != "", container != ""}-
              container_memory_swap {container != "POD", image != "", container != ""}
              )
            labels:
              source: usage
    
          # For KPI Rollup Purposes
          - record: asserts:resource:usage
            expr: |-
              max without (source) (asserts:container_memory)
              * on (namespace, pod, asserts_env, asserts_site) group_left(workload) asserts:mixin_pod_workload
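
The arithmetic behind these rules can be sketched in a few lines of Python. This is only an illustration with hypothetical numbers, not the rules themselves: the function name container_memory and the sample values are assumptions. It takes the three estimates (rss, working set, and usage minus cache minus swap) for one container and keeps the largest, which is what max without (source) does across the three recorded series.

```python
MIB = 1024 ** 2


def container_memory(rss, working_set, usage, cache, swap):
    """Return (source, bytes) for the largest of the three memory estimates."""
    estimates = {
        "rss": rss,
        "working": working_set,
        "usage": usage - cache - swap,  # usage with page cache and swap removed
    }
    source = max(estimates, key=estimates.get)
    return source, estimates[source]


# Hypothetical readings for a single container.
source, value = container_memory(
    rss=300 * MIB,
    working_set=420 * MIB,
    usage=900 * MIB,
    cache=500 * MIB,
    swap=0,
)

print(source, value // MIB)  # working 420
```

Here the usage-based estimate is 400 MiB after cache is subtracted, so the working set (420 MiB) is the value that survives the max.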
    
    
    

    【Comments】:
