【Question Title】: Resource monitoring for Kubernetes Pods
【Posted】: 2018-01-09 15:59:35
【Question】:

I'm using the kubernetes-client Java library for the K8s REST API. I want to explore the resource monitoring features described here: https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/

I set the resources for the Pods when creating the Deployment, like this:

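    import java.util.HashMap;
    import java.util.Map;
    import io.fabric8.kubernetes.api.model.Quantity;
    import io.fabric8.kubernetes.api.model.ResourceRequirements;
    import io.fabric8.kubernetes.api.model.ResourceRequirementsBuilder;
    // extensions/v1beta1 Deployment, matching client.extensions() in older fabric8 versions
    import io.fabric8.kubernetes.api.model.extensions.Deployment;
    import io.fabric8.kubernetes.api.model.extensions.DeploymentBuilder;

    // ******************* RESOURCES *********************
    // A memory amount without a unit suffix is read as bytes: "400" means 400 bytes,
    // "400Mi" would mean 400 mebibytes.
    Quantity memRequest = new Quantity();
    memRequest.setAmount("400");
    Map<String, Quantity> memMap = new HashMap<>();
    memMap.put("memory", memRequest);
    // This sets a memory *request* (used for scheduling), not a limit.
    ResourceRequirements resourceRequirements = new ResourceRequirementsBuilder()
        .withRequests(memMap)
        .build();

    // ******************* DEPLOYMENT *********************
    Deployment deployment = new DeploymentBuilder()
        .withNewMetadata()
            .withName("first-deployment")
        .endMetadata()
        .withNewSpec()
            .withReplicas(3)
            .withNewTemplate()
                .withNewMetadata()
                    .addToLabels(namespaceID, "hello-world-example")
                .endMetadata()
                .withNewSpec()
                    .addNewContainer()
                        .withName("nginx-one")
                        .withImage("nginx")
                        .addNewPort()
                            .withContainerPort(80)
                        .endPort()
                        .withResources(resourceRequirements)
                    .endContainer()
                .endSpec()
            .endTemplate()
        .endSpec()
        .build();
    // client, namespace and namespaceID are defined elsewhere in the application.
    deployment = client.extensions().deployments().inNamespace(namespace).create(deployment);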
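Since the request above is set through a bare string, it helps to see how Kubernetes turns a memory quantity into bytes: an amount with no suffix such as "400" means 400 bytes, "400Mi" means 400 × 2^20 bytes, and "1G" means 10^9 bytes. The following is a minimal sketch under stated assumptions: `MemoryQuantity.toBytes` is a hypothetical helper (not a fabric8 API) that handles only integer amounts with the binary suffixes Ki to Ei and the decimal suffixes k to E; exponent notation, fractions and milli units are deliberately left out.

```java
/**
 * Hypothetical helper (not a fabric8 API): converts a Kubernetes memory quantity
 * such as "400", "400Mi" or "1G" to bytes. Handles integer amounts only.
 */
class MemoryQuantity {
    // Binary (power-of-two) suffixes: Ki = 2^10, Mi = 2^20, ...
    private static final String[] BINARY = {"Ki", "Mi", "Gi", "Ti", "Pi", "Ei"};
    // Decimal (power-of-ten) suffixes: k = 10^3, M = 10^6, ...
    private static final String[] DECIMAL = {"k", "M", "G", "T", "P", "E"};

    static long toBytes(String quantity) {
        for (int i = 0; i < BINARY.length; i++) {
            if (quantity.endsWith(BINARY[i])) {
                long n = Long.parseLong(quantity.substring(0, quantity.length() - 2));
                return Math.multiplyExact(n, 1L << (10 * (i + 1)));
            }
        }
        for (int i = 0; i < DECIMAL.length; i++) {
            if (quantity.endsWith(DECIMAL[i])) {
                long n = Long.parseLong(quantity.substring(0, quantity.length() - 1));
                long factor = 1;
                for (int j = 0; j <= i; j++) {
                    factor *= 1000;
                }
                return Math.multiplyExact(n, factor);
            }
        }
        // No suffix: the amount is a plain number of bytes.
        return Long.parseLong(quantity);
    }

    public static void main(String[] args) {
        System.out.println(toBytes("400"));   // 400
        System.out.println(toBytes("400Mi")); // 419430400
        System.out.println(toBytes("1G"));    // 1000000000
    }
}
```

In practice a helper like this is mainly useful for understanding the units; recent fabric8 versions can do the conversion on their own `Quantity` objects (e.g. `Quantity.getAmountInBytes`).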

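How can I now find out how much of the memory allocated to the Pod is actually being used? The documentation says this is part of the Pod status, but the Pod status looks like this: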

    (conditions=[PodCondition(lastProbeTime=null, lastTransitionTime=2018-01-09T15:53:28Z,
                              message=null, reason=null, status=True, type=PodScheduled,
                              additionalProperties={})],
     containerStatuses=[], hostIP=null, initContainerStatuses=[],
     message=null, phase=Pending, podIP=null,
     qosClass=Burstable, reason=null,
     startTime=null, additionalProperties={})

and the container status looks like this:

    (containerID=null, image=nginx, imageID=,
     lastState=ContainerState(running=null, terminated=null, waiting=null, additionalProperties={}),
     name=nginx-one, ready=false, restartCount=0,
     state=ContainerState(running=null, terminated=null,
                          waiting=ContainerStateWaiting(message=null, reason=ContainerCreating, additionalProperties={}),
                          additionalProperties={}),
     additionalProperties={})

Are there any examples of monitoring the resources of Pods?

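【Comments】:

  • You don't see any memory usage because the Pod hasn't started yet. It has been scheduled and is in the ContainerCreating state. When are you retrieving the Pod status? You will have to wait for it to start running.
  • Great, I'll try that right away!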
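As the first comment points out, usage can only show up once the Pod is actually running. A minimal polling sketch under that assumption: `PhaseWaiter.waitForPhase` is a hypothetical helper that repeatedly reads the phase from a `Supplier<String>` until it sees the target phase or the timeout expires. Against a real cluster the supplier could wrap a fabric8 lookup such as `client.pods().inNamespace(ns).withName(name).get().getStatus().getPhase()`, where `ns` and `name` are placeholders; Pods created by a Deployment get generated names, so you would first have to list them (for example by label).

```java
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.function.Supplier;

/** Hypothetical helper: polls a Pod phase until it reaches the target or the timeout expires. */
class PhaseWaiter {
    static boolean waitForPhase(Supplier<String> phase, String target,
                                Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (!target.equals(phase.get())) {
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out before the target phase was observed
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for callers
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Simulated phases standing in for repeated API lookups.
        Iterator<String> phases = List.of("Pending", "Pending", "Running").iterator();
        boolean running = waitForPhase(() -> phases.hasNext() ? phases.next() : "Running",
                "Running", Duration.ofSeconds(5), Duration.ofMillis(10));
        System.out.println(running); // true
    }
}
```

Polling keeps the sketch independent of a particular client version; fabric8 also offers built-in waiting helpers on resources, but their exact names and signatures depend on the version you use.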

Tags: java docker kubernetes fabric8 resource-monitor


【Answer 1】:

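Spend an hour watching the video "Load Testing Kubernetes: How to Optimize Your Cluster Resource Allocation in Production", which presents techniques and recommendations for sizing resource allocations based on load testing. The examples in the video use cAdvisor, so once your Pods/containers are up and running, you can use that mechanism to get at least a basic view of how many resources each container consumes.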

【Comments】:

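【Answer 2】:

I'm not sure whether the k8s API server exposes an endpoint for performance-related metrics, but with fabric8 you won't be able to monitor resource consumption from the Pod object, even when the Pod is in the Running state.

Here is the Pod response JSON; note that it only contains the configured requests, and `status` carries no usage figures: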

    {
      "kind": "Pod",
      "apiVersion": "v1",
      "metadata": {
        "name": "nginx-41cbe3-10-json-9cc655bcc-w576m",
        "generateName": "nginx-41cbe3-10-json-9cc655bcc-",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/pods/nginx-41cbe3-10-json-9cc655bcc-w576m",
        "uid": "e14a955f-18b7-11e8-a642-42010a800090",
        "resourceVersion": "12765988",
        "creationTimestamp": "2018-02-23T16:37:47Z",
        "labels": {
          "app": "nginx",
          "cliqr": "99911519403865240",
          "pod-template-hash": "577211677"
        },
        "annotations": {
          "kubernetes.io/created-by": "{\"kind\":\"SerializedReference\",\"apiVersion\":\"v1\",\"reference\":{\"kind\":\"ReplicaSet\",\"namespace\":\"default\",\"name\":\"nginx-41cbe3-10-json-9cc655bcc\",\"uid\":\"e1493bd0-18b7-11e8-a642-42010a800090\",\"apiVersion\":\"extensions\",\"resourceVersion\":\"12765971\"}}\n",
          "kubernetes.io/limit-ranger": "LimitRanger plugin set: cpu request for container nginx"
        },
        "ownerReferences": [
          {
            "apiVersion": "extensions/v1beta1",
            "kind": "ReplicaSet",
            "name": "nginx-41cbe3-10-json-9cc655bcc",
            "uid": "e1493bd0-18b7-11e8-a642-42010a800090",
            "controller": true,
            "blockOwnerDeletion": true
          }
        ]
      },
      "spec": {
        "volumes": [
          {
            "name": "default-token-zrhj5",
            "secret": {
              "secretName": "default-token-zrhj5",
              "defaultMode": 420
            }
          }
        ],
        "containers": [
          {
            "name": "nginx",
            "image": "nginx:latest",
            "ports": [
              {
                "containerPort": 80,
                "protocol": "TCP"
              }
            ],
            "resources": {
              "requests": {
                "cpu": "100m"
              }
            },
            "volumeMounts": [
              {
                "name": "default-token-zrhj5",
                "readOnly": true,
                "mountPath": "/var/run/secrets/kubernetes.io/serviceaccount"
              }
            ],
            "terminationMessagePath": "/dev/termination-log",
            "terminationMessagePolicy": "File",
            "imagePullPolicy": "Always"
          }
        ],
        "restartPolicy": "Always",
        "terminationGracePeriodSeconds": 30,
        "dnsPolicy": "ClusterFirst",
        "serviceAccountName": "default",
        "serviceAccount": "default",
        "nodeName": "gke-rishi-k8-cluster-default-pool-6ca1467e-xtmw",
        "securityContext": {},
        "schedulerName": "default-scheduler",
        "tolerations": [
          {
            "key": "node.alpha.kubernetes.io/notReady",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          },
          {
            "key": "node.alpha.kubernetes.io/unreachable",
            "operator": "Exists",
            "effect": "NoExecute",
            "tolerationSeconds": 300
          }
        ]
      },
      "status": {
        "phase": "Running",
        "conditions": [
          {
            "type": "Initialized",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2018-02-23T16:37:47Z"
          },
          {
            "type": "Ready",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2018-02-23T16:37:53Z"
          },
          {
            "type": "PodScheduled",
            "status": "True",
            "lastProbeTime": null,
            "lastTransitionTime": "2018-02-23T16:37:47Z"
          }
        ],
        "hostIP": "10.240.0.23",
        "podIP": "10.20.3.164",
        "startTime": "2018-02-23T16:37:47Z",
        "containerStatuses": [
          {
            "name": "nginx",
            "state": {
              "running": {
                "startedAt": "2018-02-23T16:37:52Z"
              }
            },
            "lastState": {},
            "ready": true,
            "restartCount": 0,
            "image": "nginx:latest",
            "imageID": "docker-pullable://nginx@sha256:600bff7fb36d7992512f8c07abd50aac08db8f17c94e3c83e47d53435a1a6f7c",
            "containerID": "docker://2c227a901bcde4705c5b79aedf1963079dfb345fae5849616d29e8cc7af0fd74"
          }
        ],
        "qosClass": "Burstable"
      }
    }
    

【Comments】:

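【Answer 3】:

I know this question is two years old, but none of the answers here actually answer it.

To get CPU and memory usage, you need to install the Kubernetes metrics server on your cluster (if you use Helm, see the official Helm chart). Once metrics server is installed, you can run kubectl commands that report resource usage. For example, kubectl top pods -A lists the CPU and memory usage of Pods in all namespaces (add --sort-by=cpu to sort them by CPU), and kubectl top nodes lists the usage of each node. With metrics server installed, the Kubernetes dashboard also reports CPU and memory usage figures; kubectl describe pods, by contrast, shows the configured requests and limits rather than live usage.

To answer your specific question about fabric8: once metrics server is running, you can use the following code to get CPU and memory usage: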

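    import io.fabric8.kubernetes.api.model.metrics.v1beta1.NodeMetrics;
    import io.fabric8.kubernetes.api.model.metrics.v1beta1.NodeMetricsList;
    import io.fabric8.kubernetes.client.KubernetesClient;
    import io.fabric8.kubernetes.client.KubernetesClientBuilder;

    // KubernetesClientBuilder requires a recent fabric8 version; the client is closed automatically.
    try (KubernetesClient k8s = new KubernetesClientBuilder().build()) {
        NodeMetricsList nodeMetricsList = k8s.top().nodes().metrics();
        for (NodeMetrics nodeMetrics : nodeMetricsList.getItems()) {
            // Node name, CPU usage and memory usage as reported by metrics server
            System.out.printf("%s %s %s%n",
                nodeMetrics.getMetadata().getName(),
                nodeMetrics.getUsage().get("cpu"),
                nodeMetrics.getUsage().get("memory"));
        }
    }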
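The snippet above reports node-level usage, whereas the question asks how much of a Pod's allocated memory is in use. Once you have per-container usage (for instance from the containers of a `PodMetrics` object obtained through `k8s.top().pods()`; the exact method names differ between fabric8 versions) and the per-container memory requests from the Pod spec, both converted to bytes, the remaining step is simple arithmetic. A minimal plain-Java sketch, where `PodMemoryUtilization.percentOfRequest` is a hypothetical helper:

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Map;

/** Hypothetical helper: memory usage of a Pod as a percentage of its memory request. */
class PodMemoryUtilization {
    /**
     * @param usageBytes   per-container memory usage in bytes (e.g. collected from PodMetrics)
     * @param requestBytes per-container memory request in bytes (from the Pod spec)
     * @return percentage of the summed requests that is in use, or null if nothing was requested
     */
    static BigDecimal percentOfRequest(Map<String, BigDecimal> usageBytes,
                                       Map<String, BigDecimal> requestBytes) {
        BigDecimal used = usageBytes.values().stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        BigDecimal requested = requestBytes.values().stream().reduce(BigDecimal.ZERO, BigDecimal::add);
        if (requested.signum() == 0) {
            return null; // no memory request set, so a ratio is meaningless
        }
        return used.multiply(BigDecimal.valueOf(100)).divide(requested, 2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal mi = BigDecimal.valueOf(1024L * 1024L);
        // Illustrative numbers: 3Mi in use against a 400Mi request.
        Map<String, BigDecimal> usage = Map.of("nginx-one", mi.multiply(BigDecimal.valueOf(3)));
        Map<String, BigDecimal> request = Map.of("nginx-one", mi.multiply(BigDecimal.valueOf(400)));
        System.out.println(percentOfRequest(usage, request)); // 0.75
    }
}
```

Dividing by the request matches the question's Deployment, which only sets `requests`; for a Pod with memory limits you would divide by the summed limits instead.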
      

【Comments】:
