Horizontal Pod Autoscaler with custom metrics from Prometheus with percentiles for CPU usage: answers

【Question title】: Horizontal Pod Autoscaler with custom metrics from Prometheus with percentiles for CPU usage
【Posted】: 2020-01-11 07:16:59
【Question】:

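I am trying to figure out how to configure a Horizontal Pod Autoscaler from a custom metric read from Prometheus that returns the 0.95 percentile of CPU usage.

I have everything set up to use custom metrics with prometheus-adapter, but I don't understand how to create the rule in Prometheus. For example, if I go to Grafana and check some of the graphs that come by default, I see this metric: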

sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name)

But how can I modify that into the 95th percentile? I tried the histogram_quantile function, but it says no datapoints were found:

histogram_quantile(0.95, sum(namespace_pod_name_container_name:container_cpu_usage_seconds_total:sum_rate{namespace="api", pod_name="api-xxxxx9b-bdxx", container_name!="POD", cluster=""}) by (container_name))

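But even if that worked, would the pod name and namespace be filled in by prometheus-adapter or by Prometheus when using custom metrics?

Every example I find that uses custom metrics is unrelated to CPU. So my other question is: how are people using autoscaling metrics in production? I'm used to scaling based on percentiles, but I don't understand how this is handled in Kubernetes.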

【Question comments】:

Tags: kubernetes prometheus

【Solution 1】:

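If I understand you correctly, you don't have to use custom metrics to horizontally autoscale your pods. By default, you can automatically scale the number of Kubernetes pods based on observed CPU utilization. The official documentation has the necessary details:

The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).

The Horizontal Pod Autoscaler is implemented as a Kubernetes API resource and a controller. The resource determines the behavior of the controller. The controller periodically adjusts the number of replicas in a replication controller or deployment to match the observed average CPU utilization to the target specified by the user.

The documentation also has a walkthrough of how to set it up.

See also the documentation for the kubectl autoscale command.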

Example: kubectl autoscale rc foo --max=5 --cpu-percent=80

This autoscales the replication controller "foo", with the number of pods between 1 and 5 and a target CPU utilization of 80%.
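For intuition, here is a rough sketch of the replica calculation described in the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the configured bounds. The function name and parameters are made up for illustration; this is not the controller's actual code.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=5, tolerance=0.1):
    """Illustrative version of the documented HPA scaling rule (hypothetical helper)."""
    ratio = current_value / target_value
    # The controller skips scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    wanted = math.ceil(current_replicas * ratio)
    # Clamp to the min/max given to the autoscaler (1..5 in the example above).
    return max(min_replicas, min(max_replicas, wanted))


# 3 pods averaging 120% of their CPU request against an 80% target.
print(desired_replicas(3, 120, 80))
```

Note that the input here is the average utilization across pods, which is why the default CPU target cannot express a percentile.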

I think this is the simplest approach, so there is no need to complicate it with custom metrics.

Let me know if that helps.

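【Comments】:

Yes, I know I can use the cpu resource from the Kubernetes metrics server. The problem is that I need to autoscale based on the 95th percentile of CPU usage, and I don't know what Kubernetes uses by default; I couldn't find any information about it. So I thought about using Prometheus and custom metrics, but I now see that I can't use percentiles for CPU usage, because I get that neither from Prometheus nor from the Prometheus operator for the Kubernetes metrics. So I'm wondering whether the Kubernetes metrics are good enough for production.
If the component itself provides similar functionality, it is always good to avoid external dependencies. In your case, you can use --cpu-percent=95 or a HorizontalPodAutoscaler k8s object.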

【Solution 2】:

If you want to add an HPA based on custom metrics, you can use the Prometheus adapter.

The Prometheus adapter exposes custom metrics to the HPA.

Helm Chart - https://github.com/helm/charts/tree/master/stable/prometheus-adapter

Prometheus adapter - https://github.com/DirectXMan12/k8s-prometheus-adapter

Note: you have to allow port 6443 from public to the cluster, because the chart does not provide an option to override it.

https://github.com/helm/charts/blob/master/stable/prometheus-adapter/templates/custom-metrics-apiserver-deployment.yaml#L34

Make sure Prometheus is actually scraping the custom metric data. Then install the Prometheus adapter on the same Kubernetes cluster where you want to apply the HPA:

helm install --name my-release stable/prometheus-adapter -f values.yaml

Pass the following config file to helm (values.yaml):

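# Nesting under "prometheus-adapter" applies when the chart is a dependency of a parent chart;
# for a direct install of stable/prometheus-adapter these keys likely belong at the top level.
prometheus-adapter:
  enabled: true
  prometheus:
    url: http://prometheus.namespace.svc.cluster.local
  rules:
    default: true
    custom:
    # Select the source series; it must carry namespace and pod labels.
    - seriesQuery: '{__name__="cpu",namespace!="",pod!="",service="svc_name"}'
      seriesFilters: []
      resources:
        # Map Prometheus labels to Kubernetes resources.
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      # Expose the "cpu" series under the name "cpu_95".
      name:
        matches: "cpu"
        as: "cpu_95"
      # histogram_quantile only returns data if the series is a histogram with an "le" label.
      metricsQuery: "histogram_quantile(0.95, sum(irate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>,le))"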

The configuration above exposes the cpu metric to the HPA as cpu_95.
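On the question of who fills in the pod name and namespace: the adapter does, by substituting the placeholders in metricsQuery before sending the query to Prometheus. The sketch below imitates that substitution with plain string replacement to show what the final PromQL might look like. The helper name, the namespace "api" and the pod names are invented for the example, and the real adapter is a Go program that may order or format the matchers differently.

```python
def expand_metrics_query(template, series, namespace, pods, group_by="pod"):
    """Imitate the adapter's placeholder substitution (illustration only)."""
    # For a per-pod query the matchers select the namespace and the target pods.
    matchers = 'namespace="{}",pod=~"{}"'.format(namespace, "|".join(pods))
    return (template.replace("<<.Series>>", series)
                    .replace("<<.LabelMatchers>>", matchers)
                    .replace("<<.GroupBy>>", group_by))


TEMPLATE = ("histogram_quantile(0.95, sum(irate(<<.Series>>{<<.LabelMatchers>>}[2m])) "
            "by (<<.GroupBy>>,le))")

# Hypothetical namespace and pod names.
query = expand_metrics_query(TEMPLATE, "cpu", "api", ["api-1", "api-2"])
print(query)
```

The expanded query still relies on an le label, so it only makes sense if the underlying series is a histogram.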

To verify that the data is exposed correctly, run the following command.

Raw query command to fetch the data: kubectl get --raw /apis/custom.metrics.k8s.io/v1beta1/namespaces/namespace_name/pods/\*/cpu_95 | jq .
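If you would rather check the values in a script than read the jq output, something along these lines could work. It assumes the response is a v1beta1 MetricValueList whose items carry describedObject and value fields, with values encoded as Kubernetes quantities such as "82500m". The sample payload and pod names are invented, and only a few quantity suffixes are handled.

```python
import json
from decimal import Decimal

# Subset of Kubernetes quantity suffixes; binary ones (Ki, Mi, ...) are not handled.
_SUFFIXES = {"n": Decimal("1e-9"), "u": Decimal("1e-6"), "m": Decimal("1e-3"),
             "k": Decimal("1e3"), "M": Decimal("1e6"), "G": Decimal("1e9")}


def parse_quantity(text):
    """Convert a quantity string such as '82500m' or '64' to a Decimal."""
    if text and text[-1] in _SUFFIXES:
        return Decimal(text[:-1]) * _SUFFIXES[text[-1]]
    return Decimal(text)


def pod_values(payload):
    """Map pod name to metric value for a MetricValueList JSON document."""
    doc = json.loads(payload)
    return {item["describedObject"]["name"]: parse_quantity(item["value"])
            for item in doc["items"]}


# Hypothetical response body, trimmed to the fields used here.
SAMPLE = """{"kind": "MetricValueList", "items": [
  {"describedObject": {"kind": "Pod", "name": "api-pod-a"}, "metricName": "cpu_95", "value": "82500m"},
  {"describedObject": {"kind": "Pod", "name": "api-pod-b"}, "metricName": "cpu_95", "value": "64"}
]}"""

for name, value in pod_values(SAMPLE).items():
    print(name, value)
```

An empty items list usually means the adapter rule matched no series, which is worth checking before creating the HPA.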

HPA configuration:

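# autoscaling/v1 has no "metrics" field; Pods metrics need autoscaling/v2
# (autoscaling/v2beta2 on clusters older than Kubernetes 1.23).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: test-cpu-manual
  labels:
    app: app_name
spec:
  scaleTargetRef:
    # apps/v1beta2 has been removed; Deployments live in apps/v1.
    apiVersion: apps/v1
    kind: Deployment
    name: app_name
  minReplicas: 1
  maxReplicas: 15
  metrics:
  - type: Pods
    pods:
      metric:
        name: cpu_95        # name exposed by the adapter rule
      target:
        type: AverageValue  # compared with the average across all pods
        averageValue: 75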

【Comments】: