[Title]: prometheus-adapter is not running properly
[Posted]: 2021-08-12 10:23:45
[Question]:

After deploying prometheus-operator according to the documentation, I found that kubectl top nodes does not work.

$ kubectl get apiService v1beta1.metrics.k8s.io 
v1beta1.metrics.k8s.io                  monitoring/prometheus-adapter   False (FailedDiscoveryCheck)   44m


$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

$ kubectl get  --raw "/apis/metrics.k8s.io/v1beta1"
Error from server (ServiceUnavailable): the server is currently unable to handle the request

Excerpt from prometheus-adapter.yaml:

...
      - args:
        - --cert-dir=/var/run/serving-cert
        - --config=/etc/adapter/config.yaml
        - --logtostderr=true
        - --metrics-relist-interval=1m
        - --prometheus-url=http://prometheus-k8s.monitoring.svc.cluster.local:9090/prometheus
        - --secure-port=6443
...

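While looking into the problem, I found a workaround (#1060): adding hostNetwork: true to the configuration file.

Just when I thought the workaround had succeeded, I found that kubectl top nodes still does not work, even though the APIService is now available: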

$ kubectl get apiService v1beta1.metrics.k8s.io
v1beta1.metrics.k8s.io   monitoring/prometheus-adapter   True        64m

$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

$ kubectl get  --raw "/apis/metrics.k8s.io/v1beta1"
{"kind":"APIResourceList","apiVersion":"v1","groupVersion":"metrics.k8s.io/v1beta1","resources":[{"name":"nodes","singularName":"","namespaced":false,"kind":"NodeMetrics","verbs":["get","list"]},{"name":"pods","singularName":"","namespaced":true,"kind":"PodMetrics","verbs":["get","list"]}]}

Logs from prometheus-adapter:

E0812 10:03:02.469561       1 provider.go:265] failed querying node metrics: unable to fetch node CPU metrics: unable to execute query: Get "http://prometheus-k8s.monitoring.svc.cluster.local:9090/prometheus/api/v1/query?query=sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++node_cpu_seconds_total%7Bmode%3D%22idle%22%7D%5B60s%5D%0A++%29%0A++%2A+on%28namespace%2C+pod%29+group_left%28node%29+%28%0A++++node_namespace_pod%3Akube_pod_info%3A%7Bnode%3D~%22node02.whisper-tech.net%7Cnode03.whisper-tech.net%22%7D%0A++%29%0A%29%0Aor+sum+by+%28node%29+%28%0A++1+-+irate%28%0A++++windows_cpu_time_total%7Bmode%3D%22idle%22%2C+job%3D%22windows-exporter%22%2Cnode%3D~%22node02.whisper-tech.net%7Cnode03.whisper-tech.net%22%7D%5B4m%5D%0A++%29%0A%29%0A&time=1628762582.467": dial tcp: lookup prometheus-k8s.monitoring.svc.cluster.local on 100.100.2.136:53: no such host

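The cause is that hostNetwork: true was added to prometheus-adapter, so the Pod can no longer resolve prometheus-k8s inside the cluster through CoreDNS. In the log above, the lookup goes to 100.100.2.136:53, which is not the cluster DNS, and fails with "no such host".

One idea I had is to let the Kubernetes nodes themselves resolve cluster-internal names through CoreDNS.

Is there a better way to solve this problem? What should I do?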

[Comments]:

    Tags: kubernetes kubectl prometheus-operator


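    [Answer 1]:

    Your Pod is running with hostNetwork, so you should explicitly set its DNS policy to "ClusterFirstWithHostNet", as described in the Pod's DNS Policy documentation:

    "ClusterFirstWithHostNet": For Pods running with hostNetwork, you should explicitly set its DNS policy "ClusterFirstWithHostNet".

    I created a simple example to illustrate how this works.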


    First, I created the app-1 Pod with hostNetwork: true:

    $ cat app-1.yml
    kind: Pod
    apiVersion: v1
    metadata:
      name: app-1
    spec:
      hostNetwork: true
      containers:
      - name: dnsutils
        image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
        command:
          - sleep
          - "3600"
    
    $ kubectl apply -f app-1.yml
    pod/app-1 created
    

    We can verify that app-1 cannot resolve, for example, kubernetes.default.svc:

    $ kubectl exec -it app-1 -- sh
    
    / # nslookup kubernetes.default.svc
    Server:         169.254.169.254
    Address:        169.254.169.254#53
    
    ** server can't find kubernetes.default.svc: NXDOMAIN
    

    Let's add dnsPolicy: ClusterFirstWithHostNet to the app-1 Pod and recreate it:

    $ cat app-1.yml
    kind: Pod
    apiVersion: v1
    metadata:
      name: app-1
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      containers:
      - name: dnsutils
        image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
        command:
          - sleep
          - "3600"
    
    $ kubectl delete pod app-1 && kubectl apply -f app-1.yml
    pod "app-1" deleted
    pod/app-1 created
    

    Finally, we can check that the app-1 Pod is now able to resolve kubernetes.default.svc:

    $ kubectl exec -it app-1 -- sh
    / # nslookup kubernetes.default.svc
    Server:         10.8.0.10
    Address:        10.8.0.10#53
    
    Name:   kubernetes.default.svc.cluster.local
    Address: 10.8.0.1
    

    As the example above shows, name resolution works as expected with dnsPolicy: ClusterFirstWithHostNet.
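
    Applied to the prometheus-adapter Deployment from the question, the same change might look like the fragment below. This is only a sketch: it assumes hostNetwork: true was added under spec.template.spec, and the container name is a guess, since the question elides the rest of the manifest.

    ```yaml
    # Fragment of the prometheus-adapter Deployment (pod template only).
    # Assumption: hostNetwork was added at this level, as the question describes.
    spec:
      template:
        spec:
          hostNetwork: true
          # Use the cluster DNS even though the Pod shares the node's network namespace.
          dnsPolicy: ClusterFirstWithHostNet
          containers:
          - name: prometheus-adapter   # assumed name; keep whatever your manifest uses
            args:
            - --prometheus-url=http://prometheus-k8s.monitoring.svc.cluster.local:9090/prometheus
    ```

    With this in place, the lookup of prometheus-k8s.monitoring.svc.cluster.local should be sent to the cluster DNS instead of the node's resolver, so the "no such host" error in the adapter log should no longer appear.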

    For more information, see the DNS for Services and Pods documentation.
