【Question title】: CrashLoopBackOff in Prometheus's AlertManager
【Posted】: 2018-11-29 18:50:13
【Question】:

I am trying to set up AlertManager for my Kubernetes cluster. I followed this document (https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/getting-started.md) and everything worked.

To set up AlertManager, I am working through this document (https://github.com/coreos/prometheus-operator/blob/master/Documentation/user-guides/alerting.md).

I am getting CrashLoopBackOff for alertmanager-example-0. Please check the attached logs:

First image: $ kubectl logs -f prometheus-operator-88fcf6d95-zctgw -n monitoring

Second image: $ kubectl describe pod alertmanager-example-0

Can anyone point out what I am doing wrong? Thanks in advance.

【Discussion】:

    Tags: docker kubernetes prometheus prometheus-alertmanager prometheus-operator


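    【Solution 1】:

    It sounds like the RBAC service account used by your Alertmanager pod (system:serviceaccount:monitoring:prometheus-operator) does not have sufficient permissions to talk to the kube-apiserver.

    In your case, the Prometheus Operator has a ClusterRoleBinding named prometheus-operator, which looks like this: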

    $ kubectl get clusterrolebinding prometheus-operator -o=yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRoleBinding
    metadata:
      labels:
        app: prometheus-operator
      name: prometheus-operator
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: ClusterRole
      name: prometheus-operator
    subjects:
    - kind: ServiceAccount
      name: prometheus-operator
      namespace: monitoring
    

    More importantly, the ClusterRole should look like this:

    $ kubectl get clusterrole prometheus-operator -o=yaml
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      labels:
        app: prometheus-operator
      name: prometheus-operator
    rules:
    - apiGroups:
      - extensions
      resources:
      - thirdpartyresources
      verbs:
      - '*'
    - apiGroups:
      - apiextensions.k8s.io
      resources:
      - customresourcedefinitions
      verbs:
      - '*'
    - apiGroups:
      - monitoring.coreos.com
      resources:
      - alertmanager
      - alertmanagers
      - prometheus
      - prometheuses
      - service-monitor
      - servicemonitors
      - prometheusrules
      verbs:
      - '*'
    - apiGroups:
      - apps
      resources:
      - statefulsets
      verbs:
      - '*'
    - apiGroups:
      - ""
      resources:
      - configmaps
      - secrets
      verbs:
      - '*'
    - apiGroups:
      - ""
      resources:
      - pods
      verbs:
      - list
      - delete
    - apiGroups:
      - ""
      resources:
      - services
      - endpoints
      verbs:
      - get
      - create
      - update
    - apiGroups:
      - ""
      resources:
      - nodes
      verbs:
      - list
      - watch
    - apiGroups:
      - ""
      resources:
      - namespaces
      verbs:
      - list
      - watch
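
One way to confirm whether the service account actually holds the permissions listed above is to ask the API server directly with `kubectl auth can-i` and impersonation. The following is a minimal sketch, not part of the original answer: the function name `check_sa` and the particular verb/resource pairs are illustrative choices taken from the ClusterRole above, and it assumes your own user is allowed to impersonate service accounts.

```shell
#!/bin/sh
# Service account named in the answer; adjust if yours differs.
SA="system:serviceaccount:monitoring:prometheus-operator"

# Print one "<answer> <verb> <resource>" line per check.
# `kubectl auth can-i` prints "yes" or "no" (and exits non-zero on "no").
# "error" is printed if kubectl produced no output at all.
check_sa() {
  while read -r verb resource; do
    answer=$(kubectl auth can-i "$verb" "$resource" --as="$SA" 2>/dev/null </dev/null)
    echo "${answer:-error} $verb $resource"
  done <<EOF
list pods
delete pods
get secrets
create statefulsets.apps
list alertmanagers.monitoring.coreos.com
EOF
}

check_sa
```

Any line starting with `no` points at a rule missing from the ClusterRole or at a ClusterRoleBinding that does not reference the expected service account and namespace.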
    

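    【Comments】:

    • Thanks for your support. The secret I had created was wrong, and it is working now. Could you guide me on creating my own alerts that send notifications to my Gmail?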