Kubernetes Ingress (GCE) keeps returning 502 error

【Question title】: Kubernetes Ingress (GCE) keeps returning 502 error
【Posted】: 2017-08-15 12:26:39
【Question】:

I am trying to set up an Ingress in GCE Kubernetes. But whenever I visit the IP address and path combination defined in the Ingress, I keep getting a 502 error.

This is what I get when I run kubectl describe ing --namespace dpl-staging:

Name:           dpl-identity
Namespace:      dpl-staging
Address:        35.186.221.153
Default backend:    default-http-backend:80 (10.0.8.5:8080)
TLS:
  dpl-identity terminates
Rules:
  Host  Path    Backends
  ----  ----    --------
  *
        /api/identity/*     dpl-identity:4000 (<none>)
Annotations:
  https-forwarding-rule:    k8s-fws-dpl-staging-dpl-identity--5fc40252fadea594
  https-target-proxy:       k8s-tps-dpl-staging-dpl-identity--5fc40252fadea594
  url-map:          k8s-um-dpl-staging-dpl-identity--5fc40252fadea594
  backends:         {"k8s-be-31962--5fc40252fadea594":"HEALTHY","k8s-be-32396--5fc40252fadea594":"UNHEALTHY"}
Events:
  FirstSeen LastSeen    Count   From                SubObjectPath   Type        Reason  Message
  --------- --------    -----   ----                -------------   --------    ------  -------
  15m       15m     1   {loadbalancer-controller }          Normal      ADD dpl-staging/dpl-identity
  15m       15m     1   {loadbalancer-controller }          Normal      CREATE  ip: 35.186.221.153
  15m       6m      4   {loadbalancer-controller }          Normal      Service no user specified default backend, using system default

I think the problem is dpl-identity:4000 (<none>). Shouldn't I see the IP address of the dpl-identity service instead of <none>?

Here is my service description, from kubectl describe svc --namespace dpl-staging:

Name:           dpl-identity
Namespace:      dpl-staging
Labels:         app=dpl-identity
Selector:       app=dpl-identity
Type:           NodePort
IP:             10.3.254.194
Port:           http    4000/TCP
NodePort:       http    32396/TCP
Endpoints:      10.0.2.29:8000,10.0.2.30:8000
Session Affinity:   None
No events.

Also, here is the result of running kubectl describe ep -n dpl-staging dpl-identity:

Name:       dpl-identity
Namespace:  dpl-staging
Labels:     app=dpl-identity
Subsets:
  Addresses:        10.0.2.29,10.0.2.30
  NotReadyAddresses:    <none>
  Ports:
    Name    Port    Protocol
    ----    ----    --------
    http    8000    TCP

No events.

Here is my deployment.yaml:

apiVersion: v1
kind: Secret
metadata:
  namespace: dpl-staging
  name: dpl-identity
type: Opaque
data:
  tls.key: <base64 key>
  tls.crt: <base64 crt>
---
apiVersion: v1
kind: Service
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  type: NodePort
  ports:
    - port: 4000
      targetPort: 8000
      protocol: TCP
      name: http
  selector:
    app: dpl-identity
---
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
  annotations:
    kubernetes.io/ingress.allow-http: "false"
spec:
  tls:
  - secretName: dpl-identity
  rules:
  - http:
      paths:
        - path: /api/identity/*
          backend:
            serviceName: dpl-identity
            servicePort: 4000
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  namespace: dpl-staging
  name: dpl-identity
  labels:
    app: dpl-identity
spec:
  replicas: 2
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: dpl-identity
    spec:
      containers:
      - image: gcr.io/munpat-container-engine/dpl/identity:0.4.9
        name: dpl-identity
        ports:
        - containerPort: 8000
          name: http
        volumeMounts:
        - name: dpl-identity
          mountPath: /data
      volumes:
      - name: dpl-identity
        secret:
          secretName: dpl-identity

【Comments】:

Can you run kubectl describe ep -n dpl-staging dpl-identity?
@JanosLenart: I have updated my question with the requested information. Thank you very much for your help.

Tags: kubernetes google-cloud-platform google-kubernetes-engine google-compute-engine

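【Solution 1】:

Your backend k8s-be-32396--5fc40252fadea594 is showing as "UNHEALTHY".

The Ingress will not forward traffic to a backend that is unhealthy, which results in the 502 error you are seeing.

It is marked as unhealthy because it is not passing its health check. Check the health check settings for k8s-be-32396--5fc40252fadea594 and see whether they are appropriate for your pod; it may be polling a URI or port that does not return a 200 response. You can find these settings under Compute Engine > Health Checks.

If they are correct, then there are many steps between your browser and the container that could be passing traffic incorrectly. Try kubectl exec -it PODID -- bash (or ash if you are using Alpine), then curl localhost to see whether the container responds as expected. If it does and the health checks are also configured correctly, that narrows the problem down to your Service. You could then try changing the Service from type NodePort to LoadBalancer and see whether hitting the service IP directly from your browser works.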
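
If the health check turns out to be polling a path your pod does not answer with a 200, one thing worth trying is declaring a readinessProbe on the container. As far as I know, the GCE ingress controller can pick up its health check path from the pod's readiness probe when the backend is created, but verify the result under Compute Engine > Health Checks rather than taking this as given. A minimal sketch for the Deployment in the question, assuming a hypothetical /api/identity/healthz endpoint (not part of the question) that returns 200 on the container port:

```yaml
# Fragment of the Deployment's pod template (spec.template.spec).
# Assumption: the app answers GET /api/identity/healthz with HTTP 200.
# That path is hypothetical; substitute one your app really serves.
containers:
- image: gcr.io/munpat-container-engine/dpl/identity:0.4.9
  name: dpl-identity
  ports:
  - containerPort: 8000
    name: http
  readinessProbe:
    httpGet:
      path: /api/identity/healthz
      port: 8000          # the containerPort, not the Service port 4000
    periodSeconds: 10     # illustrative value
```

The probe targets port 8000 because that is where the container listens; the Service maps 4000 to it via targetPort. Existing health checks may not be updated automatically, so the Ingress might need to be recreated after adding the probe.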

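【Discussion】:

Thanks a lot for your answer. I had the same problem, and with the information you provided I was able to determine that no readinessProbe was configured for the pod, which is why it was marked as unhealthy.
Thanks for kubectl exec -it PODID -- ash, @GrandVizier, very neat.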
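I am also facing the same issue; could you please help me resolve it? Using the above command I got into the pod shell and can see that localhost/api/items responds with JSON. However, I cannot get the result through the GKE Ingress. Please suggest how to get past this bottleneck.

【Solution 2】:

I had the same problem. It turned out I had to wait a few minutes for the Ingress to validate the service health. If you are going through the same thing and have done all the steps such as readinessProbe and livenessProbe, just make sure your Ingress points to a Service of type NodePort, then wait a few minutes until the yellow warning icon turns green. Also, check the logs on StackDriver to get a better idea of what is going on. My readinessProbe and livenessProbe are on /login, with the gce ingress class, so I don't think it has to be on /healthz.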

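【Discussion】:

【Solution 3】:

The problem is indeed a health check, and it seemed "random" for my apps, where I use name-based virtual hosts to reverse-proxy requests from the Ingress, by domain, to two separate backend services. Both are secured using Let's Encrypt and kube-lego. My solution was to standardize the health check path for all services sharing an Ingress, and to declare the readinessProbe and livenessProbe configuration in my deployment.yml file.

I faced this issue with Google Cloud cluster node version 1.7.8 and found this issue, which closely resembles what I experienced: https://github.com/jetstack/kube-lego/issues/27

I am using gce and kube-lego; my backend service health checks were on / and kube-lego's is on /healthz. It appears that differing health check paths with the gce Ingress might be the cause, so it may be worth updating the backend services to match the /healthz pattern so that all of them use the same path (or, as one commenter on the GitHub issue stated, they updated kube-lego to pass on /).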

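【Discussion】:

【Solution 4】:

I had the same problem, and it persisted after I enabled livenessProbe as well as readinessProbe. It turned out this had to do with basic auth. I had added basic auth to the livenessProbe and the readinessProbe, but it turns out the GCE HTTP(S) load balancer has no configuration option for that.

There seem to be a few other kinds of issues too, e.g. setting the container port to 8080 and the service port to 80 did not work with the GKE ingress controller (though I did not pin down exactly what the problem was). Broadly, it looks to me like there is very little visibility, and running your own ingress container is a better option in that respect.

I picked Traefik for my project. It worked out of the box, and I wanted to enable the Let's Encrypt integration. The only change I had to make to the Traefik manifests was tweaking the Service object to disable access to the UI from outside the cluster and to expose my app through an external load balancer (GCE TCP LB). Also, Traefik is more native to Kubernetes. I tried Heptio Contour, but something did not work out of the box (I will give it another go when the next version comes out).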

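【Discussion】:

【Solution 5】:

I had the same problem. I found that the pod itself was running fine, which I tested via port forwarding and accessing the health check URL.

Port forwarding can be activated in the console as follows:

$ kubectl port-forward <pod-name> <local-port>:<pod-port>

So if the pod is running fine and the Ingress still shows the unhealthy state, there might be a problem with your Service configuration. In my case the app selector was incorrect, so it selected a pod that did not exist. Interestingly, this is not shown as an error or alert in the Google console.

Definition of the pod: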

#pod-definition.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: <pod-name>
  namespace: <namespace>
spec:
  selector:
    matchLabels:
      app: <pod-name>
  template:
    metadata:
      labels:
        app: <pod-name>   # label the Service selector must match
    spec:
    #spec-definition follows

#service.yaml
apiVersion: v1
kind: Service
metadata:
  name: <name-of-service-here>
  namespace: <namespace>
spec:
  type: NodePort
  selector:
    app: <pod-name>   # must equal the pod template's app label above
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
    name: <port-name-here>

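【Discussion】:

【Solution 6】:

The "Limitations" section of the Kubernetes documentation states:

All Kubernetes services must serve a 200 page on '/', or whatever custom value you've specified through GLBC's --health-check-path argument.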

https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/cluster-loadbalancing/glbc#limitations

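【Discussion】:

【Solution 7】:

I solved this problem by:

1. Removing the service from the ingress definition
2. Deploying the ingress with kubectl apply -f ingress.yaml
3. Adding the service back to the ingress definition
4. Deploying the ingress again

Basically, I followed Roy's advice and tried turning it off and on again.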

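【Discussion】:

【Solution 8】:

The logs can be read from Stackdriver Logging; in my case it was a backend_timeout error. After increasing the default timeout (30 seconds) via a BackendConfig, it stopped returning 502s even under load.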
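
For anyone unsure what that looks like, here is a rough sketch of a BackendConfig that raises the backend timeout, plus the Service annotation that attaches it. All names (my-backendconfig, my-service, my-app) and the 120-second value are placeholders I made up for illustration, and the API version depends on your GKE version (older clusters used cloud.google.com/v1beta1 with a beta.cloud.google.com/backend-config annotation), so check what your cluster supports:

```yaml
# Assumption: all names and the timeout value below are hypothetical.
apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: my-backendconfig
spec:
  timeoutSec: 120          # backend service timeout; the default is 30
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
  annotations:
    # Attach the BackendConfig to all ports of this Service.
    cloud.google.com/backend-config: '{"default": "my-backendconfig"}'
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
```

The BackendConfig has to live in the same namespace as the Service it is attached to.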

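【Discussion】:

【Solution 9】:

I resolved this issue after adding the following readiness and liveness probes with successThreshold: 1 and failureThreshold: 3. I also kept initialDelaySeconds at 70, because the application sometimes responds a bit late; this may vary per application.

Note: Also make sure that the path in httpGet exists in your application (in my case /api/books); otherwise GCP pings the /healthz path, which is not guaranteed to return 200 OK.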

    readinessProbe:
      httpGet:
        path: /api/books
        port: 80
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3
      initialDelaySeconds: 70
      timeoutSeconds: 60
    livenessProbe:
      httpGet:
        path: /api/books
        port: 80
      initialDelaySeconds: 70
      periodSeconds: 5
      successThreshold: 1
      failureThreshold: 3 
      timeoutSeconds: 60

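After struggling for a while and trying many things, I was finally able to sort it out.

Keep learning and sharing.

【Discussion】:

【Solution 10】:

I ran into the same problem when I was using the wrong image, which could not serve the requests because its configuration was different.

【Discussion】: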