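【Posted】: 2021-11-29 08:17:40
【Question】:
I am using fluent-bit to collect logs and forward them to fluentd for processing in a Kubernetes environment. The fluent-bit instances are managed by a DaemonSet and read logs from the docker containers.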
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Mem_Buf_Limit 5MB
Skip_Long_Lines On
There is also a fluent-bit service running:
Name: monitoring-fluent-bit-dips
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=fluent-bit-dips
app.kubernetes.io/version=1.8.10
helm.sh/chart=fluent-bit-0.19.6
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=fluent-bit-dips
Type: ClusterIP
IP Families: <none>
IP: 10.43.72.32
IPs: <none>
Port: http 2020/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.144:2020,10.42.1.155:2020,10.42.2.186:2020 + 1 more...
Session Affinity: None
Events: <none>
The fluentd service is described as follows:
Name: monitoring-logservice
Namespace: dips
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/managed-by=Helm
app.kubernetes.io/name=logservice
app.kubernetes.io/version=1.9
helm.sh/chart=logservice-0.1.2
Annotations: meta.helm.sh/release-name: monitoring
meta.helm.sh/release-namespace: dips
Selector: app.kubernetes.io/instance=monitoring,app.kubernetes.io/name=logservice
Type: ClusterIP
IP Families: <none>
IP: 10.43.44.254
IPs: <none>
Port: http 24224/TCP
TargetPort: http/TCP
Endpoints: 10.42.0.143:24224
Session Affinity: None
Events: <none>
However, the fluent-bit logs never reach fluentd, and the following error appears:
[error] [upstream] connection #81 to monitoring-fluent-bit-dips:24224 timed out after 10 seconds
I have tried several things, such as:
- Redeploying the fluent-bit pods
- Redeploying the fluentd pod
- Upgrading fluent-bit from version 1.7.3 to 1.8.10
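This is a Kubernetes environment in which fluent-bit was able to communicate with fluentd in the early stages of the deployment. In addition, the same fluent versions work when I deploy locally using a docker-desktop environment.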
My guesses are:
- fluent-bit cannot handle the volume of logs being processed
- The fluent services cannot communicate once they have been restarted
Does anyone have experience with this, or know how to debug this issue more deeply?
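One low-level check that could narrow this down is a plain TCP connection test against the names and ports that appear above. The following is only a minimal sketch, not a confirmed diagnosis: `can_connect` and `check_targets` are hypothetical helper names, and it assumes Python is available in some pod inside the `dips` namespace (the short service names only resolve from there). The target list covers the fluentd Service port (24224 on `monitoring-logservice`), the host and port named in the error message (`monitoring-fluent-bit-dips:24224`), and the only port the fluent-bit Service description lists (2020).

```python
import socket


def can_connect(host: str, port: int, timeout: float = 10.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections and timeouts alike.
        return False


# Targets copied from the Service descriptions and the error message above.
TARGETS = [
    ("monitoring-logservice", 24224),       # fluentd Service port
    ("monitoring-fluent-bit-dips", 24224),  # host:port named in the error
    ("monitoring-fluent-bit-dips", 2020),   # port listed on the fluent-bit Service
]


def check_targets(targets, timeout: float = 3.0) -> dict:
    """Map each (host, port) pair to True/False depending on TCP reachability."""
    return {(host, port): can_connect(host, port, timeout) for host, port in targets}
```

Calling `check_targets(TARGETS)` from a pod in the same namespace and printing the result would show which of the three endpoints accept connections. If only `monitoring-logservice:24224` is reachable, the host configured in the fluent-bit output section would be worth a second look.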
Update: the description of the running fluentd pod follows.
Name: monitoring-logservice-5b8864ffd8-gfpzc
Namespace: dips
Priority: 0
Node: sl-sy-k3s-01/10.16.1.99
Start Time: Mon, 29 Nov 2021 13:09:13 +0530
Labels: app.kubernetes.io/instance=monitoring
app.kubernetes.io/name=logservice
pod-template-hash=5b8864ffd8
Annotations: kubectl.kubernetes.io/restartedAt: 2021-11-29T12:37:23+05:30
Status: Running
IP: 10.42.0.143
IPs:
IP: 10.42.0.143
Controlled By: ReplicaSet/monitoring-logservice-5b8864ffd8
Containers:
logservice:
Container ID: containerd://102483a7647fd2f10bead187eddf69aa4fad72051d6602dd171e1a373d4209d7
Image: our.private.repo/dips/logservice/splunk:1.9
Image ID: our.private.repo/dips/logservice/splunk@sha256:531f15f523a251b93dc8a25056f05c0c7bb428241531485a22b94896974e17e8
Ports: 24231/TCP, 24224/TCP
Host Ports: 0/TCP, 0/TCP
State: Running
Started: Mon, 29 Nov 2021 13:09:14 +0530
Ready: True
Restart Count: 0
Liveness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Readiness: exec [/bin/healthcheck.sh] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
SOME_ENV_VARS
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from monitoring-logservice-token-g9kwt (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
monitoring-logservice-token-g9kwt:
Type: Secret (a volume populated by a Secret)
SecretName: monitoring-logservice-token-g9kwt
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
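【Comments】:
-
There is only one Fluentd pod in your question. Can you `kubectl describe` this pod and post the full output in your question?
-
@gohm'c Is there any specific detail you would like to see? It is more than 50 lines.
-
Don't worry about the number of lines, just post it.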
Tags: kubernetes fluentd fluent-bit