【Question title】: Kafka creates too many TIME_WAIT TCP connections
【Posted】: 2019-01-08 07:40:40
【Question】:

I am using Kafka 0.11.0.3.

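I have a single Kafka broker and a remote Zookeeper cluster. I started the Kafka server, it successfully registered its id in Zookeeper, and I can even get the topic list with the kafka-topics.sh command. The problem is that I keep seeing the following lines in the Kafka log: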

[2019-01-08 10:51:09,138] WARN Attempting to send response via channel for which there is no open connection, connection id 192.168.0.201:9092-192.168.0.201:58292 (kafka.network.Processor)
[2019-01-08 10:51:09,198] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,226] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,306] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,327] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,382] WARN Attempting to send response via channel for which there is no open connection, connection id 192.168.0.201:9092-192.168.0.201:58296 (kafka.network.Processor)
[2019-01-08 10:51:09,408] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,446] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,559] INFO Creating /controller (is it secure? false) (kafka.utils.ZKCheckedEphemeral)
[2019-01-08 10:51:09,602] INFO Result of znode creation is: OK (kafka.utils.ZKCheckedEphemeral)

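The broker is trying to reach port 58292 on the same machine (the one the Kafka server runs on), but the connection cannot be established. I also checked the /controller znode in Zookeeper, and it is empty. Even stranger, when I list the TCP connections on the Kafka server node, I see a large number of TIME_WAIT connections: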

Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 192.168.0.201:55572     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56290     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55442     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55512     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56074     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56286     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55460     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55904     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55488     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56308     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55502     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56326     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55960     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55930     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56300     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56004     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55470     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55474     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55432     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55412     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56304     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55858     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55860     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56324     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55388     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56168     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55898     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55820     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55676     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56202     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55756     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56278     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55658     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55628     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56038     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56108     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55988     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55894     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55428     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55424     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56128     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56146     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55884     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56280     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55798     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56120     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55888     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55708     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55696     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56298     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55646     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56150     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55376     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55980     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55556     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56208     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55752     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55982     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55864     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55760     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56056     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56002     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55536     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55576     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55392     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55726     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55426     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55710     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56042     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56264     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55606     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55972     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56176     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55780     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56342     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55534     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55438     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56114     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56068     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55880     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56350     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55970     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55404     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55672     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55454     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55946     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56126     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55538     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56124     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55712     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56084     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55992     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56302     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55984     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55394     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55550     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56094     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55936     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55530     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55868     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:56294     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0      0 192.168.0.201:55876     192.168.0.201:9092      TIME_WAIT   -                   
tcp        0     31 192.168.0.201:57552     192.168.0.204:2181      ESTABLISHED 1015/java           
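
To count these sockets instead of eyeballing the list, here is a minimal Python sketch (a helper of our own, not part of Kafka) that tallies TCP states for port 9092 by parsing Linux's /proc/net/tcp, the IPv4 socket table that netstat itself reads. It assumes the standard /proc/net/tcp layout, where addresses are hex `IP:PORT` pairs and the fourth column is the state code (06 = TIME_WAIT); IPv6 sockets live in /proc/net/tcp6 and are not covered.

```python
from collections import Counter

# State codes used in /proc/net/tcp (they mirror the kernel's TCP state enum).
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV", "04": "FIN_WAIT1",
    "05": "FIN_WAIT2", "06": "TIME_WAIT", "07": "CLOSE", "08": "CLOSE_WAIT",
    "09": "LAST_ACK", "0A": "LISTEN", "0B": "CLOSING",
}


def count_states_for_port(proc_tcp_text: str, port: int) -> Counter:
    """Count TCP socket states where either the local or remote port equals `port`."""
    counts = Counter()
    for line in proc_tcp_text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local, remote, state = fields[1], fields[2], fields[3]
        # The port is the hex part after the colon, e.g. "C900A8C0:2384" -> 0x2384 = 9092
        local_port = int(local.split(":")[1], 16)
        remote_port = int(remote.split(":")[1], 16)
        if port in (local_port, remote_port):
            counts[TCP_STATES.get(state.upper(), state)] += 1
    return counts
```

On the broker host, `count_states_for_port(open("/proc/net/tcp").read(), 9092)` should give a TIME_WAIT count comparable to the netstat listing; sampling it a few times shows whether the number keeps growing.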

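The only established connection is the one to Zookeeper on port 2181 (the last line of the netstat output). I also checked port 9092 from a remote node, and it is open: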

Starting Nmap 7.01 ( https://nmap.org ) at 2019-01-08 11:32 +0330
Nmap scan report for (192.168.0.201)
Host is up (0.0027s latency).
PORT     STATE SERVICE
9092/tcp open  unknown

Nmap done: 1 IP address (1 host up) scanned in 0.08 seconds
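
Note that nmap's `open` only means the TCP handshake on 9092 completes; it says nothing about whether the broker then answers Kafka requests, which is the part failing here. A minimal Python sketch of the same reachability check, using only the standard `socket` module (the function name is our own):

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP handshake to host:port completes within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts (socket.timeout is an OSError subclass)
        # and unreachable hosts.
        return False
```

Run from the remote node, `is_port_open("192.168.0.201", 9092)` reproduces the nmap result, with the same limitation.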

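A few points:

  • The broker had been running normally for about 2 months, and then the error suddenly appeared.
  • The Zookeeper cluster is working fine: other components such as HDFS use it without errors.
  • The OS is CentOS 7, and no firewall is enabled.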

Here is the Kafka server configuration:

broker.id=100
listeners=PLAINTEXT://192.168.0.201:9092
num.partitions=24
delete.topic.enable=true
log.dirs=/data/esb
zookeeper.connect=co1:2181,co2:2181
log.retention.hours=168
zookeeper.session.timeout.ms=40000

What could be causing the TIME_WAIT connections?

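【Discussion】:

  • Not sure how many connections are being made between the broker and Zookeeper, or how fast. I think one of them is established and the others are waiting for the final ACK to close. Maybe network flooding or resource starvation? You can scrape more information by collecting the log files under /var/log/nmon and feeding them to the NMON Visualizer (nmonvisualizer.github.io/nmonvisualizer); also check the Kafka/Zookeeper GC logs for slowdowns caused by waiting on resources.
  • I think that when, for example, 192.168.0.201:55388 connects to 192.168.0.201:9092, port 9092 should also answer and establish a connection back to 55388 (I see this on a working Kafka broker), but that does not happen here, and the connection from 55388 to 9092 ends up in TIME_WAIT.
  • By the way, you should use at least 3 Zookeepers. Definitely not two.
  • @cricket_007 Using 3 Zookeeper nodes, or an odd number of nodes in general, is a recommendation, not a requirement! The reason is that a Zookeeper ensemble stays fault tolerant as long as (n/2) + 1 of its nodes are up and working.
  • Well, then you had better make sure you never lose either of them: an unexpected power loss, or even a network outage. That is my point.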
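
The arithmetic behind this exchange: an ensemble of n voting servers needs a majority of floor(n/2) + 1 to stay available, so it survives n - (floor(n/2) + 1) failures. A small Python sketch (the helper names are ours, and it assumes `zookeeper.connect` lists every ensemble member, which a broker config does not guarantee):

```python
def quorum_size(n: int) -> int:
    """Majority a ZooKeeper ensemble of n voting servers needs: floor(n/2) + 1."""
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can be lost while a majority is still available."""
    return n - quorum_size(n)


def servers_in_connect_string(connect: str) -> int:
    """Count host:port entries in a zookeeper.connect value, ignoring an optional /chroot."""
    hosts = connect.split("/", 1)[0]
    return len([h for h in hosts.split(",") if h.strip()])
```

For the config in the question, `servers_in_connect_string("co1:2181,co2:2181")` is 2, and a 2-server ensemble tolerates 0 failures, while 3 servers tolerate 1. That is why a second server adds no fault tolerance over a single one.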

Tags: tcp apache-kafka centos7


【Answer 1】:

I ran into a similar TIME_WAIT problem before. Check your Zookeeper log; the default location is:

/bin/zookeeper.out

In my case the cause was basically a permissions problem: I started Zookeeper as a regular user, but somehow the files under /zkdata were owned by root.
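
To look for the same ownership mismatch, here is a minimal Python sketch that lists everything under a data directory not owned by a given uid. It assumes a Unix host (it uses `os.getuid`) and that /zkdata is the Zookeeper data directory, as in this answer; run it as the user that starts Zookeeper. The function name is hypothetical. `os.walk` silently skips directories it cannot list, so an unreadable subtree will not show up in the result.

```python
import os


def files_not_owned_by(root: str, uid: int) -> list:
    """Return every path under `root` (including `root` itself) whose owner uid differs from `uid`."""
    paths = [root]
    for dirpath, dirnames, filenames in os.walk(root):
        paths.extend(os.path.join(dirpath, name) for name in dirnames + filenames)
    # lstat so that a symlink is judged by its own owner, not its target's
    return [p for p in paths if os.lstat(p).st_uid != uid]
```

For example, `files_not_owned_by("/zkdata", os.getuid())` returning any paths owned by root would point to the situation described above.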

The Zookeeper log will tell you the cause.
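
If the log is long, a minimal Python sketch can pull out the lines worth reading first. The pattern is an assumption: it expects level tokens such as ERROR or WARN as log4j prints them, plus the OS text "Permission denied" that Java typically includes when a file access fails with a permissions error; adjust it to what your zookeeper.out actually contains.

```python
import re

# Assumed filter: whole-word log levels (so "WARNING" does not match "WARN")
# or the OS permission error text.
SUSPICIOUS = re.compile(r"\b(?:ERROR|WARN|FATAL)\b|Permission denied")


def suspicious_lines(log_text: str) -> list:
    """Return the log lines that carry a WARN/ERROR/FATAL level or a permission error."""
    return [line for line in log_text.splitlines() if SUSPICIOUS.search(line)]
```

Read the file at the location mentioned above and pass its contents to `suspicious_lines`; errors about the data directory near startup are the first place to look.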
