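【Posted】: 2019-11-18 04:15:41
【Question】:
I have deployed Apache Ignite as a StatefulSet in several test Kubernetes clusters.
The current configuration passed the stress-test phase. However, I am seeing some OutOfMemory errors in Apache Ignite, some of them in a new test cluster under much lower load.
Below is a snapshot of the logs I extracted from Apache Ignite instance 1: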
INFO: TCP discovery spawning a new thread for connection [rmtAddr=/10.254.174.226, rmtPort=45453]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: GridWorker [name=tcp-disco-client-message-worker, igniteInstanceName=null, finished=false, heartbeatTs=1573779638619, hashCode=373238347, interrupted=true, runner=tcp-disco-client-message-worker-#109]
java.lang.OutOfMemoryError: Java heap space
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error
SEVERE: Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-client-message-worker-#109]
java.lang.OutOfMemoryError: Java heap space
Exception in thread "tcp-disco-client-message-worker-#109" java.lang.OutOfMemoryError: Java heap space
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: TCP discovery accepted incoming connection [rmtAddr=/10.254.183.232, rmtPort=41313]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: TCP discovery spawning a new thread for connection [rmtAddr=/10.254.183.232, rmtPort=41313]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: Started serving remote node connection [rmtAddr=/10.254.174.226:45453, rmtPort=45453]
Nov 15, 2019 @ 09:01:26.612 Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger warning
Nov 15, 2019 @ 09:01:26.612 WARNING: New next node has connection to it's previous, trying previous again. [next=TcpDiscoveryNode [id=5cbb5f1c-ca74-4b2f-ba70-314f621ab997, addrs=[10.254.168.12, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ignite-sit-5.ignite-sit.sit.svc.cluster.local/10.254.168.12:47500], discPort=47500, order=3922, intOrder=2000, lastExchangeTime=1573779246139, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
Nov 15, 2019 @ 09:01:26.612 Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
Nov 15, 2019 @ 09:01:26.612 INFO: New next node [newNext=TcpDiscoveryNode [id=6fcccf11-f903-4b4a-bbac-730ca0b80ce8, addrs=[10.254.169.217, 127.0.0.1], sockAddrs=[/127.0.0.1:47500, ignite-sit-4.ignite-sit.sit.svc.cluster.local/10.254.169.217:47500], discPort=47500, order=3912, intOrder=1993, lastExchangeTime=1573779190075, loc=false, ver=2.7.5#20190603-sha1:be4f2a15, isClient=false]]
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger info
INFO: Finished serving remote node connection [rmtAddr=/10.254.174.226:45453, rmtPort=45453
Nov 15, 2019 1:01:26 AM org.apache.ignite.logger.java.JavaLogger error
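Sorry about the log formatting.
I would like to know what causes the OutOfMemory error and how to prevent it from happening again.
Any help would be appreciated.
Update: heap dump analysis result: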
The thread org.apache.ignite.spi.discovery.tcp.ServerImpl$SocketReader @ 0xd9fbe2d0 tcp-disco-sock-reader-#369 keeps local variables with total size 312,295,344 (48.95%) bytes.
It looks like the TCP SocketReader is holding a large amount of heap memory.
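As a rough sanity check on the figures above, the heap size implied by the dump can be back-computed from the retained size and its percentage. This is only a sketch: it assumes the 48.95% is relative to the total heap captured in the dump, and the class and method names (`HeapEstimate`, `estimateTotalBytes`) are hypothetical helpers, not part of Ignite.

```java
public class HeapEstimate {
    /** Back-computes a total from a part and the percentage of the total it represents. */
    public static long estimateTotalBytes(long partBytes, double percent) {
        return Math.round(partBytes / (percent / 100.0));
    }

    public static void main(String[] args) {
        long retainedBytes = 312_295_344L; // held by tcp-disco-sock-reader-#369, from the heap dump
        double percent = 48.95;            // share reported by the heap analysis
        long totalBytes = estimateTotalBytes(retainedBytes, percent);
        System.out.printf("Implied heap in dump: %d bytes (~%.0f MiB)%n",
                totalBytes, totalBytes / (1024.0 * 1024.0));
    }
}
```

Under that assumption the dump corresponds to a heap of roughly 600 MiB, so a single discovery socket reader accounts for about half of it.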
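【Comments】:
-
You need to enable heap dumps on OOM and then analyze the resulting heap. If you find anything suspicious in it that is unrelated to your own code, please add it to your question.
-
I have updated the question with the heap analysis result.
-
That's interesting. What is your Xmx? Is it around 750M, so that 300M of objects take up almost half the heap? Can you increase Xmx to 2G and see what happens? It would also be good to know the distribution of objects held by disco-sock-reader.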
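The two checks suggested in the comments (what the effective maximum heap is, and whether heap dumps on OOM are enabled) can be sketched from inside the JVM as follows. This assumes a HotSpot-based JVM, since `HotSpotDiagnosticMXBean` is HotSpot-specific; the class `HeapDiagnostics` and its method names are hypothetical. The same settings are more commonly passed at startup as `-Xmx2g` and `-XX:+HeapDumpOnOutOfMemoryError`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

public class HeapDiagnostics {
    /** Maximum heap the JVM will attempt to use, in MiB (reflects -Xmx or the JVM default). */
    public static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** Enables heap dumps on OutOfMemoryError at runtime and returns the resulting flag value. */
    public static boolean enableHeapDumpOnOom() {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // HeapDumpOnOutOfMemoryError is a manageable flag, so it can be set on a running JVM.
        bean.setVMOption("HeapDumpOnOutOfMemoryError", "true");
        return Boolean.parseBoolean(bean.getVMOption("HeapDumpOnOutOfMemoryError").getValue());
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MiB): " + maxHeapMiB());
        System.out.println("Heap dump on OOM enabled: " + enableHeapDumpOnOom());
    }
}
```

Running this inside the Ignite pod shows whether the container is actually getting the heap size you expect, which is worth confirming before raising Xmx to 2G.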
Tags: java kubernetes out-of-memory ignite