【发布时间】:2020-11-10 13:45:18
【问题描述】:
我可以成功加入和离开作为 Docker 容器在我的本地 Docker 服务器上运行的单节点 Apache Ignite 2.8.1 拓扑。
在远程 Docker 服务器上运行完全相同的程序我可以看到我的程序加入集群拓扑但在连接完成之前我收到以下连接错误
SEVERE: Failed to send message to remote node [node=TcpDiscoveryNode [id=a239f009-bddd-4a06-845f-abb304850849, consistentId=127.0.0.1,172.17.0.13:42002, addrs=ArrayList [127.0.0.1, 172.17.0.13], sockAddrs=HashSet [/172.17.0.13:42002, /127.0.0.1:42002], discPort=42002, order=1, intOrder=1, lastExchangeTime=1605015503009, loc=false, ver=2.8.1#20200521-sha1:86422096, isClient=false], msg=GridIoMessage [plc=2, topic=TOPIC_CACHE, topicOrd=8, ordered=false, timeout=0, skipOnTimeout=false, msg=GridDhtPartitionsSingleMessage [parts=null, partCntrs=null, partsSizes=null, partHistCntrs=null, err=null, client=true, exchangeStartTime=106333448635300, finishMsg=null, super=GridDhtPartitionsAbstractMessage [exchId=GridDhtPartitionExchangeId [topVer=AffinityTopologyVersion [topVer=2, minorTopVer=0], discoEvt=DiscoveryEvent [evtNode=TcpDiscoveryNode [id=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, consistentId=dc9a3700-5377-4095-ac2b-31a2cea3d9a5, addrs=ArrayList [0:0:0:0:0:0:0:1, 10.91.7.30, 127.0.0.1, 192.168.1.81, 192.168.38.1], sockAddrs=HashSet [host.docker.internal/192.168.1.81:0, /0:0:0:0:0:0:0:1:0, GBLG7Y7GH2.mshome.net/192.168.38.1:0, /127.0.0.1:0, GBLG7Y7GH2.enterprisenet.org/10.91.7.30:0], discPort=0, order=2, intOrder=0, lastExchangeTime=1605015498538, loc=true, ver=2.8.1#20200521-sha1:86422096, isClient=true], topVer=2, nodeId8=dc9a3700, msg=null, type=NODE_JOINED, tstamp=1605015505481], nodeId=dc9a3700, evt=NODE_JOINED], lastVer=GridCacheVersion [topVer=0, order=1605015496511, nodeOrder=0], super=GridCacheMessage [msgId=1, depInfo=null, lastAffChangedTopVer=AffinityTopologyVersion [topVer=-1, minorTopVer=0], err=null, skipPrepare=false]]]]]
class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3738)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createTcpClient(TcpCommunicationSpi.java:3458)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createCommunicationClient(TcpCommunicationSpi.java:3198)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.reserveClient(TcpCommunicationSpi.java:3078)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage0(TcpCommunicationSpi.java:2918)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.sendMessage(TcpCommunicationSpi.java:2877)
at org.apache.ignite.internal.managers.communication.GridIoManager.send(GridIoManager.java:2035)
at org.apache.ignite.internal.managers.communication.GridIoManager.sendToGridTopic(GridIoManager.java:2132)
at org.apache.ignite.internal.processors.cache.GridCacheIoManager.send(GridCacheIoManager.java:1257)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.sendLocalPartitions(GridDhtPartitionsExchangeFuture.java:2020)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.clientOnlyExchange(GridDhtPartitionsExchangeFuture.java:1436)
at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPartitionsExchangeFuture.init(GridDhtPartitionsExchangeFuture.java:903)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body0(GridCachePartitionExchangeManager.java:3214)
at org.apache.ignite.internal.processors.cache.GridCachePartitionExchangeManager$ExchangeWorker.body(GridCachePartitionExchangeManager.java:3063)
at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:120)
at java.lang.Thread.run(Thread.java:748)
Suppressed: class org.apache.ignite.IgniteCheckedException: Failed to connect to node (is node still alive?). Make sure that each ComputeTask and cache Transaction has a timeout set in order to prevent parties from waiting forever in case of network issues [nodeId=a239f009-bddd-4a06-845f-abb304850849, addrs=[/172.17.0.13:42003, /127.0.0.1:42003]]
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3740)
... 15 more
Caused by: java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
... 15 more
Caused by: java.net.SocketTimeoutException
at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:129)
at org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi.createNioSession(TcpCommunicationSpi.java:3584)
... 15 more
在我看来,问题与客户端连接设置有关,因此我尝试增加客户端发现 SPI“joinTimeout”、“networkTimeout”和“socketTimeout”设置以及“connectionTimeout”和“socketWriteTimeout”设置但没有成功。
【问题讨论】:
-
看来客户端节点a239f009-bddd-4a06-845f-abb304850849无法通过通信SPI访问。你有什么通信配置?您在 Docker 中是否正确映射了通信端口?
-
是的,所有端口都映射到远程 Docker 中。但是,我想我已经想通了。这似乎是一个相当不可能的场景,因为我的远程 Docker 服务器在公共 VPS 上运行,并且通信 SPI 无法将自己绑定到我的 VPS 的公共 IP。因此,客户端成功地发现了集群,然后尝试与它绑定的地址上的通信 SPI 通信,该地址不是 Docker 服务器运行的 VPS 的公共互联网 IP。我的理解正确吗?
标签: ignite