【Posted】: 2015-03-14 15:32:29
【Question】:
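I am testing the following scenario with Spring AMQP v1.4.2, and the application fails to reconnect after a network interruption:
- Start the Spring application, which consumes messages asynchronously using rabbit:listener-container and rabbit:connection-factory (detailed configuration below).
- The logs show that the application is receiving messages successfully.
- Make RabbitMQ invisible to the application by dropping inbound network traffic on the Rabbit server:
sudo iptables -A INPUT -p tcp --destination-port 5672 -j DROP
- Wait at least 3 minutes (network connection timeout).
- Restore the connection:
sudo iptables -D INPUT -p tcp --destination-port 5672 -j DROP
- Wait a while (I even tried waiting more than an hour); no reconnection happens.
- Restart the application. It starts receiving messages again, which shows that the network is back to normal.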
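I also tested the same scenario by disconnecting the VM network adapter instead of using an iptables DROP rule, and the same thing happened: no automatic reconnection. Interestingly, when I used iptables REJECT instead of DROP, it worked as expected and the application reconnected as soon as I removed the reject rule. However, I think REJECT behaves more like a server failure than a network failure.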
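The DROP/REJECT difference above can be illustrated with plain JDK sockets. This is a minimal sketch under a stated assumption: a loopback listener that completes the TCP handshake but never sends a byte stands in for a broker whose traffic is being silently dropped. A rejected connection fails loudly, but silence is only turned into an error if the reader has a read timeout; with `setSoTimeout(0)` the `read()` below would block forever. The class and method names (`SilentPeerDemo`, `readFromSilentPeer`) are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo: a peer that accepts a TCP connection but never writes,
// roughly what a client sees while packets are silently DROPped.
class SilentPeerDemo {

    /**
     * Connects to a silent loopback peer and reads with the given timeout (must be > 0;
     * 0 means "block forever"). Returns "timeout", "eof" or "data".
     */
    static String readFromSilentPeer(int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The kernel completes the handshake via the backlog even though accept() is never called.
        try (ServerSocket silentPeer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentPeer.getLocalPort())) {
            client.setSoTimeout(readTimeoutMillis);
            InputStream in = client.getInputStream();
            int b = in.read();
            return b == -1 ? "eof" : "data";
        } catch (SocketTimeoutException e) {
            // The silence becomes an error only because a read timeout was requested.
            return "timeout";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readFromSilentPeer(200)); // prints "timeout"
    }
}
```

As far as we can tell, the RabbitMQ Java client only sets such a read timeout when a heartbeat is negotiated, which is one reason a silently dropped connection can go unnoticed without one.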
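According to the Spring AMQP reference documentation: "If a MessageListener fails because of a business exception, the exception is handled by the message listener container and then it goes back to listening for another message. If the failure is caused by a dropped connection (not a business exception), then the consumer that is collecting messages for the listener has to be cancelled and restarted. The SimpleMessageListenerContainer handles this seamlessly, and it leaves a log to say that the listener is being restarted. In fact it loops endlessly trying to restart the consumer, and only if the consumer is very badly behaved indeed will it give up. One side effect is that if the broker is down when the container starts, it will just keep trying until a connection can be established."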
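The restart behaviour described in that quote can be pictured with a deliberately simplified model. This is a hypothetical sketch, not Spring AMQP's code: `RestartPolicyModel` and its `Outcome` values are invented names, and the wait between attempts (the container's `recoveryInterval`, 5 seconds by default as far as we know) is omitted. Connection-level failures lead to another restart attempt with no upper limit, whereas a fatal startup error stops the container for good, after which nothing restarts it.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical model of the consumer restart policy quoted above (NOT Spring AMQP code).
class RestartPolicyModel {

    enum Outcome { CONNECTION_LOST, FATAL_STARTUP_ERROR, RUNNING }

    /** Returns the actions the "container" takes for a sequence of consumer start outcomes. */
    static List<String> run(Iterator<Outcome> attempts) {
        List<String> log = new ArrayList<>();
        while (attempts.hasNext()) {
            Outcome outcome = attempts.next();
            switch (outcome) {
                case RUNNING:
                    log.add("consuming");          // consumer started successfully
                    return log;
                case CONNECTION_LOST:
                    log.add("restart consumer");   // connection failure: retry, no limit
                    break;
                case FATAL_STARTUP_ERROR:
                    log.add("stop container");     // e.g. queue unusable: give up permanently
                    return log;
            }
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(Outcome.CONNECTION_LOST, Outcome.RUNNING).iterator()));
        // prints [restart consumer, consuming]
        System.out.println(run(List.of(Outcome.FATAL_STARTUP_ERROR, Outcome.RUNNING).iterator()));
        // prints [stop container]: a later healthy broker is never tried again
    }
}
```

The second case corresponds to the "Stopping container from aborted consumer" outcome in the DEBUG-session log: once the container is stopped, restoring the network alone does not bring it back.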
This is the log I get about one minute after disconnecting:
2015-01-16 14:00:42,433 WARN [SimpleAsyncTaskExecutor-5] org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer Consumer raised exception, processing can restart if the connection factory supports it
com.rabbitmq.client.ShutdownSignalException: connection error
at com.rabbitmq.client.impl.AMQConnection.startShutdown(AMQConnection.java:717) ~[amqp-client-3.4.2.jar:na]
at com.rabbitmq.client.impl.AMQConnection.shutdown(AMQConnection.java:707) ~[amqp-client-3.4.2.jar:na]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:565) ~[amqp-client-3.4.2.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
Caused by: java.io.EOFException: null
at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290) ~[na:1.7.0_55]
at com.rabbitmq.client.impl.Frame.readFrom(Frame.java:95) ~[amqp-client-3.4.2.jar:na]
at com.rabbitmq.client.impl.SocketFrameHandler.readFrame(SocketFrameHandler.java:139) ~[amqp-client-3.4.2.jar:na]
at com.rabbitmq.client.impl.AMQConnection$MainLoop.run(AMQConnection.java:534) ~[amqp-client-3.4.2.jar:na]
... 1 common frames omitted
I get this log message a few seconds after reconnecting:
2015-01-16 14:18:14,551 WARN [SimpleAsyncTaskExecutor-2] org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer Consumer raised exception, processing can restart if the connection factory supports it. Exception summary: org.springframework.amqp.AmqpConnectException: java.net.ConnectException: Connection timed out
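Update: Strangely, when I enable DEBUG logging for the org.springframework.amqp package, the reconnection succeeds and I can no longer reproduce the problem!
With debug logging disabled, I tried debugging the Spring AMQP code. I observed that shortly after removing the iptables DROP rule, SimpleMessageListenerContainer.doStop() is called, which in turn calls shutdown() and cancels all channels. With a breakpoint set on doStop(), I also got the following log messages, which seem to be related to the cause: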
2015-01-20 15:28:44,200 ERROR [pool-1-thread-16] org.springframework.amqp.rabbit.connection.CachingConnectionFactory Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=405, reply-text=RESOURCE_LOCKED - cannot obtain exclusive access to locked queue 'e4288669-2422-40e6-a2ee-b99542509273' in vhost '/', class-id=50, method-id=10)
2015-01-20 15:28:44,243 WARN [SimpleAsyncTaskExecutor-3] org.springframework.amqp.rabbit.listener.BlockingQueueConsumer Failed to declare queue:e4288669-2422-40e6-a2ee-b99542509273
2015-01-20 15:28:44,243 WARN [SimpleAsyncTaskExecutor-3] org.springframework.amqp.rabbit.listener.BlockingQueueConsumer Queue declaration failed; retries left=0
org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[e4288669-2422-40e6-a2ee-b99542509273]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:486) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:401) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1022) [spring-rabbit-1.4.2.RELEASE.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
2015-01-20 15:28:49,245 ERROR [pool-1-thread-16] org.springframework.amqp.rabbit.connection.CachingConnectionFactory Channel shutdown: channel error; protocol method: #method<channel.close>(reply-code=405, reply-text=RESOURCE_LOCKED - cannot obtain exclusive access to locked queue 'e4288669-2422-40e6-a2ee-b99542509273' in vhost '/', class-id=50, method-id=10)
2015-01-20 15:28:49,283 WARN [SimpleAsyncTaskExecutor-3] org.springframework.amqp.rabbit.listener.BlockingQueueConsumer Failed to declare queue:e4288669-2422-40e6-a2ee-b99542509273
2015-01-20 15:28:49,300 ERROR [SimpleAsyncTaskExecutor-3] org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer Consumer received fatal exception on startup
org.springframework.amqp.rabbit.listener.QueuesNotAvailableException: Cannot prepare queue for listener. Either the queue doesn't exist or the broker will not allow us to use it.
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:429) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer$AsyncMessageProcessingConsumer.run(SimpleMessageListenerContainer.java:1022) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.7.0_55]
Caused by: org.springframework.amqp.rabbit.listener.BlockingQueueConsumer$DeclarationException: Failed to declare queue(s):[e4288669-2422-40e6-a2ee-b99542509273]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.attemptPassiveDeclarations(BlockingQueueConsumer.java:486) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
at org.springframework.amqp.rabbit.listener.BlockingQueueConsumer.start(BlockingQueueConsumer.java:401) ~[spring-rabbit-1.4.2.RELEASE.jar:na]
... 2 common frames omitted
2015-01-20 15:28:49,301 ERROR [SimpleAsyncTaskExecutor-3] org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer Stopping container from aborted consumer
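The RESOURCE_LOCKED errors above point to a timing race. An exclusive queue belongs to the connection that declared it, so as long as the broker still considers the old (dead) connection alive, the new connection's passive declaration is refused with 405. The log shows declaration attempts five seconds apart, then a fatal `QueuesNotAvailableException` and a stopped container. A hypothetical timing model follows; `ExclusiveQueueRetryModel` is an invented name, and the 3 retries at 5000 ms used in `main` are our understanding of the `BlockingQueueConsumer` defaults in Spring AMQP 1.4, which should be checked against your version.

```java
// Hypothetical timing model of the RESOURCE_LOCKED failure (NOT Spring AMQP code).
class ExclusiveQueueRetryModel {

    /**
     * @param brokerReapsAtMillis when the broker drops the old connection and releases its
     *                            exclusive queue, measured from the first declaration attempt
     * @param retries             extra declaration attempts after the first one
     * @param intervalMillis      fixed pause between attempts
     * @return time (ms) of the first attempt that succeeds, or -1 if every attempt is locked
     */
    static long firstSuccessfulAttempt(long brokerReapsAtMillis, int retries, long intervalMillis) {
        for (int attempt = 0; attempt <= retries; attempt++) {
            long at = attempt * intervalMillis;
            if (at >= brokerReapsAtMillis) {
                return at; // old owner gone: the queue can be used by the new connection
            }
            // otherwise the broker answers 405 RESOURCE_LOCKED and we wait for the next try
        }
        return -1; // retries exhausted: fatal startup error, the container stops
    }

    public static void main(String[] args) {
        // Broker has not noticed the dead connection for a (hypothetical) 10 minutes:
        System.out.println(firstSuccessfulAttempt(600_000, 3, 5_000)); // prints -1
        // Broker notices the dead connection after a (hypothetical) 12 seconds:
        System.out.println(firstSuccessfulAttempt(12_000, 3, 5_000));  // prints 15000
    }
}
```

Under this model, anything that makes the broker reap the old connection sooner (or the client retry longer) turns the permanent failure into a delayed success.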
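Update 2: After setting requested-heartbeat to 30 seconds, as suggested in the answer, reconnection works most of the time, and the exclusive temporary queue bound to the fanout exchange is redeclared successfully. However, it still occasionally fails to reconnect.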
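Why a heartbeat helps can be shown with a little arithmetic. Based on our reading of the amqp-client 3.x source (an assumption, not a documented guarantee), when a heartbeat is negotiated the client sets its socket read timeout to a quarter of the heartbeat interval and fails the connection once more than 2 × 4 consecutive read timeouts have occurred, i.e. after a little over two heartbeat intervals of silence. With a heartbeat of 0 there is no read timeout, so a silently dropped broker is never detected by the client itself. The broker likewise closes connections whose heartbeats stop arriving, which would release an exclusive queue sooner. `HeartbeatDetectionModel` is a hypothetical name.

```java
// Arithmetic sketch of client-side dead-connection detection with heartbeats.
// ASSUMPTION: mirrors our reading of amqp-client 3.x (read timeout = heartbeat / 4,
// failure after more than 2 * 4 consecutive timeouts); verify against your client version.
class HeartbeatDetectionModel {

    static final int TIMEOUTS_PER_HEARTBEAT = 4;   // socket timeout is a quarter interval
    static final int MISSED_HEARTBEATS_ALLOWED = 2;

    /** Approximate ms of silence before the client gives up; -1 means never (heartbeat off). */
    static long detectionMillis(int heartbeatSeconds) {
        if (heartbeatSeconds <= 0) {
            return -1; // no read timeout is set, so a silent peer blocks the reader forever
        }
        long readTimeoutMillis = heartbeatSeconds * 1000L / TIMEOUTS_PER_HEARTBEAT;
        // The connection fails on the first timeout that pushes the count past the limit.
        int timeoutsUntilFailure = MISSED_HEARTBEATS_ALLOWED * TIMEOUTS_PER_HEARTBEAT + 1;
        return timeoutsUntilFailure * readTimeoutMillis;
    }

    public static void main(String[] args) {
        System.out.println(detectionMillis(30)); // prints 67500 for requested-heartbeat="30"
        System.out.println(detectionMillis(0));  // prints -1: heartbeats disabled
    }
}
```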
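In the rare failure cases, I monitored the RabbitMQ management console during the test and observed that a new connection was established (after the old connection had been removed due to a timeout), but the exclusive temporary queue was not redeclared after reconnecting. The client did not receive any messages either. It is now really hard to reproduce the problem reliably because it happens less often. I have included the full configuration below, which now contains the queue declarations.
Update 3: Even after replacing the exclusive temporary queue with a named auto-delete queue, the same behavior still occasionally occurs: the named auto-delete queue is not redeclared after reconnecting, and no messages are received until the application is restarted.
I would be grateful if anyone could help me with this.
Here is the Spring AMQP configuration I am relying on: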
<!-- Create a temporary exclusive queue to subscribe to the control exchange -->
<rabbit:queue id="control-queue"/>
<!-- Bind the temporary queue to the control exchange -->
<rabbit:fanout-exchange name="control">
<rabbit:bindings>
<rabbit:binding queue="control-queue"/>
</rabbit:bindings>
</rabbit:fanout-exchange>
<!-- Subscribe to the temporary queue -->
<rabbit:listener-container connection-factory="connection-factory"
acknowledge="none"
concurrency="1"
prefetch="1">
<rabbit:listener queues="control-queue" ref="controlQueueConsumer"/>
</rabbit:listener-container>
<rabbit:connection-factory id="connection-factory"
username="${rabbit.username}"
password="${rabbit.password}"
host="${rabbit.host}"
virtual-host="${rabbit.virtualhost}"
publisher-confirms="true"
channel-cache-size="100"
requested-heartbeat="30" />
<rabbit:admin id="admin" connection-factory="connection-factory"/>
<rabbit:queue id="qu0-id" name="qu0">
<rabbit:queue-arguments>
<entry key="x-dead-letter-exchange" value="dead-letter"/>
</rabbit:queue-arguments>
</rabbit:queue>
<rabbit:topic-exchange id="default-exchange" name="default-ex" declared-by="admin">
<rabbit:bindings>
<rabbit:binding queue="qu0" pattern="p.0"/>
</rabbit:bindings>
</rabbit:topic-exchange>
<rabbit:listener-container connection-factory="connection-factory"
acknowledge="manual"
concurrency="4"
prefetch="30">
<rabbit:listener queues="qu0" ref="queueConsumerComponent"/>
</rabbit:listener-container>
【Discussion】:
-
Are you saying that earlier versions of Spring AMQP don't have this problem?
-
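Would you mind sharing the logs for the org.springframework.amqp.rabbit.listener category at DEBUG level, so we can see more information about this? By the way, I just tried a similar (or not?) emulation on Windows using tcpTrace and saw a similar Caused by: java.io.EOFException: null at java.io.DataInputStream.readUnsignedByte in the logs. But when I restarted the trace, the connection recovered. My AMQP client is 3.4.2, a transitive dependency of Spring AMQP.
-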
Not specific to Spring AMQP, but if you want the ability to reconnect and recover resources such as queues, you could try Lyra.
Tags: spring spring-amqp