【Posted】: 2020-02-01 16:47:52
【Question】:
The following exception occasionally occurs in our production environment:
2020-01-29 17:10:46.085 ERROR 2852 --- [o-8022-exec-258] c.c.p.common.dao.SearchDao : Search person by id failed
java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-832 [ACTIVE]
at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:789) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1403) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1373) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
at org.elasticsearch.client.RestHighLevelClient.get(RestHighLevelClient.java:699) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-832 [ACTIVE]
at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
But this is just a simple get-by-ID request, not a complex query:
curl 'http://localhost:9201/person/_doc/30154410564?pretty'
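So why do these timeout exceptions occur? The application runs many other search queries, so why does only this simple query by ID trigger the exception?
The person index is synchronized from an Oracle DB. A scheduled job syncs changed person records into the index every 10 minutes, and if the person index is accessed while that job is running, the request fails with the 30,000 milliseconds timeout. How can this be solved? Also, the problem seems to appear only when the index is accessed through the Java client; accessing it with curl from the command line does not show it.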
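Since curl and the Java client behave differently, one way to narrow this down is to issue the same GET from inside the JVM without going through RestHighLevelClient and record how long it takes. The following is only a sketch under stated assumptions: it uses the JDK 11+ `java.net.http.HttpClient`, the class name `PlainHttpProbe` and the 5 second timeout are made up for illustration, and the URL is the one from the curl command above.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of the original application.
final class PlainHttpProbe {

    /** Outcome of one probe: HTTP status (or -1 on failure), elapsed time, and the failure text if any. */
    public static final class Result {
        public final int status;
        public final long elapsedMillis;
        public final String error;

        Result(int status, long elapsedMillis, String error) {
            this.status = status;
            this.elapsedMillis = elapsedMillis;
            this.error = error;
        }

        @Override
        public String toString() {
            return "status=" + status + " elapsedMs=" + elapsedMillis + " error=" + error;
        }
    }

    /** Sends one GET with the given timeout; network failures are reported in the Result instead of thrown. */
    public static Result get(String url, Duration timeout) {
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).timeout(timeout).GET().build();
        long start = System.nanoTime();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return new Result(response.statusCode(), elapsedMillis(start), null);
        } catch (IOException e) {
            // Covers connection refused as well as connect/request timeouts.
            return new Result(-1, elapsedMillis(start), e.toString());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return new Result(-1, elapsedMillis(start), e.toString());
        }
    }

    private static long elapsedMillis(long startNanos) {
        return (System.nanoTime() - startNanos) / 1_000_000;
    }

    public static void main(String[] args) {
        // URL taken from the curl command in the question; the 5 s timeout is an arbitrary choice.
        String url = args.length > 0 ? args[0] : "http://localhost:9201/person/_doc/30154410564";
        System.out.println(get(url, Duration.ofSeconds(5)));
    }
}
```

Running this from the application host while the 10-minute sync job is active would show whether a plain HTTP request from Java is also slow during that window, or whether the delay is specific to the Elasticsearch client.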
PS:
health status index  uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   person jb3msRw5S9ixgXN5SLd6bw   1   0  140754205     19239587     19.8gb         19.8gb
RestClient configuration:
private final RestHighLevelClient restHighLevelClient;
restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost(host, port)));
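The builder above sets no timeouts, and the 30,000 milliseconds in the exception matches the default socket timeout of the low-level REST client. A minimal sketch of setting the timeouts explicitly, assuming the 7.1.1 client from the stack trace; the class name `EsClientFactory` and the timeout values are illustrative assumptions, not part of the original code. Note that a different timeout only changes when the client gives up, not why the request is slow.

```java
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;

// Hypothetical factory class; only the builder calls come from the Elasticsearch REST client API.
final class EsClientFactory {

    static RestHighLevelClient create(String host, int port) {
        RestClientBuilder builder = RestClient.builder(new HttpHost(host, port))
            .setRequestConfigCallback(requestConfigBuilder -> requestConfigBuilder
                // Time allowed to establish the TCP connection (example value).
                .setConnectTimeout(5_000)
                // Time allowed to wait for response data on an open connection (example value).
                .setSocketTimeout(60_000));
        return new RestHighLevelClient(builder);
    }
}
```

With this in place, the existing field could be initialized as `restHighLevelClient = EsClientFactory.create(host, port);`.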
【Discussion】:
-
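Could you provide the slow query logs from the periods when these timeouts occur, and share your search query and index statistics (number of shards, replicas, number of documents, etc.)? Also check elastic.co/guide/en/elasticsearch/reference/current/… during these timeouts and post the information from it. This kind of problem needs a lot of data for an RCA, and the timeout on this simple query is most likely caused by some other expensive operation running at the same time and hogging resources, which is why it only happens occasionally.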
-
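Something tells me this has nothing to do with your cluster, which looks pretty quiet. Instead, your client somehow cannot reach the cluster, so the query times out. Are you sure the cluster is running on port 9201?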
-
@Val Yes, it is running on port 9201.
-
So you can curl any other query and it works, just not that one?
-
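Then are you sure your high-level REST client is configured correctly? The client really cannot reach the cluster, and the connection times out.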
Tags: elasticsearch resthighlevelclient