【问题标题】:Why simple query by id could cause timeout exception?为什么简单的 id 查询会导致超时异常?
【发布时间】:2020-02-01 16:47:52
【问题描述】:

生产环境偶尔会出现以下异常,

2020-01-29 17:10:46.085 ERROR 2852 --- [o-8022-exec-258] c.c.p.common.dao.SearchDao               : Search person by id failed

java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-832 [ACTIVE]
        at org.elasticsearch.client.RestClient.extractAndWrapCause(RestClient.java:789) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:225) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:212) ~[elasticsearch-rest-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.internalPerformRequest(RestHighLevelClient.java:1433) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequest(RestHighLevelClient.java:1403) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.performRequestAndParseEntity(RestHighLevelClient.java:1373) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]
        at org.elasticsearch.client.RestHighLevelClient.get(RestHighLevelClient.java:699) ~[elasticsearch-rest-high-level-client-7.1.1.jar!/:7.1.1]

Caused by: java.net.SocketTimeoutException: 30,000 milliseconds timeout on connection http-outgoing-832 [ACTIVE]
        at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.timeout(HttpAsyncRequestExecutor.java:387) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:92) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.client.InternalIODispatch.onTimeout(InternalIODispatch.java:39) ~[httpasyncclient-4.1.4.jar!/:4.1.4]
        at org.apache.http.impl.nio.reactor.AbstractIODispatch.timeout(AbstractIODispatch.java:175) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionTimedOut(BaseIOReactor.java:263) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.timeoutCheck(AbstractIOReactor.java:492) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.validate(BaseIOReactor.java:213) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:280) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104) ~[httpcore-nio-4.4.11.jar!/:4.4.11]
        at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:591) ~[httpcore-nio-4.4.11.jar!/:4.4.11]

但这只是一个简单的查询,不是一个复杂的查询

 curl 'http://localhost:9201/person/_doc/30154410564?pretty'

在这段时间内,负载非常低

那么为什么会存在这些超时异常呢?并且有很多搜索查询但是为什么只有这个简单的query by id 会导致这个异常?

person索引是从Oracle DB同步的,有一个定时任务,每隔10分钟会同步变化的person索引,如果在这段时间内访问person索引,会导致30,000 milliseconds timeout。那么如何解决呢?而且好像Java客户端访问会出现这种现象,但是命令行curl访问就不会出现这种现象了。

PS:

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   person              jb3msRw5S9ixgXN5SLd6bw   1   0  140754205     19239587     19.8gb         19.8gb

并且在这段时间里,人索引有索引写入

RestClient 配置:

private final RestHighLevelClient restHighLevelClient;
restHighLevelClient = new RestHighLevelClient(RestClient.builder(new HttpHost(host, port)));

【问题讨论】:

  • 您能否在这些超时期间提供慢查询日志,并分享您的搜索查询和索引统计信息(多少分片、副本、没有文档等),并在这些超时期间检查elastic.co/guide/en/elasticsearch/reference/current/… 和从中提供信息。这种问题需要大量数据来执行 RCA,并且这个简单查询的超时很可能是在此期间运行的其他一些代价高昂的操作(它会占用资源),因此它偶尔会发生
  • 某事告诉我它与您的集群无关,这看起来很安静,但不知何故您的客户端无法访问您的集群,因此查询超时。你确定集群在 9201 端口上运行吗?
  • @Val 是的,它在端口 9201 上运行。
  • 所以你可以 curl 任何其他查询并且它可以工作,只是不是那个?
  • 那么您确定您的高级 REST 客户端配置正确吗?客户端确实无法访问集群,连接超时。

标签: elasticsearch resthighlevelclient


【解决方案1】:

通过调用 hot_threads api

 curl 'http://localhost:9201/_nodes/hot_threads?pretty'

得到以下信息:

   100.9% (504.4ms out of 500ms) cpu usage by thread 'elasticsearch[node-1][get][T#6]'
     8/10 snapshots sharing following 33 elements
       app//org.apache.lucene.index.SingletonSortedNumericDocValues.nextDoc(SingletonSortedNumericDocValues.java:53)
       app//org.apache.lucene.codecs.lucene80.IndexedDISI.writeBitSet(IndexedDISI.java:196)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.writeValues(Lucene80DocValuesConsumer.java:214)
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesConsumer.addNumericField(Lucene80DocValuesConsumer.java:111)
       app//org.apache.lucene.codecs.perfield.PerFieldDocValuesFormat$FieldsWriter.addNumericField(PerFieldDocValuesFormat.java:109)
       app//org.apache.lucene.index.ReadersAndUpdates.handleDVUpdates(ReadersAndUpdates.java:368)
       app//org.apache.lucene.index.ReadersAndUpdates.writeFieldUpdates(ReadersAndUpdates.java:570)
       app//org.apache.lucene.index.ReaderPool.writeAllDocValuesUpdates(ReaderPool.java:228)
       app//org.apache.lucene.index.IndexWriter.writeReaderPool(IndexWriter.java:3308)
       app//org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:520)
       app//org.apache.lucene.index.StandardDirectoryReader.doOpenFromWriter(StandardDirectoryReader.java:294)
       app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:269)
       app//org.apache.lucene.index.StandardDirectoryReader.doOpenIfChanged(StandardDirectoryReader.java:259)
       app//org.apache.lucene.index.FilterDirectoryReader.doOpenIfChanged(FilterDirectoryReader.java:112)
       app//org.apache.lucene.index.DirectoryReader.openIfChanged(DirectoryReader.java:140)
       app//org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:156)
       app//org.apache.lucene.search.SearcherManager.refreshIfNeeded(SearcherManager.java:58)
       app//org.apache.lucene.search.ReferenceManager.doMaybeRefresh(ReferenceManager.java:176)
       app//org.apache.lucene.search.ReferenceManager.maybeRefreshBlocking(ReferenceManager.java:253)
       app//org.elasticsearch.index.engine.InternalEngine.refresh(InternalEngine.java:1548)
       app//org.elasticsearch.index.engine.InternalEngine.get(InternalEngine.java:652)
       app//org.elasticsearch.index.shard.IndexShard.get(IndexShard.java:916)
       app//org.elasticsearch.index.get.ShardGetService.innerGet(ShardGetService.java:169)
       app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:93)
       app//org.elasticsearch.index.get.ShardGetService.get(ShardGetService.java:84)
       app//org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:106)
       app//org.elasticsearch.action.get.TransportGetAction.shardOperation(TransportGetAction.java:45)

好像在通过id查询人时,内部自动执行刷新,从官方文档中得到

默认情况下,get API 是实时的,不受索引刷新率的影响(当数据对搜索可见时)。如果文档已更新但尚未刷新,get API 将就地发出刷新调用以使文档可见。这也将使自上次刷新后更改的其他文档可见。为了禁用实时 GET,可以将实时参数设置为 false。

注意:每次访问人员详情页面,都会更新此人的viewCount。

所以我明确禁用了实时

        GetRequest getRequest = new GetRequest(personIndex, id.toString());
        getRequest.realtime(false);

这样做后,超时问题就解决了。

【讨论】:

    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    • 2016-10-20
    • 2020-11-08
    • 1970-01-01
    • 1970-01-01
    • 2012-02-16
    • 2011-09-28
    相关资源
    最近更新 更多