ElasticSearch BulkShardRequest failed due to org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor

【Question title】: ElasticSearch BulkShardRequest failed due to org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor
【Posted】: 2021-02-20 00:14:03
【Question description】:

I am storing logs from my reactive Spring application in Elasticsearch. Elasticsearch returns the following error:

Elasticsearch exception [type=es_rejected_execution_exception, reason=rejected execution of processing of [129010665][indices:data/write/bulk[s][p]]: request: BulkShardRequest [[logs-dev-2020.11.05][1]] containing [index {[logs-dev-2020.11.05][_doc][0d1478f0-6367-4228-9553-7d16d2993bc2], source[n/a, actual length: [4.1kb], max length: 2kb]}] and a refresh, target allocation id: WwkZtUbPSAapC3C-Jg2z2g, primary term: 1 on EsThreadPoolExecutor[name = 10-110-23-125-common-elasticsearch-apps-dev-v1/write, queue capacity = 200, org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6599247a[Running, pool size = 2, active threads = 2, queued tasks = 221, completed tasks = 689547]]]
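
The counters at the end of this error are what matter for diagnosis: the write pool has 2 threads, both are busy, and 221 tasks are queued against a capacity of 200. As a minimal sketch (the function name parse_rejection and the regular expression are illustrative assumptions, not part of any Elasticsearch client), the counters can be pulled out of such a message like this:

```python
import re

# Matches the executor statistics at the tail of an
# es_rejected_execution_exception message (format as seen in the error above).
EXECUTOR_STATS = re.compile(
    r"queue capacity = (?P<queue_capacity>\d+).*?"
    r"pool size = (?P<pool_size>\d+), "
    r"active threads = (?P<active_threads>\d+), "
    r"queued tasks = (?P<queued_tasks>\d+), "
    r"completed tasks = (?P<completed_tasks>\d+)",
    re.DOTALL,
)


def parse_rejection(message: str) -> dict:
    """Return the executor counters found in a rejection message as integers."""
    match = EXECUTOR_STATS.search(message)
    if match is None:
        raise ValueError("no EsThreadPoolExecutor statistics found in message")
    return {name: int(value) for name, value in match.groupdict().items()}


# Tail of the error message from the question.
message = (
    "EsThreadPoolExecutor[name = "
    "10-110-23-125-common-elasticsearch-apps-dev-v1/write, queue capacity = 200, "
    "org.elasticsearch.common.util.concurrent.EsThreadPoolExecutor@6599247a"
    "[Running, pool size = 2, active threads = 2, queued tasks = 221, "
    "completed tasks = 689547]]]"
)

stats = parse_rejection(message)
# All threads busy and the queue over capacity means new bulk tasks get rejected.
saturated = (
    stats["active_threads"] == stats["pool_size"]
    and stats["queued_tasks"] >= stats["queue_capacity"]
)
print(stats, "saturated:", saturated)
```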

My index settings:

{
    "logs-dev-2020.11.05": {
        "settings": {
            "index": {
                "highlight": {
                    "max_analyzed_offset": "5000000"
                },
                "number_of_shards": "3",
                "provided_name": "logs-dev-2020.11.05",
                "creation_date": "1604558592095",
                "number_of_replicas": "2",
                "uuid": "wjIOSfZOSLyBFTt1cT-whQ",
                "version": {
                    "created": "7020199"
                }
            }
        }
    }
}

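I have gone through this article:

https://www.elastic.co/blog/why-am-i-seeing-bulk-rejections-in-my-elasticsearch-cluster

I thought tuning the "write" thread pool settings would solve this, but the article says that is not recommended:

Tuning the queue sizes is therefore strongly discouraged, as it is like putting a temporary band-aid on the problem rather than actually fixing the underlying issue.

So what else can we do to improve the situation?

Additional information:

Elasticsearch version 7.2.1
Cluster health is good, and there are 3 nodes in the cluster
An index is created every day, and each index has 3 shards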

【Question discussion】:

Tags: elasticsearch elasticsearch-performance

【Solution 1】:

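You are right that increasing the thread pool's queue size is not a permanent fix, but you may be glad to know that Elasticsearch itself increased the queue size of the write thread pool (the one used by your bulk requests) from 200 to 10k in a single minor-version upgrade. Compare the size of 200 in ES 7.8 with the 10k in ES 7.9.

If you are on ES 7.x, you can also increase the queue size, if not to 10k then at least to 1k, to avoid rejected requests.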
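
The queue size is the node setting thread_pool.write.queue_size in elasticsearch.yml. Before and after changing it, it helps to look at how full the write queue actually is on each node. The sketch below assumes the rows come from GET _cat/thread_pool/write?format=json&h=node_name,name,active,queue,rejected,size,queue_size; the node names and numbers in the sample are made up for illustration, and write_pool_pressure is a hypothetical helper, not a client API.

```python
import json


def write_pool_pressure(rows):
    """Summarise _cat/thread_pool rows for the write pool, one entry per node."""
    report = []
    for row in rows:
        queue = int(row["queue"])
        capacity = int(row["queue_size"])
        report.append({
            "node": row["node_name"],
            "threads": int(row["size"]),
            # Fraction of the queue currently in use; None if the queue is unbounded.
            "queue_fill": queue / capacity if capacity > 0 else None,
            "rejected": int(row["rejected"]),
        })
    return report


# Hypothetical sample response (the cat API returns values as strings).
sample = json.loads("""
[
  {"node_name": "node-1", "name": "write", "active": "2", "queue": "150",
   "rejected": "37", "size": "2", "queue_size": "200"},
  {"node_name": "node-2", "name": "write", "active": "1", "queue": "0",
   "rejected": "0", "size": "2", "queue_size": "200"}
]
""")

report = write_pool_pressure(sample)
for entry in report:
    print(entry)
```

A node whose queue_fill stays close to 1.0 with a growing rejected count is a candidate for the deeper fixes described next, rather than only a larger queue.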

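If you want a proper fix, you need to do the following:

Find out whether the rejections are consistent or just a short burst of write requests that clears up after some time.
If they are consistent, check whether all the write optimizations are in place; see my short-tips to improve index speed.
Check whether you have reached the full capacity of your data nodes; if so, scale your cluster to handle the increased, legitimate load.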
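
The first step above can be sketched as follows, assuming you sample the cumulative rejected counter of the write pool (for example from _cat/thread_pool) at a fixed interval. The function name classify_rejections and the 50% threshold are illustrative assumptions, not an Elasticsearch rule; pick a threshold that fits your own traffic.

```python
def classify_rejections(samples, sustained_fraction=0.5):
    """Classify rejections as 'none', 'burst', or 'sustained'.

    samples: cumulative `rejected` counter values taken at a fixed interval.
    sustained_fraction: assumed share of intervals with new rejections
    above which the problem is treated as sustained.
    """
    # New rejections per interval; a negative delta (counter reset after a
    # node restart) is simply not counted as a rejecting interval.
    deltas = [later - earlier for earlier, later in zip(samples, samples[1:])]
    rejecting = sum(1 for delta in deltas if delta > 0)
    if rejecting == 0:
        return "none"
    if rejecting / len(deltas) >= sustained_fraction:
        return "sustained"
    return "burst"


# One spike, then quiet: a short burst that clears on its own.
burst_result = classify_rejections([0, 0, 40, 40, 40, 40, 40])
# Rejections in most intervals: look at write optimizations or capacity.
sustained_result = classify_rejections([0, 10, 25, 25, 60, 90])
print(burst_result, sustained_result)
```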

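【Discussion】:

What I meant to ask is whether we need to increase the write thread pool size, which is 2 in my case, or only the queue_size, which is 200 in my case.
@AbdulBasith, you need to increase the queue_size. If you have more cores available and CPU is not the bottleneck, you can also increase the processors (aka the indexing threads), which is fixed at 2 in your case.
Let me increase it and try, Ninja.
@AbdulBasith Great :), glad to hear that :)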