How can I prevent Elasticsearch from using so much memory? (Answers)

[Question title]: How can I prevent Elasticsearch from using so much memory?
[Posted]: 2015-10-15 20:49:54
[Question]:

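I'm trying to use Kibana to visualize some Bro logs that I've ingested into Elasticsearch. I've loaded about one month of logs (roughly 3 billion records, about 4 TB in total). The data was ingested and indexed without any problems. I can build some simple visualizations in Kibana, but when I try to load a dashboard I created (which contains 12 different visualizations and fires off at least that many Elasticsearch queries), I start getting errors.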

I'm running a 7-node Elasticsearch cluster with 5 data nodes:

host001 192.168.1.1 18  8  0.00 - * Feron  
host002 192.168.1.2 15  8  0.00 - - Dark Phoenix    
host003 192.168.1.3 58 21  0.25 d - Starbolt          
host004 192.168.1.4 37 23  0.07 d - Niles Van Roekel  
host005 192.168.1.5 47 29  0.10 d - Angel Salvadore    
host006 192.168.1.6 68 29 16.37 d - Candra            
host007 192.168.1.7 56 29 14.36 d - Algrim the Strong

Highlights of the errors in elasticsearch.log:

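A collection of these lines for different fields (the circuit breaker trips when too much memory would be used for fielddata, which I think is the core of my problem):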

[2015-10-06 08:24:00,265][WARN][indices.breaker] [Eric Slaughter] [FIELDDATA] New used memory 3752926600 [3.4gb] from field [AA] would be larger than configured breaker: 3745107148 [3.4gb], breaking
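To make the numbers in that warning easier to read, here is a minimal Python sketch that parses a breaker line and reports how far the request overshot the limit. It assumes the English wording of the Elasticsearch 1.x warning shown in the test string, and it assumes the fielddata breaker is at its default of 60% of the JVM heap (the question does not state the configured value), so the implied heap size is only an estimate. The names `BREAKER_RE`, `ASSUMED_LIMIT_FRACTION` and `parse_breaker_line` are made up for this illustration.

```python
import re

# Pattern for the fielddata breaker warning (wording assumed from ES 1.x logs).
BREAKER_RE = re.compile(
    r"\[FIELDDATA\] New used memory (?P<used>\d+) \[[^\]]+\] "
    r"from field \[(?P<field>[^\]]+)\] would be larger than "
    r"configured breaker: (?P<limit>\d+) \[[^\]]+\]"
)

# Assumption: the breaker limit is left at the default of 60% of the heap.
ASSUMED_LIMIT_FRACTION = 0.60


def parse_breaker_line(line):
    """Return the field, requested bytes, limit bytes and overshoot, or None."""
    match = BREAKER_RE.search(line)
    if match is None:
        return None
    used = int(match.group("used"))
    limit = int(match.group("limit"))
    return {
        "field": match.group("field"),
        "used_bytes": used,
        "limit_bytes": limit,
        "over_by_bytes": used - limit,
        # Only meaningful if the 60% assumption above holds.
        "implied_heap_gib": limit / ASSUMED_LIMIT_FRACTION / 2**30,
    }


line = (
    "[2015-10-06 08:24:00,265][WARN][indices.breaker] [Eric Slaughter] "
    "[FIELDDATA] New used memory 3752926600 [3.4gb] from field [AA] "
    "would be larger than configured breaker: 3745107148 [3.4gb], breaking"
)
info = parse_breaker_line(line)
print(info)
```

For this line the request exceeds the limit by only about 7.8 MB, which suggests fielddata was already sitting right at the breaker limit when the query for field AA arrived.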

A collection of these (it seems that even with the circuit breaker, Elasticsearch runs out of memory anyway):

[2015-10-06 08:32:06,279][WARN][netty.channel.socket.nio.AbstractNioSelector] Unexpected exception in the selector loop.
org.elasticsearch.index.engine.CreateFailedEngineException: [bro-2015-10-06][2] Create failed for [dns#AVA9HeN5uS-hcepf0HbN]
    at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:262)
    at org.elasticsearch.index.shard.IndexShard.create(IndexShard.java:470)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardIndexOperation(TransportShardBulkAction.java:437)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:149)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:515)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:422)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: org.apache.lucene.store.AlreadyClosedException: refusing to delete any files: this IndexWriter hit an unrecoverable exception
    at org.apache.lucene.index.IndexFileDeleter.ensureOpen(IndexFileDeleter.java:354)
    at org.apache.lucene.index.IndexFileDeleter.deleteFile(IndexFileDeleter.java:719)
    at org.apache.lucene.index.IndexFileDeleter.deleteNewFiles(IndexFileDeleter.java:712)
    at org.apache.lucene.index.IndexWriter.deleteNewFiles(IndexWriter.java:4821)
    at org.apache.lucene.index.DocumentsWriter$DeleteNewFilesEvent.process(DocumentsWriter.java:749)
    at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4875)
    at org.apache.lucene.index.IndexWriter.processEvents(IndexWriter.java:4867)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1527)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1252)
    at org.elasticsearch.index.engine.InternalEngine.innerCreateNoLock(InternalEngine.java:343)
    at org.elasticsearch.index.engine.InternalEngine.innerCreate(InternalEngine.java:285)
    at org.elasticsearch.index.engine.InternalEngine.create(InternalEngine.java:256)
    ... 8 more
Caused by: java.lang.OutOfMemoryError: Java heap space

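Then a bunch of the following, which I believe is an attempt to set up a replica shard on another node (this would cause that node to fail and start a chain reaction... I've gotten rid of this error by eliminating replica shards, but I'd prefer a better solution):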

[2015-10-06 08:38:35,707][WARN][action.bulk] [Eric Slaughter] Failed to perform indices:data/write/bulk[s] on remote replica [Tower][KxzEXAXKTCazjLzgOJE_aA][KxzEXAXKTCazjLzgOJE_aA][host005][inet[/192.168.1.5:9300]]{master=false}[bro-2015-10-06][8] org.elasticsearch.transport.NodeDisconnectedException: [Tower][inet[/192.168.1.5:9300]][indices:data/write/bulk[s][r]] disconnected

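I know one way to solve this is to scale out horizontally, but I don't have that luxury, and I'd like to make proper use of the cluster I have (especially since I'm only using 0.5 TB of data and have much more available).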

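I've also looked into a few other options, which can be seen in the mapping below. The "doc_values" format is supposed to keep fielddata on disk rather than in the heap, but it doesn't completely eliminate the problem. Perhaps something else is taking up all the memory, or the meta fields (_type, _id, etc.) are to blame (I haven't found a way to configure those fields with "doc_values"). I'm also using global ordinals on the string fields.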
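For readers unfamiliar with the option, the sketch below builds an index template of the kind being described, written as a Python dict so it can be printed as JSON. It is not the asker's actual template (that one is behind the pastebin link); the `bro-*` pattern is inferred from the index name in the log lines, the dynamic template name `strings_not_analyzed` is invented, and the syntax assumes Elasticsearch 1.x, where doc values can be enabled on not_analyzed string fields but not on analyzed ones.

```python
import json


def build_template(index_pattern):
    """Build an ES 1.x-style index template body (illustrative only)."""
    return {
        # Applies to every index whose name matches the pattern.
        "template": index_pattern,
        "mappings": {
            "_default_": {
                "dynamic_templates": [
                    {
                        # Hypothetical name for this dynamic template.
                        "strings_not_analyzed": {
                            "match_mapping_type": "string",
                            "mapping": {
                                "type": "string",
                                # Analyzed strings cannot use doc values
                                # in 1.x, so index the raw value instead.
                                "index": "not_analyzed",
                                "doc_values": True,
                            },
                        }
                    }
                ]
            }
        },
    }


body = build_template("bro-*")
print(json.dumps(body, indent=2))
```

Any analyzed string field that is sorted or aggregated on would still load into heap fielddata, which is one possible reason the breaker keeps tripping even with doc values enabled elsewhere.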

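If anyone needs more information about my cluster or configuration, please let me know! I'm really stumped, so thanks in advance for any help you can provide.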

Here is the mapping template I'm using: http://pastebin.com/S8UVKRxZ

Here is my elasticsearch.yml configuration: http://pastebin.com/PaG0pBC5

[Comments]:

Tags: elasticsearch kibana

[Answer 1]:

How many records are there per index? If each index holds billions of records, you may need to split the indices.

^^ I would have posted this as a comment, but my reputation is too low to comment on your question.

From the Elasticsearch documentation: limiting_memory_usage

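You may be surprised to find that Elasticsearch doesn't load into fielddata just the values for the documents that match your query. It loads the values for all documents in your index, even documents with a different _type!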

[Comments]:

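Thanks for the reply! At the moment, all of the records are stored in a single index. I can certainly try splitting them into smaller pieces, but out of curiosity, how does that help? I thought more indices would lead to more memory usage.
Please see my updated answer. The queries may not need all the data from one big index, only the data from a few indices, which together are smaller than the total of everything you've stored.
That did solve my memory problem, and I no longer get out-of-memory errors when loading the dashboard! Thanks.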
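The reasoning in the answer and comments is that fielddata is loaded per index, so a query that only touches a few small indices loads far less than one that touches a single huge index. A minimal Python sketch of that idea, assuming one index per day named with a `bro-` prefix and an ISO date (a naming scheme inferred from the `bro-2015-10-06` index in the logs, not confirmed by the thread): it builds the search path for just the days a query needs. The helper names `daily_indices` and `search_path` are hypothetical.

```python
from datetime import date, timedelta


def daily_indices(start, end, prefix="bro-"):
    """List one index name per day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    return [
        prefix + (start + timedelta(days=offset)).isoformat()
        for offset in range(days)
    ]


def search_path(start, end, prefix="bro-"):
    """Build a search path that names only the indices in the date range."""
    return "/" + ",".join(daily_indices(start, end, prefix)) + "/_search"


# Three days of data instead of the whole month.
print(search_path(date(2015, 10, 5), date(2015, 10, 7)))
# prints /bro-2015-10-05,bro-2015-10-06,bro-2015-10-07/_search
```

Elasticsearch accepts a comma-separated list of index names in the request path, so a dashboard limited to a short time window would only load fielddata for those days.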