【Question Title】: Datastax Solr nodes: Nodetool repair stuck
【Posted】: 2014-11-26 07:56:53
【Question】:

We have a DataStax Enterprise Solr cluster (version 4.5) on CentOS with two data centers (DC1 in Europe, DC2 in North America):

DC1: 2 nodes with RF set to 2
DC2: 1 node with RF set to 1

Each node has 2 cores and 4 GB of RAM. We have created only one keyspace; each of the 2 nodes in DC1 holds 400 MB of data, while the node in DC2 is empty.
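As a rough estimate (assuming data is spread evenly across nodes, which Cassandra only approximates), a node's load is one full copy of the data times the DC's RF divided by the number of nodes in that DC. With the figures above, this means the single DC2 node has to receive, and index in Solr, a full copy of the keyspace during the repair:

```python
def unique_data_mb(per_node_mb: float, nodes: int, rf: int) -> float:
    """Size of one full copy of the data, inferred from one DC's per-node load."""
    return per_node_mb * nodes / rf


def per_node_mb(unique_mb: float, nodes: int, rf: int) -> float:
    """Approximate load per node in a DC once every replica is in place."""
    return unique_mb * rf / nodes


# DC1 from the question: 2 nodes, RF 2, about 400 MB per node.
unique = unique_data_mb(400, nodes=2, rf=2)
# DC2: 1 node, RF 1, so that node must end up holding one full copy.
dc2_target = per_node_mb(unique, nodes=1, rf=1)
print(f"one full copy ~{unique:.0f} MB; DC2 node target ~{dc2_target:.0f} MB")
```

Real on-disk sizes also depend on compaction state and tombstones, so treat this only as an order-of-magnitude figure.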

If I start nodetool repair on the node in DC2, the command runs fine for about 20-30 minutes, then stops making progress and stays stuck.

In the DC2 node's log I can see:

WARN [NonPeriodicTasks:1] 2014-10-01 05:57:44,188 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
ERROR [NonPeriodicTasks:1] 2014-10-01 05:57:44,190 CassandraDaemon.java (line 199) Exception in thread Thread[NonPeriodicTasks:1,5,main]
org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.doCommit(AbstractSolrSecondaryIndex.java:994)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.forceBlockingFlush(AbstractSolrSecondaryIndex.java:139)
    at org.apache.cassandra.db.index.SecondaryIndexManager.flushIndexesBlocking(SecondaryIndexManager.java:338)
    at org.apache.cassandra.db.index.SecondaryIndexManager.maybeBuildSecondaryIndexes(SecondaryIndexManager.java:144)
    at org.apache.cassandra.streaming.StreamReceiveTask$OnCompletionRunnable.run(StreamReceiveTask.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
    at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
    ... 12 more
 WARN [commitScheduler-3-thread-1] 2014-10-01 05:58:47,351 WorkPool.java (line 398) Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
ERROR [commitScheduler-3-thread-1] 2014-10-01 05:58:47,352 SolrException.java (line 136) auto commit error...:org.apache.solr.common.SolrException: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:351)
    at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(Unknown Source)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.RuntimeException: Timeout while waiting for workers when flushing pool {}. IndexCurrent timeout is Failure to flush may cause excessive growth of Cassandra commit log.
 millis, consider increasing it, or reducing load on the node.
    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)
    at com.datastax.bdp.concurrent.WorkPool.flush(WorkPool.java:339)
    at com.datastax.bdp.search.solr.AbstractSolrSecondaryIndex.flushIndexUpdates(AbstractSolrSecondaryIndex.java:484)
    at com.datastax.bdp.search.solr.handler.update.CassandraDirectUpdateHandler.commit(CassandraDirectUpdateHandler.java:278)
    ... 8 more
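In the excerpt above the same flush timeout fires from two places: the stream-completion task that rebuilds secondary indexes after repair (NonPeriodicTasks) and Solr's auto-commit scheduler (commitScheduler). To see how often this happens across a whole log, a small scanner like the following may help. It is a sketch that assumes the line layout visible above (LEVEL [thread] date time ...); it is not a DataStax tool.

```python
import re
from collections import Counter
from typing import Iterable

# Assumed log layout, taken from the excerpt above:
#   LEVEL [thread-name] yyyy-mm-dd hh:mm:ss,ms Source.java (line N) message
LINE_RE = re.compile(
    r"^\s*(?P<level>[A-Z]+)\s+\[(?P<thread>[^\]]+)\]\s+"
    r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>[\d:,]+)\s+(?P<msg>.*)$"
)
MARKER = "Timeout while waiting for workers when flushing pool"


def count_flush_timeouts(lines: Iterable[str]) -> Counter:
    """Count WARN-level WorkPool flush timeouts, grouped by thread pool name.

    Only WARN lines are counted: in the excerpt each timeout is logged once
    as WARN and then repeated inside an ERROR, so counting both would double it.
    """
    counts: Counter = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m and m.group("level") == "WARN" and MARKER in m.group("msg"):
            # "NonPeriodicTasks:1" -> "NonPeriodicTasks",
            # "commitScheduler-3-thread-1" -> "commitScheduler"
            pool = re.split(r"[:-]\d", m.group("thread"), maxsplit=1)[0]
            counts[pool] += 1
    return counts


# Shortened lines from the excerpt, used as a self-contained demo.
sample = [
    " WARN [NonPeriodicTasks:1] 2014-10-01 05:57:44,188 WorkPool.java (line 398) "
    "Timeout while waiting for workers when flushing pool {}. ...",
    "ERROR [NonPeriodicTasks:1] 2014-10-01 05:57:44,190 CassandraDaemon.java (line 199) "
    "Exception in thread Thread[NonPeriodicTasks:1,5,main]",
    " WARN [commitScheduler-3-thread-1] 2014-10-01 05:58:47,351 WorkPool.java (line 398) "
    "Timeout while waiting for workers when flushing pool {}. ...",
    "    at com.datastax.bdp.concurrent.WorkPool.doFlush(WorkPool.java:399)",
]
print(count_flush_timeouts(sample))
```

Against a real node, pass an open log file instead of the sample list; the location of system.log depends on your install and logging configuration.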

I tried increasing some of the timeouts in cassandra.yaml, but it didn't help. Thanks.

【Question Discussion】:

Tags: solr cassandra datastax repair nodetool


【Answer 1】:

Your nodes are badly underpowered for a DSE Solr installation.

I would typically recommend at least 8 cores and at least 64 GB of RAM, with the heap set to 12-14 GB.
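Some context on why the heap would need to be set explicitly: assuming DSE 4.5 ships the stock Cassandra 2.0 cassandra-env.sh heuristic, the default maximum heap is max(min(RAM/2, 1 GB), min(RAM/4, 8 GB)). The sketch below is a Python re-implementation of that heuristic for illustration, not the script itself. On the 4 GB nodes from the question it yields a 1 GB heap, and even on a 64 GB machine it stops at 8 GB, below the 12-14 GB suggested above, so MAX_HEAP_SIZE in cassandra-env.sh would have to be set by hand.

```python
def default_max_heap_mb(system_memory_mb: int) -> int:
    """Assumed re-implementation of cassandra-env.sh's default heap sizing."""
    half = min(system_memory_mb // 2, 1024)     # half of RAM, capped at 1 GB
    quarter = min(system_memory_mb // 4, 8192)  # quarter of RAM, capped at 8 GB
    return max(half, quarter)


for ram_gb in (4, 64):
    print(f"{ram_gb} GB RAM -> default heap {default_max_heap_mb(ram_gb * 1024)} MB")
```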

The following troubleshooting guide is a good read:

https://support.datastax.com/entries/38367716-Solr-Configuration-Best-Practices-and-Troubleshooting-Tips

Your current data load is small, so you probably don't need the full amount of RAM; my guess is that the bottleneck here is CPU.

If you are not already running 4.0.4 or 4.5.2, I would move to one of those versions.

【Discussion】:

    【Answer 2】:

    Two things that may help:

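    1. The RuntimeException you see in the log is on the Lucene code path that commits index changes to disk, so I would definitely check whether disk writes are your bottleneck. (Are you using separate physical disks for data and the commit log?)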

    2. The parameter you may want to tune in the meantime is the one that controls the WorkPool flush timeout; in dse.yaml it is called flush_max_time_per_core.
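Both checks from the list above can be scripted. This is an assumption-laden sketch, not a DataStax tool: same_filesystem compares st_dev to tell whether two directories share a filesystem (point it at whatever data_file_directories and commitlog_directory are set to in cassandra.yaml; the /var/lib/cassandra paths below are only the usual package defaults), and read_top_level_scalar naively reads an uncommented key such as flush_max_time_per_core from dse.yaml text, assuming the key sits at the top level of that file. Check the DSE documentation for the unit and default before changing the value.

```python
import os
import re
from typing import Optional


def same_filesystem(path_a: str, path_b: str) -> bool:
    """True if both paths are on the same mounted filesystem (equal st_dev).

    Equal st_dev means data and commit log share a device. Different values
    do not prove separate physical disks (partitions or LVM volumes can sit
    on one disk), so confirm with your OS tools.
    """
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def read_top_level_scalar(yaml_text: str, key: str) -> Optional[str]:
    """Value of an uncommented, unindented 'key: value' line, or None.

    Deliberately naive (no YAML parser). Assumes the key is top-level;
    commented-out lines ('# key: ...') are ignored, so None means the
    setting is not overridden in this file.
    """
    pattern = re.compile(rf"^{re.escape(key)}\s*:\s*(\S+)\s*(?:#.*)?$")
    for line in yaml_text.splitlines():
        m = pattern.match(line)
        if m:
            return m.group(1)
    return None


# Hypothetical paths: usual package-install defaults, adjust to your cassandra.yaml.
DATA_DIR = "/var/lib/cassandra/data"
COMMITLOG_DIR = "/var/lib/cassandra/commitlog"
if os.path.isdir(DATA_DIR) and os.path.isdir(COMMITLOG_DIR):
    print("same filesystem:", same_filesystem(DATA_DIR, COMMITLOG_DIR))

# Illustrative dse.yaml text only; these numbers are not DSE defaults.
sample_dse_yaml = "# flush_max_time_per_core: 1\nflush_max_time_per_core: 2\n"
print("flush_max_time_per_core =",
      read_top_level_scalar(sample_dse_yaml, "flush_max_time_per_core"))
```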

    【Discussion】:

      【Answer 3】:

      One way to reduce Solr indexing contention is to increase the autoSoftCommit maxTime in solrconfig.xml:

      <autoSoftCommit>
         <!-- maxTime is in milliseconds -->
         <maxTime>1000000</maxTime>
      </autoSoftCommit>
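Since Solr expresses maxTime in milliseconds, 1000000 works out to a soft commit roughly every 16.7 minutes, trading how quickly new documents become searchable for fewer commits. A small sketch for checking the effective value in a solrconfig.xml; the <config>/<updateHandler> wrapper in the sample follows the usual solrconfig.xml layout and is an assumption about your file.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def soft_commit_minutes(solrconfig_xml: str) -> Optional[float]:
    """Return autoSoftCommit/maxTime converted to minutes, or None if unset."""
    root = ET.fromstring(solrconfig_xml)
    for elem in root.iter("autoSoftCommit"):  # iter() also checks the root itself
        max_time = elem.findtext("maxTime")
        if max_time is not None and max_time.strip():
            return int(max_time.strip()) / 1000 / 60  # ms -> s -> min
    return None


fragment = """<config>
  <updateHandler>
    <autoSoftCommit>
      <maxTime>1000000</maxTime>
    </autoSoftCommit>
  </updateHandler>
</config>"""
print(f"soft commit every ~{soft_commit_minutes(fragment):.1f} minutes")
```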
      

      【Discussion】:
