【Question title】: Stuck at compaction
【Posted】: 2016-08-26 08:45:24
【Question description】:

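I'm running into a strange problem with Cassandra's automatic compaction. I'm using Cassandra 3.7 on a Debian 8 system. After pushing about 70 GB of data to a Cassandra node (a single node with RF=1, for testing purposes), running nodetool compactionstats from the command line gives me: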

root@cassandra01:~# nodetool compactionstats
pending tasks: 280
- system.batches: 280

and no other information is shown. Checking system.log, I see:

ERROR [CompactionExecutor:74] 2016-08-23 19:41:30,006 CassandraDaemon.java:217 - Exception in thread Thread[CompactionExecutor:74,1,main]
java.lang.AssertionError: null
        at org.apache.cassandra.io.compress.CompressionMetadata$Chunk.<init>(CompressionMetadata.java:475) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.compress.CompressionMetadata.chunkFor(CompressionMetadata.java:240) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.MmappedRegions.updateState(MmappedRegions.java:158) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:73) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.MmappedRegions.<init>(MmappedRegions.java:61) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.MmappedRegions.map(MmappedRegions.java:99) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.CompressedSegmentedFile.<init>(CompressedSegmentedFile.java:44) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.CompressedSegmentedFile$Builder.complete(CompressedSegmentedFile.java:135) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.complete(SegmentedFile.java:181) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.util.SegmentedFile$Builder.buildData(SegmentedFile.java:192) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.sstable.format.big.BigTableWriter.openEarly(BigTableWriter.java:271) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.sstable.SSTableRewriter.maybeReopenEarly(SSTableRewriter.java:182) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.io.sstable.SSTableRewriter.append(SSTableRewriter.java:134) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.writers.DefaultCompactionWriter.realAppend(DefaultCompactionWriter.java:65) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.writers.CompactionAwareWriter.append(CompactionAwareWriter.java:141) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:187) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:82) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60) ~[apache-cassandra-3.7.jar:3.7]
        at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionCandidate.run(CompactionManager.java:264) ~[apache-cassandra-3.7.jar:3.7]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_101]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_101]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
ERROR [Reference-Reaper:1] 2016-08-23 19:42:05,511 Ref.java:203 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@57d68945) to class org.apache.cassandra.io.util.SegmentedFile$Cleanup@831676520:/cassandra/disk1/system/batches-919a4bc57a333573b03e13fc3f68b465/mb-44056-big-Index.db was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-08-23 19:42:05,511 Ref.java:203 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@39229a12) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1862266673:Memory@[7fb261a66020..7fb261a69220) was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-08-23 19:42:05,511 Ref.java:203 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@d80df0a) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$Tidy@791904242:[Memory@[0..188), Memory@[0..f50)] was not released before the reference was garbage collected
ERROR [Reference-Reaper:1] 2016-08-23 19:42:05,523 Ref.java:203 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@26c920c) to class org.apache.cassandra.io.util.MmappedRegions$Tidier@326857495:/cassandra/disk1/system/batches-919a4bc57a333573b03e13fc3f68b465/mb-44056-big-Data.db was not released before the reference was garbage collected

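It looks to me as though Cassandra fails somewhere in the CompressionMetadata class with an exception, and the Reference-Reaper then reports the memory leaked because of the uncaught exception. The problem never goes away, though: these errors show up in the log roughly every 30 seconds.

Has anyone seen this before?

Thanks.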

【Discussion】:

    Tags: cassandra


    【Solution 1】:

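    I haven't seen this error before, but it sounds like you may be pushing data to a single node too quickly. The node seems able to handle the write load at first, but after a while it probably can't keep up with compaction and keeps creating more compaction jobs until something overflows. You may have some corrupted SSTables by now.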

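    Try pushing the data at a slower rate, and monitor the compaction status during the push to make sure compaction jobs finish in a timely manner and don't pile up. If you can't reduce the write rate, you probably need more nodes to share the load.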
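
One way to monitor compaction during a push is to poll nodetool compactionstats and hold off on further writes while the backlog is above some limit. The following is only a rough sketch: it assumes the output begins with a "pending tasks: N" line as in the question, and the function names, the threshold, and the polling interval are all made up for illustration.

```python
import re
import subprocess
import time
from typing import Callable, Optional

# Matches the "pending tasks: 280" line shown in the question's output.
PENDING_RE = re.compile(r"^pending tasks:\s*(\d+)", re.MULTILINE)


def parse_pending_tasks(output: str) -> int:
    """Extract the total pending compaction count from compactionstats output."""
    match = PENDING_RE.search(output)
    if match is None:
        raise ValueError("no 'pending tasks' line found")
    return int(match.group(1))


def run_compactionstats() -> str:
    """Run nodetool on the local node and return its standard output."""
    result = subprocess.run(
        ["nodetool", "compactionstats"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_until_below(
    max_pending: int,
    read_stats: Callable[[], str] = run_compactionstats,
    poll_seconds: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until pending compactions drop to max_pending or fewer.

    Returns the number of times it had to sleep. If max_polls is set and
    the backlog has not drained by then, raise TimeoutError so a stuck
    backlog (like the 280 tasks in the question) does not block forever.
    """
    polls = 0
    while parse_pending_tasks(read_stats()) > max_pending:
        if max_polls is not None and polls >= max_polls:
            raise TimeoutError("compaction backlog did not drain")
        sleep(poll_seconds)
        polls += 1
    return polls
```

A loader could call wait_until_below(20) between batches. The right threshold depends on the node and the workload, so treat 20 as a placeholder.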

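    The closest thing I've seen happened after pushing a lot of data: compactionstats showed several queued compaction jobs, but none of them would ever start; they just sat there. I was able to clear that with a rolling restart of all the nodes in the cluster.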

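    【Comments】:

    • Thanks Jim. While writing, I saw compactions for the normal tables being created and completing successfully. I also saw system.batches compactions, and they usually completed. After the system had been under pressure for a while, the system.batches compactions started to accumulate (as did those for other tables). Once the node went idle, the other compactions finished, but these 280 did not. I had to wipe the node and push the data again. If I push data at a slower rate everything is fine, but I normally only pause writes when I get a timeout (i.e. when the node is under pressure), so I'm always writing at "maximum" speed.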
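
Pausing only on a timeout means the writer goes back to full speed as soon as a retry succeeds. A gentler variant is to back off for longer after each consecutive timeout. The sketch below is illustrative only: write_batch is a hypothetical callable standing in for whatever actually sends the data, the built-in TimeoutError stands in for the client driver's own timeout exception, and the delays are arbitrary.

```python
import time
from typing import Callable, Type


def write_with_backoff(
    write_batch: Callable[[], None],
    timeout_error: Type[BaseException] = TimeoutError,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    max_attempts: int = 8,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call write_batch, doubling the pause after each timeout.

    Returns the number of attempts used. Re-raises the timeout if the
    final attempt also fails.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            write_batch()
            return attempt
        except timeout_error:
            if attempt == max_attempts:
                raise
            sleep(delay)
            # Double the pause, but never wait longer than max_delay.
            delay = min(delay * 2, max_delay)
    raise ValueError("max_attempts must be at least 1")
```

Backing off this way slows the writer only while the node is struggling. It does not replace watching the compaction backlog, since writes can keep succeeding while compactions pile up.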