High disk I/O (read) on Cassandra nodes: answers

[Question title]: High disk I/O (read) on Cassandra nodes
[Posted]: 2021-12-04 20:59:30
[Question]:

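We have a 3-node Cassandra cluster.

We have an application that uses a keyspace, and reads from that keyspace put a high load on the disks. The problem is cumulative: the more days we work with the keyspace, the more the disk reads grow.

Read throughput climbs above 700 MB/s. At that point the storage (SAN) starts to degrade, and then the Cassandra cluster degrades as well.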

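UPD 25.10.2021: "I worded this slightly wrong: the space is allocated to the virtual machines through the SAN, just like a regular drive."

The only thing that helps is clearing the keyspace.

Output of the "tpstats" and "cfstats" commands: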

[cassandra-01 ~]$ nodetool tpstats
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              1         1     1837888055         0                 0
MiscStage                              0         0              0         0                 0
CompactionExecutor                     0         0        6789640         0                 0
MutationStage                          0         0      870873552         0                 0
MemtableReclaimMemory                  0         0           7402         0                 0
PendingRangeCalculator                 0         0              9         0                 0
GossipStage                            0         0       18939072         0                 0
SecondaryIndexManagement               0         0              0         0                 0
HintsDispatcher                        0         0              3         0                 0
RequestResponseStage                   0         0     1307861786         0                 0
Native-Transport-Requests              0         0     2981687196         0                 0
ReadRepairStage                        0         0         346448         0                 0
CounterMutationStage                   0         0              0         0                 0
MigrationStage                         0         0            168         0                 0
MemtablePostFlush                      0         0           8193         0                 0
PerDiskMemtableFlushWriter_0           0         0           7402         0                 0
ValidationExecutor                     0         0             21         0                 0
Sampler                                0         0          10988         0                 0
MemtableFlushWriter                    0         0           7402         0                 0
InternalResponseStage                  0         0           3404         0                 0
ViewMutationStage                      0         0              0         0                 0
AntiEntropyStage                       0         0             71         0                 0
CacheCleanupExecutor                   0         0              0         0                 0

Message type           Dropped
READ                         7
RANGE_SLICE                  0
_TRACE                       0
HINT                         0
MUTATION                     5
COUNTER_MUTATION             0
BATCH_STORE                  0
BATCH_REMOVE                 0
REQUEST_RESPONSE             0
PAGED_RANGE                  0
READ_REPAIR                  0

[cassandra-01 ~]$ nodetool cfstats box_messages -H
Total number of tables: 73
----------------
Keyspace : box_messages
    Read Count: 48847567
    Read Latency: 0.055540737801741485 ms
    Write Count: 69461300
    Write Latency: 0.010656743870327794 ms
    Pending Flushes: 0
        Table: messages
        SSTable count: 6
        Space used (live): 3.84 GiB
        Space used (total): 3.84 GiB
        Space used by snapshots (total): 0 bytes
        Off heap memory used (total): 10.3 MiB
        SSTable Compression Ratio: 0.23265712113582082
        Number of partitions (estimate): 4156030
        Memtable cell count: 929912
        Memtable data size: 245.04 MiB
        Memtable off heap memory used: 0 bytes
        Memtable switch count: 92
        Local read count: 20511450
        Local read latency: 0.106 ms
        Local write count: 52111294
        Local write latency: 0.013 ms
        Pending flushes: 0
        Percent repaired: 0.0
        Bloom filter false positives: 57318
        Bloom filter false ratio: 0.00841
        Bloom filter space used: 6.56 MiB
        Bloom filter off heap memory used: 6.56 MiB
        Index summary off heap memory used: 1.78 MiB
        Compression metadata off heap memory used: 1.95 MiB
        Compacted partition minimum bytes: 73
        Compacted partition maximum bytes: 17084
        Compacted partition mean bytes: 3287
        Average live cells per slice (last five minutes): 2.0796939751354797
        Maximum live cells per slice (last five minutes): 10
        Average tombstones per slice (last five minutes): 1.1939751354797576
        Maximum tombstones per slice (last five minutes): 2
        Dropped Mutations: 5 bytes
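
When comparing this kind of output across nodes or over several days, it can help to pull out only the thread pools that have pending or blocked tasks. The following is a minimal sketch, assuming the `nodetool tpstats` output has been captured as a string; `parse_tpstats` and `busy_pools` are hypothetical helper names for illustration, not part of nodetool, and the sample is a shortened copy of the output above.

```python
# Hypothetical helpers (not part of nodetool): parse the thread-pool table
# printed by `nodetool tpstats` and list pools with pending or blocked tasks.

SAMPLE = """\
Pool Name                         Active   Pending      Completed   Blocked  All time blocked
ReadStage                              1         1     1837888055         0                 0
CompactionExecutor                     0         0        6789640         0                 0
MutationStage                          0         0      870873552         0                 0

Message type           Dropped
READ                         7
"""


def parse_tpstats(text):
    """Return one dict per row of the 'Pool Name' table."""
    pools = []
    in_pools = False
    for line in text.splitlines():
        parts = line.split()
        if line.startswith("Pool Name"):
            in_pools = True          # header row: the table starts here
            continue
        if not in_pools:
            continue
        if not parts:                # a blank line ends the pool table
            break
        if len(parts) != 6:          # expect a name plus five integer columns
            continue
        name, active, pending, completed, blocked, all_time_blocked = parts
        pools.append({
            "name": name,
            "active": int(active),
            "pending": int(pending),
            "completed": int(completed),
            "blocked": int(blocked),
            "all_time_blocked": int(all_time_blocked),
        })
    return pools


def busy_pools(pools):
    """Names of pools that currently have pending or blocked tasks."""
    return [p["name"] for p in pools if p["pending"] > 0 or p["blocked"] > 0]


if __name__ == "__main__":
    print(busy_pools(parse_tpstats(SAMPLE)))
```

In the sample only ReadStage has a pending task, which fits the read-heavy symptom described in the question.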

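[Comments]:

Using a SAN for Cassandra is a known anti-pattern.
Cassandra has a mechanism called compaction (see nodetool compactionstats): it reads the data you have already flushed to disk and, by default, merges 4 similarly sized SSTables at a time to eliminate superseded row versions and keep the number of files on the filesystem in check. This is heavily I/O bound, and it affects all nodes, potentially hitting your SAN at the same time.
Note also that if the same SAN device hosts the disks for all 3 nodes, it also acts as a single point of failure.
I worded this slightly wrong: the space is allocated to the virtual machines through the SAN, just like a regular drive.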
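
To illustrate the compaction comment above, here is a simplified sketch of how size-tiered grouping picks SSTables to merge. It assumes the default SizeTieredCompactionStrategy settings (`bucket_low` 0.5, `bucket_high` 1.5, `min_threshold` 4) and ignores `min_sstable_size` and `max_threshold`. It is not Cassandra's actual code; the function names and the file sizes are made up for illustration.

```python
# Simplified illustration of size-tiered bucketing; NOT Cassandra's real code.
# Sizes are hypothetical SSTable sizes in MiB.

def bucket_by_size(sizes_mib, bucket_low=0.5, bucket_high=1.5):
    """Group SSTables whose size is within [low, high] x the bucket average."""
    buckets = []
    for size in sorted(sizes_mib):
        for bucket in buckets:
            avg = sum(bucket) / len(bucket)
            if bucket_low * avg <= size <= bucket_high * avg:
                bucket.append(size)
                break
        else:
            # No existing bucket is close enough in size: start a new one
            buckets.append([size])
    return buckets


def compaction_candidates(sizes_mib, min_threshold=4):
    """Buckets holding at least min_threshold SSTables are eligible to merge."""
    return [b for b in bucket_by_size(sizes_mib) if len(b) >= min_threshold]


if __name__ == "__main__":
    sstables = [10, 11, 12, 9, 400, 3500]
    for bucket in compaction_candidates(sstables):
        # A compaction reads every file in the bucket in full
        print(f"compact {bucket}: reads about {sum(bucket)} MiB from disk")
```

Each eligible bucket is read in full and rewritten as one larger SSTable, which is why compaction shows up as disk reads on top of what the application itself requests.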

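Tags: cassandra nosql nodetool disk-io

[Solution 1]:

(I can't comment, so I'm posting this as an answer.)

As others have mentioned, a SAN is not the best fit here. The list of anti-patterns documented here is worth reading, and it probably applies to OSS C* as well.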

[Discussion]: