nodetool cfstats: Compacted partition maximum bytes

【Question title】: nodetool cfstats Compacted partition maximum bytes
【Posted】: 2018-11-24 23:32:04
【Question】:

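I am concerned about the value of "Compacted partition maximum bytes", because 89 MB looks fairly high.

Does this indicate a broken data model or some other problem? No issues have been observed on the application side.

The table uses (week_first_day, device_id) as its partition key, so the stored data is packed into one weekly bucket per device.

Data model of the table: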

CREATE TABLE device_data (
week_first_day timestamp,
device_id uuid,
nano_since_epoch bigint,
sensor_id uuid,
source text,
unit text,
username text,
value double,
PRIMARY KEY ((week_first_day, device_id), nano_since_epoch, sensor_id)
)

nodetool cfstats

Table: device_data
            SSTable count: 5
            Space used (live): 447558297
            Space used (total): 447558297
            Space used by snapshots (total): 0
            Off heap memory used (total): 211264
            SSTable Compression Ratio: 0.2610509614736755
            Number of partitions (estimate): 939
            Memtable cell count: 458
            Memtable data size: 63785
            Memtable off heap memory used: 0
            Memtable switch count: 0
            Local read count: 0
            Local read latency: NaN ms
            Local write count: 458
            Local write latency: 0.058 ms
            Pending flushes: 0
            Percent repaired: 99.83
            Bloom filter false positives: 0
            Bloom filter false ratio: 0.00000
            Bloom filter space used: 2216
            Bloom filter off heap memory used: 2176
            Index summary off heap memory used: 672
            Compression metadata off heap memory used: 208416
            Compacted partition minimum bytes: 43
            Compacted partition maximum bytes: 89970660
            Compacted partition mean bytes: 1100241
            Average live cells per slice (last five minutes): NaN
            Maximum live cells per slice (last five minutes): 0
            Average tombstones per slice (last five minutes): NaN
            Maximum tombstones per slice (last five minutes): 0
            Dropped Mutations: 0

【Question comments】:

Tags: cassandra

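【Answer 1】:

It really depends on the access pattern for the data in that partition. If you often read the whole partition, this can cause problems, but if you only read part of it, it should not be an issue. You could, for example, break the partition up by using the day as the bucket.

Have a look at the talk Myths of Big Partitions from the Cassandra Summit two years ago. It has more details on how Cassandra 3.x handles large partitions.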
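
To illustrate the day-bucket suggestion, here is a minimal Python sketch. It assumes buckets are UTC calendar days and that the bucket column would hold that day's midnight timestamp; neither is stated in the question. The helper names day_bucket and buckets_for_range are hypothetical. The sketch derives the bucket for a nano_since_epoch value and lists the buckets that a time-range read would have to touch, one query per bucket.

```python
from datetime import datetime, timedelta, timezone
from typing import List

NANOS_PER_SECOND = 1_000_000_000


def day_bucket(nano_since_epoch: int) -> datetime:
    """Return midnight (UTC) of the day containing the timestamp.

    Assumption: buckets are aligned to UTC calendar days.
    """
    seconds = nano_since_epoch // NANOS_PER_SECOND
    moment = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return moment.replace(hour=0, minute=0, second=0, microsecond=0)


def buckets_for_range(start_nanos: int, end_nanos: int) -> List[datetime]:
    """List every day bucket touched by the inclusive range [start, end]."""
    first = day_bucket(start_nanos)
    last = day_bucket(end_nanos)
    days = (last - first).days
    return [first + timedelta(days=offset) for offset in range(days + 1)]
```

A read for the current day touches a single bucket, while a read spanning several days yields one bucket per day, and those queries can be issued independently of each other.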

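【Discussion】:

Data is usually retrieved by time range: mostly the current day, less often the current month. I remember 100 MB as the magic number for maximum partition size, which is why I am asking. I also wonder why the mean and maximum compacted partition sizes are so far apart (1 MB vs. 89 MB). Why is that?
If most access happens within a single day, I would suggest using a day as the bucket. If you need several days of data, you can issue multiple requests in parallel...
Regarding partition size: it may be that some devices generate much more data than others, so they end up with bigger partitions...
A device typically has about 15 sensors, and each one sends data once per minute. That means roughly 150,000 points per device in each weekly bucket. That did not sound large when I evaluated the design. I will keep monitoring the size and possibly switch to daily buckets.
Is username the same for all measurements of a given device? If so, you could declare it as static, and it would then be the same for all rows in the partition...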
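
The row-count estimate from the discussion can be checked with quick arithmetic. This is a sketch using only the figures quoted in the thread (about 15 sensors per device, one reading per sensor per minute); real devices may deviate from those averages, which is one possible reason for the larger partitions.

```python
SENSORS_PER_DEVICE = 15             # "about 15 sensors" per device (from the thread)
READINGS_PER_SENSOR_PER_MINUTE = 1  # each sensor sends once per minute
MINUTES_PER_DAY = 24 * 60

# Rows written per device for one day and for one weekly bucket.
points_per_day = SENSORS_PER_DEVICE * READINGS_PER_SENSOR_PER_MINUTE * MINUTES_PER_DAY
points_per_week = points_per_day * 7

print(points_per_day)   # 21600
print(points_per_week)  # 151200
```

With these figures, a daily bucket would hold about one seventh of the rows of a weekly bucket for the same device.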