Querying on Cassandra consumes CPU

【Question title】: Querying on cassandra consumes CPU
【Posted】: 2020-10-04 15:16:18
【Question description】:

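Connecting to Cassandra from application code and running queries eats up Cassandra's CPU.

My query looks like this: select fields from table where partition_key = "PARTITION_KEY" and clustering_key_1 = "KEY1" and clustering_key_2 in (a1, a2, a3..a100);

I am only using the IN keyword on the clustering column, but it still hits the CPU hard. Sometimes the CPU reaches 100%.

Is this normal?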

【Question comments】:

Tags: cassandra cql cassandra-3.0 cassandra-2.0 cassandra-2.1

【Solution 1】:

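No, 100% CPU usage is not normal for a query. But frankly, neither is querying with an IN clause of 100 entries.

Even on a clustering key, IN forces Cassandra to perform random reads. Cassandra was built for sequential reads. I wouldn't recommend an IN clause with a double-digit number of entries.

Suggestions:

Try to keep the number of rows returned to a minimum. You may need to break this query up into ten or twenty smaller queries.
If you really only need 'a1' through 'a100', why not try it as a range query?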

select fields from table where partition_key = 'PARTITION_KEY' and clustering_key_1 = 'KEY1' and clustering_key_2 >= 'a1' and clustering_key_2 <= 'a100';
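
As a rough illustration of the first suggestion (splitting one large IN list into several smaller queries), here is a minimal Python sketch. The table and column names are the placeholders from the question, `build_statements` and `fetch_in_chunks` are hypothetical helpers, and `execute` is assumed to be any callable that takes a CQL string plus a parameter tuple, for example `session.execute` from the DataStax Python driver, which accepts `%s` placeholders for non-prepared statements. The chunk size of 10 is only a starting point.

```python
from typing import Callable, Iterable, Iterator, List, Sequence, Tuple

# Placeholder table and column names, copied from the query in the question.
BASE_CQL = (
    "SELECT fields FROM table "
    "WHERE partition_key = %s AND clustering_key_1 = %s "
    "AND clustering_key_2 IN ({placeholders})"
)


def chunked(values: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `values`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(values), size):
        yield values[start:start + size]


def build_statements(
    partition_key: str,
    key1: str,
    key2_values: Sequence[str],
    chunk_size: int = 10,
) -> List[Tuple[str, tuple]]:
    """Return one (cql, params) pair per chunk of clustering_key_2 values."""
    statements = []
    for chunk in chunked(key2_values, chunk_size):
        placeholders = ", ".join(["%s"] * len(chunk))
        cql = BASE_CQL.format(placeholders=placeholders)
        statements.append((cql, (partition_key, key1, *chunk)))
    return statements


def fetch_in_chunks(
    execute: Callable[[str, tuple], Iterable],
    partition_key: str,
    key1: str,
    key2_values: Sequence[str],
    chunk_size: int = 10,
) -> list:
    """Run the smaller queries one after another and collect all rows."""
    rows = []
    for cql, params in build_statements(partition_key, key1, key2_values, chunk_size):
        rows.extend(execute(cql, params))
    return rows
```

With a connected driver session this might be called as `fetch_in_chunks(session.execute, "PARTITION_KEY", "KEY1", wanted_keys)`. Running the chunks sequentially keeps each individual read small; whether that lowers CPU in practice would need to be measured on the actual cluster.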

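Normally, 100% CPU during queries means the cluster needs more nodes. However, since this query is restricted to a single partition, more nodes won't help. In that case the partition may be too large, and re-modeling the table to have smaller partitions would spread the load more evenly across the cluster.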
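
One common way to get smaller partitions is to add a "bucket" component to the partition key, so that rows that used to share one partition are spread over several. The answer does not say how to re-model the table, so the sketch below is only an assumption: it supposes a composite partition key such as `((partition_key, bucket))`, a hypothetical bucket count of 16, and a bucket derived from `clustering_key_2` with a stable hash.

```python
import zlib
from typing import Dict, Iterable, List

NUM_BUCKETS = 16  # assumed value; the right number depends on partition size


def bucket_for(clustering_value: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a clustering value to a bucket number in [0, num_buckets).

    zlib.crc32 is used instead of the built-in hash() because hash() of a
    str is randomized per process, and the bucket must be identical at
    write time and at read time.
    """
    return zlib.crc32(clustering_value.encode("utf-8")) % num_buckets


def group_by_bucket(
    values: Iterable[str], num_buckets: int = NUM_BUCKETS
) -> Dict[int, List[str]]:
    """Group the wanted clustering values by the bucket that would hold them."""
    groups: Dict[int, List[str]] = {}
    for value in values:
        groups.setdefault(bucket_for(value, num_buckets), []).append(value)
    return groups
```

Each group then maps to one query against one (smaller) partition, which different nodes can serve. The trade-off is that reading every row for a logical key means querying every bucket.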

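Edit 20200616

There are other factors that can cause a query to consume a lot of CPU.

Are you querying columns that see in-place writes or a lot of deletes? Both scenarios make Cassandra work harder, because of the obsoleted and tombstoned data that has to be ignored.

Try running iostat. If you're in a virtualized/cloud environment, you may be seeing "noisy neighbor" problems, such as CPU steal and high (disk) I/O wait times.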
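
If iostat is not at hand, the same two signals (I/O wait and CPU steal) can be approximated on Linux from two samples of the aggregate `cpu` line in `/proc/stat`. This is a sketch of that alternative, not a replacement for iostat; it assumes a kernel that reports at least the first eight counters (user, nice, system, idle, iowait, irq, softirq, steal), and the helper names are made up for illustration.

```python
from typing import Dict

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line: str) -> Dict[str, int]:
    """Parse the aggregate 'cpu' line of /proc/stat into named counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    if len(parts) < 1 + len(FIELDS):
        raise ValueError("kernel does not report all expected counters")
    return dict(zip(FIELDS, map(int, parts[1:1 + len(FIELDS)])))


def percent_between(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, float]:
    """Return each counter's share (in percent) of CPU time between two samples."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Usage on a Linux host (not executed here):
#   with open("/proc/stat") as f: first = parse_cpu_line(f.readline())
#   time.sleep(5)
#   with open("/proc/stat") as f: second = parse_cpu_line(f.readline())
#   shares = percent_between(first, second)
#   print(shares["iowait"], shares["steal"])
```

A persistently high `steal` share points at the hypervisor taking CPU away from the VM, while a high `iowait` share points at slow or contended disks.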

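【Comments】:

Thanks a lot, Aaron. I tried select fields from table where partition_key = "PARTITION_KEY" and clustering_key_1 = "KEY1" and clustering_key_2 = "KEY2"; but the CPU usage is still the same. In general, should querying on cassandra (even at a high rate, such as 500 queries per minute) consume this much CPU? I couldn't find much information about the read path consuming CPU.
@ShivaPrasad How much data is that query pulling back? 1 KB? 1 MB? 10 MB?
@ShivaPrasad Run nodetool tablehistograms on this table: docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/tools/…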