Cassandra batch query performance on tables having different partition keys: answers

【Question title】: Cassandra batch query performance on tables having different partition keys
【Posted】: 2017-08-13 06:50:41
【Question】:

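I have a test case in which I receive 150k requests per second from clients.

My test case requires inserting UNLOGGED batches into multiple tables with different partition keys: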

BEGIN UNLOGGED BATCH
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Country' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('US');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='City' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Dallas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='State' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Texas');
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='SSN' and ptype='text' and date='2017-03-20' and pvalue=decimalAsBlob(000000000);
update kspace.count_table set counter=counter+1 where source_id=1 and name='source_name' and pname='Gender' and ptype='text' and date='2017-03-20' and pvalue=textAsBlob('Female');
APPLY BATCH;

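Is there a better approach than the one I am currently following?

At the moment I am batch-inserting into multiple tables that may live in different clusters because they have different partition keys, and from what I have read, batching writes to different tables with different partition keys carries an additional cost.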

【Comments】:

Tags: java database cassandra datastax

【Answer 1】:

First, it is important to understand what batches are meant for.

Batches are often mistakenly used as a performance optimization.

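Batches exist to keep data consistent across multiple tables. If you need atomicity, use a logged batch. In your case this is a counter table, so if the counts do not have to stay consistent with one another across tables, do not use a batch. As long as the cluster is healthy, Cassandra will make sure all the writes succeed.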
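Dropping the batch usually means sending each update as its own asynchronous statement, which is also the approach the second article linked in this answer describes. Below is a minimal sketch of that pattern, not a definitive implementation. It uses only the Java standard library: `executeAsync` is a hypothetical stand-in for whatever asynchronous execute call your driver provides, and `AsyncWrites`, `writeAll`, and `maxInFlight` are illustrative names, not part of any Cassandra API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class AsyncWrites {

    /**
     * Sends every statement on its own instead of wrapping them in a batch,
     * keeping at most maxInFlight requests pending at any time.
     * executeAsync is a hypothetical stand-in for a driver's async execute call.
     * The returned future completes once every write has completed.
     */
    public static CompletableFuture<Void> writeAll(
            List<String> statements,
            Function<String, CompletableFuture<Void>> executeAsync,
            int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<Void>> pending = new ArrayList<>();
        for (String cql : statements) {
            // Block the producer while too many writes are still pending.
            permits.acquireUninterruptibly();
            CompletableFuture<Void> write = executeAsync.apply(cql)
                    // Free the permit whether the write succeeded or failed.
                    .whenComplete((ok, err) -> permits.release());
            pending.add(write);
        }
        return CompletableFuture.allOf(pending.toArray(new CompletableFuture[0]));
    }

    public static void main(String[] args) {
        AtomicInteger sent = new AtomicInteger();
        // Fake executor for the demo: counts the statement on another thread.
        Function<String, CompletableFuture<Void>> fake =
                cql -> CompletableFuture.runAsync(() -> { sent.incrementAndGet(); });

        // Placeholder strings standing in for the counter updates in the question.
        List<String> statements = Arrays.asList(
                "counter update for Country",
                "counter update for City",
                "counter update for State");

        writeAll(statements, fake, 2).join();
        System.out.println(sent.get() + " writes completed");
    }
}
```

The semaphore is there because a client receiving a high request rate could otherwise queue an unbounded number of pending writes; the right limit depends on your cluster and is something to measure rather than assume.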

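An unlogged batch still requires the coordinator to manage the inserts, which puts a heavy load on the coordinator node. When other nodes own the partition keys, the coordinator has to make an extra network hop for each of them, which makes delivery inefficient. Use unlogged batches only when the updates target the same partition key.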
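One way to follow the same-partition rule is to group statements by partition key on the client and send each group separately. The sketch below shows only the grouping step, using the Java standard library. The actual partition key of `kspace.count_table` is not shown in the question, so the key strings here are hypothetical, and `PartitionGrouping`, `CounterUpdate`, and `groupByPartition` are illustrative names rather than driver classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class PartitionGrouping {

    /** One update plus the (hypothetical, precomputed) partition key it targets. */
    public static final class CounterUpdate {
        final String partitionKey;
        final String cql;

        public CounterUpdate(String partitionKey, String cql) {
            this.partitionKey = partitionKey;
            this.cql = cql;
        }
    }

    /** Groups statements by partition key, preserving first-seen key order. */
    public static Map<String, List<String>> groupByPartition(List<CounterUpdate> updates) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (CounterUpdate u : updates) {
            groups.computeIfAbsent(u.partitionKey, k -> new ArrayList<>()).add(u.cql);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<CounterUpdate> updates = new ArrayList<>();
        // Assumed keys: the first and third updates land in the same partition.
        updates.add(new CounterUpdate("partition-A", "counter update for Country"));
        updates.add(new CounterUpdate("partition-B", "counter update for City"));
        updates.add(new CounterUpdate("partition-A", "counter update for State"));

        for (Map.Entry<String, List<String>> group : groupByPartition(updates).entrySet()) {
            // Each group touches a single partition, so it could be sent as one
            // unlogged batch, or as a single statement when it has one entry.
            System.out.println(group.getKey() + ": " + group.getValue().size() + " statement(s)");
        }
    }
}
```

Each resulting group touches a single partition, so the coordinator does not have to fan that batch out to other nodes.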

See the following articles:

https://docs.datastax.com/en/cql/3.1/cql/cql_using/useBatch.html

https://medium.com/@foundev/cassandra-batch-loading-without-the-batch-keyword-40f00e35e23e#.npmx2cnsq

【Comments】: