Modelling Cassandra tables for upsert and select queries

【Question title】: modelling cassandra tables for upsert and select query
【Posted】: 2015-10-31 18:13:09
【Question description】:

I designed the following table to store server alerts:

create table IF NOT EXISTS host_alerts(
    unique_key text,
    host_id text,
    occur_time timestamp,
    clear_time timestamp,
    last_occur timestamp,
    alarm_name text,
    primary key (unique_key,host_id,clear_time)
);

Let's insert some data:

truncate host_alerts;

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:01:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'1970-01-01 00:00:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

insert into host_alerts(unique_key,host_id,alarm_name,
    clear_time,occur_time,last_occur
) 
values('1','server-1','disk failure',
'2015-07-01 00:02:00+0530','2015-07-01 00:00:00+0530','2015-07-01 00:02:00+0530');

The queries my application will run are:

// All alarms which are not cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time = '1970-01-01 00:00:00+0530';

// All alarms which are cleared for host_id
select * from host_alerts where host_id = 'server-1' and clear_time > '2015-07-01 00:00:00+0530';

// All alarms whose first occurrence falls within a time range
select * from host_alerts where host_id = 'server-1'
and occur_time > '2015-07-01 00:02:00+0530' and occur_time < '2015-07-01 00:05:00+0530';

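I don't know whether I should create more tables, for example host_alerts_by_hostname or host_alerts_by_cleartime and so on, or simply add a clustering index. unique_key is the only unique column, but I need to retrieve data by other columns.

Uncleared alarms have a clear_time of '1970-01-01 00:00:00+0530'; cleared events have an actual date value.

host_id is the server name.

occur_time is the time the event occurred.

last_occur is the time the event occurred again.

alarm_name is what happened on the system.

How can I model my tables so that I can run these queries and also update rows based on unique_key? With what I have tried, the selects are not possible, and an upsert creates a new row for the same unique_key.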

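【Question discussion】:

Tags: cassandra cassandra-2.0

【Solution 1】:

I think you may need three tables to support your three query types.

The first table would support time range queries on the history of when alarms occurred for each host: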

CREATE TABLE IF NOT EXISTS host_alerts_history (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, occur_time)
);

SELECT * FROM host_alerts_history WHERE host_id = 'server-1' AND occur_time > '2015-08-16 10:05:37-0400';

The second table would track the uncleared alarms for each host:

CREATE TABLE IF NOT EXISTS host_uncleared_alarms (
    host_id text,
    occur_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, alarm_name)
);

SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';

The last table would record when alarms were cleared for each host:

CREATE TABLE IF NOT EXISTS host_alerts_by_cleartime (
    host_id text,
    clear_time timestamp,
    alarm_name text,
    PRIMARY KEY (host_id, clear_time)
);

SELECT * FROM host_alerts_by_cleartime WHERE host_id = 'server-1' AND clear_time > '2015-08-16 10:05:37-0400';

When a new alarm event arrives, you would execute this batch:

BEGIN BATCH
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

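Note that the insert into the uncleared table is an upsert, since the timestamp is not part of the key. That table will therefore hold only one entry per alarm name for each host, carrying the timestamp of the last occurrence.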
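The upsert behaviour can be illustrated with a minimal sketch. It assumes the host_uncleared_alarms table defined earlier holds no 'disk full' row for server-1 beforehand; the literal timestamps are made up for illustration.

```cql
-- Both writes share the primary key (host_id, alarm_name),
-- so the second one overwrites occur_time instead of adding a row.
INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:01:00+0530', 'disk full');

INSERT INTO host_uncleared_alarms (host_id, occur_time, alarm_name)
VALUES ('server-1', '2015-07-01 00:02:00+0530', 'disk full');

-- Expect a single 'disk full' row carrying the later occur_time.
SELECT * FROM host_uncleared_alarms WHERE host_id = 'server-1';
```

By contrast, the table in the question has clear_time in its primary key, so writing a different clear_time addresses a different row. That is why clearing an alarm there produced a second row for the same unique_key.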

When an alarm clear event arrives, you would execute this batch:

BEGIN BATCH
DELETE FROM host_uncleared_alarms WHERE host_id = 'server-1' AND alarm_name = 'disk full';
INSERT INTO host_alerts_by_cleartime (host_id, clear_time, alarm_name) VALUES ('server-1', dateof(now()), 'disk full');
APPLY BATCH;

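I don't really understand what your "unique_key" is or where it comes from. I'm not sure it is needed, since the combination of host_id and alarm_name should be the level of granularity you want to work with. Adding another unique key into the mix could give rise to a lot of unmatched alarm/clear events. If unique_key is an alarm id, then use it as the key in place of alarm_name in my examples, and keep alarm_name as a data column.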
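If unique_key does turn out to be an alarm id, the uncleared table might look like the sketch below. The table name host_uncleared_alarms_by_key is hypothetical, and the sample values are borrowed from the question.

```cql
-- Hypothetical variant of host_uncleared_alarms keyed by the alarm id.
CREATE TABLE IF NOT EXISTS host_uncleared_alarms_by_key (
    host_id text,
    unique_key text,       -- alarm id generated upstream (assumption)
    alarm_name text,       -- now a regular data column
    occur_time timestamp,
    PRIMARY KEY (host_id, unique_key)
);

-- Raising or re-raising alarm '1' upserts the same row.
INSERT INTO host_uncleared_alarms_by_key (host_id, unique_key, alarm_name, occur_time)
VALUES ('server-1', '1', 'disk failure', dateof(now()));

-- Clearing it deletes that row by key, with no prior read.
DELETE FROM host_uncleared_alarms_by_key
WHERE host_id = 'server-1' AND unique_key = '1';
```

The history and clear-time tables could carry unique_key in the same way, as an extra clustering column after the timestamp.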

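To prevent your tables from filling up with old data over time, you could use the TTL feature to have rows deleted automatically after a number of days.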
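A minimal sketch of both ways to apply a TTL, assuming a retention of 7 days (604800 seconds); the retention period is an illustrative choice, not something the question specifies.

```cql
-- Per-write TTL: this row expires 7 days after it is written.
INSERT INTO host_alerts_history (host_id, occur_time, alarm_name)
VALUES ('server-1', dateof(now()), 'disk full')
USING TTL 604800;

-- Table-wide default: applies to writes that do not set their own TTL.
ALTER TABLE host_alerts_by_cleartime WITH default_time_to_live = 604800;
```

The host_uncleared_alarms table is probably best left without a TTL, since an alarm that stays uncleared for longer than the retention period would otherwise vanish from it.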

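【Discussion】:

Thanks for the very good answer. unique_key is a random key generated in the RDBMS. Does Cassandra have a feature to automatically copy data between tables? I would need to check the clear_time field every time; won't that slow down performance? Also, for the third table I think you meant occur time?
How would I do this for 100-1000 alerts per second?
Cassandra 3.0 will support materialized views to propagate data from one table to another, but that release is not available yet. I don't understand what you mean by checking clear_time every time. You want to avoid doing a read before a write in Cassandra, since that greatly reduces transaction throughput.
Handling 1000 alerts per second should be fine. You can use asynchronous operations with Cassandra and easily achieve that throughput.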
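
For readers on Cassandra 3.0 or later, the materialized views mentioned above could be sketched as follows. This is only an illustration against the question's original host_alerts table; the view name host_alerts_by_host is hypothetical, and the statement does not work on the 2.0 release the question is tagged with.

```cql
-- Requires Cassandra 3.0+. The view must include every primary key
-- column of the base table, here reordered so host_id is the partition key.
CREATE MATERIALIZED VIEW IF NOT EXISTS host_alerts_by_host AS
    SELECT * FROM host_alerts
    WHERE host_id IS NOT NULL
      AND clear_time IS NOT NULL
      AND unique_key IS NOT NULL
    PRIMARY KEY (host_id, clear_time, unique_key);

-- Cleared alarms for one host, served from the view.
SELECT * FROM host_alerts_by_host
WHERE host_id = 'server-1' AND clear_time > '2015-07-01 00:00:00+0530';
```

A view like this only mirrors the base table. Because clear_time remains part of the base primary key, clearing an alarm still writes a new row there, so the separate tables in the answer remain the simpler fit for the upsert requirement.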