【Posted】: 2016-07-19 09:48:24
【Question】:
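I am working on a Cassandra (2.2.3) project in which I have to store reviews and be able to retrieve the referenced elements by the minimum, maximum, count, and average of all their attached reviews. To do this, whenever I insert a new review I have to delete and re-insert the corresponding records in order to update the clustering keys, and to keep track of those keys I use another table as an index. The problem is that I update all of these tables in a batch, but if another update runs at the same time, I can end up with duplicate entries in the ordering tables or invalid values in the index table that stores the keys.
How can I execute a batch without the risk of concurrent writes?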
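To make the bookkeeping concrete, here is a minimal Python sketch of how the per-element aggregates (count, sum, min, max, avg) could be recomputed when a new review arrives. The `Aggregates` class and the `add_review` function are hypothetical names used only for illustration; the real project does this in PHP and nothing here is taken from it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Aggregates:
    """Hypothetical in-memory mirror of one index row (count, sum, min, max)."""
    count: int
    total: float      # plays the role of the `sum` column
    minimum: float
    maximum: float

    @property
    def avg(self) -> float:
        # The average is derived from sum and count, so it never drifts.
        return self.total / self.count if self.count else 0.0


def add_review(agg: Optional[Aggregates], value: float) -> Aggregates:
    """Return the aggregates that result from attaching one more review."""
    if agg is None or agg.count == 0:
        return Aggregates(1, value, value, value)
    return Aggregates(
        count=agg.count + 1,
        total=agg.total + value,
        minimum=min(agg.minimum, value),
        maximum=max(agg.maximum, value),
    )
```

Each of the new values is a clustering key in one of the ordering tables, which is why the old row has to be deleted and a new one inserted whenever a value changes.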
Here is the table structure:
CREATE TABLE IF NOT EXISTS reviews (domain VARCHAR, scenario VARCHAR, refer VARCHAR, type VARCHAR, id VARCHAR, value FLOAT, comment VARCHAR, author VARCHAR, title VARCHAR, date TIMESTAMP, attributes MAP<VARCHAR, VARCHAR>, answer VARCHAR, answer_author VARCHAR, answer_title VARCHAR, answer_date TIMESTAMP, answer_attributes MAP<VARCHAR, VARCHAR>, PRIMARY KEY((domain, scenario, refer, type), id)) WITH CLUSTERING ORDER BY (id DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_avg (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_min (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value ASC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_max (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value DESC);
CREATE TABLE IF NOT EXISTS reviews_ext_ordering_count (domain VARCHAR, refer VARCHAR, scenario VARCHAR, value INT, type VARCHAR, PRIMARY KEY((domain, scenario, type), value, refer)) WITH CLUSTERING ORDER BY (value ASC);
CREATE TABLE IF NOT EXISTS reviews_ext_index (domain VARCHAR, refer VARCHAR, scenario VARCHAR, count INT, avg FLOAT, min FLOAT, max FLOAT, sum FLOAT, type VARCHAR, PRIMARY KEY((domain, scenario, type), refer)) WITH CLUSTERING ORDER BY (refer ASC);
Here is an example of the transaction, written in CQL rather than PHP:
BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_min WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_max WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
DELETE FROM acme_reviews_ext_ordering_count WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND value = [VALUE] AND refer = '[REFER]';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_min (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_max (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
INSERT INTO acme_reviews_ext_ordering_count (domain, scenario, type, value, refer) VALUES ('[DOMAIN]', '[SCENARIO]', '[TYPE]', [VALUE], '[REFER]');
UPDATE acme_reviews_ext_index SET min = [MIN], avg = [AVG], max = [MAX], count = [COUNT], sum = [SUM] WHERE domain = '[DOMAIN]' AND scenario = '[SCENARIO]' AND type = '[TYPE]' AND refer = '[REFER]';
APPLY BATCH;
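Here is a practical example (also in CQL). A and B are two clients inserting a review at the same time; to keep it minimal, I only update the average. A inserts the value 4, so the previous average goes from 3 to 3.5 (this is just an example). B inserts the value 4.5, so the average becomes 3.7, also starting from the previous 3. Here are the two batch statements.
Here is A: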
BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND value = 3 AND refer = 'post-id-value';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('foo.bar', 'article', 'generic', 3.5, 'refer-id-value');
UPDATE acme_reviews_ext_index SET avg = 3.5 WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND refer = 'post-id-value';
APPLY BATCH;
Here is B:
BEGIN BATCH
DELETE FROM acme_reviews_ext_ordering_avg WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND value = 3 AND refer = 'post-id-value';
INSERT INTO acme_reviews_ext_ordering_avg (domain, scenario, type, value, refer) VALUES ('foo.bar', 'article', 'generic', 3.7, 'refer-id-value');
UPDATE acme_reviews_ext_index SET avg = 3.7 WHERE domain = 'foo.bar' AND scenario = 'article' AND type = 'generic' AND refer = 'post-id-value';
APPLY BATCH;
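In the common case of concurrent writes, A deletes the row and B does not, because the row has already been deleted by A's batch, but both insert a new row, which produces a duplicate. In the index table I will have only one key value, either A's or B's, so the key value of one of the duplicates is not indexed.
I expected that once batches A and B had both finished I would have only one record in the ordering table, which would be correct, but a wrong value in the index table.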
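The interleaving described above can be reproduced with a small in-memory simulation. This is only a sketch under stated assumptions: a Python set stands in for the ordering table, a dict stands in for the index table, both clients use the same `refer` value, and `simulate_race` is a hypothetical helper, not part of any Cassandra driver.

```python
def simulate_race():
    """Replay batches A and B when both clients read the old average first."""
    refer = "post-id-value"
    ordering_avg = {(3.0, refer)}   # clustering keys (value, refer)
    index_avg = {refer: 3.0}        # refer -> last known average

    def apply_batch(old_avg, new_avg):
        # DELETE of a row that is already gone succeeds silently.
        ordering_avg.discard((old_avg, refer))
        # INSERT creates a row under the new clustering key.
        ordering_avg.add((new_avg, refer))
        # UPDATE overwrites the index; the last writer wins.
        index_avg[refer] = new_avg

    # Both clients read the index before either batch is applied.
    old_seen_by_a = index_avg[refer]
    old_seen_by_b = index_avg[refer]

    apply_batch(old_seen_by_a, 3.5)  # client A
    apply_batch(old_seen_by_b, 3.7)  # client B, still deleting value = 3

    return ordering_avg, index_avg
```

In this run the ordering table ends up with two rows for the same element (3.5 and 3.7), while the index only remembers 3.7, so the 3.5 row can no longer be found and deleted through the index.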
【Comments】:
-
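Could you add an example to your question that illustrates the problem? Deleting and re-inserting records to "update the clustering key" sounds strange to me. C* does not provide isolation for batches. Assuming you cannot solve the problem by changing the model, you will need to synchronize your clients outside of C*.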
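As a rough illustration of what synchronizing clients outside of C* could look like, the sketch below serializes the read-modify-write for each element behind a per-key lock. Everything in it is an assumption made for the example: the tables are in-memory stand-ins, and `threading.Lock` only coordinates threads inside one process, so separate PHP processes or hosts would need some shared locking mechanism instead.

```python
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for the ordering table and the index table.
ordering_avg = set()   # clustering keys (avg, refer)
index = {}             # refer -> (count, total)

_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(refer):
    # Guard the dictionary itself so two threads never create two locks.
    with _locks_guard:
        return _locks[refer]


def add_review(refer, value):
    """Read, delete, insert, and update as one critical section per element."""
    with _lock_for(refer):
        count, total = index.get(refer, (0, 0.0))
        if count:
            ordering_avg.discard((total / count, refer))
        count, total = count + 1, total + value
        ordering_avg.add((total / count, refer))
        index[refer] = (count, total)


def run_concurrently(refer, values):
    """Start one thread per review and wait for all of them to finish."""
    threads = [threading.Thread(target=add_review, args=(refer, v)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because the old average is read inside the same critical section that deletes and re-inserts the row, every writer sees the result of the previous one, and the ordering table keeps exactly one row per element.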
-
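Sure, I have added the batch, converted from PHP to CQL. Note that in PHP I use prepared statements to pass the parameters. I saw that batches can carry timestamps; are they useful in this case?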
-
Could you demonstrate your problem using a minimal set of tables and columns, and actual queries with values instead of placeholders?
-
Yes, I have edited the post with a practical example and a minimal explanation of the problem, so I hope it is clear now.