Column order when importing CSV into Cassandra with cqlsh: answers

【Question title】: Column order when importing CSV into Cassandra with cqlsh
【Posted】: 2015-12-31 03:35:43
【Question】:

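I'm new to Cassandra. CQL seems to ignore the column order given in the CREATE TABLE statement: it lists the primary key columns first and then the remaining columns in lexicographic order. I understand that this is how they are stored internally, but coming from traditional databases it is very surprising that the declared column order is disregarded and an implementation detail is leaked to the user. Is this documented anywhere?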

[cqlsh 4.1.1 | Cassandra 2.1.8 | CQL spec 3.1.1 | Thrift protocol 19.39.0]

cqlsh:test> create table test (c int primary key, b text, a int);
cqlsh:test> describe table test;

CREATE TABLE test (
  c int,
  a int,
  b text,
  PRIMARY KEY (c)
)

This makes it difficult to import a CSV file whose columns are in the order you thought the table was using.

cqlsh:test> copy test from stdin;
[Use \. on a line by itself to end input]
[copy] 1,abc,2
Bad Request: line 1:44 no viable alternative at input ',' (... c, b) VALUES ([abc],...)
Aborting import at record #0 (line 1). Previously-inserted values still present.

0 rows imported in 7.982 seconds.

cqlsh:test> copy test from stdin;
[Use \. on a line by itself to end input]
[copy] 1,2,abc
[copy] \.

1 rows imported in 14.911 seconds.

The solution seems to be to specify the columns in the COPY statement (or to reorder the CSV data).

cqlsh:test> copy test (c, b, a) from stdin;
[Use \. on a line by itself to end input]
[copy] 1,abc,2
[copy] \.

1 rows imported in 5.727 seconds.
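
For the second option (reordering the CSV data), here is a minimal sketch in Python. The helper name `reorder_csv` and its parameters are hypothetical and not part of cqlsh; it assumes you already know both the order of the fields in the file and the order the table reports.

```python
import csv
import io

def reorder_csv(text, source_order, target_order):
    """Return CSV text whose fields follow target_order instead of source_order."""
    # Position of each target column within the source rows.
    indices = [source_order.index(name) for name in target_order]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([row[i] for i in indices])
    return out.getvalue()

# The row from the question, written as (c, b, a), rearranged to (c, a, b).
print(reorder_csv("1,abc,2\n", ["c", "b", "a"], ["c", "a", "b"]), end="")
```

The rearranged row matches the one that the plain `copy test from stdin;` accepted above.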

【Comments】:

Tags: cassandra cql cqlsh

【Answer 1】:

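You should specify the columns you want to work with. Never assume Cassandra's column order. Even if you rearrange the CSV file to match that order, it is safer to name the exact columns, even on a table with many columns.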
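
One way to avoid typing a long column list by hand is to generate the COPY statement from the file's header row. This is only a sketch: the function `copy_statement` and the file name `data.csv` are hypothetical, and it assumes the CSV starts with a header line whose names are plain lowercase column names that need no quoting.

```python
import csv
import io

def copy_statement(table, csv_text, path):
    """Build a cqlsh COPY statement that names the columns in file order."""
    # The first CSV row is assumed to be the header.
    header = next(csv.reader(io.StringIO(csv_text)))
    columns = ", ".join(name.strip() for name in header)
    # HEADER = true tells cqlsh that the first line holds column names, not data.
    return f"COPY {table} ({columns}) FROM '{path}' WITH HEADER = true;"

sample = "c,b,a\n1,abc,2\n"
print(copy_statement("test", sample, "data.csv"))
```

The generated statement can then be pasted into cqlsh, so the column list always follows the file rather than whatever order the table reports.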

Cassandra uses its own column order and specific storage locations to make data access faster.

【Comments】:

【Answer 2】:

Cassandra orders its columns as follows:

Partition key
Clustering keys
The remaining columns, in alphabetical order.

For example, say I create the following table:

CREATE TABLE products (
product_id text,
account_id text,
avg_rating float,
brand text,
brand_name text,
PRIMARY KEY (product_id, account_id)
) WITH CLUSTERING ORDER BY (account_id ASC);

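First column = product_id (because it is the partition key)
Second column = account_id (because it is a clustering key)
The remaining columns follow in alphabetical order.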
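
The rule above can be sketched in a few lines of Python. The function `describe_order` is a hypothetical illustration of the ordering described in this answer, not a Cassandra or cqlsh API, and it assumes plain lowercase column names so that Python's default string sort matches alphabetical order.

```python
def describe_order(columns, partition_key, clustering_key=()):
    """Order columns as described: partition key, clustering keys, then the rest sorted."""
    key_columns = list(partition_key) + list(clustering_key)
    rest = sorted(name for name in columns if name not in key_columns)
    return key_columns + rest

# The table from the question: declared as (c, b, a) with c as the primary key.
print(describe_order(["c", "b", "a"], ["c"]))

# The products table from this answer.
products = ["product_id", "account_id", "avg_rating", "brand", "brand_name"]
print(describe_order(products, ["product_id"], ["account_id"]))
```

For the question's table this gives c, a, b, which is the order that `describe table test` printed.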

【Comments】: