【问题标题】:Using partition key along with secondary index使用分区键和二级索引
【发布时间】:2015-08-30 22:44:48
【问题描述】:

以下是我需要执行的两个查询。

select * from where dept = 100 and emp_id = 1;

select * from where dept = 100 and name = 'One';

以下哪个选项更好?

选项 1:使用二级索引和分区键。我认为这种方式查询会执行得更快,因为不需要去不同的节点并且只需要在本地搜索索引。

cqlsh:d2> desc table emp_by_dept;

CREATE TABLE d2.emp_by_dept (
    dept int,
    emp_id int,
    name text,
    PRIMARY KEY (dept, emp_id)
) WITH CLUSTERING ORDER BY (emp_id ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'min_threshold': '4', 'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';
CREATE INDEX emp_by_dept_name_idx ON d2.emp_by_dept (name);

cqlsh:d2> select * from emp_by_dept where dept = 100;

 dept | emp_id | name
------+--------+------
  100 |      1 |  One
  100 |      2 |  Two
  100 |     10 |  Ten

(3 rows)



 activity                                                                                        | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                              Execute CQL3 query | 2015-06-15 17:36:55.860000 | 10.0.2.16 |              0
                       Parsing select * from emp_by_dept where dept = 100; [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 |            202
                                                       Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:36:55.861000 | 10.0.2.16 |            418
                           Executing single-partition query on emp_by_dept [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10525
                                              Acquiring sstable references [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10564
                                               Merging memtable tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10635
                                               Key cache hit for sstable 1 [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10748
                               Seeking to partition beginning in data file [SharedPool-Worker-3] | 2015-06-15 17:36:55.871000 | 10.0.2.16 |          10757
 Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18141
                                Merging data from memtables and 1 sstables [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18166
                                        Read 3 live and 0 tombstoned cells [SharedPool-Worker-3] | 2015-06-15 17:36:55.879000 | 10.0.2.16 |          18335
                                                                                Request complete | 2015-06-15 17:36:55.928174 | 10.0.2.16 |          68174





cqlsh:d2> select * from emp_by_dept where dept = 100 and name = 'One';

 dept | emp_id | name
------+--------+------
  100 |      1 |  One

(1 rows)

Tracing session: c56e70a0-1357-11e5-ab8b-fb5400f1b4af

 activity                                                                                                                                                                                                                                                                                                                | timestamp                  | source    | source_elapsed
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+----------------
                                                                                                                                                                                                                                                                                                      Execute CQL3 query | 2015-06-15 17:42:20.010000 | 10.0.2.16 |              0
                                                                                                                                                                                                                              Parsing select * from emp_by_dept where dept = 100 and name = 'One'; [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 |             12
                                                                                                                                                                                                                                                                               Preparing statement [SharedPool-Worker-1] | 2015-06-15 17:42:20.010000 | 10.0.2.16 |             19
                                                                                                                                                                                                                                                                         Computing ranges to query [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |            881
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1144
                                                                                                                                                                                               Submitting range requests on 1 ranges with a concurrency of 1 (0.003515625 rows per range expected) [SharedPool-Worker-1] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1238
                                                                                                                                                                                                                                                             Executing indexed scan for [100, 100] [SharedPool-Worker-2] | 2015-06-15 17:42:20.011000 | 10.0.2.16 |           1703
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           1827
 Candidate index mean cardinalities are CompositesIndexOnRegular{columnDefs=[ColumnDefinition{name=name, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, componentIndex=1, indexName=emp_by_dept_name_idx, indexType=COMPOSITES}]}:1. Scanning with emp_by_dept.emp_by_dept_name_idx. [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           1929
                                                                                                                                                                                                                              Executing single-partition query on emp_by_dept.emp_by_dept_name_idx [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2058
                                                                                                                                                                                                                                                                      Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2087
                                                                                                                                                                                                                                                                       Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2173
                                                                                                                                                                                                                                                                       Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.012000 | 10.0.2.16 |           2352
                                                                                                                                                                                                                                                 Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.012001 | 10.0.2.16 |           2377
                                                                                                                                                                                                                         Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 |           4300
                                                                                                                                                                                                                                                        Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.014000 | 10.0.2.16 |           4322
                                                                                                                                                                                                                                           Submitted 1 concurrent range requests covering 1 ranges [SharedPool-Worker-1] | 2015-06-15 17:42:20.031000 | 10.0.2.16 |          21798
                                                                                                                                                                                                                                                                Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          21989
                                                                                                                                                                                                                                                   Executing single-partition query on emp_by_dept [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22374
                                                                                                                                                                                                                                                                      Acquiring sstable references [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22385
                                                                                                                                                                                                                                                                       Merging memtable tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22433
                                                                                                                                                                                                                                                                       Key cache hit for sstable 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22514
                                                                                                                                                                                                                                                 Seeking to partition indexed section in data file [SharedPool-Worker-2] | 2015-06-15 17:42:20.032000 | 10.0.2.16 |          22523
                                                                                                                                                                                                                         Skipped 0/1 non-slice-intersecting sstables, included 0 due to tombstones [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22963
                                                                                                                                                                                                                                                        Merging data from memtables and 1 sstables [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22972
                                                                                                                                                                                                                                                                Read 1 live and 0 tombstoned cells [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          22991
                                                                                                                                                                                                                                                                      Scanned 1 rows and matched 1 [SharedPool-Worker-2] | 2015-06-15 17:42:20.033000 | 10.0.2.16 |          23096
                                                                                                                                                                                                                                                                                                        Request complete | 2015-06-15 17:42:20.033227 | 10.0.2.16 |          23227

选项 2:创建 2 个表,如下所示。

   CREATE TABLE d2.emp_by_dept (
        dept int,
        emp_id int,
        name text,
        PRIMARY KEY (dept, emp_id)
    ) WITH CLUSTERING ORDER BY (emp_id ASC);

select * from emp_by_dept where dept = 100 and emp_id = 1;

    CREATE TABLE d2.emp_by_dept_name (
        dept int,
        emp_id int,
        name text,
        PRIMARY KEY (dept, name)
    ) WITH CLUSTERING ORDER BY (name ASC);

select * from emp_by_dept_name where dept = 100 and name = 'One';

【问题讨论】:

  • 其次,索引在 Cassandra 中无法以这种方式工作
  • 二级索引用于什么目的?
  • @jny,我已经编辑了我的问题。

标签: cassandra bigdata nosql


【解决方案1】:

通常将二级索引与分区键一起使用是一种好方法,因为 - 正如您所说 - 二级键查找可以在单台机器上执行。

另一个需要考虑的概念是二级索引的基数。在您的情况下, emp_id 可能是唯一的,而 name 几乎是唯一的,因此索引很可能会返回一行,因此效率不高。为了更好的解释,我推荐这篇文章:http://www.wentnet.com/blog/?p=77

因此,如果查询时间很关键,并且您可以同时更新两个表,我建议使用您的选项 2。

用一些生成的数据来衡量这两个选项也会很有趣。

【讨论】:

    【解决方案2】:

    选项一是不可能的,因为 Cassandra 不支持同时使用主键和辅助键的查询。您最好的选择是选择选项二。

    虽然相似之处很多,但不要将其视为“关系表”。而是将其视为嵌套的、排序的地图数据结构。 Cassandra 相信数据的反规范化和复制可以提高读取性能。因此,选项 2 是完全正常的,并且在 Cassandra 的最佳实践范围内。

    一些您可能会觉得有用的链接 - http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/

    How do secondary indexes work in Cassandra?

    希望这会有所帮助。

    【讨论】:

    • Cassandra 确实支持我的尝试。我实际上尝试过,输出列在我的问题中。我正在使用 apache-cassandra2.1.2。
    【解决方案3】:

    由于维护两个表比维护一个表更难,所以第一个选项会更可取。

    【讨论】:

      【解决方案4】:

      Query1 = select * from where dept = 100 and emp_id = 1;

      Query2 = select * from where dept = 100 and name = 'One';

      选项 1:

      写入:写入 emp_by_dept 的时间 + 更新索引的时间

      读取:Query1 将直接从 emp_by_dept 读取,Query2 将从 emp_by_dept 读取 + 从索引表中获取位置 + 从 emp_by_dept 读取值

      选项 2:

      写入:写入 emp_by_dept 的时间 + 写入 emp_by_dept_name 的时间

      读取:Query1 将直接从 emp_by_dept 读取,Query2 将直接从 emp_by_dept_name 读取(所需数据已排序并保留)

      所以我假设两种情况下的写入时间应该几乎相同(我没有测试过)

      如果您的读取响应时间更重要,请选择 Option2。

      如果您担心维护 2 个表,请选择选项 1。

      感谢大家的投入。

      【讨论】:

        猜你喜欢
        • 2021-03-30
        • 2021-03-14
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 1970-01-01
        • 2014-12-13
        • 2020-07-18
        • 1970-01-01
        相关资源
        最近更新 更多