How to understand granularity and blocks in ClickHouse?

【Question title】: How to understand granularity and blocks in ClickHouse?
【Posted】: 2020-05-31 23:55:09
【Question】:

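I am not clear on these two terms. Does a block have a fixed number of rows? Is a block the smallest unit that is read from disk? Are different blocks stored in different files? Does a block cover a wider range than a granule? In other words, can one block contain multiple granules of a skipping index?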

【Comments on the question】:

Tags: clickhouse

【Answer 1】:

https://clickhouse.tech/docs/en/operations/table_engines/mergetree/#primary-keys-and-indexes-in-queries

The primary key is sparse. By default it holds 1 value per 8192 rows (= 1 granule).

Let's disable adaptive granularity (for the test) -- index_granularity_bytes=0

create table X (A Int64) 
Engine=MergeTree order by A 
settings index_granularity=16,index_granularity_bytes=0;

insert into X select * from numbers(32);

index_granularity=16 -- 32 rows = 2 granules, so the primary index has 2 values: 0 and 16

select marks, primary_key_bytes_in_memory from system.parts where table = 'X';
┌─marks─┬─primary_key_bytes_in_memory─┐
│     2 │                          16 │
└───────┴─────────────────────────────┘

16 bytes === 2 values of Int64 (8 bytes each).
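The two index values and the 16-byte figure from the example above can be reproduced with a small model. This is a Python sketch, not ClickHouse code: `sparse_index` is a hypothetical helper, and it assumes fixed (non-adaptive) granularity where the index entry of a granule is simply the key of its first row.

```python
def sparse_index(sorted_keys, index_granularity):
    """Return the first key of every granule (one index entry per granule)."""
    return sorted_keys[::index_granularity]

keys = list(range(32))           # same data as numbers(32)
index = sparse_index(keys, 16)   # index_granularity=16

print(index)                     # [0, 16]
print(len(index) * 8)            # 16 (two Int64 values, 8 bytes each)
```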

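Adaptive index granularity means that granule sizes vary, because wide rows (many bytes) need, for performance, fewer (<8192) rows per granule.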

index_granularity_bytes = 10MB ~ 1 KB per row * 8192 rows. So each granule holds roughly 10MB at most. If the row size is 100 KB (long Strings), a granule will have about 100 rows (not 8192).
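The relation between row width and granule size can be sketched as a simple calculation. This is only a rough Python model under stated assumptions: `rows_per_granule` is a hypothetical helper, every row is assumed to have the same size, and the byte limit uses rounded decimal units (10,000,000 bytes) so that the numbers match the approximate figures in the text. The way ClickHouse actually cuts granules while writing data is more involved.

```python
def rows_per_granule(row_size_bytes,
                     index_granularity=8192,
                     index_granularity_bytes=10_000_000):
    """Rows in one granule: limited by the row cap and by the byte cap."""
    by_bytes = max(1, index_granularity_bytes // row_size_bytes)
    return min(index_granularity, by_bytes)

print(rows_per_granule(1_000))     # 8192 (narrow rows hit the row cap)
print(rows_per_granule(100_000))   # 100  (wide rows hit the byte cap)
```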

Skip index granularity: GRANULARITY 3 means that the index stores one value for every 3 table granules.

drop table X;
create table X (A Int64, B Int64, INDEX IX1 (B) TYPE minmax GRANULARITY 4) 
Engine=MergeTree order by A 
settings index_granularity=16,index_granularity_bytes=0;

insert into X select number, number from numbers(128);

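128/16 = 8, so the table has 8 granules, and INDEX IX1 stores 2 minmax values (8/4).

So the minmax index stores 2 values: (0..63) and (64..127).

0..63 -- points to the first 4 table granules.

64..127 -- points to the second 4 table granules.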

set send_logs_level='debug'
select * from X where B=77
[ 84 ] <Debug> dw.X (SelectExecutor): **Index `IX1` has dropped 1 granules**
[ 84 ] <Debug> dw.X (SelectExecutor): Selected 1 parts by date, 1 parts by key, **4 marks** to read from 1 ranges

SelectExecutor checks the skip index: 4 table granules can be skipped because 77 is not in 0..63. The other 4 granules (4 marks) must be read because 77 is in (64..127), so one of those 4 granules contains B=77.
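The pruning decision described above can be modelled in a few lines. This Python sketch is an illustration, not the ClickHouse implementation: `build_minmax` and `granules_to_read` are hypothetical helpers, and the model assumes fixed granularity with 16 rows per table granule and one minmax entry per 4 table granules, as in the example table.

```python
INDEX_GRANULARITY = 16   # rows per table granule
SKIP_GRANULARITY = 4     # table granules covered by one skip-index entry

def build_minmax(values):
    """One (min, max) pair per SKIP_GRANULARITY table granules."""
    step = INDEX_GRANULARITY * SKIP_GRANULARITY
    return [(min(values[i:i + step]), max(values[i:i + step]))
            for i in range(0, len(values), step)]

def granules_to_read(minmax, needle):
    """Table granule numbers whose minmax range may contain needle."""
    selected = []
    for n, (lo, hi) in enumerate(minmax):
        if lo <= needle <= hi:
            first = n * SKIP_GRANULARITY
            selected.extend(range(first, first + SKIP_GRANULARITY))
    return selected

b = list(range(128))                 # column B from numbers(128)
minmax = build_minmax(b)
print(minmax)                        # [(0, 63), (64, 127)]
print(granules_to_read(minmax, 77))  # [4, 5, 6, 7]
```

The four selected granules correspond to the 4 marks in the log. The minmax index can only say that B=77 may be somewhere in that range, so all four granules have to be read.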

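【Comments】:

Thank you very much. Every one of your answers is so thorough.
I would like to know what marks_bytes means. In the first example marks_bytes is 32, which is 16 bytes more than primary_key_bytes_in_memory. Are the extra 16 bytes the offsets into the column file?
@gogo There is one column, A. The primary index points to the marks file. A mark contains 2 pointers to the position of a row: the first is an offset in the compressed .bin file, and the second is an offset in the decompressed data.
Got it. Does reading from disk mean reading a compressed block from disk, or reading a granule from disk?
If a read operation reads one granule from disk at a time, then in your last example I think it would first read the skip index of column B, and then read the last 4 granules of B.bin to find the row number of 77. At that point we have the row number of 77 (i.e. 77) and the granule number (i.e. 4). Finally, it would use the granule number to read the whole of granule 4 from A.bin and take the data of row 77 out of that granule. Am I right?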

【Answer 2】:

https://clickhouse.tech/docs/en/development/architecture/#block

A block can contain any number of rows. For example, 1-row blocks:

set max_block_size=1;
SELECT * FROM numbers_mt(1000000000) LIMIT 3;

┌─number─┐
│      0 │
└────────┘
┌─number─┐
│      2 │
└────────┘
┌─number─┐
│      3 │
└────────┘

set max_block_size=100000000000;

create table X (A Int64) Engine=Memory;
insert into X values(1);
insert into X values(2);
insert into X values(3);
SELECT * FROM X;

┌─A─┐
│ 1 │
└───┘
┌─A─┐
│ 3 │
└───┘
┌─A─┐
│ 2 │
└───┘

3 rows in one block:

drop table X;
create table X (A Int64) Engine=Memory;
insert into X values(1)(2)(3);
select * from X
┌─A─┐
│ 1 │
│ 2 │
│ 3 │
└───┘
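As a loose mental model of the examples above, a block can be pictured as a chunk of rows with an upper bound on its length. The Python sketch below is only an analogy: `split_into_blocks` is a hypothetical helper, and it treats `max_block_size` as a hard cap on rows per chunk, which is a simplification of how ClickHouse forms blocks.

```python
def split_into_blocks(rows, max_block_size):
    """Yield consecutive chunks of at most max_block_size rows."""
    for start in range(0, len(rows), max_block_size):
        yield rows[start:start + max_block_size]

rows = [1, 2, 3]
print(list(split_into_blocks(rows, 1)))    # [[1], [2], [3]]
print(list(split_into_blocks(rows, 100)))  # [[1, 2, 3]]
```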

【Comments】: