【Question title】: Create materialized view based on aggregate materialized view
【Posted】: 2021-05-04 21:35:01
【Question】:

Base table:

CREATE TABLE IF NOT EXISTS test_sessions
(
    session_id   UInt64,
    session_name String,
    created_at   DateTime
)
ENGINE = MergeTree()
PARTITION BY toYYYYMM(created_at)
ORDER BY (session_id);

It contains the following data:

INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
(1, 'start', '2021-01-31 00:00:00'),
(1, 'stop', '2021-01-31 01:00:00'),
(2, 'start', '2021-01-31 01:00:00')
;

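I created two views to get closed sessions: a materialized view that stores aggregate states, and a regular view on top of it.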

CREATE MATERIALIZED VIEW IF NOT EXISTS test_session_aggregate_states
(
    session_id UInt64,
    started_at AggregateFunction(minIf, DateTime, UInt8),
    stopped_at AggregateFunction(maxIf, DateTime, UInt8)
)
ENGINE = AggregatingMergeTree
PARTITION BY tuple()
ORDER BY (session_id)
POPULATE AS
SELECT session_id,
       minIfState(created_at, session_name = 'start') AS started_at,
       maxIfState(created_at, session_name = 'stop')  AS stopped_at
FROM test_sessions
GROUP BY session_id;

CREATE VIEW IF NOT EXISTS test_session_completed
(
    session_id UInt64,
    started_at DateTime,
    stopped_at DateTime
)
AS
SELECT session_id,
       minIfMerge(started_at) AS started_at,
       maxIfMerge(stopped_at) AS stopped_at
FROM test_session_aggregate_states
GROUP BY session_id
HAVING (started_at != '0000-00-00 00:00:00') AND
       (stopped_at != '0000-00-00 00:00:00')
;

This works as expected: it returns 1 row, the session that has both a 'start' and a 'stop' event.

SELECT * FROM test_session_completed;
-- 1,2021-01-31 00:00:00,2021-01-31 01:00:00

I then tried to create a materialized view based on test_session_completed, joined with other tables (the joins are omitted from this example).

CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv
(
    session_id UInt64
)
ENGINE = MergeTree
PARTITION BY tuple()
ORDER BY (session_id)
POPULATE AS
SELECT session_id
FROM test_session_completed
;

A test query to check test_mv:

INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
(3, 'start', '2021-01-31 02:00:00'),
(3, 'stop', '2021-01-31 03:00:00');

SELECT * FROM test_session_completed;
-- SUCCESS
-- 3,2021-01-31 02:00:00,2021-01-31 03:00:00
-- 1,2021-01-31 00:00:00,2021-01-31 01:00:00

SELECT * FROM test_mv;
-- FAILURE
-- 1
-- EXPECTED RESULT
-- 3
-- 1

How can I fill test_mv based on test_session_completed?

ClickHouse version: 20.11.4.13

【Comments】:

    Tags: clickhouse


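    【Solution 1】:
    1. You cannot create an MV over a view. A view never receives inserts, so the MV is filled once by POPULATE and never triggered again.
    2. An MV is an insert trigger. It is impossible to get the state completed without having the state started in the same table. If you don't need to check that started happened before completed, you can make a simpler MV that just checks WHERE completed.
    3. You don't need minIfState; you can use min (SimpleAggregateFunction). It reduces the amount of stored data and improves performance.
    4. I think the second MV is excessive.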
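
    As a rough illustration of point 2, a simpler MV could read directly from test_sessions and keep only the 'stop' rows. This is a minimal sketch, not the answerer's exact code: the name test_mv_stopped is hypothetical, and it assumes you do not need to verify that a 'start' preceded the 'stop'.

    ```sql
    -- Hypothetical name: test_mv_stopped.
    -- The source is the base table itself, so every INSERT into
    -- test_sessions triggers this MV; only 'stop' rows are stored.
    CREATE MATERIALIZED VIEW IF NOT EXISTS test_mv_stopped
    ENGINE = MergeTree
    PARTITION BY tuple()
    ORDER BY (session_id)
    POPULATE AS
    SELECT session_id
    FROM test_sessions
    WHERE session_name = 'stop';
    ```

    Because the MV sits on the table that receives the inserts, a later 'stop' row for session 3 should appear in test_mv_stopped, unlike in an MV built over the view.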

    Check this: https://den-crane.github.io/Everything_you_should_know_about_materialized_views_commented.pdf

    https://youtu.be/ckChUkC3Pns?list=PLO3lfQbpDVI-hyw4MyqxEk3rDHw95SzxJ&t=9371


    I would do it like this:

    CREATE TABLE IF NOT EXISTS test_sessions
    (
        session_id   UInt64,
        session_name String,
        created_at   DateTime
    )
    ENGINE = MergeTree()
    PARTITION BY toYYYYMM(created_at)
    ORDER BY (session_id);

    CREATE MATERIALIZED VIEW IF NOT EXISTS test_session_aggregate_states
    (
        session_id UInt64,
        started_at SimpleAggregateFunction(min, DateTime),
        stopped_at SimpleAggregateFunction(max, DateTime)
    )
    ENGINE = AggregatingMergeTree
    PARTITION BY tuple()
    ORDER BY (session_id)
    POPULATE AS
    SELECT session_id,
           minIf(created_at, session_name = 'start') AS started_at,
           maxIf(created_at, session_name = 'stop')  AS stopped_at
    FROM test_sessions
    GROUP BY session_id;

    INSERT INTO test_sessions (session_id, session_name, created_at) VALUES
    (3, 'start', '2021-01-31 02:00:00'),
    (3, 'stop', '2021-01-31 03:00:00');

    Completed sessions:

    SELECT session_id,
           min(started_at) AS started_at,
           max(stopped_at) AS stopped_at
    FROM test_session_aggregate_states
    GROUP BY session_id
    HAVING (started_at != '0000-00-00 00:00:00') AND
           (stopped_at != '0000-00-00 00:00:00');

    ┌─session_id─┬──────────started_at─┬──────────stopped_at─┐
    │          1 │ 2021-01-31 00:00:00 │ 2021-01-31 01:00:00 │
    └────────────┴─────────────────────┴─────────────────────┘

    And with argMaxState you can aggregate multiple start/stop events within a single session_id.
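
    The answer does not show what the argMaxState variant would look like. One possible reading, offered only as a sketch, is to store the name of the most recent event per session, so that a session restarted after a stop counts as open again. The names test_session_last_event, last_event_state and last_event are hypothetical.

    ```sql
    -- Hypothetical names: test_session_last_event, last_event_state.
    -- argMaxState keeps the session_name that has the largest created_at.
    CREATE MATERIALIZED VIEW IF NOT EXISTS test_session_last_event
    (
        session_id       UInt64,
        last_event_state AggregateFunction(argMax, String, DateTime)
    )
    ENGINE = AggregatingMergeTree
    PARTITION BY tuple()
    ORDER BY (session_id)
    POPULATE AS
    SELECT session_id,
           argMaxState(session_name, created_at) AS last_event_state
    FROM test_sessions
    GROUP BY session_id;

    -- Sessions whose most recent event is 'stop', i.e. currently closed.
    SELECT session_id,
           argMaxMerge(last_event_state) AS last_event
    FROM test_session_last_event
    GROUP BY session_id
    HAVING last_event = 'stop';
    ```

    The merged alias is deliberately named differently from the stored state column, to avoid any ambiguity between the two when the query is resolved.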

    【Comments】:
