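【Posted】: 2021-08-11 17:05:03
【Question】:
The goal is to get the peak per period (for example, the 5-minute peak) of a value that accumulates. The values therefore need to be summed per period first, and the peak (maximum) can then be found among those sums. (select max(v) from (select sum(v) from t group by a1, a2))
I have a base table t.
Data is inserted into t with two attributes (a time t1 and some string a2) and a numeric value.
The values accumulate, so they need to be summed to get the total volume for a given period. Example of inserted rows: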
t1 | a2 | v
----------------
date1 | b | 1
date2 | c | 20
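The intended computation (sum v per 5-minute bucket and a2, then take the largest of those sums per a2) can be sketched outside ClickHouse. This is only a minimal Python illustration; the sample rows and the helper names `five_minute_bucket` and `peak_per_a2` are made up for the example and do not come from the real table.

```python
from collections import defaultdict
from datetime import datetime


def five_minute_bucket(ts: datetime) -> datetime:
    """Round a timestamp down to the start of its 5-minute interval."""
    return ts.replace(minute=ts.minute - ts.minute % 5, second=0, microsecond=0)


def peak_per_a2(rows):
    """Sum v per (5-minute bucket, a2), then return the largest sum per a2."""
    sums = defaultdict(int)
    for t1, a2, v in rows:
        sums[(five_minute_bucket(t1), a2)] += v
    peaks = defaultdict(int)
    for (_, a2), total in sums.items():
        peaks[a2] = max(peaks[a2], total)
    return dict(peaks)


# Hypothetical sample rows: (t1, a2, v)
rows = [
    (datetime(2021, 8, 11, 17, 0, 10), "b", 1),
    (datetime(2021, 8, 11, 17, 3, 0), "b", 4),
    (datetime(2021, 8, 11, 17, 6, 0), "b", 2),
    (datetime(2021, 8, 11, 17, 1, 0), "c", 20),
]
print(peak_per_a2(rows))
```

Here the two "b" rows in the 17:00 bucket are added together before the maximum is taken, which is the behavior expected from the tables and views in this question.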
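I use an MV to compute sumState(), and then use sumMerge() and max() to get the peak.
I only need the maximum, so I wonder whether I could use maxState() directly.
This is what I do now: I use an MV that computes the 5-minute sums, and I read max() from it: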
CREATE TABLE IF NOT EXISTS sums_table ON CLUSTER '{cluster}' (
t1 DateTime,
a2 String,
v AggregateFunction(sum, UInt32)
)
ENGINE = ReplicatedAggregatingMergeTree(
'...',
'{replica}'
)
PARTITION BY toDate(t1)
ORDER BY (a2, t1)
PRIMARY KEY (a2);
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_a
ON CLUSTER '{cluster}'
TO sums_table
AS
SELECT toStartOfFiveMinute(t1) AS t1, a2,
sumState(toUInt32(v)) AS v
FROM t
GROUP BY t1, a2
From this I can read the maximum 5-minute sum per a2:
SELECT
a2,
max(sum) AS max
FROM (
SELECT
t1,
a2,
sumMerge(v) AS sum
FROM sums_table
WHERE t1 BETWEEN :fromDateTime AND :toDateTime
GROUP BY t1, a2
)
GROUP BY a2
ORDER BY max DESC
This works fine.
So I would like to use maxState() and maxMerge() to achieve the same result:
CREATE TABLE IF NOT EXISTS max_table ON CLUSTER '{cluster}' (
t1 DateTime,
a2 String,
max_v AggregateFunction(max, UInt32)
)
ENGINE = ReplicatedAggregatingMergeTree(
'...',
'{replica}'
)
PARTITION BY toDate(t1)
ORDER BY (a2, t1)
PRIMARY KEY (a2);
CREATE MATERIALIZED VIEW IF NOT EXISTS mv_b
ON CLUSTER '{cluster}'
TO max_table
AS
SELECT
t1,
a2,
maxState(v) AS max_v
FROM (
SELECT
toStartOfFiveMinute(t1) AS t1,
a2,
toUInt32(sum(v)) AS v
FROM t
GROUP BY t1, a2
)
GROUP BY t1, a2
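I thought that if I got one maximum per time (t1) and a2, and then selected the maximum per a2, I would get the maximum for each a2. But with the following query I get completely different maximums compared to the max-of-sums query above.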
SELECT
a2,
max(max) AS max
FROM (
SELECT
t1,
a2,
maxMerge(max_v) AS max
FROM max_table
WHERE t1 BETWEEN :fromDateTime AND :toDateTime
GROUP BY t1, a2
) maxs_per_time_and_a2
GROUP BY a2
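What am I doing wrong? Did I get the MV wrong? Is it possible to use maxState and maxMerge with 2+ attributes to compute the maximum over a longer period, say a year?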
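One possible explanation for the mismatch, offered as an assumption rather than something verified on this cluster: a materialized view is evaluated on each inserted block, so the inner sum(v) in mv_b would only cover the rows of a single insert, and maxState would then keep the largest partial sum instead of the total for the bucket. The Python sketch below uses made-up inserts and hypothetical function names to show how the two approaches diverge under that assumption.

```python
from collections import defaultdict

# Hypothetical inserts: each inner list is one insert of (bucket, a2, v) rows.
# All rows fall into the same 5-minute bucket "17:00" for a2 = "b".
inserts = [
    [("17:00", "b", 1), ("17:00", "b", 4)],
    [("17:00", "b", 2)],
]


def merged_sums(inserts):
    """sums_table approach: partial sums from every insert merge into one total."""
    totals = defaultdict(int)
    for block in inserts:
        for bucket, a2, v in block:
            totals[(bucket, a2)] += v
    return dict(totals)


def merged_max_of_partial_sums(inserts):
    """max_table approach, ASSUMING sum(v) only sees one insert at a time."""
    maxima = defaultdict(int)
    for block in inserts:
        partial = defaultdict(int)
        for bucket, a2, v in block:
            partial[(bucket, a2)] += v
        # Merging max states keeps the largest partial sum, not the total.
        for key, s in partial.items():
            maxima[key] = max(maxima[key], s)
    return dict(maxima)


print(merged_sums(inserts))
print(merged_max_of_partial_sums(inserts))
```

If that assumption holds, the two tables only agree when all rows of a bucket arrive in one insert, which would be consistent with the different maximums described above.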
【Discussion】:
Tags: clickhouse