[Posted]: 2021-12-25 14:24:22
[Question]:
I have a table in ClickHouse with the following structure:
x_id | y_id | z_id | rank | timestamp
1231 | 1324 | 9412 | 1 | 2021-03-12 00:13:34
121 | 5524 | 765 | 21 | 2021-03-13 15:43:21
54 | 76 | 8822 | 125 | 2021-05-14 17:23:12
213 | 61 | 7651 | 51 | 2021-03-16 12:15:43
53 | 65 | 123 | 23 | 2021-03-12 13:28:54
1231 | 432 | 7651 | 1541 | 2021-03-12 16:54:24
...
For some weeks there are no records for a particular group (x_id, y_id, z_id). In that case I need to take that group's rank from the previous week, if one exists.
For example:
group_ids, rank, timestamp
(1, 1, 1, 25, '2021-03-12 00:13:34') -> group (1, 1, 1), week 2021-03-08
(2, 2, 2, 30, '2021-03-16 00:13:34') -> group (2, 2, 2), week 2021-03-15
No data for group (1, 1, 1) in week 2021-03-15, so fill it from the previous week and set "week" to the current week:
(1, 1, 1, 25, 2021-03-15)
and so on ...
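To pin down the intended semantics, here is a minimal Python sketch (not a ClickHouse query) of the forward fill. It assumes weeks start on Monday, as with `toStartOfWeek(timestamp, 1)`, that each group is filled from its first observed week up to the last week present in the whole dataset, and that a missing week reuses the group's most recent earlier weekly rank. The names `week_start` and `forward_fill_weekly` are hypothetical.

```python
from datetime import date, datetime, timedelta

Group = tuple[int, int, int]  # (x_id, y_id, z_id)


def week_start(ts: datetime) -> date:
    """Monday of the week containing ts (assumed to match toStartOfWeek(ts, 1))."""
    d = ts.date()
    return d - timedelta(days=d.weekday())


def forward_fill_weekly(rows):
    """rows: iterable of (x_id, y_id, z_id, rank, timestamp).

    Returns {(group, week): rank}. Assumption: every group gets a value for
    every week from its first observed week up to the last week in the data;
    weeks without records reuse the group's most recent earlier rank.
    """
    # Step 1: min(rank) per (group, week), like the GROUP BY subquery.
    weekly = {}
    for x, y, z, rank, ts in rows:
        key = ((x, y, z), week_start(ts))
        weekly[key] = min(rank, weekly.get(key, rank))
    if not weekly:
        return {}

    last_week = max(week for _, week in weekly)
    first_week = {}
    for group, week in weekly:
        if group not in first_week or week < first_week[group]:
            first_week[group] = week

    # Step 2: walk each group week by week, carrying the last known rank.
    filled = {}
    for group, week in first_week.items():
        current = None
        while week <= last_week:
            current = weekly.get((group, week), current)
            filled[(group, week)] = current
            week += timedelta(weeks=1)
    return filled


rows = [
    (1, 1, 1, 25, datetime(2021, 3, 12, 0, 13, 34)),
    (2, 2, 2, 30, datetime(2021, 3, 16, 0, 13, 34)),
]
filled = forward_fill_weekly(rows)
# group (1, 1, 1) has no row in week 2021-03-15, so rank 25 is carried forward
print(filled[((1, 1, 1), date(2021, 3, 15))])
```

With the two example rows, group (1, 1, 1) gets rank 25 for both 2021-03-08 and 2021-03-15, while group (2, 2, 2) starts only at 2021-03-15, because nothing earlier exists to fill from.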
Then I compute metrics on this data with a subquery:
SELECT
    week,
    SUM(CASE
        WHEN rank BETWEEN 1 AND 3 THEN 1
        ELSE 0
    END) AS metric1,
    /* ... */
FROM (
    SELECT min(rank) AS rank, toStartOfWeek(timestamp, 1) AS week  -- mode 1: weeks start on Monday
    FROM table
    GROUP BY week, x_id, y_id, z_id
) GROUP BY week ORDER BY week;
metric1 | metric2 | week
0 | 2 | 2021-03-22
1 | 0 | 2021-03-29
0 | 1 | 2021-04-05
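For comparison, a minimal Python sketch of what the SQL above computes for `metric1`, applied to data that has already been forward-filled into a `{(group, week): rank}` mapping. `weekly_metric1` is a hypothetical name, and the other elided metrics are not reproduced.

```python
from collections import defaultdict
from datetime import date


def weekly_metric1(filled):
    """filled: {(group, week): rank}.

    Counts, per week, the groups whose rank is between 1 and 3 inclusive,
    mirroring SUM(CASE WHEN rank BETWEEN 1 AND 3 THEN 1 ELSE 0 END) ... GROUP BY week.
    Weeks where no group qualifies still appear with a count of 0.
    """
    counts = defaultdict(int)
    for (_group, week), rank in filled.items():
        counts[week] += 1 if 1 <= rank <= 3 else 0
    # ORDER BY week
    return dict(sorted(counts.items()))


filled = {
    ((1, 1, 1), date(2021, 3, 8)): 2,
    ((1, 1, 1), date(2021, 3, 15)): 2,   # carried forward from 2021-03-08
    ((2, 2, 2), date(2021, 3, 15)): 30,
}
print(weekly_metric1(filled))
```

Because the filled rows are counted like real ones, a group that stays in the top 3 keeps contributing to `metric1` in the weeks where it has no records.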
Is it possible to build a query that forward-fills the missing values?
Tags: group-by missing-data clickhouse