【Posted】: 2019-05-01 20:33:52
【Question】:
I have a data set structured as follows:
[uid, product, currency, platform, date]
[100, product_1, USA, desktop, 2019-01-01]
[100, product_2, USA, desktop, 2019-01-03]
[200, product_3, CAN, mobile, 2019-01-02]
[300, product_1, GBP, desktop, 2019-01-01]
and so on...
The data must be aggregated by year:
[year, product, currency, platform, uid_count]
[2019, product_1, USA, desktop, 1000]
[2019, product_2, USA, desktop, 2000]
[2019, product_3, GBP, mobile, 5000]
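While researching a solution, I read about sketch algorithms, which seem to be the right direction. The data is too large to load in a single batch, so I need to process it incrementally (for example, one day at a time) instead of running an SQL query such as: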
SELECT year(date), product, currency, platform, count(distinct uid) FROM tbl_name GROUP BY 1, 2, 3, 4
or
SELECT year(date), product, currency, platform, count(distinct uid) FROM tbl_name GROUP BY 1, 2, 3, 4
with cube
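The incremental approach described above can be sketched as follows. This is a minimal illustration, assuming Python and a hand-rolled HyperLogLog. The question does not name a specific sketch algorithm; HyperLogLog is used here as one common mergeable choice. The helpers `daily_sketches`, `roll_up_to_year` and `cube` are hypothetical names. The key property is that two sketches merge by taking the element-wise max of their registers. As a result, each day's sketches can be folded into the stored yearly ones, and rolling up dimensions (the `WITH CUBE` case) is also a merge, not a sum.

```python
import hashlib
import math
from collections import defaultdict
from datetime import date
from itertools import combinations


class HyperLogLog:
    """Minimal HyperLogLog distinct-count sketch (illustrative, not tuned)."""

    def __init__(self, p=12):
        self.p = p                   # 2**p registers; p=12 gives ~1.6% std. error
        self.m = 1 << p
        self.registers = [0] * self.m

    @staticmethod
    def _hash64(value):
        # Stable 64-bit hash of the value's string form
        digest = hashlib.blake2b(str(value).encode("utf-8"), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, value):
        h = self._hash64(value)
        bits = 64 - self.p
        idx = h >> bits                        # top p bits choose a register
        rest = h & ((1 << bits) - 1)           # remaining bits
        rank = bits - rest.bit_length() + 1    # position of the leftmost 1-bit
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketch of the union = element-wise max of the registers
        if self.p != other.p:
            raise ValueError("cannot merge sketches with different p")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)  # small-range (linear counting)
        return raw


def daily_sketches(rows):
    """One sketch per (day, product, currency, platform) for a single day's batch."""
    sketches = defaultdict(HyperLogLog)
    for uid, product, currency, platform, day in rows:
        sketches[(day, product, currency, platform)].add(uid)
    return sketches


def roll_up_to_year(yearly, day_batch):
    """Merge a batch of daily sketches into the running per-year sketches."""
    for (day, product, currency, platform), sk in day_batch.items():
        key = (day.year, product, currency, platform)
        yearly[key] = yearly[key].merge(sk) if key in yearly else sk
    return yearly


def cube(yearly):
    """Estimates for every subset of (year, product, currency, platform),
    like GROUP BY ... WITH CUBE; a rolled-up dimension is reported as None."""
    n = 4
    result = {}
    for r in range(n + 1):
        for kept in combinations(range(n), r):
            groups = {}
            for key, sk in yearly.items():
                k = tuple(key[i] if i in kept else None for i in range(n))
                groups[k] = groups[k].merge(sk) if k in groups else sk
            for k, sk in groups.items():
                result[k] = sk.estimate()
    return result


# Usage with the sample rows from the question, fed in one day at a time
rows = [
    (100, "product_1", "USA", "desktop", date(2019, 1, 1)),
    (300, "product_1", "GBP", "desktop", date(2019, 1, 1)),
    (200, "product_3", "CAN", "mobile", date(2019, 1, 2)),
    (100, "product_2", "USA", "desktop", date(2019, 1, 3)),
]
yearly = {}
for d in sorted({r[4] for r in rows}):
    batch = [r for r in rows if r[4] == d]
    yearly = roll_up_to_year(yearly, daily_sketches(batch))

for key, sk in sorted(yearly.items()):
    print(key, round(sk.estimate()))
print("all groups:", round(cube(yearly)[(None, None, None, None)]))
```

Each daily run only needs that day's rows plus the stored yearly registers (2**p small integers per group). Summing per-group distinct counts would double-count users such as uid 100, which appears under both product_1 and product_2. Merging the sketches avoids that. The estimates are approximate, with a standard error of roughly 1.04/√m for m = 2**p registers, so `p` trades memory for accuracy.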
【Discussion】: