【Posted】:2020-04-02 14:54:35
【Question】:
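We have an application that captures the searches our users make. Because of the nature of our search (we return results after a few characters) and the speed at which people type, we get one log entry for every search/letter. It looks like this:
(It looks like an upside-down Christmas tree...)
We need this data internally to count the number of searches (i.e. API calls), but for reporting to our customers it is not great to report "half" queries.
I am looking for a way to collapse these rows into a single row holding the longest/last search term.
There is a catch: a user (cid) can make more than one search in a session, but I guess we can separate them by looking at the timestamps.
It would have to be something like this: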
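1) Group rows that are no more than 2 seconds apart
2) Sort by query length (or take the last one) to get the final query
3) Group by term to count how often a term was used, for reporting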
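To make the three steps concrete, here is a minimal sketch of the logic in Python rather than ClickHouse SQL. It only illustrates the grouping idea and is not a tested ClickHouse query. The function name `collapse_searches`, the `(timestamp, cid, query)` row shape, and the placeholder user id `u1` are assumptions made for illustration; the rows are a subset of the sample data in this question.

```python
from collections import Counter
from datetime import datetime, timedelta


def collapse_searches(rows, max_gap=timedelta(seconds=2)):
    """Return the final query of each typing burst.

    rows: iterable of (timestamp_str, cid, query) tuples (assumed shape).
    A new burst starts when the user changes or when the gap to the
    previous row of the same user exceeds max_gap.
    """
    parsed = sorted(
        (cid, datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), query)
        for ts, cid, query in rows
    )
    finals = []
    current = []  # queries collected for the burst being built
    prev_cid, prev_ts = None, None
    for cid, ts, query in parsed:
        # Step 1: split on user change or on a gap larger than max_gap.
        new_burst = prev_cid != cid or ts - prev_ts > max_gap
        if new_burst and current:
            # Step 2: the longest query of the burst is taken as final.
            finals.append(max(current, key=len))
            current = []
        current.append(query)
        prev_cid, prev_ts = cid, ts
    if current:
        finals.append(max(current, key=len))
    return finals


# "u1" is a hypothetical user id standing in for the cid column.
rows = [
    ("2019-12-09 12:58:45", "u1", "vacuum cleaner"),
    ("2019-12-09 12:58:44", "u1", "vacuum clean"),
    ("2019-12-09 12:58:43", "u1", "vacuum clea"),
    ("2019-12-09 12:58:15", "u1", "blue widget"),
    ("2019-12-09 12:58:14", "u1", "blue widge"),
    ("2019-12-09 12:57:38", "u1", "widget"),
    ("2019-12-09 12:57:37", "u1", "widge"),
]

# Step 3: count how often each final term occurs.
term_counts = Counter(collapse_searches(rows))
```

Note that the gap is measured between consecutive rows, so a long query typed steadily stays in one burst even if the whole burst lasts longer than 2 seconds. Picking the longest query would report a longer intermediate term if the user deleted characters before stopping; taking the last row of the burst instead is the alternative mentioned in step 2.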
Data as text:
2019-12-09 2019-12-09 12:58:45 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cleaner
2019-12-09 2019-12-09 12:58:45 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cleane
2019-12-09 2019-12-09 12:58:44 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum clean
2019-12-09 2019-12-09 12:58:43 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum clea
2019-12-09 2019-12-09 12:58:43 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cle
2019-12-09 2019-12-09 12:58:42 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum cl
2019-12-09 2019-12-09 12:58:41 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum c
2019-12-09 2019-12-09 12:58:40 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuum
2019-12-09 2019-12-09 12:58:39 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacuu
2019-12-09 2019-12-09 12:58:38 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vacu
2019-12-09 2019-12-09 12:58:37 5dea585477c94502b52c43fb 92cd6cef-3ed8-4416-ac2d-cc347780b976 search 1 search query vac
2019-12-09 2019-12-09 12:58:15 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widget
2019-12-09 2019-12-09 12:58:14 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widge
2019-12-09 2019-12-09 12:58:13 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue widg
2019-12-09 2019-12-09 12:58:12 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue wid
2019-12-09 2019-12-09 12:58:12 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue wi
2019-12-09 2019-12-09 12:58:11 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue w
2019-12-09 2019-12-09 12:58:10 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blue
2019-12-09 2019-12-09 12:58:09 5dea585477c94502b52c43fb 9b41fb1d-59d2-4a12-8974-b2261b2fe484 search 0 search query blu
2019-12-09 2019-12-09 12:57:38 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widget
2019-12-09 2019-12-09 12:57:37 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widge
2019-12-09 2019-12-09 12:57:36 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query widg
2019-12-09 2019-12-09 12:57:35 5dea585477c94502b52c43fb f96305d9-590b-4a10-95a2-2d49a9fc63a3 search 1 search query wid
Expected result:
vacuum cleaner 1
blue widget 1
widget 1
【Comments】:
-
Most people here want sample table data and expected results as formatted text, not as images.
-
Looks much better now!
-
SELECT max(date), max(ev) FROM table GROUP BY cid?
Tags: sql clickhouse