【Question title】: select group by dimension and accumulate column percent
【Posted】: 2020-08-03 21:55:06
【Question】:

I'm using ClickHouse and I have this table:

| URL           | visits        |
| ------------- |:------------- |
| URL1          | 5             |
| URL2          | 30            |
| URL3          | 1             |
| URL4          | 30            |
| URL5          | 9             |
| URL1          | 5             |
| URL2          | 20            |

I can group by URL:

select
    url,
    sum(visits) as visits
from
    database.tableVistis
group by
    url

| URL           | visits        |
| ------------- |:------------- |
| URL1          | 10            |
| URL2          | 50            |
| URL3          | 1             |
| URL4          | 30            |
| URL5          | 9             |

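I want this result (grouped by URL, sorted by visits in descending order, with each URL's percentage of total visits and the running cumulative percentage):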

| URL           | visits        | %        | Accumulate |
| ------------- |:------------- |----------|------------|
| URL2          | 50            |50%       | 50%        |
| URL4          | 30            |30%       | 80%        |
| URL1          | 10            |10%       | 90%        |
| URL5          | 9             |9%        | 99%        |
| URL3          | 1             |1%        | 100%       |

Any ideas? Thanks!!

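【Comments on the question】:

  • Which database are you using? Please tag it.
  • The question title refers to runningAccumulate, but nothing in the question refers to aggregate states. That is confusing; the title may need fixing.
  • Sorry, I know very little about these aggregate states :(

Tags: clickhouse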


【Solution 1】:

Yes! Thanks!!

SELECT
    result.1 AS URL,
    result.2 AS visits,
    round(result.3, 2) AS "%",
    round(result.4, 2) AS Accumulate
FROM
(
    SELECT
        groupArray((URL, visits)) url_visits,
        arraySum(x -> x.2, url_visits) total_visits,
        arrayMap(x -> (100 / total_visits) * x.2, url_visits) percent_visits,
        arrayCumSum(percent_visits) acc_percent_visits,
        arrayJoin(arrayMap((x, y, z) -> (x.1, x.2, y, z), url_visits, percent_visits, acc_percent_visits)) result
    FROM
    (
        SELECT
            URL,
            sum(visits) visits
        FROM
        (
            /* my data in place of the test data */
            SELECT
                URL,
                visits
            FROM
            (
                SELECT
                    landing AS URL,
                    visitas AS visits
                FROM Analytics
                WHERE fecha > '2019-05-01'
            )
        )
        GROUP BY URL
        ORDER BY visits DESC
    )
)

This query is more concise:

SELECT 
    result.1 AS URL, 
    result.2 AS visits, 
    round(result.3, 2) AS `%`, 
    round(result.4, 2) AS Accumulate
FROM 
(
    SELECT 
        groupArray((URL, visits)) AS url_visits, 
        arraySum(x -> (x.2), url_visits) AS total_visits, 
        arrayMap(x -> ((100 / total_visits) * (x.2)), url_visits) AS percent_visits, 
        arrayCumSum(percent_visits) AS acc_percent_visits, 
        arrayJoin(arrayMap((x, y, z) -> (x.1, x.2, y, z), url_visits, percent_visits, acc_percent_visits)) AS result
    FROM 
    (
        SELECT 
            landing AS URL, 
            sum(visitas) AS visits
        FROM Analytics
        WHERE fecha > '2019-05-01'
        GROUP BY URL
        ORDER BY visits DESC
    )
)
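
Both queries above depend on groupArray receiving rows in the order produced by the inner ORDER BY, because arrayCumSum accumulates in array order. The ClickHouse documentation describes that ordering as reliable only in some cases. As a hedged sketch (not part of the original answer), the sort can instead be done inside the array with arrayReverseSort, so the inner ORDER BY is no longer needed. Table and column names are the ones used above.

```sql
SELECT
    result.1 AS URL,
    result.2 AS visits,
    round(result.3, 2) AS `%`,
    round(result.4, 2) AS Accumulate
FROM
(
    SELECT
        -- Sort the (URL, visits) tuples by visits, largest first, inside the array,
        -- so the cumulative sum does not rely on the row order of the subquery.
        arrayReverseSort(x -> x.2, groupArray((URL, visits))) AS url_visits,
        arraySum(x -> x.2, url_visits) AS total_visits,
        arrayMap(x -> (100 / total_visits) * x.2, url_visits) AS percent_visits,
        arrayCumSum(percent_visits) AS acc_percent_visits,
        arrayJoin(arrayMap((x, y, z) -> (x.1, x.2, y, z), url_visits, percent_visits, acc_percent_visits)) AS result
    FROM
    (
        SELECT
            landing AS URL,
            sum(visitas) AS visits
        FROM Analytics
        WHERE fecha > '2019-05-01'
        GROUP BY URL
    )
)
```

Rows come out of arrayJoin in array order, that is, by visits descending.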


    【Solution 2】:

    Try this query:

    SELECT result.1 AS URL, result.2 AS visits, round(result.3,2) AS "%", round(result.4, 2) AS Accumulate
    FROM (
        SELECT 
            groupArray((URL, visits)) url_visits,
            arraySum(x -> x.2, url_visits) total_visits,
            arrayMap(x -> (100 / total_visits) * x.2, url_visits) percent_visits,
            arrayCumSum(percent_visits) acc_percent_visits,
            arrayJoin(arrayMap((x, y, z) -> (x.1, x.2, y, z), url_visits, percent_visits, acc_percent_visits)) result
        FROM (
            SELECT URL, sum(visits) visits
            FROM (
                /* test data */
                SELECT data.1 URL, data.2 visits
                FROM (
                    SELECT arrayJoin([
                        ('URL1', 5 ),
                        ('URL2', 30),
                        ('URL3', 1 ),
                        ('URL4', 30),
                        ('URL5', 9 ),
                        ('URL1', 5 ),
                        ('URL2', 20)]) data))
            GROUP BY URL
            ORDER BY visits DESC))
    /* result 
    ┌─URL──┬─visits─┬──%─┬─Accumulate─┐
    │ URL2 │     50 │ 50 │         50 │
    │ URL4 │     30 │ 30 │         80 │
    │ URL1 │     10 │ 10 │         90 │
    │ URL5 │      9 │  9 │         99 │
    │ URL3 │      1 │  1 │        100 │
    └──────┴────────┴────┴────────────┘
    
    */
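
As an alternative sketch, assuming a ClickHouse version that supports window functions (older releases do not, which is why the array functions are used above), the same percentages can be written with `sum(...) OVER`. The `URL ASC` tie-breaker is an added assumption to keep the row order deterministic when two URLs have equal visits; it is not part of the query above.

```sql
SELECT
    URL,
    visits,
    -- Share of the grand total: an empty OVER () spans all rows.
    round(100 * visits / sum(visits) OVER (), 2) AS `%`,
    -- Running total from the largest URL down to the current row, as a share of the grand total.
    round(100 * sum(visits) OVER (ORDER BY visits DESC, URL ASC
                                  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
              / sum(visits) OVER (), 2) AS Accumulate
FROM
(
    SELECT URL, sum(visits) AS visits
    FROM
    (
        /* same test data as above */
        SELECT data.1 AS URL, data.2 AS visits
        FROM
        (
            SELECT arrayJoin([
                ('URL1', 5), ('URL2', 30), ('URL3', 1), ('URL4', 30),
                ('URL5', 9), ('URL1', 5), ('URL2', 20)]) AS data
        )
    )
    GROUP BY URL
)
ORDER BY visits DESC, URL ASC
```

With the test data, the grouped totals are 50, 30, 10, 9 and 1 out of 100, so this should yield the same percentages and cumulative values as the result table above.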
    
