【Question title】: Is it possible to COUNT using PARTITION BY?
【Posted】: 2021-09-05 03:16:21
【Question】:

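I have a database in BigQuery where every record is a web traffic session on my website.

I currently have a table that tells me how a person reached my website, plus a column that gives the order of each event.

The end goal is to see how many "non-organic" sessions a person has to have before an organic session.

I am trying to create an additional column that returns "include" each time a conversion happens (that is, when an organic session occurs after a non-organic session).

I know how to do this in Excel, but not in SQL. I have a feeling that PARTITION BY is the solution, but I don't know how to apply it.

Here is my Excel solution: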

=IF(AND((COUNTIF($B$2:B2,FALSE))>=1,(IF(COUNTIF($B$2:B2,FALSE)>=1,COUNTIFS($B$2:B2,TRUE,$C$2:C2,">1"),0))>=1),"include","exclude")
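For readers who want to check results against the formula, here is a rough Python sketch of what it appears to compute row by row. It assumes column B holds the is_organic flag and column C holds the sequence number (the question does not say so explicitly), and the function name `excel_flags` is made up for illustration. Because both ranges are anchored at row 2, the counts run over the whole sheet and never reset per visitor.

```python
def excel_flags(is_organic, sequence):
    """Mimic the running COUNTIF/COUNTIFS logic of the Excel formula.

    is_organic: list of bools (assumed to be column B)
    sequence:   list of ints  (assumed to be column C)
    """
    flags = []
    non_organic_seen = 0      # COUNTIF($B$2:B2, FALSE)
    organic_after_first = 0   # COUNTIFS($B$2:B2, TRUE, $C$2:C2, ">1")
    for organic, seq in zip(is_organic, sequence):
        if not organic:
            non_organic_seen += 1
        if organic and seq > 1:
            organic_after_first += 1
        if non_organic_seen >= 1 and organic_after_first >= 1:
            flags.append("include")
        else:
            flags.append("exclude")
    return flags


print(excel_flags([False, True, True], [1, 2, 3]))
```

Once both running counts reach 1, every later row is flagged "include", which is one reason a per-visitor partition is worth having in the SQL version.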

【Comments】:

Tags: sql google-bigquery


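【Solution 1】:

If you are trying to add a status flag, then you seem to want the following rules:

  • "exclude" all non-organic sessions.
  • "exclude" all organic sessions that occur before the first non-organic session.
  • "include" all other organic sessions.

If so, you can simply use: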

select t.*,
       (case when not is_organic then 'exclude'
             when countif(not is_organic) over (partition by person_id order by sequence) = 0
             then 'exclude'
             else 'include'
        end) as status
from t;
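To sanity-check the query above on a few rows without running BigQuery, the same running-count idea can be sketched in plain Python. This is only an illustration: the column names person_id, is_organic and sequence are taken from the query, the helper name `add_status` and the sample rows are made up, and sequence values are assumed to be unique per person.

```python
from collections import defaultdict

# Hypothetical sample rows, not data from the question.
ROWS = [
    {"person_id": 1, "is_organic": False, "sequence": 1},
    {"person_id": 1, "is_organic": True, "sequence": 2},
    {"person_id": 1, "is_organic": True, "sequence": 3},
    {"person_id": 2, "is_organic": True, "sequence": 1},
    {"person_id": 2, "is_organic": False, "sequence": 2},
]


def add_status(rows):
    """Emulate countif(not is_organic) over (partition by person_id order by sequence)."""
    non_organic_so_far = defaultdict(int)  # one running count per person_id
    result = []
    for row in sorted(rows, key=lambda r: (r["person_id"], r["sequence"])):
        pid = row["person_id"]
        if not row["is_organic"]:
            non_organic_so_far[pid] += 1
            status = "exclude"   # non-organic sessions are always excluded
        elif non_organic_so_far[pid] == 0:
            status = "exclude"   # organic, but no non-organic session came first
        else:
            status = "include"   # organic after at least one non-organic session
        result.append({**row, "status": status})
    return result


for row in add_status(ROWS):
    print(row["person_id"], row["sequence"], row["status"])
```

The dictionary keyed by person_id plays the role of the partition, and processing rows in sequence order plays the role of the window's order by.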

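However, for this question:

The end goal is to see how many "non-organic" sessions a person has to have before an organic session.

I would simply use aggregation together with a window function: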

select person_id, countif(sequence < first_organic_sequence)
from (select t.*,
             min(case when is_organic then sequence end) over (partition by person_id) as first_organic_sequence
      from t
     ) t
group by person_id;
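As a cross-check of the aggregation above, here is a small Python sketch of the same two steps: find each person's first organic sequence, then count the sessions that come before it. The function name `sessions_before_first_organic` and the sample rows are hypothetical. A person with no organic session gets 0 here, on the assumption that the comparison against a NULL first_organic_sequence never counts as true in the SQL version.

```python
def sessions_before_first_organic(rows):
    """Count, per person_id, the sessions that precede the first organic one."""
    # Step 1: min(case when is_organic then sequence end) per person.
    first_organic = {}
    for row in rows:
        if row["is_organic"]:
            pid = row["person_id"]
            current = first_organic.get(pid, row["sequence"])
            first_organic[pid] = min(current, row["sequence"])

    # Step 2: countif(sequence < first_organic_sequence) grouped by person.
    counts = {row["person_id"]: 0 for row in rows}
    for row in rows:
        first = first_organic.get(row["person_id"])
        if first is not None and row["sequence"] < first:
            counts[row["person_id"]] += 1
    return counts


# Hypothetical sample rows, not data from the question.
EXAMPLE = [
    {"person_id": 1, "is_organic": False, "sequence": 1},
    {"person_id": 1, "is_organic": False, "sequence": 2},
    {"person_id": 1, "is_organic": True, "sequence": 3},
    {"person_id": 1, "is_organic": False, "sequence": 4},
    {"person_id": 2, "is_organic": True, "sequence": 1},
    {"person_id": 2, "is_organic": False, "sequence": 2},
    {"person_id": 3, "is_organic": False, "sequence": 1},
]

print(sessions_before_first_organic(EXAMPLE))
```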

Or, if sequence always starts at 1 and has no gaps:

select person_id, min(case when is_organic then sequence end) - 1
from t
group by person_id;

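【Comments】:

    【Solution 2】:

    See the following. I created an intermediate CTE to define what the previous record was, then used it to determine the number of sessions up to the first organic one as well as your status.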

    with sample as (
        SELECT 1 as person_id, FALSE as is_organic, 1 as sequence
        UNION ALL
        SELECT 1 as person_id, TRUE as is_organic, 2 as sequence
        UNION ALL 
        SELECT 1 as person_id, TRUE as is_organic, 3 as sequence
        UNION ALL  
        SELECT 2 as person_id, TRUE as is_organic, 1 as sequence
        UNION ALL
        SELECT 2 as person_id, FALSE as is_organic, 2 as sequence
    ),
    modified  as (
        select *, 
            lag(is_organic)
                over(partition by person_id order by sequence ) as previous_indicator,
        from sample
    )
    select *,
        case when previous_indicator is not true then count(person_id) over(partition by person_id order by sequence) end as sessions_before_organic, 
        case when previous_indicator IS NOT NULL and is_organic = TRUE THEN 'include'
            else 'exclude' 
        end as status
    from modified;
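
The LAG-based logic above can also be traced by hand with a short Python sketch. It reuses the sample rows from the CTE; the function name `lag_status` is made up, and the sketch assumes sequence values are unique within a person so that the running count matches the window count.

```python
from itertools import groupby
from operator import itemgetter

# Same rows as the `sample` CTE.
SAMPLE = [
    {"person_id": 1, "is_organic": False, "sequence": 1},
    {"person_id": 1, "is_organic": True, "sequence": 2},
    {"person_id": 1, "is_organic": True, "sequence": 3},
    {"person_id": 2, "is_organic": True, "sequence": 1},
    {"person_id": 2, "is_organic": False, "sequence": 2},
]


def lag_status(rows):
    """Emulate lag(is_organic) per person plus the two derived columns."""
    ordered = sorted(rows, key=itemgetter("person_id", "sequence"))
    result = []
    for _, group in groupby(ordered, key=itemgetter("person_id")):
        previous = None  # lag() yields NULL on the first row of a partition
        for running_count, row in enumerate(group, start=1):
            # Running count is kept only while the previous row was not organic.
            sessions = running_count if previous is not True else None
            organic_after_any = previous is not None and row["is_organic"]
            result.append({
                **row,
                "previous_indicator": previous,
                "sessions_before_organic": sessions,
                "status": "include" if organic_after_any else "exclude",
            })
            previous = row["is_organic"]
    return result


for row in lag_status(SAMPLE):
    print(row["person_id"], row["sequence"], row["sessions_before_organic"], row["status"])
```

Note that this status rule includes any organic session that is not the person's first session, even when the session before it was also organic.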
    

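    【Comments】:

      【Solution 3】:

      SQL COUNT with a PARTITION BY clause is one of the new, powerful syntaxes that T-SQL developers can easily use. You can use the two together as described at this link: https://www.kodyaz.com/t-sql/sql-count-function-with-partition-by-clause.aspx

      【Comments】: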
