【Question title】: BigQuery rolling month data
【Posted】: 2020-10-05 00:24:02
【Question】:

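I am trying to produce the following output: the number of times a user visited a specific page within a 3-month period. Pages are things like the home page, account page, cart page, and so on.

My table: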

MMDDYYYY   PageVisited  Username  No. of times  Month
1/1/2019   Homepage     A         1             January
2/21/2019  AccountPage  A         1             February
2/25/2019  AccountPage  B         5             February
3/1/2019   Homepage     A         3             March
4/2/2019   cartpage     B         2             April
5/2/2019   AccountPage  A         1             May
6/2/2019   Submisison   C         1             June
5/5/2019   Homepage     D         2             May
5/2/2019   Articles     E         2             May
7/25/2019  cartpage     E         2             July
8/12/2019  Articles     A         1             August
9/23/2019  Articles     A         6             September

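Could you please help me with a query that gets the data on a rolling basis? For example: if the current month is January, I need the data for January, February, and March; if the current month is February, I need February, March, and April; if the current month is March, I need March, April, and May; and so on.

The output should be: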

MMDDYYYY   PageVisited  Username  No. of times [3-month rolling]  Note
1/1/2019   Homepage     A         4                               Includes 1 from January and 3 from March
2/21/2019  AccountPage  A         1                               User A opened the account page only once from the current month through the next two months (February, March, April)
2/25/2019  AccountPage  B         5                               User B opened the account page 5 times from the current month through the next two months (February, March, April)
3/1/2019   Homepage     A         3                               User A opened the homepage 3 times in March and did not open it in the following two months (April, May)
6/2/2019   Submisison   C         1
5/5/2019   Homepage     D         2
5/2/2019   Articles     E         2
7/25/2019  cartpage     E         2
8/12/2019  Articles     A         7
9/23/2019  Articles     A         6

【Discussion】:

    Tags: sql google-bigquery


    【Solution 1】:

    The following is for BigQuery Standard SQL:

    #standardSQL
    SELECT *, SUM(no_of_time) OVER(rolling_3_month_window) AS rolling_3_month
    FROM `project.dataset.table`
    WINDOW rolling_3_month_window AS (
      PARTITION BY username, pagevisited
      -- order by a whole-month index (months since 1970-01-01)
      ORDER BY DATE_DIFF(PARSE_DATE('%m/%d/%Y', mmddyyyy), '1970-01-01', MONTH)
      -- current month plus the next two months
      RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING
    )
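
The `ORDER BY` expression maps every date to a whole-month index. Because the reference date `1970-01-01` is the first day of a month, all dates in the same calendar month get the same index, so `RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING` covers the current month and the next two even when some months have no rows. A minimal Python sketch of that index, for illustration only (the helper name `month_index` is hypothetical and not part of the query):

```python
from datetime import datetime


def month_index(mmddyyyy: str) -> int:
    """Whole months between 1970-01-01 and the month of an 'm/d/YYYY' date string."""
    d = datetime.strptime(mmddyyyy, "%m/%d/%Y").date()
    return (d.year - 1970) * 12 + (d.month - 1)


jan = month_index("1/1/2019")
mar = month_index("3/1/2019")
apr = month_index("4/2/2019")

print(jan, mar, apr)          # 588 590 591
print(0 <= mar - jan <= 2)    # True: March falls inside January's window
print(0 <= apr - jan <= 2)    # False: April falls outside January's window
```

A `ROWS` frame would count physical rows instead of month distance, which is why the query uses `RANGE` over a numeric month index.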
    

    Applied to the sample data from your question, as in the example below:

    #standardSQL
    WITH `project.dataset.table` AS (
      SELECT '1/1/2019' mmddyyyy, 'Homepage' pagevisited, 'A' username, 1 no_of_time, 'January' month UNION ALL
      SELECT '2/21/2019', 'AccountPage', 'A', 1, 'February' UNION ALL
      SELECT '2/25/2019', 'AccountPage', 'B', 5, 'February' UNION ALL
      SELECT '3/1/2019', 'Homepage', 'A', 3, 'March' UNION ALL
      SELECT '4/2/2019', 'cartpage', 'B', 2, 'April' UNION ALL
      SELECT '5/2/2019', 'AccountPage', 'A', 1, 'May' UNION ALL
      SELECT '6/2/2019', 'Submisison', 'C', 1, 'June' UNION ALL
      SELECT '5/5/2019', 'Homepage', 'D', 2, 'May' UNION ALL
      SELECT '5/2/2019', 'Articles', 'E', 2, 'May' UNION ALL
      SELECT '7/25/2019', 'cartpage', 'E', 2, 'July' UNION ALL
      SELECT '8/12/2019', 'Articles', 'A', 1, 'August' UNION ALL
      SELECT '9/23/2019', 'Articles', 'A', 6, 'September' 
    )
    SELECT *, SUM(no_of_time) OVER(rolling_3_month_window) AS rolling_3_month
    FROM `project.dataset.table`
    WINDOW rolling_3_month_window AS (    
      PARTITION BY username, pagevisited 
      ORDER BY DATE_DIFF(PARSE_DATE('%m/%d/%Y', mmddyyyy), '1970-01-01', MONTH)
      RANGE BETWEEN CURRENT ROW AND 2 FOLLOWING 
    )
    -- ORDER BY mmddyyyy    
    

    The output is:

    Row mmddyyyy    pagevisited username    no_of_time  month       rolling_3_month  
    1   1/1/2019    Homepage    A           1           January     4    
    2   2/21/2019   AccountPage A           1           February    1    
    3   2/25/2019   AccountPage B           5           February    5    
    4   3/1/2019    Homepage    A           3           March       3    
    5   4/2/2019    cartpage    B           2           April       2    
    6   5/2/2019    AccountPage A           1           May         1    
    7   5/2/2019    Articles    E           2           May         2    
    8   5/5/2019    Homepage    D           2           May         2    
    9   6/2/2019    Submisison  C           1           June        1    
    10  7/25/2019   cartpage    E           2           July        2    
    11  8/12/2019   Articles    A           1           August      7    
    12  9/23/2019   Articles    A           6           September   6    
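
To sanity-check these numbers outside BigQuery, the same logic can be sketched in plain Python. This is an illustrative re-implementation under the assumption that the window means "same user and page, month index between the current month and two months ahead"; the names `month_index` and `rolling_3_month` are hypothetical, and it does not describe how BigQuery evaluates the query internally.

```python
from datetime import datetime

# Sample rows from the question: (mmddyyyy, pagevisited, username, no_of_time)
rows = [
    ("1/1/2019", "Homepage", "A", 1),
    ("2/21/2019", "AccountPage", "A", 1),
    ("2/25/2019", "AccountPage", "B", 5),
    ("3/1/2019", "Homepage", "A", 3),
    ("4/2/2019", "cartpage", "B", 2),
    ("5/2/2019", "AccountPage", "A", 1),
    ("6/2/2019", "Submisison", "C", 1),
    ("5/5/2019", "Homepage", "D", 2),
    ("5/2/2019", "Articles", "E", 2),
    ("7/25/2019", "cartpage", "E", 2),
    ("8/12/2019", "Articles", "A", 1),
    ("9/23/2019", "Articles", "A", 6),
]


def month_index(mmddyyyy):
    """Whole months between 1970-01-01 and the month of the given date string."""
    d = datetime.strptime(mmddyyyy, "%m/%d/%Y").date()
    return (d.year - 1970) * 12 + (d.month - 1)


def rolling_3_month(data):
    """For each row, sum no_of_time over rows with the same (user, page)
    whose month index lies in [current, current + 2]."""
    totals = []
    for day, page, user, _ in data:
        current = month_index(day)
        totals.append(sum(
            n for d2, p2, u2, n in data
            if (u2, p2) == (user, page) and 0 <= month_index(d2) - current <= 2
        ))
    return totals


totals = rolling_3_month(rows)
for (day, page, user, n), total in zip(rows, totals):
    print(day, page, user, n, total)
```

The totals come out in the order the rows are listed in the `WITH` clause, so they match the table above once sorted by date.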
    

    【Comments】:

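    • I have one doubt. Suppose user A did not visit "Homepage" in January: would the query then search for "Homepage" in the following months (i.e. February, March) and sum those?
    • Not sure what you mean. I answered the question as asked, and the output matches what was expected. If you want to extend the logic of your question, please post a new question with all relevant details and examples, and we will try to help further.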
    【Solution 2】:

    You seem to want sum() with a window frame:

    select t.*,
           sum(num_times) over (partition by username, pagevisited
                                order by extract(year from date) * 12 + extract(month from date)
                                -- current month plus the next two, as the question asks
                                range between current row and 2 following
                               ) as rolling_3_month
    from t;
    

    This assumes that your date column really is a date, which is the right way to store such values. If it is not, you can convert it.
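
For a string column like the one in the question, Solution 1 does the conversion with `PARSE_DATE('%m/%d/%Y', mmddyyyy)`. The ordering key here (`year * 12 + month`) and the one in Solution 1 (months since `1970-01-01`) differ only by a constant, so a `RANGE` frame sees the same month distances with either. A small Python sketch of that equivalence, with hypothetical helper names chosen for illustration:

```python
from datetime import datetime


def parse_mdy(text):
    """Parse an 'm/d/YYYY' string into a date."""
    return datetime.strptime(text, "%m/%d/%Y").date()


def year_month_key(d):
    """Ordering key in the style of Solution 2: year * 12 + month."""
    return d.year * 12 + d.month


def months_since_epoch(d):
    """Ordering key in the style of Solution 1: whole months since 1970-01-01."""
    return (d.year - 1970) * 12 + (d.month - 1)


dates = [parse_mdy(s) for s in ("1/1/2019", "3/1/2019", "12/31/2019", "1/1/2020")]

# The two keys differ by the same constant for every date.
offsets = {year_month_key(d) - months_since_epoch(d) for d in dates}
print(offsets)  # {23641}

# Both keys step by exactly 1 across a year boundary.
print(year_month_key(dates[3]) - year_month_key(dates[2]))  # 1
```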
