【Question title】: How to get the last quarterly and last half-yearly average of balance for each month in Hive?
【Posted】: 2018-10-19 05:52:51
【Question description】:

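I have a table with the columns cust_id, year_, month_, monthly_txn, and monthly_bal. For each month, I need to compute avg(monthly_txn) and variance(monthly_bal) over the previous three months and over the previous six months. The query I have returns the average and variance only for the most recent three (or six) months, not for every month. I am not good with Hive's analytic functions.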

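  -- Yields one row per customer, covering only that customer's 3 most recent months
  SELECT cust_id, avg(monthly_txn) y, variance(monthly_bal) x FROM ( 
  SELECT cust_id, monthly_txn, monthly_bal,
            -- both keys must be descending so that r = 1 is the latest month
            row_number() over (partition by cust_id order by year_ desc, month_ desc) r
        from mytable) b WHERE r <= 3 GROUP BY cust_id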

But I want something like the following.

Input:

 cust_id     year_   month_     monthly_txn  monthly_bal
1            2018     1              456    8979289
1            2018     2              675    4567
1            2018     3              645    4890
1            2017     1              342    44522
1            2017     2              378    9898900
1            2017     2              456    234492358
1            2017     4              3535   789
1            2017     5              456    345
1            2017     6              598    334

Expected output:

Assume the quarterly and half-yearly variance columns follow the same pattern as the txn averages.

cust_id     year_    month_     monthly_txn  monthly_bal     q_avg_txn            h_avg_txn
   1         2018      1          456          8979289       avg(456,598,4561)    avg(456,598,4561,3535,4536,378)
   1         2018      2          675          4567          avg(675,456,598)     avg(675,456,3535,4561,598,4536)
   1         2018      3          645          4890          avg(645,675,645)     avg(645,675,645,3535,4561,598)
   1         2017      1          342          44522         avg(342)             avg(342)
   1         2017      2          378          9898900       avg(378,342)         avg(378,342)
   1         2017      3          4536         234492358     avg(4536,372,342)    avg(4536,378,342)
   1         2017      4          3535         789           avg(3535,4536,378)   avg(3535,4536,378,342) 
   1         2017      5          4561         345           avg(4561,3535,4536)  avg(4561,3535,4536,342,378)
   1         2017      6          598          334           avg(598,4561,3535)   avg(598,4561,3535,4536,342,378) 

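【Comments】:

  • Updated with sample data and expected output.
  • "Previous three months and six months": is the current date the system date, or some date specified in the where condition?
  • For dates, I only have the year and the month. I have to do this on historical data, so the current year and month are the year_ and month_ of each record.
  • Which Hive version, please?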
  • Hive 1.1.0-cdh5.8.4

Tags: sql hive


【Solution 1】:

Use an analytic (window) function with unbounded preceding to get the quarterly and half-yearly values, then use a subquery to fetch the result.

What is ROWS UNBOUNDED PRECEDING used for in Teradata?
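
The answer gives no query, so here is a minimal sketch of one possible reading of it, using the asker's table and column names. The output names running_avg_txn and running_var_bal are hypothetical. Note that an `unbounded preceding` frame covers every earlier row for the customer, so this produces running statistics rather than fixed 3-month or 6-month windows; a bounded frame such as `rows between 2 preceding and current row` is needed for those.

    -- Sketch only: running (cumulative) statistics per customer, one output row per month.
    select t.*,
           avg(monthly_txn) over (partition by cust_id
                                  order by year_, month_
                                  rows between unbounded preceding and current row
                                 ) as running_avg_txn,   -- hypothetical column name
           variance(monthly_bal) over (partition by cust_id
                                       order by year_, month_
                                       rows between unbounded preceding and current row
                                      ) as running_var_bal  -- hypothetical column name
    from mytable t;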

【Comments】:

【Solution 2】:

If you have data for every month of interest (i.e., no gaps), then this should work:

    select t.*,
           avg(monthly_bal) over (partition by cust_id
                                  order by year_, month_ 
                                  rows between 2 preceding and current row
                                 ) as avg_3,
           avg(monthly_bal) over (partition by cust_id
                                  order by year_, month_ 
                                  rows between 5 preceding and current row
                                 ) as avg_6,
           variance(monthly_bal) over (partition by cust_id
                                       order by year_, month_ 
                                       rows between 2 preceding and current row
                                      ) as variance_3,
           variance(monthly_bal) over (partition by cust_id
                                       order by year_, month_ 
                                       rows between 5 preceding and current row
                                      ) as variance_6
    from mytable t;
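
The query above counts rows, which matches the asker's expected output (there, 2018-01 is averaged with 2017-06 and 2017-05). If strict calendar windows are wanted instead when months are missing, one option is a RANGE frame over a month index. The following is a sketch under assumptions: month_idx is a hypothetical helper column, the output names follow the question's expected-output header where possible, and the behaviour of RANGE with a numeric offset should be checked on a small sample in Hive 1.1.0-cdh5.8.4 before relying on it. It also averages monthly_txn, as the question asks, rather than monthly_bal.

    -- Sketch only: month_idx is a hypothetical helper column in which
    -- consecutive calendar months differ by exactly 1.
    select s.cust_id, s.year_, s.month_, s.monthly_txn, s.monthly_bal,
           avg(monthly_txn) over (partition by cust_id
                                  order by month_idx
                                  range between 2 preceding and current row
                                 ) as q_avg_txn,
           avg(monthly_txn) over (partition by cust_id
                                  order by month_idx
                                  range between 5 preceding and current row
                                 ) as h_avg_txn,
           variance(monthly_bal) over (partition by cust_id
                                       order by month_idx
                                       range between 2 preceding and current row
                                      ) as q_var_bal,
           variance(monthly_bal) over (partition by cust_id
                                       order by month_idx
                                       range between 5 preceding and current row
                                      ) as h_var_bal
    from (select t.*, t.year_ * 12 + t.month_ as month_idx
          from mytable t) s;

With this frame, a row is included only if its month_idx lies within 2 (or 5) of the current row's, so a missing month shrinks the window instead of pulling in an older row.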
    

【Comments】:
