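【Question title】: Picking minValue and its row in hive
【Posted】: 2020-03-25 10:32:03
【Problem description】:

I need to pick the minimum value within a 2-hour sliding time window, along with the date value of the row that minimum comes from. For example: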

Create table stock(time string, cost float);

Insert into stock values("1990-01-01 8:00 AM",4.5);
Insert into stock values("1990-01-01 9:00 AM",3.2);
Insert into stock values("1990-01-01 10:00 AM",3.1);
Insert into stock values("1990-01-01 11:00 AM",5.5);
Insert into stock values("1990-01-02 8:00 AM",5.1);
Insert into stock values("1990-01-02 9:00 AM",2.2);
Insert into stock values("1990-01-02 10:00 AM",1.5);
Insert into stock values("1990-01-02 11:00 AM",6.5);
Insert into stock values("1990-01-03 8:00 AM",8.1);
Insert into stock values("1990-01-03 9:00 AM",3.2);
Insert into stock values("1990-01-03 10:00 AM",2.5);
Insert into stock values("1990-01-03 11:00 AM",4.5);

To get the minimum, I can write a query like this:

select min(cost) over(order by unix_timestamp(time) range between current row and 7200 following)
from stock

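So, starting from the current row, this looks ahead 2 hours (7200 seconds) and picks the minimum. For the first row the minimum is 3.1, which sits in the third row at 10:00 AM. The query returns the correct minimum, but I also need the date value that belongs to that minimum; in this case I want "1990-01-01 10:00 AM". How can I select it?

Thanks, Raj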
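To make the expected result concrete, here is a minimal Python sketch (not Hive) that emulates the forward-looking window on the sample rows. It assumes the window is inclusive on both ends, as a `range between current row and 7200 following` frame is, and that the strings parse with the format `%Y-%m-%d %I:%M %p`; the names `ROWS`, `FMT`, and `window_min` are made up for this illustration.

```python
from datetime import datetime, timedelta

# Sample rows from the question: (time string, cost).
ROWS = [
    ("1990-01-01 8:00 AM", 4.5), ("1990-01-01 9:00 AM", 3.2),
    ("1990-01-01 10:00 AM", 3.1), ("1990-01-01 11:00 AM", 5.5),
    ("1990-01-02 8:00 AM", 5.1), ("1990-01-02 9:00 AM", 2.2),
    ("1990-01-02 10:00 AM", 1.5), ("1990-01-02 11:00 AM", 6.5),
    ("1990-01-03 8:00 AM", 8.1), ("1990-01-03 9:00 AM", 3.2),
    ("1990-01-03 10:00 AM", 2.5), ("1990-01-03 11:00 AM", 4.5),
]
FMT = "%Y-%m-%d %I:%M %p"          # assumed format of the time column
WINDOW = timedelta(seconds=7200)   # 2 hours, inclusive upper bound

def window_min(rows):
    """For each row, return (time, min cost in window, time of that min)."""
    parsed = [(datetime.strptime(t, FMT), t, c) for t, c in rows]
    out = []
    for ts, label, _ in parsed:
        # Rows whose timestamp lies in [ts, ts + 2h].
        in_window = [(c, ts2, lbl) for ts2, lbl, c in parsed
                     if ts <= ts2 <= ts + WINDOW]
        # Tuples sort by cost first, then by timestamp (earliest wins ties).
        cost, _, min_label = min(in_window)
        out.append((label, cost, min_label))
    return out

result = window_min(ROWS)
for row in result:
    print(row)
```

For the first row this yields a minimum of 3.1 paired with "1990-01-01 10:00 AM", which is the pairing the query should return.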

【Question discussion】:

    Tags: sql hive bigdata hiveql


    【Solution 1】:

    I think this is a hard problem. One approach is to use a join to look up the row that holds the minimum:

    select s.*, smin.time as min_time
    from (select s.*,
                 min(cost) over (order by unix_timestamp(time) range between current row and 7200 following) as min_cost
          from stock s
         ) s join
         stock smin
         on smin.cost = s.min_cost and
            unix_timestamp(smin.time) >= unix_timestamp(s.time) and
            -- inclusive upper bound, matching the window frame
            unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200
    

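    The downside of this approach is that it can produce duplicates when several rows in the same window share the minimum cost. If that is a problem: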

    select s.*
    from (select s.*, smin.time as min_time,
                 -- keep only the earliest matching row per source row
                 row_number() over (partition by s.time order by unix_timestamp(smin.time)) as seqnum
          from (select s.*,
                       min(cost) over (order by unix_timestamp(time) range between current row and 7200 following) as min_cost
                from stock s
               ) s join
               stock smin
               on smin.cost = s.min_cost and
                  unix_timestamp(smin.time) >= unix_timestamp(s.time) and
                  -- inclusive upper bound, matching the window frame
                  unix_timestamp(smin.time) <= unix_timestamp(s.time) + 7200
         ) s
    where seqnum = 1;
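The duplicate problem and the `seqnum = 1` fix can be illustrated outside Hive with a small Python sketch. The rows below are hypothetical (they are not the question's data) and were chosen so that two rows tie for the minimum inside one window; `join_matches` and `keep_first` are illustrative names that stand in for the join and for the `row_number()` filter.

```python
# Hypothetical rows as (seconds since start, cost); two rows tie at 3.1.
rows = [(0, 4.5), (3600, 3.1), (7200, 3.1)]
WINDOW = 7200  # seconds, inclusive upper bound

def join_matches(rows):
    """Emulate the join: one output row per window row equal to the minimum."""
    matches = []
    for t, _ in rows:
        window = [(t2, c2) for t2, c2 in rows if t <= t2 <= t + WINDOW]
        min_cost = min(c for _, c in window)
        matches.extend((t, min_cost, t2) for t2, c2 in window if c2 == min_cost)
    return matches

def keep_first(matches):
    """Emulate seqnum = 1: keep the earliest min_time per source row."""
    best = {}
    for t, min_cost, min_t in matches:
        if t not in best or min_t < best[t][2]:
            best[t] = (t, min_cost, min_t)
    return [best[t] for t in sorted(best)]

dup = join_matches(rows)    # contains two rows for t=0 and two for t=3600
dedup = keep_first(dup)     # exactly one row per source row
print(dup)
print(dedup)
```

Ordering by the matched time is only one possible tie-break; ordering descending would keep the latest occurrence of the minimum instead.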
    

    【Discussion】:
