【Question Title】: Hive logic to get min time, max time and other columns
【Posted】: 2017-07-25 09:13:05
【Question Description】:

I have data in the following format:

+---------------------+-------------------------+-------------------------+-----------+------+
|         id          |       start time        |        end time         | direction | name |
+---------------------+-------------------------+-------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 15:10:28.677 | 2015-06-02 15:32:22.677 |         3 | xyz  |
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:12:18.84  |         1 | xyz  |
+---------------------+-------------------------+-------------------------+-----------+------+

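I need output containing the minimum start time, the maximum end time, and the direction and name from the row with the minimum start time: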

+---------------------+-------------------------+-------------------------+-----------+------+
|         id          |       start time        |        end time         | direction | name |
+---------------------+-------------------------+-------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:32:22.677 |         1 | xyz  |
+---------------------+-------------------------+-------------------------+-----------+------+
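To pin down the required semantics, here is a minimal Python sketch (an illustration only, not Hive code) that derives the desired row from the two sample rows. The names `ROWS`, `ts` and `expected_output` are hypothetical, and each row is assumed to be a tuple `(id, start_time, end_time, direction, name)`.

```python
from datetime import datetime

# Sample rows from the question: (id, start_time, end_time, direction, name)
ROWS = [
    (9202340753368000000, "2015-06-02 15:10:28.677", "2015-06-02 15:32:22.677", 3, "xyz"),
    (9202340753368000000, "2015-06-02 14:55:37.353", "2015-06-02 15:12:18.84", 1, "xyz"),
]


def ts(value):
    # %f accepts 1 to 6 fractional digits, so both ".677" and ".84" parse
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


def expected_output(rows):
    """Per id: min start_time, max end_time, direction/name of the earliest-starting row."""
    groups = {}
    for row in rows:
        groups.setdefault(row[0], []).append(row)
    result = []
    for id_, group in groups.items():
        earliest = min(group, key=lambda r: ts(r[1]))    # row with the minimum start_time
        latest_end = max(group, key=lambda r: ts(r[2]))[2]  # maximum end_time in the group
        result.append((id_, earliest[1], latest_end, earliest[3], earliest[4]))
    return result


print(expected_output(ROWS))
# [(9202340753368000000, '2015-06-02 14:55:37.353', '2015-06-02 15:32:22.677', 1, 'xyz')]
```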

I tried the following:

select x.id,
       min(x.start_time) as mintime,
       max(x.end_time)   as maxtime,
       y.direction,
       y.name
from   dir_samp x
inner join (
       select id, start_time, end_time, name, direction,
              rank() over (partition by id order by start_time asc) as r
       from   dir_samp
) y on x.id = y.id
where  y.r = 1
group by x.id, y.direction, y.name;

Is there a more efficient way to write this? Please share.

Thanks

【Question Discussion】:

    Tags: sql hive


    【Solution 1】:
    select      id
               ,min_vals.start_time
               ,end_time
               ,min_vals.direction
               ,min_vals.name
    
    from       (select      id  
                           ,min(named_struct('start_time',start_time,'direction',direction,'name',name)) as min_vals
                           ,max(end_time)                                                                as end_time
    
                from        dir_samp
    
                group by    id
                ) t
    ;
    

    +---------------------+----------------------------+----------------------------+-----------+------+
    | id                  | start_time                 | end_time                   | direction | name |
    +---------------------+----------------------------+----------------------------+-----------+------+
    | 9202340753368000000 | 2015-06-02 14:55:37.353000 | 2015-06-02 15:32:22.677000 | 1         | xyz  |
    +---------------------+----------------------------+----------------------------+-----------+------+
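This works because `min()` over a `named_struct` compares the structs field by field in declaration order, so `start_time` decides the minimum and `direction` and `name` are carried along from that same row (they only act as tie-breakers). A single `group by` pass is enough, with no window function and no join. The Python sketch below mimics this with tuples, which also compare lexicographically; it is an analogy under that assumption, not Hive's implementation, and `min_struct_query`, `ROWS` and `ts` are hypothetical names.

```python
from datetime import datetime

# Sample rows from the question: (id, start_time, end_time, direction, name)
ROWS = [
    (9202340753368000000, "2015-06-02 15:10:28.677", "2015-06-02 15:32:22.677", 3, "xyz"),
    (9202340753368000000, "2015-06-02 14:55:37.353", "2015-06-02 15:12:18.84", 1, "xyz"),
]


def ts(value):
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


def min_struct_query(rows):
    """Analogue of Solution 1: min over (start_time, direction, name) and max over end_time, per id."""
    min_vals = {}  # id -> smallest (start_time, direction, name) tuple seen so far
    max_end = {}   # id -> largest end_time seen so far
    for id_, start, end, direction, name in rows:
        # Tuples compare element by element, left to right (assumed analogue of
        # Hive struct ordering): start_time decides, direction/name break ties.
        candidate = (ts(start), direction, name)
        if id_ not in min_vals or candidate < min_vals[id_]:
            min_vals[id_] = candidate
        if id_ not in max_end or ts(end) > max_end[id_]:
            max_end[id_] = ts(end)
    return [(id_, s, max_end[id_], d, n) for id_, (s, d, n) in min_vals.items()]


print(min_struct_query(ROWS)[0][3:])  # (1, 'xyz')
```

Because the whole tuple is kept, `direction` and `name` always come from the same row, even when several rows share the earliest `start_time`.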
    

      【Discussion】:

      【Solution 2】:

      You don't need the inner join:

      select y.id, min(y.start_time) as mintime, 
             max(y.end_time) maxtime , 
             max(case when y.r=1 then y.direction end) as direction, 
             max(case when y.r=1 then y.name end) as name 
      from
      ( 
       select id, start_time,  end_time, name, direction ,  
         rank() over ( partition by id order by start_time asc) as r 
         from dir_samp 
      ) y 
      group by y.id;
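Here `rank()` is computed once and the outer query uses conditional aggregation: `case when y.r=1 then ... end` yields NULL for every row except the earliest one, and `max()` ignores NULLs, so it returns the value from the rank-1 row. A Python sketch of the same two steps follows; `rank_then_aggregate`, `ROWS`, `TIED` and `ts` are hypothetical names, and `TIED` is made-up data used only to show the tie case.

```python
from datetime import datetime
from itertools import groupby

# Sample rows from the question: (id, start_time, end_time, direction, name)
ROWS = [
    (9202340753368000000, "2015-06-02 15:10:28.677", "2015-06-02 15:32:22.677", 3, "xyz"),
    (9202340753368000000, "2015-06-02 14:55:37.353", "2015-06-02 15:12:18.84", 1, "xyz"),
]
# Hypothetical data: two rows tied on the earliest start_time
TIED = [
    (1, "2015-06-02 10:00:00.000", "2015-06-02 11:00:00.000", 1, "zeta"),
    (1, "2015-06-02 10:00:00.000", "2015-06-02 12:00:00.000", 3, "alpha"),
]


def ts(value):
    return datetime.strptime(value, "%Y-%m-%d %H:%M:%S.%f")


def rank_then_aggregate(rows):
    """Analogue of Solution 2: rank() per id, then max() over rank-1 values only."""
    result = []
    ordered = sorted(rows, key=lambda r: r[0])  # groupby needs rows sorted by id
    for id_, group in groupby(ordered, key=lambda r: r[0]):
        group = list(group)
        # rank() = 1 + number of rows with a strictly earlier start_time,
        # so every row tied on the earliest start_time gets rank 1.
        ranked = [(r, 1 + sum(ts(o[1]) < ts(r[1]) for o in group)) for r in group]
        # The CASE expression is NULL for other ranks; max() skips those rows.
        direction = max(r[3] for r, rank in ranked if rank == 1)
        name = max(r[4] for r, rank in ranked if rank == 1)
        min_start = min(ts(r[1]) for r in group)
        max_end = max(ts(r[2]) for r in group)
        result.append((id_, min_start, max_end, direction, name))
    return result


for label, rows in (("sample", ROWS), ("tied", TIED)):
    print(label, rank_then_aggregate(rows)[0][3:])
# sample (1, 'xyz')
# tied (3, 'zeta')
```

The `TIED` case shows one caveat: `rank()` gives rank 1 to every row sharing the earliest `start_time`, and each `max()` is evaluated independently, so `direction` and `name` can come from different rows (`3` from one, `'zeta'` from the other).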
      

        【Discussion】:
