【Question title】: Select non-existing data as NULL using Hive LATERAL VIEW
【Posted】: 2017-09-20 21:04:40
【Question】:

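I am trying to use LATERAL VIEW OUTER explode in Hive to return non-existing data as NULL, but my query returns nothing. Edit: the table has a year (string) and an array of company rankings: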

year:string,topcompanies:array<struct<name:string,rank:string>>

Sample data (edit):

  2015,
  "topcompanies":[
  {"name":"apple","rank":"1"},
  {"name":"samsung","rank":"2"},
  {"name":"SONY","rank":"3"}
  ]

  2016,
  "topcompanies":[
  {"name":"apple","rank":"1"},
  {"name":"samsung","rank":"2"},
  {"name":"SONY","rank":"3"},
  {"name":"LG","rank":"4"}
  ]

Query used to fetch the data:

select  year, rank1, rank2, rank3, rank4
FROM companyrank
LATERAL VIEW outer explode(topcompanies) rank1_t as rank1_v
LATERAL VIEW outer explode(topcompanies) rank2_t as rank2_v
LATERAL VIEW outer explode(topcompanies) rank3_t as rank3_v
LATERAL VIEW outer explode(topcompanies) rank4_t as rank4_v
WHERE 
 (rank1_v.rank = 1 or rank1_v.rank is null)
 AND (rank2_v.rank = 2 or rank2_v.rank is null)
 AND (rank3_v.rank = 3 or rank3_v.rank is null)
 AND (rank4_v.rank = 4 or rank4_v.rank is null)

Expected output when rank4 does not exist:

    year  rank1  rank2    rank3  rank4
    2015  apple  samsung  SONY   null

Expected output when rank4 exists:

    year  rank1  rank2    rank3  rank4
    2016  apple  samsung  SONY   LG

Edit:

I need all 4 ranks for every year. If any rank does not exist, that rank should be shown as NULL.

【Comments on the question】:

    Tags: hive hiveql


    【Solution 1】:

    The direct answer to your question is "use lateral view outer", but there is a cleaner solution.

    select      min (case when i.rank = 1 then i.name end)  as rank1
               ,min (case when i.rank = 2 then i.name end)  as rank2
               ,min (case when i.rank = 3 then i.name end)  as rank3
               ,min (case when i.rank = 4 then i.name end)  as rank4
    
    from        companyrank c
                lateral view inline(topcompanies) i
    ;
    

    +--------+----------+--------+--------+
    | rank1  |  rank2   | rank3  | rank4  |
    +--------+----------+--------+--------+
    | apple  | samsung  | SONY   | NULL   |
    +--------+----------+--------+--------+
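
The query above returns a single row for the whole table. Since the question asks for all four ranks per year, a per-year variant may be useful. The following is a minimal sketch, assuming the table and column names from the question (companyrank, year, topcompanies) and assuming that rank is stored as a string, as the schema states. The OUTER keyword is an addition here: it only keeps years whose topcompanies array is empty or NULL, and it does not by itself create a NULL for a missing rank. The conditional aggregation is what produces the NULL.

```sql
-- Sketch: one output row per year, with NULL for any rank that is absent.
-- Assumes the schema from the question:
--   year:string, topcompanies:array<struct<name:string,rank:string>>
SELECT   c.year
        ,MIN(CASE WHEN i.rank = '1' THEN i.name END) AS rank1
        ,MIN(CASE WHEN i.rank = '2' THEN i.name END) AS rank2
        ,MIN(CASE WHEN i.rank = '3' THEN i.name END) AS rank3
        ,MIN(CASE WHEN i.rank = '4' THEN i.name END) AS rank4  -- NULL when no rank-4 entry exists
FROM     companyrank c
         -- inline() turns each struct in the array into a row with columns name and rank;
         -- OUTER keeps the year even if the array is empty or NULL
         LATERAL VIEW OUTER inline(c.topcompanies) i
GROUP BY c.year
;
```

With the sample data from the question, this should give apple, samsung, SONY, NULL for 2015 and apple, samsung, SONY, LG for 2016. Each CASE expression yields the name only for the matching rank and NULL otherwise, and MIN ignores NULLs, so a rank with no matching entry stays NULL.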
    

    【Discussion】:

    • Thanks @Dudu Markovitz... Unfortunately I am running Hive 0.14 and min is not supported there: "FAILED: SemanticException [Error 10128]: Line 6:0 Not yet supported place for UDAF 'min'". Even though I am using lateral view outer explode, I still cannot get any data when rank4 has no value.
    • Try using a subquery: select min (case when i.rank = 1 then i.name end) as rank1 ... from (select i.* from companyrank c lateral view inline(topcompanies) i) i ;
    • Thanks Dudu... it works perfectly. The only concern is performance...
    • Here we go again... :-) Do you think that, with a reasonable optimizer, there is any performance difference between "select * from t" and "select * from (select * from (select * from (select * from (select * from ( select * from t) t) t) t) t) t;"?
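
For readers on Hive 0.14 who hit error 10128, the subquery workaround suggested in the comments can be written out in full roughly as follows. This is a sketch only: it assumes the table and column names from the question, adds the year column and a GROUP BY so that each year gets its own row (the comment's version omits both), and uses the alias t, which is an arbitrary choice.

```sql
-- Sketch of the subquery workaround for Hive 0.14:
-- the lateral view is moved into an inner query and the aggregation runs on its result.
SELECT   t.year
        ,MIN(CASE WHEN t.rank = '1' THEN t.name END) AS rank1
        ,MIN(CASE WHEN t.rank = '2' THEN t.name END) AS rank2
        ,MIN(CASE WHEN t.rank = '3' THEN t.name END) AS rank3
        ,MIN(CASE WHEN t.rank = '4' THEN t.name END) AS rank4  -- NULL when no rank-4 entry exists
FROM    (
          -- flatten the array: one row per (year, company)
          SELECT c.year, i.name, i.rank
          FROM   companyrank c
                 LATERAL VIEW inline(c.topcompanies) i
        ) t
GROUP BY t.year
;
```

The inner query does nothing beyond flattening the array, which is the point of the last comment above: a reasonable optimizer is not expected to treat the extra nesting level as additional work.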