【Question title】:Hive: multiple map rows to one map row
【Posted】:2020-08-06 20:34:00
【Question description】:

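I have the table below. I need to derive the distinct account_id values and, for each one, collect all rows of the other two columns into a single map. Please advise on how to do this. Thanks.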

Input

|  account_id    | fl_group_id          | fl_group_value          |
+----------------+----------------------+-------------------------+
| 1152956260987  | 10                   | 983                     |
| 1152956260987  | 12                   | 2144                    |
| 1152956260987  | 1                    | 82                      |

Expected output

|  account_id    | account_flg                        |
+----------------+------------------------------------+
| 1152956260987  | {"10":"983","12":"2144","1":"82"}  |

I have tried the following queries in Hive:

create table wf_test2 as select account_id, map(fl_group_id,fl_group_value) as account_flags from wf_test ;

select a.account_id, collect_set(a.account_flags) as account_flags from wf_test2 a where a.account_id='1152956260987' group by a.account_id;

But the output I get is array<map<string,string>> instead of map<string,string>:

| 1152956260987  | [{"10":"983"},{"12":"2144"},{"1":"82"}] |

SHOW CREATE TABLE wf_test2:

CREATE TABLE `wf_test2`(                           
   `account_id` string,                             
   `account_flags` map<string,string>)              
 ROW FORMAT SERDE                                   
   'org.apache.hadoop.hive.ql.io.orc.OrcSerde'      
 STORED AS INPUTFORMAT                              
   'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'  
 OUTPUTFORMAT                                       
   'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
 LOCATION                                           
   'hdfs://hive/wf_test2'           
 TBLPROPERTIES (                                    
   'bucketing_version'='2',                         
   'transactional'='true',                          
   'transactional_properties'='default',            
   'transient_lastDdlTime'='1596648078') 

【Discussion】:

  • Can you share the SHOW CREATE TABLE output for wf_test2? I cannot run the query select a.account_id,collect_set(a.account_flags)as account_flags from wf_test a where a.account_id='1152956260987' group by a.account_id ; as posted.
  • @smart_coder I have now added the CREATE TABLE DDL to the question.
  • You can do this with the Klout Brickhouse UDFs, using their custom collect function. Something like SELECT account_id, collect(id, value) FROM table GROUP BY account_id

Tags: sql hive hiveql


【Solution 1】:

Following @Hitobat's comment, the approach below may help. It was run in the Hive shell with brickhouse-0.6.0.jar (Maven repo: https://mvnrepository.com/artifact/com.klout/brickhouse).

-- Source table: three string columns, matching the insert and select on wf_test below
create table wf_test (account_id string, fl_group_id string, fl_group_value string) row format delimited  fields terminated by ',' lines terminated by '\n' stored as textfile LOCATION '/stackoverflow/data/hive/dwh/wf_test';

insert overwrite table wf_test values ("1152956260987","10","983"),("1152956260987","12","2144"),("1152956260987","1","82");

select * from wf_test;

-- 1152956260987    10  983
-- 1152956260987    12  2144
-- 1152956260987    1   82

add jar file:///home/sathya/Downloads/brickhouse-0.6.0.jar;

create temporary function collect as 'brickhouse.udf.collect.CollectUDAF';

select account_id, collect(fl_group_id,fl_group_value) from wf_test group by account_id;

--- 1152956260987   {"10":"983","1":"82","12":"2144"}

create table wf_test2 (account_id string, account_flags map<string,string>) row format delimited  fields terminated by ',' lines terminated by '\n' stored as textfile;

insert overwrite table wf_test2 select account_id, collect(fl_group_id,fl_group_value) from wf_test group by account_id;

select * from wf_test2;

-- 1152956260987    {"10":"983","1":"82","12":"2144"}
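If adding the Brickhouse jar is not an option, a similar result can probably be obtained with built-in Hive functions only. The following is a minimal sketch, not part of the original answer: it assumes wf_test has the three string columns shown above, that neither the keys nor the values contain ',' or ':', and that each fl_group_id appears at most once per account_id. It builds one "key:value" string per row, joins them with commas, and parses the result with str_to_map.

```sql
-- Sketch using only built-in functions (collect_list, concat_ws, str_to_map).
-- Assumption: fl_group_id and fl_group_value never contain ',' or ':',
-- because those characters are used as the pair and key/value delimiters.
select account_id,
       str_to_map(
         concat_ws(',', collect_list(concat(fl_group_id, ':', fl_group_value))),
         ',',   -- delimiter between key/value pairs
         ':'    -- delimiter between a key and its value
       ) as account_flags
from wf_test
group by account_id;
```

The account_flags column here is of type map<string,string>, so the same select could feed the insert overwrite into wf_test2. If the data may contain the delimiter characters, pick other delimiters or stay with the Brickhouse collect function.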

【Discussion】:
