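【Posted】: 2020-08-06 20:34:00
【Question】:
I have the table below, and I need to derive the distinct account_id values from it, with all rows of the remaining 2 columns collected into a map.
Input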
| account_id | fl_group_id | fl_group_value |
+----------------+----------------------+-------------------------+
| 1152956260987 | 10 | 983 |
| 1152956260987 | 12 | 2144 |
| 1152956260987 | 1 | 82 |
Expected output
| account_id | account_flg
+----------------+----------------------
| 1152956260987 | {"10":"983","12":"2144","1":"82"}
I have tried the following queries in Hive:
create table wf_test2 as select account_id, map(fl_group_id,fl_group_value) as account_flags from wf_test ;
select a.account_id,collect_set(a.account_flags)as account_flags from wf_test2 a where a.account_id='1152956260987' group by a.account_id ;
But the output I get is array<map<string,string>> instead of map<string,string>:
| 1152956260987 | [{"10":"983"},{"12":"2144"},{"1":"82"}] |
Output of SHOW CREATE TABLE wf_test2:
CREATE TABLE `wf_test2`(
`account_id` string,
`account_flags` map<string,string>)
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS INPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT
'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'
LOCATION
'hdfs://hive/wf_test2'
TBLPROPERTIES (
'bucketing_version'='2',
'transactional'='true',
'transactional_properties'='default',
'transient_lastDdlTime'='1596648078')
【Comments】:
-
Can you share the SHOW CREATE TABLE output for wf_test2? I am unable to run the query select a.account_id,collect_set(a.account_flags)as account_flags from wf_test a where a.account_id='1152956260987' group by a.account_id ; -
@smart_coder I have now added the "create table DDL" to the question
-
You can do this with the Klout Brickhouse UDFs, using their custom collect function. Something like SELECT account_id, collect(id, value) FROM table GROUP BY account_id.
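A minimal sketch of the suggestion above, applied to the wf_test table from the question. The jar path is hypothetical, and the class name brickhouse.udf.collect.CollectUDAF is taken from memory of the Brickhouse documentation, so check it against the version you install. The second query is an alternative that is not from the thread: it uses only built-in Hive functions (collect_list, concat_ws, str_to_map) and assumes neither column contains a ',' or ':' character.

```sql
-- Option 1: Brickhouse collect(key, value), which aggregates pairs into one map.
-- Hypothetical jar location; adjust to where the Brickhouse jar actually lives.
ADD JAR /path/to/brickhouse.jar;
CREATE TEMPORARY FUNCTION collect AS 'brickhouse.udf.collect.CollectUDAF';

SELECT account_id,
       collect(fl_group_id, fl_group_value) AS account_flags
FROM wf_test
GROUP BY account_id;

-- Option 2: built-in functions only.
-- Build "key:value" strings, join them with ',', then parse them back into a map.
-- str_to_map uses ',' between pairs and ':' between key and value by default.
SELECT account_id,
       str_to_map(
           concat_ws(',', collect_list(concat(fl_group_id, ':', fl_group_value)))
       ) AS account_flags
FROM wf_test
GROUP BY account_id;
```

Both queries read directly from wf_test rather than wf_test2, since collecting already-built single-entry maps is what produced the array<map<string,string>> result. Either should return a single map<string,string> per account_id.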