【Question title】: How to convert a JSON array into multiple columns in Hive
【Posted】: 2020-07-15 05:03:13
【Question description】:

Example: a Hive table has a JSON array column (type: string), such as:

"[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}......]"

How can I convert it to:

name      age
alice     14

using Hive SQL? I tried lateral view explode, but it does not work. Thanks a lot!

【Question discussion】:

  • Is the column of type STRING?
  • Yes, it is a string type.

Tags: sql json hive hiveql


【Solution 1】:

Here is a working example of how to parse it in Hive. Adapt it to your table and debug it on real data; see the comments in the code:

with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)


select id, 
       max(case when field_map['field'] = 'name' then field_map['value'] end) as name,
       max(case when field_map['field'] = 'age'  then field_map['value'] end) as age        --do the same for all fields 
from
(
select t.id,
       t.str as original_string,
       str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) field_map --remove extra characters and convert to map
  from your_table t
       lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [ and ], replace "}," with "}|", then split on | and explode
) s 
group by id --aggregate in single row
; 

Result:

OK
id      name    age
1       alice   14
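
Since the answer suggests debugging on real data, it can help to look at the intermediate values before the aggregation. The query below is a minimal sketch that reuses the same one-row sample table and the same expressions as the first query; only the column aliases raw_fragment, cleaned and field_map are introduced here for illustration. Note that this regex-based approach assumes the values themselves contain no |, comma, colon, brace or double-quote characters, because those are used as delimiters or stripped.

```sql
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}]'
) as (id,str) --same kind of one-row sample table as above. Use your table instead
)

select t.id,
       a.field as raw_fragment, --one JSON object per row after split and explode
       regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','') as cleaned, --first fragment becomes field:name,value:alice
       str_to_map(regexp_replace(regexp_replace(trim(a.field),', +',','),'\\{|\\}|"','')) as field_map --str_to_map splits pairs on ',' and key/value on ':' by default
  from your_table t
       lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field
;
```

If a row of your real data shows an unexpected raw_fragment or cleaned value, the split pattern or the cleanup regex is the place to adjust.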

Another approach, using get_json_object:

with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)


select id, 
       max(case when field = 'name' then value end) as name,
       max(case when field = 'age'  then value end) as age        --do the same for all fields 
from
(
select t.id,
       get_json_object(trim(a.field),'$.field') field,
       get_json_object(trim(a.field),'$.value') value
  from your_table t
       lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --remove [ and ], replace "}," with "}|", then split on | and explode
) s 
group by id --aggregate in single row
;

Result:

OK
id      name    age
1       alice   14
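
As a possible variation on the get_json_object query, Hive's built-in json_tuple UDTF can extract both keys from each fragment in one pass. This is only a sketch under the same assumptions as above (same sample table, same split logic); the aliases j, fld and val are made up for illustration, and it has not been run against your data.

```sql
with your_table as (
select stack(1,
1,
'[{"field":"name", "value":"alice"}, {"field":"age", "value":"14"}, {"field":"something_else", "value":"somevalue"}]'
) as (id,str) --one row table with id and string with json. Use your table instead of this example
)

select t.id,
       max(case when j.fld = 'name' then j.val end) as name,
       max(case when j.fld = 'age'  then j.val end) as age        --do the same for all fields
  from your_table t
       lateral view outer explode(split(regexp_replace(regexp_replace(str,'\\[|\\]',''),'\\},','}|'),'\\|')) a as field --one JSON object per row
       lateral view outer json_tuple(trim(a.field),'field','value') j as fld, val --extract both keys from each object
group by t.id --aggregate in single row
;
```

Because json_tuple parses each fragment once, it may be preferable when many keys are read from the same object; for two keys the difference is mostly stylistic.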

【Discussion】:
