【Question title】: Hive: Merge two maps into one column
【Posted】: 2023-03-06 08:17:02
【Question description】:

I have a Hive table:

create table mySource(
    col_1   map<string, string>,
    col_2   map<string, string>
)

This is what a record looks like:

col_1                col_2
{"a":1, "b":"2"}     {"c":3, "d":"4"}

My target table looks like this:

create table myTarget(
        my_col   map<string, string>
    )

Now I want to merge the two columns of mySource into a single map and load it into my target table. Basically, I want to write something like this:

insert into myTarget
    select
        some_method(col_1, col_2) as my_col
    from mySource;

Is there a built-in function in Hive that can do this? I tried a few things with collect_set but got a lot of errors.

【Comments】:

    Tags: java dictionary hive hiveql hive-udf


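    【Solution 1】:

    A solution that uses only built-in functions: explode both maps, UNION ALL the results, collect an array of key:value strings, join the array with ',', and convert the resulting string to a map with str_to_map: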

    with mytable as (--Use your table instead of this
    select 
    map('a','1', 'b','2') as col_1, map('c','3', 'd','4') as col_2
    )
    
    select str_to_map(concat_ws(',',collect_set(concat(key,':',val)))) as mymap
    from
    (
    select m1.key, m1.val 
      from mytable
           lateral view explode(col_1) m1 as key, val
    union all
    select m2.key, m2.val 
      from mytable
           lateral view explode(col_2) m2 as key, val
    )s       
    ;
    

    Result:

    mymap
    
    {"a":"1","b":"2","c":"3","d":"4"}  
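
    Note that the query above has no GROUP BY, so on a table with several rows it folds every key of every row into one map. A minimal sketch of a per-row variant that also loads the target table follows. It assumes mySource has a unique row key named id, which is hypothetical and not part of the schema in the question. It also assumes that keys and values contain neither ',' nor ':', because those are the delimiters str_to_map splits on by default.

    ```sql
    -- Assumption: mySource has a unique row key column `id` (hypothetical; not in the original schema).
    -- Assumption: no key or value contains ',' or ':' (the default str_to_map delimiters).
    insert into table myTarget
    select str_to_map(concat_ws(',', collect_set(concat(s.key, ':', s.val)))) as my_col
    from
    (
        -- one output row per (id, key, value) entry of col_1
        select t.id, m1.key, m1.val
          from mySource t
               lateral view explode(t.col_1) m1 as key, val
        union all
        -- one output row per (id, key, value) entry of col_2
        select t.id, m2.key, m2.val
          from mySource t
               lateral view explode(t.col_2) m2 as key, val
    ) s
    group by s.id;   -- rebuild one merged map per source row
    ```

    Rows whose two maps are both empty or NULL produce no exploded entries and therefore do not appear in the output. If the same key occurs in both maps with different values, both pairs reach str_to_map, so do not rely on which value survives without testing it on your Hive version.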
    

    Using the brickhouse library makes this much easier:

    ADD JAR /path/to/jar/brickhouse-0.7.1.jar;
    CREATE TEMPORARY FUNCTION COMBINE AS 'brickhouse.udf.collect.CombineUDF';
    
    select combine(col_1, col_2) as mymap from mytable;
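
    To connect this back to the tables in the question, the same call could feed the target table directly. This is only a sketch: it assumes the JAR has been added and the COMBINE function registered in the current session as shown above, and that the jar path is replaced with a real one.

    ```sql
    -- Assumes ADD JAR and CREATE TEMPORARY FUNCTION COMBINE were already run in this session.
    insert into table myTarget
    select combine(col_1, col_2) as my_col
    from mySource;   -- one merged map per source row, no explode or GROUP BY needed
    ```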
    

    【Discussion】:
