【Question title】: Hive - Reformat data structure
【Posted】: 2022-01-21 23:35:42
【Question description】:

I have a sample of Hive data:

costumer xx_var yy_var branchflow
{"customer_no":"239230293892839892","acct":["2324325","23425345"]} 23 3 [{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]

I want to turn it into something like this:

costumer_no acct xx_var yy_var branchflow
239230293892839892 2324325 23 3 [1,2,3,4,5,6,6,6,4]
239230293892839892 23425345 23 3 [1,2,3,4,5,6,6,6,99,4]

I tried the following query, but the output has the wrong format.

SELECT customer.customer_no,
       acct,
       xx_var,
       yy_var,
       bi_acctno,
       values_bi
FROM struct_test LATERAL VIEW explode(customer.acct) acct AS acctno
LATERAL VIEW explode(brancflow.acctno) bia as bi_acctno
LATERAL VIEW explode(brancflow.value) biv as values_bi
where bi_acctno = acctno

Does anyone know how to solve this?

【Question discussion】:

    Tags: sql json hive hiveql


    【Solution 1】:

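    Use json_tuple to extract the JSON elements. If an element is an array, json_tuple still returns it as a string, so remove the square brackets, split the string, and explode the resulting array. See the comments in the demo code.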
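
    The strip-and-split steps can be checked on their own before they are wired into lateral views. The query below is a minimal sketch, assuming a Hive version that accepts SELECT without a FROM clause; the literals are shortened stand-ins for the sample data, and the column aliases (accts_stripped, accts_array, branchflow_elements) are illustrative names only.

    ```sql
    select
        -- strip the leading [, the trailing ] and every double quote;
        -- for this literal that leaves 2324325,23425345
        regexp_replace('["2324325","23425345"]', '^\\[|"|\\]$', '') as accts_stripped,
        -- split the stripped string on commas to get an array<string> that explode() accepts
        split(regexp_replace('["2324325","23425345"]', '^\\[|"|\\]$', ''), ',') as accts_array,
        -- strip only the outer [ ], then split on commas that sit between } and {,
        -- so the commas inside each "value" array are left alone
        split(regexp_replace('[{"acctno":"1","value":[1,2]},{"acctno":"2","value":[3,4]}]',
                             '^\\[|\\]$', ''),
              '(?<=\\}),(?=\\{)') as branchflow_elements;
    ```

    The lookaround pattern only matches a comma that directly follows } and directly precedes {, so it relies on the JSON having no whitespace between array elements. Likewise, removing every double quote is only safe while the acct values themselves contain no quotes or commas.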

    Demo:

    with mytable as (--demo data, use your table instead of this CTE
    select '{"customer_no":"239230293892839892","acct":["2324325","23425345"]}' as costumer,    
           23 xx_var,   3 yy_var,   
           '[{"acctno":"2324325","value":[1,2,3,4,5,6,6,6,4]},{"acctno":"23425345","value":[1,2,3,4,5,6,6,6,99,4]}]' branchflow
    )
    
    select c.customer_no, 
           a.acct,  
           t.xx_var,    t.yy_var, 
           get_json_object(b.acct_branchflow,'$.value') value
      from mytable t
           --extract customer_no and acct array
           lateral view json_tuple(t.costumer, 'customer_no', 'acct') c as customer_no, accts
           --remove [] and " and explode array of acct
           lateral view explode(split(regexp_replace(c.accts,'^\\[|"|\\]$',''),',')) a as acct
           --remove outer [] and explode array of JSON objects
           lateral view explode(split(regexp_replace(t.branchflow,'^\\[|\\]$',''),'(?<=\\}),(?=\\{)')) b as acct_branchflow
    --this will remove duplicates after lateral view: need only matching acct
     where get_json_object(b.acct_branchflow,'$.acctno') = a.acct
    

    Result:

    customer_no         acct        xx_var  yy_var  value
    239230293892839892  2324325     23      3       [1,2,3,4,5,6,6,6,4]
    239230293892839892  23425345    23      3       [1,2,3,4,5,6,6,6,99,4]
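
    The demo above treats both columns as JSON strings. The query in the question uses dot access (customer.customer_no), which suggests the columns may instead be native complex types. The following is a hedged sketch for that case only, assuming customer is struct<customer_no:string,acct:array<string>> and branchflow is array<struct<acctno:string,value:array<int>>>; these types, and the aliases bi_acctno and bi_values, are assumptions and not confirmed by the question.

    ```sql
    select t.customer.customer_no as customer_no,
           a.acctno               as acct,
           t.xx_var,
           t.yy_var,
           b.bi_values            as branchflow
      from struct_test t
           --one row per account number in the customer struct
           lateral view explode(t.customer.acct) a as acctno
           --inline() turns an array of structs into one row per struct, one column per field
           lateral view inline(t.branchflow) b as bi_acctno, bi_values
     --keep only the branchflow entry that belongs to the exploded account
     where b.bi_acctno = a.acctno
    ```

    With inline(), acctno and value stay paired on the same row, whereas exploding them in two separate lateral views loses that pairing.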
    

    【Discussion】:
