【Question title】: Import JSON Data to table with schema
【Posted】: 2021-10-11 02:49:10
【Question】:

I have JSON objects stored in a single column (as strings). I want to convert them into a table with a schema.

JSON_DATA
{"id":"ksah2132","connections":{"structure":["123","456","789"]},"options":[{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}]}
{"id":"ksah3321","connections":{"structure":["567","332","435"]},"options":[{"id":"AA133","type":"optionA"},{"id":"BB156","type":"optionB"},{"id":"CC445","type":"optionC"}]}

Table with schema:

CREATE TABLE `sandboxabc.raw_data`(`options` array<struct<id:string,type:string>>, `connections` struct<structure:array<string>>, `id` string)

How can I INSERT OVERWRITE into the new table using Spark SQL? My code:

INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
    from_json (JSON_DATA,'$.options') AS options
    ,from_json (JSON_DATA,'$.connections') AS connections
    ,from_json (JSON_DATA,'$.id') AS id
FROM
    sandboxabc.raw_data

Sample output:

id connection option
ksah2132 {"structure":["123","456","789"]} [{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}

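【Comments】:

  • 1. The JSON_DATA string is malformed; it needs an opening and a closing curly brace { } (one at the start, one at the end).
  • 2. Do you have the data in the raw table as shown in your CREATE statement, and do you want to read the raw table and write it to "structured_data" in structured form? If so, could you post a sample of the structured output you need?
  • @Pradeepyadav Yes, a sample has been provided.

Tags: sql json apache-spark apache-spark-sql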


【Solution 1】:

The Spark SQL code below should work for you. Note that Hive support must be enabled and the Hive-related jars must be on the classpath.

INSERT OVERWRITE TABLE sandboxabc.structured_data
  SELECT
    id,
    -- parse the extracted JSON text into the typed columns of the target table
    from_json(connection, 'struct<structure:array<string>>') as connection,
    from_json(options, 'array<struct<id:string,type:string>>') as options
      FROM (
        select
           get_json_object(JSON_DATA,'$.id') as id,
           -- the JSON key is "connections"; '$.connection' would return NULL
           get_json_object(JSON_DATA,'$.connections') as connection,
           get_json_object(JSON_DATA,'$.options') as options
      FROM sandboxabc.raw_data) t
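
As a possible alternative, the whole JSON document can be parsed in one pass by giving from_json a schema that covers all three fields, and then selecting the fields from the resulting struct. This is only a sketch under assumptions: the alias names parsed and t are made up for illustration, and sandboxabc.structured_data is assumed to have its columns in the order id, connections, options (as in the sample output). INSERT OVERWRITE without a column list matches columns by position, so the SELECT order has to follow the target table's column order.

```sql
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
    parsed.id,           -- string
    parsed.connections,  -- struct<structure:array<string>>
    parsed.options       -- array<struct<id:string,type:string>>
FROM (
    SELECT
        -- the second argument is a schema in DDL format, not a JSON path
        from_json(
            JSON_DATA,
            'id STRING, connections STRUCT<structure: ARRAY<STRING>>, options ARRAY<STRUCT<id: STRING, type: STRING>>'
        ) AS parsed
    FROM sandboxabc.raw_data
) t
```

This also shows why the query in the question does not work: from_json expects a schema as its second argument, whereas a path such as '$.options' is what get_json_object takes.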
