Posted: 2021-10-11 02:49:10
Question:
I have JSON objects stored as strings in a single column. I want to convert them into a table with a schema.
| JSON_DATA |
|---|
| {"id":"ksah2132","connections":{"structure":["123","456","789"]},"options":[{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}]} |
| {"id":"ksah3321","connections":{"structure":["567","332","435"]},"options":[{"id":"AA133","type":"optionA"},{"id":"BB156","type":"optionB"},{"id":"CC445","type":"optionC"}]} |
Table with schema:
```sql
CREATE TABLE `sandboxabc.raw_data`(`options` array<struct<id:string,type:string>>, `connections` struct<structure:array<string>>, `id` string)
```
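Writing the nested column types by hand is error-prone. Spark SQL's `schema_of_json` can infer a schema string from one sample value, which can then be compared with, or used instead of, the types in the statement above. The sketch below passes the first sample row as a literal. The argument must be a literal string, not the `JSON_DATA` column. The exact printed form (letter case, spacing) depends on the Spark version, and inferred fields may come back in a different order than in the CREATE TABLE statement:

```sql
-- Infer a schema from one literal sample value.
-- schema_of_json needs a literal (foldable) string, not a column reference.
SELECT schema_of_json(
  '{"id":"ksah2132","connections":{"structure":["123","456","789"]},"options":[{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}]}'
) AS inferred_schema;
```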
How can I use Spark SQL to INSERT OVERWRITE this data into a new table? My code:
```sql
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
from_json (JSON_DATA,'$.options') AS options
,from_json (JSON_DATA,'$.connections') AS connections
,from_json (JSON_DATA,'$.id') AS id
FROM
sandboxabc.raw_data
```
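The following is a minimal sketch of one way to do this in Spark SQL. It assumes that `JSON_DATA` lives in `sandboxabc.raw_data`, as in the query above, and that `sandboxabc.structured_data` has the typed columns `options`, `connections`, `id` in that order, matching the column list of the CREATE TABLE statement. The second argument of `from_json` is a schema (for example a DDL string), not a JSON path such as `'$.options'`. Paths are what `get_json_object` takes, and that function always returns a string. The sketch therefore parses each row once with the full schema and then selects the fields of the resulting struct:

```sql
-- Assumes structured_data columns are (options, connections, id) in this order.
-- from_json takes a schema (here a DDL string), not a JSON path.
INSERT OVERWRITE TABLE sandboxabc.structured_data
SELECT
  parsed.options,
  parsed.connections,
  parsed.id
FROM (
  SELECT from_json(
           JSON_DATA,
           'options array<struct<id:string,type:string>>, connections struct<structure:array<string>>, id string'
         ) AS parsed
  FROM sandboxabc.raw_data
) t;
```

Without an explicit column list, `INSERT OVERWRITE ... SELECT` matches columns by position, not by name. For that reason, the SELECT list above follows the target table's column order.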
Sample output:
| id | connection | option |
|---|---|---|
| ksah2132 | {"structure":["123","456","789"]} | [{"id":"AA123","type":"optionA"},{"id":"BB123","type":"optionB"},{"id":"CC123","type":"optionC"}] |
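To compare a loaded table with the sample output, the typed columns can be rendered back to JSON text with `to_json`. The sketch below assumes that `sandboxabc.structured_data` has already been populated. The aliases `connection` and `option` only mirror the sample's header and are backquoted to avoid clashing with SQL keywords:

```sql
-- Render the struct and array-of-struct columns back as JSON strings
-- for a side-by-side comparison with the expected sample rows.
SELECT
  id,
  to_json(connections) AS `connection`,
  to_json(options)     AS `option`
FROM sandboxabc.structured_data;
```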
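Comments:
-
1. The JSON_DATA string is incorrect: it needs an opening and a closing curly brace { } (one at the start and one at the end).
-
2. Do you already have data in the raw table, as shown in your CREATE statement, and do you want to read the raw table and write it in a structured format into "structured_data"? If so, could you post a sample of the structured output you need?
-
@Pradeepyadav Yes, I've added a sample.
Tags: sql json apache-spark apache-spark-sql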