【发布时间】:2021-11-03 09:54:30
【问题描述】:
我有以下 json 数据:
{
"3200": {
"id": "3200",
"value": [
"cat",
"dog"
]
},
"2000": {
"id": "2000",
"value": [
"bird"
]
},
"2500": {
"id": "2500",
"value": [
"kitty"
]
},
"3650": {
"id": "3650",
"value": [
"horse"
]
}
}
这个数据的schema,我们用spark加载数据后的printSchema工具如下:
root
|-- 3200: struct (nullable = true)
| |-- id: string (nullable = true)
| |-- value: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- 2000: struct (nullable = true)
| |-- id: string (nullable = true)
| |-- value: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- 2500: struct (nullable = true)
| |-- id: string (nullable = true)
| |-- value: array (nullable = true)
| | |-- element: string (containsNull = true)
|-- 3650: struct (nullable = true)
| |-- id: string (nullable = true)
| |-- value: array (nullable = true)
| | |-- element: string (containsNull = true)
我想得到以下数据框
id value
3200 cat
2000 bird
2500 kitty
3200 dog
3650 horse
如何进行解析以获得预期的输出
【问题讨论】:
标签: json apache-spark apache-spark-sql