【Posted】:2017-09-19 14:56:45
【Question】:
I have a SQL query that returns a dataset like the following, available in a DataFrame:
id,type,name,ppu,batter.id,batter.type,topping.id,topping.type
101,donut,cake,0_55,1001,Regular,5001,None
101,donut,cake,0_55,1002,Chocolate,5001,None
101,donut,cake,0_55,1003,Blueberry,5001,None
101,donut,cake,0_55,1004,Devil's Food,5001,None
101,donut,cake,0_55,1001,Regular,5002,Glazed
101,donut,cake,0_55,1002,Chocolate,5002,Glazed
101,donut,cake,0_55,1003,Blueberry,5002,Glazed
101,donut,cake,0_55,1004,Devil's Food,5002,Glazed
101,donut,cake,0_55,1001,Regular,5003,Chocolate
101,donut,cake,0_55,1002,Chocolate,5003,Chocolate
101,donut,cake,0_55,1003,Blueberry,5003,Chocolate
101,donut,cake,0_55,1004,Devil's Food,5003,Chocolate
I need to convert it into a nested JSON structure like this:
{
  "id": "101",
  "type": "donut",
  "name": "Cake",
  "ppu": 0.55,
  "batter":
    [
      { "id": "1001", "type": "Regular" },
      { "id": "1002", "type": "Chocolate" },
      { "id": "1003", "type": "Blueberry" },
      { "id": "1004", "type": "Devil's Food" }
    ],
  "topping":
    [
      { "id": "5001", "type": "None" },
      { "id": "5002", "type": "Glazed" },
      { "id": "5003", "type": "Chocolate" }
    ]
}
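Is it possible to do this with a DataFrame aggregation, or do I have to write a custom transformation?
I found a similar question here, "Writing nested JSON in spark scala", but it does not have a fully correct answer.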
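For illustration, here is a minimal sketch of how the aggregation route might look, assuming Spark 2.x or later. The object name `NestedJsonSketch`, the helper `toNested`, and the local `SparkSession` are hypothetical and not part of the original question. The sketch also assumes that the value "0_55" encodes 0.55. Column names containing a dot have to be quoted with backticks, otherwise Spark reads them as struct field access.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_set, regexp_replace, sort_array, struct}

object NestedJsonSketch {

  // Collapse the flat rows into one row per donut, with the batter and
  // topping pairs gathered into arrays of structs.
  def toNested(flat: DataFrame): DataFrame =
    flat
      .groupBy(col("id"), col("type"), col("name"), col("ppu"))
      .agg(
        // collect_set removes the duplicates produced by the cross join;
        // sort_array gives a stable order (collect_set alone guarantees none).
        sort_array(collect_set(
          struct(col("`batter.id`").as("id"), col("`batter.type`").as("type"))
        )).as("batter"),
        sort_array(collect_set(
          struct(col("`topping.id`").as("id"), col("`topping.type`").as("type"))
        )).as("topping")
      )
      // Assumption: "0_55" stands for 0.55, so swap the underscore and cast.
      .withColumn("ppu", regexp_replace(col("ppu"), "_", ".").cast("double"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("nested-json-sketch")
      .getOrCreate()
    import spark.implicits._

    // Rebuild the 12 sample rows from the question (every batter x every topping).
    val batters = Seq("1001" -> "Regular", "1002" -> "Chocolate",
                      "1003" -> "Blueberry", "1004" -> "Devil's Food")
    val toppings = Seq("5001" -> "None", "5002" -> "Glazed", "5003" -> "Chocolate")
    val rows = for {
      (toppingId, toppingType) <- toppings
      (batterId, batterType)   <- batters
    } yield ("101", "donut", "cake", "0_55", batterId, batterType, toppingId, toppingType)

    val flat = rows.toDF("id", "type", "name", "ppu",
                         "batter.id", "batter.type", "topping.id", "topping.type")

    // One JSON document per donut.
    toNested(flat).toJSON.collect().foreach(println)

    spark.stop()
  }
}
```

With the sample data this should yield a single JSON document containing a `batter` array of four entries and a `topping` array of three. The `name` value keeps the casing of the source rows ("cake"); something like `initcap` could be applied if the capitalized form is required. Whether a plain aggregation is sufficient for real data depends on the flat rows really being a cross product of the two child lists, which is an assumption here.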
【Comments】:
Tags: json scala apache-spark dataframe nested