【Posted】: 2021-08-05 09:18:24
【Question】:
This is a bit hard for me to explain, so I'll do my best. Here is the given dataset:
| Name | Car Brand | Car Model | Car Color | Year Bought |
|---|---|---|---|---|
| Tom | Toyota | Corolla | Black | 2009 |
| Tom | Hyundai | Kona | Blue | 2010 |
| Tom | Kia | Soul | Red | 2011 |
| Bob | Mazda | CX-30 | Red | 2008 |
| Bob | BMW | X1 | Blue | 2014 |
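Using the given dataset, I want to condense it by name, collect each person's cars into a list, and write the result to a file as JSON objects, one object per line. For the dataset above, the output should look like this: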
{
"name": "Tom",
"Cars": [{
"CarSpecifications": {
"Brand": "Toyota",
"Model": "Corolla",
"Color": "Black"
},
"YearBought":2009
},
{
"CarSpecifications": {
"Brand": "Hyundai",
"Model": "Kona",
"Color": "Blue"
},
"YearBought":2010
},
{
"CarSpecifications": {
"Brand": "Kia",
"Model": "Soul",
"Color": "Red"
},
"YearBought":2011
}]
}
{
"name": "Bob",
"Cars": [{
"CarSpecifications": {
"Brand": "Mazda",
"Model": "CX-30",
"Color": "Red"
},
"YearBought":2008
},
{
"CarSpecifications": {
"Brand": "BMW",
"Model": "X1",
"Color": "Blue"
},
"YearBought":2014
}]
}
How can I accomplish these transformations using Scala and Spark DataFrames?
【Discussion】:
Tags: scala dataframe apache-spark
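One possible way to get the shape described in the question is to nest the car columns with `struct`, group by `Name`, and gather the nested rows with `collect_list`. The following is a minimal sketch, not a definitive solution. It assumes the column names are exactly those in the table above, that `Year Bought` is an integer column, and that Spark runs in local mode. The object name `CarsToJson`, the helpers `sampleData` and `nestCars`, and the output path `cars_by_owner` are hypothetical names chosen for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, collect_list, struct}

object CarsToJson {

  // Rebuilds the dataset from the question (assumed column names and types).
  def sampleData(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(
      ("Tom", "Toyota", "Corolla", "Black", 2009),
      ("Tom", "Hyundai", "Kona", "Blue", 2010),
      ("Tom", "Kia", "Soul", "Red", 2011),
      ("Bob", "Mazda", "CX-30", "Red", 2008),
      ("Bob", "BMW", "X1", "Blue", 2014)
    ).toDF("Name", "Car Brand", "Car Model", "Car Color", "Year Bought")
  }

  // One row per owner: name plus an array of {CarSpecifications, YearBought}.
  def nestCars(df: DataFrame): DataFrame = {
    // Inner struct holding the brand, model and color of a single car.
    val specs = struct(
      col("Car Brand").as("Brand"),
      col("Car Model").as("Model"),
      col("Car Color").as("Color")
    ).as("CarSpecifications")

    // Outer struct: one element of the "Cars" array.
    val car = struct(specs, col("Year Bought").as("YearBought"))

    df.groupBy("Name")
      .agg(collect_list(car).as("Cars"))
      .withColumnRenamed("Name", "name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cars-to-json")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()

    // Writes a directory of part files; each line is one JSON object.
    nestCars(sampleData(spark))
      .write
      .mode("overwrite")
      .json("cars_by_owner") // hypothetical output path

    spark.stop()
  }
}
```

Two caveats on this sketch: `write.json` produces compact JSON with one object per line rather than the pretty-printed layout shown in the question, and `collect_list` does not guarantee the order of cars within each list, so an explicit sort would be needed if the purchase order matters.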