Create a case class as follows:
case class TestData (location: String, name: String, value: String)
Dummy data:
// Assumes a SparkSession named `spark` is in scope (as in spark-shell)
import spark.implicits._ // enables toDS() and the $"col" syntax

val test = Seq(
  ("New York", "Jack", "jdhj"),
  ("Los Angeles", "Tom", "ff"),
  ("Chicago", "David", "ff"),
  ("Houston", "John", "dd"),
  ("Detroit", "Michael", "fff"),
  ("Chicago", "Andrew", "ddd"),
  ("Detroit", "Peter", "dd"),
  ("Detroit", "George", "dkdjkd")
)
  // convert each tuple to a TestData object
  .map(x => TestData(x._1, x._2, x._3))
  .toDS() // create a Dataset from the data above
Group by location and collect each group's (name, value) pairs to get the required output:
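import org.apache.spark.sql.functions.{collect_list, struct}

test.groupBy($"location")
  .agg(collect_list(struct("name", "value")).as("data"))
  .show(false) // false: do not truncate long cell values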
Output:
+-----------+--------------------------------------------+
|location   |data                                        |
+-----------+--------------------------------------------+
|Los Angeles|[[Tom,ff]]                                  |
|Detroit    |[[Michael,fff], [Peter,dd], [George,dkdjkd]]|
|Chicago    |[[David,ff], [Andrew,ddd]]                  |
|Houston    |[[John,dd]]                                 |
|New York   |[[Jack,jdhj]]                               |
+-----------+--------------------------------------------+
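
To see what the aggregation computes without starting Spark, the same grouping can be sketched with plain Scala collections. This is only an illustration, not the Spark code path: the object name GroupingSketch and the field grouped are made up for this example, and TestData is redeclared so the snippet stands on its own.

```scala
// Plain-Scala sketch of the grouping above; no Spark required.
// GroupingSketch and grouped are hypothetical names used only for illustration.
case class TestData(location: String, name: String, value: String)

object GroupingSketch {
  val rows: Seq[TestData] = Seq(
    TestData("New York", "Jack", "jdhj"),
    TestData("Los Angeles", "Tom", "ff"),
    TestData("Chicago", "David", "ff"),
    TestData("Houston", "John", "dd"),
    TestData("Detroit", "Michael", "fff"),
    TestData("Chicago", "Andrew", "ddd"),
    TestData("Detroit", "Peter", "dd"),
    TestData("Detroit", "George", "dkdjkd")
  )

  // Group by location and keep only (name, value) pairs per group,
  // which is roughly what collect_list(struct("name", "value")) yields.
  val grouped: Map[String, Seq[(String, String)]] =
    rows.groupBy(_.location).map { case (location, rs) =>
      location -> rs.map(r => (r.name, r.value))
    }

  def main(args: Array[String]): Unit =
    grouped.foreach { case (location, data) =>
      println(s"$location -> ${data.mkString("[", ", ", "]")}")
    }
}
```

In this sketch each group keeps the input order of its rows. In Spark, the order of elements produced by collect_list is generally not guaranteed after a shuffle, so sort explicitly if the order within each group matters.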