[Posted]: 2017-05-28 09:02:09
[Question]:
I have the DataFrame shown below, and I would appreciate help getting its contents into the two output formats that follow.
Input:
+----------+-----------+---------+
|customerId|transHeader|transLine|
+----------+-----------+---------+
|1001      |1001aa     |1001aa1  |
|1001      |1001aa     |1001aa2  |
|1001      |1001aa     |1001aa3  |
|1001      |1001aa     |1001aa4  |
|1002      |1002bb     |1002bb1  |
|1002      |1002bb     |1002bb2  |
|1002      |1002bb     |1002bb3  |
|1002      |1002bb     |1002bb4  |
|1003      |1003cc     |1003cc1  |
|1003      |1003cc     |1003cc2  |
|1003      |1003cc     |1003cc3  |
+----------+-----------+---------+
Expected output set 1:
customerId headerLineMapGroup
1001 Map(1001aa -> (1001aa1, 1001aa2, 1001aa3, 1001aa4))
1002 Map(1002bb -> (1002bb1, 1002bb2, 1002bb3, 1002bb4))
1003 Map(1003cc -> (1003cc1, 1003cc2, 1003cc3))
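One possible way to get this shape, offered as a sketch rather than a definitive answer. It assumes Spark 2.x or later, that `customerId` is stored as an integer, and that a local `SparkSession` is created just for the demo; the names `spark`, `df`, `lines` and `out1` are mine, not from the question. The idea is to group by customer and header, collect the lines into an array, and then wrap header and array in a map column with the built-in `map` function.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, map, sort_array}

// Assumption: a local session for experimenting; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("header-line-map").getOrCreate()
import spark.implicits._

// Sample data copied from the input table (customerId assumed to be an Int).
val df = Seq(
  (1001, "1001aa", "1001aa1"), (1001, "1001aa", "1001aa2"),
  (1001, "1001aa", "1001aa3"), (1001, "1001aa", "1001aa4"),
  (1002, "1002bb", "1002bb1"), (1002, "1002bb", "1002bb2"),
  (1002, "1002bb", "1002bb3"), (1002, "1002bb", "1002bb4"),
  (1003, "1003cc", "1003cc1"), (1003, "1003cc", "1003cc2"),
  (1003, "1003cc", "1003cc3")
).toDF("customerId", "transHeader", "transLine")

val out1 = df
  .groupBy("customerId", "transHeader")
  // collect_list does not guarantee ordering, so sort the array to get a stable result.
  .agg(sort_array(collect_list("transLine")).as("lines"))
  // Build a single-entry map column: transHeader -> array of transLine values.
  .select(col("customerId"), map(col("transHeader"), col("lines")).as("headerLineMapGroup"))

out1.show(false)
```

Note that this yields one row per (customerId, transHeader) pair. In the sample data every customer has exactly one header, so that matches the expected output; if a customer could have several headers, merging them into one map per customer would need an extra step (for example `map_from_entries`, which is available from Spark 2.4).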
Expected output set 2:
customerId headerLineListOfMapGroup
1001 List[ Map(1001aa -> 1001aa1), Map(1001aa -> 1001aa2), Map(1001aa -> 1001aa3), Map(1001aa -> 1001aa4) ]
1002 List[ Map(1002bb -> 1002bb1), Map(1002bb -> 1002bb2), Map(1002bb -> 1002bb3), Map(1002bb -> 1002bb4) ]
1003 List[ Map(1003cc -> 1003cc1), Map(1003cc -> 1003cc2), Map(1003cc -> 1003cc3) ]
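For the list-of-maps shape, a minimal sketch under the same kind of assumptions (Spark 2.x or later, integer `customerId`, a local demo `SparkSession`; `spark`, `df` and `out2` are illustrative names of my own). Here each row is first turned into a single-entry map with `map`, and the maps are then gathered per customer with `collect_list`.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, collect_list, map}

// Assumption: a local session for experimenting; in spark-shell this returns the existing one.
val spark = SparkSession.builder().master("local[*]").appName("header-line-list").getOrCreate()
import spark.implicits._

// Sample data copied from the input table (customerId assumed to be an Int).
val df = Seq(
  (1001, "1001aa", "1001aa1"), (1001, "1001aa", "1001aa2"),
  (1001, "1001aa", "1001aa3"), (1001, "1001aa", "1001aa4"),
  (1002, "1002bb", "1002bb1"), (1002, "1002bb", "1002bb2"),
  (1002, "1002bb", "1002bb3"), (1002, "1002bb", "1002bb4"),
  (1003, "1003cc", "1003cc1"), (1003, "1003cc", "1003cc2"),
  (1003, "1003cc", "1003cc3")
).toDF("customerId", "transHeader", "transLine")

val out2 = df
  .groupBy("customerId")
  // One single-entry map (transHeader -> transLine) per input row, collected into an array.
  // The order of elements inside the array is not guaranteed after the shuffle.
  .agg(collect_list(map(col("transHeader"), col("transLine"))).as("headerLineListOfMapGroup"))

out2.show(false)
```

Because `collect_list` gives no ordering guarantee, the maps may not come back in the order shown in the expected output; if the order matters, it would have to be imposed separately.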
[Comments]:
- Could you please add the data as text and remove the image, so that it can be searched and copied?
Tags: scala apache-spark-sql spark-dataframe rdd