【Posted】:2021-04-18 20:23:54
【Question】:
rrr = sc.parallelize([1, 2, 3])
fff = sc.parallelize([5, 6, 7, 8])
test = rrr.cartesian(fff)
Here is the content of test:
[(1, 5),(1, 6),(1, 7),(1, 8),
(2, 5),(2, 6),(2, 7),(2, 8),
(3, 5),(3, 6),(3, 7),(3, 8)]
Is there a way to preserve the order of the values after calling groupByKey?
test.groupByKey().mapValues(list).take(3)
In the output, the values in each list come back in a random order:
Out[255]: [(1, [8, 5, 6, 7]), (2, [5, 8, 6, 7]), (3, [6, 8, 7, 5])]
The desired output is:
[(1, [5,6,7,8]), (2, [5,6,7,8]), (3, [5,6,7,8])]
How can I achieve this?
【Discussion】:
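One possible approach, offered as a sketch rather than a definitive answer: groupByKey involves a shuffle and makes no promise about the order of values within a group, so the order has to be restored afterwards. If ascending order is all that is needed, as in the desired output above, test.groupByKey().mapValues(sorted) should be enough. To keep the original order of fff in general, one option is to attach each element's position with zipWithIndex before the cartesian product and sort by that position after grouping. The helper name values_in_original_order below is hypothetical and not part of Spark; the Spark calls are shown only as comments because they assume the sc, rrr and fff from the question.

```python
# Sketch only: `values_in_original_order` is a hypothetical helper, not a Spark API.
def values_in_original_order(indexed_values):
    """Take an iterable of (position, value) pairs and return the values sorted by position."""
    return [value for _, value in sorted(indexed_values, key=lambda pair: pair[0])]


# Assumed usage with the RDDs from the question (needs a live SparkContext `sc`):
#   indexed = fff.zipWithIndex().map(lambda vi: (vi[1], vi[0]))  # (position, value)
#   grouped = rrr.cartesian(indexed).groupByKey().mapValues(values_in_original_order)
#   sorted(grouped.collect())  # sort by key, since key order is not guaranteed either

# Local check that mimics one shuffled group, e.g. the values of key 1 arriving as 8, 5, 6, 7:
shuffled_group = [(3, 8), (0, 5), (1, 6), (2, 7)]
restored = values_in_original_order(shuffled_group)
```

Carrying the position along with each value costs a little extra data in the shuffle, but it does not depend on the values themselves being sortable in the desired order.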