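【Posted at】: 2022-01-25 10:57:26
【Question】:
I need to read the comma-separated lines given below and produce a key-value pair RDD as shown in the output: the first field of each line is the key, and every label that follows becomes a value, with the numbers in between dropped. I am new to this, so any guidance is welcome.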
Input:
R-001, A1, 10, A2, 20, A3, 30
R-002, X1, 20, Y2, 10
R-003, Z4, 30, Z10, 5, N12, 38
Output:
R-001, A1
R-001, A2
R-001, A3
R-002, X1
R-002, Y2
R-003, Z4
R-003, Z10
R-003, N12
Code:
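# Assumes `spark` is a SparkContext; with a SparkSession, call spark.sparkContext.parallelize instead.
lines = spark.parallelize([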
"R-001, A1, 10, A2, 20, A3, 30",
"R-002, X1, 20, Y2, 10",
"R-003, Z4, 30, Z10, 5, N12, 38"])
【Discussion】:
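One possible approach, sketched under the assumption that every line has the layout `id, key, value, key, value, ...` and that the numeric values are simply discarded: split each line, take the first field as the record id, and emit one pair per key with `flatMap`. The helper names `to_pairs` and `build_pairs_rdd` are hypothetical and only used for illustration. The splitting logic is plain Python, so it can be checked without a Spark cluster.

```python
def to_pairs(line):
    """Turn one comma-separated line into a list of (record_id, key) pairs."""
    fields = [field.strip() for field in line.split(",")]
    record_id, rest = fields[0], fields[1:]
    # Assumption: `rest` alternates key, value, key, value, ...
    # so every second item starting at index 0 is a key.
    return [(record_id, key) for key in rest[::2]]


def build_pairs_rdd(lines_rdd):
    """Apply to_pairs to an RDD of strings and flatten the per-line lists."""
    return lines_rdd.flatMap(to_pairs)


# Local check of the splitting logic, no Spark required.
sample = [
    "R-001, A1, 10, A2, 20, A3, 30",
    "R-002, X1, 20, Y2, 10",
    "R-003, Z4, 30, Z10, 5, N12, 38",
]
for line in sample:
    for record_id, key in to_pairs(line):
        print(f"{record_id}, {key}")
```

With the `lines` RDD from the question, `build_pairs_rdd(lines).collect()` should return the same pairs as a list of tuples. `flatMap` is used rather than `map` because each input line yields several output records.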
Tags: python-3.x apache-spark pyspark rdd