Posted: 2019-04-22 14:35:26
Question:
My DataFrame looks like this:
+-------------------+-------------+
|        Nationality|    continent|
+-------------------+-------------+
|       Turkmenistan|         Asia|
|         Azerbaijan|         Asia|
|             Canada|North America|
|         Luxembourg|       Europe|
|             Gambia|       Africa|
+-------------------+-------------+
The output I want looks like this:
Map(Gibraltar -> Europe, Haiti -> North America)
So I'm trying to convert the DataFrame into a
scala.collection.mutable.Map[String, String]()
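One direct way to get there is to collect the rows on the driver and copy them into the map. The sketch below is a minimal one. It assumes the DataFrame is small enough to collect, that column 0 holds the key (Nationality) and column 1 holds the value (continent), and that a local SparkSession is acceptable for the demo. The names DataFrameToMutableMap and toMutableMap, and the stand-in nationalityDF built from the sample rows above, are hypothetical:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.collection.mutable

object DataFrameToMutableMap {
  // Hypothetical helper: collects a two-column DataFrame into a driver-side mutable Map.
  // Assumption: column 0 is the key, column 1 is the value.
  // If a key occurs more than once, the last row collected overwrites earlier ones.
  def toMutableMap(df: DataFrame): mutable.Map[String, String] = {
    val pairs = df.collect().map(row => row.getString(0) -> row.getString(1))
    mutable.Map.empty[String, String] ++= pairs
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder().master("local[*]").appName("df-to-map").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for nationalityDF, built from the sample rows
    val nationalityDF = Seq(
      ("Turkmenistan", "Asia"),
      ("Azerbaijan", "Asia"),
      ("Canada", "North America"),
      ("Luxembourg", "Europe"),
      ("Gambia", "Africa")
    ).toDF("Nationality", "continent")

    val countryToContinent = toMutableMap(nationalityDF)
    println(countryToContinent("Canada")) // North America
    spark.stop()
  }
}
```

Because every row is brought to the driver, this only suits lookup tables that fit comfortably in driver memory.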
I'm using the following code:
var encoder = Encoders.product[(String, String)]
val countryToContinent = scala.collection.mutable.Map[String, String]()
var mapped = nationalityDF.mapPartitions((it) => {
....
....
countryToContinent.toIterator
})(encoder).toDF("Nationality", "continent").as[(String, String)](encoder)
val map = mapped.rdd.groupByKey.collect.toMap
But the resulting map looks like this:
Map(Gibraltar -> CompactBuffer(Europe), Haiti -> CompactBuffer(North America))
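The CompactBuffer comes from groupByKey. On a pair RDD it returns RDD[(K, Iterable[V])], and the Iterable Spark builds there is a CompactBuffer, so collect.toMap yields a Map[String, Iterable[String]]. The plain-Scala sketch below reproduces that shape without Spark, using the standard groupBy as a stand-in for groupByKey, and then unwraps each group. The object and value names are hypothetical, and taking .head assumes each country has exactly one continent:

```scala
object CompactBufferShape {
  // Stand-in for mapped.rdd.groupByKey.collect.toMap:
  // each key is paired with an Iterable of all its values.
  val grouped: Map[String, Iterable[String]] =
    Seq("Gibraltar" -> "Europe", "Haiti" -> "North America")
      .groupBy(_._1)
      .map { case (country, rows) => country -> rows.map(_._2) }

  // Unwrap each group by keeping its first value.
  // If a country had several values, the others would be silently dropped.
  val flat: Map[String, String] =
    grouped.map { case (country, continents) => country -> continents.head }

  def main(args: Array[String]): Unit = {
    println(flat("Haiti")) // North America
  }
}
```

In Spark the equivalent unwrapping would be mapped.rdd.groupByKey.mapValues(_.head).collect.toMap, although grouping only to pull a single element back out is extra shuffle work.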
How can I get a plain hash map as the result, without the CompactBuffer wrappers?
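A sketch that skips the grouping step altogether, assuming each country maps to a single continent: collectAsMap (available on pair RDDs through PairRDDFunctions) returns the key-value pairs to the driver as a Map and keeps only one value per duplicated key. The Dataset below is a hypothetical stand-in for the question's mapped, and CollectAsMapExample and toCountryMap are illustrative names:

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

object CollectAsMapExample {
  // Hypothetical helper: returns (country, continent) pairs to the driver as an
  // immutable Map without grouping. collectAsMap keeps one value per duplicated key.
  def toCountryMap(ds: Dataset[(String, String)]): Map[String, String] =
    ds.rdd.collectAsMap().toMap

  def main(args: Array[String]): Unit = {
    // Local session for illustration only
    val spark = SparkSession.builder().master("local[*]").appName("collect-as-map").getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the question's `mapped` Dataset
    val mapped: Dataset[(String, String)] =
      Seq("Gibraltar" -> "Europe", "Haiti" -> "North America").toDS()

    println(toCountryMap(mapped)("Haiti")) // North America
    spark.stop()
  }
}
```

Starting from the DataFrame instead, nationalityDF.as[(String, String)].collect().toMap should give the same kind of Map (with spark.implicits._ in scope), since as on a tuple type binds columns by position.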
Discussion:
Tags: scala apache-spark apache-spark-sql