Applying matching function to each RDD element: answers

【Question title】: Applying matching function to each RDD element
【Posted】: 2018-01-20 17:59:08
【Question】:

I have this:

(0,List(pablo, luca))
(1,List(marco))
(3,List(anna))
(2,List(fobi))

I want to replace each Int (0, 1, 2, 3) with a corresponding string (for example "zero", "uno", "due", "tree"):

(zero,List(pablo, luca))
(uno,List(marco))
(tree,List(anna))
(due,List(fobi))

To do this, I am using the following:

finalCommunitiesDetectedRdd: RDD[(Int, Seq[String])] = ...

def getNameOfBin(id: Int): String = id match {
    case 0  => "Low SA Users:"
    case 1  => "Medium-Low SA Users:"
    case 2  => "Medium-High SA Users:"
    case 3  => "High SA Users:"
    case other => "nothing" // what to do if nothing else matches
}

var finalCommunitiesDetectedWithNamesRdd: RDD[(String, Seq[String])] = finalCommunitiesDetectedRdd.map{ case (id, Seq(username)) => (getNameOfBin(id), Seq(username)) }

finalCommunitiesDetectedWithNamesRdd.foreach(println) // check

But I get:

18/01/20 10:38:32 ERROR Executor: Exception in task 0.0 in stage 49.0 (TID 26) scala.MatchError: (0,List(pablo, luca)) (of class scala.Tuple2)

Why?

【Comments】:

Tags: scala apache-spark

【Solution 1】:

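The pattern Seq(username) only matches sequences with exactly one element. The tuple (0,List(pablo, luca)) carries a two-element list, so no case applies and a scala.MatchError is thrown. If you don't need to look inside the second element of the tuple, bind it to a variable as a whole: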

case (id, seq) => (getNameOfBin(id), seq)
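
The difference between the two patterns can be reproduced without a cluster. The following is a minimal sketch that uses a plain Scala Seq in place of the RDD (an assumption made only so that it runs standalone; RDD.map accepts the same function literal). The names MatchDemo, communities, failing, and named are hypothetical and chosen for illustration.

```scala
import scala.util.Try

object MatchDemo {
  // Same helper as in the question.
  def getNameOfBin(id: Int): String = id match {
    case 0 => "Low SA Users:"
    case 1 => "Medium-Low SA Users:"
    case 2 => "Medium-High SA Users:"
    case 3 => "High SA Users:"
    case _ => "nothing" // fallback when no other case matches
  }

  // Stand-in for finalCommunitiesDetectedRdd (assumption: a local Seq, not an RDD).
  val communities: Seq[(Int, Seq[String])] = Seq(
    (0, List("pablo", "luca")),
    (1, List("marco")),
    (3, List("anna")),
    (2, List("fobi"))
  )

  // Seq(username) is a pattern for a sequence of exactly one element,
  // so the two-element list under key 0 matches no case and a
  // scala.MatchError is thrown; Try captures it instead of crashing.
  val failing: Try[Seq[(String, Seq[String])]] = Try {
    communities.map { case (id, Seq(username)) => (getNameOfBin(id), Seq(username)) }
  }

  // Binding the whole sequence to a variable matches any length.
  val named: Seq[(String, Seq[String])] =
    communities.map { case (id, usernames) => (getNameOfBin(id), usernames) }

  def main(args: Array[String]): Unit = {
    println(s"single-element pattern failed: ${failing.isFailure}")
    named.foreach(println)
  }
}
```

If the individual names are needed inside the case, a variable-length pattern such as Seq(usernames @ _*) also matches lists of any size, but the plain variable binding above is simpler when the sequence is passed through unchanged.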

【Comments】: