【Question title】: Why does collect fail with "error: value collect is not a member of Int" on the result of reduce?
【Posted】: 2017-06-10 03:48:32
【Question description】:

I am trying to find the line with the largest number of words using Apache Spark/Scala. I am running the program in spark-shell.

When I use the following code, I get the correct output:

scala> file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)

But when I try to collect the result with the following code, I get an error:

scala> file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b).collect()
<console>:30: error: value collect is not a member of Int
              file1.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b).collect()

Why do I get an error when I use the collect() action?

【Question comments】:

Tags: scala apache-spark

【Solution 1】:

reduce is an action that reduces a collection of values of type T to a single value of type T.

reduce(f: (T, T) ⇒ T): T Reduces the elements of this RDD using the specified commutative and associative binary operator.
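To see what that signature means without a cluster, here is a minimal sketch that uses an ordinary Scala Seq[String] as a stand-in for the RDD file1 (an assumption: Scala collections have the same map/reduce shape but run locally, and the sample lines are made up). The function passed to reduce has type (Int, Int) => Int, so the whole expression evaluates to an Int.

```scala
object ReduceSketch {
  // Hypothetical stand-in for file1: each element is one line of text.
  val lines: Seq[String] = Seq("a b c", "hello world", "one two three four")

  // map turns Seq[String] into Seq[Int] (words per line);
  // reduce then folds that Seq[Int] into a single Int, keeping the larger value each time.
  def maxWords(ls: Seq[String]): Int =
    ls.map(line => line.split(" ").size)
      .reduce((a, b) => if (a > b) a else b)

  def main(args: Array[String]): Unit =
    println(maxWords(lines)) // prints 4
}
```

Taking the larger of two values is both commutative and associative, which is what Spark requires of the function, because partitions are combined in no fixed order. Writing _ max _ would express the same operation.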

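After reduce you already have the final result on the driver, just as you would after collect or any other action. It is a plain Scala value, not an RDD, so there is nothing left to collect.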

In your case, assign the value returned by reduce to a variable and check its type. It is Int.

val result = file1.
  map(line => line.split(" ").size).
  reduce((a, b) => if (a > b) a else b)
// check the type of the value from `reduce`
scala> :type result
Int
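The fix is simply to drop .collect() and use the value directly. Below is a minimal sketch of that, again using a plain Scala Seq[String] with made-up lines as a hypothetical stand-in for file1; the type ascription : Int compiles only because reduce has already produced a plain Int.

```scala
object UseResultDirectly {
  // Hypothetical sample lines standing in for the RDD file1.
  val lines: Seq[String] = Seq("spark is fast", "scala", "reduce returns a value")

  // reduce yields a plain Int, so it can be ascribed and used as-is.
  // Appending .collect() here would fail for the same reason as in the question:
  // Int has no collect() method.
  val result: Int =
    lines.map(line => line.split(" ").size).reduce((a, b) => if (a > b) a else b)

  def main(args: Array[String]): Unit =
    println(s"Longest line has $result words") // prints "Longest line has 4 words"
}
```

If you actually want the per-line counts as an array, call collect() before reducing instead, e.g. file1.map(line => line.split(" ").size).collect(), which returns an Array[Int] to the driver. Keep in mind that this pulls every count to the driver, whereas reduce returns only one number.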

reduce is very similar to collect in that both are actions that give you a value, but collect gives you an Array[T]...

collect(): Array[T] Return an array that contains all of the elements in this RDD.

...while reduce gives you just a single value of type T.
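The difference in return types can be illustrated with a tiny, Spark-free class. MiniRDD below is hypothetical: it is not part of Spark and mirrors only the two signatures quoted above (collect(): Array[T] and reduce(f: (T, T) => T): T), so you can see why calling collect() on the result of reduce cannot type-check.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for RDD[T]; only map, collect and reduce are modelled.
final class MiniRDD[T: ClassTag](private val data: Seq[T]) {
  // Transformation: returns another MiniRDD, so more methods can be chained.
  def map[U: ClassTag](f: T => U): MiniRDD[U] = new MiniRDD(data.map(f))

  // Action returning every element: an Array[T].
  def collect(): Array[T] = data.toArray

  // Action returning one combined element: a plain T (an Int below), not a MiniRDD.
  def reduce(f: (T, T) => T): T = data.reduce(f)
}

object CollectVsReduce {
  def main(args: Array[String]): Unit = {
    val file1  = new MiniRDD(Seq("a b", "c d e", "f"))
    val counts = file1.map(line => line.split(" ").size)

    val all: Array[Int] = counts.collect()
    val max: Int        = counts.reduce((a, b) => if (a > b) a else b)

    println(all.mkString(", ")) // prints 2, 3, 1
    println(max)                // prints 3
  }
}
```

Because reduce returns a T rather than a MiniRDD[T], any method you chain after it is looked up on T itself, which is exactly why the compiler reports "value collect is not a member of Int".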

【Comments】: