【Posted】: 2021-04-12 11:21:06
【Question】:
I created this RDD:
scala> val data=sc.textFile("sparkdata.txt")
Then I tried to return the contents of the file:
scala> data.collect
I split the existing data into individual words:
scala> val splitdata = data.flatMap(line => line.split(" "));
scala> splitdata.persist()
scala> splitdata.collect;
Now I run the map and reduce steps:
scala> val mapdata = splitdata.map(word => (word,1));
scala> mapdata.collect;
scala> val reducedata = mapdata.reduceByKey(_+_);
To get the result:
scala> reducedata.collect;
When I try to display the first 10 rows:
splitdata.groupByKey(identity).count().show(10)
I get the following error:
<console>:38: error: value groupByKey is not a member of org.apache.spark.rdd.RDD[String]
splitdata.groupByKey(identity).count().show(10)
^
<console>:38: error: missing argument list for method identity in object Predef
Unapplied methods are only converted to functions when a function type is expected.
You can make this conversion explicit by writing `identity _` or `identity(_)` instead of `identity`.
splitdata.groupByKey(identity).count().show(10)
^
【Discussion】:
Tags: scala apache-spark
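One possible reading of the error above: `groupByKey(identity)` followed by `count().show(10)` is the Dataset form of word count, while `splitdata` in the question is an `RDD[String]`. An RDD of plain strings has no `groupByKey` (it is only available on pair RDDs), and RDDs have no `show`. The second compiler message then appears to follow from the first, since `identity` no longer has an expected function type. The sketch below is not a definitive fix; it shows both routes, assuming Spark 2.x or later is on the classpath. The object name `WordCountFirst10` and the helpers `countWithRdd` and `countWithDataset` are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical helper object, for illustration only.
object WordCountFirst10 {

  // RDD route: same steps as in the question (split, map to pairs, reduceByKey).
  def countWithRdd(lines: RDD[String]): RDD[(String, Int)] =
    lines
      .flatMap(line => line.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

  // Dataset route: groupByKey(identity) and count() are defined on Dataset[String].
  def countWithDataset(spark: SparkSession, lines: Dataset[String]): Dataset[(String, Long)] = {
    import spark.implicits._ // encoders required by flatMap and groupByKey
    lines
      .flatMap(line => line.split(" "))
      .groupByKey(identity)
      .count()
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; inside spark-shell, `spark` and `sc` already exist.
    val spark = SparkSession.builder()
      .appName("WordCountFirst10")
      .master("local[*]")
      .getOrCreate()

    // RDDs have no show(); take(10) returns an Array with the first 10 pairs.
    countWithRdd(spark.sparkContext.textFile("sparkdata.txt")).take(10).foreach(println)

    // Datasets do have show(); this prints the first 10 rows as a table.
    countWithDataset(spark, spark.read.textFile("sparkdata.txt")).show(10)

    spark.stop()
  }
}
```

In spark-shell the helper bodies can be used directly on `sc.textFile("sparkdata.txt")` or `spark.read.textFile("sparkdata.txt")`. Note that `take(10)` and `show(10)` return the first 10 results in whatever order Spark produces them, not the 10 most frequent words.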