reduceByKey

函数原型:

def reduceByKey(func: (V, V) => V): RDD[(K, V)]

def reduceByKey(func: (V, V) => V, numPartitions: Int): RDD[(K, V)]

def reduceByKey(partitioner: Partitioner, func: (V, V) => V): RDD[(K, V)]

作用:

按照func的映射关系,将两个V型的值映射到相同类型的V值上去。

 

例子:

scala> var rdd1 = sc.makeRDD(Array(("A",0),("A",2),("B",1),("B",2),("C",1)))
rdd1: org.apache.spark.rdd.RDD[(String, Int)] = ParallelCollectionRDD[0] at makeRDD at <console>:27

scala> rdd1.partitions.size
res0: Int = 48

scala> var rdd2 = rdd1.reduceByKey((x,y) => x + y)
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[1] at reduceByKey at <console>:29

scala> rdd2.collect
res1: Array[(String, Int)] = Array((A,2), (B,3), (C,1))

scala> rdd2.partitions.size
res2: Int = 48

scala> var rdd2 = rdd1.reduceByKey(new org.apache.spark.HashPartitioner(2),(x,y) => x + y)
rdd2: org.apache.spark.rdd.RDD[(String, Int)] = ShuffledRDD[2] at reduceByKey at <console>:29

scala> rdd2.collect
res3: Array[(String, Int)] = Array((B,3), (A,2), (C,1))

scala> rdd2.partitions.size
res4: Int = 2

 

相关文章:

  • 2022-12-23
  • 2021-07-13
  • 2021-10-30
  • 2022-12-23
  • 2022-01-19
  • 2021-11-02
  • 2021-11-06
  • 2022-12-23
猜你喜欢
  • 2021-12-13
  • 2021-12-02
  • 2022-01-08
  • 2022-03-01
  • 2022-12-23
  • 2021-09-10
  • 2021-06-25
相关资源
相似解决方案