SoftwareBuilding
The principle is simple: first use flatMap to flatten the RDD into individual words, then use map to turn each word into a (k, 1) pair, then use groupByKey to merge the values for each key, which effectively deduplicates the keys, and finally use keys() to extract the deduplicated keys.
 
Test data: delcp.txt
    hello
    hello
    world
    world
    h
    h
    h
    g
    g
    g


from pyspark import SparkContext

sc = SparkContext('local', 'delcp')

rdd = sc.textFile("file:///usr/local/spark/mycode/TestPackage/delcp.txt")
# Flatten each line into words, map each word to (word, 1),
# group by key to merge duplicates, then take only the keys
delp = rdd.flatMap(lambda line: line.split(" ")) \
          .map(lambda a: (a, 1)) \
          .groupByKey() \
          .keys()

delp.foreach(print)
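Since running the snippet above requires a Spark installation, here is a plain-Python sketch of the same four steps (flatMap, map, groupByKey, keys) that can be run anywhere; the helper name `dedup_keys` is my own, not part of the original code.

```python
def dedup_keys(lines):
    # flatMap: flatten each line into individual words
    words = [w for line in lines for w in line.split(" ")]
    # map: pair each word with 1
    pairs = [(w, 1) for w in words]
    # groupByKey: collect all values under each key
    grouped = {}
    for k, v in pairs:
        grouped.setdefault(k, []).append(v)
    # keys: the deduplicated words
    return list(grouped.keys())

print(dedup_keys(["hello", "hello", "world", "world", "h", "h", "h", "g", "g", "g"]))
# → ['hello', 'world', 'h', 'g']
```

Note that Spark also provides `rdd.distinct()`, which deduplicates in a single call; the groupByKey approach here is mainly useful for understanding how key-based shuffling works.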
