gaojian

[Spark][Python] A sortByKey example:

[training@localhost ~]$ hdfs dfs -cat test02.txt
00002 sku010
00001 sku933
00001 sku022
00003 sku888
00004 sku411
00001 sku912
00001 sku331
[training@localhost ~]$


mydata001=sc.textFile("test02.txt")
mydata002=mydata001.map(lambda line: line.split(' '))

mydata002.take(3)
Out[4]: [[u'00002', u'sku010'], [u'00001', u'sku933'], [u'00001', u'sku022']]

mydata003=mydata002.sortByKey()

In [9]: mydata003.take(5)

Out[9]:
[[u'00001', u'sku933'],
[u'00001', u'sku022'],
[u'00001', u'sku912'],
[u'00001', u'sku331'],
[u'00002', u'sku010']]

In [10]:
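Outside of Spark, the same ordering can be sketched in plain Python. This is a minimal illustration, not Spark itself: it assumes the records from `test02.txt` above, splits each line on a space as the `map` step does, and sorts by the first element the way `sortByKey()` sorts by key in ascending order. Like Spark's sort here, Python's `sorted` is stable, so records sharing the key `00001` keep their input order.

```python
# Sample records, same as the test02.txt contents shown above.
records = [
    "00002 sku010",
    "00001 sku933",
    "00001 sku022",
    "00003 sku888",
    "00004 sku411",
    "00001 sku912",
    "00001 sku331",
]

# Equivalent of mydata001.map(lambda line: line.split(' ')).
pairs = [line.split(' ') for line in records]

# Equivalent of sortByKey(): sort by the first element, ascending.
sorted_pairs = sorted(pairs, key=lambda kv: kv[0])

for k, v in sorted_pairs[:5]:
    print(k, v)
# → 00001 sku933
#   00001 sku022
#   00001 sku912
#   00001 sku331
#   00002 sku010
```

For descending order, the RDD API accepts `sortByKey(ascending=False)`.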


API reference:
https://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.RDD
