An example of running a Spark application non-interactively

$ cat Count.py

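import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # The master and deploy mode are supplied by spark-submit, so none are set here.
    sc = SparkContext()
    logfile = sys.argv[1]  # input path or glob, e.g. /test/weblogs/*

    # Count the lines that contain '.jpg'.
    count = sc.textFile(logfile).filter(lambda line: '.jpg' in line).count()
    print("Number of JPG requests: %d" % count)

    sc.stop()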

$

$ spark-submit --master yarn-client Count.py /test/weblogs/*

Number of JPG requests: 10258
$
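
To sanity-check the filter logic without a cluster, the same predicate can be applied to an ordinary Python iterable. The following is only a minimal sketch: the helper name `count_matching` and the sample log lines are hypothetical and are not part of Count.py.

```python
def count_matching(lines, needle=".jpg"):
    """Count the lines containing `needle`, mirroring filter(...).count() in Count.py."""
    return sum(1 for line in lines if needle in line)


# Hypothetical sample log lines, for illustration only.
sample = [
    "GET /images/cat.jpg HTTP/1.1",
    "GET /index.html HTTP/1.1",
    "GET /photos/dog.jpg HTTP/1.1",
]

print("Number of JPG requests: %d" % count_matching(sample))
```

Like the Spark version, this is a plain substring test, so it also matches lines where `.jpg` appears anywhere in the line, for example in a referrer field.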
