import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {

  def main(args: Array[String]): Unit = {
    val inputPath = "file:///test/kmeans_data.txt"
    val outputPath = "file:///test/result"

    // The master is not set here; it comes from spark-submit (--master)
    val sc = new SparkContext()
    val texts = sc.textFile(inputPath)

    println(sc.master) // Shows whether the job runs in local mode or yarn mode

    // Split each line into words, pair each word with 1, then sum the 1s per word
    val wordCounts = texts.flatMap { a => a.split(" ") }
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    wordCounts.saveAsTextFile(outputPath) // Save the result
  }
}
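To make the flatMap / map / reduceByKey chain easier to follow, here is a minimal sketch of the same steps on ordinary Scala collections, with no Spark involved. The object name LocalWordCount and the sample lines are made up for illustration, and groupBy plus sum stands in for reduceByKey, which plain collections do not have.

```scala
object LocalWordCount {

  // Mirrors the Spark job's steps on a plain Seq instead of an RDD.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" "))   // one element per word
      .map(word => (word, 1))             // pair each word with a count of 1
      .groupBy(_._1)                      // stand-in for reduceByKey: group by word...
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // ...then sum the 1s

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input, not the contents of kmeans_data.txt
    val counts = countWords(Seq("a b a", "b c"))
    counts.toSeq.sortBy(_._1).foreach(println)
  }
}
```

For the sample input this yields a count of 2 for "a", 2 for "b" and 1 for "c". In the Spark version the same logic runs per partition and is then merged across partitions by reduceByKey.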

 

Package the program as a jar with IDEA or sbt, then run it with spark-submit.
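If sbt is used, a build definition along the following lines might work. This is only a sketch: the project name, the Scala version and the Spark version are assumptions (the post does not say which versions were used) and should be matched to the cluster.

```scala
// build.sbt (sketch; name and versions are assumptions, not taken from the post)
name := "wordcount"
version := "0.1"
scalaVersion := "2.11.12"

// "provided": the Spark jars come from the spark-submit environment at run time,
// so they do not need to be bundled into the application jar
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0" % "provided"
```

Running `sbt package` then writes the jar under the project's target directory, from where it can be copied to the location passed to spark-submit.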

Local mode:

[[email protected] ~]# spark-submit --class WordCount --master local file:///export/spark_jar/wordcount.jar

19/04/19 17:36:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

local

[[email protected] ~]#

Result:

[Image: Running Spark, 5. Example: wordcount]

YARN mode:

Change the input and output paths to HDFS paths:

val inputPath="hdfs://master:9000/test/kmeans_data.txt"

val outputPath="hdfs://master:9000/test/result"
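Editing the paths means rebuilding the jar for each mode. One possible alternative, sketched below, is to read the paths from the program arguments and fall back to the local paths when none are given. The helper resolvePaths and the object PathArgs are hypothetical names introduced here, not part of the original program.

```scala
object PathArgs {

  // Hypothetical helper: take input/output paths from the arguments,
  // falling back to the local-mode paths used earlier in this post.
  def resolvePaths(args: Array[String]): (String, String) = {
    val inputPath = if (args.length > 0) args(0) else "file:///test/kmeans_data.txt"
    val outputPath = if (args.length > 1) args(1) else "file:///test/result"
    (inputPath, outputPath)
  }

  def main(args: Array[String]): Unit = {
    val (inputPath, outputPath) = resolvePaths(args)
    println(s"input=$inputPath output=$outputPath")
  }
}
```

With this approach, WordCount.main would call resolvePaths(args), and the HDFS paths would be appended after the jar path on the spark-submit command line, since spark-submit passes everything after the jar to main.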

[[email protected] ~]# spark-submit --class WordCount --master yarn-client file:///export/spark_jar/wordcount.jar

Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.

19/04/19 18:13:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

19/04/19 18:13:35 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.

yarn

[[email protected] ~]#

Result:

[[email protected] ~]# hadoop fs -ls /test/result

Found 3 items

-rw-r--r--       1 root supergroup       0 2019-04-19 18:14      /test/result/_SUCCESS

-rw-r--r--       1 root supergroup       24 2019-04-19 18:14     /test/result/part-00000

-rw-r--r--       1 root supergroup       24 2019-04-19 18:14     /test/result/part-00001

[[email protected] ~]#

 

 

 

Error encountered in YARN mode:

ERROR YarnClientSchedulerBackend:

YARN application has exited unexpectedly with state FAILED!

Check the YARN application

Diagnosis: the failure is reported by YARN, and most YARN failures are caused by insufficient memory.

Fix:

Edit yarn-site.xml and add:

<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>

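yarn.nodemanager.pmem-check-enabled

Whether to start a thread that checks how much physical memory each task is using. If a task exceeds its allocation, it is killed immediately. The default is true.

yarn.nodemanager.vmem-check-enabled

Whether to start a thread that checks how much virtual memory each task is using. If a task exceeds its allocation, it is killed immediately. The default is true.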
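The two checks can be pictured with the simplified model below. It is an illustration only, not the NodeManager's actual code. The names are made up, and the virtual-memory limit is assumed to be the container's allocation multiplied by yarn.nodemanager.vmem-pmem-ratio, a setting not mentioned in this post (2.1 is its documented default).

```scala
object MemoryCheckSketch {

  // Physical check: the container is over the limit once it uses more than it was allocated.
  def exceedsPhysical(usedPmemMb: Long, allocatedMb: Long): Boolean =
    usedPmemMb > allocatedMb

  // Virtual check: the limit is the allocation scaled by the vmem/pmem ratio
  // (assumed here to be yarn.nodemanager.vmem-pmem-ratio, default 2.1).
  def exceedsVirtual(usedVmemMb: Long, allocatedMb: Long, vmemPmemRatio: Double = 2.1): Boolean =
    usedVmemMb > allocatedMb * vmemPmemRatio

  def main(args: Array[String]): Unit = {
    // Hypothetical 1024 MB container using 900 MB physical and 2500 MB virtual memory
    println(exceedsPhysical(900, 1024)) // false: within the physical allocation
    println(exceedsVirtual(2500, 1024)) // true: 2500 > 1024 * 2.1 = 2150.4
  }
}
```

In this model a container can stay within its physical allocation and still be killed by the virtual-memory check, which is the situation that setting vmem-check-enabled to false avoids.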

 
