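import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._

object WordCount {
  def main(args: Array[String]): Unit = {
    val inputPath = "file:///test/kmeans_data.txt"
    val outputPath = "file:///test/result"
    // No SparkConf is built here: master and app name are supplied by spark-submit
    val sc = new SparkContext()
    val texts = sc.textFile(inputPath)
    println(sc.master) // check whether the job runs in local or yarn mode
    val wordCounts = texts.flatMap { a => a.split(" ") } // split each line on spaces
      .map(word => (word, 1))                            // pair every word with a count of 1
      .reduceByKey(_ + _)                                // sum the counts per word
    wordCounts.saveAsTextFile(outputPath) // save the result
    sc.stop()
  }
}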
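To see what the flatMap / map / reduceByKey chain computes without starting Spark, here is a minimal sketch on plain Scala collections. The object name WordCountSketch, the helper countWords and the sample lines are made up for illustration, and groupBy is only a local stand-in for the shuffle that reduceByKey performs on a cluster.

```scala
// Plain-Scala sketch of the word-count pipeline (no Spark dependency).
object WordCountSketch {
  // Hypothetical helper: counts space-separated tokens across in-memory lines.
  def countWords(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split(" ").toSeq) // one element per token
      .map(word => (word, 1))                 // pair each token with a count of 1
      .groupBy { case (word, _) => word }     // local stand-in for reduceByKey's grouping
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) } // sum counts per word

  def main(args: Array[String]): Unit = {
    // Made-up sample input, not the contents of kmeans_data.txt
    val sample = Seq("1.0 2.0 3.0", "1.0 2.0")
    countWords(sample).toSeq.sortBy(_._1).foreach(println)
  }
}
```

Each printed tuple has the same (word,count) shape that saveAsTextFile writes into the part-0000N files.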
Build the jar with IDEA or sbt, then run it with spark-submit:
local mode:
# spark-submit --class WordCount --master local file:///export/spark_jar/wordcount.jar
19/04/19 17:36:56 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
local
Result:
yarn mode:
Change the paths to HDFS paths:
val inputPath="hdfs://master:9000/test/kmeans_data.txt"
val outputPath="hdfs://master:9000/test/result"
# spark-submit --class WordCount --master yarn-client file:///export/spark_jar/wordcount.jar
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
19/04/19 18:13:32 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/04/19 18:13:35 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
yarn
Result:
# hadoop fs -ls /test/result
Found 3 items
-rw-r--r-- 1 root supergroup 0 2019-04-19 18:14 /test/result/_SUCCESS
-rw-r--r-- 1 root supergroup 24 2019-04-19 18:14 /test/result/part-00000
-rw-r--r-- 1 root supergroup 24 2019-04-19 18:14 /test/result/part-00001
Error encountered in yarn mode:
ERROR YarnClientSchedulerBackend:
YARN application has exited unexpectedly with state FAILED!
Check the YARN application
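Diagnosis: the failure is reported by YARN itself, and when a YARN application fails like this the cause is most often insufficient memory.
Fix:
Edit yarn-site.xml and add: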
<property>
  <name>yarn.nodemanager.pmem-check-enabled</name>
  <value>false</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
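yarn.nodemanager.pmem-check-enabled
Whether to start a thread that checks how much physical memory each task is using.
If a task exceeds its allocation, it is killed immediately. Defaults to true.
yarn.nodemanager.vmem-check-enabled
Whether to start a thread that checks how much virtual memory each task is using.
If a task exceeds its allocation, it is killed immediately. Defaults to true.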
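The effect of the two switches can be sketched in a few lines of Scala. This is only an illustration of the rule described above, not YARN's actual implementation. The object MemoryCheckSketch and its helpers are hypothetical, the sample numbers are made up, and the sketch assumes the virtual-memory ceiling is the physical allocation multiplied by yarn.nodemanager.vmem-pmem-ratio (2.1 by default), a setting the original notes do not mention.

```scala
// Illustration of the pmem/vmem checks; not YARN code.
object MemoryCheckSketch {
  // Assumption: default value of yarn.nodemanager.vmem-pmem-ratio.
  val DefaultVmemPmemRatio: Double = 2.1

  // Virtual-memory ceiling implied by a container's physical allocation, in MB.
  def vmemLimitMb(pmemLimitMb: Long, ratio: Double = DefaultVmemPmemRatio): Double =
    pmemLimitMb * ratio

  // Hypothetical helper: true when an enabled check would flag the container as over its limit.
  def wouldBeKilled(
      pmemUsedMb: Long,
      vmemUsedMb: Long,
      pmemLimitMb: Long,
      pmemCheckEnabled: Boolean = true,
      vmemCheckEnabled: Boolean = true,
      ratio: Double = DefaultVmemPmemRatio): Boolean = {
    val overPmem = pmemCheckEnabled && pmemUsedMb > pmemLimitMb
    val overVmem = vmemCheckEnabled && vmemUsedMb > vmemLimitMb(pmemLimitMb, ratio)
    overPmem || overVmem
  }

  def main(args: Array[String]): Unit = {
    // Made-up numbers: 1024 MB allocated, 900 MB physical and 2300 MB virtual in use
    println(wouldBeKilled(900, 2300, 1024))                           // prints true
    println(wouldBeKilled(900, 2300, 1024, vmemCheckEnabled = false)) // prints false
  }
}
```

With both checks on, the container in this example is flagged purely because of its virtual memory footprint, even though its physical usage is within the allocation. Turning the checks off removes the kill but not the memory pressure, so it suits a small test cluster better than a shared one.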