Pitfall 1:
Code that runs fine from IDEA with
setMaster("local[4]")
breaks as soon as the master is switched to
setMaster("spark://192.168.160.112:8090")
With that setting the job fails with a ClassNotFoundException, as in the log below; this example uses the Elasticsearch driver.
[[email protected] estest_jar]# spark-submit --class EStest --master spark://192.168.160.135:7077 --name estest estest.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/15 16:06:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/15 16:06:28 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/12/15 16:06:31 INFO Version: Elasticsearch Hadoop v6.0.0 [8b59a8f82d]
17/12/15 16:06:31 INFO ScalaEsRDD: Reading from [bank2]
[Stage 0:> (0 + 0) / 5]17/12/15 16:06:31 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.160.135, executor 0): java.lang.ClassNotFoundException: org.elasticsearch.spark.rdd.EsPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
1. Specify the paths of the required jar files directly in the code:
val jars = Seq("/home/javajars/elasticsearch-spark-20_2.11-6.0.0.jar",
               "/home/javajars/postgresql-42.1.4.jar")
val conf = new SparkConf()
  .setAppName("estest")
  .setMaster("spark://192.168.160.135:7077")
  .setJars(jars)
conf.set("es.nodes", "192.168.160.135")
conf.set("es.port", "9200")
2. Analyzing the spark-worker log shows that, at runtime, Spark only loads jar files from $SPARK_HOME/jars.
Log snippet below; the "-cp" value "/spark/conf/:/spark/jars/*" is the classpath the executors actually use:
17/12/15 00:06:27 INFO ExecutorRunner: Launch command: "/root/Downloads/jdk/bin/java" "-cp" "/spark/conf/:/spark/jars/*" "-Xmx1024M" "-Dspark.driver.port=33208" "org.apache.spark.executor.CoarseGrainedExecutorBackend" "--driver-url" "spark://[email protected]:33208" "--executor-id" "0" "--hostname" "192.168.160.135" "--cores" "2" "--app-id" "app-20171215000627-0012" "--worker-url" "spark://[email protected]:33341"
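Since the executor classpath above contains only /spark/conf/ and /spark/jars/*, one way to script the copy is sketched below. The helper name install_jars and both paths are illustrative, not from the original post:

```shell
# install_jars: copy extra dependency jars into a Spark installation's
# jars directory, creating it if needed. Paths are illustrative.
install_jars() {
  local src_dir="$1"     # e.g. /home/javajars
  local spark_home="$2"  # e.g. /spark
  mkdir -p "$spark_home/jars"
  cp "$src_dir"/*.jar "$spark_home/jars/"
}

# On a cluster, run the same copy on every slave as well (e.g. via scp),
# because each executor resolves classes from its local $SPARK_HOME/jars.
```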
Just copy the required jar files into $SPARK_HOME/jars. In cluster mode I also added the jar files under $SPARK_HOME/jars on every slave.

Pitfall 2:
Submitting the packaged jar fails with a ClassNotFoundException for the class specified by --class (more precisely, for one of its anonymous function classes). The error looks like this:
[[email protected] estest_jar]# spark-submit --class EStest --master spark://192.168.160.135:7077 --name estest estest.jar
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
17/12/14 21:28:45 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/12/14 21:28:46 WARN SparkContext: Using an existing SparkContext; some configuration may not take effect.
17/12/14 21:28:47 INFO Version: Elasticsearch Hadoop v6.0.0 [8b59a8f82d]
17/12/14 21:28:47 INFO ScalaEsRDD: Reading from [bank2]
17/12/14 21:28:49 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.160.135, executor 0): java.lang.ClassNotFoundException: EStest$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
The fix is to add the application jar itself (estest.jar) to the setJars list:

val jars = Seq("/root/Downloads/jar/elasticsearch-spark-20_2.11-6.0.0.jar",
               "/root/IdeaProjects/EStest/out/artifacts/estest_jar/estest.jar")
val conf = new SparkConf()
  .setAppName("estest")
  .setMaster("spark://192.168.160.135:7077")
  .setJars(jars)
conf.set("es.nodes", "192.168.160.135")
conf.set("es.port", "9200")

Alternatively, copying the application jar straight into $SPARK_HOME/jars would probably also work.
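For reference, there is also a submit-time alternative to hard-coding setJars in the source: spark-submit's --jars flag ships extra dependency jars to the executors, and the application jar given as the last argument is distributed as well. A sketch, reusing the paths from above:

```shell
# Sketch only: ship the Elasticsearch connector via --jars; the application
# jar (estest.jar, last argument) is distributed to executors automatically.
spark-submit \
  --class EStest \
  --master spark://192.168.160.135:7077 \
  --name estest \
  --jars /root/Downloads/jar/elasticsearch-spark-20_2.11-6.0.0.jar \
  estest.jar
```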