【问题标题】:Running Scala Jar with Spark-Submit使用 Spark-Submit 运行 Scala Jar
【发布时间】:2020-04-24 23:59:01
【问题描述】:

我已将 spark-scala 脚本编译为 JAR,我想使用 spark-submit 运行它。但是我遇到了这个错误:

2020-01-07 13:03:02,190 WARN util.Utils: Your hostname, nifi resolves to a loopback address: 127.0.1.1; using 10.0.2.15 instead (on interface enp0s3)
2020-01-07 13:03:02,192 WARN util.Utils: Set SPARK_LOCAL_IP if you need to bind to another address
2020-01-07 13:03:03,109 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2020-01-07 13:03:03,826 WARN deploy.SparkSubmit$$anon$2: Failed to load hello.
java.lang.ClassNotFoundException: hello
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-01-07 13:03:03,857 INFO util.ShutdownHookManager: Shutdown hook called
2020-01-07 13:03:03,858 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-a8cc1ba6-3643-4646-82a3-4b44f4487105

这是我的代码:

import org.apache.spark.sql.SparkSession
import org.apache.spark.{SparkConf, SparkContext}

object hello {
  def main(args: Array[String]): Unit = {

    val conf = new SparkConf()
      .setMaster("local")
      .setAppName("quest9")

    val sc = new SparkContext(conf)
    val spark = SparkSession.builder().appName("quest9").master("local").getOrCreate()
    import spark.implicits._

    val zip_codes = spark.read.format("csv").option("header", "true").load("/home/hdfs/Documents/quest_9/doc/zip.csv")
    val census = spark.read.format("csv").option("header", "true").load("/home/hdfs/Documents/quest_9/doc/census.csv")

    census.createOrReplaceTempView("census")
    zip_codes.createOrReplaceTempView("zip")


    val query = census.as("census").join((zip_codes.where($"City" === "Inglewood").where($"County" === "Los Angeles").as("zip")),Seq("Zip_Code"),"inner").select($"census.Total_Males".as("male"),$"census.Total_Females".as("female")).distinct()
    query.show()
    val queryR = query.repartition(5)
    queryR.write.parquet("/home/hdfs/Documents/population/census/IDE/census.parquet")

    sc.stop()
  }
}

我认为我的问题是我使用 scala 对象而不是类,但我不确定。

我像这样运行 spark-submit

spark-submit \
--class hello \
/home/hdfs/IdeaProjects/untitled/out/artifacts/quest_jar/quest.jar

以前有人解决过这个错误吗?

【问题讨论】:

    标签: scala apache-spark jar spark-submit


    【解决方案1】:

    我认为您需要为 spark-submit 和您的对象指定一个包名称。

    例如:

    spark-submit \
    --class com.my.package.hello \
    /home/hdfs/IdeaProjects/untitled/out/artifacts/quest_jar/quest.jar
    

    package com.my.package
    
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.{SparkConf, SparkContext}
    
    object hello {
        ...
    }
    

    【讨论】:

    • 我做了,我创建了一个虚拟包只是为了尝试一下,并称之为“测试”,它仍然说它找不到它。使用对象和类有区别吗?
    • 一个对象是一个单例,而你的 Main 是一个单例。没关系。
    • 你能运行类似jar tvf /home/hdfs/IdeaProjects/untitled/out/artifacts/quest_jar/quest.jar 的东西吗?它会列出你的 jar 内容
    • 我已经直接从 IDE 运行了代码,并且没有问题
    • 毫无疑问。但是现在我们必须弄清楚为什么提交你的 jar 会失败
    猜你喜欢
    • 1970-01-01
    • 1970-01-01
    • 2019-03-01
    • 2016-09-04
    • 1970-01-01
    • 2017-04-30
    • 1970-01-01
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多