【Question title】: Uber jar not found in Kubernetes via spark-submit
【Posted】: 2020-02-01 22:54:13
【Question】:

I have a very simple Spark job, but I can't get it to work in Kubernetes. The error I get is:

>     19/10/03 14:59:51 WARN DependencyUtils: Local jar /opt/spark/work-dir/target/scala-2.11/ScalaTest-assembly-1.0.jar does not exist, skipping.
>     19/10/03 14:59:51 WARN SparkSubmit$$anon$2: Failed to load ScalaTest.
>     java.lang.ClassNotFoundException: ScalaTest
>       at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>       at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>       at java.lang.Class.forName0(Native Method)
>       at java.lang.Class.forName(Class.java:348)
>       at org.apache.spark.util.Utils$.classForName(Utils.scala:238)
>       at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:806)
>       at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
>       at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
>       at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
>       at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
>       at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
>       at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

Project structure:

project/build.properties
project/plugins.sbt
src/main/scala/ScalaTest.scala
Dockerfile
build.sbt

build.properties

sbt.version=1.2.8

plugins.sbt

addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.14.6")
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.10.0-RC1")

ScalaTest.scala

import org.apache.spark.sql.SparkSession
import org.apache.spark.SparkContext

object ScalaTest {
  def main(args: Array[String]) {
    val spark = SparkSession.builder.appName("ScalaTest").config("spark.master", "local[*]").getOrCreate()

    import spark.implicits._

    println("hello")

  }
}

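Dockerfile: this is just a wrapper around the image built from the kubernetes folder of the Spark binary distribution. Before building it, I make sure I have run sbt assembly, which produces the uber jar.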

FROM spark:latest

WORKDIR /opt/spark/work-dir

COPY target/scala-2.11/ScalaTest-assembly-1.0.jar target/scala-2.11/ScalaTest-assembly-1.0.jar
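A relative COPY destination is resolved against the current WORKDIR, so this Dockerfile should place the jar at /opt/spark/work-dir/target/scala-2.11/ScalaTest-assembly-1.0.jar, which is the path later passed to spark-submit. The sketch below computes that `local://` URI from a WORKDIR and a COPY destination, which makes it easy to cross-check the Dockerfile against the submit command. The function name `to_local_uri` is hypothetical and not part of Docker or Spark.

```shell
#!/bin/sh
# to_local_uri WORKDIR DEST
# Prints the local:// URI of a file copied by `COPY <src> DEST` when the
# Dockerfile's current WORKDIR is WORKDIR. Absolute DEST paths are used
# as-is; relative ones are resolved against WORKDIR (as Docker does).
to_local_uri() {
  workdir=$1
  dest=$2
  case "$dest" in
    /*) path=$dest ;;                 # absolute destination
    *)  path="${workdir%/}/$dest" ;;  # relative to WORKDIR
  esac
  printf 'local://%s\n' "$path"
}

# The question's Dockerfile: WORKDIR /opt/spark/work-dir + relative COPY destination
to_local_uri /opt/spark/work-dir target/scala-2.11/ScalaTest-assembly-1.0.jar
# prints local:///opt/spark/work-dir/target/scala-2.11/ScalaTest-assembly-1.0.jar
```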

build.sbt

name := "ScalaTest"

version := "1.0"

scalaVersion := "2.11.12"

val sparkVersion = "2.4.4"

libraryDependencies ++= Seq(
    "org.apache.spark" % "spark-core_2.11" % sparkVersion % "provided",
    "org.apache.spark" % "spark-sql_2.11" % sparkVersion % "provided"
)

Finally, here is my spark-submit. Before running it, I push the image to the ECR registry so that EKS can pull it. I also point to the location of the uber jar inside my image.

~/spark-2.4.4-bin-hadoop2.7/bin/spark-submit \
    --master k8s://{K8S_ENDPOINT}:443 \
    --deploy-mode cluster \
    --name test-job \
    --conf spark.kubernetes.container.image={ECR_IMAGE}:latest \
    --conf spark.kubernetes.submission.waitAppCompletion=false \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.pod.name=test-job \
    --class ScalaTest \
    local:///opt/spark/work-dir/target/scala-2.11/ScalaTest-assembly-1.0.jar
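To avoid retyping this command, and to make it easy to try a different jar location (as the solution below does), it can be wrapped in a small function. This is a sketch under assumptions: the function name `submit_test_job` and the environment variables `K8S_ENDPOINT`, `ECR_IMAGE` and `SPARK_SUBMIT` are hypothetical stand-ins for the {K8S_ENDPOINT} and {ECR_IMAGE} placeholders and the local Spark path. The flags and `--conf` keys are copied unchanged from the command above.

```shell
#!/bin/sh
# submit_test_job JAR_URI
# Runs the question's spark-submit command against JAR_URI.
# K8S_ENDPOINT and ECR_IMAGE must be set (hypothetical variables standing in
# for the placeholders); SPARK_SUBMIT defaults to the local Spark 2.4.4 install.
submit_test_job() {
  : "${K8S_ENDPOINT:?set K8S_ENDPOINT}" "${ECR_IMAGE:?set ECR_IMAGE}"
  "${SPARK_SUBMIT:-$HOME/spark-2.4.4-bin-hadoop2.7/bin/spark-submit}" \
    --master "k8s://${K8S_ENDPOINT}:443" \
    --deploy-mode cluster \
    --name test-job \
    --conf "spark.kubernetes.container.image=${ECR_IMAGE}:latest" \
    --conf spark.kubernetes.submission.waitAppCompletion=false \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.driver.pod.name=test-job \
    --class ScalaTest \
    "$1"
}

# Example (the URI must point inside the image, not at the submitting machine):
# submit_test_job local:///opt/spark/work-dir/target/scala-2.11/ScalaTest-assembly-1.0.jar
```

Setting SPARK_SUBMIT=echo prints the assembled command instead of running it, which is a cheap way to double-check the arguments.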

Also note that when I run the following command (spark-submit inside a local container), it works as expected:

docker run --rm -it my-custom-image ../bin/spark-submit target/scala-2.11/ScalaTest-assembly-1.0.jar

Update: inspecting the assembled uber jar, I can see that the ScalaTest classes are there.

jar tf target/scala-2.11/ScalaTest-assembly-1.0.jar

...
ScalaTest$.class
ScalaTest.class
...
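The same check can be scripted so that it fails loudly when the main class is missing. A minimal sketch, using a hypothetical helper name `has_class_entry`: it reads a `jar tf` listing on stdin and looks for the entry that corresponds to a fully qualified class name (dots become slashes, then `.class` is appended). For a class in the default package such as ScalaTest, that entry is simply ScalaTest.class.

```shell
#!/bin/sh
# has_class_entry CLASS_NAME < listing
# Succeeds if a `jar tf` listing (read from stdin) contains the .class entry
# for the fully qualified CLASS_NAME, e.g. com.example.Main -> com/example/Main.class.
has_class_entry() {
  entry="$(printf '%s' "$1" | tr . /).class"
  grep -qxF "$entry"   # exact, fixed-string, whole-line match
}

# Usage:
# jar tf target/scala-2.11/ScalaTest-assembly-1.0.jar | has_class_entry ScalaTest && echo found
```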

【Discussion】:

    Tags: apache-spark kubernetes sbt sbt-assembly spark-submit


    【Solution 1】:

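    The solution is to put the jar in the jars folder instead of the working directory. I didn't check the documentation, but this is probably controlled by an environment variable that can be changed. In any case, the Dockerfile should look like this: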

    FROM spark:latest
    
    COPY target/scala-2.11/ScalaTest-assembly-1.0.jar /opt/spark/jars/ScalaTest-assembly-1.0.jar
    

    Then change the spark-submit accordingly:

    ~/spark-2.4.4-bin-hadoop2.7/bin/spark-submit \
        --master k8s://{K8S_ENDPOINT}:443 \
        --deploy-mode cluster \
        --name test-job \
        --conf spark.kubernetes.container.image={ECR_IMAGE}:latest \
        --conf spark.kubernetes.submission.waitAppCompletion=false \
        --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
        --conf spark.kubernetes.driver.pod.name=test-job \
        --class ScalaTest \
        local:///opt/spark/jars/ScalaTest-assembly-1.0.jar
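Whichever location is used, it helps to confirm that the path in the `local://` URI actually exists in the image the cluster pulls (here, the one pushed to ECR), because in Spark on Kubernetes `local://` refers to a file already present in the container image, not on the submitting machine. A minimal sketch, assuming Docker is available locally; the function names `local_uri_to_path` and `image_has_file` and the `DOCKER` override are hypothetical, and `--entrypoint ls` is used to bypass the Spark image's own entrypoint script.

```shell
#!/bin/sh
# local_uri_to_path URI: strips the local:// scheme, e.g.
# local:///opt/spark/jars/app.jar -> /opt/spark/jars/app.jar
local_uri_to_path() {
  printf '%s\n' "${1#local://}"
}

# image_has_file IMAGE URI: runs `ls -l` on the file inside IMAGE and exits
# non-zero if it is missing. Setting DOCKER=echo prints the command instead.
image_has_file() {
  "${DOCKER:-docker}" run --rm --entrypoint ls "$1" -l "$(local_uri_to_path "$2")"
}

# Usage ({ECR_IMAGE} is the placeholder from the question):
# image_has_file {ECR_IMAGE}:latest local:///opt/spark/jars/ScalaTest-assembly-1.0.jar
```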
    
