【Question title】: Spark unable to download Kafka library
【Posted】: 2019-02-02 01:19:26
【Question】:

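I am using Python 3.5 and Spark 2.2 Streaming with Kafka, and my script fails to run because the Kafka library is missing.

I'm confused about why the library is still missing/not found even though the dependency information comes from Spark's own website.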

groupId = org.apache.spark
artifactId = spark-streaming-kafka-0-10_2.11
version = 2.2.0

I ran "spark-submit script.py", and the error says the Kafka library is required.

Spark Streaming's Kafka libraries not found in class path. Try one of the following.

  1. Include the Kafka library and its dependencies with in the
     spark-submit command as

     $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-8:2.2.0 ...

  2. Download the JAR of the artifact from Maven Central http://search.maven.org/,
     Group Id = org.apache.spark, Artifact Id = spark-streaming-kafka-0-8-assembly, Version = 2.2.0.
     Then, include the jar in the spark-submit command as

     $ bin/spark-submit --jars <spark-streaming-kafka-0-8-assembly.jar> ...

On the next run, I used "spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10:2.2.0 script.py" so that the Kafka library would be downloaded.

This time the error says it cannot find/download the library.

Ivy Default Cache set to: C:\Users\james\.ivy2\cache
The jars for the packages stored in: C:\Users\james\.ivy2\jars
:: loading settings :: url = jar:file:/D:/programs/spark-2.2.0/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.spark#spark-streaming-kafka-0-10 added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
        confs: [default]
:: resolution report :: resolve 2908ms :: artifacts dl 0ms
        :: modules in use:
        ---------------------------------------------------------------------
        |                  |            modules            ||   artifacts   |
        |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
        ---------------------------------------------------------------------
        |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
        ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
                module not found: org.apache.spark#spark-streaming-kafka-0-10;2.2.0

        ==== local-m2-cache: tried

          file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom

          -- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:

          file:/C:/Users/james/.m2/repository/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar

        ==== local-ivy-cache: tried

          C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\ivys\ivy.xml

          -- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:

          C:\Users\james\.ivy2\local\org.apache.spark\spark-streaming-kafka-0-10\2.2.0\jars\spark-streaming-kafka-0-10.jar

        ==== central: tried

          https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom

          -- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:

          https://repo1.maven.org/maven2/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar

        ==== spark-packages: tried

          http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.pom

          -- artifact org.apache.spark#spark-streaming-kafka-0-10;2.2.0!spark-streaming-kafka-0-10.jar:

          http://dl.bintray.com/spark-packages/maven/org/apache/spark/spark-streaming-kafka-0-10/2.2.0/spark-streaming-kafka-0-10-2.2.0.jar

                ::::::::::::::::::::::::::::::::::::::::::::::

                ::          UNRESOLVED DEPENDENCIES         ::

                ::::::::::::::::::::::::::::::::::::::::::::::

                :: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found

                ::::::::::::::::::::::::::::::::::::::::::::::



:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.spark#spark-streaming-kafka-0-10;2.2.0: not found]
        at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1177)
        at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:298)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:153)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

【Comments】:

    Tags: apache-spark apache-kafka


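    【Solution 1】:

    First: as discussed on the Developers Mailing list, Kafka is not included in the binary distribution. That is why it is not on your classpath.

    Second: in your --packages argument you should specify the Scala version. This is not only needed in SBT; spark-submit uses Ivy under the hood, so the artifact ID is resolved exactly as written, including the _2.11 suffix shown in the dependency information you quoted.

    So, try: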

      $ bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.11:2.2.0 script.py
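
To make the point about the Scala suffix concrete, here is a minimal sketch of how a group/artifact/version coordinate maps onto a repository URL under the standard Maven layout. The helper names (`with_scala_suffix`, `packages_coordinate`, `central_pom_url`) are hypothetical and are not part of Spark or Ivy; the sketch only formats strings and makes no network requests. Without the suffix, it reproduces the Maven Central URL that appears under "central: tried" in the question's log.

```python
# Illustrative helpers (hypothetical names, not part of Spark or Ivy).
# Uses str.format rather than f-strings so it also runs on Python 3.5.

MAVEN_CENTRAL = "https://repo1.maven.org/maven2"


def with_scala_suffix(artifact, scala_binary_version):
    """Append the Scala binary version: 'name' -> 'name_2.11'."""
    return "{}_{}".format(artifact, scala_binary_version)


def packages_coordinate(group, artifact, version):
    """Format group:artifact:version as passed to --packages."""
    return "{}:{}:{}".format(group, artifact, version)


def central_pom_url(group, artifact, version, base=MAVEN_CENTRAL):
    """Build the POM URL under the standard Maven repository layout."""
    group_path = group.replace(".", "/")
    return "{}/{}/{}/{}/{}-{}.pom".format(
        base, group_path, artifact, version, artifact, version
    )


group = "org.apache.spark"
version = "2.2.0"
bare = "spark-streaming-kafka-0-10"          # artifact ID used in the failing run
suffixed = with_scala_suffix(bare, "2.11")   # artifact ID from the dependency info

# URL attempted for the bare artifact ID (matches the log in the question).
print(central_pom_url(group, bare, version))
# Coordinate to pass to --packages once the suffix is included.
print(packages_coordinate(group, suffixed, version))
```

The two artifact IDs produce different repository paths, which is why dropping the suffix leads to "module not found" even though the group and version are unchanged.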
    

    Bonus point: maybe I will create a PR to change the description, since it is misleading.

    【Comments】:

    • Even after adding the Scala version, the library still cannot be found. My command: /path/to/bin/spark-submit --packages org.apache.spark:spark-streaming-kafka-0-10_2.12:2.4.0 script.py. I get the following:

          ---------------------------------------------------------------------
          |                  |            modules            ||   artifacts   |
          |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
          ---------------------------------------------------------------------
          |      default     |   6   |   0   |   0   |   0   ||   6   |   0   |
          ---------------------------------------------------------------------
    【Solution 2】:

    Try running:

    bin/spark-submit --jars yourjarfile.jar --packages org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3 pythoncode.py
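
As a rough sketch of how the pieces of this command fit together, the following builds the same argument list in Python. `build_submit_args` is a hypothetical helper, not a Spark API; it relies only on the fact that both `--jars` and `--packages` accept comma-separated lists. The file names and the coordinate are copied from the command above; in practice the version in the coordinate would normally be chosen to match the installed Spark version, which is an assumption here rather than something this answer states.

```python
# Hypothetical helper that assembles a spark-submit argument list.
# It only builds the list; it does not launch Spark.

def build_submit_args(script, packages=(), jars=(), spark_submit="bin/spark-submit"):
    """Return the argv list for spark-submit with optional jars and packages."""
    args = [spark_submit]
    if jars:
        # --jars takes a comma-separated list of local jar paths.
        args += ["--jars", ",".join(jars)]
    if packages:
        # --packages takes comma-separated group:artifact:version coordinates.
        args += ["--packages", ",".join(packages)]
    args.append(script)
    return args


args = build_submit_args(
    "pythoncode.py",
    packages=["org.apache.spark:spark-streaming-kafka-0-8-assembly_2.11:2.4.3"],
    jars=["yourjarfile.jar"],
)
print(" ".join(args))
```

The resulting list could be handed to `subprocess.call(args)` to actually launch the job; that step is left out here so the sketch runs without a Spark installation.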
    

    I ran into the same problem, and this solved it. I hope this helps.

    【Comments】:
