【问题标题】:Running Spark example: ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer运行 Spark 示例:ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
【发布时间】:2020-08-15 03:04:46
【问题描述】:

我是 spark 新手,尚未编写我的第一个 spark 应用程序,并且仍在研究这是否适合我们的目的。目前只是尝试运行访问kafka的spark附带的示例示例

我尝试使用两种方式运行开箱即用的 kafka 示例,但均未成功并出现相同的错误。

  1. 使用 helm/kubernetes 来自 spark
  2. 来自手动本地构建

我搜索现有帖子,但不太明白为什么开箱即用似乎不起作用。

Spark fails with NoClassDefFoundError for org.apache.kafka.common.serialization.StringDeserializer

Apache Kafka: ...StringDeserializer is not an instance of ...Deserializer

Why does Spark application fail with "Exception in thread "main" java.lang.NoClassDefFoundError: ...StringDeserializer"?

HELM/Kubernetes

Clone https://github.com/bitnami/charts.git bitnami/spark
using
registry: docker.io
  repository: bitnami/spark
  tag: 2.4.5-debian-10-r87
  tag: 2.4.5-debian-10-r94
Got success with ./bin/run-example SparkPi 10
But got error with ./bin/run-example streaming.JavaDirectKafkaWordCount myBroker myConsumerGroup myTopic

    INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer
        at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        ... 13 more

手动本地构建

Clone https://github.com/apache/spark.git
./build/mvn -DskipTests clean package
[INFO] BUILD SUCCESS

RAN EXAMPLE SUCCESSFULLY
./bin/run-example SparkPi 10
Pi is roughly 3.1424111424111425

RAN KAFKA EXAMPLE WITH ClassNotFoundException
./bin/run-example streaming.JavaDirectKafkaWordCount myBroker myConsumerGroup myTopic

    INFO StreamingExamples: Setting log level to [WARN] for streaming example. To override add a custom log4j.properties to the classpath.
    Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/StringDeserializer
        at org.apache.spark.examples.streaming.JavaDirectKafkaWordCount.main(JavaDirectKafkaWordCount.java:78)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:934)
        at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1013)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1022)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.StringDeserializer
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 13 more

【问题讨论】:

    标签: java apache-spark apache-kafka spark-streaming


    【解决方案1】:

    我不确定run-example 是否为外部库正确设置了类路径。

    你需要在类路径上kafka-clients(它应该包含在spark-sql-kafka-0-10中,这是Spark默认提供的,所以你必须下载它,并将它添加到Spark 库目录)。

    或者您可以使用spark-submit --packages,正如 Spark 在提交申请时所记录的那样

    【讨论】:

    • 我在 GCP DataProc 上遇到同样的错误 .. 我正在传递 jar 但仍然收到错误,请参考 stackoverflow.com/questions/70951195/… .. 对此有什么想法吗?
    • @Karan 你不明白什么?您的帖子不包含kafka-clients.jar
    猜你喜欢
    • 2017-07-23
    • 2018-01-06
    • 2023-03-24
    • 2014-01-03
    • 1970-01-01
    • 2019-07-01
    • 2016-08-05
    • 1970-01-01
    • 1970-01-01
    相关资源
    最近更新 更多