【Question title】: Can't set up a Spark application with spark-atlas-connector
【Posted】: 2019-08-09 13:50:45
【Question】:

I can't set up my Spark application to work with Apache Atlas through spark-atlas-connector.

I cloned the https://github.com/hortonworks-spark/spark-atlas-connector project and ran mvn package. Then I added all the resulting jars to my project and set up the configuration like this:

import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

def main(args: Array[String]): Unit = {

    // Register the spark-atlas-connector listeners with Spark
    val sparkConf = new SparkConf()
      .setAppName("atlas-test")
      .setMaster("local[2]")
      .set("spark.extraListeners", "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
      .set("spark.sql.queryExecutionListeners", "com.hortonworks.spark.atlas.SparkAtlasEventTracker")
      .set("spark.sql.streaming.streamingQueryListeners", "com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker")

    val spark = SparkSession.builder()
      .config(sparkConf)
      .enableHiveSupport()
      .getOrCreate()

    import spark.implicits._

    // BROKER_SERVERS is defined elsewhere in the application
    val df = spark.read.format("kafka")
      .option("kafka.bootstrap.servers", BROKER_SERVERS)
      .option("subscribe", "foobar")
      .option("startingOffsets", "earliest") // the Kafka source option is "startingOffsets" (plural)
      .load()

    df.show()

    // Write the records back to another topic
    df.write
      .format("kafka")
      .option("kafka.bootstrap.servers", BROKER_SERVERS)
      .option("topic", "foobar-out")
      .save()

  }

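Atlas is started from a Docker container that I pulled. Kafka with ZooKeeper is also started from a Docker container that I pulled.

The job works without spark-atlas-connector, but when I add the connector it throws an exception.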

19/08/09 16:40:16 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at Boot$.main(Boot.scala:21)
    at Boot.main(Boot.scala)
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:134)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:86)
    at com.hortonworks.spark.atlas.AtlasClientConf.configuration$lzycompute(AtlasClientConf.scala:25)
    at com.hortonworks.spark.atlas.AtlasClientConf.configuration(AtlasClientConf.scala:25)
    at com.hortonworks.spark.atlas.AtlasClientConf.get(AtlasClientConf.scala:50)
    at com.hortonworks.spark.atlas.AtlasClient$.atlasClient(AtlasClient.scala:120)
    at com.hortonworks.spark.atlas.SparkAtlasEventTracker.<init>(SparkAtlasEventTracker.scala:33)
    at com.hortonworks.spark.atlas.SparkAtlasEventTracker.<init>(SparkAtlasEventTracker.scala:37)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2691)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
    ... 8 more
Caused by: com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:259)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:238)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.<init>(AbstractFileConfiguration.java:197)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.PropertiesConfiguration.<init>(PropertiesConfiguration.java:284)
    at org.apache.atlas.ApplicationProperties.<init>(ApplicationProperties.java:69)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:125)
    ... 32 more
19/08/09 16:40:16 INFO SparkContext: SparkContext already stopped.
Exception in thread "main" org.apache.spark.SparkException: Exception when registering SparkListener
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2398)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:555)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2520)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:935)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$7.apply(SparkSession.scala:926)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:926)
    at Boot$.main(Boot.scala:21)
    at Boot.main(Boot.scala)
Caused by: org.apache.atlas.AtlasException: Failed to load application properties
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:134)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:86)
    at com.hortonworks.spark.atlas.AtlasClientConf.configuration$lzycompute(AtlasClientConf.scala:25)
    at com.hortonworks.spark.atlas.AtlasClientConf.configuration(AtlasClientConf.scala:25)
    at com.hortonworks.spark.atlas.AtlasClientConf.get(AtlasClientConf.scala:50)
    at com.hortonworks.spark.atlas.AtlasClient$.atlasClient(AtlasClient.scala:120)
    at com.hortonworks.spark.atlas.SparkAtlasEventTracker.<init>(SparkAtlasEventTracker.scala:33)
    at com.hortonworks.spark.atlas.SparkAtlasEventTracker.<init>(SparkAtlasEventTracker.scala:37)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2691)
    at org.apache.spark.util.Utils$$anonfun$loadExtensions$1.apply(Utils.scala:2680)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.AbstractTraversable.flatMap(Traversable.scala:104)
    at org.apache.spark.util.Utils$.loadExtensions(Utils.scala:2680)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2387)
    at org.apache.spark.SparkContext$$anonfun$setupAndStartListenerBus$1.apply(SparkContext.scala:2386)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.SparkContext.setupAndStartListenerBus(SparkContext.scala:2386)
    ... 8 more
Caused by: com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:259)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.load(AbstractFileConfiguration.java:238)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.AbstractFileConfiguration.<init>(AbstractFileConfiguration.java:197)
    at com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.PropertiesConfiguration.<init>(PropertiesConfiguration.java:284)
    at org.apache.atlas.ApplicationProperties.<init>(ApplicationProperties.java:69)
    at org.apache.atlas.ApplicationProperties.get(ApplicationProperties.java:125)
    ... 32 more
19/08/09 16:40:17 INFO ShutdownHookManager: Shutdown hook called

【Comments】:

    Tags: apache-spark apache-atlas


    【Solution 1】:

    System.setProperty("atlas.conf", "") is the correct solution, as the OP pointed out. SAC (spark-atlas-connector) uses ApplicationProperties.java.

    Specifically, it uses ApplicationProperties.get. The source code is here: https://github.com/apache/atlas/blob/master/intg/src/main/java/org/apache/atlas/ApplicationProperties.java#L118

    You can see that the variable ATLAS_CONFIGURATION_DIRECTORY_PROPERTY is set to "atlas.conf": https://github.com/apache/atlas/blob/master/intg/src/main/java/org/apache/atlas/ApplicationProperties.java#L43
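
As a minimal sketch of this fix, the helper below sets atlas.conf only after confirming that the configuration file is really there. It assumes, going by the constant name ATLAS_CONFIGURATION_DIRECTORY_PROPERTY in the linked source, that atlas.conf names the directory containing atlas-application.properties rather than the file itself. The object name AtlasConfSetup and the method pointAtlasConfAt are hypothetical and not part of spark-atlas-connector.

```scala
import java.io.File

// Hypothetical helper for illustration; not part of spark-atlas-connector.
object AtlasConfSetup {
  val PropertyName = "atlas.conf"                   // system property named in ApplicationProperties
  val FileName     = "atlas-application.properties" // file the Atlas client looks for

  /** Sets atlas.conf to confDir if that directory contains the properties file.
    * Returns Right(absolute file path) on success, Left(message) otherwise. */
  def pointAtlasConfAt(confDir: String): Either[String, String] = {
    val file = new File(confDir, FileName)
    if (file.isFile) {
      System.setProperty(PropertyName, confDir)
      Right(file.getAbsolutePath)
    } else {
      Left(s"$FileName not found in $confDir")
    }
  }
}
```

The call would need to be the first statement in main, before SparkSession.builder()...getOrCreate(), because the stack trace shows SparkAtlasEventTracker being constructed while the SparkContext is initializing.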

    【Comments】:

      【Solution 2】:

      I believe you have missed another step in the setup documentation. The error you are getting stems from

      Caused by: com.hortonworks.spark.atlas.shade.org.apache.commons.configuration.ConfigurationException: Cannot locate configuration source null
      

      and, quoting the README file from the GitHub repository you posted:

      Also make sure the Atlas configuration file atlas-application.properties is in the driver's classpath. For example, put this file into <SPARK_HOME>/conf.
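
To check whether the file is actually visible to the driver, a small diagnostic like the one below can be run at the start of main. It is only a sketch that roughly mirrors the lookup order in the Atlas ApplicationProperties source as I understand it (an assumption, not a guarantee): the atlas.conf system property is consulted if set, and the classpath is searched otherwise. AtlasConfCheck and locate are hypothetical names.

```scala
import java.io.File

// Hypothetical diagnostic for illustration; not part of Atlas or spark-atlas-connector.
object AtlasConfCheck {
  val FileName = "atlas-application.properties"

  /** Describes where the properties file would be found, or None if it is not visible. */
  def locate(): Option[String] =
    Option(System.getProperty("atlas.conf")) match {
      case Some(dir) =>
        // Property is set: only that directory is considered (assumed behavior).
        val file = new File(dir, FileName)
        if (file.isFile) Some(s"atlas.conf directory: ${file.getAbsolutePath}") else None
      case None =>
        // Property is unset: fall back to the classpath of this class loader.
        Option(getClass.getClassLoader.getResource(FileName))
          .map(url => s"classpath: $url")
    }
}
```

If AtlasConfCheck.locate() returns None when the job is started from an IDE, the file is in neither place. In a standard Maven or sbt layout, placing it under src/main/resources would normally put it on the classpath.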

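      【Comments】:

      • I work in IntelliJ IDEA and start the job from there (without a console). I created a SPARK_HOME environment variable pointing to a folder with a conf folder inside, and put atlas-application.properties there, but I still have the problem.
      • Oh my god! The atlas-application.properties path should be set in a property. I mean, in your job it should look like this: System.setProperty("atlas.conf", "").
      【Solution 3】:

      See the official spark-atlas-connector GitHub page. The atlas-application.properties file must be accessible.

      Also make sure the Atlas configuration file atlas-application.properties is in the driver's classpath. For example, put this file into /conf. If you are using cluster mode, also ship this conf file to the remote driver using --files atlas-application.properties.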

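      【Comments】:

        【Solution 4】:

        The following should do the trick. Note that the --files and --driver-class-path options are required to place this configuration file on the CLASSPATH so that it is available to the Atlas client classes.

        Also, this spark-shell command uses paths relative to the Spark Atlas Connector checkout, so change them accordingly.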

        $SPARK_HOME/bin/spark-shell \
          --jars spark-atlas-connector-assembly/target/spark-atlas-connector-assembly-0.1.0-SNAPSHOT.jar \
          --conf spark.extraListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
          --conf spark.sql.queryExecutionListeners=com.hortonworks.spark.atlas.SparkAtlasEventTracker \
          --conf spark.sql.streaming.streamingQueryListeners=com.hortonworks.spark.atlas.SparkAtlasStreamingQueryEventTracker \
          --files spark-atlas-connector/src/test/resources/atlas-application.properties \
          --driver-class-path spark-atlas-connector/src/test/resources
        

        【Comments】:
