【问题标题】:Spark Submit error when running a JAR from Azure Databricks从 Azure Databricks 运行 JAR 时出现 Spark 提交错误
【发布时间】:2020-11-04 11:01:20
【问题描述】:

我正在尝试从 Azure Databricks 作业调度程序发出 spark 提交,目前遇到以下错误。错误说:文件文件:/tmp/spark-events 不存在。我需要一些指针来了解我们需要在 Azure blob 位置(这是我的存储层)还是在 Azure DBFS 位置创建这个目录。

根据下面的链接,从 Azure Databricks 作业调度程序运行 spark-submit 时在哪里创建目录不是很清楚。

SparkContext Error - File not found /tmp/spark-events does not exist

错误:

OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=512m; support was removed in 8.0
Warning: Ignoring non-Spark config property: eventLog.rolloverIntervalSeconds
Exception in thread "main" java.lang.ExceptionInInitializerError
    at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad.main(HBaseVehicleMasterLoad.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File file:/tmp/spark-events does not exist
    at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:611)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:824)
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:601)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:421)
    at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:97)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:580)
    at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad$.<init>(HBaseVehicleMasterLoad.scala:32)
    at com.dta.dl.ct.qm.hbase.reverse.pipeline.HBaseVehicleMasterLoad$.<clinit>(HBaseVehicleMasterLoad.scala)
    ... 13 more

【问题讨论】:

    标签: apache-spark azure-databricks


    【解决方案1】:

    您需要在驱动节点和工作节点上创建此文件夹。为此,一种方法是在全局初始化脚本中添加属性spark.history.fs.logDirectory(存在于 spark-defaults.conf 文件中),如here 所述。请确保在该属性上定义的文件夹存在并且可以从驱动程序节点访问

    【讨论】:

      猜你喜欢
      • 2020-01-13
      • 2017-08-26
      • 2017-12-25
      • 1970-01-01
      • 1970-01-01
      • 2021-10-22
      • 1970-01-01
      • 1970-01-01
      • 2020-11-16
      相关资源
      最近更新 更多