【Question Title】: Exception java.util.NoSuchElementException: None.get in Spark Dataset save() operation
【Posted】: 2021-03-07 21:40:36
【Question】:

When I try to save a Dataset as Parquet to S3 storage, I get the exception "java.util.NoSuchElementException: None.get":

Exception:

java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:787)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215)
...

Caused by: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker$.metrics(BasicWriteStatsTracker.scala:173)
at org.apache.spark.sql.execution.command.DataWritingCommand$class.metrics(DataWritingCommand.scala:51)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics$lzycompute(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics$lzycompute(commands.scala:100)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics(commands.scala:100)
at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:56)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)

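This looks like a SparkContext-related issue. I don't create a SparkContext instance explicitly; my source code only uses SparkSession.

// Only a SparkSession is built; getOrCreate() creates the SparkContext implicitly
final SparkSession sparkSession = SparkSession
        .builder()
        .appName("Java Spark SQL job")
        .getOrCreate();

// ds is the Dataset<Row> to save and path is the S3 target location
ds.write().mode("overwrite").parquet(path);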

Any suggestions or workarounds? Thanks.

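Update 1:

Creating ds is somewhat complicated, but I'll try to list the main call sequence below:

Process 1:

1. session.read().parquet(path) as the source;
2. ds.createOrReplaceTempView(view);
3. sparkSession.sql(sql) as ds1;
4. sparkSession.sql(sql) as ds2;
5. ds1.save()
6. ds2.save()

Process 2:

After step 6, I loop back to step 1 with the same Spark session for the next process. Finally, sparkSession.stop() is called after all processing is done.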

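After process 1 finishes, I can find the following log line, which seems to indicate that the SparkContext had already been stopped before process 2 started: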

INFO SparkContext: Successfully stopped SparkContext
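
Since the log shows the SparkContext being stopped between process 1 and process 2, one way to get a clearer error than the None.get raised deep inside the Parquet writer is to check the context before each process starts. The sketch below is a hypothetical helper, not a Spark API; it assumes the caller passes a check such as `() -> sparkSession.sparkContext().isStopped()` (assuming `SparkContext.isStopped()` is public in the Spark 2.x version in use). The helper is plain Java, so it compiles without Spark on the classpath.

```java
import java.util.function.BooleanSupplier;

// Hypothetical helper (not part of Spark): fails fast with a readable message
// when the SparkContext behind a session has already been stopped.
final class SessionGuard {

    private SessionGuard() {
    }

    // isStopped should report whether the context is stopped, e.g. (assumption)
    // () -> sparkSession.sparkContext().isStopped() in the real job.
    static void requireActive(BooleanSupplier isStopped, String stage) {
        if (isStopped.getAsBoolean()) {
            throw new IllegalStateException(
                    "SparkContext was already stopped before " + stage
                            + "; check whether sparkSession.stop() runs inside the processing loop");
        }
    }
}
```

Calling `SessionGuard.requireActive(() -> sparkSession.sparkContext().isStopped(), "process 2")` at the top of each loop iteration would turn the late None.get into an IllegalStateException that names the step where the stopped context was first used.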

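【Discussion】:

  • Could you please show how the DataFrame "ds" is created?
  • When using SparkSession, there is no need to create an explicit SparkContext, so you can rule that out. The problem is more likely in how your DataFrame is created.
  • Thanks LizardKing and anuj saxena, please see my Update 1 for details.

Tags: apache-spark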


【Solution 1】:

Simply removing the sparkSession.stop() call solves the problem.
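
The fix amounts to calling stop() exactly once, after every process has finished. Below is a minimal sketch of that structure, assuming the job can be expressed as a list of per-process steps that share one session; `ProcessRunner` and `runAll` are hypothetical names. It relies on `SparkSession` implementing `java.io.Closeable` (its `close()` delegates to `stop()` since Spark 2.1), so the real call would be `ProcessRunner.runAll(sparkSession, processes)` with a `List<Consumer<SparkSession>>`; the helper itself is plain Java.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: runs every process on one shared session and closes
// (for SparkSession: stops) the session exactly once, after the last process.
final class ProcessRunner {

    private ProcessRunner() {
    }

    static <S extends Closeable> void runAll(S session, List<Consumer<S>> processes) throws IOException {
        // try-with-resources calls close() once when the loop ends, even if a process throws
        try (S s = session) {
            for (Consumer<S> process : processes) {
                // e.g. read parquet, create temp view, run SQL, save ds1 and ds2;
                // a process must not call stop() itself
                process.accept(s);
            }
        }
    }
}
```

With this shape no process body calls stop(), which is the call that, per the log above, ended the SparkContext before process 2 could write.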
