【Posted】: 2021-03-07 21:40:36
【Question】:
When I try to save a Dataset as Parquet to S3 storage, I get the exception "java.util.NoSuchElementException: None.get":
Exception:
java.lang.IllegalStateException: Failed to execute CommandLineRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:787)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:768)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:322)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215)
...
Caused by: java.util.NoSuchElementException: None.get
at scala.None$.get(Option.scala:347)
at scala.None$.get(Option.scala:345)
at org.apache.spark.sql.execution.datasources.BasicWriteJobStatsTracker$.metrics(BasicWriteStatsTracker.scala:173)
at org.apache.spark.sql.execution.command.DataWritingCommand$class.metrics(DataWritingCommand.scala:51)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics$lzycompute(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.metrics(InsertIntoHadoopFsRelationCommand.scala:47)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics$lzycompute(commands.scala:100)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.metrics(commands.scala:100)
at org.apache.spark.sql.execution.SparkPlanInfo$.fromSparkPlan(SparkPlanInfo.scala:56)
at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:76)
at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:229)
at org.apache.spark.sql.DataFrameWriter.parquet(DataFrameWriter.scala:566)
This looks like a SparkContext-related problem. I do not create a SparkContext instance explicitly; my source code only uses SparkSession.
final SparkSession sparkSession = SparkSession
        .builder()
        .appName("Java Spark SQL job")
        .getOrCreate();
ds.write().mode("overwrite").parquet(path);
Any suggestions or workarounds? Thanks.
Update 1:
Creating ds is a bit involved, but I will try to list the main sequence of calls below:
Process 1:
1. session.read().parquet(path) as the source;
2. ds.createOrReplaceTempView(view);
3. sparkSession.sql(sql) as ds1;
4. sparkSession.sql(sql) as ds2;
5. ds1.save()
6. ds2.save()
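Process 2:
After step 6, I loop back to step 1 with the same Spark session for the next process. Finally, sparkSession.stop() is called once all processing is done.
I can find the following log line after process 1 finishes, which seems to indicate that the SparkContext had already been destroyed before process 2: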
INFO SparkContext: Successfully stopped SparkContext
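The intended lifecycle described above (one session reused by every process, stopped exactly once at the end) can be sketched as follows. This is only an illustration under assumptions: SharedSessionRunner and runAll are hypothetical names, not part of Spark, and the helper is written against AutoCloseable so it can be run without a cluster. SparkSession implements Closeable, with close() documented as a synonym for stop(), so a SparkSession could be passed in as the session.

```java
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical helper: runs every process against one shared session, then closes it once. */
final class SharedSessionRunner {
    private SharedSessionRunner() {}

    /**
     * Runs the processes in order with the same session instance.
     * The session is closed only after the last process has finished
     * (or after one of them has thrown), never between processes.
     */
    static <S extends AutoCloseable> void runAll(S session, List<Consumer<S>> processes)
            throws Exception {
        try {
            for (Consumer<S> process : processes) {
                process.accept(session);
            }
        } finally {
            // For a SparkSession this is equivalent to calling stop().
            session.close();
        }
    }
}
```

If the "Successfully stopped SparkContext" line still appears between two processes with a structure like this, something other than the final close() is stopping the context. Checking sparkSession.sparkContext().isStopped() at the start of each process may help narrow down where that happens.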
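【Comments】:
- Could you please show how the DataFrame "ds" is created?
- When using SparkSession, there is no need to create an explicit SparkContext, so you can rule that out. There may be a problem with how your DataFrame is created.
- Thanks LizardKing and anuj saxena, please see my Update 1 for details.
Tags: apache-spark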