【Posted】: 2019-04-30 16:30:46
【Question】:
Spark 2.4.0 on Java 1.8.0_161 (Scala 2.11.12)
Launch command: spark-shell --jars=spark-avro_2.11-2.4.0.jar
I am working on a POC with small Avro files. I want to read a (single) Avro file, make changes to it, and then write it back.
Reading works fine:
val myAv = spark.read.format("avro").load("myAvFile.avro")
However, I get this error when I try to write it back (even before making any changes):
scala> myAv.write.format("avro").save("./output-av-file.avro")
org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
;
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:281)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
... 49 elided
I tried specifying the schema for the DataFrame manually, but that did not help:
.write.option("avroSchema", c_schema.toString).format("avro") ...
【Discussion】:
Tags: scala apache-spark apache-spark-sql avro