Write AVRO from spark-shell in Spark 2.4

【Question title】: Write AVRO from spark-shell in Spark 2.4
【Posted】: 2019-04-30 16:30:46
【Question】:

Spark 2.4.0 on Java 1.8.0_161 (Scala 2.11.12)

Launch command: spark-shell --jars=spark-avro_2.11-2.4.0.jar

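I'm currently working on a POC with small Avro files. I'd like to be able to read a (single) Avro file, make changes, and then write it back.

Reading works fine: val myAv = spark.read.format("avro").load("myAvFile.avro")

However, I get this error when trying to write it back (even before making any changes):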

scala> myAv.write.format("avro").save("./output-av-file.avro")

org.apache.spark.sql.AnalysisException:
Datasource does not support writing empty or nested empty schemas.
Please make sure the data schema has at least one or more column(s).
         ;
  at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$validateSchema(DataSource.scala:733)
  at org.apache.spark.sql.execution.datasources.DataSource.planForWriting(DataSource.scala:523)
  at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:281)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:270)
  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:228)
  ... 49 elided

I tried specifying the DataFrame's schema manually, but to no avail: .write.option("avroSchema", c_schema.toString).format("avro") ...

Tags: scala apache-spark apache-spark-sql avro

【Solution 1】:

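The cause is stated in the error itself: Spark considers the schema empty, or finds an empty nested struct inside it. The exception comes from this check in the Spark source (DataSource.validateSchema, per the stack trace):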

if (hasEmptySchema(schema)) {
  throw new AnalysisException(
    s"""
       |Datasource does not support writing empty or nested empty schemas.
       |Please make sure the data schema has at least one or more column(s).
     """.stripMargin)
}
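
Since reading works, the top-level schema is probably not empty, so a nested empty struct is worth ruling out. Below is a minimal sketch of such a check that can be pasted into spark-shell. The helper name hasEmptyStruct and the two sample schemas are hypothetical; the logic only approximates what Spark's internal hasEmptySchema appears to do (recursing into struct fields, not into arrays or maps) and is not a copy of the Spark source.

```scala
import org.apache.spark.sql.types._

// Hypothetical helper: true if the struct has no fields,
// or if any struct-typed field (recursively) has no fields.
def hasEmptyStruct(schema: StructType): Boolean =
  schema.isEmpty || schema.fields.exists {
    case StructField(_, nested: StructType, _, _) => hasEmptyStruct(nested)
    case _ => false
  }

// Illustrative schemas, not taken from the question's data.
val flat = StructType(Seq(StructField("id", LongType)))
val nestedEmpty = StructType(Seq(
  StructField("id", LongType),
  StructField("meta", StructType(Nil)) // a record with no fields
))

println(hasEmptyStruct(flat))        // false
println(hasEmptyStruct(nestedEmpty)) // true
```

With the DataFrame from the question, hasEmptyStruct(myAv.schema) together with myAv.printSchema() should show whether some record in the file has no fields. If one does, dropping or populating that column before calling write would be the next thing to try.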
