【Question title】: Serializer for Avro Schema
【Posted】: 2021-05-12 16:24:56
【Question】:

I'm new to Avro schemas. I created the following schema based on a reference JSON, but I can't work out how to create a serializer for it.

{
  "name": "Name",
  "type": "record",
  "namespace": "NameSpace",
  "fields": [
    {
      "name": "discussions",
      "comment": "discussion ID.",
      "type": {
        "type": "array",
        "items": {
          "name": "discussionsRecord",
          "comment": "discussion Identifier.",
          "type": "record",
          "fields": [
            {
              "name": "discussionId",
              "type": "long"
            },
            {
              "name": "channelType",
              "comment": "channel Type Identification.",
              "type": "int"
            },
            {
              "name": "data",
              "comment": "The following block is to capture channel values.",
              "type": {
                "type": "array",
                "items": [
                  {
                    "name": "dataRecord",
                    "type": "record",
                    "fields": [
                      {
                        "name": "pulse",
                        "comment": "Pulse.",
                        "type": "long"
                      },
                      {
                        "name": "communicationName",
                        "comment": "communication Identification.",
                        "type": {
                          "name": "communicationNameEnumType",
                          "comment": "enum for communication Names.",
                          "type": "enum",
                          "symbols": ["cold", "rainIntensity", "heat"]
                        }
                      },
                      {
                        "name": "communicationValue",
                        "comment": "communication Values.",
                        "type": "double"
                      },
                      {
                        "name": "classValue",
                        "comment": "communication class.",
                        "type": {
                          "name": "classValueEnumType",
                          "comment": "enum for Class types.",
                          "type": "enum",
                          "symbols": ["Dark", "Logical"]
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}
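Outside Spark, one possible serializer for this schema is Avro's generic API. The Scala sketch below assumes the org.apache.avro:avro library is on the classpath; the object and method names and the sample values are hypothetical. It embeds the question's schema with the "comment" attributes left out (Avro's parser keeps unknown attributes as custom properties, and the standard documentation attribute is "doc"), builds one record, and writes it as raw Avro binary with GenericDatumWriter and a BinaryEncoder. Because the items of "data" are declared as a one-branch union ([dataRecord]), the sketch takes the record schema from getTypes.get(0).

```scala
import java.io.ByteArrayOutputStream

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Hypothetical helper object; all names here are illustrative
object AvroSerializerSketch {
  // The question's schema with the "comment" attributes omitted
  val schemaJson: String =
    """{"type": "record", "name": "Name", "namespace": "NameSpace", "fields": [
      |  {"name": "discussions", "type": {"type": "array", "items": {
      |    "type": "record", "name": "discussionsRecord", "fields": [
      |      {"name": "discussionId", "type": "long"},
      |      {"name": "channelType", "type": "int"},
      |      {"name": "data", "type": {"type": "array", "items": [{
      |        "type": "record", "name": "dataRecord", "fields": [
      |          {"name": "pulse", "type": "long"},
      |          {"name": "communicationName", "type": {"type": "enum",
      |            "name": "communicationNameEnumType", "symbols": ["cold", "rainIntensity", "heat"]}},
      |          {"name": "communicationValue", "type": "double"},
      |          {"name": "classValue", "type": {"type": "enum",
      |            "name": "classValueEnumType", "symbols": ["Dark", "Logical"]}}
      |        ]}]}}
      |    ]}}}
      |]}""".stripMargin

  val schema: Schema = new Schema.Parser().parse(schemaJson)

  // Encode one record as raw Avro binary (no file header, no embedded schema)
  def serialize(record: GenericRecord, schema: Schema): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Decode bytes that were written with the same schema
  def deserialize(bytes: Array[Byte], schema: Schema): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }

  // Build a record with hypothetical sample values
  def sampleRecord(): GenericRecord = {
    val discussionSchema = schema.getField("discussions").schema().getElementType
    // "data" items are a one-branch union, so take its only member
    val dataSchema = discussionSchema.getField("data").schema().getElementType.getTypes.get(0)

    val data = new GenericData.Record(dataSchema)
    data.put("pulse", java.lang.Long.valueOf(72L))
    // Enum fields are set as EnumSymbol values, not plain strings
    data.put("communicationName",
      new GenericData.EnumSymbol(dataSchema.getField("communicationName").schema(), "heat"))
    data.put("communicationValue", java.lang.Double.valueOf(36.6))
    data.put("classValue",
      new GenericData.EnumSymbol(dataSchema.getField("classValue").schema(), "Logical"))

    val discussion = new GenericData.Record(discussionSchema)
    discussion.put("discussionId", java.lang.Long.valueOf(1001L))
    discussion.put("channelType", Integer.valueOf(2))
    discussion.put("data", java.util.Arrays.asList(data)) // any java.util.Collection works for arrays

    val root = new GenericData.Record(schema)
    root.put("discussions", java.util.Arrays.asList(discussion))
    root
  }

  def main(args: Array[String]): Unit = {
    val bytes = serialize(sampleRecord(), schema)
    println(s"${bytes.length} bytes")
    println(deserialize(bytes, schema)) // JSON-like rendering of the decoded record
  }
}
```

The encoded bytes carry no schema, so whoever reads them needs the same (or a compatible) schema. If a self-describing .avro file is wanted instead, Avro's DataFileWriter can take the place of the raw encoder.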

【Discussion】:

    Tags: apache-spark avro spark-avro


    【Answer 1】:

    If you have the AVSC schema, you can create the equivalent Spark SQL schema from it like this (Scala):

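    import org.apache.avro.Schema
    import org.apache.spark.sql.avro.SchemaConverters
    import org.apache.spark.sql.types.StructType
    
    // The AVSC text (Avro schema JSON), e.g. the schema from the question
    val avroSchema: String = ...
    // toSqlType returns SchemaType(dataType, nullable); a top-level record maps to a StructType
    val sparkSchema: StructType =
      SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))
        .dataType.asInstanceOf[StructType]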
    

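    Otherwise, to_avro() serializes the data of an existing DataFrame to Avro binary, using the DataFrame's own schema. It works on a column, typically a struct built from all of the DataFrame's columns.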
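A minimal sketch of the to_avro() route, assuming Spark 3.x with the spark-avro module (org.apache.spark:spark-avro) on the classpath, where to_avro is provided by org.apache.spark.sql.avro.functions. The object and method names and the two sample rows are hypothetical; the rows borrow two of the question's field names but use a flat shape rather than the question's nested schema.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.avro.functions.to_avro
import org.apache.spark.sql.functions.{col, struct}

// Hypothetical helper object; names are illustrative
object ToAvroSketch {
  // Pack every column of df into one struct and encode it as Avro binary.
  // With no schema argument, to_avro derives the Avro schema from the struct's Spark type.
  def serializeRows(df: DataFrame): DataFrame =
    df.select(to_avro(struct(df.columns.map(col): _*)).as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("to_avro-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows
    val df = Seq((1001L, 2), (1002L, 3)).toDF("discussionId", "channelType")
    val avroDf = serializeRows(df)
    avroDf.printSchema() // a single binary column named "value"
    avroDf.show(false)
    spark.stop()
  }
}
```

Spark 3.x also has an overload to_avro(column, jsonFormatSchema) that takes the AVSC text, so the data can be encoded against the question's schema instead of one derived from the DataFrame, provided the column's Spark type matches that schema.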

    【Discussion】:
