Avro Schema Evolution with Enum – 反序列化崩溃答案

【问题标题】：Avro Schema Evolution with Enum – Deserialization CrashesAvro Schema Evolution with Enum – 反序列化崩溃
【发布时间】：2020-06-26 14:31:24
【问题描述】：

我在两个单独的 AVCS 架构文件中定义了两个版本的记录。我使用命名空间来区分版本 SimpleV1.avsc

{
  "type" : "record",
  "name" : "Simple",
  "namespace" : "test.simple.v1",
  "fields" : [ 
      {
        "name" : "name",
        "type" : "string"
      }, 
      {
        "name" : "status",
        "type" : {
          "type" : "enum",
          "name" : "Status",
          "symbols" : [ "ON", "OFF" ]
        },
        "default" : "ON"
      }
   ]
}

JSON 示例

{"name":"A","status":"ON"}

版本 2 只是有一个附加的描述字段，具有默认值。

SimpleV2.avsc

{
  "type" : "record",
  "name" : "Simple",
  "namespace" : "test.simple.v2",
  "fields" : [ 
      {
        "name" : "name",
        "type" : "string"
      }, 
      {
        "name" : "description",
        "type" : "string",
        "default" : ""
      }, 
      {
        "name" : "status",
        "type" : {
          "type" : "enum",
          "name" : "Status",
          "symbols" : [ "ON", "OFF" ]
        },
        "default" : "ON"
      }
   ]
}

JSON 示例

{"name":"B","description":"b","status":"ON"}

两种模式都被序列化为 Java 类。在我的示例中，我将测试向后兼容性。 V1 写入的记录应由使用 V2 的读取器读取。我想看看是否插入了默认值。只要我不使用枚举，这就是有效的。

public class EnumEvolutionExample {

    public static void main(String[] args) throws IOException {
        Schema schemaV1 = new org.apache.avro.Schema.Parser().parse(new File("./src/main/resources/SimpleV1.avsc"));
        //works as well
        //Schema schemaV1 = test.simple.v1.Simple.getClassSchema();
        Schema schemaV2 = new org.apache.avro.Schema.Parser().parse(new File("./src/main/resources/SimpleV2.avsc"));

        test.simple.v1.Simple simpleV1 = test.simple.v1.Simple.newBuilder()
                .setName("A")
                .setStatus(test.simple.v1.Status.ON)
                .build();
        
        
        SchemaPairCompatibility schemaCompatibility = SchemaCompatibility.checkReaderWriterCompatibility(
                schemaV2,
                schemaV1);
        //Checks that writing v1 and reading v2 schemas is compatible
        Assert.assertEquals(SchemaCompatibilityType.COMPATIBLE, schemaCompatibility.getType());
        
        byte[] binaryV1 = serealizeBinary(simpleV1);
        
        //Crashes with: AvroTypeException: Found test.simple.v1.Status, expecting test.simple.v2.Status
        test.simple.v2.Simple v2 = deSerealizeBinary(binaryV1, new test.simple.v2.Simple(), schemaV1);
        
    }
    
    public static byte[] serealizeBinary(SpecificRecord record) {
        DatumWriter<SpecificRecord> writer = new SpecificDatumWriter<>(record.getSchema());
        byte[] data = new byte[0];
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        Encoder binaryEncoder = EncoderFactory.get()
            .binaryEncoder(stream, null);
        try {
            writer.write(record, binaryEncoder);
            binaryEncoder.flush();
            data = stream.toByteArray();
        } catch (IOException e) {
            System.out.println("Serialization error " + e.getMessage());
        }

        return data;
    }
    
    public static <T extends SpecificRecord> T deSerealizeBinary(byte[] data, T reuse, Schema writer) {
        Decoder decoder = DecoderFactory.get().binaryDecoder(data, null);
        DatumReader<T> datumReader = new SpecificDatumReader<>(writer, reuse.getSchema());
        try {
            T datum = datumReader.read(null, decoder);
            return datum;
        } catch (IOException e) {
            System.out.println("Deserialization error" + e.getMessage());
        }
        return null;
    }

}

checkReaderWriterCompatibility 方法确认模式是兼容的。但是当我反序列化时，我得到了以下异常

Exception in thread "main" org.apache.avro.AvroTypeException: Found test.simple.v1.Status, expecting test.simple.v2.Status
    at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:309)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
    at org.apache.avro.io.ResolvingDecoder.readEnum(ResolvingDecoder.java:260)
    at org.apache.avro.generic.GenericDatumReader.readEnum(GenericDatumReader.java:267)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:181)
    at org.apache.avro.specific.SpecificDatumReader.readField(SpecificDatumReader.java:136)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:247)
    at org.apache.avro.specific.SpecificDatumReader.readRecord(SpecificDatumReader.java:123)
    at org.apache.avro.generic.GenericDatumReader.readWithoutConversion(GenericDatumReader.java:179)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:160)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:153)
    at test.EnumEvolutionExample.deSerealizeBinary(EnumEvolutionExample.java:70)
    at test.EnumEvolutionExample.main(EnumEvolutionExample.java:45)

我不明白为什么 Avro 认为它获得了 v1.Status。命名空间不是编码的一部分。 这是一个错误还是有人知道如何让它运行？

【问题讨论】：

标签： avro

【解决方案1】：

尝试添加@aliases。

例如：

{
  "type" : "record",
  "name" : "Simple",
  "namespace" : "test.simple.v1",
  "fields" : [ 
      {
        "name" : "name",
        "type" : "string"
      }, 
      {
        "name" : "status",
        "type" : {
          "type" : "enum",
          "name" : "Status",
          "symbols" : [ "ON", "OFF" ]
        },
        "default" : "ON"
      }
   ]
}

{
  "type" : "record",
  "name" : "Simple",
  "namespace" : "test.simple.v2",
  "fields" : [ 
      {
        "name" : "name",
        "type" : "string"
      }, 
      {
        "name" : "description",
        "type" : "string",
        "default" : ""
      }, 
      {
        "name" : "status",
        "type" : {
          "type" : "enum",
          "name" : "Status",
          "aliases" : [ "test.simple.v1.Status" ]
          "symbols" : [ "ON", "OFF" ]
        },
        "default" : "ON"
      }
   ]
}

【讨论】：

有同样的问题，添加别名就可以了，这应该被接受答案@Bertram。

【解决方案2】：

找到了解决方法。我将枚举移动到“未版本化”的命名空间。所以它在两个版本中都是一样的。但实际上它对我来说似乎是一个错误。转换记录不是问题，但枚举不起作用。两者都是 Avro 中的复杂类型。

{
  "type" : "record",
  "name" : "Simple",
  "namespace" : "test.simple.v1",
  "fields" : [ 
      {
        "name" : "name",
        "type" : "string"
      }, 
      {
        "name" : "status",
        "type" : {
          "type" : "enum",
          "name" : "Status",
          "namespace" : "test.model.unversioned",
          "symbols" : [ "ON", "OFF" ]
        },
        "default" : "ON"
      }
   ]
}

【讨论】：

我也有同样的问题 :(