Advanced Avro: Schema Design & Reuse – Part 4

Introduction

This is the fourth part of a multi-part series on Apache Avro schema design. As with any data modeling, schema design is crucial because the schema acts as the contract that lets systems interoperate. Unfortunately, Avro schema files have no out-of-the-box include mechanism for reusing schemas across files and keeping them extensible. In this post, we will see how to work around this limitation and create an extensible schema. Don’t forget to check out the previous posts in this series [Part 1, Part 2, and Part 3].

Use Case

This time, we will go a little crazy with our schema modeling and make it complex by nesting elements of both primitive and complex types. Later, we will split this complex schema into small, reusable sub-schemas and include those sub-schemas in the main schema for easier schema management and extensibility.

What we want to do:

  • Create a Schema with nested sub-schemas
  • Create Multiple sub-schemas and reuse
  • Write a Java program that includes sub-schemas into main schema
  • Convert the JSON file into binary Avro, and from binary Avro to JSON file using Avro Tools

Solution

Create a Schema with nested sub-schemas:

  • We will expand our e-commerce schema into something more interesting and make it a little complex by giving it nested sub-schemas. The data model below is straightforward: it starts with an Order, which contains multiple order line items, each of which carries product details. We have used the product schema extensively in our previous posts, and the same schema is reused here.

 [Figure: Order data model: Order, its OrderDetail line items, and their Product details]

  • Time to represent the above data model as an Avro schema. Nesting primitive and complex types makes it a little complex: Order is a record that contains an array of OrderDetail records, and each OrderDetail holds Product information.
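
The original schema listing is not available here, so the following is only a sketch of what such a nested schema could look like. The namespace com.example.avro and every field name are assumptions made for illustration, not the schema used in the earlier posts. The schema is kept in a Java string (with single quotes swapped for double quotes) so that it can be printed and saved as order.avsc.

```java
final class NestedOrderSchema {
    // Hypothetical nested schema: Order -> array of OrderDetail -> Product.
    // Single quotes keep the literal readable; they are replaced with double quotes below.
    static final String ORDER_SCHEMA = String.join("\n",
        "{",
        "  'type': 'record',",
        "  'name': 'Order',",
        "  'namespace': 'com.example.avro',",
        "  'fields': [",
        "    {'name': 'order_id', 'type': 'long'},",
        "    {'name': 'customer_id', 'type': 'long'},",
        "    {'name': 'total', 'type': 'float'},",
        "    {'name': 'order_details', 'type': {",
        "      'type': 'array',",
        "      'items': {",
        "        'type': 'record',",
        "        'name': 'OrderDetail',",
        "        'fields': [",
        "          {'name': 'quantity', 'type': 'int'},",
        "          {'name': 'total', 'type': 'float'},",
        "          {'name': 'product_detail', 'type': {",
        "            'type': 'record',",
        "            'name': 'Product',",
        "            'fields': [",
        "              {'name': 'product_id', 'type': 'long'},",
        "              {'name': 'product_name', 'type': 'string'},",
        "              {'name': 'product_description', 'type': ['string', 'null']},",
        "              {'name': 'price', 'type': 'float'}",
        "            ]",
        "          }}",
        "        ]",
        "      }",
        "    }}",
        "  ]",
        "}").replace('\'', '"');

    public static void main(String[] args) {
        // Print the schema; redirect the output to a file such as schemas/order.avsc.
        System.out.println(ORDER_SCHEMA);
    }
}
```

The nested OrderDetail and Product records declare no namespace of their own, so they inherit the namespace of the enclosing Order record.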

Create Multiple sub-schemas and reuse:

  • It is clear that the above schema can quickly become overwhelming as we add more and more details, for example a Customer object that holds all customer information, including home and shipping addresses.
  • Let’s think about a ShoppingCart object that also contains order line items (OrderDetail), in this case the items that are in the cart but not purchased yet. There is no way to share the OrderDetail definition between Order and ShoppingCart; the only option is to duplicate the whole schema. Unfortunately, an Avro schema file cannot include a type defined in another file, so reuse and composition across files are not supported out of the box.
  • Luckily, with some custom programming, we can split the schema into sub-schemas and reuse them as we wish. This is nothing new, and other blogs have already addressed it.
  • Let’s first split the above schema into multiple sub-schemas:
  • Observe how the Order schema refers to OrderDetail, and OrderDetail refers to Product. If we have to create a ShoppingCart, we can easily reuse OrderDetail, which already includes Product. Isn’t this cool? But wait, this reuse capability takes quite a bit of work, because Avro doesn’t support multiple sub-schemas out of the box. Avro expects everything to be in one schema. Let’s see what it takes to make this happen in the next section.
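
As a rough illustration of the split described above, the sketch below holds four sub-schemas as Java strings. The namespace com.example.avro, the field names, and the ShoppingCart layout are all assumptions made for this example. Each parent refers to its child by fully qualified name instead of embedding the child's definition, which is what allows Order and ShoppingCart to share OrderDetail.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SubSchemaFiles {
    // Swap single quotes for double quotes so the JSON stays readable in Java source.
    static String json(String... lines) {
        return String.join("\n", lines).replace('\'', '"');
    }

    static final String PRODUCT = json(
        "{",
        "  'type': 'record',",
        "  'name': 'Product',",
        "  'namespace': 'com.example.avro',",
        "  'fields': [",
        "    {'name': 'product_id', 'type': 'long'},",
        "    {'name': 'product_name', 'type': 'string'},",
        "    {'name': 'price', 'type': 'float'}",
        "  ]",
        "}");

    // Refers to Product by its full name instead of embedding it.
    static final String ORDER_DETAIL = json(
        "{",
        "  'type': 'record',",
        "  'name': 'OrderDetail',",
        "  'namespace': 'com.example.avro',",
        "  'fields': [",
        "    {'name': 'quantity', 'type': 'int'},",
        "    {'name': 'total', 'type': 'float'},",
        "    {'name': 'product_detail', 'type': 'com.example.avro.Product'}",
        "  ]",
        "}");

    // Order and ShoppingCart both point at the same OrderDetail sub-schema.
    static final String ORDER = json(
        "{",
        "  'type': 'record',",
        "  'name': 'Order',",
        "  'namespace': 'com.example.avro',",
        "  'fields': [",
        "    {'name': 'order_id', 'type': 'long'},",
        "    {'name': 'order_details',",
        "     'type': {'type': 'array', 'items': 'com.example.avro.OrderDetail'}}",
        "  ]",
        "}");

    static final String SHOPPING_CART = json(
        "{",
        "  'type': 'record',",
        "  'name': 'ShoppingCart',",
        "  'namespace': 'com.example.avro',",
        "  'fields': [",
        "    {'name': 'cart_id', 'type': 'long'},",
        "    {'name': 'cart_items',",
        "     'type': {'type': 'array', 'items': 'com.example.avro.OrderDetail'}}",
        "  ]",
        "}");

    // Full type name -> sub-schema text, listed in dependency order.
    static Map<String, String> all() {
        Map<String, String> schemas = new LinkedHashMap<>();
        schemas.put("com.example.avro.Product", PRODUCT);
        schemas.put("com.example.avro.OrderDetail", ORDER_DETAIL);
        schemas.put("com.example.avro.Order", ORDER);
        schemas.put("com.example.avro.ShoppingCart", SHOPPING_CART);
        return schemas;
    }

    public static void main(String[] args) {
        all().forEach((name, text) -> System.out.println("// " + name + "\n" + text + "\n"));
    }
}
```

On their own, the OrderDetail, Order, and ShoppingCart texts are not complete schemas, because each one mentions a type it does not define.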

Write a Java program that includes sub-schemas into main schema:

  • This is the tricky part: the program parses each sub-schema, resolves its references to other sub-schemas, and merges everything into the final schema.
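  • Verify that the merged schema is a single, self-contained definition in which Order includes the full OrderDetail and Product records.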
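
One simple way to do this merge, sketched here without claiming it is the program from the original post, is plain text substitution: register the sub-schemas in dependency order and replace each quoted full type name with the already resolved definition. The class name SubSchemaResolver, the namespace com.example.avro, and the sample schemas are hypothetical. The sketch uses only the Java standard library; the merged text could afterwards be handed to Avro's own Schema.Parser for validation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class SubSchemaResolver {
    // Full type name -> fully resolved JSON definition, in registration order.
    private final Map<String, String> resolved = new LinkedHashMap<>();

    /** Inlines already-known sub-schemas, then stores the result under fullName. */
    String register(String fullName, String schemaJson) {
        String complete = resolve(schemaJson);
        resolved.put(fullName, complete);
        return complete;
    }

    /**
     * Replaces the first quoted reference to each known type with its definition.
     * Later references stay as names, because a named type may only be defined
     * once per schema and can be referred to by name after that.
     */
    String resolve(String schemaJson) {
        String result = schemaJson;
        for (Map.Entry<String, String> entry : resolved.entrySet()) {
            String reference = "\"" + entry.getKey() + "\"";
            int at = result.indexOf(reference);
            if (at >= 0) {
                result = result.substring(0, at) + entry.getValue()
                       + result.substring(at + reference.length());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String ns = "com.example.avro"; // hypothetical namespace
        String product = ("{'type':'record','name':'Product','namespace':'" + ns + "',"
            + "'fields':[{'name':'product_id','type':'long'}]}").replace('\'', '"');
        String orderDetail = ("{'type':'record','name':'OrderDetail','namespace':'" + ns + "',"
            + "'fields':[{'name':'quantity','type':'int'},"
            + "{'name':'product_detail','type':'" + ns + ".Product'}]}").replace('\'', '"');
        String order = ("{'type':'record','name':'Order','namespace':'" + ns + "',"
            + "'fields':[{'name':'order_id','type':'long'},"
            + "{'name':'order_details','type':{'type':'array','items':'" + ns
            + ".OrderDetail'}}]}").replace('\'', '"');

        SubSchemaResolver resolver = new SubSchemaResolver();
        resolver.register(ns + ".Product", product);          // leaf first
        resolver.register(ns + ".OrderDetail", orderDetail);  // now contains Product
        System.out.println(resolver.register(ns + ".Order", order)); // merged schema
    }
}
```

This works because each definition spells its name and namespace separately, so the full name only ever appears where a type is referenced. The substitution is purely textual, so any other string in a schema that happens to equal a registered full name would be replaced as well.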

Convert the JSON file into binary Avro, and from binary Avro to JSON file using Avro Tools

  • Convert from JSON to Binary Avro:
    • Use Avro Tools to convert the JSON data to Binary Avro
      • Put order.avsc in a schemas directory under any working directory
      • Put order.json in an input directory
      • Run the Avro Tools fromjson command (with java -jar), passing order.avsc as the schema file and order.json as the input, and redirect its output to an Avro file in the output directory
    • Verify that the command succeeded by checking the output directory for the Avro file. The command returns without any errors when it executes successfully.
  • Peek into Binary Avro:
  • Convert from Binary Avro to JSON:
    • It’s time to do the reverse and get the JSON data back from the binary Avro file that we created. Use the Avro Tools tojson command, giving it the location of the binary Avro file and redirecting its output to the location of the JSON file.
    • On successful execution, the command creates a JSON file that is very similar to the original JSON input file.
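
The two conversions can also be driven from Java, as in the sketch below. It assumes the Avro Tools jar has been downloaded to the working directory as avro-tools.jar (a placeholder name; use the file name of your version) and that the schemas, input, and output directories are laid out as described above. The output file names order.avro and order.json are likewise assumptions. The fromjson and tojson subcommands write their result to standard output, so the sketch redirects it to a file.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;

final class AvroToolsRunner {
    // Placeholder path to the Avro Tools jar; adjust to the version you downloaded.
    static final String TOOLS_JAR = "avro-tools.jar";

    /** java -jar avro-tools.jar fromjson --schema-file <schema> <json> */
    static List<String> fromJsonCommand(String schemaFile, String jsonFile) {
        return List.of("java", "-jar", TOOLS_JAR,
                       "fromjson", "--schema-file", schemaFile, jsonFile);
    }

    /** java -jar avro-tools.jar tojson <avro> */
    static List<String> toJsonCommand(String avroFile) {
        return List.of("java", "-jar", TOOLS_JAR, "tojson", avroFile);
    }

    /** Runs the command, writing its standard output to target; returns the exit code. */
    static int run(List<String> command, File target) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
            .redirectOutput(target)                          // converted data goes to the file
            .redirectError(ProcessBuilder.Redirect.INHERIT)  // show tool errors on the console
            .start();
        return process.waitFor();
    }

    public static void main(String[] args) throws Exception {
        new File("output").mkdirs();
        int toAvro = run(fromJsonCommand("schemas/order.avsc", "input/order.json"),
                         new File("output/order.avro"));
        int toJson = run(toJsonCommand("output/order.avro"),
                         new File("output/order.json"));
        System.out.println("fromjson exit code: " + toAvro + ", tojson exit code: " + toJson);
    }
}
```

An exit code of 0 from both steps, together with the two files appearing in the output directory, indicates that the round trip worked.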

Conclusion

  • With some custom programming, it is possible to create sub-schemas and reuse them as desired rather than keeping everything in one big monolithic schema.
  • Let’s hope that future releases of Avro will support composition, aggregation, inheritance, and polymorphism when defining schemas.
