This is the third part in a multi-part series on Apache Avro interoperability. It is becoming increasingly important to have a common data format so that multiple systems can interoperate efficiently. Apache Avro’s standard data file format is self-describing (it always contains the full schema for the data), supports compression codecs such as Snappy, and uses a compact binary encoding, which makes it an excellent message format for interoperability work. Avro has become the de facto data format within the Big Data ecosystem, both in real-time processing with Apache Kafka and Storm and in batch processing with Hadoop.
Let’s see how an Avro file written in one programming language can be read by another. It is advisable to check out Part 1 and Part 2 of this series, as we will be referring to them in this post.
What we want to do:
- Create an Avro binary using Avro Java Tools
- Read the Avro binary and output the data using Avro Python Tools
Create an Avro binary using Avro Java Tools:
- Follow the corresponding sections in the first blog of this series to download Avro Java Tools, create a schema, create a data file, and use Avro tools to convert the data file into binary Avro using the schema.
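To make the Java-side step concrete, here is a minimal sketch of an Avro schema. The record name and fields are hypothetical stand-ins for whatever you created in Part 1:

```json
{
  "type": "record",
  "name": "User",
  "namespace": "example.avro",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age",  "type": "int"}
  ]
}
```

With a matching newline-delimited JSON data file (say `user.json`), Avro Java Tools can produce the binary Avro file with something like `java -jar avro-tools-1.8.2.jar fromjson --schema-file user.avsc user.json > user.avro`, where the jar version depends on the release downloaded in Part 1.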
Read the Avro binary and output the data using Avro Python Tools:
- Follow the corresponding sections in the second blog of this series to set up Python and Avro Python Tools, and use them to deserialize the binary Avro and filter the data.
- This blog post is deliberately short to show how easy it is to achieve Avro interoperability without any extra plumbing work.
- Avro’s schema is written into the binary file and is always available to the reader, which keeps binary Avro both compact and self-describing. If desired, a compression codec such as Snappy can be applied to shrink the binary further.
- Data Interoperability with Apache Avro: http://blog.cloudera.com/blog/2011/07/avro-data-interop/