Capturing messages in Event Hubs to Blob Storage

This post builds on the earlier post, Streaming messages from Kafka to EventHub with MirrorMaker, in which messages are streamed into Event Hubs. Event Hubs is the data source for this post.

Messages in Event Hubs

Messages captured from Event Hubs to Blob Storage are written in Avro format. Apache Avro is a binary data serialization system.

The overall infrastructure runs from the Scala script that produces the messages to the Blob Storage where they are stored.
Important: Avro is not a tool for capturing messages, but a file format!
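
Because Avro is a file format, a captured file can be recognized by its content rather than by its extension. An Avro object container file begins with the four bytes 'O', 'b', 'j' and 1. The sketch below checks for that header; the function names are made up for this illustration and are not part of any Avro library.

```python
# Avro object container files begin with these four bytes.
AVRO_MAGIC = b"Obj\x01"


def has_avro_magic(data: bytes) -> bool:
    """Return True if the given bytes start with the Avro container magic."""
    return data[:4] == AVRO_MAGIC


def file_has_avro_magic(path: str) -> bool:
    """Read the first four bytes of a file and compare them with the magic."""
    with open(path, "rb") as handle:
        return has_avro_magic(handle.read(4))
```

This only confirms that a file looks like an Avro container; it does not validate the schema or the records inside.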

Installing avro-tools

The simplest way to manipulate an Avro file is by using avro-tools. Download the avro-tools jar file.

wget https://repo1.maven.org/maven2/org/apache/avro/avro-tools/1.9.0/avro-tools-1.9.0.jar -P /opt

Reading Avro file from Blob Storage

To test whether the messages produced in Kafka landed in Blob Storage, one file is downloaded manually and checked. The easiest way to get its URL is from Azure Storage accounts -> Storage Explorer -> BLOB CONTAINERS -> EVENTHUB_NAMESPACE. In the right-hand pane, drill down until the Avro file is visible.

Choose the file and click the Copy URL button. Download the file to the local computer by running the command below (adjust the URL accordingly):

wget https://STORAGE_ACC_NAME.blob.core.windows.net/STORAGE_CONTAINER_NAME/EVENTHUB_NAMESPACE/prod.test1/0/2019/08/07/15/17/19.avro -P /tmp

The file is saved to folder /tmp and can now be read using avro-tools.
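
The URL above follows the pattern container/namespace/event hub/partition/year/month/day/hour/minute/second.avro. As a rough sketch, assuming that layout holds for every captured file, the URL can be split into its parts to see which event hub, partition and timestamp a file belongs to. The function name and the returned dictionary keys are chosen for this illustration only.

```python
from datetime import datetime
from urllib.parse import urlparse


def parse_capture_url(url):
    """Split a captured blob URL into its components.

    Assumes the path layout seen in this post:
    container/namespace/event_hub/partition/YYYY/MM/DD/HH/mm/ss.avro
    """
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 10:
        raise ValueError("unexpected path layout: " + url)
    container, namespace, event_hub, partition = parts[:4]
    year, month, day, hour, minute = (int(p) for p in parts[4:9])
    second = int(parts[9].rsplit(".", 1)[0])  # drop the .avro extension
    return {
        "container": container,
        "namespace": namespace,
        "event_hub": event_hub,
        "partition": int(partition),
        "timestamp": datetime(year, month, day, hour, minute, second),
    }


info = parse_capture_url(
    "https://STORAGE_ACC_NAME.blob.core.windows.net/STORAGE_CONTAINER_NAME/"
    "EVENTHUB_NAMESPACE/prod.test1/0/2019/08/07/15/17/19.avro"
)
print(info["event_hub"], info["partition"], info["timestamp"])
```

Here the event hub name corresponds to the Kafka topic prod.test1, and the file comes from partition 0.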

Comparing messages in Kafka with messages in Blob Storage

100 messages have been produced and saved to 5 different topics in Kafka. Taking topic prod.test1 as an example, the messages can be listed from the CLI with the following command:

$KAFKA_HOME/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic prod.test1 --from-beginning

The output (first three and last three messages):

This is message number 2
This is message number 32
This is message number 52
.
.
.
This is message number 95
This is message number 98
This is message number 100

The Avro file that corresponds to the topic prod.test1 can be checked by running the following command:

java -jar /opt/avro-tools-1.9.0.jar tojson /tmp/19.avro

The command returns the following rows:

{"SequenceNumber":0,"Offset":"0","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 2"}}
{"SequenceNumber":1,"Offset":"560","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 32"}}
{"SequenceNumber":2,"Offset":"608","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 52"}}
.
.
.
{"SequenceNumber":8,"Offset":"896","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 95"}}
{"SequenceNumber":9,"Offset":"944","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 98"}}
{"SequenceNumber":10,"Offset":"992","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 100"}}

The message numbers match. The messages that were initially produced in Kafka are now stored in Avro format in Azure Blob Storage.
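
The comparison was done by eye here, but it could also be scripted. The sketch below is one possible way to do it, assuming the console consumer output and the avro-tools tojson output have each been collected as a list of text lines, and that Body is rendered as {"bytes": "..."} as in the rows above. The function names and the sample lists are illustrative only.

```python
import json
import re

# Pattern for the message text used in this post ("This is message number N").
MESSAGE_PATTERN = re.compile(r"message number (\d+)")


def numbers_from_kafka_output(lines):
    """Extract message numbers from plain kafka-console-consumer output lines."""
    numbers = []
    for line in lines:
        match = MESSAGE_PATTERN.search(line)
        if match:
            numbers.append(int(match.group(1)))
    return numbers


def numbers_from_avro_json(lines):
    """Extract message numbers from avro-tools tojson output, one record per line."""
    numbers = []
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and anything that is not a JSON record
        record = json.loads(line)
        # Assumption: Body is rendered as {"bytes": "<text>"}.
        match = MESSAGE_PATTERN.search(record["Body"]["bytes"])
        if match:
            numbers.append(int(match.group(1)))
    return numbers


# Sample input: the first two messages shown in this post.
kafka_sample = ["This is message number 2", "This is message number 32"]
avro_sample = [
    r'{"SequenceNumber":0,"Offset":"0","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 2"}}',
    r'{"SequenceNumber":1,"Offset":"560","EnqueuedTimeUtc":"8/8/2019 9:02:31 AM","SystemProperties":{"x-opt-kafka-key":{"bytes":"\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0001"}},"Properties":{},"Body":{"bytes":"This is message number 32"}}',
]

kafka_numbers = numbers_from_kafka_output(kafka_sample)
avro_numbers = numbers_from_avro_json(avro_sample)
print(sorted(kafka_numbers) == sorted(avro_numbers))
```

Sorting before comparing keeps the check independent of the order in which the two tools list the messages.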

There are services in Azure that work on top of Event Hubs or Blob Storage. That topic is covered in the next blog post.
