Cassandra – Datastax Java Driver

Introduction

NoSQL has drawn a great deal of attention in the past few years because of the rising demand to handle high volumes of real-time data. Major internet companies have popularized data storage solutions that differ from the traditional RDBMS.

Apache Cassandra

One good solution for data storage is Apache Cassandra, a distributed database management system originally developed at Facebook. Cassandra combines a schema-flexible data model (from Google’s BigTable) with a fully distributed, shared-nothing design (from Amazon’s Dynamo). This design offers high availability, linear scalability and high performance while relaxing some consistency guarantees.

This blog covers interacting with Cassandra through the Datastax Java driver and designing a data model that suits our application.

Preference of Datastax over other drivers:

The Datastax Java driver is one of the Java client drivers for Apache Cassandra. It works exclusively with Cassandra Query Language version 3 (CQL3), which is similar to SQL, and with Cassandra’s binary protocol. CQL3 is considered a simpler and better-suited API for Cassandra than the Thrift API. Other Cassandra client drivers tend to be more complex when connecting to Cassandra and writing queries.

Use Cases

Let’s walk through a use case that starts with basic DML operations on Cassandra, using the Datastax Java driver.

What we need to do:

  • Pre-requisites
  • Understand the terms and data storage structure of Cassandra
  • Design a data model that is convenient to read from quickly
  • Create a Java program to do DML operations

Solution

Pre-requisites:

Read the following to understand the terms, data storage structure, and different types of data model in Cassandra.

The best way to approach data modeling for Cassandra is to start with the queries and work backwards from there. Think about the actions our application needs to perform, how we want to access the data, and then design column families to support those access patterns.

To understand the Cassandra data model, we need to map the conventions of an RDBMS (Relational Database Management System) to their names in Cassandra.

The following table shows the terms used in RDBMS and Cassandra.

[Figure: Cassandra 1, table of RDBMS and Cassandra terms]

RDBMS

  • The number of columns is limited
  • Column names are the same for every row of a table
  • A column name does not store any value

Cassandra 

  • If we are familiar with Java, it is easy to understand how Cassandra stores data.
  • It stores data as a Map of Maps: an outer Map keyed by a row key, and an inner Map keyed by a column name/key, where both maps are sorted.

                   SortedMap<RowKey, SortedMap<ColumnKey, ColumnValue>>

Column names/keys can differ from row to row within a column family, because the column name/key itself stores a value. The column datatype, however, is the same for the entire column family.

In Cassandra, there is no practical limit on the number of columns: a single row key can hold up to two billion columns. Rows like this are called wide rows.
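
The Map of Maps picture above can be tried out directly in plain Java. The following is only a conceptual sketch, not how Cassandra's storage engine is implemented: the class name ColumnFamilySketch, the sensor row keys and the use of a timestamp as the column key are all assumptions made for illustration. It shows a column key that carries a value of its own, and how a sorted inner map makes a range read over one wide row cheap.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Toy in-memory model of one column family: rowKey -> (columnKey -> columnValue).
// Both levels are TreeMaps, so row keys and column keys stay sorted.
// Hypothetical class for illustration; it does not talk to Cassandra.
class ColumnFamilySketch {
    private final SortedMap<String, SortedMap<Long, String>> rows =
            new TreeMap<String, SortedMap<Long, String>>();

    // Writes one column. The column key (a timestamp here) is itself data.
    void put(String rowKey, long columnKey, String value) {
        SortedMap<Long, String> columns = rows.get(rowKey);
        if (columns == null) {
            columns = new TreeMap<Long, String>();
            rows.put(rowKey, columns);
        }
        columns.put(columnKey, value);
    }

    // Returns the columns of one row whose keys fall in [from, to), in key order.
    SortedMap<Long, String> slice(String rowKey, long from, long to) {
        SortedMap<Long, String> columns = rows.get(rowKey);
        if (columns == null) {
            return new TreeMap<Long, String>();
        }
        return columns.subMap(from, to);
    }

    public static void main(String[] args) {
        ColumnFamilySketch readings = new ColumnFamilySketch();
        // Columns are written out of order; the inner map keeps them sorted.
        readings.put("sensor-1", 300L, "21.5");
        readings.put("sensor-1", 100L, "20.9");
        readings.put("sensor-1", 200L, "21.1");
        readings.put("sensor-2", 100L, "18.0");

        // prints {100=20.9, 200=21.1}
        System.out.println(readings.slice("sensor-1", 100L, 300L));
    }
}
```

Each row can have a different set of column keys, which is the part that has no direct equivalent in a relational table.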

Before designing the Data Model, please remember the following:

  • Avoid thinking in terms of relational tables
  • Model the column families around query patterns
  • De-normalize and duplicate data for read performance
  • There are many ways to model data in Cassandra
  • Indexing is no longer an afterthought
  • Think about the physical storage structure

Design a Data Model in a convenient way to retrieve it fast
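
One way to picture the guidelines above is a small in-memory sketch that keeps one column family per query and writes to both at the same time. Everything here is assumed for illustration: the PostStoreSketch class, the two queries ("posts by author" and "posts by tag") and the sample data are not from a real application, and plain sorted maps stand in for column families.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Query-first sketch: one column family per read pattern, written together.
// Hypothetical names throughout; sorted maps stand in for Cassandra storage.
class PostStoreSketch {
    // Query 1: "posts by author, in time order" -> row key is the author.
    private final SortedMap<String, SortedMap<Long, String>> postsByAuthor =
            new TreeMap<String, SortedMap<Long, String>>();
    // Query 2: "posts by tag, in time order" -> row key is the tag.
    private final SortedMap<String, SortedMap<Long, String>> postsByTag =
            new TreeMap<String, SortedMap<Long, String>>();

    // The same title is duplicated into both families (de-normalization).
    void addPost(String author, String tag, long postedAt, String title) {
        write(postsByAuthor, author, postedAt, title);
        write(postsByTag, tag, postedAt, title);
    }

    // Each read touches exactly one row of one family; no join is needed.
    List<String> titlesByAuthor(String author) {
        return read(postsByAuthor, author);
    }

    List<String> titlesByTag(String tag) {
        return read(postsByTag, tag);
    }

    private static void write(SortedMap<String, SortedMap<Long, String>> family,
                              String rowKey, long columnKey, String value) {
        SortedMap<Long, String> row = family.get(rowKey);
        if (row == null) {
            row = new TreeMap<Long, String>();
            family.put(rowKey, row);
        }
        row.put(columnKey, value);
    }

    private static List<String> read(SortedMap<String, SortedMap<Long, String>> family,
                                     String rowKey) {
        SortedMap<Long, String> row = family.get(rowKey);
        return row == null ? new ArrayList<String>() : new ArrayList<String>(row.values());
    }

    public static void main(String[] args) {
        PostStoreSketch store = new PostStoreSketch();
        store.addPost("alice", "cassandra", 2L, "Wide rows");
        store.addPost("bob", "cassandra", 1L, "Getting started");
        store.addPost("alice", "java", 3L, "Sorted maps");

        System.out.println(store.titlesByAuthor("alice")); // prints [Wide rows, Sorted maps]
        System.out.println(store.titlesByTag("cassandra")); // prints [Getting started, Wide rows]
    }
}
```

The cost of this design is an extra write for every post; the benefit is that each query is answered from a single, already sorted row.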

Create a Java Program to do the DML operation
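
As a rough sketch of the DML flow, the example below lists CQL3 statements with ? bind markers and runs them through a small interface. To keep it runnable without a cluster or the driver jar, CqlExecutor is a hypothetical stand-in and RecordingExecutor only records what would be sent. In a real program, we would assume an implementation that delegates to the driver's Session, for example by preparing the statement and binding values through a BoundStatement. The demo.users table and its columns are also assumed names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the driver session, so the sketch runs without a cluster.
interface CqlExecutor {
    void execute(String cql, Object... values);
}

// Records each statement and its bound values instead of talking to Cassandra.
class RecordingExecutor implements CqlExecutor {
    final List<String> log = new ArrayList<String>();

    @Override
    public void execute(String cql, Object... values) {
        log.add(cql + " " + Arrays.toString(values));
    }
}

class UserDmlSketch {
    // demo.users and its columns are assumed names, not from a real schema.
    static final String INSERT =
            "INSERT INTO demo.users (user_id, name, email) VALUES (?, ?, ?)";
    static final String SELECT =
            "SELECT name, email FROM demo.users WHERE user_id = ?";
    static final String UPDATE =
            "UPDATE demo.users SET email = ? WHERE user_id = ?";
    static final String DELETE =
            "DELETE FROM demo.users WHERE user_id = ?";

    // Runs insert, select, update and delete in order for a single row key.
    static void run(CqlExecutor cql) {
        cql.execute(INSERT, "u1", "Asha", "asha@example.com");
        cql.execute(SELECT, "u1");
        cql.execute(UPDATE, "asha@example.org", "u1");
        cql.execute(DELETE, "u1");
    }

    public static void main(String[] args) {
        RecordingExecutor recorder = new RecordingExecutor();
        run(recorder);
        for (String line : recorder.log) {
            System.out.println(line);
        }
    }
}
```

Every statement filters on user_id, the row key, which matches the query-first approach described earlier.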

 
Challenges:

  • Throws java.lang.UnsupportedClassVersionError: org/apache/cassandra/transport/FrameCompressor : Unsupported major.minor version 51.0. Class file version 51.0 corresponds to Java 7, so this error means the class is being loaded on an older JVM. We hit it when combining a higher version of Cassandra with a lower version of the Datastax Java driver.
    Solution: Check the version of the Datastax Java driver in the classpath and update it to one that matches the Cassandra version, and make sure the program runs on Java 7 or later.
  • Throws NoHostAvailableException because the Cassandra version in the classpath does not match the version of the Cassandra instance.
    Solution: Change the Cassandra jars in the classpath to match the version of the Cassandra instance we are connecting to.
  • Throws com.datastax.driver.core.exceptions.InvalidTypeException: Invalid type for value 0 of CQL type text, expecting class java.lang.String but class java.lang.Long provided. This is caused by a mismatch between the Java type of a bound value and the data type of the target column.
    Solution: Check that the type of each value passed when binding to the CQL query with a BoundStatement matches the data type of its column.
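
The type mismatch in the last challenge can be caught before binding with a small check like the one below. This is a hypothetical helper, not part of the Datastax driver, and it covers only a few CQL types (text, bigint, int, boolean) with the Java classes the driver expects for them.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical pre-bind check; not part of the Datastax driver.
class BindTypeCheckSketch {
    // Small subset of CQL types and the Java classes expected for them.
    private static final Map<String, Class<?>> EXPECTED = new HashMap<String, Class<?>>();
    static {
        EXPECTED.put("text", String.class);
        EXPECTED.put("bigint", Long.class);
        EXPECTED.put("int", Integer.class);
        EXPECTED.put("boolean", Boolean.class);
    }

    // Returns null when the value fits the CQL type, otherwise a description of the mismatch.
    static String mismatch(String cqlType, Object value) {
        Class<?> expected = EXPECTED.get(cqlType);
        if (expected == null) {
            return "unknown CQL type " + cqlType;
        }
        if (value == null || expected.isInstance(value)) {
            return null;
        }
        return "CQL type " + cqlType + " expects " + expected.getName()
                + " but got " + value.getClass().getName();
    }

    public static void main(String[] args) {
        // A Long bound to a text column reproduces the situation from the challenge.
        // prints CQL type text expects java.lang.String but got java.lang.Long
        System.out.println(mismatch("text", 42L));
        // Converting the value first fixes it; prints null
        System.out.println(mismatch("text", String.valueOf(42L)));
    }
}
```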

Conclusion

We are able to connect to Cassandra through the Datastax Java driver and complete DML operations using CQL (Cassandra Query Language) in Java, which is about as easy as using a JDBC driver with MySQL.
