Here you will find articles from the technical staff at eBuddy — we are sharing ideas and other tidbits here about our work, or anything that we find interesting in the realm of software development. Enjoy!

This series of articles is about building a data access layer in Java for Cassandra. We will begin here with general principles and design objectives; in future installments we will design an extensible framework and build some example domain-level Data Access Objects (DAOs). The job of a well-written DAO is to provide access to a data source at the same level of abstraction as the application domain model. It should be easy to use, testable, thread-safe, strongly typed, and efficient.
The interfaces of the lower-level data access framework should be reusable across data source vendors that share a similar data model. JDBC is a good example of a framework that is interoperable with various data sources that have a relational data model.
Cassandra makes an interesting case study here because it has a fairly distinctive data model consisting of keyspaces, rows, column families, and super column families. There is a temptation to build a data access framework that is specific to Cassandra, or to a particular Cassandra client library such as Hector. It is better, however, to keep the interfaces abstract enough that they can be implemented for any data source that has keyspaces (or namespaces), rows, and key-value columns. Not only does this avoid tying the framework to a single data source vendor, it also helps ensure that the interfaces are written at the right level of abstraction.
It is also important for the implementation to be easily unit tested. In practice this means that dependencies are referenced by their interfaces so they can be easily mocked. It also means that the design divides responsibilities between collaborators at the right levels of abstraction: a DAO should focus on the domain structure of the data and delegate the specifics of the data source API to the underlying data access framework. That way we can unit test just the domain-level functionality in the DAO without also having to test the code that operates directly on the data source. We will go into more detail about testing in a future article.
Thread safety is important because a DAO is best used by business logic as a singleton which is injected by the Java EE container. That way the data-source specific configuration can be done once in a central location. To this end, a DAO singleton should be either stateless or completely immutable.
The DAOs themselves should be strongly typed at the level of abstraction of the application domain. It is precisely the responsibility of a DAO to translate the lower-level data structures of the data source into domain objects, and vice versa.
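As a rough illustration of this translation responsibility (the `User` and `UserDao` types here are invented for this sketch, not part of the framework being built), a strongly typed DAO might convert the raw column name/value pairs of one row into a domain object like this:

```java
import java.util.Map;

/** Illustrative domain object. */
class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

/** Sketch of a strongly typed DAO translating low-level columns into the domain model. */
class UserDao {

    /** Converts the column map of a single row into a User. */
    User fromColumns(Map<String, String> columns) {
        return new User(columns.get("name"), Integer.parseInt(columns.get("age")));
    }
}
```

Callers of such a DAO deal only in `User` objects; the column names and value encodings stay hidden inside the data access layer.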
For efficiency and/or for the necessity of data isolation, operations on data sources often need to be done in batch or units of work. A DAO is therefore required to support transactions and the propagation of transaction context, which may extend across multiple invocations of possibly multiple DAO instances that operate on the same data source. Transactions may also of course be distributed across multiple data sources, but this is beyond the scope of this article.
For the Cassandra data model, batch operations are supported on the same Keyspace. Although a batch operation does not have full ACID properties, we can still call them “transactions” to denote units of work. So we can start with a simple interface for KeyspaceOperations that defines operations for starting and ending a unit of work:
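A minimal sketch of what such an interface might look like (the method names `begin` and `commit`, and the `TransactionContext` marker type, are illustrative assumptions):

```java
/** Opaque handle for a unit of work; implementations would carry the batch state. */
interface TransactionContext { }

/** Operations that apply at the level of a keyspace. */
interface KeyspaceOperations {

    /** Begins a new unit of work and returns its context. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);
}
```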
As already mentioned, the responsibility of the data access layer is to transform the lower-level, general-purpose data structures of a data source into the application domain object model, and vice versa. This transformation must also be as efficient as possible, since there may be a large amount of data to transform.
Since ease-of-use is always important, a well-written data access layer should provide a public API that follows the principle of least astonishment. Probably the most familiar API used by Java developers for accessing data sources is JDBC, and one of the most popular frameworks for building DAO objects using JDBC is the Spring JDBC abstraction framework. Therefore it is advantageous from an ease-of-use perspective to model a data access layer for Cassandra after this framework.
The following diagram compares the layers in data access between a JDBC relational data source and a column family data source, and is the model we will follow in this design.
In the next installment of this series, we will continue with more interfaces for the data access framework.
In Part 1 we discussed the design objectives of a data access layer in general; now we turn to the interface for operations on a column family. This corresponds to the JdbcOperations interface in Spring JDBC, whose implementation is JdbcTemplate.
The Operations object is responsible for providing access to the data source in terms of generic types, which a DAO object can then use to bind to the application domain types.
It should be mentioned here that there are already classes in the Hector project that provide functionality at this level, namely ColumnFamilyTemplate and SuperColumnFamilyTemplate in the me.prettyprint.cassandra.service.template package. The design here is influenced by those ideas, but it is also an attempt to improve on those classes by meeting more of the design objectives discussed in Part 1. Later in this series we will come back to compare the templates provided in Hector with the design and implementation arrived at here.
At the level of a column family, we deal with three generic types: The row key type (K), the column name type (N), and the column value type (V). This is the same nomenclature used for the templates in Hector, and we will stick with this.
We will of course need read, write, and delete operations, but let’s start with the interface definition itself:
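A sketch of the bare interface definition, parameterized by the three generic types described above (the shape is illustrative, following the Hector-style K/N/V nomenclature):

```java
/**
 * Generic operations on a single column family.
 *
 * @param <K> the row key type
 * @param <N> the column name type
 * @param <V> the column value type
 */
interface ColumnFamilyOperations<K, N, V> {
    // read, write, and delete operations to follow
}
```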
It happens to be convenient to begin and commit batch operations directly from a ColumnFamilyOperations, so that you do not always need a KeyspaceOperations object as well. For example, it is perfectly reasonable to run a batch operation against a single column family, in which case a single DAO for that column family suffices. However, having ColumnFamilyOperations extend KeyspaceOperations would be an opportunistic design decision that does not hold up logically: operations might later be added to KeyspaceOperations that only make sense at the keyspace level, such as listColumnFamilies, and then the inheritance relationship would no longer make sense. Therefore, we will simply duplicate the methods from KeyspaceOperations that also make sense for column families, which happens to be all of them so far:
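A sketch of how those duplicated transaction methods might appear on the column family interface (method names are illustrative; the `TransactionContext` marker type is repeated here so the sketch is self-contained):

```java
/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Generic operations on a single column family (K = row key, N = column name, V = column value). */
interface ColumnFamilyOperations<K, N, V> {

    /** Begins a new unit of work for batch mutations on this column family. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);
}
```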
Ideally the TransactionContext would be propagated by an aspect using thread locals so that it would not need to be explicit as a parameter, but this is beyond the scope of this article.
Now it is a matter of adding read, write, and delete operations for working with column families.
The write and delete operations are fairly straightforward:
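As a sketch of what these could look like (the signatures and names `writeColumn`, `deleteColumns`, and `removeRow` are illustrative, and `TransactionContext` is repeated so the fragment is self-contained):

```java
/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Operations on a column family (K = row key, N = column name, V = column value). */
interface ColumnFamilyOperations<K, N, V> {

    /** Begins a new unit of work for batch mutations. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);

    /** Writes a single column value as part of the given unit of work. */
    void writeColumn(K rowKey, N columnName, V columnValue, TransactionContext txnContext);

    /** Deletes one or more columns from a row. */
    void deleteColumns(K rowKey, TransactionContext txnContext, N... columnNames);

    /** Deletes an entire row. */
    void removeRow(K rowKey, TransactionContext txnContext);
}
```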
We can now take a look at a very simple example of code that shows a batch operation of deleting two rows:
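As an illustrative, self-contained sketch of this usage pattern, the example below substitutes a toy in-memory implementation for a real Cassandra-backed one; the interface shape and all names are assumptions for this sketch, not code from the framework itself:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Minimal slice of the operations interface needed for this example. */
interface ColumnFamilyOperations<K, N, V> {
    TransactionContext begin();
    void commit(TransactionContext txn);
    void removeRow(K rowKey, TransactionContext txn);
}

/** Toy in-memory implementation; a real one would issue a Cassandra batch mutation. */
class InMemoryOperations<K, N, V> implements ColumnFamilyOperations<K, N, V> {
    final Map<K, Map<N, V>> rows = new HashMap<>();
    private final List<K> pendingRemovals = new ArrayList<>();

    public TransactionContext begin() { return new TransactionContext() { }; }

    public void removeRow(K rowKey, TransactionContext txn) {
        pendingRemovals.add(rowKey); // queued until commit, like a batch mutation
    }

    public void commit(TransactionContext txn) {
        for (K key : pendingRemovals) { rows.remove(key); }
        pendingRemovals.clear();
    }
}

class BatchDeleteExample {
    public static void main(String[] args) {
        InMemoryOperations<String, String, String> ops = new InMemoryOperations<>();
        ops.rows.put("user:1", new HashMap<>());
        ops.rows.put("user:2", new HashMap<>());

        // Delete two rows in a single unit of work.
        TransactionContext txn = ops.begin();
        ops.removeRow("user:1", txn);
        ops.removeRow("user:2", txn);
        ops.commit(txn);

        System.out.println(ops.rows.isEmpty()); // prints "true"
    }
}
```

Note that nothing is removed until `commit` is called; both deletions travel together in the same unit of work.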
In the next part of this series, we will introduce read operations and the associated mappers required for efficient transformation into domain objects.
We are excited to announce the first Netherlands Cassandra Users Meetup, hosted by eBuddy.
The meetup will be on Tuesday, 28 May.
The first presentation will be an “Introduction to DataStax and Cassandra”, given by John Glendenning and Hayato Shimizu of DataStax.
The second one will be about “Developing a Data Access Layer for Cassandra in Java”, given by Eric Zoerner.
For more information see Netherlands Cassandra Users Meetup.
We are excited to announce the next Netherlands Cassandra Users Meetup, hosted by eBuddy.
The meetup will be on Thursday, 26 September, from 6:30 PM to 9:30 PM.
The first presentation will be “BlueConic: Creating an online, real-time, cross-channel engagement platform using Cassandra for 150 million profiles and 5 billion interactions”, given by Martijn van Berkum of GX Software.
We are still looking for a second presentation. Want to share your experience with Cassandra? If you would like to present at this meetup, please contact Eric (ezoerner@ebuddy.com) and Noha (nelsherif@ebuddy.com) to plan it.
For more information see Netherlands Cassandra Users Meetup.