Here you will find articles from the technical staff at eBuddy — we are sharing ideas and other tidbits here about our work, or anything that we find interesting in the realm of software development. Enjoy!

This series of articles is about building a data access layer in Java for Cassandra. We will begin here with general principles and design objectives; in future installments we will design an extensible framework and build some example domain-level Data Access Objects (DAOs). The job of a well-written DAO is to provide access to a data source at the same level of abstraction as the application domain model. It should be easy to use, testable, thread-safe, strongly typed, and efficient.
The interfaces of the lower-level data access framework should be reusable across data source vendors that share a similar data model. JDBC is a good example of a framework that is interoperable with various data sources that have a relational data model.
Cassandra makes an interesting case study here because it has a fairly distinctive data model consisting of keyspaces, rows, column families, and super column families. There is a temptation to build a data access framework that is specific to Cassandra, or to a particular Cassandra client library such as Hector. It is better, however, to keep the interfaces abstract enough that they can be implemented for any data source that has keyspaces (or namespaces), rows, and key-value columns. Not only does this avoid tying the framework to a single data source vendor, it also helps ensure that the interfaces are written at the right level of abstraction.
It is also important for the implementation to be easily unit tested. In practice this means that dependencies are referenced by their interfaces so they can be easily mocked. It also means that the design divides responsibilities between collaborators at the right levels of abstraction: a DAO should focus on the domain structure of the data and delegate the specifics of the data source API to the underlying data access framework. That way we can unit test just the domain-level functionality in the DAO without also having to test the code that operates directly on the data source. We will go into more detail about testing in a future article.
Thread safety is important because a DAO is best used by business logic as a singleton which is injected by the Java EE container. That way the data-source specific configuration can be done once in a central location. To this end, a DAO singleton should be either stateless or completely immutable.
The DAOs themselves should be strongly typed at the level of abstraction of the application domain. It is precisely the responsibility of a DAO to translate the lower-level data structures of the data source into domain objects, and vice versa.
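As a rough illustration of this translation responsibility (the `User` and `UserDao` types here are invented for this sketch, not part of the framework being built), a strongly typed DAO might convert the raw column name/value pairs of one row into a domain object like this:

```java
import java.util.Map;

/** Illustrative domain object. */
class User {
    final String name;
    final int age;

    User(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

/** Sketch of a strongly typed DAO translating low-level columns into the domain model. */
class UserDao {

    /** Converts the column map of a single row into a User. */
    User fromColumns(Map<String, String> columns) {
        return new User(columns.get("name"), Integer.parseInt(columns.get("age")));
    }
}
```

Callers of such a DAO deal only in `User` objects; the column names and value encodings stay hidden inside the data access layer.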
For efficiency and/or for the necessity of data isolation, operations on data sources often need to be done in batch or units of work. A DAO is therefore required to support transactions and the propagation of transaction context, which may extend across multiple invocations of possibly multiple DAO instances that operate on the same data source. Transactions may also of course be distributed across multiple data sources, but this is beyond the scope of this article.
For the Cassandra data model, batch operations are supported on the same Keyspace. Although a batch operation does not have full ACID properties, we can still call them “transactions” to denote units of work. So we can start with a simple interface for KeyspaceOperations that defines operations for starting and ending a unit of work:
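A minimal sketch of what such an interface might look like (the method names `begin` and `commit`, and the `TransactionContext` marker type, are illustrative assumptions):

```java
/** Opaque handle for a unit of work; implementations would carry the batch state. */
interface TransactionContext { }

/** Operations that apply at the level of a keyspace. */
interface KeyspaceOperations {

    /** Begins a new unit of work and returns its context. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);
}
```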
As already mentioned, the responsibility of the data access layer is to transform the lower-level, general-purpose data structures of a data source into the application domain object model, and vice versa. This transformation must also be as efficient as possible, since there may be a large amount of data to transform.
Since ease-of-use is always important, a well-written data access layer should provide a public API that follows the principle of least astonishment. Probably the most familiar API used by Java developers for accessing data sources is JDBC, and one of the most popular frameworks for building DAO objects using JDBC is the Spring JDBC abstraction framework. Therefore it is advantageous from an ease-of-use perspective to model a data access layer for Cassandra after this framework.
The following diagram compares the layers in data access between a JDBC relational data source and a column family data source, and is the model we will follow in this design.
In the next installment of this series, we will continue with more interfaces for the data access framework.
In Part 1 we discussed the design objectives of a data access layer in general; now we turn to the interface for operations on a column family. This corresponds to the JdbcOperations interface in Spring JDBC, whose implementation is JdbcTemplate.
The Operations object is responsible for providing access to the data source in terms of generic types, which a DAO object can then use to bind to the application domain types.
It should be mentioned here that there are already classes in the Hector project that provide functionality at this level, namely ColumnFamilyTemplate and SuperColumnFamilyTemplate in the me.prettyprint.cassandra.service.template package. The design here is influenced by those ideas, but it is also an attempt to improve on those classes by meeting more of the design objectives discussed in Part 1. Later in this series we will come back to compare the templates provided in Hector with the design and implementation arrived at here.
At the level of a column family, we deal with three generic types: The row key type (K), the column name type (N), and the column value type (V). This is the same nomenclature used for the templates in Hector, and we will stick with this.
We will of course need read, write, and delete operations, but let’s start with the interface definition itself:
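A sketch of the bare interface definition, parameterized by the three generic types described above (the shape is illustrative, following the Hector-style K/N/V nomenclature):

```java
/**
 * Generic operations on a single column family.
 *
 * @param <K> the row key type
 * @param <N> the column name type
 * @param <V> the column value type
 */
interface ColumnFamilyOperations<K, N, V> {
    // read, write, and delete operations to follow
}
```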
It happens to be convenient to begin and commit batch operations directly from a ColumnFamilyOperations, so that you do not always need a KeyspaceOperations object as well. For example, it is perfectly reasonable to run a batch operation against a single column family, in which case a single DAO for that column family suffices. However, having ColumnFamilyOperations extend KeyspaceOperations would be an opportunistic design decision that does not hold up logically: operations might later be added to KeyspaceOperations that only make sense at the keyspace level, such as listColumnFamilies, and then the inheritance relationship would no longer make sense. Therefore, we will simply duplicate the methods from KeyspaceOperations that also make sense for column families, which happens to be all of them so far:
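A sketch of how those duplicated transaction methods might appear on the column family interface (method names are illustrative; the `TransactionContext` marker type is repeated here so the sketch is self-contained):

```java
/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Generic operations on a single column family (K = row key, N = column name, V = column value). */
interface ColumnFamilyOperations<K, N, V> {

    /** Begins a new unit of work for batch mutations on this column family. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);
}
```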
Ideally the TransactionContext would be propagated by an aspect using thread locals so that it would not need to be explicit as a parameter, but this is beyond the scope of this article.
Now it is a matter of adding read, write, and delete operations for working with column families.
The write and delete operations are fairly straightforward:
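As a sketch of what these could look like (the signatures and names `writeColumn`, `deleteColumns`, and `removeRow` are illustrative, and `TransactionContext` is repeated so the fragment is self-contained):

```java
/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Operations on a column family (K = row key, N = column name, V = column value). */
interface ColumnFamilyOperations<K, N, V> {

    /** Begins a new unit of work for batch mutations. */
    TransactionContext begin();

    /** Executes all mutations accumulated in the given unit of work. */
    void commit(TransactionContext txnContext);

    /** Writes a single column value as part of the given unit of work. */
    void writeColumn(K rowKey, N columnName, V columnValue, TransactionContext txnContext);

    /** Deletes one or more columns from a row. */
    void deleteColumns(K rowKey, TransactionContext txnContext, N... columnNames);

    /** Deletes an entire row. */
    void removeRow(K rowKey, TransactionContext txnContext);
}
```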
We can now take a look at a very simple example of code that shows a batch operation of deleting two rows:
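As an illustrative, self-contained sketch of this usage pattern, the example below substitutes a toy in-memory implementation for a real Cassandra-backed one; the interface shape and all names are assumptions for this sketch, not code from the framework itself:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Opaque handle for a unit of work. */
interface TransactionContext { }

/** Minimal slice of the operations interface needed for this example. */
interface ColumnFamilyOperations<K, N, V> {
    TransactionContext begin();
    void commit(TransactionContext txn);
    void removeRow(K rowKey, TransactionContext txn);
}

/** Toy in-memory implementation; a real one would issue a Cassandra batch mutation. */
class InMemoryOperations<K, N, V> implements ColumnFamilyOperations<K, N, V> {
    final Map<K, Map<N, V>> rows = new HashMap<>();
    private final List<K> pendingRemovals = new ArrayList<>();

    public TransactionContext begin() { return new TransactionContext() { }; }

    public void removeRow(K rowKey, TransactionContext txn) {
        pendingRemovals.add(rowKey); // queued until commit, like a batch mutation
    }

    public void commit(TransactionContext txn) {
        for (K key : pendingRemovals) { rows.remove(key); }
        pendingRemovals.clear();
    }
}

class BatchDeleteExample {
    public static void main(String[] args) {
        InMemoryOperations<String, String, String> ops = new InMemoryOperations<>();
        ops.rows.put("user:1", new HashMap<>());
        ops.rows.put("user:2", new HashMap<>());

        // Delete two rows in a single unit of work.
        TransactionContext txn = ops.begin();
        ops.removeRow("user:1", txn);
        ops.removeRow("user:2", txn);
        ops.commit(txn);

        System.out.println(ops.rows.isEmpty()); // prints "true"
    }
}
```

Note that nothing is removed until `commit` is called; both deletions travel together in the same unit of work.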
In the next part of this series, we will introduce read operations and the associated mappers required for efficient transformation into domain objects.
We are excited to announce the first Netherlands Cassandra Users Meetup, hosted by eBuddy.
The meetup will be on Tuesday, 28 May.
The first presentation will be an “Introduction to DataStax and Cassandra”, given by John Glendenning and Hayato Shimizu of DataStax.
The second one will be about “Developing a Data Access Layer for Cassandra in Java”, given by Eric Zoerner.
For more information see Netherlands Cassandra Users Meetup.
We are excited to announce the next Netherlands Cassandra Users Meetup, hosted by eBuddy.
The meetup will be on Thursday, 26 September, from 6:30 PM to 9:30 PM.
The first presentation will be “BlueConic: Creating an online, real-time, cross-channel engagement platform using Cassandra for 150 million profiles and 5 billion interactions”, given by Martijn van Berkum of GX Software.
We are still looking for a second presentation. Want to share your experience with Cassandra? If you would like to present at this meetup, please contact Eric (ezoerner@ebuddy.com) and Noha (nelsherif@ebuddy.com) to plan it.
For more information see Netherlands Cassandra Users Meetup.