What the Heck is No-SQL?
December 18, 2009. Posted by Peter Varhol in Architectures, Software development, Software platforms.
In looking at data access in the cloud, I started by noting that there were good reasons why it was difficult or undesirable to run a SQL database in a server instance in a cloud. This problem has led to something that might be termed the “no-SQL” movement. The goal is to persist data in some form in the cloud, in a way that is also readily available to the application. Of course, it’s also important to bring that data back into enterprise data repositories and warehouses at some point.
Because large applications are likely to be distributed in cloud clusters, the data store has to be fast. To be fast, it is likely distributed; you don’t want writes to be your application bottleneck. So for these and I’m sure other reasons, developers building large enterprise applications targeting the cloud prefer avoiding relational databases and the use of SQL as a query language.
Probably the most common data management technique discussed in these circumstances is MapReduce, first used by Google to manage queries across its vast server farm. MapReduce (Hadoop is an open source implementation of MapReduce for Java) breaks large problems up into small components and assigns them to individual servers in the cluster. When the small components are solved, MapReduce reassembles them into the larger solution. It is said to be especially useful for very large data sets and relatively simple queries.
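To make the split-and-reassemble idea concrete, here is a minimal single-process sketch of the pattern in Python. It is not Hadoop or Google's implementation; the function names (map_phase, shuffle, reduce_phase, map_reduce) are my own for illustration, and a real cluster would run the map and reduce steps on separate servers rather than in one loop.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input chunk."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the intermediate values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def map_reduce(documents):
    """Run map over every chunk, group by key, then reduce each group."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a chunk of data held by a different server.
    chunks = ["the cloud stores data", "the data store is distributed"]
    print(map_reduce(chunks))
```

Each chunk can be mapped independently of the others, which is what lets the work spread across a cluster; only the grouping step needs to see results from every server.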
MemcacheDB is another alternative. It is a distributed key-value store that speaks the memcached protocol and uses Berkeley DB as its storage backend, so it offers memcached-style access with persistence. It can be used to speed up database-driven applications by serving data and objects without a round trip to the relational database.
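The way a key-value store takes load off a database can be sketched roughly as follows. This is an illustration only: KeyValueStore is a hypothetical in-process stand-in for a networked store (it is not the MemcacheDB client API), and slow_database_lookup is a placeholder for a real SQL query.

```python
class KeyValueStore:
    """Hypothetical in-process stand-in for a networked key-value store."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        # Returns None when the key is absent, like a cache miss.
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def slow_database_lookup(user_id):
    """Placeholder for an expensive relational query (assumed, not a real API)."""
    return {"id": user_id, "name": f"user-{user_id}"}


def fetch_user(store, user_id, loader=slow_database_lookup):
    """Try the key-value store first; fall back to the database on a miss."""
    key = f"user:{user_id}"
    cached = store.get(key)
    if cached is not None:
        return cached
    value = loader(user_id)
    store.set(key, value)  # keep the result so the next read skips the database
    return value


if __name__ == "__main__":
    store = KeyValueStore()
    print(fetch_user(store, 42))  # first call goes to the "database"
    print(fetch_user(store, 42))  # second call is served from the store
```

The application only ever asks for a key, so the store behind it can be distributed across servers without the calling code changing.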
There are other alternatives, although I’m not the person to judge them technically. In-memory databases such as Oracle TimesTen offer high performance for a distributed system, but TimesTen is a commercial product requiring costly licenses. Object-oriented databases such as ObjectStore from Progress Software can eliminate the need for object-relational mapping, but ObjectStore is once again a costly commercial product.
Incidentally, my friends at 1060 Research tell me that data caching is a fundamental part of the architecture of the company’s NetKernel representational state transfer (REST) middleware, which they call Resource-Oriented Computing, or ROC. To me, the caching almost seems to be a side effect of the very elegant design of the product.
Whichever technique you use in the cloud (and this list is by no means comprehensive), you still need an additional step: getting the data from a persistent store in the cloud to your enterprise databases, which are almost certainly relational and use SQL as the primary query language (you may also be using XQuery or another XML-based query mechanism in the enterprise, of course).
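As a rough idea of what that additional step can look like, here is a small batch-load sketch in Python. Everything specific in it is an assumption for illustration: the cloud records are modeled as a dict of JSON strings, the orders table and its columns are invented, and SQLite stands in for whatever relational database the enterprise actually runs.

```python
import json
import sqlite3


def export_to_relational(records, conn):
    """Copy key -> JSON records into a relational table in one batch.

    The table name and columns are hypothetical; a real mapping would
    follow the enterprise schema.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id TEXT PRIMARY KEY, customer TEXT, total REAL)"
    )
    rows = []
    for key, blob in records.items():
        doc = json.loads(blob)
        # Missing fields become NULL rather than failing the whole batch.
        rows.append((key, doc.get("customer"), doc.get("total")))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO orders (order_id, customer, total) "
            "VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


if __name__ == "__main__":
    cloud_records = {
        "o1": json.dumps({"customer": "acme", "total": 12.5}),
        "o2": json.dumps({"customer": "globex"}),
    }
    conn = sqlite3.connect(":memory:")
    print(export_to_relational(cloud_records, conn))
    print(conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall())
```

Because the load runs as a batch, it does not need to keep pace with the application's writes; it can run on whatever schedule the warehouse requires.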
So ultimately you have to map your cloud store into the relational store, albeit probably not in real time. I’ll write about that next time.