
The Cloud Data Store Dilemma December 16, 2009

Posted by Peter Varhol in Architectures, Software platforms, Strategy.

I’m going to be writing on and off for a while about data in the cloud.  Before I do, I have a disclosure: I don’t understand the topic very well.  I’m hoping that researching some things as I write will make the topic clearer to me.  I also hope that those of you who understand it better will correct some of my more egregious statements.

I’m going to start with SQL.  After a 30-plus year investment in SQL databases, we may finally be reaching some of the limits of their utility.  A prime design goal of the relational model defined by E. F. Codd, on which the Structured Query Language is based, was data integrity; properties like lossless joins matter if we want absolute confidence that the result of a query is correct.
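To make "lossless join" a little more concrete, here is a minimal sketch using Python's standard-library sqlite3 module.  The table names, column names, and sample rows are all invented for illustration.  The only point is that when a table is split along a key, joining the pieces back together should return exactly the original rows, no more and no fewer.

```python
import sqlite3


def lossless_join_demo():
    """Split a flat table into two tables and join them back together."""
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()

    # Hypothetical flat data: every order row repeats the customer's city.
    original = [
        (1, "alice", "Boston"),
        (2, "bob", "Nashua"),
        (3, "alice", "Boston"),
    ]

    # Decompose into two tables that share the customer name as a key.
    cur.execute("CREATE TABLE customers (name TEXT PRIMARY KEY, city TEXT)")
    cur.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, "
        "name TEXT REFERENCES customers(name))"
    )
    cur.executemany(
        "INSERT OR IGNORE INTO customers VALUES (?, ?)",
        [(name, city) for _, name, city in original],
    )
    cur.executemany(
        "INSERT INTO orders VALUES (?, ?)",
        [(order_id, name) for order_id, name, _ in original],
    )

    # Join the pieces back; a lossless decomposition recovers the input.
    cur.execute(
        "SELECT o.order_id, o.name, c.city "
        "FROM orders o JOIN customers c ON o.name = c.name "
        "ORDER BY o.order_id"
    )
    rejoined = cur.fetchall()
    conn.close()
    return original, rejoined
```

Because the join key (the customer name) uniquely identifies a row in the customers table, the rejoined rows match the original ones.  Splitting on a column that is not a key could instead produce spurious extra rows.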

Performance has some role in SQL and relational database design; it is the primary reason we break up data into discrete tables.  But performance takes a back seat to integrity.  And there seem to be real-life use cases where other persistence techniques can outperform relational databases.

People also seem reluctant to run relational databases in a cloud instance.  Probably a part of this is the license cost; replicating existing data center commercial databases in cloud instances can get expensive.  That’s at least one reason why the Amazon Relational Database Service (RDS) uses MySQL.  While not free for commercial use, MySQL support costs substantially less than support for traditional commercial databases.

Another part is that the relational database can be a hindrance to performance and scalability in the cloud.  Writing to disk takes a long time relative to processing a transaction.  Last, many IT professionals perceive, with some reason, that data not persisted locally is data out of their direct control.

The almost universal use of object-oriented programming languages for new development today also contributes to issues with SQL and relational databases.  Object-relational mapping requires that we serialize the objects we are going to write to the database, while fetching data requires de-serialization.  O-R solutions such as Hibernate also provide data caching, which will speed up reads, but caching probably doesn’t help a lot with writing to the database.
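As a rough illustration of that round trip, here is a hand-rolled sketch in Python using the standard-library sqlite3 and dataclasses modules.  This is not how Hibernate or any real O-R mapper is implemented; the Customer class, the customer table, and the open_db, save, and load helpers are all hypothetical names made up for this example.

```python
import sqlite3
from dataclasses import astuple, dataclass


@dataclass
class Customer:
    """Hypothetical domain object that we want to persist."""
    id: int
    name: str
    city: str


def open_db():
    """Create an in-memory database with a table matching Customer."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    return conn


def save(conn, customer):
    """Flatten the object into a row (the 'serialize' half of the mapping)."""
    conn.execute(
        "INSERT OR REPLACE INTO customer (id, name, city) VALUES (?, ?, ?)",
        astuple(customer),
    )
    conn.commit()


def load(conn, customer_id):
    """Rebuild an object from a row (the 'de-serialize' half)."""
    row = conn.execute(
        "SELECT id, name, city FROM customer WHERE id = ?", (customer_id,)
    ).fetchone()
    return Customer(*row) if row else None
```

Every save and load pays for the translation between object and row.  A cache can let a mapper skip the load on repeated reads, but a write that must be durable still has to reach the database.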

Most IT groups would probably prefer that data be persisted directly to the enterprise relational databases in the local data center, but the latency is such that it would prohibit scalability of enterprise applications.

These perceived limits have given rise to the nascent “no-SQL” movement, the idea that if we need to process large amounts of simple data, SQL isn’t necessarily the best or fastest way to do so.  And because (as near as I can tell) the available no-SQL solutions are all open source, the license cost of using them is nothing (the cost in learning the skills may, however, be quite high).

I’ll look at the no-SQL techniques, their advantages, and their role in cloud computing in another post in the near future.


