July 27th, 2010
Now, depending on your viewpoint, that title could just as well read Keep a Hadoop Cluster in Your Back Pocket. So we’re talking about when it makes sense to combine an old-fashioned SQL RDBMS with a fancy, modern NoSQL system, looking at it from both angles.
Awkward it may be, but SQL is a lot more succinct and readable than multiple lines of API calls or crazy, math-like relational algebra languages. And there’s nothing intrinsically slow about the language itself. If you could run “SELECT * FROM table WHERE …” on Cassandra, it would be no slower than specifying the same conditions via API calls.
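To make the succinctness point concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `users` table, its columns, and the sample rows are made up, SQLite stands in for the SQL side, and a plain loop over in-memory tuples stands in for the kind of hand-written filtering an API-only store would require. It does not use Cassandra or any real NoSQL client.

```python
import sqlite3

# Hypothetical sample data: (name, country, age)
rows = [
    ("alice", "US", 34),
    ("bob", "DE", 27),
    ("carol", "US", 19),
]

# SQL side: load the rows into an in-memory SQLite table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, country TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)

# Declarative: the conditions are stated once, in a single query.
sql_result = conn.execute(
    "SELECT name FROM users WHERE country = ? AND age > ? ORDER BY name",
    ("US", 21),
).fetchall()

# Imperative stand-in for API calls: the same conditions written out
# as an explicit scan, filter, and sort over the raw records.
api_result = []
for name, country, age in rows:
    if country == "US" and age > 21:
        api_result.append((name,))
api_result.sort()

# Both approaches select the same rows.
print(sql_result == api_result)
```

Both paths express the same conditions and select the same rows; the difference is in how much code it takes to say so, not in what has to be evaluated.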
Netezza blogger Phil Francisco, on the other hand, explains how it makes sense for some of their customers to use Hadoop as a large online archive for their colder data.
We have seen customers deploy [patterns] in which the Hadoop Cluster is used for long-term data retention, or as a “queryable archive”. Here one could think of Hadoop as a complementary analytic extension of the Netezza TwinFin when there is far less premium placed on low-latency or high-performance. [...] the queryable archive could also retain long-term copies of structured data that had previously been loaded into the high-performance TwinFin appliance.
There you go. Let me know if you have any other thoughts on useful ways to combine SQL and NoSQL.