This just came in: Teradata is acquiring Aster Data. The database consolidation wars are at full speed: last month it was HP acquiring Vertica, and last year IBM bought Netezza and EMC bought Greenplum. So within six months, the four biggest and most promising MPP DB vendors have found new owners.

What’s in a Name

December 2nd, 2010

Following up on last week’s The meaning of NoSQL, there are quite a number of well-thought-out articles about CouchOne’s decision not to associate itself with NoSQL:

  • Why Cloud Computing Sells and NoSQL Is Fading argues that Cloud was in the same position three years ago but climbed the hill to success, whereas Grid Computing didn’t, and NoSQL likely won’t.
  • The beginning of the end of NoSQL notes that Big Data, with which NoSQL has become closely associated, is endangered as well because there’s a trademark on that term. Massive Data and Big Analytics are emerging in its place.
  • Use cases are driving the divergence, and the convergence, of NoSQL solutions argues that it’s time for a use-case-driven description of NoSQL solutions instead of defining them only by what they aren’t (or, in many cases, what they only partially are, as many now start to support [limited] SQL anyway). At the same time, the membase folks suggest that maybe Cloud Database could be a better term than NoSQL… elegantly solving the dilemma that GigaOm points out in the first post above.

10 Signs You Need A Big Data Retention Solution by Rainstor’s Ramon Chen.

The reality, driven by more stringent legislation, governance and extended on-demand accessibility to historical data, is that structured data retention is now fast becoming the #1 imperative for businesses worldwide.

The article outlines key signs you need a dedicated solution for Big Data retention.

The meaning of NoSQL

November 19th, 2010

If any one NoSQL solution vendor’s marketing skills stand out, it’s certainly CouchOne’s. First came the rebranding from CouchIO to CouchOne, then the awareness that the general perception of NoSQL is shifting away from the generic meaning of ‘not (only) SQL’ towards meaning a ‘Big Data’ solution. In Moving Away from NoSQL: Why Size Matters and Small is Better they explain why they no longer want to be associated with NoSQL:

For our part, we’ll let others chase the “NoSQL” thing, while we focus on enabling offline data and applications. We want to fix the “Achilles heel of the cloud”, so that everyone has their data with them at all times regardless of Internet connectivity. Sometimes thinking big means getting small, but unfortunately that doesn’t fit conveniently with the emerging definition of “NoSQL”.

Via CAOS Links.

You’ve likely seen Facebook’s announcement of its new Messaging system. They now also have a blog post about The Underlying Technology of Messages. And Todd Hoff dissects that post and adds some context in Facebook’s New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month. This is just crazy. I wonder how many servers they have for the DB, or rather how fast it’s going to scale. It’s probably better to measure how many servers they’ll need per terabyte of data managed, since the total is going to increase so quickly!

Doug Henschen’s The Big Data Era: How Data Strategy Will Change is another good piece about the future of BI and Big Data (registration required, or ask Google for freely accessible copies of the article). He mentions lots of different companies, their challenges and how they solved them, so it’s a good way to get an overview of the different facets.

A very good article: What’s Essential – And What’s Not – In Big Data Analytics. It starts with a Big Data Analytics overview, then dives into the columnar vs. row-based DB debate (only to find that this is ultimately not that important in general, as all these systems are built to scale, and it depends on your data and requirements which DB engine handles them best). In-database analytics is introduced (with MapReduce as an example) as a concept that’ll become more important, and the final point is, rightfully so, that whichever RDBMS you choose, it has to play nice with your ecosystem of data management tools.

Addressing Big-Data Challenges

October 15th, 2010

Ramon Chen, of Rainstor, has a long, Q&A-style article about Addressing Big-Data Challenges. Unfortunately, I know the behaviour he mentions all too well:

At a more granular level, organizations continue to retain critical, structured, transactional data in production system environments far longer than is legally required. These primary systems quickly become bloated and require ongoing capacity planning to accommodate anticipated growth.

Tape is often used not only as an operational backup, but also as a long-term archive. If you’ve got thousands of tapes, that quickly becomes unmanageable…

Stephen O’Grady puts some thoughts into words in Why There Won’t Be a LAMP For Big Data that I also had when reading Edd Dumbill’s The SMAQ stack for big data (Storage, MapReduce and Query).

It is not clear to me that we will have, at any point in the future, a LAMP equivalent for big data. The above notwithstanding, I think Edd’s excellent piece tacitly acknowledges this, as the components of his acronym are abstractions rather than projects.

Go read both posts!

The problems with ACID

September 2nd, 2010

Daniel Abadi throws a lengthy post at the DB world: The problems with ACID, and how to fix them without going NoSQL, introducing an even longer paper that he and his co-authors are going to present at this month’s VLDB 2010 in Singapore: The Case for Determinism in Database Systems.

This is good stuff, basically arguing that the reason people are going NoSQL to scale is that traditional ACID-compliant RDBMSs can’t scale as well, so he’s now looking at ways to get around the scalability implications of ACID.