Yahoo is considering turning Hadoop into a business, as reported by the Wall Street Journal. Ovum’s Tony Baer has a more detailed analysis on his blog in Yahoo to Hadoop: Show me the Money.

In the long run, we also expect IBM to make a stab at Hadoop and related technologies by extending its InfoSphere offerings – it can see Cloudera-Informatica and Cloudera-MicroStrategy raise it one with its own InfoSphere DataStage and Cognos offerings, before it even talks about partnerships. Today we saw a shot from left field – Yahoo which invented the technology – is now saying it might spin off its Hadoop business to go up against Cloudera, and potentially IBM. In a way, it’s closing the doors after the horses left the barn as the creator of Hadoop is now part of Cloudera.



For Yahoo, this would clearly be a shot out of its comfort zone, as it is not a tools company. But it is hungry for monetizing its intellectual property, even if that property has already been open sourced. It’s redolent of Sun striving to monetize Java and we all know how that went. Obviously this will be an uphill battle for Yahoo, but at least this would be a spinoff so hopefully there won’t be distractions from the mother ship. Given Yahoo’s fortunes, we shouldn’t be surprised that they are now looking to maximize what they can get out of the family jewels.

More commercial offerings in NoSQL can only be a good thing.

Google’s Megastore

April 26th, 2011

I don’t think I’ve written about Google’s Megastore yet, so here’s a quick summary of worthwhile resources.

Megastore is the data engine supporting Google App Engine. It’s a scalable structured data store providing full ACID semantics within partitions but lower consistency guarantees across partitions.
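
To make the "ACID within a partition, weaker guarantees across partitions" distinction concrete, here is a minimal toy sketch in Python. It is not Megastore's API or implementation; the class `PartitionedStore` and its methods are hypothetical names I made up for illustration. Each partition gets all-or-nothing updates, while a write that spans partitions is applied one partition at a time, so a reader can see an in-between state.

```python
import threading
from collections import defaultdict


class PartitionedStore:
    """Toy store (hypothetical): atomic inside one partition, not across."""

    def __init__(self):
        self._data = defaultdict(dict)             # partition -> {key: value}
        self._locks = defaultdict(threading.Lock)  # one lock per partition

    def transact(self, partition, updates, check=None):
        """Apply all `updates` to a single partition, or none of them.

        `check` is an optional predicate over the staged state; if it
        returns False the whole batch is discarded.
        """
        with self._locks[partition]:
            staged = dict(self._data[partition])
            staged.update(updates)
            if check is not None and not check(staged):
                return False  # abort: live data is untouched
            self._data[partition] = staged
            return True

    def read(self, partition, key):
        with self._locks[partition]:
            return self._data[partition].get(key)

    def multi_partition_write(self, writes):
        """Apply per-partition batches one after another (a generator).

        Each batch is atomic on its own, but there is no global commit,
        so between steps some partitions are updated and others are not.
        """
        for partition, updates in writes.items():
            self.transact(partition, updates)
            yield partition


store = PartitionedStore()
store.transact("alice", {"balance": 100})
store.transact("bob", {"balance": 0})

# Inside one partition: the invariant fails, so neither field is written.
ok = store.transact(
    "alice",
    {"balance": -50, "note": "overdraft"},
    check=lambda state: state["balance"] >= 0,
)

# Across partitions: pause after the first partition and look at the totals.
steps = store.multi_partition_write(
    {"alice": {"balance": 60}, "bob": {"balance": 40}}
)
next(steps)  # only "alice" has been updated so far
mid_total = store.read("alice", "balance") + store.read("bob", "balance")
list(steps)  # let the remaining partition finish
end_total = store.read("alice", "balance") + store.read("bob", "balance")
```

The total is temporarily wrong in the middle of the cross-partition write and only correct again at the end, which is the kind of weaker guarantee the summary above refers to. How Megastore itself coordinates across partitions is covered in the write-ups linked below.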

James Hamilton’s take on Google Megastore: The Data Engine Behind GAE. His blog is worth following for people interested in scaling infrastructure in general, not just DBs. Todd Hoff’s write-up is Google Megastore – 3 Billion Writes and 20 Billion Read Transactions Daily; his blog is about everything High Scalability. And last but not least, the Storage Mojo take on Google’s Megastore, from a storage insider.

The 451 Group’s Matt Aslett argues that Necessity is the mother of NoSQL.

Necessity is particularly relevant when looking at the history of the NoSQL databases. While it is easy for the incumbent database vendor to dismiss the various NoSQL projects as development playthings, it is clear that the vast majority of NoSQL projects were developed by companies and individuals in response to the fact that the existing database products and vendors were not suitable to meet their requirements with regards to the other five factors: scalability, performance, relaxed consistency, agility and intricacy.


The fact that Facebook, LinkedIn, Google and Amazon have had to develop and support their own database infrastructure is not a healthy sign. In a perfect world, they would all have better things to do than focus on developing and managing database platforms. That explains why the companies have also all chosen to share their projects. Google and Amazon did so through the publication of research papers, which enabled the likes of Powerset, Facebook, Zvents and Linkedin to create their own implementations. These implementations were then shared through the publication of source code, which has enabled the likes of Yahoo, Digg and Twitter to collaborate with each other and additional companies on their ongoing development.

He also posts an interesting chart of the evolution of NoSQL.

RainStor Database Technology Embedded Within HP Investigation Solution. I don’t know how many people have a need for investigation solutions, but there are certainly many who have requirements that point in the same direction, namely providing online (SQL) access to large amounts of relatively structured information (think logs or messages) for a long time (up to ten years or more). The announcement is also interesting given that HP is probably still looking to grow its DB portfolio, so maybe there’s a new acquisition ahead if this partnership works out?

Already 3 months gone by? April 2011 Critical Patch Update Released (direct link to Database vulnerabilities). Mostly obscure components that aren’t in widespread use in the DB world, but who knows…

State of the MySQL Ecosystem

April 20th, 2011

Brian Aker wrote a good article about MySQL, State of the Ecosystem on his blog. Glad to see a key figure for MySQL be positive about the future of the ecosystem!

The 451 Group recently released a report about “How will the database incumbents respond to NoSQL and NewSQL?” Unfortunately it’s only available to their subscribers, so I can’t get the details… anyway – they followed up with a blog post What we talk about when we talk about NewSQL, which includes a pretty complete list of companies and technologies that we have to watch outside the established RDBMS vendors.

“NewSQL” is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as ‘ScalableSQL’ to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term ‘NewSQL’ in the new report.


So who would we consider to be the NewSQL vendors? Like NoSQL, NewSQL is used to describe a loosely-affiliated group of companies, but what they have in common is the development of new relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity.

In the first group we would include (in no particular order) Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, and NimbusDB, as well as Drizzle, MySQL Cluster with NDB, and MySQL with HandlerSocket. The latter group includes Tokutek and JustOne DB. The associated “NewSQL-as-a-service” category includes Amazon Relational Database Service, Microsoft SQL Azure, Xeround, and FathomDB.

I’ve been following most of them, but some of them were still news to me.

MySQL just announced a pre-release snapshot which comes with an integrated Memcached plugin accessing the InnoDB storage engine directly: NoSQL to InnoDB with Memcached

The ever-increasing performance demands of web-based services have generated significant interest in providing NoSQL access methods to MySQL. Today, MySQL is announcing the preview of the NoSQL to InnoDB via memcached. This offering provides users with the best of both worlds – maintain all of the advantages of rich SQL query language, while providing better performance for simple queries via direct access to shared data. In this preview release, memcached is implemented as a MySQL plugin daemon, accessing InnoDB directly via the native InnoDB API

I wouldn’t be surprised to see more of this kind of integration also from other DB vendors.
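
For readers who haven't used memcached, the "simple queries" in the announcement above are plain key lookups and stores over a small text protocol. Here is a minimal sketch in Python of what a client sends and receives at the wire level, using only the standard memcached text commands `set` and `get`. The helper names (`build_set`, `build_get`, `parse_get_response`) and the key `user:42` are my own for illustration. How the preview plugin maps keys to InnoDB rows is not described in the announcement, so that part is deliberately left out, and nothing here talks to a real server.

```python
def build_set(key, value, flags=0, exptime=0):
    """Build a memcached text-protocol 'set' request for a string value."""
    data = value.encode("utf-8")
    header = f"set {key} {flags} {exptime} {len(data)}\r\n".encode("ascii")
    return header + data + b"\r\n"


def build_get(key):
    """Build a memcached text-protocol 'get' request for one key."""
    return f"get {key}\r\n".encode("ascii")


def parse_get_response(raw):
    """Parse a 'get' reply into {key: bytes}.

    Expected shape: zero or more 'VALUE <key> <flags> <bytes>' blocks,
    each followed by the data, then a final 'END' line.
    """
    result = {}
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        line = raw[pos:eol]
        if line == b"END":
            return result
        parts = line.split()
        if len(parts) < 4 or parts[0] != b"VALUE":
            raise ValueError(f"unexpected reply line: {line!r}")
        start = eol + 2                  # data begins after the header line
        end = start + int(parts[3])      # <bytes> is the payload length
        result[parts[1].decode("ascii")] = raw[start:end]
        pos = end + 2                    # skip the trailing \r\n


request = build_set("user:42", "alice")
lookup = build_get("user:42")
reply = parse_get_response(b"VALUE user:42 0 5\r\nalice\r\nEND\r\n")
```

In a real setup these bytes would go over a TCP socket to the memcached port, and an existing client library would normally handle the framing. The point is only that a lookup this simple skips SQL parsing entirely, which is where the claimed performance gain for simple queries would come from.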

Now that’s a thing: Kevin Closson Joins EMC Data Computing Division To Focus On Greenplum Performance Engineering! Kevin’s been the public voice of Exadata in the blogosphere for much of the past four years, so that’s quite a loss for the folks at Oracle. And a big win for EMC, I would say. Good luck, Kevin!

Great Bloomberg interview with Cloudera CEO Mike Olson on open source and big data.

Via the 451 Group