With a flurry of recent BI-oriented partnerships, it’s no surprise Cloudera is attracting so much interest.

via How Cloudera Became a Leader in BI/Hadoop

You’ve likely seen Facebook’s announcement of its new Messaging system. They now also have a blog post about The Underlying Technology of Messages, and Todd Hoff is dissecting that post and adding some context in Facebook’s New Real-time Messaging System: HBase to Store 135+ Billion Messages a Month. This is just crazy. I wonder how many servers they have for the DB, or rather how fast it’s going to scale. It’s probably better to measure how many servers they’ll need per terabyte of data managed, since the volume is going to increase so quickly!

MapReduce and Hadoop Future

October 12th, 2010

Following up on Google dumping MapReduce, there are now a couple of articles available that shed more light on that decision and what it means for MapReduce. Go read MapReduce and Hadoop Future and then Google’s Dremel – or, Can MapReduce Itself Handle Fast, Interactive Querying? for additional thoughts on why Google’s decision isn’t the end of MapReduce.

Cloudera and Netezza Team Up to Bring Hadoop to Customers, so we read. All these connector announcements make me think there’s somebody out there with a matrix of RDBMS and NoSQL systems, looking at which combinations don’t have a marketable connector yet so they can be first to market.

Via 451 CAOS Theory, and GigaOM comments as well.

Hadoop Update

June 30th, 2010

A slew of updates on Apache Hadoop, nicely compiled by the folks at the 451 Group:

  • Cloudera launched v3 of its Distribution for Hadoop and released v1 of Cloudera Enterprise.
  • Karmasphere released new Professional and Analyst Editions of its Hadoop development and deployment studio.
  • Talend announced that its Integration Suite now offers native support for Hadoop.
  • Yahoo announced the beta release of Hadoop with Security and Oozie, Yahoo’s workflow engine for Hadoop.
  • Datameer announced a strategic partnership with Zementis for predictive analytics on Hadoop.
  • The Register reported that Twitter is set to open source its MySQL-to-Hadoop tool.
  • MicroStrategy announced support for Apache Hadoop as a data source for MicroStrategy 9.
  • Appistry announced Hadoop-based strategic alliances with Concurrent, Datameer and Kitenga.
  • GOTO Metrics released Data Analytics Platform, a Hadoop-based business intelligence platform.

And Monash can tell us a bit more about Cloudera Enterprise. He actually mentions:

Financial services uses for Hadoop include: Internal trading rule enforcement/fraud detection, Complex ETL, Portfolio risk assessment (typically overnight).

That is slightly different from what Teradata’s CTO told me at their CTO Road Show in Zurich last week, where I raised the question of MapReduce in the financial industry. But then again, maybe he was just saying that none of their Finance customers uses Teradata’s new MapReduce engine yet…

Quest to combine Oracle with Hadoop: another one to show that Oracle is the clear market leader, and that everybody is trying to position themselves around it.

Quest Software has announced a new partnership with Cloudera to create an Oracle connector for the Apache Hadoop database. […] The new tool not only handles data transfers, but also implements the meta information in Hadoop classes, allowing applications to be run with Oracle as well as Hadoop. Ora-Oop is to ensure that data can be exchanged equally well in both directions.

Cloudera are looking at Considerations for Hadoop and BI in a little series of two articles (2nd part). BI tools were traditionally designed for small volumes of structured data, whereas Hadoop generally stores data in complex formats at scale and processes it on read using MapReduce. That mismatch can be quite a problem, and it’s good to see some guidance around it, because I know it’ll be one of the first questions when we look at it here too. I guess our main interest would first be in storing large amounts of non-relational data and querying it with custom tools (i.e. MapReduce jobs), with BI only as an afterthought, but BI tools are still how people think about this kind of problem.
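
To make the "processes data on read using MapReduce" point a bit more concrete, here is a minimal single-process sketch in Python. It is not Hadoop code and does not use any Hadoop API; the log format, the field names (`page`, `ms`) and the function names are all made up for illustration. The idea it tries to show is that raw lines are stored untouched, a map step imposes structure only when the job reads them, and a reduce step aggregates per key.

```python
from collections import defaultdict

# Hypothetical raw records, stored as-is with no schema enforced at write time.
RAW_LINES = [
    "2010-06-30 page=/home user=alice ms=120",
    "2010-06-30 page=/search user=bob ms=340",
    "malformed line without fields",
    "2010-06-30 page=/home user=carol ms=80",
]


def map_line(line):
    """Parse one raw line at read time and emit (page, latency_ms) pairs."""
    fields = dict(part.split("=", 1) for part in line.split() if "=" in part)
    # Lines that lack the fields this job needs are skipped, not rejected upstream.
    if "page" in fields and "ms" in fields:
        yield fields["page"], int(fields["ms"])


def reduce_page(page, latencies):
    """Aggregate all values for one key into (page, hits, mean latency)."""
    return page, len(latencies), sum(latencies) / len(latencies)


def run_job(lines):
    """Run map, shuffle (group by key) and reduce in a single process."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            groups[key].append(value)
    return [reduce_page(key, values) for key, values in sorted(groups.items())]


if __name__ == "__main__":
    for row in run_job(RAW_LINES):
        print(row)
```

A BI tool would typically expect the aggregated rows that come out of `run_job`, not the raw lines going in, which is roughly where the mismatch described above comes from.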