David Menninger writes a nice intro to Splunk in Splunk Makes Machine-Generated Big Data Serve Analytics:

Splunk focuses on a specific segment of the big-data market: machine-generated data. This type of data originates constantly from many sources throughout an organization and in large quantities. The other common characteristic of machine-generated data is that generally it is less structured than data in typical relational databases. Often the information is captured as logs consisting of text files containing various record lengths and record structures. To effectively utilize this loosely structured information in real time, two challenges must be overcome: loading the data quickly and easily navigating through and analyzing the information once it is loaded.
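To make the point about "various record lengths and record structures" concrete, here is a minimal Python sketch of schema-on-read field extraction. Each raw line is kept as-is, and whatever leading timestamp and key=value pairs can be found are pulled out as fields when the line is loaded, so lines with different shapes can still be queried together. The log format, the `parse_line` function and the field names are hypothetical assumptions for illustration. This is not a description of how Splunk itself is implemented.

```python
import re

# Hypothetical conventions: an optional ISO-style timestamp at the start of
# the line, plus any number of key=value pairs (values may be "quoted").
TS_PATTERN = re.compile(r'^(\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2})')
KV_PATTERN = re.compile(r'(\w+)=("[^"]*"|\S+)')


def parse_line(line):
    """Turn one loosely structured log line into a dict of extracted fields."""
    record = {"_raw": line}  # always keep the original text
    ts = TS_PATTERN.match(line)
    if ts:
        record["timestamp"] = ts.group(1)
    for key, value in KV_PATTERN.findall(line):
        record[key] = value.strip('"')
    return record


# Hypothetical sample: three lines with three different structures.
lines = [
    '2011-05-10 12:00:01 level=ERROR user=alice msg="disk full"',
    '2011-05-10 12:00:02 level=INFO user=bob',
    'kernel: eth0 link up',
]
records = [parse_line(line) for line in lines]

# "Navigating" the loaded data: filter on a field that only some lines have.
errors = [r for r in records if r.get("level") == "ERROR"]
print([r["user"] for r in errors])
```

Lines without recognizable fields (like the kernel message) still load; they simply carry only their raw text, which is the trade-off that makes loading fast and tolerant of messy input.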

I’m apparently not the only one having difficulty succinctly defining what Big Data is, and the industry is even further from agreeing on what the Big Data category should or should not include, as seen in Monash’s latest rambling, “Big data” has jumped the shark. Over time, the term Big Data will likely either come to mean everything involving a lot of data (by anybody’s definition of “a lot”), or be replaced by a better term.

You don’t have to wait long… Oracle Introduces Oracle Exadata Storage Expansion Rack, just days after I blogged about it in Forthcoming Oracle Appliances. Configurations from a half rack to several full racks can be combined for massive storage. Interestingly, Oracle wants us to use this not only for relational data but for all sorts of other stuff as well:

The Oracle Exadata Storage Expansion Rack is ideal for storing massive amounts of structured and unstructured data including historical relational data; backups of Oracle Exadata Database Machine; weblogs; documents, images, LOBs and XML files

Curt Monash writes about Forthcoming Oracle appliances, based on information from Oracle’s earnings call last week (full transcript). There will be an IMDB (in-memory database) appliance based on TimesTen for high-speed analytics, and a Hadoop appliance for MapReduce jobs, targeted at preprocessing data and feeding it into Oracle. It really looks like Oracle is going full steam ahead with its appliance strategy and is also starting to embrace the MapReduce and massively parallel models. All of this is likely to be announced in more detail at Oracle Open World.

A good 28-page whitepaper on NoSQL for SQL Server developers, which first familiarizes the reader with NoSQL and then shows what NoSQL options exist in the Microsoft and Azure stack. It also includes a fair bit of positioning and a discussion of appropriate use cases for NoSQL.

A nice how-to on AlwaysOn, the combined Mirroring and Clustering HA/DR solution in SQL Server Denali, the next version of SQL Server.

Reported and analysed by Tony Baer in OnStrategies Perspectives, and reported by Derrick Harris in GigaOm’s EMC, NetApp Make It a Big Day for Big Data Star Hadoop, we learn that EMC is making the most of the ongoing EMC World conference and is announcing that it will grow its database division by selling its own Hadoop distribution, with value-added management tools and integration. I expect to see more soon.

Yahoo is considering turning Hadoop into a business, as reported by the Wall Street Journal. Ovum’s Tony Baer has a more detailed analysis on his blog in Yahoo to Hadoop: Show me the Money.

In the long run, we also expect IBM to make a stab at Hadoop and related technologies by extending its InfoSphere offerings – it can see Cloudera-Informatica and Cloudera-MicroStrategy raise it one with its own InfoSphere DataStage and Cognos offerings, before it even talks about partnerships. Today we saw a shot from left field – Yahoo which invented the technology – is now saying it might spin off its Hadoop business to go up against Cloudera, and potentially IBM. In a way, it’s closing the doors after the horses left the barn as the creator of Hadoop is now part of Cloudera.



For Yahoo, this would clearly be a shot out of its comfort zone, as it is not a tools company. But it is hungry for monetizing its intellectual property, even if that property has already been open sourced. It’s redolent of Sun striving to monetize Java and we all know how that went. Obviously this will be an uphill battle for Yahoo, but at least this would be a spinoff so hopefully there won’t be distractions from the mother ship. Given Yahoo’s fortunes, we shouldn’t be surprised that they are now looking to maximize what they can get out of the family jewels.

More commercial offerings in NoSQL can only be a good thing.

Google’s Megastore

April 26th, 2011

I don’t think I’ve written about Google’s Megastore yet, so here’s a quick summary of worthwhile resources.

Megastore is the data engine supporting Google App Engine. It’s a scalable structured data store providing full ACID semantics within partitions but lower consistency guarantees across partitions.
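To illustrate what "full ACID within partitions but lower consistency across partitions" means in practice, here is a minimal Python sketch, assuming in-memory dictionaries and a single lock per partition. Each partition runs its own transactions atomically, while a cross-partition transfer is split into a local debit plus a queued message that is applied later, roughly in the spirit of the queued messaging the Megastore paper describes between entity groups. The class and method names are hypothetical, and the sketch deliberately leaves out everything that makes Megastore work at scale (Paxos replication, Bigtable storage); it only models the consistency boundary.

```python
import threading
from collections import deque


class Partition:
    """Toy stand-in for one partition (Megastore calls these entity groups).

    Hypothetical model: one lock serializes transactions, and updates are
    staged on a copy so a failed transaction leaves no partial writes.
    """

    def __init__(self):
        self.data = {}
        self.lock = threading.Lock()

    def transact(self, updates):
        with self.lock:
            staged = dict(self.data)  # work on a copy: all-or-nothing
            for key, delta in updates.items():
                new_value = staged.get(key, 0) + delta
                if new_value < 0:
                    raise ValueError(f"{key} would become negative; aborting")
                staged[key] = new_value
            self.data = staged  # commit by swapping in the staged copy

    def read(self, key):
        with self.lock:
            return self.data.get(key)


class ToyStore:
    """Hypothetical store: ACID per partition, asynchronous across partitions."""

    def __init__(self, names):
        self.partitions = {name: Partition() for name in names}
        self.outbox = deque()  # cross-partition messages, applied later

    def transact(self, name, updates):
        self.partitions[name].transact(updates)

    def read(self, name, key):
        return self.partitions[name].read(key)

    def transfer(self, src, dst, key, amount):
        # The debit is one atomic transaction inside the source partition;
        # the credit is only queued, so no single transaction spans both.
        self.partitions[src].transact({key: -amount})
        self.outbox.append((dst, {key: amount}))

    def deliver(self):
        # Apply queued messages; until this runs, readers of dst see old state.
        while self.outbox:
            dst, updates = self.outbox.popleft()
            self.partitions[dst].transact(updates)


store = ToyStore(["alice", "bob"])
store.transact("alice", {"balance": 100})
store.transfer("alice", "bob", "balance", 30)
print(store.read("alice", "balance"), store.read("bob", "balance"))  # 70 None
store.deliver()
print(store.read("alice", "balance"), store.read("bob", "balance"))  # 70 30
```

Between `transfer()` and `deliver()`, a reader sees Alice already debited but Bob not yet credited; that window is the weaker cross-partition consistency. Within a single partition, by contrast, a transaction that fails validation leaves no partial writes behind.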

James Hamilton’s take is Google Megastore: The Data Engine Behind GAE. His blog is worth following for people interested in scaling infrastructure in general, not just DBs. Todd Hoff’s write-up is Google Megastore – 3 Billion Writes and 20 Billion Read Transactions Daily; his blog, High Scalability, covers everything its name suggests. And last but not least, there is the Storage Mojo take on Google’s Megastore, from a storage insider.

RainStor Database Technology Embedded Within HP Investigation Solution. I don’t know how many people need investigation solutions, but there are certainly many who have requirements pointing in the same direction, namely online (SQL) access to large amounts of relatively structured information (think logs or messages) kept for a long time (up to ten years or more). The announcement is also interesting given that HP is probably still looking to grow its DB portfolio, so maybe there’s a new acquisition ahead if this partnership works out?