How big data might mean better business for big banks: an interesting vision of how banks and credit card companies can use publicly available data and big analytics to separate bad from good risks in the retail lending business. The article contains several more links that are worth following.

I think some banks might already be using similar technology to vet very rich new private banking customers (identifying those where the reputational risk of serving them might be too big) or to help with CRM for the top few hundred existing wealth management clients. But scaling this to thousands or millions of potential customers is a whole different challenge for big analytics, and I’m eager to see how soon they can make it work economically.

As expected: HP yokes Autonomy, Vertica together for big data push

Oracle doing Hadoop and NoSQL

September 30th, 2011

Two links: Oracle docs show plans for Hadoop, NoSQL, which covers the Oracle Loader for Hadoop, and Added Session: Big Data Appliance, which covers Oracle’s Big Data Appliance.

(Big) Data Pyramid

September 23rd, 2011

Something not completely technical – but still relevant: Is That Information…And Do I Care?

the discussion of how data and information relate actually continues to knowledge and wisdom as well. We’ve summarized the distinction between these concepts below based on Hey’s paper and Dr. Russell Ackoff work with one of the most common representations of the relationships: the pyramid diagram.

via The Big Data Maslow’s Pyramid.

Focus on Big Data and Exadata Mini

September 19th, 2011

Oracle OpenWorld must be close, because the rumour mill is heating up… Piper Jaffray is predicting that Oracle will release an Exadata Mini machine that will fit under one’s desk (via DBMS2). And Jean-Pierre Dijcks compiled a list of Big Data related sessions at OpenWorld. I hear Big Data may very well be the keynote topic, so it’s worth spending some time at these sessions.

David Menninger writes a nice intro to Splunk in Splunk Makes Machine-Generated Big Data Serve Analytics:

Splunk focuses on a specific segment of the big-data market: machine-generated data. This type of data originates constantly from many sources throughout an organization and in large quantities. The other common characteristic of machine-generated data is that generally it is less structured than data in typical relational databases. Often the information is captured as logs consisting of text files containing various record lengths and record structures. To effectively utilize this loosely structured information in real time, two challenges must be overcome: loading the data quickly and easily navigating through and analyzing the information once it is loaded.

I’m apparently not the only one having difficulty succinctly defining what Big Data is, and there is no agreement in the industry as to what the Big Data category should or should not include, as seen in Monash’s latest rambling, “Big data” has jumped the shark. Over time, Big Data as a term will likely either come to mean everything involving a lot of data (by anybody’s definition of “a lot”) or be replaced with a better term.

Big Data Application Platform

September 6th, 2011

Nati Shalom throws one in for Big Data Application Platforms:

Big Data Application platforms are unique in the sense that they need to be able handle massive amounts of data and therefore need to come with built-in support for things like Map/Reduce, Integration with external NoSQL databases, parallel processing, and data distribution services and on top of that, they should make the use of those new patterns simple from a development perspective. Below is a more concrete list of the specific characteristics and features that define what Big Data Application Platform ought to be. I’ve tried to point to the specific Java EE equivalent API and how it would need be extended to support Big Data application.

via the High Scalability blog.

Structure Big Data Roundup

March 29th, 2011

A good number of articles from Derrick Harris over at GigaOm round up the Structure Big Data conference. First, there’s a look at Hadoop, Cloudera, and alternatives to Cloudera from IBM, DataStax, Hadapt, etc. in As Big Data Takes Off, the Hadoop Wars Begin, and second, there’s a piece about Why Big Data Startups Should Take a Narrow View:

[…] analyzing social media data is not the same, either in technique or in purpose, as analyzing user data to feed a recommendation engine for a site like Netflix. And herein lies the opportunity. […] It’s a situation just begging for startups to fill the void between big data tools and actually using them for a particular task.

So where are the NoSQL startups targeting the financial industry?

Great Bloomberg interview with Cloudera CEO Mike Olson on open source and big data.

Via the 451 Group.