Eric Brewer about CAP Twelve Years Later: How the “Rules” Have Changed:

The CAP theorem asserts that any networked shared-data system can have only two of three desirable properties. However, by explicitly handling partitions, designers can optimize consistency and availability, thereby achieving some trade-off of all three. In the decade since its introduction, designers and researchers have used (and sometimes abused) the CAP theorem as a reason to explore a wide variety of novel distributed systems. The NoSQL movement also has applied it as an argument against traditional databases. […]

The “2 of 3” formulation was always misleading because it tended to oversimplify the tensions among properties. Now such nuances matter. CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare. Although designers still need to choose between consistency and availability when partitions are present, there is an incredible range of flexibility for handling partitions and recovering from them. The modern CAP goal should be to maximize combinations of consistency and availability that make sense for the specific application. Such an approach incorporates plans for operation during a partition and for recovery afterward, thus helping designers think about CAP beyond its historically perceived limitations.

Todd Hoff recently wrote about a later presentation Brewer gave, which motivated me to finally blog about the article above… Myth: Eric Brewer on Why Banks are BASE Not ACID – Availability Is Revenue:

In NoSQL: Past, Present, Future Eric Brewer has a particularly fine section on explaining the often hard to understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), CAP (Consistency, Availability, Partition Tolerance), in terms of a pernicious long standing myth about the sanctity of consistency in banking.

Some good examples about banking and ACID requirements… or the lack thereof, and how that risk is contained.
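One way to picture "how that risk is contained" is the ATM case: during a partition the machine keeps paying out, but only up to a cap, and it settles the books once it can reach the bank again. Below is a minimal Python sketch of that idea. The `Atm` class, the `partition_limit` value and the account data are all made up for illustration; they are not taken from Brewer's material or from any real banking system.

```python
from dataclasses import dataclass, field


@dataclass
class Atm:
    """Hypothetical ATM that stays available when cut off from the bank."""

    partition_limit: int = 200  # assumed cap on total offline withdrawals
    offline_total: int = 0
    pending: list = field(default_factory=list)  # withdrawals to replay later

    def withdraw(self, balances, account, amount, partitioned):
        if not partitioned:
            # Normal mode: check the authoritative balance before paying out.
            if balances[account] < amount:
                return False
            balances[account] -= amount
            return True
        # Partition mode: no balance check is possible, so bound the exposure.
        if self.offline_total + amount > self.partition_limit:
            return False
        self.offline_total += amount
        self.pending.append((account, amount))
        return True

    def reconcile(self, balances):
        """Replay logged withdrawals after the partition heals; return overdrafts."""
        overdrawn = {}
        for account, amount in self.pending:
            balances[account] -= amount
            if balances[account] < 0:
                overdrawn[account] = -balances[account]
        self.pending.clear()
        self.offline_total = 0
        return overdrawn
```

With a balance of 100 and the default cap of 200, an offline withdrawal of 150 goes through, a second one of 100 is refused because it would exceed the cap, and `reconcile` later reports the account as overdrawn by 50. The bank accepts a small, bounded inconsistency in exchange for staying available, and deals with the overdraft afterwards (for example with a fee).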

Dataguise Presents 10 Best Practices for Securing Sensitive Data in Hadoop. Yeah, you gotta hop over to read it at myNoSQL…

Mumps: the proto-database (or how to build your own NoSQL database):

I think that one of the problems with Mumps as a database technology, and something that many people don’t like about the Mumps database is that it is a very basic and low-level engine, without any of the frills and value-added things that people expect from a database these days.

Interesting: MUMPS has been around for decades, yet (almost) nobody is using it as the foundation for their own NoSQL store? Maybe here’s why: A Case of the MUMPS:

You may not realize it, but the majority of us developers have been living a sheltered professional life. Sure, we’ve got that living disaster of a C++ application and that ridiculous interface between PHP and COBOL written by the boss, but I can assure you, that all pales in comparison to what many, less fortunate programmers have to work with each day. These programmers remain mostly forgotten, toiling away at a dead-end career maintaining ancient information systems whose ridiculously shoddy architecture is surpassed only by the tools used to create it. Bryan H lived in such a world for over two years. Specifically, he worked at a “MUMPS shop.”

Via myNoSQL.

EMC to Hadoop competition: “See ya, wouldn’t wanna be ya.”:

EMC Greenplum rolled out a new Hadoop distribution that fuses the popular big data platform with its flagship MPP database technology. Co-founder Scott Yara thinks the company’s huge investment puts it in the catbird seat among Hadoop vendors.

Greenplum HAWQ: yet another Hadoop distribution, this time with the Greenplum RDBMS tied to HDFS.

NoSQL on MySQL: stating the obvious:

Some of the NoSQL vendors seemed to have stirred up a mild controversy with their reactions to the launch of NoSQL access to InnoDB in MySQL 5.6 and their suggestions that NoSQL access is only a part of the NoSQL story.

First they ignore you, then they laugh at you… oh wait, who is who in this fight of NoSQL vendors vs. Oracle?

MySQL version history:

I’ve created a graph about the MySQL version history. It’s mysql-graph-history on github. Please let me know if this is correct or if I’m forgetting some versions.

And while we’re talking about MySQL, here’s Monty Widenius About NoSQL, Big Data, and Obviously MySQL and MariaDB, where Mr myNoSQL tears into some of the arguments that Monty makes without backing them up:

The interview Dmitry Sotnikov had with Monty Widenius was published in so many places that I had a hard time deciding which to link to. Anyways, there are a couple of comments and corrections that I’d like to suggest.



Updated Database Landscape Map

January 5th, 2013

Updated database landscape graphic:

I recently published an updated version but noted that there were a group of database vendors that had emerged in 2012 that didn’t easily fit into the segments we’d created.

It’s so much better… Good overview for anybody interested in understanding the DB world beyond just whatever one or two products or technologies they already know.

Cloudera makes SQL a first-class citizen in Hadoop:

Not content to watch its competitors leave it in the dust, veteran big data startup Cloudera is fundamentally changing the face of its flagship Hadoop distribution into something much more appealing.

Monash also writes about it: Quick notes on Impala and More on Cloudera Impala.

NoSQL Data Modeling Techniques

August 20th, 2012

Data modeling reference for NoSQL, good stuff!

In this article I provide a short comparison of NoSQL system families from the data modeling point of view and digest several common modeling techniques.

via NoSQL Data Modeling Techniques « Highly Scalable Blog.

Analytics for the Sysadmin… this used to be called Event Management, and it took ages to deploy, configure and optimize. Now sysadmins can just create their own analytics jobs. Sounds like a winner to me!

Rather than just produce a stream of tweet-like alerts to sysadmins, Nodeable would actually alert them to anomalies and emerging patterns that might signify a bigger problem to come. In doing that, Rosenberg said, the company realized it had actually created a real-time complement for Hadoop.

via Nodeable gives Hadoop a real-time boost with StreamReduce.