GigaOm kicked off some good discussion in "Why the days are numbered for Hadoop as we know it":
Hadoop is everywhere. For better or worse, it has become synonymous with big data. In just a few years it has gone from a fringe technology to the de facto standard. Want to be big data or enterprise analytics or BI-compliant? You better play well with Hadoop.
It’s therefore far from controversial to say that Hadoop is firmly planted in the enterprise as the big data standard and will likely remain firmly entrenched for at least another decade. But, building on some previous discussion, I’m going to go out on a limb and ask, “Is the enterprise buying into a technology whose best day has already passed?”
Realtime is king. Low latency is queen. Or the other way around 😉 I don’t think anybody would refuse getting their results faster; people just aren’t complaining about Hadoop and MapReduce today because they don’t know better. Or rather, because better solutions aren’t available at the right price point yet.
Comments
2 responses to “Why the days are numbered for Hadoop as we know it”
I’m really surprised that Hadoop has already found its way into the datacenter world.
It’s still a highly experimental field in the high-performance computing area. Other technologies, like the distributed memory model, took more than a decade to be adopted by classic IT. At least we now see it in products like Oracle Exadata and Exalogic, which from my point of view are classic high-performance clusters: off-the-shelf servers with a relatively small CPU count and InfiniBand as the interconnect. It’s a good business for Oracle; I talked to several former Sun customers who told me that all components saw price rises of up to four times. While the price of the final system is still in the range of classic large SMP boxes, Oracle itself reduced its costs significantly. Good strategy, at least for Oracle 🙂
When looking at the top clusters in the world, there is an amazing development in data processing going on. Systems with several hundred thousand cores produce gigantic data volumes, and storage is a major issue. There are many interesting projects that basically visualize the results on the fly: instead of storing the data and trying to do magic like MapReduce afterwards, the data is reduced during the calculation and transferred directly to the visualization system. Instead of submitting jobs to a queue, the scientist sits in a box with 3D projections on every wall and “travels” through the data. Changes to input parameters are visualized on the fly. Of course a system has to be reserved for some time, but the time to result is probably much faster compared to the classic batch-oriented way.
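To make that in-situ idea concrete, here is a minimal, hypothetical sketch (the function names and numbers are mine, not from the comment): each simulation timestep is collapsed to a small summary while it is still in memory and handed to a visualization callback, so the raw data never has to be written out and post-processed in a separate MapReduce-style batch job.

```python
# Minimal sketch (hypothetical): reduce simulation output in flight and stream
# summaries to a visualization frontend, instead of persisting raw data for a
# later batch/MapReduce pass.
import random

def simulate_step(step, n_cells=1_000):
    """Stand-in for one timestep of a large simulation."""
    return [random.gauss(0, 1) for _ in range(n_cells)]

def reduce_in_situ(values):
    """Collapse a full timestep to a tiny summary while it is still in memory."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": sum(values) / len(values),
    }

def visualize(step, summary):
    """Stand-in for pushing the summary to a live visualization system."""
    print(f"step {step:4d}  min={summary['min']:+.3f}  "
          f"max={summary['max']:+.3f}  mean={summary['mean']:+.3f}")

for step in range(10):
    data = simulate_step(step)             # raw data never touches storage
    visualize(step, reduce_in_situ(data))  # only the reduced view leaves the node
```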
Once those people leave the academic environment and head out into the wild, we may see completely new approaches to handling large data, and maybe also a shift from storage to compute power. With the increasing power per CPU / Watt / $$$ it may become more interesting to recalculate things and keep them in the machine instead of trying to save them to disk.
Thx for the insight, Beat.