April 25th, 2013
The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. This article provides a brief tour through the “visualization zoo,” showcasing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see Chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Sumatran tiger. Analogously, we cover some of the more exotic (but practically useful!) forms of visual data representation.
Great stuff for all of us who skipped the advanced statistics degree…
April 20th, 2013
Building scalable systems is becoming a hotter and hotter topic, mainly because more and more people are using computers these days, so both transaction volumes and performance expectations have grown tremendously. This one covers general considerations.
The mathematical approach is explained in Scalability at the Cost of Availability:
Do you associate scalability with availability? Sometimes these go hand-in-hand but sometimes these are at odds with each other. We’re obviously big proponents of architecting your systems so that you have the necessary scalability when you need it but we’re also realistic.
And last but not least, let’s also consider backend vs. frontend in Performance vs. Scalability:
If we speak about web systems now, it looks like we can roughly separate two main components in response time (which is the main performance metric): backend (server-side) time and frontend (network and client-side time).
Some good articles (also follow their links for even more good articles)
April 17th, 2013
Data scientist might be the sexiest job of the 21st century, but it’s hardly an easy gig to land. Here is some advice from practitioners at Netflix, Orbitz and Hortonworks on how to get hired and even do the hiring.
I didn’t find more from Netflix and Orbitz online, but here’s the Hortonworks blog about How to build a Hadoop data science team?
Data scientists are in high demand these days. Everyone seems to be hiring a team of data scientists, yet many are still not quite sure what data science is all about, and what skill set they need to look for in a data scientist to build a stellar Hadoop data science team. We at Hortonworks believe data science is an evolving discipline that will continue to grow in demand in the coming years, especially with the growth of Hadoop adoption. This role requires experience and knowledge in math, statistics and machine learning, programming and scripting, as well as visualization techniques.
And last but not least, a view from the inside, from a statistics guy, about the challenges of interdisciplinary working in this area – Statistics and the Science Club:
We are discovering that we can either teach people to apply the statistical methods to their data, or we can just do it ourselves! [...] However, I think as a field, we desperately need to promote both kinds of people, if only because we are the best people for the job. We need to expand the tent of statistics and include people who are using their statistical training to lead the new science. They may not be publishing papers in the Annals of Statistics or in JASA, but they are statisticians. If we do not move more in this direction, we risk missing out on one of the most exciting developments of our lifetime.
IBM said Thursday it would spend $1 billion to support flash storage in more of its products and open 12 facilities worldwide to show enterprises what a difference flash can make. It also unveiled its new FlashSystems line of flash-storage appliances.
And another billion… sounds like inflation to me!
April 12th, 2013
Having to develop in multiple languages on a daily basis – bane or boon?
Dataguise Presents 10 Best Practices for Securing Sensitive Data in Hadoop. Yeah, you gotta hop over to read it at myNoSQL…
April 11th, 2013
IT security pros are typically delighted to do away with employees’ option for consensual impersonation, and indeed, privileged identity management systems work really hard to make it impossible for those with superuser powers to do. But I suspect the consumer world isn’t quite ready for widespread two-step verification that cuts off this option.
I don’t know if that’s really much of a use case. Then again, I’m just one of those corporate-IT-security-obsessed guys…
April 10th, 2013
We dubbed it “our Christmas,” the day when all our good engineering deeds would pay off and we’d be rewarded with a bounty of happy active users for our mobile app, Avocado.
Nice one, if your mobile app is growing like crazy. Via Check Yourself Before You Wreck Yourself – Avocado’s 5 Early Stages of Architecture Evolution.
April 9th, 2013
When the definitive story looking back on the effort to turn around the flailing technology giant Hewlett-Packard is written, today may be seen as a turning point, perhaps for the good, perhaps not so good.
But, as it turned out, HP’s Moonshot Gives Analysts a Case of the “Mehs”:
For all the hopes that have been pinned to Hewlett-Packard’s new line of tiny servers known as Moonshot, announced for shipment today, analysts certainly weren’t feeling it. Reserving judgment, two analysts wondered in notes to clients today if even the most optimistic outcome for Moonshot is enough to get HP on track.
But what can you expect from something that’s been talked about for more than 18 months? These kinds of things never surprise the analysts…
April 9th, 2013
At the beginning of 2008 Sun Microsystems purchased MySQL AB, and ever since then there have been divisions in the ecosystem. As with any software community or ecosystem, where there are divisions there are usually forks, both in the community and the software itself.
MySQL history lesson.