May 25th, 2013
Fascinating interview with a (former) blackhat:
One ‘blackhat,’ who asked to be called Adam, and whom I have spoken to a lot, recently said he’s decided to go legit. During this life-changing transition, he offered to give an interview so that the rest of the security community could learn from his point of view. Not every blackhat wants to talk, for obvious reasons, so this is a rare opportunity to see the world through his eyes, even if we’re unable to verify any of the claims made. [...]
“I like to watch the news; especially the financial side of it. Say if a target just started up and it suddenly sky rocketed in online sales that’ll become a target. Most of these websites have admins behind them who have no practical experience of being the bad guy and how the bad guys think. This leaves them hugely vulnerable.”
“One thing that did hugely affect bot infection rates was the mass removal of Java. When news of a java 0-day gets published people panic (rightly so) and un-install it or patch but as we all know java never stays secure for long.”
“It’s super hard to gather evidence for the crime, and even so the money is impossible to find. Ten or eleven mil over 10-13 years for a 10-15 year sentence. I can’t really say what it’d be like without freedom as I’ve always had it so I can’t imagine losing it.”
May 16th, 2013
I couldn’t help but think of these two together, because I happened to read them within hours.
I think that the data revolution is just getting started. Datasets are currently being, or have already been, collected that contain, hidden in their complexity, important truths waiting to be discovered. These discoveries will increase the scientific understanding of our world. Statisticians should be excited and ready to play an important role in the new scientific renaissance driven by the measurement revolution.
And Stephen Few ranting about the term “Big Data” in A More Thoughtful but No More Convincing View of Big Data:
I have a problem with Big Data. As someone who makes his living working with data and helping others do the same as effectively as possible, my objection doesn’t stem from a problem with data itself, but instead from the misleading claims that people often make about data when they refer to it as Big Data. I have frequently described Big Data as nothing more than a marketing campaign cooked up by companies that sell information technologies either directly (software and hardware vendors) or indirectly (analyst groups such as Gartner and Forrester).
Isn’t Big Data just a (marketing) term for the category of data sets that are difficult to store and analyze with traditional tools? Obviously, what size and which tools we’re talking about changes over time…
May 6th, 2013
I am lazy. If there is a shortcut I will take it. I love feeling accomplished, but I don’t always love the hard work it takes to get there.
In the past, the hurricane days were the exception. And when those days were over, I would be amazed at my achievements and ponder: “If only I could do this every day, how organized/successful/happy would I be?” I would find my weekends had gone by and I hadn’t really done anything. I needed to make a change.
Here’s your May dose of productivity and anti-procrastination posts…
May 5th, 2013
Good wrap up of the big iron storage industry.
EMC has been gaining market share over the last several years. The world’s largest data storage company is getting larger. [...] EMC’s position is analogous to IBM’s in the 70s: EMC has the most successful scale-up OLTP arrays; offers better support; and keeps adding useful features. [...] Expect to see several of the dwarves leave the big iron storage array business. Let’s look at each of the competitors in turn.
EMC vs. Oracle, Hitachi, Dell, NetApp, HP, IBM and Fujitsu.
May 2nd, 2013
Eric Brewer about CAP Twelve Years Later: How the “Rules” Have Changed:
The CAP theorem asserts that any networked shared-data system can have only two of three desirable properties. However, by explicitly handling partitions, designers can optimize consistency and availability, thereby achieving some trade-off of all three. In the decade since its introduction, designers and researchers have used (and sometimes abused) the CAP theorem as a reason to explore a wide variety of novel distributed systems. The NoSQL movement also has applied it as an argument against traditional databases. [...]
The “2 of 3” formulation was always misleading because it tended to oversimplify the tensions among properties. Now such nuances matter. CAP prohibits only a tiny part of the design space: perfect availability and consistency in the presence of partitions, which are rare. Although designers still need to choose between consistency and availability when partitions are present, there is an incredible range of flexibility for handling partitions and recovering from them. The modern CAP goal should be to maximize combinations of consistency and availability that make sense for the specific application. Such an approach incorporates plans for operation during a partition and for recovery afterward, thus helping designers think about CAP beyond its historically perceived limitations.
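Brewer’s point about explicitly handling partitions, rather than statically giving up C or A, can be sketched with a toy replicated register: during a partition both replicas keep accepting writes (choosing availability), and a recovery step restores consistency afterward. This is my own illustrative sketch of one possible recovery plan (last-writer-wins on a logical clock), not code from the article:

```python
from itertools import count

_clock = count(1)  # logical clock standing in for write timestamps

class Replica:
    """Toy replicated register: keeps accepting writes during a
    partition (choosing availability), then reconciles afterward."""
    def __init__(self):
        self.value = None
        self.stamp = 0

    def write(self, value):
        self.value = value
        self.stamp = next(_clock)

    def merge(self, other):
        # Partition-recovery step: last-writer-wins reconciliation.
        if other.stamp > self.stamp:
            self.value, self.stamp = other.value, other.stamp

# During a partition, both sides stay available for writes...
a, b = Replica(), Replica()
a.write("x=1")
b.write("x=2")  # the later write, on the other side of the partition

# ...and the recovery plan restores consistency once it heals.
a.merge(b)
b.merge(a)
print(a.value == b.value == "x=2")  # → True
```

Last-writer-wins silently drops the older write; real systems often prefer richer merges (vector clocks, CRDTs), which is exactly the “range of flexibility” the quote alludes to.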
And Todd Hoff recently wrote about a later presentation Brewer gave, which motivated me to finally blog about the above article… Myth: Eric Brewer on Why Banks are BASE Not ACID – Availability Is Revenue:
In NoSQL: Past, Present, Future, Eric Brewer has a particularly fine section explaining the often hard-to-understand ideas of BASE (Basically Available, Soft State, Eventually Consistent), ACID (Atomicity, Consistency, Isolation, Durability), and CAP (Consistency, Availability, Partition Tolerance), in terms of a pernicious long-standing myth about the sanctity of consistency in banking.
Some good examples about banking and ACID requirements… or the lack thereof, and how that risk is contained.
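The classic example is the ATM: cut off from the bank, it still dispenses cash up to a limit (availability over consistency), and the risk is contained afterward by a compensating action such as an overdraft fee. A minimal sketch of that pattern — the limit, fee, and function names are all illustrative, not from Brewer’s talk:

```python
ATM_PARTITION_LIMIT = 200  # max withdrawal while disconnected (made-up figure)
OVERDRAFT_FEE = 25         # compensating charge applied at reconciliation

def atm_withdraw(requested, partitioned):
    """While partitioned, the ATM can't see the real balance, so it
    stays available but bounds the risk with a withdrawal cap."""
    return min(requested, ATM_PARTITION_LIMIT) if partitioned else requested

def reconcile(balance, withdrawn):
    """After the partition heals: apply the withdrawal and, if the
    account went negative, contain the loss with an overdraft fee
    instead of refusing the already-dispensed cash."""
    balance -= withdrawn
    if balance < 0:
        balance -= OVERDRAFT_FEE
    return balance

dispensed = atm_withdraw(500, partitioned=True)  # capped at 200
balance = reconcile(150, dispensed)              # 150 - 200 - 25 = -75
```

The point of the pattern: the “ACID violation” (a temporarily wrong balance) is allowed to happen because it keeps the ATM earning revenue, and the business absorbs the bounded risk with a compensating transaction.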
April 28th, 2013
Few people in Silicon Valley wear as many hats as Aneel Bhusri. Currently known primarily for his role as co-CEO of Workday, the cloud-based human resources software company that floated in an IPO last year, he also maintains an active role as a partner at venture capital firm Greylock Partners.
On leveraging your architecture and your customers’ data to expand into new markets: Financials and Big Data.
April 28th, 2013
Not everybody knows How to Tell a Story with Data:
So how does a visual designer tell a story with a visualization? The analysis has to find the story that the data supports. Traditional journalism does this all the time, and journalists have become very good at storytelling with visualization via infographics. In that vein, here are some journalistic strategies on telling a good story that apply to data visualizations as well.
Stephen Wolfram is one who does, after a year of collecting Facebook users’ data. Data Science of the Facebook World:
More than a million people have now used our Wolfram|Alpha Personal Analytics for Facebook. And as part of our latest update, in addition to collecting some anonymized statistics, we launched a Data Donor program that allows people to contribute detailed data to us for research purposes.
Well done, and interesting read.
April 25th, 2013
The goal of visualization is to aid our understanding of data by leveraging the human visual system’s highly tuned ability to see patterns, spot trends, and identify outliers. This article provides a brief tour through the “visualization zoo,” showcasing techniques for visualizing and interacting with diverse data sets. In many situations, simple data graphics will not only suffice, they may also be preferable. Here we focus on a few of the more sophisticated and unusual techniques that deal with complex data sets. After all, you don’t go to the zoo to see Chihuahuas and raccoons; you go to admire the majestic polar bear, the graceful zebra, and the terrifying Sumatran tiger. Analogously, we cover some of the more exotic (but practically useful!) forms of visual data representation.
Great stuff for all of us who skipped the advanced statistics degree…
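One of the jobs the quote assigns to the visual system — spotting outliers — also has a crude numeric counterpart worth knowing before you reach for the exotic charts. This sketch (my own, not from the article; the threshold and data are illustrative) flags values far from the mean in sample-standard-deviation units:

```python
from statistics import mean, stdev

def spot_outliers(xs, k=2.0):
    """Crude numeric stand-in for eyeballing a chart: flag values
    more than k sample standard deviations from the mean."""
    m, s = mean(xs), stdev(xs)
    return [x for x in xs if abs(x - m) > k * s]

data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 42.0]
print(spot_outliers(data))  # → [42.0]
```

Note the weakness that makes visualization valuable: the outlier itself inflates the standard deviation, so a rule like this can miss what a scatter plot makes obvious at a glance.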
April 20th, 2013
Building scalable systems is becoming a hotter and hotter topic. Mainly because more and more people are using computers these days, both transaction volumes and performance expectations have grown tremendously. This one covers general considerations.
The mathematical approach is explained in Scalability at the Cost of Availability:
Do you associate scalability with availability? Sometimes these go hand-in-hand but sometimes these are at odds with each other. We’re obviously big proponents of architecting your systems so that you have the necessary scalability when you need it but we’re also realistic.
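The core of the math is simple: if a request must touch all n scaled-out components, and each is independently up with probability a, end-to-end availability drops to aⁿ — so scaling out can quietly cost you nines. A quick check of that formula (the numbers below are just examples):

```python
def end_to_end_availability(a, n):
    """Availability of a request that must touch all n independent
    components, each of which is up with probability a."""
    return a ** n

# Each node at "three nines" (99.9%)...
# ...but a request spanning 10 such nodes is only ~99% available:
print(round(end_to_end_availability(0.999, 10), 4))  # → 0.99
```

That is roughly nine hours of extra downtime a year, which is why scaled-out designs add redundancy so that not every component is on the critical path of every request.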
And last but not least, let’s also consider backend vs. frontend in Performance vs. Scalability:
If we speak about web systems now, it looks like we can roughly separate two main components in response time (which is the main performance metric): backend (server-side) time and frontend (network and client-side time).
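That decomposition can be written down directly: the backend share is what your server-side scaling work improves, while the frontend share (network plus client-side time) is often the larger piece of what the user actually perceives. A trivial sketch with made-up millisecond figures:

```python
def response_time_ms(backend_ms, network_ms, client_ms):
    """Total response time split into the two buckets from the quote:
    backend (server-side) vs. frontend (network + client-side)."""
    frontend_ms = network_ms + client_ms
    return backend_ms + frontend_ms

# Made-up figures: a fast backend can still mean a slow page.
total = response_time_ms(120, 80, 300)  # 500 ms total, 380 ms of it frontend
```

The takeaway is that scaling the backend attacks only one term of the sum; past a point, frontend optimization buys more perceived performance.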
Some good articles (also follow their links for even more good articles).
April 17th, 2013
Data scientist might be the sexiest job of the 21st century, but it’s hardly an easy gig to land. Here is some advice from practitioners at Netflix, Orbitz and Hortonworks on how to get hired and even how to do the hiring.
I didn’t find more from Netflix and Orbitz online, but here’s the Hortonworks blog about How to build a Hadoop data science team?
Data scientists are in high demand these days. Everyone seems to be hiring a team of data scientists, yet many are still not quite sure what data science is all about, and what skill set they need to look for in a data scientist to build a stellar Hadoop data science team. We at Hortonworks believe data science is an evolving discipline that will continue to grow in demand in the coming years, especially with the growth of Hadoop adoption. This role requires experience and knowledge in math, statistics and machine learning, programming and scripting, as well as visualization techniques.
And last but not least, a view from the inside, from a statistics guy, about the challenges of interdisciplinary working in this area – Statistics and the Science Club:
We are discovering that we can either teach people to apply the statistical methods to their data, or we can just do it ourselves! [...] However, I think as a field, we desperately need to promote both kinds of people, if only because we are the best people for the job. We need to expand the tent of statistics and include people who are using their statistical training to lead the new science. They may not be publishing papers in the Annals of Statistics or in JASA, but they are statisticians. If we do not move more in this direction, we risk missing out on one of the most exciting developments of our lifetime.