Facebook uses heatmaps to monitor its thousands of servers and services. It built its own tool for the job, Claspin, and recently explained it a bit in a post: Monitoring cache with Claspin.
That’s an awesome approach to proactive monitoring, and a very slick UI for presenting the tribal knowledge about which values are bad on a server. I wonder whether they also use it to monitor end-to-end services, or whether that’s something they don’t need and only a more traditional enterprise company would.
There is more coverage in GigaOm and TechCrunch.