musings and one liners

Designing Database Systems for Failure

Inspired by Amazon’s recent downtime:

As it relates to disaster recovery of databases, public cloud customers need three things:

  • Safe Data Guarantees: To have live, fully up-to-date and fully consistent copies of all your databases in a location of your own choice. That might be your corporate datacenter, a portable USB drive or an archive facility in a bunker under Nebraska. It might be more than one location.
  • Continuity of Service: To have a database system that runs concurrently in multiple datacenters and/or cloud availability zones with guarantees of consistency in all locations, and resilience to failure of any of those locations.
  • Capacity Recovery: To have the ability to add computers to a running database system to rebuild capacity that may have been lost when a datacenter or region goes down. And, should all datacenters fail, to have a database that can restart rapidly in a new location from a Safe Copy of the database (see Safe Data Guarantees above).
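To make the second and third requirements more concrete, here is a toy sketch of one common technique, majority-quorum replication across availability zones. The post does not describe how NuoDB (or anyone else) implements this, so treat everything here as an assumption: the class names `Replica` and `QuorumStore`, the zone names, and the versioning scheme are hypothetical, and real systems also have to handle concurrent writers, networking and durability. The point it illustrates is that because any two majorities overlap, a write accepted by a majority of zones is visible to any later majority read, so losing one zone out of three neither blocks the service nor loses data, and a new replica can be rebuilt from the surviving majority.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of the database in a single zone (hypothetical toy model)."""
    zone: str
    up: bool = True
    data: dict = field(default_factory=dict)  # key -> (version, value)


class QuorumStore:
    """Toy majority-quorum store. Reads and writes succeed only when a majority
    of replicas are reachable, so any two quorums share at least one replica."""

    def __init__(self, zones):
        self.replicas = [Replica(z) for z in zones]

    def _quorum(self):
        return len(self.replicas) // 2 + 1

    def _live(self):
        live = [r for r in self.replicas if r.up]
        if len(live) < self._quorum():
            raise RuntimeError("no quorum: too many zones are down")
        return live

    @staticmethod
    def _latest(replicas, key):
        # The highest version among the reachable replicas is the current value.
        return max((r.data.get(key, (0, None)) for r in replicas), key=lambda vv: vv[0])

    def write(self, key, value):
        live = self._live()
        version = self._latest(live, key)[0] + 1  # single-writer assumption
        for r in live:
            r.data[key] = (version, value)
        return version

    def read(self, key):
        return self._latest(self._live(), key)[1]

    def add_replica(self, zone):
        """Capacity recovery: seed a new replica from the surviving majority."""
        live = self._live()
        fresh = Replica(zone)
        for key in set().union(*(r.data for r in live)):
            fresh.data[key] = self._latest(live, key)
        self.replicas.append(fresh)
        return fresh


if __name__ == "__main__":
    store = QuorumStore(["zone-a", "zone-b", "zone-c"])
    store.write("balance", 100)
    store.replicas[0].up = False   # simulate an outage in zone-a
    store.write("balance", 150)    # 2 of 3 zones is still a majority
    store.replicas[0].up = True    # zone-a returns with a stale copy
    print(store.read("balance"))   # highest version wins: 150
    store.replicas[1].up = False   # now zone-b fails
    store.add_replica("zone-d")    # rebuild capacity elsewhere
    print(store.read("balance"))   # still 150
```

Writing to every reachable replica and reading the highest version is the simplest scheme that keeps the quorum-overlap argument visible; it is a teaching model rather than a production design.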

via the NuoDB blog: Amazon Downtime – Designing for Failure.