Sunday 1st July 2012

Network downtime explanation: 01/07/2012

Our report on the incident of 01/07/2012 is as follows.
 
Issue

Complete loss of power at Delta House, Manchester

Underlying cause

The core UPS infrastructure suffered a fatal error that cut power to the entire “A” feed, causing a site-wide data centre outage affecting over 50 racks.

Symptoms

Complete loss of power and, as a result, loss of access to all equipment in Delta House

Resolution

A Sonassi engineer was on-site within 15 minutes of the incident. Data centre electricians diagnosed the fault as being within the UPS and promptly bypassed it, running power directly from the mains.

Our engineer manually booted and tested each server in every rack to ensure it started cleanly with all HTTP services running. All machines started without fault or issue.
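
For illustration only, a post-boot HTTP check of this kind could be scripted roughly as follows. This is a minimal sketch and not the procedure our engineer used: the function names http_status and check_hosts, the 5-second timeout, and the rule that any response below status 500 counts as "running" are all assumptions made for the example.

```python
import http.client
import urllib.error
import urllib.request


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no response arrives."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx or 5xx).
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, malformed reply, etc.
        return None


def check_hosts(hosts, fetch=http_status):
    """Map each host to True if its HTTP service answers with status < 500.

    The threshold is an assumption: a 404 still proves the web server is
    up, whereas a 5xx or no answer at all suggests it did not start cleanly.
    """
    results = {}
    for host in hosts:
        status = fetch(f"http://{host}/")
        results[host] = status is not None and status < 500
    return results
```

Calling check_hosts with a list of host names (for example, a hypothetical "server1.example.com") returns a dictionary of host to True or False, so any machine that failed to bring up its HTTP service stands out immediately. The fetch parameter exists so the check can be exercised without touching the network.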

Engineers from the UPS manufacturer were dispatched to the data centre and are currently implementing a repair.