Disruption
Post-Mortem
Our report on the incident is as follows.
Issue
Loss of connectivity from some ISPs.
Outage Length
The duration was 6 minutes.
Underlying cause
The outage was caused by an unexpected reboot of a router in Manchester operated by one of our transit providers. High CPU usage was observed on the router at the time, and this was responsible for the packets dropped before the reboot.
Symptoms
Our monitoring probes immediately reported the packet loss, which affected approximately 30% of our total inbound traffic.
Resolution
Our network operations team immediately shut down connectivity to the affected transit provider and re-routed traffic around it. This restored full connectivity within seconds. The affected transit provider will remain "shut down" until we have seen consistently healthy performance, after which it will be returned to our transit pool.
Sonassi maintains connectivity through multiple independent transit providers to keep internet connectivity resilient. In this instance, a single provider failed, and some traffic was briefly dropped before it was re-routed.