Saturday 30th December 2017

Network Interruption

Disruption

  • Update (10:18): Our team are currently investigating what appears to be a major outage.
  • Update (10:31): One of our transit providers appears to have suffered a major outage and we are actively re-routing traffic around them.
  • Update (10:43): Traffic is now flowing as normal and we're in active contact with one of our upstream providers to understand the nature of the issue.
  • Update (10:56): One of our transit providers has confirmed and acknowledged the outage, which was caused by the failure of a core router. We are continuing to route traffic around this provider.

Post-Mortem

Our report from the incident is as follows.

Issue

Loss of connectivity from some ISPs.

Outage Length

The duration was 6 minutes.

Underlying cause

The outage was caused by the unexpected reboot of a router in Manchester belonging to one of our transit providers. High CPU usage was noted on the router at the time of the reboot, and this was responsible for the packets dropped prior to the reboot.

Symptoms

Our monitoring probes immediately reported the packet loss, which affected approximately 30% of our total inbound traffic.

Resolution

Our network operations team immediately shut down connectivity to the affected transit provider and re-routed traffic around them. This restored full connectivity within seconds. The affected transit provider will remain "shut down" until we have seen consistent healthy performance, after which it will be re-added to our transit pool.

Sonassi maintains connectivity through multiple independent transit providers for resilience. In this instance, a single provider failed, resulting in some traffic being briefly dropped prior to re-routing.
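
The shut-down and re-admission policy described above can be illustrated with a minimal sketch. It is not Sonassi's actual tooling: the class names, the provider names, the 1% loss threshold and the requirement of three consecutive healthy checks are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TransitProvider:
    """Hypothetical model of one upstream transit provider."""
    name: str
    active: bool = True
    healthy_streak: int = 0  # consecutive healthy checks while shut down


@dataclass
class TransitPool:
    """Hypothetical transit pool; thresholds are illustrative assumptions."""
    providers: dict = field(default_factory=dict)
    max_loss: float = 0.01    # assumed: loss above 1% counts as unhealthy
    required_streak: int = 3  # assumed: healthy checks needed to re-add

    def add(self, name):
        self.providers[name] = TransitProvider(name)

    def report(self, name, packet_loss):
        """Record one probe result (loss as a fraction, 0.0 to 1.0)."""
        provider = self.providers[name]
        if packet_loss > self.max_loss:
            # Unhealthy: shut the provider down and reset its streak.
            provider.active = False
            provider.healthy_streak = 0
        elif not provider.active:
            # Healthy while shut down: count towards re-admission.
            provider.healthy_streak += 1
            if provider.healthy_streak >= self.required_streak:
                provider.active = True
                provider.healthy_streak = 0

    def active_providers(self):
        """Names of the providers currently carrying traffic."""
        return sorted(n for n, p in self.providers.items() if p.active)


pool = TransitPool()
for name in ("transit-a", "transit-b", "transit-c"):  # hypothetical names
    pool.add(name)

pool.report("transit-b", 0.5)    # heavy loss: provider is shut down
after_failure = pool.active_providers()

for _ in range(3):               # three consecutive healthy checks
    pool.report("transit-b", 0.0)
after_recovery = pool.active_providers()
```

In this sketch a single bad probe result removes the provider at once, while re-admission needs a run of good results, mirroring the "consistent healthy performance" condition. Any unhealthy result during that run resets the count.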