Thursday 14th May 2015

Network Interruption

Experiencing some packet loss to some internal routes.

  • Update (15:32): The issue has been traced to an upstream provider and an incident has been raised with them. Updates to follow.
  • Update (15:37): The provider is looking into the issue.
  • Update (15:49): We are starting to see routes coming back up again.
  • Update (15:51): We have had confirmation that the provider in question has reverted a change they made at 15:30 to restore the previous configuration. Routing looks to be “normal”. We will continue to monitor.
  • Update (16:06): We are seeing full stability with the previously affected routes. Once an RFO has been provided from our upstream provider, we will update this incident with a post-mortem.

Post-Mortem

Our report on the incident is as follows.

Issue

A very small number of IP subnets encountered a routing loop, and some customers' servers were inaccessible.

Outage Length

The duration was 23 minutes.

Underlying cause

The affected IP ranges are those we carry as a legacy from a historic provider. They do not form part of our multihomed BGP network and as such are subject to possible outages should the transit provider supplying them encounter problems.

The routing loop occurred because of a configuration change on the upstream provider's network.

Symptoms

Access to servers behind the affected subnets was impossible, and a routing loop was visible on a traceroute.
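A routing loop shows up in traceroute output as the same hop (or pair of hops) repeating until the TTL is exhausted. As an illustrative sketch (not part of the incident report, and using example addresses from the documentation range), detecting a loop from a list of observed hops is a matter of spotting the first repeated address:

```python
def find_routing_loop(hops):
    """Return the first hop address that repeats, or None if the path is loop-free.

    Illustrative helper only; hop addresses below are RFC 5737 example IPs.
    """
    seen = set()
    for hop in hops:
        if hop in seen:
            return hop
        seen.add(hop)
    return None

# Example path: two upstream routers bouncing traffic between each other.
path = ["192.0.2.1", "203.0.113.5", "203.0.113.9", "203.0.113.5", "203.0.113.9"]
print(find_routing_loop(path))  # → 203.0.113.5
```

In practice the symptom is simply that `traceroute` never reaches the destination and the same routers appear over and over in the output.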

Resolution

The provider reverted the change and service was immediately restored.

The long-term plan is to renumber all these IP addresses into our own address space, so that they can be announced over our resilient, multihomed BGP network. This task is already underway and customers will be contacted soon to arrange changeover dates.