Experiencing packet loss within network core.
Our report from the incident is as follows.
Significant packet loss: over 50% of packets destined for a single rack of equipment were dropped, with a secondary symptomatic effect of <10% loss within the network core. This significantly affected servers at Joule House, causing a total loss of service to the servers connected to the affected access switch stack.
The duration was 63 minutes.
Flooding within a single access switch, which caused significant control-plane CPU consumption on other network devices.
Our external monitoring probes immediately reported the fault. End users will have noticed the issue, as it affected overall service.
The switch generating the traffic was observed to be consuming 100% CPU and was initially power cycled in the hope that it would become responsive again. Unfortunately, the issue propagated to the remaining 5 switches within the stack (in a single rack), generating further problems.
To avoid major network disruption for the entire location, all access switches were powered off simultaneously, then powered back up one at a time. This restored service and resolved the issue at hand.
Technical reports will be submitted to the vendor for analysis; however, with access-layer networking upgrades upcoming, it is unlikely this will be pursued further.