All systems are operational

Past Incidents

Friday 3rd January 2014

Network Connectivity fault

We have become aware of a core network issue with one of our providers. Updates to follow.

  • Update (14:10): The fault has been identified with a single provider; changes are being made to take that provider offline and use an alternate until the fault is fixed.
  • Update (14:20): The affected provider has been isolated and temporarily dropped from our network, and connectivity is restored. A further update will follow when this provider is restored.

Post-Mortem

Our report on the incident is as follows.

Issue

Partial packet loss affecting global connectivity.

Outage Length

5 minutes.

Underlying Cause

One of our carriers experienced a large burst of traffic within their network, resulting in a loss of connectivity.

Symptoms

Our internal and external monitoring probes immediately reported a fault.

Resolution

No active steps needed to be taken: connectivity gracefully failed over to another carrier. A manual de-preference of the affected carrier was added to prevent its use until its connectivity was restored.
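For context, a "manual de-preference" of a carrier is typically done by lowering the BGP local-preference on routes learned from it, so other carriers win route selection. A hypothetical sketch in Cisco-IOS-style syntax (the ASN, neighbour address, and route-map name are all placeholders; the report does not state the actual platform or values):

```
! Prefer other carriers by lowering local-preference (default is 100)
route-map DEPREF-CARRIER permit 10
 set local-preference 50
!
router bgp 64512
 neighbor 192.0.2.1 route-map DEPREF-CARRIER in
```

Removing the route-map from the neighbour restores the carrier to normal preference once its connectivity is confirmed stable.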

Thursday 2nd January 2014

No incidents reported

Wednesday 1st January 2014

No incidents reported

Tuesday 31st December 2013

No incidents reported

Monday 30th December 2013

No incidents reported

Sunday 29th December 2013

Network Emergency Maintenance

sms-jay

Estimated Downtime

20 minutes

Actions

Since the restart, I/O wait has been higher than expected on sms-jay. Given the age of the current kernel release, we feel that an update would be a wise manoeuvre.

  • Update (11:26): The new kernel is installed and all tests are complete.
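The two checks behind this decision can be reproduced with standard Linux tools; a minimal sketch (nothing here is specific to sms-jay):

```shell
# Running kernel release -- compared before and after the update:
uname -r
# Cumulative CPU iowait: the fifth time counter on the "cpu" line of
# /proc/stat (user nice system idle iowait ...), i.e. awk field $6:
awk '/^cpu /{print "iowait ticks:", $6}' /proc/stat
```

In practice a rate over time (e.g. the `wa` column of `vmstat 1`) is more useful than the raw cumulative counter, which only ever grows.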

Saturday 28th December 2013

Network sms-jay restarted

sms-jay has stopped responding.

  • Update (11:30): sms-jay restarted (cause currently unknown). As the server had been up for over 347 days without a reboot, a file system check must be carried out on boot.
  • Update (11:32): Estimated completion time is approximately another 30 minutes.
  • Update (11:39): Progress is currently at 87.8%.
  • Update (11:43): The automatic fsck failed; a manual fsck is now being run and overseen to correct the file system errors.
  • Update (11:51): The manual fsck is complete and the errors have been fixed. The server is now rebooting and should be up shortly.
  • Update (11:56): The server is now up and operational and web services are being started.
  • Update (11:59): Web services are operational, load will be high (and thus the server will be slower than usual) whilst the RAID array rebuilds (estimated 8 hours).
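For readers unfamiliar with the sequence above, a hedged sketch of the commands involved (the device name is a placeholder, and the `/proc/mdstat` step assumes Linux software RAID, which the report does not confirm):

```shell
DEV=/dev/sda1   # placeholder; not the actual volume on sms-jay
# ext3/ext4 force a boot-time check once the maximum mount count or
# check interval is exceeded -- hence 347 days of uptime triggering one.
# When the automatic pass fails, the check is repeated by hand, with
# "-f" forcing the check and "-y" answering yes to every repair prompt:
echo "manual repair would run: fsck -f -y $DEV"
# After reboot, a Linux software-RAID rebuild can be watched via
# /proc/mdstat, which shows a "recovery = NN.N%" progress line:
if [ -r /proc/mdstat ]; then cat /proc/mdstat; else echo "no md status file on this host"; fi
```

The elevated load during the rebuild comes from the array reading and rewriting every member disk, which competes with normal I/O for roughly the estimated 8 hours.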