Friday 25th May 2012

Network 25/04/2012 downtime explanation

Our report on the incident of 25/04/2012 is as follows.
 
Issue

sms-akuma restarted

Underlying cause

Watchdog triggered a reboot

Symptoms

Complete loss of service on sms-akuma.

Resolution

The automatic watchdog monitoring service restarted the server after detecting a non-recoverable error. After an automatic fsck, the system came back up successfully following 36 minutes of downtime.