All systems are operational

Past Incidents

Wednesday 9th May 2012

No incidents reported

Tuesday 8th May 2012

No incidents reported

Monday 7th May 2012

No incidents reported

Sunday 6th May 2012

No incidents reported

Saturday 5th May 2012

Network 05/04/2012 downtime explanation
13 years ago

Our report from the incident on 05/04/2012 is as follows.
 
Issue

sms-sagat unresponsive

Underlying cause

Memory page fault caused a kernel panic

Symptoms

Complete loss of service on sms-sagat

Resolution

  1. After the server was detected as down, the machine’s serial console output was reviewed and showed a kernel panic.
  2. The system was powered down, the memory was re-seated, and the system was powered up into a rescue environment to run memtest+.
  3. Memtest completed 1 pass without error.
  4. The server was powered back on into its normal run level.

Continual memory tests are running on the system and have so far completed without error. The fault is therefore assumed to have been in software rather than hardware.

The RAID array is also degraded and is being rebuilt, so performance is limited.

Follow Up

A SMART test was run on all drives and one drive reported bad sectors. As a result, this drive has been removed and replaced and the RAID array is rebuilding. An off-line snapshot has been taken of the system whilst the RAID array is degraded.
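
As an illustration of how a degraded array like the one described above might be spotted, here is a minimal sketch. It assumes the server uses Linux software RAID (md) and exposes the usual /proc/mdstat status text; the report does not say which RAID implementation sms-sagat uses. The function name degraded_arrays and the sample text are hypothetical and are not taken from the affected system.

```python
import re

# Matches an array header line such as "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_RE = re.compile(r"^(md\d+)\s*:\s*(\w+)")
# Matches the member status field such as "[2/1] [U_]".
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Return names of arrays with fewer active members than expected."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current is not None:
            expected, active = int(status.group(1)), int(status.group(2))
            # "_" marks a missing or failed member in the status map.
            if active < expected or "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded


# Hypothetical sample in the usual /proc/mdstat layout (assumed, not real data).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      104320 blocks [2/2] [UU]

md1 : active raid1 sda2[0]
      1048576 blocks [2/1] [U_]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # ['md1']
```

On a real md-based host the input would come from reading /proc/mdstat rather than the sample string; hardware RAID controllers report status through their own vendor tools instead.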

Network sms-sagat unresponsive
13 years ago

sms-sagat has stopped responding to ping; we are investigating now.

Friday 4th May 2012

No incidents reported

Thursday 3rd May 2012

No incidents reported