Our report from the incident on 05/04/2012 is as follows.
Memory page fault caused a kernel panic
Complete loss of service on sms-sagat
After the server was detected as down, the machine’s serial console output was reviewed and showed a kernel panic.
The system was powered down, the memory was re-seated, and the system was powered up into a rescue environment to run memtest+
Memtest completed 1 pass without error
The server was powered back on into its normal run level
Continual memory tests are running on the system and have so far shown no errors. The fault is therefore assumed to have been a software fault (not hardware).
The RAID array is also degraded and being rebuilt, so performance is limited.
A SMART test was run on all drives, and one drive reported bad sectors. As a result, this drive has been removed and replaced, and the RAID array is rebuilding. An off-line snapshot of the system has been taken whilst the RAID array is degraded.