Our report on the incident of 05/04/2012 is as follows.
Issue
sms-sagat unresponsive
Underlying cause
Memory page fault caused a kernel panic
Symptoms
Complete loss of service on sms-sagat
Resolution
Continual memory tests are running on the system and have so far shown no errors. The cause is therefore assumed to be a software fault (not hardware).
The RAID array is also degraded and being rebuilt, so performance is limited.
–
Follow Up
A SMART test was run on all drives, and one drive reported bad sectors. As a result, this drive has been removed and replaced, and the RAID array is rebuilding. An offline snapshot of the system has been taken whilst the RAID array is degraded.