Power Supply failure – the wrong type of failure
I lost an external DAS (Direct Attached Storage) unit today. Not lost as in could not find it, but lost to a power supply failure. I was at home when I got a phone call saying that all of the company’s Unix storage, which lives on a DAS holding two 72GB HDDs in a mirror (DiskSuite on Solaris 8), was unavailable. Remotely, I could not access it: every request (ls, cd, etc.) returned an I/O error. I had to go on site. There I noticed lots of error messages generated by the kernel whenever it tried to access the disks. After a lot of trial and error (I will not describe the whole procedure, but it included replacing the external SCSI cable, disconnecting one of the disks, etc.), I replaced the DAS module and put the old disks in it, making sure they kept the same LUN they had before (for DiskSuite’s sake).
Conclusion – Power Supply failure, but not an absolute failure. The lights stayed on and the disks could spin up, but under load the power supply could not deliver the full power the disks required, resulting in read/write errors. Working, but not quite.
Tracking down the problem and hacking together a different SCSI DAS module took almost two hours of my life. I hope never to encounter such a problem again.