Disabling failing memory on a 4800
2007-12-25 1:40:00
on vacation, so I do not have any of the documentation on site.
Said 4800 is clocking correctable errors on a DIMM and also lost a
CPU. We took the box last night and remote hardware man replaced the
failed CPU and moved the DIMM. We now have all 8 CPUs again... but
the DIMM is still failing and filling up every log with verbose
messages. I used to work on the CS6400 line (precursor to the E10K)
and know something of the capabilities of DR as they used to be.
Digging in SunSolve has not been very helpful, so I'm going to try
the list.
Is there any way I can successfully "blacklist" some part of memory
on the fly while we wait for the replacement memory to arrive? We
really don't want to take another downtime. My understanding is this
box should be DR'able. I understand this would potentially make a
mess of the interleaving... but I'd like to know if this is possible.
Thanks
Tim
--
Tim Kirby 651-605-9074
trk at cray.com Cray Inc. Information Systems

