E250 reboot

2007-12-25 11:40:00

Original message: (See bottom)

Thank you to everyone who sent answers:

Christian Pinheiro <pinheiro@veritel.com.br>

Bruce Cheng <bcheng@corio.com>

Ulan Mamytov <tgr@ns2.kyrnet.kg>

John Chrisoulakis <john.chrisoulakis@antdiv.gov.au>

Mohammed <abusakit@pop.dnvr.uswest.net>

Salman Farooq TNG <Salman2@wipro.co.in>

Sue <Sue_Thielen@psdi.com>

Mike Watts <mikewatts@traverse.com>

H.S. Yann <yann@veritel.com.br>

Summary of the suggested solutions:

-----------------------------------

* Disconnect, clean and reconnect CPU and memories.

* CPU hardware problem; grep the files in /var/adm for "cpu" carefully.

  Upgrade to the latest kernel patches ("uname -a" shows the patch version).

* Check the DIMMs. ECC errors may be causing the problem.

* Try Solstice DiskSuite 4.1 patch 104172. Without the patch, a mirrored

  root partition may cause crashes.

* Read the core dump to analyze the problem.

* Make sure the power supply is OK. Also pay attention to power save mode.
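The log check suggested above can be sketched as a small shell helper. This is only an illustration: scan_cpu_logs is a made-up name, not a Solaris tool, and the sketch assumes a grep that supports -i.

```shell
#!/bin/sh
# Sketch only: scan_cpu_logs is a hypothetical helper, not a Solaris command.
# It prints every line mentioning "cpu" (any case) from the files given.
scan_cpu_logs() {
    # The extra /dev/null argument makes grep prefix each match with its
    # file name even when only one log file is passed.
    grep -i "cpu" "$@" /dev/null
}

# Possible use on the affected host (current and rotated logs):
#   scan_cpu_logs /var/adm/messages*
#   uname -a                    # kernel patch level is in the version field
#   showrev -p | grep 104172    # is the DiskSuite patch installed?
```

Rotated logs matter here, because a crash entry may already have been moved out of the current messages file by the time anyone looks.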

The solution for our case:

--------------------------

It was a CPU hardware problem. We found the following entries in the

/var/adm/messages.* files (more than one entry). Most of the time, however,

the machine just hung without logging any messages.

Dec 3 17:57:31 hostnm unix: BAD TRAP: cpu=0

     type=0x10 rp=0x30437898 addr=0x6188567c mmu_fsr=0x0

Nov 22 11:39:50 hostnm unix: panic[cpu0]/thread=0x30023e80:

     CPU0 Ecache SRAM Data Parity Error: AFSR 0x00000000

     80400500 AFAR 0x00000000 000fff60

     

Sun says "CPU0 Ecache SRAM Data Parity Error" is a hardware problem,

and they shipped us a new CPU. We replaced CPU 0, and the

problem is fixed. So far the machine is stable.
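For anyone chasing a similar fault, here is a rough sketch for counting the signatures that showed up in our logs. The function name count_fault_signatures is hypothetical, the pattern list is taken only from the entries quoted above, and the $(...) syntax assumes a POSIX shell such as ksh rather than the old Bourne /bin/sh.

```shell
#!/bin/sh
# Sketch only: count_fault_signatures is a hypothetical helper.
# It reports how many lines in the given log files contain each of the
# strings seen in the entries quoted in this summary.
count_fault_signatures() {
    for pattern in "BAD TRAP" "Ecache SRAM Data Parity Error" "panic"; do
        # cat merges all files so grep -c prints a single total;
        # "|| true" keeps a zero count from aborting under "set -e".
        n=$(cat "$@" | grep -c "$pattern" || true)
        echo "$pattern: $n"
    done
}

# Possible use on the affected host:
#   count_fault_signatures /var/adm/messages*
```

A nonzero Ecache parity count is what pointed Sun to the CPU in our case; a count of zero proves nothing, since the machine often hung without logging anything.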

Original question:

------------------

>I have an Ultra Enterprise 250, 2 CPUs, Solaris 2.6,

>Sun Solstice DiskSuite 4.1 installed, and the Sun

>2.6 Recommended patches are installed, Y2K patches

>installed.

>

>The problem is it sometimes reboots by itself. After

>the reboot, there is no error message in /var/adm/messages

>file, no error message in /var/log/* files. We called

>Sun, and Sun shipped us a new motherboard and new CPUs.

>After the replacement, the situation got much better,

>but it still sometimes reboots or hangs by itself.

>

>Sometimes, just after reboot, I can see the error message

>when I type the command 'dmesg', such as:

>

>panic[cpu0]/thread=0x30023e80:

>
