How to identify a specific CPU on an E450

2007-12-25 11:40:00

My appreciation to:

Darren Dunham

Dave Foster

Matthew Lee Stier

Mamoon Kundi

Special thanks to Darren for exchanging several personal e-mails to help lead me to the solution.

Some useful commands are:

prtdiag

psrinfo

Some useful system documentation:

http://docs.sun.com:80/ab2/@LegacyPageView?toc=SUNWab_137_1:/safedir/space3/coll1/SUNWabtmo/toc/UL450OWNG:1075;bt=Sun+Ultra+450+Workstation+Owner%27s+Guide;ps=ps/SUNWab_137_1/UL450OWNG/Hardware_Configuration#11

Solution:

After looking through all of this information, I still could not tell which physical CPU was the cpu1 in question.

I then decided to search SunSolve, and here is the information I found:

The order of CPU installation is not the same as the way the O/S numbers the CPUs.

                      O/S probes the UPA slots from top to bottom (UPA slot 1=TOP 4=BOTTOM)

                      UPA slot 1 = CPU 0 J0101

                      UPA slot 2 = CPU 1 J0201

                      UPA slot 3 = CPU 2 J0301

                      UPA slot 4 = CPU 3 J0401

                      If the system panics on bootup or keeps rebooting with RED STATE EXCEPTION

                      and a CPU is suspect, disable one CPU at a time.

                      At the ok prompt:

                      ok setenv upa-port-skip-list <CPU#>

                      ok boot

                      Once the failing CPU is found, the system should boot up with that CPU

                      disabled. When the CPU is replaced, set the NVRAM variable back to its

                      original setting.

                      ok setenv upa-port-skip-list none

You can find the above information online at:

http://sunsolve.sun.com/private-cgi/retrieve.pl?doc=srdb%2F21220&zone_32=e450%20cpu
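
The one-CPU-at-a-time procedure in the note can be sketched as a small Python script that prints the ok-prompt commands for each trial boot. This is only an illustration: `isolation_steps` and `RESTORE` are hypothetical names, the list `[0, 1, 2]` is an example, and the script takes the note's `<CPU#>` placeholder literally as the CPU number from the table, which is an assumption worth checking against the SunSolve document before typing anything at a real console.

```python
# Sketch only: prints the OpenBoot commands from the SunSolve note,
# one trial boot per suspect CPU. Nothing here talks to real hardware.

def isolation_steps(cpu_numbers):
    """Yield the ok-prompt commands for each trial boot (hypothetical helper)."""
    for cpu in cpu_numbers:
        # Assumption: <CPU#> in the note is the CPU number from the table.
        yield [f"setenv upa-port-skip-list {cpu}", "boot"]

# Command the note gives for restoring the variable after the CPU is replaced.
RESTORE = "setenv upa-port-skip-list none"

if __name__ == "__main__":
    for trial in isolation_steps([0, 1, 2]):
        for command in trial:
            print("ok", command)
        print()
    print("ok", RESTORE)
```

If a trial boot succeeds with a given CPU skipped, that CPU is the likely culprit; after replacing it, the last command restores the variable.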

So for my cpu1 panic message, CPU 1 is J0201, which is in the second UPA slot.
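
The lookup I ended up doing by hand can be written as a small table in Python. The mapping is transcribed from the SunSolve note quoted above; the names `E450_CPU_MAP` and `locate_cpu` are made up for this sketch.

```python
# OS CPU number -> (UPA slot, board label), as listed in the SunSolve note.
E450_CPU_MAP = {
    0: (1, "J0101"),
    1: (2, "J0201"),
    2: (3, "J0301"),
    3: (4, "J0401"),
}

def locate_cpu(os_cpu):
    """Describe where an OS-numbered CPU sits (hypothetical helper)."""
    slot, label = E450_CPU_MAP[os_cpu]
    return f"CPU {os_cpu}: UPA slot {slot} (1 = top, 4 = bottom), label {label}"

if __name__ == "__main__":
    # The panic message named cpu1.
    print(locate_cpu(1))
```

For the panic[cpu1] case this points at UPA slot 2, label J0201, which matches the conclusion above.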

Zion

=============================

Here is the original question:

To All Helpers:

I have an E450 system with three CPUs in it running Solaris 2.6. Recently it crashed with a Panic[cpu1] error message indicating an Ecache SRAM data parity error.

I have a new CPU to replace the bad one, but after looking through the Server Owner's Guide, I am puzzled about which CPU this panic message indicates and which CPU I should pull.

Could anyone give me some tips on this?

Thanks,

Zion

Here is part of the messages file from when the system was rebooted:

Dec 4 15:38:21 ds01tenfocus unix: cpu0: SUNW,UltraSPARC-II (upaid 1 impl 0x11 ver 0x20 clock 296 MHz)

Dec 4 15:38:21 ds01tenfocus unix: cpu1: SUNW,UltraSPARC-II (upaid 2 impl 0x11 ver 0x20 clock 296 MHz)

Dec 4 15:38:21 ds01tenfocus unix: cpu2: SUNW,UltraSPARC-II (upaid 3 impl 0x11 ver 0x20 clock 296 MHz)

........

Dec 4 15:38:21 ds01tenfocus unix: PCI-device: SUNW,m64B@4, m64 #0

Dec 4 15:38:21 ds01tenfocus unix: SUNW,m64B0 is /pci@1f,4000/SUNW,m64B@4

Dec 4 15:38:21 ds01tenfocus unix: m64#0: 1152x900, 2M mappable, rev 4754.9a

Dec 4 15:38:21 ds01tenfocus unix: stdout is </pci@1f,4000/SUNW,m64B@4> major <64> minor <0>

Dec 4 15:38:21 ds01tenfocus unix: cpu 2 initialization complete - online

Dec 4 15:38:21 ds01tenfocus unix: cpu 3 initialization complete - online
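
For anyone who wants to pull the cpuN and upaid values out of a messages file like the one above, here is a minimal Python sketch. It assumes each CPU banner sits on a single line (wrapped lines must be rejoined first), and the names `CPU_LINE` and `cpu_upaids` are hypothetical. It only reports what the log says and makes no claim about how upaid relates to the physical slot.

```python
import re

# Matches banner lines such as:
#   "... unix: cpu1: SUNW,UltraSPARC-II (upaid 2 impl 0x11 ..."
CPU_LINE = re.compile(r"unix: cpu(\d+): \S+ \(upaid (\d+)")

def cpu_upaids(lines):
    """Return {OS CPU number: upaid} for every CPU banner line found."""
    result = {}
    for line in lines:
        match = CPU_LINE.search(line)
        if match:
            result[int(match.group(1))] = int(match.group(2))
    return result

if __name__ == "__main__":
    sample = [
        "Dec 4 15:38:21 ds01tenfocus unix: cpu0: SUNW,UltraSPARC-II (upaid 1 impl 0x11 ver 0x20 clock 296 MHz)",
        "Dec 4 15:38:21 ds01tenfocus unix: cpu1: SUNW,UltraSPARC-II (upaid 2 impl 0x11 ver 0x20 clock 296 MHz)",
        "Dec 4 15:38:21 ds01tenfocus unix: cpu2: SUNW,UltraSPARC-II (upaid 3 impl 0x11 ver 0x20 clock 296 MHz)",
        "Dec 4 15:38:21 ds01tenfocus unix: cpu 2 initialization complete - online",
    ]
    print(cpu_upaids(sample))
```

The "initialization complete" lines are skipped on purpose, since they carry no upaid field.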
