LinuxQuestions.org


zepplin611 07-25-2004 06:05 PM

SP2 node issue - Auto-join helping or hurting?
 
Greetings fellow AIX'ers,

I administer an SP2 (AIX 4.3.3 and PSSP 3.2). Recently I ran:

spmon -d

on the control workstation (CWS), and the following was output for one of our nodes:

----------------------------------- Frame 1 -----------------------------------
                         Host      Switch    Key     Env    Front Panel     LCD/LED
Slot  Node  Type  Power  Responds  Responds  Switch  Error  LCD/LED         Flashes
----  ----  ----  -----  --------  --------  ------  -----  --------------  -------
15    15    thin  on     yes       autojn    N/A     no     LCDs are blank  no

So node 15 has a Switch Responds value of "autojn". What exactly does this mean?


The output for the other nodes in this frame looks similar to:

14 14 thin on yes yes N/A no LCDs are blank no

where a value of "yes" sits in the Switch Responds column, indicating the node is able to
communicate over the SP Switch.
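
To spot nodes like this one quickly, the node rows can be filtered on the Switch Responds column. This is only a rough sketch: the function name switch_not_yes is made up for illustration, and it assumes the node rows look like the two shown above, with Switch Responds as the sixth whitespace-separated field. The real column layout may differ on other PSSP levels or node types.

```shell
# Hypothetical helper (not part of PSSP): reads `spmon -d` output on stdin
# and prints "node switch_responds" for every node row whose Switch
# Responds field is anything other than "yes".
# Assumption: node rows start with two numeric fields (slot, node) and
# Switch Responds is field 6, as in the rows quoted in this thread.
switch_not_yes() {
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && NF >= 6 && $6 != "yes" { print $2, $6 }'
}
```

On the CWS this would be used as `spmon -d | switch_not_yes`. Header and separator lines are skipped because their first field is not a plain number.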

I went into the "Perspectives" GUI, opened the configuration properties for node 15, and what
was listed under "SWITCH RESPONDS" was "Fenced with Autojoin". I clicked through to
"unfence" this node and the following error popped up:

Unfencing nodes:
15
Eunfence: 0028-162 Node specified by Node Number 15 is not currently fenced.
Unable to unfence the following nodes:
n15 -- SP node number 15 Not Fenced


Can anyone explain what the "Autojoin" indicator is, and why it does not allow me to
unfence this seemingly fenced-off node?

thanks a bunch

zepp

screwloose 07-27-2004 10:59 AM

Autojoin is a feature that was added to PSSP (in one of the 3.x versions, I think) that lets nodes join the switch automatically instead of having to be joined manually.

Check the node itself and see if the worm is running:
# ps -ef | grep -i worm
root 19954 1 0 Jul 10 - 3:22 /usr/lpp/ssp/css/fault_service_Worm_RTG_SP -r 0 -b 1 -s 5 -p 3 -a TB3 -t 28

If it isn't running, start it by running rc.switch (the script that is listed in /etc/inittab):
# /usr/lpp/ssp/css/rc.switch
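
The check-then-start step above could be wrapped up roughly like this. It is a sketch, not an IBM-supplied tool: worm_listed is a made-up name, and the only thing it assumes is that the daemon shows up in `ps -ef` with fault_service_Worm in its command line, as in the output above.

```shell
# Hypothetical helper: reads `ps -ef` output on stdin and succeeds (exit 0)
# if a fault_service_Worm process is listed. The grep process itself is
# filtered out so it cannot produce a false match.
worm_listed() {
    grep -v grep | grep -qi 'fault_service_Worm'
}

# Possible use on the node (run as root):
#   if ps -ef | worm_listed; then
#       echo "worm is running"
#   else
#       /usr/lpp/ssp/css/rc.switch
#   fi
```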

Also, since it shows fenced but replies unfenced, you can try to fence the node:
# fence 15
followed by:
# unfenced 15

Sometimes the state gets confused and shows fenced/unfenced incorrectly when the nodes have all been brought down and back up without running Equiesce on the switch first. In that case, though, trying to unfence the node should tell you to start the switch, which is done with 'Estart'.

The log files are in /var/adm/SPlogs. The ones to look at are rc.switch.log and worm.trace, which should identify problems with the node in question. I would look at the SPlogs on the primary node, too. You can find the primary node (and backup) by typing 'Eprimary'.
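
A quick way to pull up the tail of both logs is sketched below. The function name switch_logs is made up, and because the exact subdirectory under /var/adm/SPlogs is not stated here, the sketch searches the whole tree for the two file names rather than assuming a path.

```shell
# Hypothetical helper: prints the last 20 lines of every rc.switch.log and
# worm.trace found under the given directory (default: /var/adm/SPlogs).
switch_logs() {
    dir=${1:-/var/adm/SPlogs}
    find "$dir" -type f \( -name rc.switch.log -o -name worm.trace \) -print |
    while read -r f; do
        echo "==== $f"
        tail -n 20 "$f"
    done
}
```

Run it on the node in question and again on the primary node, for example `switch_logs` or `switch_logs /var/adm/SPlogs`.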

screwloose 07-27-2004 11:01 AM

It is 'unfence' though I mistakenly typed 'unfenced' a couple of times.

zepplin611 08-10-2004 02:04 PM

thanks for the help screwloose...and sorry about the delay:

I was able to Efence the node, and after running spmon -d the line for it now reads:

15 15 thin on yes no N/A no LCDs are blank no

So under the Switch Responds column there is now the value "no".

How do I get this to go to "yes"?

Thanks....

screwloose 08-11-2004 11:33 AM

Did you run 'Eunfence 15' on the node after running Efence? Also make sure the worm daemon is running on node15 using 'ps -ef | grep -i worm'. If the worm is not running then run '/usr/lpp/ssp/css/rc.switch'.

Let me know the results.

zepplin611 08-11-2004 12:28 PM

root@CWS:/var/adm/SPlogs# Eunfence 15
Eunfence: 0028-162 Node specified by Node Number 15 is not currently fenced.
Unable to unfence the following nodes:
N15, SP node number 15 Not Fenced


# ps -ef |grep -i worm
root 7264 1 0 Jul 17 - 0:00 /usr/lpp/ssp/css/fault_service_Worm_RTG_SP -r 14 -b 1 -s 7 -p 2 -a TB3 -t 28


So the worm is running on node15 and according to the Eunfence command...it is not fenced!!!

in circles we go...any help?

iainr 08-11-2004 02:17 PM

Be cautious about this one, because it could disrupt the switch network if it is currently working.

Try

Eclock -d
Estart

That resynchronises the switch internal clock between the nodes and then restarts the switch.
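
Given the warning above, one cautious way to script the two commands is to default to a dry run that only prints what would be executed. This is a sketch under that assumption: reclock_switch and its "run" argument are invented for illustration, and only the two commands themselves come from this post.

```shell
# Hypothetical wrapper: with no argument it only prints the commands;
# with the argument "run" it executes them in order and stops at the
# first failure. Intended for the CWS, as root.
reclock_switch() {
    mode=${1:-dry}
    for cmd in "Eclock -d" "Estart"; do
        if [ "$mode" = "run" ]; then
            $cmd || return 1      # unquoted on purpose so "Eclock -d" splits into command + flag
        else
            echo "would run: $cmd"
        fi
    done
}
```

Calling `reclock_switch` shows the plan; `reclock_switch run` actually reclocks and restarts the switch, which can interrupt traffic for every node on it.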

zepplin611 08-11-2004 02:45 PM

iainr...thanks a bunch...this worked!!!!

Question though: when Switch Responds showed "no", is it correct to say that the node
was "off" the switch, that is, that communication between that node and, say, another node
through the switch was not possible?

thanks....

zepp

screwloose 08-12-2004 11:28 AM

If you had tried to Efence/Eunfence other nodes and had the same problem then you would have known the switch wasn’t running. It is possible that you could have run ‘Estart’ without clocking the switch and the nodes would have rejoined. I have seen this happen where one or two nodes are not on the switch and all other nodes report they are joined. However, if you try to run anything it will respond that the switch isn’t running.

