I've been looking for descriptions of these services, but haven't found much information on them yet. I've been regularly getting the following errors:
IDENTIFIER TIMESTAMP  T C RESOURCE_NAME  DESCRIPTION
864D2CE3   0501145606 P S topsvcs        NIM thread blocked
FA723BD9   0501145606 I S topsvcs        Deadman Switch (DMS) close to trigger
864D2CE3   0501145606 P S topsvcs        NIM thread blocked
.
.
.
.
3C81E43F   0501145606 P U topsvcs        Late in sending heartbeat
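When a listing like this gets long, it can help to tally it by identifier. The following is a minimal Python sketch, assuming the summary has been saved as text and that the TIMESTAMP column uses the MMDDhhmmYY layout (which matches the May 1, 14:56 entries here). The function name `parse_errpt_summary` and the dictionary keys are made up for illustration; they are not part of any AIX tool.

```python
from collections import Counter
from datetime import datetime

# Sample rows copied from the listing in this thread.
SAMPLE = """\
IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
864D2CE3 0501145606 P S topsvcs NIM thread blocked
FA723BD9 0501145606 I S topsvcs Deadman Switch (DMS) close to trigger
864D2CE3 0501145606 P S topsvcs NIM thread blocked
.
3C81E43F 0501145606 P U topsvcs Late in sending heartbeat
"""

def parse_errpt_summary(text):
    """Return one dict per data row of an errpt summary listing."""
    entries = []
    for line in text.splitlines():
        # Six whitespace-separated columns; the last one keeps its spaces.
        fields = line.split(None, 5)
        if len(fields) != 6 or fields[0] == "IDENTIFIER":
            continue  # header, blank line, or "." filler
        ident, stamp, etype, eclass, resource, description = fields
        entries.append({
            "identifier": ident,
            # Assumption: timestamp layout is MMDDhhmmYY.
            "time": datetime.strptime(stamp, "%m%d%H%M%y"),
            "type": etype,
            "class": eclass,
            "resource": resource,
            "description": description,
        })
    return entries

entries = parse_errpt_summary(SAMPLE)
counts = Counter((e["identifier"], e["description"]) for e in entries)
for (ident, description), n in counts.most_common():
    print(ident, n, description)
```

The tally makes it easier to see which identifier dominates before pulling the detailed report for it.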
Now, from what little I've been able to gather, this sounds like a buffer overrun problem; however, the response to the original poster's questions was unclear.
Does anybody know what I'm looking at here? I'd like to understand what is behind the problem, find the solution, and then apply it, but most of all I'm hoping to learn the whys =).
In this context, NIM is the abbreviation for "Network Interface Module" and should not be confused with NIM, the "Network Installation Manager".
What you should be trying to look at is the output of the command:
lssrc -ls topsvcs
I don't have an HACMP cluster handy at the moment, but among other things it will tell you how well the heartbeats are being passed.
It appears that you have only one network in your cluster configuration. A non-IP network is needed (read: required) to prevent network failures (or NIM failures) from creating a partitioned cluster.
Basically, the function of the deadman switch is to keep track of when the node was last able to tell the other active nodes that it is still active. A message or heartbeat sent over ANY of the networks is enough to satisfy the deadman switch requirement. Losing all networks (note the plural) is not a single point of failure (SPOF), and HACMP is designed to handle a single failure; that it often handles more is a bonus, not design.
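To make that rule concrete, here is a toy Python model of the behaviour as described above. It is only a sketch of the idea, not how HACMP or Topology Services implements the deadman switch; the class name, the 10-second timeout, and the network names `ether` and `rs232` are all invented for the example.

```python
class DeadmanSwitch:
    """Toy model: fires when no heartbeat went out on ANY network in time."""

    def __init__(self, timeout, now=0.0):
        self.timeout = timeout   # seconds allowed without any heartbeat
        self.started = now       # baseline before the first heartbeat
        self.last_sent = {}      # network name -> time of last heartbeat

    def heartbeat_sent(self, network, now):
        """Record a heartbeat that was sent successfully on one network."""
        self.last_sent[network] = now

    def seconds_since_last(self, now):
        """Time since the most recent heartbeat on any network."""
        latest = max(self.last_sent.values(), default=self.started)
        return now - latest

    def triggered(self, now):
        return self.seconds_since_last(now) > self.timeout


# Hypothetical cluster node with an IP network and a non-IP (serial) network.
dms = DeadmanSwitch(timeout=10, now=0)
dms.heartbeat_sent("ether", now=2)   # last heartbeat over the IP network
dms.heartbeat_sent("rs232", now=8)   # the serial network keeps working

print(dms.triggered(now=15))  # False: rs232 heartbeat was 7 seconds ago
print(dms.triggered(now=19))  # True: nothing sent on any network for 11 seconds
```

With only one network in the dictionary, a single network failure is enough to let the timer run out, which is the argument for adding a non-IP network.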
The next step, for me at least, would be a verbose errpt output:
errpt -a -j 864D2CE3
I am hoping there will be more information about which interface is failing.
It also helps to verify that you have the latest fixes installed, etc.
I apologize for taking so long to get back to you, but I've been out of town on a business trip...
Here's the result of errpt -aj....
Code:
---------------------------------------------------------------------------
LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 864D2CE3
Date/Time: Mon May 1 14:56:49 ADT
Sequence Number: 9112
Machine Id: 00C0853E4C00
Node Id: akkotz
Class: S
Type: PERM
Resource Name: topsvcs
Description
NIM thread blocked
Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39,5455
ERROR ID
6XnGH40l6dJ2/j1T1/w7k.1...................
REFERENCE CODE
Thread which was blocked
receive thread
Interval in seconds during which process was blocked
229
Interface name
en1
---------------------------------------------------------------------------
LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 864D2CE3
Date/Time: Mon May 1 14:56:49 ADT
Sequence Number: 9110
Machine Id: 00C0853E4C00
Node Id: akkotz
Class: S
Type: PERM
Resource Name: topsvcs
Description
NIM thread blocked
Probable Causes
A thread in a Topology Services Network Interface Module (NIM) process
was blocked
Topology Services NIM process cannot get timely access to CPU
User Causes
Excessive memory consumption is causing high memory contention
Excessive disk I/O is causing high memory contention
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Failure Causes
Excessive virtual memory activity prevents NIM from making progress
Excessive disk I/O traffic is interfering with paging I/O
Recommended Actions
Examine I/O and memory activity on the system
Reduce load on the system
Tune virtual memory parameters
Call IBM Service if problem persists
Detail Data
DETECTING MODULE
rsct,nim_control.C,1.39,5455
ERROR ID
6XnGH40l6dJ2/1HV./w7k.1...................
REFERENCE CODE
Thread which was blocked
receive thread
Interval in seconds during which process was blocked
228
Interface name
en2
I ran diagnostics on the card (netstat, etc.), and AIX did not find any problems with the card itself. I don't have the error code, but AIX reported that the problem is either the cable connection to our switch or the port on the switch itself.
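The two detailed entries in the errpt output above are nearly identical, so it is worth pulling out just the fields that differ. Below is a rough Python sketch that does this for saved `errpt -a` text, assuming each Detail Data label is followed by its value on the next line, as in the output posted here. The function name and the field keys are made up for the example, and the sample is a trimmed copy of the two entries.

```python
# Detail Data labels of interest -> short key (keys are hypothetical names).
WANTED = {
    "Thread which was blocked": "thread",
    "Interval in seconds during which process was blocked": "blocked_seconds",
    "Interface name": "interface",
}

SAMPLE = """\
---------------------------------------------------------------------------
LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 864D2CE3
Sequence Number: 9112
Detail Data
Thread which was blocked
receive thread
Interval in seconds during which process was blocked
229
Interface name
en1
---------------------------------------------------------------------------
LABEL: TS_NIM_ERROR_STUCK_
IDENTIFIER: 864D2CE3
Sequence Number: 9110
Detail Data
Thread which was blocked
receive thread
Interval in seconds during which process was blocked
228
Interface name
en2
"""

def parse_errpt_detail(text):
    """Return one dict per entry, keeping only the fields listed in WANTED."""
    lines = [ln.strip() for ln in text.splitlines()]
    records = []
    current = None
    for i, line in enumerate(lines):
        if line.startswith("LABEL:"):
            # A LABEL line starts a new entry.
            current = {"label": line.split(":", 1)[1].strip()}
            records.append(current)
        elif current is None:
            continue
        elif line.startswith("Sequence Number:"):
            current["sequence"] = int(line.split(":", 1)[1])
        elif line in WANTED and i + 1 < len(lines):
            # The value sits on the line after its label.
            current[WANTED[line]] = lines[i + 1]
    return records

for rec in parse_errpt_detail(SAMPLE):
    print(rec["sequence"], rec["interface"], rec["thread"], rec["blocked_seconds"])
```

In these two entries the receive threads for en1 and en2 were blocked for almost the same interval (229 and 228 seconds) at the same time. That may point toward something stalling the whole node, as the "Probable Causes" text suggests, rather than a fault on one adapter or cable, though the output alone does not prove it.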