DG/UX and RedHat Linux Having TCP/IP Stack Issues
I know this will probably be a reach asking for help on something like this, but you never know. The public community can sometimes steer you in the right direction.
For the sake of this issue, I'll narrow the picture down to just what's involved. I have a DG/UX 4.20MU07 AViiON machine and a RedHat Linux ES 3.0 machine. Both are patched as well as can be and otherwise run fine with no errors. I have a third-party database application that runs on both machines (different binaries, of course). The app talks between the machines through a dedicated listener on each machine, with clients below that. Basically, it's a standard two-tier database solution: the middle tier (DG) provides the user interface and the database tier (Linux) provides database access.
The process is as follows:
1) A user on the DG requests information from the database.
2) The request is passed to the local database client app which hands it off to the local network app.
3) The local network app sends it across the network to the remote network app which in turn drops it down to the remote client app.
4) The remote client app gathers the requested information and hands it back up to the remote network app which sends it back, once again, to the local network app, then to the local client app, then to the user.
This process works great 24/7, 99.9% of the time. Every once in a great while, a process gets "hung" on the middle tier. Our database software vendor used some debugging tools to determine that (as far as he can tell) the packet makes it from the user to the local client, to the local network client, across the wire to the remote network client, and on to the remote client. The information is pulled from the database, and the returning packet is sent to the remote network client, where it is handed to the Linux TCP/IP stack. That's as far as it can be traced with the tools he has. His best guess is that the packet is a) not leaving the Linux box or b) not being received by the DG box. He determined that the local process is sitting there waiting for that return packet (which never arrives), so it appears to be "hung".
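One thing I could imagine checking on the Linux side, to help tell a) from b), is whether the reply is still sitting in the socket's send queue. The rough Python sketch below reads rows in the /proc/net/tcp format and flags established connections to the DG box that still hold unacknowledged data. It assumes IPv4, a little-endian x86 box, and the usual column layout of that file; the function names are my own, and it needs a more recent Python than the stock ES 3.0 box may ship with.

```python
import socket
import struct

TCP_ESTABLISHED = 0x01  # state code used in /proc/net/tcp


def decode_addr(hex_addr):
    """Turn '0B00000A:C350' into ('10.0.0.11', 50000).

    Assumes IPv4 and a little-endian host (x86), where the kernel
    prints the address bytes in reversed order.
    """
    ip_hex, port_hex = hex_addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def parse_tcp_line(line):
    """Parse one data row (not the header) of /proc/net/tcp."""
    fields = line.split()
    tx_hex, rx_hex = fields[4].split(":")
    return {
        "local": decode_addr(fields[1]),
        "remote": decode_addr(fields[2]),
        "state": int(fields[3], 16),
        "tx_queue": int(tx_hex, 16),   # bytes sent but not yet acknowledged
        "rx_queue": int(rx_hex, 16),   # bytes received but not yet read
        "retransmits": int(fields[6], 16),
    }


def stuck_sends(lines, peer_ip):
    """Return established connections to peer_ip with a non-empty send queue."""
    hits = []
    for line in lines:
        fields = line.split()
        # Data rows start with an index such as '1:'; the header does not.
        if len(fields) < 7 or not fields[0].endswith(":"):
            continue
        entry = parse_tcp_line(line)
        if (entry["state"] == TCP_ESTABLISHED
                and entry["remote"][0] == peer_ip
                and entry["tx_queue"] > 0):
            hits.append(entry)
    return hits
```

If a hang ever coincided with a send queue that stays non-zero and a climbing retransmit count, that would at least hint that the data reached the Linux stack but was never acknowledged by the DG side. Something like stuck_sends(open("/proc/net/tcp").readlines(), dg_ip) run from cron would do the polling, where dg_ip is a placeholder for the DG box's address.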
About the only thing I can come up with to help troubleshoot something of this nature is a massive network packet capture program. Only thing is, I can't replicate the issue and it happens at random, infrequently. I would have to run the capture 24/7 and purge every day (or less) to keep capture log sizes manageable. Is there a better way? Am I barking up the wrong tree? I have contacted our DG Support Team and our Database Support Team, but RedHat/Linux doesn't have much of a support offering.
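To make the 24/7 capture idea concrete, here is a minimal Python sketch of the purge step I have in mind. It assumes some capture tool is already writing a series of size-limited files into one directory; the function name, the ".pcap" suffix, and the directory layout are all hypothetical. It simply keeps the newest few files and deletes the rest, so disk usage stays bounded while the most recent traffic is still on hand when a hang is reported.

```python
import os


def prune_captures(directory, keep, suffix=".pcap"):
    """Delete all but the `keep` newest capture files in `directory`.

    Files are ranked by modification time. Returns the sorted names of
    the files that were removed. Files without `suffix` are left alone.
    """
    keep = max(keep, 0)
    paths = [
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(suffix)
    ]
    # Newest first, so everything past the first `keep` entries is old.
    paths.sort(key=os.path.getmtime, reverse=True)
    removed = []
    for path in paths[keep:]:
        os.remove(path)
        removed.append(os.path.basename(path))
    return sorted(removed)
```

Run from cron every few minutes, for example as prune_captures("/var/tmp/captures", keep=20) with a made-up path, this would act like a ring buffer: when a process hangs, I would stop the job and keep whatever files are left.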
Thanks in advance for any suggestions.