Disclaimer: I know nothing about Oracle Apps and/or Java
We've moved our Oracle 11i Apps middle tier from a server running RHEL4u6 to one running RHEL4u8. Both are 64-bit, with 16 GB of RAM and four dual-core Intel CPUs.
After moving to the new server, we frequently see some Oracle tools blocking in futex_wait and never returning.
I'm having trouble figuring out what's going on and how to debug it. Since Oracle is still 32-bit, I have a feeling we missed something.
Here's what I have so far on debugging:
Backtrace of one hung app:
Code:
gdb - 10269
GNU gdb Red Hat Linux (6.3.0.0-1.162.el4rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...-: No such file or directory.
Attaching to process 10269
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/bin/opmn...done.
Using host libthread_db library "/lib64/tls/libthread_db.so.1".
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0xffffe000
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libmodapi.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libmodapi.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/lib/libclntsh.so.10.1...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/lib/libclntsh.so.10.1
Reading symbols from /lib/tls/libpthread.so.0...done.
[Thread debugging using libthread_db enabled]
[New Thread 4146100992 (LWP 10269)]
[New Thread 3954174880 (LWP 11688)]
[New Thread 3976698784 (LWP 11569)]
[New Thread 3943685024 (LWP 11486)]
[New Thread 3987299232 (LWP 11454)]
[New Thread 3964664736 (LWP 10578)]
[New Thread 3997789088 (LWP 10291)]
[New Thread 4008278944 (LWP 10290)]
[New Thread 4018768800 (LWP 10289)]
[New Thread 4029258656 (LWP 10288)]
[New Thread 4039748512 (LWP 10287)]
[New Thread 4050238368 (LWP 10286)]
[New Thread 4071218080 (LWP 10284)]
[New Thread 4081707936 (LWP 10283)]
[New Thread 4092197792 (LWP 10282)]
[New Thread 4102687648 (LWP 10281)]
[New Thread 4113177504 (LWP 10280)]
[New Thread 4123667360 (LWP 10272)]
[New Thread 4134165408 (LWP 10271)]
[New Thread 4144655264 (LWP 10270)]
Loaded symbols for /lib/tls/libpthread.so.0
Reading symbols from /lib/libnsl.so.1...done.
Loaded symbols for /lib/libnsl.so.1
Reading symbols from /lib/tls/libm.so.6...done.
Loaded symbols for /lib/tls/libm.so.6
Reading symbols from /lib/libcrypt.so.1...done.
Loaded symbols for /lib/libcrypt.so.1
Reading symbols from /lib/libdl.so.2...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/tls/libc.so.6...done.
Loaded symbols for /lib/tls/libc.so.6
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/lib/libnnz10.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/lib/libnnz10.so
Reading symbols from /lib/ld-linux.so.2...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from /lib/libnss_files.so.2...done.
Loaded symbols for /lib/libnss_files.so.2
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnohs.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnohs.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnoc4j.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnoc4j.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmncustom.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmncustom.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnwc.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnwc.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmniaspt.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmniaspt.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmndisco.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmndisco.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnip.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnip.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnoid.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnoid.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnwireless.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnwireless.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnreports.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnreports.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/liblogloader.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/liblogloader.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmndcmdaemon.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmndcmdaemon.so
Reading symbols from /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnbam.so...done.
Loaded symbols for /ORACLE/whq/10.1.4.0.1_oid/opmn/lib/libopmnbam.so
Reading symbols from /lib/libnss_dns.so.2...done.
Loaded symbols for /lib/libnss_dns.so.2
Reading symbols from /lib/libresolv.so.2...done.
Loaded symbols for /lib/libresolv.so.2
Reading symbols from /lib/libgcc_s.so.1...done.
Loaded symbols for /lib/libgcc_s.so.1
0xffffe410 in __kernel_vsyscall ()
(gdb) bt
#0 0xffffe410 in __kernel_vsyscall ()
#1 0xf7413f7c in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0xf74143f5 in pthread_cond_timedwait@GLIBC_2.0 () from /lib/tls/libpthread.so.0
#3 0x0807a173 in pmScheduler ()
#4 0x00000000 in ?? ()
(gdb)
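One caveat on the backtrace above: bt only prints the thread gdb happened to select, and this process has 20 threads, so the blocked one may be somewhere else. Typing `thread apply all bt` at the (gdb) prompt should list all of them. Below is a minimal sketch of scripting that, assuming Python 3 is available on the box (it is not on a stock RHEL4 install); the function names are my own, and it relies only on gdb's `-batch` and `-x` options plus the classic `gdb program pid` attach form.

```python
import os
import subprocess
import tempfile


def gdb_all_threads_command(pid, command_file):
    """Build an argv that attaches gdb to `pid` and runs the commands in `command_file`."""
    exe = f"/proc/{pid}/exe"  # on Linux, a symlink to the running binary
    return ["gdb", "-batch", "-x", command_file, exe, str(pid)]


def dump_all_threads(pid):
    """Attach to `pid`, collect a backtrace of every thread, and return gdb's output."""
    # Write the gdb commands to a temporary script file.
    with tempfile.NamedTemporaryFile("w", suffix=".gdb", delete=False) as f:
        f.write("thread apply all bt\n")
        path = f.name
    try:
        # -batch makes gdb exit (and detach) once the script has run.
        result = subprocess.run(
            gdb_all_threads_command(pid, path),
            capture_output=True,
            text=True,
        )
        return result.stdout
    finally:
        os.unlink(path)


# Example (needs permission to ptrace the target, e.g. run as root):
# print(dump_all_threads(10269))
```

Comparing the per-thread backtraces from a hung instance against a healthy one on the old server would probably show which thread is actually stuck.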
I also ran ldd, but I'm pretty sure this only shows the libraries of the java binary itself, not those of the Java application:
Code:
ldd -v /APPS/11i10/common/util/java/1.4/j2sdk1.4.2_04/bin/java
linux-gate.so.1 => (0xffffe000)
libpthread.so.0 => /lib/tls/libpthread.so.0 (0xf7fda000)
libdl.so.2 => /lib/libdl.so.2 (0xf7fd6000)
libc.so.6 => /lib/tls/libc.so.6 (0x00318000)
/lib/ld-linux.so.2 (0x002fe000)
Version information:
/APPS/common/util/java/1.4/j2sdk1.4.2_04/bin/java:
libdl.so.2 (GLIBC_2.1) => /lib/libdl.so.2
libdl.so.2 (GLIBC_2.0) => /lib/libdl.so.2
libc.so.6 (GLIBC_2.1) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/tls/libc.so.6
/lib/tls/libpthread.so.0:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
libc.so.6 (GLIBC_2.1.3) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.3.2) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.2) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib/tls/libc.so.6
/lib/libdl.so.2:
libc.so.6 (GLIBC_2.1.3) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.1) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_2.0) => /lib/tls/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib/tls/libc.so.6
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
/lib/tls/libc.so.6:
ld-linux.so.2 (GLIBC_2.1) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.3) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_PRIVATE) => /lib/ld-linux.so.2
ld-linux.so.2 (GLIBC_2.0) => /lib/ld-linux.so.2
strace output:
Code:
strace -p 10266
Process 10266 attached - interrupt to quit
[ Process PID=10266 runs in 32 bit mode. ]
waitpid(10269, <unfinished ...>
Process 10266 detached
[root@etcap1a cron]# strace -p 10269
Process 10269 attached - interrupt to quit
[ Process PID=10269 runs in 32 bit mode. ]
clock_gettime(CLOCK_REALTIME, {1249316067, 565479000}) = 0
futex(0x81bcdc4, FUTEX_WAIT, 175659, {3, 434521000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x81a7fa8, FUTEX_WAKE, 1) = 0
time(NULL) = 1249316071
futex(0x81bcd74, FUTEX_WAKE, 1) = 1
time(NULL) = 1249316071
clock_gettime(CLOCK_REALTIME, {1249316071, 2856000}) = 0
futex(0x81bcdc4, FUTEX_WAIT, 175661, {11, 997144000}) = -1 ETIMEDOUT (Connection timed out)
futex(0x81a7fa8, FUTEX_WAKE, 1) = 0
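For what it's worth, adding each FUTEX_WAIT timeout to the clock_gettime value just before it lands on a whole second both times. That looks to me like a timed wait on an absolute deadline (matching the pthread_cond_timedwait in the backtrace) rather than a wait that never ends, at least in the one thread strace attached to. A small Python sketch of that arithmetic, with the values copied from the strace output above and a helper name of my own:

```python
NSEC_PER_SEC = 1_000_000_000


def deadline(now_sec, now_nsec, wait_sec, wait_nsec):
    """Return (sec, nsec) of the absolute time at which a relative timeout expires."""
    total_nsec = (now_sec + wait_sec) * NSEC_PER_SEC + now_nsec + wait_nsec
    return divmod(total_nsec, NSEC_PER_SEC)


# (clock_gettime result, futex timeout) pairs from the strace output
samples = [
    ((1249316067, 565479000), (3, 434521000)),
    ((1249316071, 2856000), (11, 997144000)),
]

for now, wait in samples:
    sec, nsec = deadline(*now, *wait)
    print(f"{sec}.{nsec:09d}")
# prints:
# 1249316071.000000000
# 1249316083.000000000
```

The first deadline matches the time(NULL) = 1249316071 call that follows the ETIMEDOUT, so this thread seems to be waking up on schedule.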
I also noticed that there is no compat libc library on this server. Does that matter? I think this has something to do with running 32-bit apps on a 64-bit OS, but I really have no idea.
What's the best way to debug this? Is there any way to figure out what it's waiting on (which process or event)?