I am a newbie so I apologize if this is an RTFM-type question.
I am trying to run an application that seems to be failing because of a memory limit. The job runs fine on someone else's server that has 24GB of memory, but it fails when I run it on one of my servers. The most recent attempt was on a Dell R815 server that has 128GB of RAM and is running 64-bit RHEL6.
The application is a bioinformatics program called HAPCUT and is distributed as an x86_64 binary.
I ran the job as root. I know that is generally a bad idea, but I did it in this case to rule out per-user restrictions while trying to get it to work.
First I ran
ulimit -c unlimited
to enable core files.
Then I ran ulimit -a
and got the following output:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 1032072
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
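For anyone who wants to compare these values with the server where the job succeeds, here is a minimal sketch that prints the soft and hard values of a few memory-related limits one at a time. The helper name show_limit is made up for illustration, and the choice of -s, -v and -d is only my guess at which limits might matter here.

```shell
# show_limit FLAG: print the soft and hard value of one limit.
# FLAG is a ulimit option such as -s (stack), -v (virtual memory) or -d (data segment).
# show_limit is a hypothetical helper, not part of any tool mentioned in this post.
show_limit() {
    printf '%s soft=%s hard=%s\n' "$1" "$(ulimit -S "$1")" "$(ulimit -H "$1")"
}

show_limit -s
show_limit -v
show_limit -d
```

Running the same three lines on both servers would show whether any of these limits differ between them.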
Then, while the program was running, I ran top (sorted by %MEM) in another window. This is what was on the screen when the program terminated with the error:
Segmentation fault (core dumped)
top - 15:30:47 up 267 days, 23:54, 3 users, load average: 0.54, 0.23, 0.34
Tasks: 1191 total, 2 running, 1189 sleeping, 0 stopped, 0 zombie
Cpu(s): 1.6%us, 0.0%sy, 0.0%ni, 98.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 132124744k total, 23287424k used, 108837320k free, 135916k buffers
Swap: 2097144k total, 11300k used, 2085844k free, 15601464k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
64621 root 20 0 7502m 5.3g 592 R 100.0 4.2 0:46.30 HAPCUT
If I'm reading this correctly, it was using only 4.2% of the RAM, so it should have been ok.
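As a sanity check on that reading, the percentage can be recomputed from the RES column and the total on the Mem: line. This is only a rough sketch: the function name pct_of_total is made up, and it assumes that top's "5.3g" means 5.3 GiB (1024-based) and that the total is in KiB.

```shell
# pct_of_total RES_GIB TOTAL_KIB
# Percentage of total memory represented by RES_GIB gibibytes.
# Assumption: top reports RES in 1024-based units and the Mem: total in KiB.
pct_of_total() {
    awk -v res_gib="$1" -v total_kib="$2" \
        'BEGIN { printf "%.1f\n", res_gib * 1024 * 1024 / total_kib * 100 }'
}

pct_of_total 5.3 132124744   # prints 4.2
```

That matches the %MEM column, so resident memory was nowhere near the physical RAM on this machine when the process died.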
It generated a core dump and I ran gdb on it. Here is the output from that:
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
Copyright (C) 2010 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
Reading symbols from /mnt/thor/home/sselvara/phasing/CAST.129/haplotype/software/HAPCUT-latest/HAPCUT...(no debugging symbols found)...done.
[New Thread 64692]
Missing separate debuginfo for
Try: yum --disablerepo='*' --enablerepo='*-debug*' install /usr/lib/debug/.build-id/80/1b9608daa2cd5f7035ad415e9c7dd06ebdb0a2
Reading symbols from /lib64/libm.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libm.so.6
Reading symbols from /lib64/libc.so.6...(no debugging symbols found)...done.
Loaded symbols for /lib64/libc.so.6
Reading symbols from /lib64/ld-linux-x86-64.so.2...(no debugging symbols found)...done.
Loaded symbols for /lib64/ld-linux-x86-64.so.2
Core was generated by `/mnt/thor/home/sselvara/phasing/CAST.129/haplotype/software/HAPCUT-latest/HAPCU'.
Program terminated with signal 11, Segmentation fault.
#0 0x0000000000403d3c in label_node ()
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6.x86_64
Does anyone have any thoughts/suggestions about things I can check/try?
Thank you very much.