01-09-2012, 11:56 AM   #1
Draeguin
LQ Newbie
Registered: Jan 2012
Posts: 3
Processes show 'Killed' at random during compile jobs under any user


Hello all,

* The plea:

Sadly, this issue exhausted my google-fu, bashed it over the head, then unceremoniously set it on fire and left it to rot. I really hope there's someone out there who can point me in the right direction; this has been driving me nuts! I'll be sure to report the eventual solution here, to prevent any future poor soul from having to go through the same thing.


* Problem:

At random, a single recently started process will simply show as 'Killed'. The victim can be anything from a lone cp command to five or six different processes during a compile. There is no pattern to it that I can see, and it means retrying a compile anywhere from one to six times until either no kills occurred or they only hit insignificant tests during ./configure. It seems to strike roughly every 100th to 400th process.

The random kills don't immediately dump you back to the command line, but they often cause compile jobs to fail with bizarre problems: access denied, missing files, gcc internal errors, undeclared functions, missing includes and headers, and so on.
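
Side note, in case it helps narrow things down: when bash prints 'Killed' the child died from SIGKILL, which shows up in the exit status as 128 + 9 = 137. A minimal check (just a sketch, assuming bash; any command will do in place of the cp):
Code:
# Minimal sketch: a child killed by SIGKILL exits with status 128 + 9 = 137,
# so a wrapper can at least confirm which signal ended it.
cp /etc/profile /tmp/cptest.$$; status=$?
if [ "$status" -eq 137 ]; then
    echo "cp was killed by SIGKILL (exit status $status)" >&2
fi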


* Attempted solutions:
  • I have run a memtest and a CPU test with no errors.
  • dmesg, /var/log/messages and /var/log/debug show nothing whenever a process is selected for the guillotine.
  • I have fiddled with ulimits, making them unlimited or very roomy.
  • Tried compiling as root as well as the regular user.
  • Checked every log under /var/log for anything relevant (syslog, messages, debug, secure); nothing is written to them when a kill occurs. Roughly the kind of check I mean is sketched after this list.
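
To be concrete, that check amounts to something like this (a rough sketch; log paths are the Slackware defaults):
Code:
# Snapshot the kernel ring buffer around a build and diff it afterwards,
# then sweep the syslog files for kill/OOM/segfault traces.
dmesg > /tmp/dmesg.before
make 2>&1 | tee /tmp/build.log
dmesg > /tmp/dmesg.after
diff /tmp/dmesg.before /tmp/dmesg.after
grep -iE 'killed process|out of memory|segfault' /var/log/messages /var/log/syslog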

I have included some system info at the end, including uname and ulimit output.

* Random examples of the issue manifesting when compiling Unrealircd:
Code:
Attempt #1:
gcc: Internal error: Killed (program collect2)

#2
configure: creating ./config.status
./configure: line 16906:  9341 Killed                  cat >>$CONFIG_STATUS  <<_ACEOF

#3:
../libtool: line 5803:  3337 Done                    $echo "X$obj"
      3338 Killed                  | $Xsed -e 's%^.*/%%'

#4:
checking whether we are using the GNU C compiler... ./configure: line 3200:  5878 Killed                  cat confdefs.h >>conftest.$ac_ext

#5:
checking for struct in6_addr... ./configure: line 19057: 14120 Killed                  rm -f core conftest.err conftest.$ac_objext conftest.$ac_ext

Neostats #1:
checking for strcmpi... no
checking deeper for strcmpi... ./configure: line 13589: 19381 Killed                  grep -v '^ *+' conftest.er1 >conftest.err

#2:
checking types of arguments and return type for send... int,const void *,size_t,int,int
./configure: line 17008: 21550 Killed

#3:
/bin/sh: line 9:  5741 Killed                  make $local_target
make[1]: *** [all-recursive] Error 1[...]
* System info:

Slackware 13.0, 64-bit
Type: Dedicated server at OVH, no VPS or virtual apps installed.
RAM: 4 GB
HDD: 2x 750 GB, RAID1
CPU: Intel E8400, 3 GHz, dual core

uname -a:
Code:
Linux phoenix 2.6.38.2-grsec-xxxx-grs-ipv6-64 #2 SMP Thu Aug 25 16:40:22 UTC 2011 x86_64 Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz GenuineIntel GNU/Linux
ulimit -a (as user):
Code:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 31300
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 128000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31300
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
ulimit -a (as root):
Code:
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
file size               (blocks, -f) unlimited
pending signals                 (-i) 31300
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 4096
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
stack size              (kbytes, -s) 128000
cpu time               (seconds, -t) unlimited
max user processes              (-u) 31300
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Last edited by Draeguin; 01-09-2012 at 11:57 AM. Reason: Spelling
 
01-09-2012, 08:49 PM   #2
chrism01
LQ Guru
Registered: Aug 2004
Location: Sydney
Distribution: Centos 7.7 (?), Centos 8.1
Posts: 17,954
Couple of things to look at:

1. It could be the OOM-killer: https://en.wikipedia.org/wiki/Out_of_memory, https://lwn.net/Articles/317814/. A quick check is sketched below.

2. If a process has been killed unceremoniously in the past, you can end up with corrupt files that then cause other things using them to die as well...
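
For point 1, the OOM-killer is normally very loud about what it did; a quick way to check (just a sketch, assuming standard log locations):
Code:
# The OOM-killer writes to the kernel ring buffer and syslog when it fires.
dmesg | grep -iE 'oom-killer|out of memory|killed process'
grep -iE 'oom-killer|out of memory' /var/log/messages 2>/dev/null
# And current memory pressure at a glance:
free -m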
 
01-10-2012, 06:22 AM   #3
Draeguin
LQ Newbie
Registered: Jan 2012
Posts: 3
Original Poster
Thanks for the response, chrism.

That's a good call. I checked /proc/sys/vm/overcommit_memory and it's at the default value of 0 (heuristic overcommit). With nothing OOM-related in the logs and a fair amount of memory free at the time (see the edit below), the OOM killer looks like an unlikely culprit.
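
For anyone landing on this thread later, my notes on that knob, paraphrased from Documentation/vm/overcommit-accounting in the kernel source (so treat the wording as approximate):
Code:
# /proc/sys/vm/overcommit_memory:
#   0 - heuristic overcommit (the default): overcommit is allowed, but
#       wildly unrealistic allocations are refused
#   1 - always overcommit, never refuse an allocation
#   2 - strict accounting: commit limit = swap + overcommit_ratio% of RAM
cat /proc/sys/vm/overcommit_memory /proc/sys/vm/overcommit_ratio
# The current accounting figures:
grep -E '^(CommitLimit|Committed_AS)' /proc/meminfo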

You may be quite right about corruption. I really have no idea what potentially silent and delicate commands may have failed, so I might see some effects of this in the future once I do figure this out (and don't just pick the nuclear option of wiping and rebuilding the system from scratch).

Edit: The system also had around 2-3 GB of memory free at the time any of these tasks took place.

Last edited by Draeguin; 01-10-2012 at 06:28 AM.
 
01-11-2012, 07:26 AM   #4
Draeguin
LQ Newbie
Registered: Jan 2012
Posts: 3
Original Poster

Well, my work to troubleshoot the problem has hit a brick wall. Or a pile of them, in fact. But first, here was my attempt to debug this maddening issue:

- Created a script that touches 5000 files, to let me see the frequency of kills (roughly the loop sketched after this list):
Code:
root@phoenix:~/cptest# ./cptest.sh
./cptest.sh: line 7:  7936 Killed                  touch file.i$i
./cptest.sh: line 7:  8657 Killed                  touch file.i$i
./cptest.sh: line 7:  8798 Killed                  touch file.i$i
./cptest.sh: line 7: 10004 Killed                  touch file.i$i
./cptest.sh: line 7: 12061 Killed                  touch file.i$i
./cptest.sh: line 7: 12484 Killed                  touch file.i$i
- Used egrep to scan every file in /var/log for instances of 'Killed': nothing.
- Tried to enable process accounting to get proper logging of all kills.
- accton /var/log/pacct: No kernel support.
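
For reference, a minimal loop of the kind described above looks like this (hypothetical reconstruction; the original script isn't reproduced here):
Code:
#!/bin/bash
# cptest.sh - hypothetical reconstruction: touch a few thousand files so that
# any randomly killed child shows up as a "Killed" line from the shell.
mkdir -p cptest && cd cptest || exit 1
for i in $(seq 1 5000); do
    touch file.i$i    # each touch is a separate child process, i.e. a potential victim
done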

Not wanting to compile a kernel just yet, I try something else: SystemTap ships an example script that can trace the source of SIGKILL calls.
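
The example I had in mind is something along these lines (adapted from memory from SystemTap's example scripts; it needs kernel debuginfo, which is exactly where things fall apart below):
Code:
# sigkill.stp - report who sends SIGKILL to whom, via the signal.send tapset probe.
# Run with: stap sigkill.stp   (leave it going while a compile runs)
probe signal.send {
    if (sig_name == "SIGKILL")
        printf("%s sent to %s (pid %d) by %s (pid %d, uid %d)\n",
               sig_name, pid_name, sig_pid, execname(), pid(), uid())
}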

- Compiled & installed SystemTap
- It needs elfutils
- Compiled & installed elfutils
- SystemTap could not find kernel-debuginfo

Well then, it looks like I'll need to bite the bullet and compile a custom kernel to enable debugging after all.
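
For my own notes, the kernel options I believe the accounting and SystemTap approaches above need (an assumption based on the SystemTap and process-accounting documentation, not verified against this exact kernel):
Code:
# .config fragment (assumed requirements):
CONFIG_BSD_PROCESS_ACCT=y   # process accounting, so accton can write per-process records
CONFIG_DEBUG_INFO=y         # DWARF debug info, required by SystemTap
CONFIG_KPROBES=y            # kernel probing, required by SystemTap
CONFIG_DEBUG_FS=y           # debugfs, used by SystemTap's runtime
CONFIG_RELAY=y              # relay channels for shipping trace data to userspace
With accounting compiled in, accton /var/log/pacct plus lastcomm should at least give a record of every process that exits, though as far as I can tell it still won't say who sent the SIGKILL.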

- The current source tree (2.6.29.2) is different from the running kernel.
- Downloaded the 2.6.38.2 source.
- Configured the kernel, started the compile.
- Error: no 64-bit gcc. The system is a 64-bit kernel running in a 32-bit userland.
- Downloaded the 64-bit Slackware ISO.
- Installed the 64-bit gcc package.
- Tried to compile the kernel: error: libmpfr.so.1: missing library (see the ldd sketch after this list).
- Reinstalled the 32-bit gcc package.
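
In hindsight, the quicker check at that point would have been to ask the new compiler what exactly it was missing; a sketch (the /usr/lib64 paths are a guess at where the multilib packages put things):
Code:
# List unresolved shared libraries for the gcc driver and its C front end.
ldd $(which gcc) | grep 'not found'
ldd $(gcc -print-prog-name=cc1) | grep 'not found'
# Then check whether the usual suspects (GMP/MPFR) are present at all.
ls /usr/lib64/libmpfr* /usr/lib64/libgmp* 2>/dev/null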

Rather than compile gcc from source and deal with tedious package location and cohabitation issues, I went looking for existing solutions for a mixed 32/64-bit environment on my distribution.

- Downloaded the Slackware 13.0 multilib 32/64-bit packages.
- Started installing all of them: upgradepkg --reinstall --install-new *.t?z
... Lots of packages start installing as expected.
... I see some ominous 'Killed' messages, as expected for such an intensive installation: a dead chmod here, a decapitated rmdir there, etc.
... Until finally the install process grinds to a halt with this sequence of errors:

Code:
*** Pre-installing package glibc-2.9_multilib-x86_64-5alien_slack13.0...
install/doinst.sh: line 221: /usr/bin/rm: No such file or directory
/sbin/installpkg: line 550: /usr/bin/cp: No such file or directory
/sbin/installpkg: line 551: /usr/bin/chmod: No such file or directory
/sbin/upgradepkg: /sbin/removepkg: /bin/sh: bad interpreter: No such file or directory
Cannot install glibc-zoneinfo-2011i_2011n_multilib-noarch-2alien.txz:  invalid package extension
Normally this would not worry me; it is just a typical job failing because of a dependency broken by an earlier failed command. But something catches my eye: /usr/bin/rm missing? cp and chmod too? That's when I start to worry. It looks like something a little too critical got killed, and I guess some part of the install script assumed a certain command would succeed, such as a cd into a directory it knows exists. The cd must have failed while the script was sitting in /usr/bin, and the rm -rf * that followed most certainly didn't fail.
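
For anyone writing install scripts, this is the classic failure mode of an unguarded cd; a sketch of the bad and the safer pattern (the directory name is hypothetical):
Code:
# Dangerous: if the cd fails (say, because an earlier mkdir was killed),
# the rm runs in whatever directory the script happens to be in.
cd "$STAGEDIR"
rm -rf *

# Safer: bail out when the cd fails, and avoid a bare glob.
cd "$STAGEDIR" || exit 1
rm -rf -- *
# (or skip the cd entirely: rm -rf -- "$STAGEDIR"/*)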

Which left me with this:
Code:
root@phoenix:~/13.0# ls
-bash: /bin/ls: No such file or directory
root@phoenix:~/13.0# whereis ls
-bash: /usr/bin/whereis: No such file or directory

Logged out of my second SSH connection and tried to log back in:
Code:
/bin/bash: No such file or directory
Connection to xxx closed.

So sadly, I'll never be able to trace the source of those mysterious SIGKILLs, and will need to rebuild the system from scratch again.

Last edited by Draeguin; 01-11-2012 at 07:31 AM.
 
  

