LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - Software
07-17-2008, 05:23 PM   #1
ech310n
Distribution: RHEL, Fedora

Processes Randomly Killed

Hi all,

Apologies in advance if this is not the right board to post this on, but I thought I'd give it a shot here first. I'm having a very weird problem on a RedHat EL 4 update 5 x86 machine.

Random processes are just being killed without any explanation. There's nothing in any of the logs to explain why. I've googled this a bit and not really found anything.

For example, I was vi'ing a script, and the process just got killed and I was dropped back to my bash prompt. So, every time I try to vi the file again I get this...

Code:
E325: ATTENTION
Found a swap file by the name ".VCSbltftp.sh.swp"
          owned by: root   dated: Thu Jul 17 21:51:34 2008
         file name: /usr/local/bin/VCS/VCSbltftp.sh
          modified: YES
         user name: root   host name: hostname.mydomain.com
        process ID: 31854
While opening file "VCSbltftp.sh"
             dated: Thu Jul 17 21:28:24 2008

(1) Another program may be editing the same file.
    If this is the case, be careful not to end up with two
    different instances of the same file when making changes.
    Quit, or continue with caution.

(2) An edit session for this file crashed.
    If this is the case, use ":recover" or "vim -r VCSbltftp.sh"
    to recover the changes (see ":help recovery").
    If you did this already, delete the swap file ".VCSbltftp.sh.swp"
    to avoid this message.

Swap file ".VCSbltftp.sh.swp" already exists!
[O]pen Read-Only, (E)dit anyway, (R)ecover, (Q)uit, (A)bort, (D)elete it:
All perfectly normal... but I've found I can reproduce the problem: if I just leave this prompt unanswered, within a few seconds it just comes up with the word "Killed" and I'm dropped back to bash.

So, I ran the vi and then an strace on its pid, which comes up with this...

Code:
# strace -p 9509
Process 9509 attached - interrupt to quit
select(1, [0], NULL, [0], NULL)         = ? ERESTARTNOHAND (To be restarted)
+++ killed by SIGKILL +++
Process 9509 detached
...and this on another attempt with some different options (still vi'ing the same file)

Code:
# strace -c -v -p 25900
Process 25900 attached - interrupt to quit
Process 25900 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 49.35    0.000800         267         3           write
 12.77    0.000207          30         7         1 open
  7.83    0.000127          14         9         1 stat64
  5.18    0.000084          12         7           close
  4.75    0.000077          26         3           read
  2.90    0.000047          47         1           connect
  2.71    0.000044           9         5           fstat64
  2.41    0.000039          39         1           unlink
  1.73    0.000028           9         3           ioctl
  1.54    0.000025          13         2           mmap2
  1.23    0.000020          20         1           send
  1.17    0.000019          19         1           socket
  1.05    0.000017          17         1           munmap
  0.99    0.000016           8         2           fcntl64
  0.93    0.000015          15         1           pread64
  0.86    0.000014          14         1         1 kill
  0.86    0.000014          14         1           recvmsg
  0.62    0.000010           5         2           poll
  0.31    0.000005           5         1           brk
  0.25    0.000004           4         1           access
  0.25    0.000004           2         2         1 select
  0.19    0.000003           3         1           uname
  0.12    0.000002           2         1           getuid32
------ ----------- ----------- --------- --------- ----------------
100.00    0.001621                    57         4 total
Nothing obvious. I've read a few posts about OOM killing procs hogging resources, however, in my case I doubt this is the problem... as you can see, I've plenty of free memory etc...

# free -m
             total       used       free     shared    buffers     cached
Mem:          8114       1374       6739          0        157        505
-/+ buffers/cache:        711       7403
Swap:         8136          0       8136
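
To rule the OOM killer in or out directly, rather than inferring it from free memory, the kernel log can be searched for its messages. A minimal sketch, assuming syslog writes kernel messages to /var/log/messages (the usual RHEL location) and that the kernel logs a line containing "Out of Memory" when it kills a process, as 2.6-era kernels do. oom_kills is a hypothetical helper name, not an existing command:

```shell
# Hypothetical helper: print any OOM-killer lines found in a log file.
# Assumption: the kernel message contains the phrase "Out of Memory"
# (matched case-insensitively, since the capitalisation varies by kernel).
oom_kills() {
    # $1 is the log file to search; defaults to the usual RHEL syslog file.
    grep -i 'out of memory' "${1:-/var/log/messages}"
}

# Usage (typically as root, since the log is not world-readable):
#   oom_kills                      # search /var/log/messages
#   oom_kills /var/log/messages.1  # search a rotated log
```

No output means no matching lines in that file. That would point away from the OOM killer for the period the file covers, but older rotated logs would need to be checked separately.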

BTW this is a 2 x Quad Core AMD Opteron machine (HP ProLiant DL385 G5). Kernel version is 2.6.9-55.ELsmp.

I've tried disabling SELinux too, but that's made no difference.

Does anyone have any ideas? This is driving me mad.

Cheers!
 
07-17-2008, 10:25 PM   #2
pruneau
Distribution: Debian/Fedora/RHEL

Hmm, interesting problem indeed.
What I would suggest, to find the culprit, is to install a process accounting suite (like psacct) and filter the events for only those processes using the kill() system call.
If nothing shows up, the kill comes from the kernel; otherwise, you should be able to work out from the records who the culprit is.
 
  


All times are GMT -5.
