LinuxQuestions.org
LinuxQuestions.org > Forums > Linux Forums > Linux - General

10-08-2002, 02:08 PM   #1
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Rep: Reputation: 0
System Crashes


Usually my system runs a large variety of processes, but lately, after about
12-24 hours of uptime, it starts thrashing really hard, and before you know it
the system is so slow that the password prompt times out before I can log in.
I run the 2.4.19 kernel and the latest stable versions of software.

The system is a 233 MHz Pentium-based machine with 200 MB of memory and usually
about 475 MB of swap (which I turned off below so I could demonstrate the issue
I was having, but yes, plenty of swap was available at the time of these
incidents).

Throughout the next part of this, memory usage showed that all except about 5 MB
of the RAM was in use, and 20 MB of swap was in use too.

Last night the system started its usual lockup and was showing unusually high load
averages. When this happened I was compiling a kernel. I cancelled the compile
and checked the load average.

Code:
2:31am  up 12:25,  1 user,  load average: 3.44, 3.49, 3.28
So I shut down apache and mysql ...

Code:
2:45am  up 12:39,  1 user,  load average: 3.06, 4.99, 4.7
which showed results that seemed pretty normal to me, except that while I was
shutting stuff down the load average spiked to 4.99 (the middle, 5-minute figure above).
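For reference, uptime and top print the 1-, 5- and 15-minute load averages in that order, so in the 2:45am line 3.06 is the 1-minute value and 4.99 the 5-minute value. A minimal sketch that labels the three figures, assuming a Linux /proc/loadavg whose first three fields are those same averages; the function name label_loadavg is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: label the three load averages.
# Reads a /proc/loadavg-style line ("1min 5min 15min running/total lastpid")
# from the file given as $1, defaulting to /proc/loadavg.
label_loadavg() {
    src=${1:-/proc/loadavg}
    # The first three whitespace-separated fields are the 1-, 5- and 15-minute averages.
    awk '{ printf "1min=%s 5min=%s 15min=%s\n", $1, $2, $3 }' "$src"
}

# Example: label the current system's load averages, if available.
if [ -r /proc/loadavg ]; then
    label_loadavg
fi
```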

Next I shut down most of my other services: nfsd, qmail, the cron daemon,
sysklogd and inetd (which was running a CVS pserver).

Code:
2:57am  up 12:50,  1 user,  load average: 0.58, 1.55, 3.00
Things finally seemed to be starting to improve, but 0.58 is still way
too high for a system that isn't doing anything except handling a single sshd session.

After about half an hour of not touching the system, the load average dropped
to somewhere around 0.08.

So then I started looking at the current memory usage, which was not very different from
before, except that swap usage was down to 1 MB. So I decided to see if I could
get any answers by trying to break the system in a controlled fashion.

I then turned off the swap space, which took about 15 seconds. Immediately my ssh
session died. So I went to the console and saw an out-of-memory error. I decided to keep
screwing around with it.

Code:
             total       used       free     shared    buffers     cached
Mem:        192676     188324       4352          0        356       1060
-/+ buffers/cache:     186908       5768
Swap:            0          0          0


   procs                      memory      swap          io     system      cpu
 r  b  w   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id
 0  0  0      0   4344    336   1076   62   13   160   103  179   370  9  2 89


  3:53am  up 13:47,  3 users,  load average: 0.00, 0.00, 0.11
14 processes: 13 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  8.9% user,  2.3% system,  0.0% nice,  2.2% idle
Mem:   192676K av,  188788K used,    3888K free,       0K shrd,     348K buff
         1504K Active,               2428K Inactive
Swap:       0K av,       0K used,       0K free                    1412K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
 6826 root      19   0   868  864   680 R     0.8  0.4   0:00 top -c -b -n 1
    1 root       8   0   136  132    52 S     0.0  0.0   0:06 init [3]
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root      11   0     0    0     0 SW    0.0  0.0   1:00 kswapd
    5 root       9   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    6 root      10   0     0    0     0 SW    0.0  0.0   0:04 kupdated
    7 root       9   0     0    0     0 SW    0.0  0.0   0:00 kreiserfsd
21159 root       9   0    72   72     0 S     0.0  0.0   0:00 dhcpcd eth0
22673 root       9   0   304  304   180 S     0.0  0.1   0:21 /usr/sbin/sshd
19398 root       9   0   712  712    52 S     0.0  0.3   0:00 -bash
16918 root       9   0    68   68     0 S     0.0  0.0   0:00 /sbin/agetty tty1 9600
  336 root      13   0   516  516   336 S     0.0  0.2   0:00 /usr/sbin/sshd
18441 root      16   0   996  992   344 S     0.0  0.5   0:00 -bash
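The "-/+ buffers/cache" line in the free output above is simple arithmetic: used minus buffers minus cached (188324 - 356 - 1060 = 186908 kB) and free plus buffers plus cached (4352 + 356 + 1060 = 5768 kB), so only about 5.7 MB was really available to programs. A sketch that recomputes that line from /proc/meminfo, assuming its MemTotal, MemFree, Buffers and Cached fields (in kB); buffers_cache_line is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: reproduce free's "-/+ buffers/cache" line from a
# meminfo-style file ($1, default /proc/meminfo). All values are in kB.
buffers_cache_line() {
    # /^Cached:/ is anchored, so it does not also match "SwapCached:".
    awk '/^MemTotal:/ { t = $2 } /^MemFree:/ { f = $2 }
         /^Buffers:/  { b = $2 } /^Cached:/  { c = $2 }
         END {
             used = t - f
             printf "-/+ buffers/cache: used=%d free=%d\n", used - b - c, f + b + c
         }' "${1:-/proc/meminfo}"
}

# Example: current system, if /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    buffers_cache_line
fi
```

Fed the figures from the post (MemTotal 192676, MemFree 4352, Buffers 356, Cached 1060), it prints `-/+ buffers/cache: used=186908 free=5768`, matching free.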
Let's crash it...
I decided to see if I could use up whatever memory was left by writing it into a file held in RAM.

Code:
mount -t ramfs /dev/ram0 /mnt
cd /mnt
dd if=/dev/zero of=/mnt/memuseup bs=512k count=20
This should have created a 10 MB file in memory, but it obviously didn't finish:
dd, bash and my login were all killed.
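The dd above asks for 512 KiB x 20 = 10,240 KiB (10 MB), roughly twice the ~4.3 MB that free reported. ramfs also has no size limit and its pages cannot be swapped out, so the write keeps consuming RAM until the kernel starts killing processes. A sketch of a pre-flight check comparing a dd request with MemFree, assuming /proc/meminfo; fits_in_free_mem is hypothetical, and since MemFree ignores reclaimable cache it is only a conservative estimate:

```shell
#!/bin/sh
# Hypothetical pre-flight check: would a dd of COUNT blocks of BS_KB KiB
# fit in the RAM the kernel currently reports as free?
#   $1 = block size in KiB, $2 = block count,
#   $3 = meminfo-style file (defaults to /proc/meminfo)
# Prints both figures and returns 0 if the request fits, 1 otherwise.
fits_in_free_mem() {
    want_kb=$(( $1 * $2 ))
    free_kb=$(awk '/^MemFree:/ { print $2 }' "${3:-/proc/meminfo}")
    echo "requested=${want_kb}kB free=${free_kb}kB"
    [ "$want_kb" -le "$free_kb" ]
}

# Example: the dd from the post (bs=512k count=20).
if [ -r /proc/meminfo ]; then
    fits_in_free_mem 512 20 || echo "would not fit in free RAM"
fi
```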

OK, cool: I now had all my memory used up, so I let it sit there until morning to see
whether the system would recover any more.

Sleep ......

So today at about 12:30 PM I went over to the console, and there had been no recovery;
more memory was being used up, as agetty kept dying with an out-of-memory error and
then respawning.
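When processes keep dying like this, the kernel log records which ones the out-of-memory killer picked. A sketch that tallies them, assuming log lines containing "Killed process <pid> (<name>)" (the exact wording varies between kernel versions, so the pattern may need adjusting); count_oom_kills is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: count out-of-memory kills per process name.
# Reads kernel log text on stdin (e.g. from dmesg or /var/log/messages)
# and assumes lines of the form "... Killed process <pid> (<name>) ...".
# Output: "<count> <name>", most-killed first.
count_oom_kills() {
    grep -i 'killed process' |
        sed -n 's/.*[Kk]illed process [0-9][0-9]* (\([^)]*\)).*/\1/p' |
        sort | uniq -c | sort -rn
}

# Example (commented out; reading dmesg may need root):
# dmesg | count_oom_kills
```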

Any help anyone could offer is much appreciated, because I have no idea what to do next.
Sorry for the long post but I wanted to provide you with as much information as I could.
 
10-08-2002, 04:21 PM   #2
NSKL
Senior Member
 
Registered: Jan 2002
Location: Rome, Italy ; Novi Sad, Srbija; Brisbane, Australia
Distribution: Ubuntu / ITOS2008
Posts: 1,207

Rep: Reputation: 47
I'm guessing here, but it sounds like a memory leak. There was a thread about memory leaks a while back, if I remember correctly; try to find it (search the board), and meanwhile I'm sure some of the more knowledgeable people will help you out more.
Also, you might want to get a program called memtest86 to check your RAM in case you suspect the RAM sticks are faulty.
Sorry I couldn't be of much help...
-NSKL
 
10-08-2002, 04:45 PM   #3
mikek147
Member
 
Registered: Mar 2002
Location: Elyria, Ohio
Distribution: Debian, Nothing else required
Posts: 141

Rep: Reputation: 15
Personally, I would like to see the output of top with your system normally loaded. Obviously something starts running that eats your RAM, causing your VMM to start thrashing stuff into and out of swap. Or something like logrotate, logcheck and maybe aide are running at the same time; since they all do disk access, this can really bog down a system with a slower processor.

Just tossing out some ideas. Your mileage may vary. -mk
 
10-08-2002, 11:28 PM   #4
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
Here is the output of top on a pretty normal day.

Code:
 11:30pm  up 10:22,  1 user,  load average: 0.04, 0.05, 0.01
66 processes: 65 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.0% user,  0.0% system,  0.0% nice, 99.8% idle
Mem:   192676K av,   99852K used,   92824K free,       0K shrd,   35600K buff
        10496K Active,              80932K Inactive
Swap:  465876K av,       0K used,  465876K free                   45644K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
26884 root      17   0   892  888   680 R     3.8  0.4   0:00 top -c -b -n 1
    1 root       9   0   528  524   452 S     0.0  0.2   0:03 init [3]
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    4 root       9   0     0    0     0 SW    0.0  0.0   0:00 kswapd
    5 root       9   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    6 root       9   0     0    0     0 SW    0.0  0.0   0:00 kupdated
    7 root       9   0     0    0     0 SW    0.0  0.0   0:00 kreiserfsd
 8338 root       9   0   676  676   556 S     0.0  0.3   0:00 /usr/sbin/syslogd -m 0
29005 root       9   0   548  548   404 S     0.0  0.2   0:00 /usr/sbin/klogd
22666 root       9   0   472  472   400 S     0.0  0.2   0:00 dhcpcd eth0
 2257 bin        9   0   524  524   440 S     0.0  0.2   0:00 /sbin/portmap
 9133 root       9   0   820  820   672 S     0.0  0.4   0:00 /usr/sbin/rpc.mountd
 4048 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
13697 root       9   0     0    0     0 SW    0.0  0.0   0:00 lockd
10311 root       9   0     0    0     0 SW    0.0  0.0   0:00 rpciod
12097 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
10432 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
18465 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
30406 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
 7239 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
19878 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
14648 root       9   0     0    0     0 SW    0.0  0.0   0:00 nfsd
20219 root       9   0   772  772   648 S     0.0  0.4   0:00 /usr/sbin/rpc.statd
 6867 root       9   0   464  464   376 S     0.0  0.2   0:00 /usr/sbin/rpc.rquotad
 8155 root       8   0   564  564   472 S     0.0  0.2   0:00 /usr/sbin/madcron
 2314 root       9   0  1300 1300  1180 S     0.0  0.6   0:05 /usr/sbin/sshd
 8705 root       9   0  1124 1124   912 S     0.0  0.5   0:00 /bin/sh /usr/bin/safe_mysqld --datadir=/var/mysql --pid-file=/var/mysql/server1.pid
27978 mysql      9   0  4340 4340  1584 S     0.0  2.2   0:00 /usr/bin/mysqld --basedir=/usr --datadir=/var/mysql --user=mysql --pid-file=/var/mysql/server1.
31594 fetchmai   9   0   936  936   768 S     0.0  0.4   0:00 fetchmail --daemon 300 --syslog -f /etc/fetchmailrc
 6500 root       9   0   496  496   424 S     0.0  0.2   0:00 inetd /etc/inetd.conf
 9242 root       9   0   532  532   464 S     0.0  0.2   0:00 /sbin/agetty tty1 9600
 5488 root       9   0   532  532   464 S     0.0  0.2   0:00 /sbin/agetty tty2 9600
 1044 mysql      9   0  4340 4340  1584 S     0.0  2.2   0:00 /usr/bin/mysqld --basedir=/usr --datadir=/var/mysql --user=mysql --pid-file=/var/mysql/server1.
  478 mysql      9   0  4340 4340  1584 S     0.0  2.2   0:00 /usr/bin/mysqld --basedir=/usr --datadir=/var/mysql --user=mysql --pid-file=/var/mysql/server1.
 7350 mysql      9   0  4340 4340  1584 S     0.0  2.2   0:00 /usr/bin/mysqld --basedir=/usr --datadir=/var/mysql --user=mysql --pid-file=/var/mysql/server1.
 2868 root       8   0  4520 4520  4332 S     0.0  2.3   0:00 /usr/bin/httpd -DSSL
 3446 nobody     9   0  4636 4636  4400 S     0.0  2.4   0:00 /usr/bin/httpd -DSSL
28014 nobody     9   0  4648 4648  4404 S     0.0  2.4   0:00 /usr/bin/httpd -DSSL
25743 nobody     9   0  4636 4636  4400 S     0.0  2.4   0:00 /usr/bin/httpd -DSSL
25613 nobody     9   0  4636 4636  4400 S     0.0  2.4   0:00 /usr/bin/httpd -DSSL
17479 nobody     9   0  4636 4636  4400 S     0.0  2.4   0:00 /usr/bin/httpd -DSSL
28563 root      13   0  1764 1764  1576 S     0.0  0.9   0:00 /usr/sbin/sshd
19854 root      11   0  1720 1716  1084 S     0.0  0.8   0:00 -bash
13844 root       9   0  1116 1116   912 S     0.0  0.5   0:00 /bin/sh /command/svscanboot
 3109 root       9   0   336  332   260 S     0.0  0.1   0:00 svscan /service
32534 root       9   0   260  260   208 S     0.0  0.1   0:00 readproctitle service errors: .................................................................
 6579 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise qmail-pop3d
26589 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise log
 1609 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise qmail-smtpd
 2060 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise log
31978 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise qmail-send
13374 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise log
27644 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise qmail-pop3ds
31614 root       9   0   312  304   248 S     0.0  0.1   0:00 supervise log
11147 vpopmail   9   0   332  328   272 S     0.0  0.1   0:00 tcpserver -l 0 -R -H -v -u220 -g220 0 110 qmail-popup server1.ecolinux2.servebeer.com /home/vpo
31489 vpopmail   8   0   476  476   400 S     0.0  0.2   0:00 tcpserver -H -R -l 0 -x /home/vpopmail/etc/tcp.smtp.cdb -c 20 -u 220 -g 220 0 smtp qmail-smtpd
13620 qmails     9   0   384  380   292 S     0.0  0.1   0:00 qmail-send
 8421 qmaill     9   0   296  292   236 S     0.0  0.1   0:00 multilog t /var/log/qmail/pop3d
11589 qmaill     9   0   292  288   232 S     0.0  0.1   0:00 multilog t /var/log/qmail/smtpd
 6917 qmaill     9   0   296  292   236 S     0.0  0.1   0:00 multilog t /var/log/qmail
21340 vpopmail   9   0   332  328   272 S     0.0  0.1   0:00 tcpserver -l 0 -R -H -v -u220 -g220 0 995 stunnel -f -p /var/qmail/control/servercert.pem -l qm
18790 qmaill     9   0   296  292   236 S     0.0  0.1   0:00 multilog t /var/log/qmail/pop3ds
 8100 root       9   0   308  304   240 S     0.0  0.1   0:00 qmail-lspawn ./Maildir/
 3326 qmailr     8   0   340  336   268 S     0.0  0.1   0:00 qmail-rspawn
 7095 qmailq     9   0   328  324   260 S     0.0  0.1   0:00 qmail-clean
11633 root      11   0   948  948   756 S     0.0  0.4   0:00 changedfiles -c /etc/sync.conf
20422 root      11   0   948  948   756 S     0.0  0.4   0:00 changedfiles -c /etc/sync.conf
25132 root      11   0   948  948   756 S     0.0  0.4   0:00 changedfiles -c /etc/sync.conf
 6168 root      11   0   948  948   756 S     0.0  0.4   0:00 changedfiles -c /etc/sync.conf
 
10-09-2002, 01:12 AM   #5
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
I also want to mention that the following kinds of activity happen in a given day.

1 cron job that runs updatedb
logs are rotated manually, whenever they get to be more than a couple megabytes in size.

The email server probably only handles about 100 messages a day

The web server handles about 250 requests a day, which is about one request every 6 minutes.

NFS transfers about 250 megabytes a day, but it can vary quite a bit.

changedfiles syncs a directory on my system to a directory on another system using ssh sessions: anywhere from 10-15 transfers a day, each ranging from 1-15 MB.

I also do a lot of compiling sometimes, anywhere from 30 minutes to 3 hours on a given day.

Let me know if any other information is required. I plan on running memory tests tomorrow to see if that shows anything useful.
 
10-09-2002, 06:54 AM   #6
mikek147
Member
 
Registered: Mar 2002
Location: Elyria, Ohio
Distribution: Debian, Nothing else required
Posts: 141

Rep: Reputation: 15
Is the top output you posted from when the system is healthy or thrashing? -mk
 
10-09-2002, 04:23 PM   #7
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
The system was healthy when I grabbed the output of top.

As a side note, I ran memtest86 today and no errors were detected.
 
10-09-2002, 05:18 PM   #8
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
I ran a memory testing script earlier that is supposed to run massive diffs in parallel to test memory management. I don't know how accurate this is supposed to be, but I ended up going into 'super-thrash mode' before the test could complete.

The odd thing is that I had atop running every 60 seconds and logging to a file. It showed that only 15 MB of swap was in use when it crashed and burned. Now, I know a 233 MHz processor is nothing great, but I can't understand why the system can't handle paging 15 MB worth of swap space.
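Part of the answer may be that thrashing is about how often pages move between RAM and swap, not how much swap is occupied: a small working set that keeps being swapped in and out can saturate the disk. vmstat's si and so columns show that rate. A sketch that flags busy samples, assuming vmstat output on stdin; heavy_paging and its default threshold of 100 are hypothetical, and the column positions are read from vmstat's header because the 2.4-era layout in the first post (with a w column) differs from newer versions:

```shell
#!/bin/sh
# Hypothetical filter: read vmstat output on stdin and print only the samples
# whose swap-in + swap-out (si + so columns) exceed a threshold ($1, default 100).
# Column positions come from the header line, so both older and newer
# vmstat layouts work.
heavy_paging() {
    awk -v limit="${1:-100}" '
        {
            # Remember which fields are labelled "si" and "so" in the header.
            for (f = 1; f <= NF; f++) {
                if ($f == "si") si = f
                if ($f == "so") so = f
            }
        }
        # Only numeric data rows are compared against the threshold.
        si && so && $si ~ /^[0-9]+$/ && ($si + $so) > limit {
            print "si=" $si, "so=" $so
        }
    '
}

# Example (commented out): sample every 5 seconds while the box struggles.
# vmstat 5 | heavy_paging 200
```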

If a memory leak is indeed the problem, how would I know? Would atop or top show the rogue process using more memory than it should, or does the memory just disappear off the face of the earth?
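One way to find out is to compare per-process memory over time: a leaking process shows its resident size (RSS) climbing between snapshots. A sketch using ps, assuming procps ps (the `pid=,rss=,comm=` form suppresses the header); snapshot_rss and rss_growth are hypothetical names. Memory the kernel allocates for itself is not charged to any process, so if no process grows while free memory keeps shrinking, the leak may be on the kernel side.

```shell
#!/bin/sh
# Snapshot every process as "<pid> <rss_kb> <command>" with no header line.
snapshot_rss() {
    ps -eo pid=,rss=,comm=
}

# Compare two snapshots ($1 = older, $2 = newer) and print processes whose
# RSS grew, largest growth first, as "<growth_kb> <pid> <command>".
# Processes present in only one snapshot are ignored, and only the first
# word of a command name containing spaces is printed.
rss_growth() {
    awk 'NR == FNR { old[$1] = $2 + 0; next }
         ($1 in old) && ($2 + 0) > old[$1] { print $2 - old[$1], $1, $3 }' "$1" "$2" |
        sort -rn
}

# Example (commented out): snapshots an hour apart.
# snapshot_rss > /tmp/rss.1; sleep 3600; snapshot_rss > /tmp/rss.2
# rss_growth /tmp/rss.1 /tmp/rss.2 | head
```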

I am gonna keep plugging at the problem, so let me know if you have any ideas on what I could or should do.
 
10-10-2002, 03:34 AM   #9
mikek147
Member
 
Registered: Mar 2002
Location: Elyria, Ohio
Distribution: Debian, Nothing else required
Posts: 141

Rep: Reputation: 15
When your system went into super-thrash mode, what were the top 3 CPU-intensive programs running, according to top? -mk
 
10-10-2002, 03:56 AM   #10
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
tar, gzip and diff.

If you think it would help, I can run some kind of test, log the heck out of it and post the results. It's not like I am too worried about crashing it at this point.

Last edited by linuxeco; 10-10-2002 at 03:58 AM.
 
10-10-2002, 05:20 AM   #11
mikek147
Member
 
Registered: Mar 2002
Location: Elyria, Ohio
Distribution: Debian, Nothing else required
Posts: 141

Rep: Reputation: 15
If you would, grab the first 12 lines of output from top when the system is thrashing, and post them here. -mk
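Because a thrashing box may be too slow to log into, it can help to capture that output automatically. A sketch that appends a timestamped copy of the first 12 lines of top's batch output (top -b -n 1, the mode already visible in the process lists earlier in the thread) to a log; log_top_head, its arguments and the example log path are hypothetical:

```shell
#!/bin/sh
# Hypothetical logger: append the date plus the first 12 lines of one
# batch-mode top run to a log file, repeating every INTERVAL seconds.
#   $1 = log file, $2 = interval in seconds (default 60),
#   $3 = number of samples (default 0, meaning run until killed)
log_top_head() {
    log=$1 interval=${2:-60} count=${3:-0} taken=0
    while :; do
        { date; top -b -n 1 | head -n 12; echo; } >> "$log"
        taken=$((taken + 1))
        if [ "$count" -ne 0 ] && [ "$taken" -ge "$count" ]; then
            break
        fi
        sleep "$interval"
    done
}

# Example (commented out): sample once a minute in the background.
# log_top_head /var/log/top-head.log 60 &
```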
 
10-13-2002, 07:03 AM   #12
linuxeco
LQ Newbie
 
Registered: Oct 2002
Posts: 9

Original Poster
Rep: Reputation: 0
9.31 Load Average.....

After about two days of running my system, a single web page hit did this ....

The load average was pretty low before I tried to load a web page. Even though the load average was down before I loaded the page, there was a lot of thrashing going on and lots of spikes in the load average.

Code:
  7:04am  up 1 day, 14:01,  1 user,  load average: 9.31, 4.19, 1.73
77 processes: 76 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.2% user,  2.3% system,  0.0% nice, 97.4% idle
Mem:   192748K av,  188300K used,    4448K free,       0K shrd,     252K buff
         1136K Active,               1336K Inactive
Swap:  465876K av,   16868K used,  449008K free                     964K cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME COMMAND
    4 root      14   0     0    0     0 SW    0.8  0.0   0:13 kswapd
32238 root      14   0   364  252   252 R     0.2  0.1   0:21 top
21843 nobody    11   0  2776  252   252 D     0.1  0.1   0:04 httpd
32290 nobody    10   0  2160  300   284 D     0.1  0.1   0:00 httpd
32292 nobody    10   0  2136  256   244 D     0.1  0.1   0:00 httpd
    1 root       9   0    92   24    24 S     0.0  0.0   0:03 init
    2 root       9   0     0    0     0 SW    0.0  0.0   0:00 keventd
    3 root      19  19     0    0     0 SWN   0.0  0.0   0:00 ksoftirqd_CPU0
    5 root       9   0     0    0     0 SW    0.0  0.0   0:00 bdflush
    6 root       9   0     0    0     0 DW    0.0  0.0   0:06 kupdated
    7 root       9   0     0    0     0 SW    0.0  0.0   0:00 kreiserfsd
   47 root       9   0   224   20    20 S     0.0  0.0   0:00 svscanboot
   55 root       9   0   128   96    96 D     0.0  0.0   0:00 svscan
   56 root       9   0    56    4     4 S     0.0  0.0   0:00 readproctitle
   59 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   60 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   61 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   62 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   63 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   64 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   65 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   66 root       9   0    72   12    12 S     0.0  0.0   0:00 supervise
   68 vpopmail   8   0    96   20    20 S     0.0  0.0   0:00 tcpserver
   69 qmaill     9   0    64    8     8 S     0.0  0.0   0:00 multilog
   71 qmaill     9   0    60    4     4 S     0.0  0.0   0:00 multilog
   72 vpopmail   9   0    68   12    12 S     0.0  0.0   0:00 tcpserver
   73 qmails     9   0    96   16    16 S     0.0  0.0   0:01 qmail-send
   74 qmaill     9   0    64    8     8 S     0.0  0.0   0:00 multilog
   75 vpopmail   9   0    68   12    12 S     0.0  0.0   0:00 tcpserver
   81 qmaill     9   0    64    8     8 S     0.0  0.0   0:00 multilog
   89 root       8   0    76   12    12 S     0.0  0.0   0:00 qmail-lspawn
   90 qmailr     9   0    84   16    16 S     0.0  0.0   0:00 qmail-rspawn
 
10-13-2002, 09:31 AM   #13
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
Top output shows an accumulation of things; what you want to see is a detailed overview of what goes on. Try running Atsar or Sysstat (see Freshmeat) with a low interval, and process the logs daily. Also review your system limits and your /proc/sys/vm settings; limits can do all sorts of mucking, from denying logins to crashing X11. Proper (for your situation, that is) bdflush/kswapd values may cost some performance but give less bursty I/O, which could be useful on an already I/O-bound box.
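A starting point for that review is simply recording the current values. A sketch that prints each readable file under /proc/sys/vm as name = value (on 2.4 that directory includes bdflush, whose single line holds several numbers), followed by the shell's resource limits; dump_tunables is hypothetical, and which tunables exist depends on the kernel version:

```shell
#!/bin/sh
# Hypothetical helper: print each readable tunable in a sysctl directory
# as "name = value". Multi-line or multi-number values are joined onto one
# line with single spaces.
#   $1 = directory (default /proc/sys/vm)
dump_tunables() {
    dir=${1:-/proc/sys/vm}
    for f in "$dir"/*; do
        # Skip subdirectories and write-only entries.
        [ -f "$f" ] && [ -r "$f" ] || continue
        value=$(tr -s ' \t\n' '   ' < "$f" | sed 's/ *$//')
        printf '%s = %s\n' "${f##*/}" "$value"
    done
}

# Example: current VM tunables, then the per-process resource limits.
if [ -d /proc/sys/vm ]; then
    dump_tunables
fi
ulimit -a
```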
 
  




All times are GMT -5.
