bee2643 08-04-2006 12:44 PM

oom kill, need suggestions
Over the past 3 days I have been waking up in the morning to what appears to be a frozen computer. It's unresponsive to keyboard input and the monitor will not wake from standby. Ping works, but ssh does not.

With my limited knowledge of troubleshooting Linux (I have had relatively few problems with it), I checked the syslog.

I am running Debian with kernel version 2.6.11.

Aug 4 09:13:52 mj0006 kernel: lowmem_reserve[]: 0 719 719
Aug 4 09:13:52 mj0006 kernel: Normal free:4064kB min:3396kB low:4244kB high:5092kB active:306080kB inactive:312872kB present:737216kB pages_scanned:1390136 all_unreclaimable? yes
Aug 4 09:13:52 mj0006 kernel: lowmem_reserve[]: 0 0 0
Aug 4 09:13:52 mj0006 kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Aug 4 09:13:52 mj0006 kernel: lowmem_reserve[]: 0 0 0
Aug 4 09:13:52 mj0006 kernel: DMA: 5*4kB 23*8kB 2*16kB 1*32kB 0*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 2956kB
Aug 4 09:13:52 mj0006 kernel: Normal: 176*4kB 146*8kB 15*16kB 3*32kB 1*64kB 0*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 4064kB
Aug 4 09:13:52 mj0006 kernel: HighMem: empty
Aug 4 09:13:52 mj0006 kernel: Swap cache: add 200474, delete 200474, find 4953/12947, race 0+19
Aug 4 09:13:52 mj0006 kernel: Free swap = 0kB
Aug 4 09:13:52 mj0006 kernel: Total swap = 489972kB
Aug 4 09:13:52 mj0006 kernel: Out of Memory: Killed process 2705 (cron).
Aug 4 09:13:52 mj0006 kernel: oom-killer: gfp_mask=0x1d2

The most significant part of the paste appears to be the bottom six lines.
Any insight into what the problem is and how to fix it would be greatly appreciated.
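For anyone reading a dump like this later, here is a minimal Python sketch that pulls the swap figures and the killed process out of the kernel lines. The function name `summarize_oom` is made up for illustration, and the patterns only assume the message wording shown in the paste above; other kernel versions may word these lines differently.

```python
import re

def summarize_oom(log_lines):
    """Collect swap totals and the killed process from kernel OOM log lines.

    Hypothetical helper: the patterns match the wording in the paste above.
    """
    summary = {}
    for line in log_lines:
        m = re.search(r"Free swap\s*=\s*(\d+)kB", line)
        if m:
            summary["free_swap_kb"] = int(m.group(1))
        m = re.search(r"Total swap\s*=\s*(\d+)kB", line)
        if m:
            summary["total_swap_kb"] = int(m.group(1))
        m = re.search(r"Out of Memory: Killed process (\d+) \((.+?)\)", line)
        if m:
            summary["killed_pid"] = int(m.group(1))
            summary["killed_name"] = m.group(2)
    return summary

# Three of the lines from the syslog paste above.
sample = [
    "Aug 4 09:13:52 mj0006 kernel: Free swap = 0kB",
    "Aug 4 09:13:52 mj0006 kernel: Total swap = 489972kB",
    "Aug 4 09:13:52 mj0006 kernel: Out of Memory: Killed process 2705 (cron).",
]
print(summarize_oom(sample))
```

With free swap at 0kB out of 489972kB, the machine had used up its swap as well as its RAM by the time cron was killed.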

benjithegreat98 08-04-2006 01:04 PM

Well, it appears to be a memory leak of some sort. It is probably so unresponsive because your hard drive is being run nonstop by paging to the swap partition.

Is this a home PC? Server for something?

I would first disable any unnecessary daemons and services. I'm not very well versed in Debian, so I'll let somebody else answer that for you if you don't know how.

You can also run something like 'top' in the background and sort everything by memory usage to see if anything is growing unusually large. Though, if it is a memory leak, the program using the memory may not be reported there.
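If top is awkward to leave running, roughly the same view (processes sorted by resident memory) can be sketched in Python by reading /proc. This is only a sketch: `parse_status` and `top_by_rss` are made-up names, and it assumes a Linux /proc where each /proc/<pid>/status has `Name:` and `VmRSS:` lines (kernel threads have no `VmRSS:` line and are counted as 0).

```python
import os

def parse_status(text):
    """Return (name, rss_kb) from the text of /proc/<pid>/status.

    rss_kb is 0 when there is no VmRSS line (for example kernel threads).
    """
    name, rss_kb = "?", 0
    for line in text.splitlines():
        if line.startswith("Name:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("VmRSS:"):
            rss_kb = int(line.split()[1])  # value is reported in kB
    return name, rss_kb

def top_by_rss(limit=10, proc_root="/proc"):
    """Return up to `limit` (rss_kb, pid, name) tuples, largest RSS first."""
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                name, rss_kb = parse_status(fh.read())
        except OSError:
            continue  # the process exited while we were looking
        rows.append((rss_kb, int(entry), name))
    rows.sort(reverse=True)
    return rows[:limit]
```

Calling `top_by_rss()` a few times over an evening and comparing the results should show whether any one process keeps growing.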

Also make sure all your programs are up to date. You can use apt-get to handle that. It could be a known problem that has already been fixed. Or maybe you did update and this started after that?

Have you started running any new programs or applets in the taskbar or anything like that in the last few days? When you leave your computer on, do you have any programs open or are they all shut down?

bee2643 08-04-2006 01:43 PM

The workstation only has thunderbird, firefox, and nxserver installed on it.
nxserver is used to connect remotely, but it times out when the computer freezes, which makes it useless. 90% of the work done on the machine is through rdp to a terminal server.

No other programs are installed. I will monitor top. Please let me know of any other suggestions.

benjithegreat98 08-04-2006 02:09 PM

Are people using nxserver to get to you? Or are people using the Linux workstation to get to a different terminal server?

If people are connecting through nx, then you can check that all the sessions are closed before you leave at night. Maybe somebody is letting a session time out. I think that when a session times out, it is still running. You can see if anybody is connected by typing "ps aux|grep nx" and checking whether any processes show up.

By the way, oom-killer is the Out Of Memory Killer. There is a possibility that a kernel upgrade could help. Just a suggestion: it may not solve a thing, but it's something to look into.

pete1234 08-05-2006 09:09 AM

Take a look at this post:

syg00 08-05-2006 09:22 AM

And how will that help? OOM kill is a symptom, not the problem.

You need to find the leak - might be worth running something like top in batch mode out to a file so you can see who the culprit might be.
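As an alternative to top in batch mode, here is a rough Python sketch that appends one timestamped line per run with the most numerous process names. `count_by_name`, `append_snapshot`, and the log path in the usage note are made up for illustration; it assumes a Linux /proc where the first line of /proc/<pid>/status is `Name:`.

```python
import collections
import os
import time

def count_by_name(proc_root="/proc"):
    """Count running processes per command name, read from /proc/<pid>/status."""
    counts = collections.Counter()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries
        try:
            with open(os.path.join(proc_root, entry, "status")) as fh:
                first = fh.readline()  # expected form: "Name:\t<command>"
        except OSError:
            continue  # the process exited while we were looking
        name = first.partition(":")[2].strip()
        if name:
            counts[name] += 1
    return counts

def append_snapshot(log_path, proc_root="/proc", top_n=5):
    """Append a timestamped line listing the top_n most numerous commands."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S")
    busiest = count_by_name(proc_root).most_common(top_n)
    line = stamp + " " + " ".join("%s=%d" % (name, n) for name, n in busiest)
    with open(log_path, "a") as fh:
        fh.write(line + "\n")
    return line
```

Calling something like `append_snapshot("/tmp/proc-snapshots.log")` every few minutes overnight leaves a file you can read after the next freeze. A command whose count keeps climbing is a good suspect.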

bee2643 08-08-2006 10:14 AM

I took the suggestion about running top to a file, and the culprit turned out to be rsync: there were about 50 instances of it. I killed rsync, but where is it being started from, and how do I stop this from happening?

bee2643 08-08-2006 02:26 PM

I ran the rsync command from crontab -e.
This is what I'm getting:

532214 100% 643.24kB/s 0:00:00 (7, 91.6% of 6072)
failed to set permissions on . : Operation not permitted

sent 213372 bytes received 94634 bytes 10440.88 bytes/sec
total size is 2269094560 speedup is 7367.05
rsync error: some files could not be transferred (code 23) at main.c(791)
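For reference, the "code 23" in the last line is rsync's exit status. A small Python sketch of a lookup, covering only three values taken from the EXIT VALUES section of the rsync man page (the helper name `describe_rsync_exit` is made up):

```python
# A few rsync exit statuses, as listed under EXIT VALUES in the rsync man page.
RSYNC_EXIT = {
    0: "success",
    23: "partial transfer due to error",
    24: "partial transfer due to vanished source files",
}

def describe_rsync_exit(code):
    """Return a short description for an rsync exit status (hypothetical helper)."""
    return RSYNC_EXIT.get(code, "see EXIT VALUES in the rsync man page")

print(describe_rsync_exit(23))
```

Here the partial-transfer status goes with the "failed to set permissions" message a few lines earlier in the output.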

The script is backing up a home folder.
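One possible way to keep copies from piling up, assuming the roughly 50 rsync instances came from cron starting a new backup before the previous one had finished (the thread does not confirm this), is to take a lock before starting the backup. A minimal Python sketch; `run_exclusive` and the lock path are made up for illustration, and the real rsync command line is not shown in the thread, so it is left to the caller.

```python
import fcntl
import os
import subprocess
import sys

LOCK_PATH = "/tmp/home-backup.lock"  # hypothetical path, pick your own

def run_exclusive(cmd, lock_path=LOCK_PATH):
    """Run cmd only if nobody else holds the lock.

    Returns the command's exit status, or None if another run still
    holds the lock and this one was skipped.
    """
    lock_file = open(lock_path, "w")
    try:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return None  # previous backup still running, skip this round
        return subprocess.call(cmd)
    finally:
        lock_file.close()  # closing the file releases the lock
```

The cron entry would then call a small script that passes the backup command (as a list of arguments) to `run_exclusive` instead of calling rsync directly, so at most one backup runs at a time.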
