LinuxQuestions.org
02-25-2010, 07:06 PM   #1
ArthurGoldberg
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Rep: Reputation: 0
RHEL Server Rel 5.4 freezes with large jobs


Hello

We're running
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.4 (Tikanga)
$ uname -r
2.6.18-164.11.1.el5

It hosts an Apache/2.2.3 web server. We also run apache-tomcat-5.5.23. Most of our programs are mod_perl. Sometimes our users input over-sized data sets, or queries that generate too much output. (I realize that we should try to prevent them from doing that, but right now I'm looking for a more general solution.)

When a large job runs it can 'freeze' our system. The system becomes unresponsive to everything, including commands typed at the shell. Sometimes it unfreezes after a while. Once, in this situation, I was able to create a high-priority shell. top reported:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
349 root 10 -5 0 0 0 R 37.7 0.0 5:53.56 kswapd1
348 root 20 -5 0 0 0 R 35.8 0.0 5:57.67 kswapd0

It froze again yesterday, sometime around 14:50; about then /var/log/messages says

Feb 24 14:34:47 ourMachine avahi-daemon[3133]: Invalid query packet.
Feb 24 14:35:18 ourMachine last message repeated 6 times
Feb 24 14:35:18 ourMachine last message repeated 2 times
Feb 24 14:35:32 ourMachine setroubleshoot: SELinux is preventing the http daemon from connecting to network port 3306 For complete SELinux messages. run sealert -l 0afcfa46-07b8-48eb-aec3-e7dda9872b84
Feb 24 14:35:34 ourMachine avahi-daemon[3133]: Invalid query packet.
Feb 24 14:55:06 ourMachine last message repeated 6 times
Feb 24 15:00:44 ourMachine last message repeated 3 times
Feb 24 15:00:55 ourMachine last message repeated 5 times
Feb 24 15:01:21 ourMachine dhclient: DHCPREQUEST on eth0 to 128.122.128.24 port 67
Feb 24 15:09:51 ourMachine kernel: hald-addon-stor invoked oom-killer: gfp_mask=0xd0, order=0, oomkilladj=0
Feb 24 15:09:51 ourMachine kernel:
Feb 24 15:09:51 ourMachine kernel: Call Trace:
Feb 24 15:09:51 ourMachine kernel: [<ffffffff800c6076>] out_of_memory+0x8e/0x2f3
Feb 24 15:09:51 ourMachine kernel: [<ffffffff8000f487>] __alloc_pages+0x245/0x2ce
Feb 24 15:09:51 ourMachine kernel: [<ffffffff80017812>] cache_grow+0x133/0x3c1
Feb 24 15:09:51 ourMachine kernel: [<ffffffff8005c2e5>] cache_alloc_refill+0x136/0x186
Feb 24 15:09:51 ourMachine kernel: [<ffffffff8000ac12>] kmem_cache_alloc+0x6c/0x76
Feb 24 15:09:51 ourMachine kernel: [<ffffffff80012658>] getname+0x25/0x1c2
Feb 24 15:09:51 ourMachine kernel: [<ffffffff80019cba>] do_sys_open+0x17/0xbe
Feb 24 15:09:51 ourMachine kernel: [<ffffffff8005d28d>] tracesys+0xd5/0xe0
Feb 24 15:09:51 ourMachine kernel:
Feb 24 15:09:51 ourMachine kernel: Mem-info:
Feb 24 15:09:48 ourMachine dhclient: DHCPREQUEST on eth0 to 128.122.128.24 port 67
Feb 24 15:03:45 ourMachine avahi-daemon[3133]: Invalid query packet.
Feb 24 15:09:54 ourMachine kernel: Node 0 DMA per-cpu:
Feb 24 15:09:54 ourMachine dhclient: DHCPREQUEST on eth0 to 128.122.128.24 port 67
Feb 24 15:09:54 ourMachine avahi-daemon[3133]: Invalid query packet.
Feb 24 15:09:54 ourMachine kernel: cpu 0 hot: high 0, batch 1 used:0
Feb 24 15:09:55 ourMachine dhclient: DHCPACK from 128.122.128.24
Feb 24 15:09:55 ourMachine avahi-daemon[3133]: Invalid query packet.
Feb 24 15:09:55 ourMachine kernel: cpu 0 cold: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 1 hot: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 1 cold: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 2 hot: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 2 cold: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 3 hot: high 0, batch 1 used:0
Feb 24 15:09:56 ourMachine kernel: cpu 3 cold: high 0, batch 1 used:0
Feb 24 15:09:57 ourMachine kernel: Node 0 DMA32 per-cpu:
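To pull just the OOM-killer events out of a log like the one above, something along these lines should work. It is only a sketch: oom_events is a made-up helper name, and the field positions assume the syslog layout shown above (month, day, time, host, "kernel:", then the name of the process that triggered the kill).

```shell
#!/bin/sh
# Hypothetical helper: list when the OOM killer fired and which process
# triggered it. Reads the named log file(s), or stdin if none is given.
oom_events() {
    # With the syslog layout above: $1-$3 = timestamp, $6 = invoking process.
    awk '/invoked oom-killer/ { print $1, $2, $3, $6 }' "$@"
}

# Example (reading /var/log/messages usually needs root):
#   oom_events /var/log/messages
```

The process named on that line is the one whose allocation failed, which is not necessarily the one using the memory.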

Observing the machine, I see at least one very busy disk. I suspect that some high-priority system process (perhaps kswapd) is using all the CPUs, preventing anything else from running. Unfortunately, I cannot find much information on kswapd or on debugging this problem.
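One way to check whether the box is really swapping while kswapd is busy is to sample the kernel's swap counters. A minimal sketch, assuming /proc/vmstat is readable; swap_counters is an illustrative name, not an existing tool. pswpin and pswpout are cumulative page counts since boot, so what matters is how fast they change between two samples.

```shell
#!/bin/sh
# Hypothetical helper: print the cumulative swap-in / swap-out page counters.
# Takes an optional file argument so it can be pointed at a saved copy;
# defaults to the live /proc/vmstat.
swap_counters() {
    awk '$1 == "pswpin" || $1 == "pswpout" { print $1 "=" $2 }' "${1:-/proc/vmstat}"
}

# Take one sample if the file is available; run it again a few seconds
# later and compare the numbers.
if [ -r /proc/vmstat ]; then
    swap_counters
fi
```

The si and so columns of "vmstat 5" show the same activity as a rate.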

Thanks
Arthur
 
02-25-2010, 07:30 PM   #2
John VV
Guru
 
Registered: Aug 2005
Posts: 12,156

Rep: Reputation: 1597
Well, mod_perl can bog a system down, but I routinely work with 5 GB to 9 GB (or bigger) imaging data sets on CentOS 5.4.

Can you define what you consider a "too large" file, and what your users are doing with it?
 
02-26-2010, 12:34 AM   #3
mesiol
Member
 
Registered: Nov 2008
Location: Lower Saxony, Germany
Distribution: CentOS, RHEL, Solaris 10, AIX, HP-UX
Posts: 731

Rep: Reputation: 137
Hi,

You can limit the resources of the user running Apache/Tomcat to keep the system from becoming completely unresponsive.
Take a look at ulimit.
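For example, ulimit is a shell builtin, and a limit set in a shell applies to that shell and to everything started from it. A small sketch (the 1000000 kbyte figure is arbitrary, and limited_vm is just an illustrative name):

```shell
#!/bin/sh
# Set a virtual-memory limit inside a subshell, so the calling shell is
# unaffected, then print the limit that is in effect in there.
limited_vm() {
    (
        ulimit -v "$1" || exit 1   # kbytes; applies to this subshell and its children
        ulimit -v                  # report the limit now in effect
    )
}

limited_vm 1000000
```

A process that hits the limit gets allocation failures instead of pushing the whole machine into swap.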
 
02-26-2010, 11:14 AM   #4
ArthurGoldberg
LQ Newbie
 
Registered: Feb 2010
Posts: 2

Original Poster
Rep: Reputation: 0
Thanks, folks.

I investigated ulimit. These are our current ulimits:

ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 135168
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 135168
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited

Our box has 16 GB RAM. Right now ps (ps -ale --sort=-vsize) says (I added commas for readability - wish code would do that):

S UID PID PPID C PRI NI RSS SZ WCHAN TTY TIME CMD
S 48 3490 3484 1 75 0 7,861,380 2,071,971 stext ? 00:35:10 httpd
S 48 3569 3484 0 76 0 1,217,284 402,132 stext ? 00:05:51 httpd
S 0 3200 1 1 79 0 140,716 336,470 stext ? 00:26:26 java
S 48 3488 3484 0 76 0 364,932 199,448 stext ? 00:01:23 httpd
S 48 3571 3484 0 75 0 312,572 175,107 stext ? 00:01:46 httpd

where
RSS = resident set size, the non-swapped physical memory that a task has used (in kilobytes).
SZ = approximate amount of swap space that would be required if the process were to dirty all writable pages and then be swapped out. This number is very rough!

The biggest httpd seems too big. Perhaps it allocated a bunch of memory and never freed it.
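To see how much resident memory the httpd processes hold in total, the RSS column can be summed. A rough sketch; sum_rss is a hypothetical helper, and it expects one RSS value in kilobytes per line, without the commas added to the listing above.

```shell
#!/bin/sh
# Hypothetical helper: add up RSS values (kilobytes, one per line) read from
# stdin and print the total in kB and MB.
sum_rss() {
    awk '{ total += $1 } END { printf "%.0f kB (%.0f MB)\n", total, total / 1024 }'
}

# Example: total resident memory of all httpd processes.
#   ps -C httpd -o rss= | sum_rss
```

Fed the four httpd RSS values from the listing above, this comes to 9756168 kB, roughly 9.3 GB of the 16 GB in the box. Shared pages are counted once per process, so the real figure is somewhat lower.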

I think that the important ulimit options for us are:

-d The maximum size of a process's data segment
-l The maximum size that may be locked into memory
-v The maximum amount of virtual memory available to the shell
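As the ulimit -a output above shows, all three of these options take their value in kbytes (units of 1024 bytes). A tiny sketch for converting a size in GiB into that unit; gib_to_kb is an illustrative name only.

```shell
#!/bin/sh
# Convert whole GiB into the kbyte units that ulimit -d, -l and -v expect.
gib_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

gib_to_kb 2
```

A round figure such as 2000000 kbytes is therefore slightly under 2 GiB (about 1.9 GiB), which is close enough for a safety limit.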

I'm thinking of trying a 2 GB limit on data segments with "ulimit -d 2000000".

e.g., see http://httpd.apache.org/docs/2.0/vhosts/fd-limits.html:
#!/bin/sh
ulimit -S -n 100
exec httpd

We could start httpd with
#!/bin/sh
ulimit -d 2000000
exec /usr/sbin/apachectl restart

Then the httpd processes, which inherit the limit from this script, won't be able to exceed a 2 GB data segment.
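Before relying on a wrapper like that, it may be worth confirming that a limit set in the wrapper really reaches the program it starts. A sketch, with "sh -c" standing in for httpd (an assumption made only so the example runs anywhere); with_data_limit is a made-up name. One caveat: on kernels of this era RLIMIT_DATA is enforced only on brk/sbrk growth, so memory obtained through mmap, which malloc uses for large blocks, is not counted. The -v limit (RLIMIT_AS) covers both.

```shell
#!/bin/sh
# Run a command under a data-segment limit (kbytes) without changing the
# limits of the calling shell.
with_data_limit() {
    limit_kb=$1
    shift
    (
        ulimit -d "$limit_kb" || exit 1   # give up if the limit cannot be set
        exec "$@"                         # the command inherits the limit
    )
}

# The child started by exec reports the limit it inherited.
with_data_limit 2000000 sh -c 'ulimit -d'
```

The same check with -v in place of -d would show whether an address-space limit is inherited in the same way.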

For the system call, see http://linux.die.net/man/2/setrlimit, and the underlying calls, "brk, sbrk - change data segment size" at
http://www.kernel.org/doc/man-pages/...an2/brk.2.html.

Comments?