Old 05-07-2013, 03:09 PM   #1
ip_address
Member
 
Registered: Apr 2012
Distribution: RedHat
Posts: 42

Rep: Reputation: 2
multiple jobs on linux box


Hello everyone,

I have a 6-core box running CentOS 6. If I run an awk job to extract a column from a large file (1.2 billion rows), it takes some time to extract the column and dump it into a new file.

The problem is that if I run 50 such jobs in the background, all the processes disappear after some time. Is there a way I could schedule 6 jobs at a time (or otherwise make sure my machine is 100% utilized), and, once they finish, have the next 6 jobs start running, without using the wait command or writing a bash script? Is there an inherent Linux mechanism/command/scheduler that could be used to automatically schedule multiple jobs?

Thanks!
 
Old 05-07-2013, 03:21 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 27,744
Blog Entries: 54

Rep: Reputation: 2973
Quote:
Originally Posted by ip_address View Post
The problem is that if I run 50 such jobs in the background, all the processes disappear after some time.
...because they have finished or what?


Quote:
Originally Posted by ip_address View Post
Is there a way I could schedule 6 jobs at a time (or otherwise make sure my machine is 100% utilized), and, once they finish, have the next 6 jobs start running, without using the wait command or writing a bash script? Is there an inherent Linux mechanism/command/scheduler that could be used to automatically schedule multiple jobs?
To schedule one-off jobs, use the 'at' service. You could list all the jobs in one 'at' job, but what I'd do is write a script that drives the awk jobs, make it loop over the tables, and 'at' that script.
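Roughly something like this (just an untested sketch; the paths, file name, and column count are placeholders you would adapt):

Code:
#!/bin/bash
# extract_columns.sh -- hypothetical driver script: runs the extractions one after another
INPUT=/path/to/bigfile       # placeholder
OUTDIR=/path/to/output       # placeholder

for ((col=1; col<=50; col++)); do
    awk -v var="$col" '{print $var}' "$INPUT" > "$OUTDIR/col_$col"
done
and then hand the whole thing to 'at' in one go:

Code:
at -f /path/to/extract_columns.sh now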
 
Old 05-07-2013, 04:56 PM   #3
ip_address
Member
 
Registered: Apr 2012
Distribution: RedHat
Posts: 42

Original Poster
Rep: Reputation: 2
Quote:
...because they have finished or what?
They definitely haven't finished, because a single job takes approximately 10-15 minutes, and with limited resources and more jobs the total time will only grow. For the first couple of minutes I can see the jobs running, but after a while (< 10 minutes) all of them are gone, with no output files either. I tried monitoring with top and ps -ef but cannot see any jobs running. That's why I thought that if jobs could be queued and run 5-6 at a time, depending on the load on the machine (CPU utilization and I/O), then all 50 jobs would finish sooner or later, as opposed to the current scenario where I get no output at all.

Quote:
To schedule one-off jobs, use the 'at' service. You could list all the jobs in one 'at' job, but what I'd do is write a script that drives the awk jobs, make it loop over the tables, and 'at' that script.
Could you provide an example? I thought the 'at' command runs a particular command at a given time, so I would have to specify a time, but is it possible to automatically schedule jobs one after the other? Currently I am running my awk commands like this:

Code:
for ((counter=1; counter<=50; counter++)); do
    # $INPUT_PATH, $file, $OUTPUT_PATH and $col_name are set earlier in the script
    awk -v var="$counter" '{print $var}' "$INPUT_PATH$file" > "$OUTPUT_PATH$col_name"
done
I am actually dumping the awk command from each iteration into a separate bash script and then running each script with nohup in the background.

Thanks!

Last edited by ip_address; 05-07-2013 at 04:59 PM.
 
Old 05-07-2013, 06:05 PM   #4
suicidaleggroll
Senior Member
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 3,221

Rep: Reputation: 1155
First you should determine why this job takes so long. If the CPU is the bottleneck, then running more of them on more CPUs will speed things up, but my hunch is that the CPU is not the problem; it's the disk I/O that's slowing things down, and running more of them will just slow everything to a crawl. You can check this by running one instance and watching top. If the process is using 100% CPU, then the CPU is the bottleneck; otherwise it's probably disk access that's slowing things down (in that case you should see a relatively high number in the %wa field at the top).

Also, how much RAM does each process use? Do you even have enough to handle 50 at a time? You may be running out of memory, and the OOM killer is killing off your processes to keep the machine from locking up.
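A quick way to check both while one job is running (just example commands):

Code:
top -b -n 1 | head -15               # %CPU of the awk process, and the %wa value in the header
free -m                              # how much memory is actually free
dmesg | grep -i 'killed process'     # any sign the OOM killer has already struck?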

Last edited by suicidaleggroll; 05-07-2013 at 06:10 PM.
 
2 members found this post helpful.
Old 05-07-2013, 06:25 PM   #5
ip_address
Member
 
Registered: Apr 2012
Distribution: RedHat
Posts: 42

Original Poster
Rep: Reputation: 2
Quote:
Originally Posted by suicidaleggroll View Post
First you should determine why this job takes so long. If the CPU is the bottleneck, then running more of them on more CPUs will speed things up, but my hunch is that the CPU is not the problem; it's the disk I/O that's slowing things down, and running more of them will just slow everything to a crawl. You can check this by running one instance and watching top. If the process is using 100% CPU, then the CPU is the bottleneck; otherwise it's probably disk access that's slowing things down (in that case you should see a relatively high number in the %wa field at the top).

Also, how much RAM does each process use? Do you even have enough to handle 50 at a time? You may be running out of memory, and the OOM killer is killing off your processes to keep the machine from locking up.
For a single job the CPU utilization is 100%, but you are correct, I am facing an OOM problem. Why does the kernel kill all 50 jobs? It could kill some of them, reclaim memory, and let the others continue running. Any thoughts?
 
Old 05-07-2013, 06:48 PM   #6
suicidaleggroll
Senior Member
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 3,221

Rep: Reputation: 1155
Quote:
Originally Posted by ip_address View Post
For a single job the CPU utilization is 100%, but you are correct, I am facing an OOM problem. Why does the kernel kill all 50 jobs? It could kill some of them, reclaim memory, and let the others continue running. Any thoughts?
The OOM killer is really a sledge-hammer last resort to keep the system from locking up. It doesn't have the time to kill one process, wait to see if things are ok, kill another process, wait some more. Instead it hits the threshold where things are about to get REAL bad, REAL fast, and it goes on a killing rampage to make sure the system doesn't lock up.

The OOM killer isn't your friend, it doesn't play nice, it's a mass murdering lunatic who has one job...kill anything and everything that could potentially be causing a problem. It's up to the user to make sure they don't push the system to the point where the OOM killer is unleashed.

http://lwn.net/Articles/317814/
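If you want to confirm that it's the OOM killer taking out your jobs, the kernel logs it; something like this should show it (on CentOS the messages also land in /var/log/messages):

Code:
dmesg | grep -iE 'out of memory|killed process'
grep -i oom /var/log/messages | tail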

Last edited by suicidaleggroll; 05-07-2013 at 06:54 PM.
 
1 member found this post helpful.
Old 05-07-2013, 09:44 PM   #7
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
In bash you could start 6 copies, collecting the PIDs, then check every, e.g., 15 minutes and start another for each one that has completed.
You'd probably store the PIDs in an array.
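A rough, untested sketch of that idea (the awk command and file names are placeholders):

Code:
#!/bin/bash
MAX=6
declare -A pids                                   # pid -> column still running
col=1
while (( col <= 50 || ${#pids[@]} > 0 )); do
    # top up to MAX running jobs
    while (( ${#pids[@]} < MAX && col <= 50 )); do
        awk -v var="$col" '{print $var}' bigfile > "col_$col" &
        pids[$!]=$col
        ((col++))
    done
    sleep 900                                     # check back every 15 minutes
    for pid in "${!pids[@]}"; do
        kill -0 "$pid" 2>/dev/null || unset "pids[$pid]"   # drop the ones that have finished
    done
done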

In Perl (or, e.g., C) you have the extra choices of multi-threading (using thread-global variables to track progress) or multi-processing (using SIGCHLD).
 
Old 05-07-2013, 10:25 PM   #8
sundialsvcs
Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 5,455

Rep: Reputation: 1172
I suggest that you abandon the multi-thread approach ... it will probably be faster without it.

Here's why: consider what the disk drive is doing. Ultimately, the entire process is "I/O bound" because those billions of rows are on-disk. The CPU probably has very little to do ... it's just waiting for I/O.

If a single process is reading from the file, the read/write head assembly on the disk probably isn't moving around too much. But if multiple processes are reading from multiple files and/or from different places in the same file, "seeking" is occurring non-stop.
 
Old 05-07-2013, 11:25 PM   #9
NevemTeve
Senior Member
 
Registered: Oct 2011
Location: Budapest
Distribution: Debian/GNU/Linux, AIX
Posts: 1,947

Rep: Reputation: 524
First of all, you should find out why your program "vanishes". Use gdb.
http://dirac.org/linux/gdb/06-Debugg...ng_Process.php
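For instance, attach to one of the running jobs (the pgrep pattern is only an example; match it to whatever your scripts are called):

Code:
gdb -p "$(pgrep -f 'awk -v var' | head -n 1)"
# inside gdb, type 'continue' and wait: if the process is killed, gdb reports
# the signal it died with (e.g. SIGKILL from the OOM killer)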
 
Old 05-12-2013, 01:39 PM   #10
bigearsbilly
Senior Member
 
Registered: Mar 2004
Location: england
Distribution: FreeBSD, Debian, Mint, Puppy
Posts: 3,314

Rep: Reputation: 175
It's hard to say without a clue as to what your jobs do.
I would split the 50 jobs into, say, 5 lots of 10 jobs and run them like that. Experiment.
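For example, something along these lines (untested; the awk command and file names are placeholders):

Code:
for batch in 0 1 2 3 4; do
    for ((i=1; i<=10; i++)); do
        col=$((batch * 10 + i))
        awk -v var="$col" '{print $var}' bigfile > "col_$col" &
    done
    wait                   # let this lot of 10 finish before starting the next
done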

You can use top or uptime to check the load average; if it starts going much above 1, you are becoming inefficient.
Use vmstat 3 to check how your memory is being used.

If you are extracting all the data first and then dumping it to a file, maybe you are running out of memory; that's likely if your processes are getting killed.

Are you processing the columns or just copying them? If it's simple, writing a line at a time in append mode from C is very efficient.
 
Old 05-12-2013, 02:23 PM   #11
suicidaleggroll
Senior Member
 
Registered: Nov 2010
Location: Colorado
Distribution: OpenSUSE, CentOS
Posts: 3,221

Rep: Reputation: 1155
Quote:
Originally Posted by bigearsbilly View Post
You can use top or uptime to check the load average; if it starts going much above 1, you are becoming inefficient.
The OP has a 6-core machine; at a load of 1 it's barely even getting warmed up.

http://blog.scoutapp.com/articles/20...-load-averages
 
  

