Old 07-12-2008, 05:51 PM   #1
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Rep: Reputation: 36
Shell scripting a CPU-intensive process on 2 cores


Say I want to run a very CPU-intensive, non-multi-threaded process on all the files in a certain directory (and suppose I can't just do this with "process ./*").

I could use something like

Code:
for $FILE in `ls` do
process $FILE
done
This, however, only uses one of my 2 processor cores, as the for loop waits for the process from the first iteration to complete before starting the second.

Is it possible to have 2 instances running at the same time to use all the cores in my box and halve the time taken...

Something akin to the -j3 switch in make.
 
Old 07-12-2008, 07:35 PM   #2
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasy2k1 View Post
This, however, only uses one of my 2 processor cores, as the for loop waits for the process from the first iteration to complete before starting the second.

Is it possible to have 2 instances running at the same time to use all the cores in my box and halve the time taken...
Running a command with "&" (like "fortune &") starts it in the background and prints its process ID. The "wait" command waits until a process terminates. However, I think the easiest way to use both cores is to split the "ls" output into two halves, then run two processes: the first processes the first half of the ls output, while the second processes the second half. This is not perfect, but it is easy enough to do.
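
For example, a minimal sketch of that split-in-half idea, assuming simple file names (no spaces) and using "process" purely as a stand-in for the real command:

Code:
#!/bin/bash
# Split the file list in two and run one background job per half.
files=( * )                              # every file in the current directory
half=$(( ( ${#files[@]} + 1 ) / 2 ))

run_half() {                             # run the stand-in "process" command on each argument
    for f in "$@"; do
        process "$f"
    done
}

run_half "${files[@]:0:half}" &          # first half on one core
run_half "${files[@]:half}" &            # second half on the other
wait                                     # sit (using almost no CPU) until both halves finish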
 
Old 07-13-2008, 01:57 AM   #3
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Rep: Reputation: 897
Be aware: this is not an answer. In fact, some of the questions here are only vaguely relevant to your particular question.

Welcome, in a rather indirect way, to the problem of parallelisation!

The previous suggestion was a rather good one for this particular task, but note that it doesn't very easily lead to a general solution to "how to take advantage of multiple cores in CPU-intensive scripting applications".

That is probably because there currently isn't one, and it might be worth thinking, albeit briefly, about why. In general, if the obvious (whatever that means) way of structuring the task has one stage producing processed information that is then passed on to another stage, and you wait for all of stage one to complete before moving on to stage two, you have constructed a situation in which parallelisation is a bit tricky.

While there are a number of tricks you might use to avoid having to wait for stage one to complete before starting on stage two, I don't think anyone has a magic 'apply switch qqq13'-style solution for this in scripting. And I'm not holding my breath for one, either.

Oh, and since many tasks are I/O-limited rather than CPU-limited, for them the only important thing is to use the I/O 'slots' efficiently. So a general, one-size-fits-all, just-add-water solution to these problems doesn't seem close at hand.

Stretching a point, since you mention 'waits', you might want to wonder what you would like n cores to do when you come to a wait loop. Have each core wait for 1/n of the time that a single core would wait?

(PS, if you do read this and think 'what an idiot, it is actually trivial to solve those problems, if only you approach them in this way' please do get in touch.)
 
Old 07-13-2008, 08:02 AM   #4
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by salasi View Post
Be aware: this is not an answer. In fact, some of the questions here are only vaguely relevant to your particular question.

Welcome, in a rather indirect way, to the problem of parallelisation!
Implementing a well-made task pool in shell is a waste of time, IMHO, especially for such a problem.

Quote:
Originally Posted by salasi View Post
While there are a number of tricks you might use to avoid having to wait for stage one to complete before starting on stage two, I don't think anyone has a magic 'apply switch qqq13'-style solution for this in scripting. And I'm not holding my breath for one, either.
You don't wait for stage one to complete. You launch both stages immediately and wait for them both to terminate.
Since this is shell, the best bet is to simply launch two children and forget about them; shell is not suitable for precise control over threads.

Quote:
Originally Posted by salasi View Post
Stretching a point, since you mention 'waits', you might want to wonder what you would like n cores to do when you come to a wait loop.
With "wait" there are three processes. The "root" process is the that launches two childs. The childs are supposed to get hold on both cores, with root spending near zero of cpu time. This is because "wait" is a system call especially designed for thread control. There is a chance that there is no "wait loop" (if there is, I'll be extremely disappointed with kernel code quality) and calling "wait" probably puts the whole thread into sleep state, so "wait" is most likely handled by a kernel's task scheduler. You can check if this true or not by digging kernel sources. I'm assuming that "there is no loop" because on multithreaded systems using "sleep" and "wait" commands normally reduces thread CPU usage (almost to zero), while those commands are being called, while infinite loops nomally give 100% cpu usage.
In the provided example it is not really necessary to wait for both children. The root process could launch the two children (this will require using "(program name &)") and exit.
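
A minimal sketch of that fire-and-forget variant, with "worker1" and "worker2" as hypothetical stand-ins for the two jobs:

Code:
( worker1 & )    # first child, detached in a subshell as described above
( worker2 & )    # second child
# the parent script can now exit; both children keep running on their own
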
Quote:
Originally Posted by salasi View Post
Have each core wait for 1/n of the time that a single core would wait?
This is completely wrong.
1) Given a PID, "wait" waits until that process terminates. It DOESN'T wait for "N milliseconds" or anything like that; there is "sleep" for that. "wait" can't return half the time before the process terminates just because the machine is dual-core.
2) Even with the "sleep" command, and even assuming it used a loop, it is not possible to execute a single thread on several cores at once. It would break program logic badly, so it is not possible. One thread can only be executed on one core at a time, no matter what. Of course, there is no guarantee that this thread will be running on core 0 all the time; the OS can decide to use a different core for this thread at any moment.
3) There is no point in writing a precise thread-control program in shell. There don't seem to be enough tools for that (semaphores, mutexes and such are required; of course, maybe they are available and I simply don't know about them). You need another programming language with threading support. See if Python has such functions, it should. Or use C/C++.

Last edited by ErV; 07-13-2008 at 08:11 AM.
 
Old 07-14-2008, 10:15 AM   #5
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
While this theoretical discussion is very interesting, I'm not interested in an elegant or totally precise control program.

The issue is that I have 57GB of video files that need crunching with ffmpeg, and a dual-core machine to do them with.

Would it be possible to script this using make?
It can already handle parallelisation with the -j flag.
 
Old 07-15-2008, 01:18 AM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
In short, divide your video files into n groups, e.g. n subdirs, then run n instances of your normal script, pointing each one at a separate sub-dir.
E.g. create a loop around your usual code (a parent script that loops, calling the processing script) and feed it the list of dirs, something like:

Code:
for subdir in `cat subdirlist.txt`
do
    convert_videos.sh "$subdir" &   # one background converter per sub-directory
done
As mentioned, in a job like this it's likely I/O will be the limiter, so you can use more subdirs/processes than you have cores.
 
Old 07-15-2008, 08:12 AM   #7
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasy2k1 View Post
While this theoretical discussion is very interesting, I'm not interested in an elegant or totally precise control program.
I've told you the easiest way to do it before: generate the ls output, split it into two halves (assuming you are on a dual-core) and launch two child processes at once.

Quote:
Originally Posted by dasy2k1 View Post
The issue is that I have 57GB of video files that need crunching with ffmpeg, and a dual-core machine to do them with.
The x264 codec can use several cores simultaneously during video encoding, increasing encoding speed (there is a really small quality loss, though), but this codec is slower overall. You can try using it instead of ffmpeg. If you use x264 you won't need to bother with multiple cores within the shell script.

Quote:
Originally Posted by dasy2k1 View Post
Would it be possible to script this using make?
Yes.
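
For what it's worth, a minimal sketch of that make-based approach, assuming .mpg sources being converted to .avi (substitute the real ffmpeg options you use); running "make -j2" then converts two files at a time:

Code:
# Makefile sketch: one target per output file; the recipe line must start with a tab
SRC := $(wildcard *.mpg)
OUT := $(SRC:.mpg=.avi)

all: $(OUT)

%.avi: %.mpg
	ffmpeg -i $< $@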
 
Old 07-15-2008, 12:05 PM   #8
smoked kipper
Member
 
Registered: May 2008
Location: UK
Distribution: Slackware,Slamd64
Posts: 81

Rep: Reputation: 15
Quote:
for $FILE in `ls`
`ls` is pointless.

Code:
for file in *; ...
BTW, the $ is misplaced here; it is only used when dereferencing a variable.

Quote:
Originally Posted by chrism01 View Post
As mentioned, in a job like this it's likely I/O will be the limiter, so you can use more subdirs/processes than you have cores.
Er, maybe on your planet. Down here, crunching video files sucks 110% cpu.

ErV has supplied the obvious solution. E.g.

Code:
find -type f | xargs -n 2 echo > filelist # list of files, 2 per line
cut -d' ' -f1 filelist > list1            # first name on each line
cut -d' ' -f2 filelist > list2            # second name on each line
Pass list1 to one ffmpeg process (or a script that loops over the files), and pass list2 to another. Adjust the find command to taste.
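
For instance, a minimal sketch of such a looping script (called encode_list.sh here purely as a hypothetical name), started once per list in the background:

Code:
#!/bin/bash
# Usage: ./encode_list.sh list1 &    (and again with list2)
while read -r f; do
    ffmpeg -i "$f" "${f%.*}.avi"     # substitute your real ffmpeg options here
done < "$1"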

Last edited by smoked kipper; 07-15-2008 at 12:06 PM.
 
Old 07-16-2008, 01:37 AM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
57GB of data / 4KB disk blocks = roughly 14,250,000 disk block reads, and similarly for the output (disk writes), depending on how much there is, i.e. the data conversion ratio.
Also, 57GB in + (say) 57GB out = 114GB of RAM used... the average home PC has 2GB of RAM?
 
Old 07-16-2008, 01:52 AM   #10
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by chrism01 View Post
57GB of data / 4KB disk blocks = roughly 14,250,000 disk block reads, and similarly for the output (disk writes), depending on how much there is, i.e. the data conversion ratio.
1) There is a thing called DMA. It greatly reduces CPU load during disk I/O.
2) 14,250,000 disk reads is a really small number of calls compared to the complexity of video encoding.
3) Unless the video encoder can process a 10..40 MB/s input stream in real time (make it 3..5 MB/s if the drive is in PIO mode; this is not the video bitrate, it's how fast data is being pushed into the encoder), you'll never notice any encoding performance problems due to I/O. It's not a 3.5" floppy drive.

Quote:
Originally Posted by chrism01 View Post
Also, 57GB in + (say) 57GB out = 114GB of RAM used...
X_X
Why in the world have you decided that this will need 114GB of RAM?
Video encoders NEVER load the entire movie into memory.

Last edited by ErV; 07-16-2008 at 02:00 AM.
 
Old 07-16-2008, 05:20 AM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
The OP said he has 57GB of files. All of that must pass through RAM to get to the CPU for conversion, and the new version has to pass back out again on the way to the disk...
 
Old 07-16-2008, 08:10 AM   #12
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Thumbs down

Quote:
Originally Posted by chrism01 View Post
The OP said he has 57GB of files. All of that must pass through RAM to get to the CPU for conversion, and the new version has to pass back out again on the way to the disk...
Look, no video converter loads the whole file into RAM.

And when given a file list, no video converter loads all the files from the list into RAM.

Although, yes, the files must pass through RAM, they don't have to be loaded into RAM completely and all at once.

Or how do you think a 256MB machine can encode 4.7GB DVDs into AVIs?

Video/audio encoding is handled as a stream, i.e. the conversion utility loads a block of input (not a disk block) into a buffer, processes it, then immediately appends the result to the destination file. Then it loads the next block of input into the same buffer, reusing the memory.

If the OP by chance uses a converter that DOES load whole files into RAM, then I recommend uninstalling it and using normal software, like mencoder.

I've converted a lot of video files at once using mencoder, and it didn't require anywhere near as much RAM as you say. So when you talk about 57GB of RAM, you are talking nonsense.

Last edited by ErV; 07-16-2008 at 08:13 AM.
 
Old 07-16-2008, 02:13 PM   #13
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
In the end I just split the list manually into 2 folders and ran the same script on both folders.
And no, I don't have 114GB of RAM; I have 2GB, and it only used just over 1GB when both processes were running.
It was more likely that I would run into trouble with disk space
(57GB in + about 18GB out).

P.S. I was using ffmpeg; I had played around with mencoder but was getting severe audio/video desync.

Last edited by dasy2k1; 07-16-2008 at 02:15 PM.
 
Old 07-17-2008, 12:24 AM   #14
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,668
Blog Entries: 4

Rep: Reputation: 3945
Generally speaking, with a problem like this one you need to focus on whatever-it-is that is most likely to be "the ruling constraint."

Nearly always, and certainly in this case, the constraint is going to be the I/O subsystem. The disk drive(s) have to transport many gigabytes of data to and from storage. The CPU, by comparison, has a trivial amount of work to do.

If you did this sort of thing frequently, a high-performance I/O subsystem and the use of multiple drives would be important, so that the drives would not spend much time "seeking"... that is, moving the read/write head around. Through a nice, capacious I/O channel, you'd be slurping data from one drive, converting it, and sending it out to the other drive (which might live on an entirely separate controller, pinned to a different IRQ).

You really can't control what the CPU core(s) are doing at any particular time... you just hope to take reasonable advantage of them. But since the CPU isn't going to be the bottleneck, it's much more important to be sure that you don't burden the I/O system.
 
Old 07-18-2008, 06:20 AM   #15
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
Quote:
Originally Posted by sundialsvcs View Post

Nearly always, and certainly in this case, the constraint is going to be the I/O subsystem. The disk drive(s) have to transport many gigabytes of data to and from storage. The CPU, by comparison, has a trivial amount of work to do
Hate to break it to you, but when re-encoding video (from MPEG-2 to Xvid) the constraint is most certainly the CPU!

I can copy 50-odd GB on my hard drive (I/O-limited) in about 20 minutes.

Crunching 50GB of video takes 12-14 hours with CPU usage at 100% the whole time. Running 2 processes at once to use both cores drops this to about 8 hours.
 
  

