Old 07-12-2008, 05:51 PM   #1
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Rep: Reputation: 36
Shell scripting a CPU-intensive process on 2 cores


Say I want to run a very CPU-intensive, non-multi-threaded process on all the files in a certain directory (and suppose I can't just do this with "process ./*").

I could use something like

Code:
for $FILE in `ls` do
process $FILE
done
This, however, only uses one of my 2 processor cores, as the for loop waits for the process from the first iteration to complete before starting the second.

Is it possible to have 2 instances running at the same time to use all the cores in my box and halve the time taken...

Something akin to the -j3 switch in make.
 
Old 07-12-2008, 07:35 PM   #2
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasy2k1 View Post
This, however, only uses one of my 2 processor cores, as the for loop waits for the process from the first iteration to complete before starting the second.

Is it possible to have 2 instances running at the same time to use all the cores in my box and halve the time taken...
Running a command with "&" (like "fortune &") starts it in the background and prints its process ID. The "wait" command waits until a process terminates. However, I think the easiest way to use both cores is to split the "ls" output into two halves, then run two processes: the first processes the first half of the ls output, while the second processes the second half. This is not perfect, but it is easy enough to do.
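
For example, a minimal sketch of that split-in-half idea, assuming simple file names (no spaces) and using "process" purely as a stand-in for the real command:

Code:
#!/bin/bash
# Split the file list in two and run one background job per half.
files=( * )                              # every file in the current directory
half=$(( ( ${#files[@]} + 1 ) / 2 ))

run_half() {                             # run the stand-in "process" command on each argument
    for f in "$@"; do
        process "$f"
    done
}

run_half "${files[@]:0:half}" &          # first half on one core
run_half "${files[@]:half}" &            # second half on the other
wait                                     # sit (using almost no CPU) until both halves finish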
 
Old 07-13-2008, 01:57 AM   #3
salasi
Senior Member
 
Registered: Jul 2007
Location: Directly above centre of the earth, UK
Distribution: SuSE, plus some hopping
Posts: 4,070

Rep: Reputation: 897
Be aware: this is not an answer. In fact, some of the questions here are only vaguely relevant to your particular question.

Welcome, in a rather indirect way, to the problem of parallelisation!

The previous suggestion was a rather good one for this particular task, but note that it doesn't very easily lead to a general solution to "how to take advantage of multiple cores in CPU-intensive scripting applications".

That is probably because there currently isn't one, and it might be worth thinking, albeit briefly, about why. In general, if the obvious (whatever that means) way of structuring the task has one stage producing processed information that is then passed on to another stage, and you wait for all of stage one to complete before moving on to stage two, you have constructed a situation in which parallelisation is a bit tricky.

While there are a number of tricks you might use to avoid having to wait for stage one to complete before starting on stage two, I don't think anyone has a magic 'apply switch qqq13'-style solution for this in scripting. And I'm not holding my breath for one, either.

Oh, and since many tasks are I/O-limited rather than CPU-limited, for them the only important thing is to use the I/O 'slots' efficiently. So a general, one-size-fits-all, just-add-water solution to these problems doesn't seem close at hand.

Stretching a point, since you mention 'waits', you might want to wonder what you would like n cores to do when you come to a wait loop. Have each core wait for 1/n of the time that a single core would wait?

(PS, if you do read this and think 'what an idiot, it is actually trivial to solve those problems, if only you approach them in this way' please do get in touch.)
 
Old 07-13-2008, 08:02 AM   #4
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by salasi View Post
Be aware: this is not an answer. In fact, some of the questions here are only vaguely relevant to your particular question.

Welcome, in a rather indirect way, to the problem of parallelisation!
Implementing a well-made task pool in shell is a waste of time, IMHO, especially for such a problem.

Quote:
Originally Posted by salasi View Post
While there are a number of tricks you might use to avoid having to wait for stage one to complete before starting on stage two, I don't think anyone has a magic 'apply switch qqq13'-style solution for this in scripting. And I'm not holding my breath for one, either.
You don't wait for stage one to complete. You launch both stages immediately and wait for them both to terminate.
Since this is shell, the best bet is to simply launch two children and forget about them; shell is not suitable for precise control over threads.

Quote:
Originally Posted by salasi View Post
Stretching a point, since you mention 'waits', you might want to wonder what you would like n cores to do when you come to a wait loop.
With "wait" there are three processes. The "root" process is the that launches two childs. The childs are supposed to get hold on both cores, with root spending near zero of cpu time. This is because "wait" is a system call especially designed for thread control. There is a chance that there is no "wait loop" (if there is, I'll be extremely disappointed with kernel code quality) and calling "wait" probably puts the whole thread into sleep state, so "wait" is most likely handled by a kernel's task scheduler. You can check if this true or not by digging kernel sources. I'm assuming that "there is no loop" because on multithreaded systems using "sleep" and "wait" commands normally reduces thread CPU usage (almost to zero), while those commands are being called, while infinite loops nomally give 100% cpu usage.
In the provided example it is not really necessary to wait for both children. The root process could launch the two children (this will require using "(program name &)") and exit.
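
A minimal sketch of that fire-and-forget variant, with "worker1" and "worker2" as hypothetical stand-ins for the two jobs:

Code:
( worker1 & )    # first child, detached in a subshell as described above
( worker2 & )    # second child
# the parent script can now exit; both children keep running on their own
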
Quote:
Originally Posted by salasi View Post
Have each core wait for 1/n of the time that a single core would wait?
This is completely wrong.
1) Given a PID, "wait" waits until that process terminates. It DOESN'T wait for "N milliseconds" or anything like that; there is "sleep" for that. "wait" can't return half the time before the process terminates just because the machine is dual-core.
2) Even with the "sleep" command, and even assuming it used a loop, it is not possible to execute a single thread on several cores at once. It would break program logic badly, so it is not possible. One thread can only be executed on one core at a time, no matter what. Of course, there is no guarantee that this thread will be running on core 0 all the time; the OS can decide to use a different core for this thread at any moment.
3) There is no point in writing a precise thread-control program in shell. There don't seem to be enough tools for that (semaphores, mutexes and such are required; of course, maybe they are available and I simply don't know about them). You need another programming language with threading support. See if Python has such functions, it should. Or use C/C++.

Last edited by ErV; 07-13-2008 at 08:11 AM.
 
Old 07-14-2008, 10:15 AM   #5
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
While this theoretical discussion is very interesting, I'm not interested in an elegant or totally precise control program.

The issue is that I have 57GB of video files that need crunching with ffmpeg, and a dual-core machine to do them with.

Would it be possible to script this using make?
It can already handle parallelisation with the -j flag.
 
Old 07-15-2008, 01:18 AM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
In short, divide your video files into n groups, e.g. n subdirs, then run n instances of your normal script, pointing each one at a separate sub-dir.
E.g. create a loop around your usual code (a parent script that loops, calling the processing script) and feed it the list of dirs, something like:

Code:
for subdir in `cat subdirlist.txt`
do
    convert_videos.sh "$subdir" &   # one background converter per sub-directory
done
As mentioned, in a job like this it's likely I/O will be the limiter, so you can use more subdirs/processes than you have cores.
 
Old 07-15-2008, 08:12 AM   #7
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by dasy2k1 View Post
While this theoretical discussion is very interesting, I'm not interested in an elegant or totally precise control program.
I've told you the easiest way to do it before: generate the ls output, split it into two halves (assuming you are on a dual-core) and launch two child processes at once.

Quote:
Originally Posted by dasy2k1 View Post
The issue is that I have 57GB of video files that need crunching with ffmpeg, and a dual-core machine to do them with.
The x264 codec can use several cores simultaneously during video encoding, increasing encoding speed (there is a really small quality loss, though), but this codec is slower overall. You can try using it instead of ffmpeg. If you use x264 you won't need to bother with multiple cores within the shell script.

Quote:
Originally Posted by dasy2k1 View Post
Would it be possible to script this using make?
Yes.
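
For what it's worth, a minimal sketch of that make-based approach, assuming .mpg sources being converted to .avi (substitute the real ffmpeg options you use); running "make -j2" then converts two files at a time:

Code:
# Makefile sketch: one target per output file; the recipe line must start with a tab
SRC := $(wildcard *.mpg)
OUT := $(SRC:.mpg=.avi)

all: $(OUT)

%.avi: %.mpg
	ffmpeg -i $< $@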
 
Old 07-15-2008, 12:05 PM   #8
smoked kipper
Member
 
Registered: May 2008
Location: UK
Distribution: Slackware,Slamd64
Posts: 81

Rep: Reputation: 15
Quote:
for $FILE in `ls`
`ls` is pointless.

Code:
for file in *; ...
BTW, the $ is misplaced here; it is only used when dereferencing a variable.

Quote:
Originally Posted by chrism01 View Post
As mentioned, in a job like this it's likely I/O will be the limiter, so you can use more subdirs/processes than you have cores.
Er, maybe on your planet. Down here, crunching video files sucks 110% cpu.

ErV has supplied the obvious solution. E.g.

Code:
find -type f | xargs -n 2 echo > filelist # list of files, 2 per line
cut -d' ' -f1 filelist > list1            # first name on each line
cut -d' ' -f2 filelist > list2            # second name on each line
Pass list1 to one ffmpeg process (or a script that loops over the files), and pass list2 to another. Adjust the find command to taste.
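
For instance, a minimal sketch of such a looping script (called encode_list.sh here purely as a hypothetical name), started once per list in the background:

Code:
#!/bin/bash
# Usage: ./encode_list.sh list1 &    (and again with list2)
while read -r f; do
    ffmpeg -i "$f" "${f%.*}.avi"     # substitute your real ffmpeg options here
done < "$1"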

Last edited by smoked kipper; 07-15-2008 at 12:06 PM.
 
Old 07-16-2008, 01:37 AM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
57GB of data / 4KB disk blocks = roughly 14,250,000 disk block reads, and similarly for the output (disk writes), depending on how much there is, i.e. the data conversion ratio.
Also, 57GB in + (say) 57GB out = 114GB of RAM used... the average home PC has 2GB of RAM?
 
Old 07-16-2008, 01:52 AM   #10
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Quote:
Originally Posted by chrism01 View Post
57GB of data / 4KB disk blocks = roughly 14,250,000 disk block reads, and similarly for the output (disk writes), depending on how much there is, i.e. the data conversion ratio.
1) There is a thing called DMA. It greatly reduces CPU load during disk I/O.
2) 14,250,000 disk reads is a really small number of calls compared to the complexity of video encoding.
3) Unless the video encoder can process a 10..40 MB/s input stream in real time (make it 3..5 MB/s if the drive is in PIO mode; this is not the video bitrate, it's how fast data is being pushed into the encoder), you'll never notice any encoding performance problems due to I/O. It's not a 3.5" floppy drive.

Quote:
Originally Posted by chrism01 View Post
Also, 57GB in + (say) 57GB out = 114GB of RAM used...
X_X
Why in the world have you decided that this will need 114GB of RAM?
Video encoders NEVER load the entire movie into memory.

Last edited by ErV; 07-16-2008 at 02:00 AM.
 
Old 07-16-2008, 05:20 AM   #11
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,360

Rep: Reputation: 2751
The OP said he has 57GB of files. All of that must pass through RAM to get to the CPU for conversion, and the new version has to pass back out again on the way to the disk...
 
Old 07-16-2008, 08:10 AM   #12
ErV
Senior Member
 
Registered: Mar 2007
Location: Russia
Distribution: Slackware 12.2
Posts: 1,202
Blog Entries: 3

Rep: Reputation: 62
Thumbs down

Quote:
Originally Posted by chrism01 View Post
The OP said he has 57GB of files. All of that must pass through RAM to get to the CPU for conversion, and the new version has to pass back out again on the way to the disk...
Look, no video converter loads the whole file into RAM.

And when given a file list, no video converter loads all the files from the list into RAM.

Although, yes, the files must pass through RAM, they don't have to be loaded into RAM completely and all at once.

Or how do you think a 256MB machine can encode 4.7GB DVDs into AVIs?

Video/audio encoding is handled as a stream, i.e. the conversion utility loads a block of input (not a disk block) into a buffer, processes it, then immediately appends the result to the destination file. Then it loads the next block of input into the same buffer, reusing the memory.

If the OP by chance uses a converter that DOES load whole files into RAM, then I recommend uninstalling it and using normal software, like mencoder.

I've converted a lot of video files at once using mencoder, and it didn't require anywhere near as much RAM as you say. So when you talk about 57GB of RAM, you are talking nonsense.

Last edited by ErV; 07-16-2008 at 08:13 AM.
 
Old 07-16-2008, 02:13 PM   #13
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
In the end I just split the list manually into 2 folders and ran the same script on both folders.
And no, I don't have 114GB of RAM; I have 2GB, and it only used just over 1GB when both processes were running.
It was more likely that I would run into trouble with disk space
(57GB in + about 18GB out).

P.S. I was using ffmpeg; I had played around with mencoder but was getting severe audio/video desync.

Last edited by dasy2k1; 07-16-2008 at 02:15 PM.
 
Old 07-17-2008, 12:24 AM   #14
sundialsvcs
LQ Guru
 
Registered: Feb 2004
Location: SE Tennessee, USA
Distribution: Gentoo, LFS
Posts: 10,668
Blog Entries: 4

Rep: Reputation: 3945
Generally speaking, with a problem like this one you need to focus on whatever-it-is that is most likely to be "the ruling constraint."

Nearly always, and certainly in this case, the constraint is going to be the I/O subsystem. The disk drive(s) have to transport many gigabytes of data to and from storage. The CPU, by comparison, has a trivial amount of work to do.

If you did this sort of thing frequently, a high-performance I/O subsystem and the use of multiple drives would be important, so that the drives would not spend much time "seeking"... that is, moving the read/write head around. Through a nice, capacious I/O channel, you'd be slurping data from one drive, converting it, and sending it out to the other drive (which might live on an entirely separate controller, pinned to a different IRQ).

You really can't control what the CPU core(s) are doing at any particular time... you just hope to take reasonable advantage of them. But since the CPU isn't going to be the bottleneck, it's much more important to be sure that you don't burden the I/O system.
 
Old 07-18-2008, 06:20 AM   #15
dasy2k1
Member
 
Registered: Oct 2005
Location: 127.0.0.1
Distribution: Manjaro
Posts: 963

Original Poster
Rep: Reputation: 36
Quote:
Originally Posted by sundialsvcs View Post

Nearly always, and certainly in this case, the constraint is going to be the I/O subsystem. The disk drive(s) have to transport many gigabytes of data to and from storage. The CPU, by comparison, has a trivial amount of work to do
Hate to break it to you, but when re-encoding video (from MPEG-2 to Xvid) the constraint is most certainly the CPU!

I can copy 50-odd GB on my hard drive (I/O-limited) in about 20 minutes.

Crunching 50GB of video takes 12-14 hours with CPU usage at 100% the whole time. Running 2 processes at once to use both cores drops this to about 8 hours.
 
  

