LinuxQuestions.org

Converting video formats (https://www.linuxquestions.org/questions/programming-9/converting-video-formats-4175429488/)

BenCollver 10-06-2012 11:22 AM

As I read the man page, the -exec option does not execute multiple commands in parallel. With the ; suffix, it substitutes a single file in the command. With the + suffix, it substitutes multiple files in the command. Either way, it runs a single command at a time.
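
A minimal sketch of the two suffixes, assuming a scratch directory with three made-up .ogg files (the file names and the inline sh counter are illustrative only, not part of the original command). It counts how often find invokes the command in each case:

```shell
#!/bin/sh
# Scratch directory with three hypothetical .ogg files.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.ogg" "$demo_dir/b.ogg" "$demo_dir/c.ogg"

# The inline sh script prints its argument count, one line per invocation.
# With \; find runs the command once per file.
runs_semicolon=$(find "$demo_dir" -type f -name '*.ogg' \
    -exec sh -c 'echo "$#"' sh {} \; | wc -l | tr -d ' ')

# With + find batches the files into as few invocations as it can.
args_plus=$(find "$demo_dir" -type f -name '*.ogg' \
    -exec sh -c 'echo "$#"' sh {} +)
runs_plus=$(printf '%s\n' "$args_plus" | wc -l | tr -d ' ')

echo "with ';': $runs_semicolon invocations"
echo "with '+': $runs_plus invocation(s), $args_plus arguments"
rm -rf "$demo_dir"
```

With three files this should report three invocations for ';' and a single invocation carrying three arguments for '+'. Neither form starts a second command before the first has finished.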

porphyry5 10-07-2012 09:35 AM

Quote:

Originally Posted by BenCollver2 (Post 4798873)
As I read the man page, the -exec option does not execute multiple commands in parallel. With the ; suffix, it substitutes a single file in the command. With the + suffix, it substitutes multiple files in the command. Either way, it runs a single command at a time.

We're reading the man page the same way, but I think we're talking at cross purposes. What I'm asking is: what would be the effective difference between
Code:

find . -type f -name '*.ogg' -print0 | xargs -0 -n 1 -P 2 ./transcode.sh
and
find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +

Presumably both are feeding a stream of filenames to xargs, and it is xargs that runs two commands in parallel.

Please forgive me for imposing on your patience by belaboring this point, but when I first began using find (in place of ls) I assumed one would feed its output through a pipe to the next command, as one does with ls. Surprise, it didn't work, had to use the -exec option to get it to go. Yet here you are using a pipe for find's output. The only other difference is your use of -print0, but that seems to be for the benefit of xargs -0, not for enabling the pipe.

TobiSGD 10-07-2012 10:08 AM

Forgive me my ignorance, but doesn't xargs expect to be fed from stdin? So does this even work with the -exec option, since it does not have a pipe to stdin?

rknichols 10-07-2012 01:45 PM

Quote:

Originally Posted by porphyry5 (Post 4799423)
Code:

find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +

That command line prints each filename on the terminal (the -print0 output is not redirected) and tries to run xargs with its standard input coming from the terminal and a prototype command line that consists of "./transcode.sh" and all of the filenames as arguments (due to the "{} +"). Then xargs will wait indefinitely for input. If you type a ctrl-D on the terminal, xargs will run that prototype command line with what it perceives as no additional arguments. Not exactly what you expected.

The net result will be a single execution of ./transcode.sh with all of the filenames as arguments (i.e., a single thread).
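
The behaviour described above can be reproduced without a terminal. This is only a sketch under assumptions: it assumes GNU xargs (which runs the command once even when stdin is empty, unless -r is given), uses an inline sh script as a stand-in for ./transcode.sh, and uses </dev/null in place of typing ctrl-D. The file names are made up.

```shell
#!/bin/sh
# Stand-in for ./transcode.sh: report how many arguments one run received.
probe='echo "run with $# argument(s)"'

# What "find ... -exec xargs ... {} +" amounts to: the filenames are already
# arguments of xargs itself, and stdin delivers nothing (</dev/null = ctrl-D).
from_args=$(xargs -0 -n 1 -P 2 sh -c "$probe" sh a.ogg b.ogg c.ogg </dev/null)

# The pipe version: the filenames arrive on stdin, so -n 1 splits them up
# into one run per file and -P 2 can overlap two of those runs.
from_pipe=$(printf '%s\0' a.ogg b.ogg c.ogg | xargs -0 -n 1 -P 2 sh -c "$probe" sh)

echo "filenames as xargs arguments:"
printf '%s\n' "$from_args"
echo "filenames on xargs stdin:"
printf '%s\n' "$from_pipe"
```

In the first case -n 1 and -P 2 have nothing to work with, because they only apply to items that xargs reads from stdin.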

porphyry5 10-08-2012 08:02 AM

Quote:

Originally Posted by TobiSGD (Post 4799442)
Forgive me my ignorance, but doesn't xargs expect to be fed from stdin? So does this even work with the -exec option, since it does not have a pipe to stdin?

Right, but I'm trying to discover why it doesn't: what is the difference between find's -exec and a pipe?
I'm operating on these premises:
1) Any linux cli app accepts input from stdin
2) A pipe feeds the output of the previous app through stdin to the subsequent app
3) My earlier experience with find was that its output won't pipe, you use the -exec option to feed find's output to a subsequent app
4) BenCollver2 gave an example showing find will send output through a pipe, so my premise 3) is wrong
Code:

find . -type f -name '*.ogg' -print0 | xargs -0 -n 1 -P 2 ./transcode.sh
That being so, what is the effective difference between piping find's output and using its -exec option, other than the obvious, because in this case you must use a pipe, in that case you must use -exec?

Why does find have an -exec option at all? Why not just use a pipe in all cases?

Reuti 10-08-2012 08:19 AM

Quote:

Originally Posted by BenCollver2 (Post 4793602)
Assuming you have a processor with 2 cores, you could speed it up by transcoding 2 videos in parallel. Off the cuff, it could look like the following:
Code:

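cores=2
i=0
for fl in *.ogg
do
    # ffmpeg takes the output file as a positional argument; it has no -o option
    ffmpeg -i "$fl" "${fl%.*}.webm" >/dev/null 2>&1 &
    i=$((i + 1))
    # once $cores jobs are running, wait for the whole batch to finish
    if [ "$i" -ge "$cores" ]
    then
        wait
        i=0
    fi
done
wait    # collect a final, partially filled batch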


This is one way to go, but it assumes that the two processes in each batch take about the same time; otherwise a core sits idle until the slower one finishes. With more and more cores in a local machine, some time ago I started installing a queuing system locally on each user's workstation as well. The usual suspects are GridEngine, Torque or Slurm. This way you submit the jobs and they are handled as resources (cores or memory) become free.
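
Short of installing a queuing system, the idle-core problem of fixed batches can be illustrated with xargs -P, which is a different and much simpler technique than GridEngine, Torque or Slurm. The sketch below is only an illustration: the job is a hypothetical stand-in that sleeps instead of calling ffmpeg, and it assumes an xargs that supports -P (GNU and BSD xargs do; POSIX does not require it).

```shell
#!/bin/sh
# Hypothetical stand-in for one encode job: sleep for the given number of
# seconds, then report. A real run would call ffmpeg here instead.
job='sleep "$1"; echo "done $1"'

# One slow job (1 s) and three fast ones, with two slots. xargs -P starts the
# next job as soon as any slot frees up, so the fast jobs do not wait for the
# slow one the way a fixed batch of two followed by "wait" would.
finished=$(printf '%s\n' 1 0 0 0 | xargs -n 1 -P 2 sh -c "$job" sh)
printf '%s\n' "$finished"
```

The three fast jobs should all report before the slow one does. Unlike a real queuing system, this only tracks the number of running processes, not memory or other resources.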

ramram29 10-08-2012 08:27 AM

I recommend using dual disks. Get the fastest ones you can, then read from one and write to the other. Using an ext2 temporary partition just for encoding also helps (no file system journal to slow you down). Finally, if your output is not too big (less than 1G), you can save the temporary encoded file to memory in /dev/shm. However, make sure you have enough free memory, lower the swappiness, and also move the temp file right after it is created (add that to the script).
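
A rough sketch of the "encode to memory, then move" step, under stated assumptions: encode_to is a hypothetical helper name, cp stands in for the real encode command, and the fallback to an ordinary temporary directory is added here only so the sketch also runs where /dev/shm is missing.

```shell
#!/bin/sh
# Pick a RAM-backed scratch directory when available; otherwise fall back
# to the default temporary directory.
if [ -d /dev/shm ] && [ -w /dev/shm ]; then
    scratch=$(mktemp -d /dev/shm/encode.XXXXXX)
else
    scratch=$(mktemp -d)
fi

# encode_to SRC DEST (hypothetical helper): write the result into the scratch
# directory first, then move it to its destination as soon as it is complete.
encode_to() {
    src=$1
    dest=$2
    tmp="$scratch/$(basename "$dest")"
    # cp is a placeholder; substitute the real encode command here
    cp "$src" "$tmp" && mv "$tmp" "$dest"
}
```

A call would look like `encode_to input.ogg /data/out/input.webm` (paths made up). Moving the file immediately keeps the memory footprint to one output file at a time.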

porphyry5 10-08-2012 08:28 AM

Quote:

Originally Posted by rknichols (Post 4799590)
That command line prints each filename on the terminal (the -print0 output is not redirected) and tries to run xargs with its standard input coming from the terminal and a prototype command line that consists of "./transcode.sh" and all of the filenames as arguments (due to the "{} +"). Then xargs will wait indefinitely for input. If you type a ctrl-D on the terminal, xargs will run that prototype command line with what it perceives as no additional arguments. Not exactly what you expected.

The net result will be a single execution of ./transcode.sh with all of the filenames as arguments (i.e., a single thread).

I think you are saying the xargs command will form 2 commands utilizing transcode.sh. In BenCollver2's command
Code:

find . -type f -name '*.ogg' -print0 | xargs -0 -n 1 -P 2 ./transcode.sh
the first filename produced by find will be included in the first command generated by xargs, the 2nd filename will be included in the 2nd command generated by xargs.
But in
Code:

find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +
both filenames are included in the first command generated by xargs, which then hangs waiting for input to complete the 2nd command.

So if I have that right, then it means that a pipe feeds the previous app's output singly, line by line, to the subsequent app. But find's -exec option concatenates all the lines of output produced by find and feeds that concatenation as a single item to the subsequent app. Is that correct?

Reuti 10-08-2012 08:28 AM

Quote:

Originally Posted by porphyry5 (Post 4800202)
That being so, what is the effective difference between piping find's output and using its -exec option, other than the obvious, because in this case you must use a pipe, in that case you must use -exec?

Why does find have an -exec option at all? Why not just use a pipe in all cases?

– With a pipe and xargs you can feed a chosen number of arguments to the called application, with find … -exec it’s one or all.

– You can use more than one -exec to find.
– Some -exec variants execute the command after changing the working directory to the location of the found file.

Reuti 10-08-2012 08:57 AM

Quote:

Originally Posted by porphyry5 (Post 4800225)
But in
Code:

find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +
both filenames are included in the first command generated by xargs, which then hangs waiting for input to complete the 2nd command.

So if I have that right, then it means that a pipe feeds the previous app's output singly, line by line, to the subsequent app. But find's -exec option concatenates all the lines of output produced by find and feeds that concatenation as a single item to the subsequent app. Is that correct?

The xargs is waiting for input – but there is none unless you type something or generate some:
Code:

echo foobar | find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +
The piped input “foobar” will never be used (unless you add -I{} to the xargs command).

rknichols 10-08-2012 10:37 AM

Quote:

Originally Posted by porphyry5 (Post 4800202)
I'm operating on these premises:
1) Any linux cli app accepts input from stdin
2) A pipe feeds the output of the previous app through stdin to the subsequent app
3) My earlier experience with find was that its output won't pipe, you use the -exec option to feed find's output to a subsequent app

Most (not all) cli apps that take filenames as arguments will in the absence of any filename arguments accept a data stream, not a list of files to be opened, on stdin. I'll use the wc command (print newline, word, and byte counts for each file) as a convenient example because its output tells you exactly what files it processed.
Code:

[rkn] ~ $ wc /etc/profile /etc/fstab
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

With the filenames given as arguments, wc prints the line, word, and byte counts for each file and, since it processed more than one file, also prints the totals.
Code:

[rkn] ~ $ find /etc/profile /etc/fstab -print | wc
      2      2      24
[rkn] ~ $ echo /etc/profile /etc/fstab | wc
      1      2      24

Piping the output from find yields a completely different result since wc is just evaluating the data stream it received from find: 2 lines, 2 words, 24 characters. The result using echo is similar except that echo has sent its output on just one line.
Code:

[rkn] ~ $ cat /etc/passwd | wc /etc/profile /etc/fstab
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

The wc command completely ignored what it received on stdin since it was given filename arguments.

The xargs command reads from stdin and uses that data to generate a list of arguments to the prototype command line:
Code:

[rkn] ~ $ echo /etc/profile /etc/fstab | xargs wc
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

The argument list generated by xargs is simply appended to whatever arguments were given on the prototype command line:
Code:

[rkn] ~ $ echo /etc/profile /etc/fstab | xargs wc /etc/passwd
  51  93 2643 /etc/passwd
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
 136  391 5498 total

As a side note, the wc command can be told to read a list of NUL-terminated filenames on stdin and process those files. In this, it acts much like "xargs -0":
Code:

[rkn] ~ $ find /etc/profile /etc/fstab -print0 | wc --files0-from=-
64 182 1363 /etc/profile
21 116 1492 /etc/fstab
85 298 2855 total

Using wc in this manner allows processing an unlimited number of files in a single invocation (generating a single overall total). Passing an extremely long list to xargs could result in more than one invocation of the target command if the list would exceed the kernel's limit on the maximum length of an argument list.
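
Hitting the kernel limit for real takes a very long list, so this sketch forces the same splitting with -n instead, which is an assumption made purely for illustration. With wc as the target, each of the resulting invocations would print its own "total" line.

```shell
#!/bin/sh
# Five input items, at most two per invocation: xargs has to run echo
# three times, and each run prints its own line.
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)
printf '%s\n' "$batches"

# The system limit, in bytes, on the combined size of the argument list
# and environment that xargs has to stay under.
arg_max=$(getconf ARG_MAX)
echo "ARG_MAX on this system: $arg_max"
```

The first part prints "1 2", "3 4" and "5" on separate lines, one per invocation of echo.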

dugan 10-08-2012 08:09 PM

Quote:

Originally Posted by dugan (Post 4793544)
Just do one patent-encumbered h264/mp3/mp4 encode and one patent-unencumbered vp8/vorbis/webm encode. Something like this should cover you:

Code:

ffmpeg -i input.avi -vcodec libx264 -acodec mp3 movie.mp4
ffmpeg -i input.avi -vcodec libvpx_vp8 -acodec vorbis movie.webm


I've since discovered that it's safe to use AAC in the place of MP3. According to a website, every browser that supports MP3 also supports the superior AAC codec. So this would be even better:

Code:

ffmpeg -i input.avi -vcodec libx264 -acodec aac movie.mp4
ffmpeg -i input.avi -vcodec libvpx_vp8 -acodec vorbis movie.webm


porphyry5 10-09-2012 11:29 AM

Quote:

Originally Posted by Reuti (Post 4800228)
– With a pipe and xargs you can feed a chosen number of arguments to the called application, with find … -exec it’s one or all.

– You can use more than one -exec to find.
– Some -exec variants execute the command after changing the working directory to the location of the found file.

Quote:

The xargs is waiting for input – but there is none unless you type something or generate some:

Code:

echo foobar | find . -type f -name '*.ogg' -print0 -exec xargs -0 -n 1 -P 2 ./transcode.sh {} +
The piped input “foobar” will never be used (unless you add -I{} to the xargs command).
Thank you for these 2 posts; clearly there is much more to these apps than the NAME line in the man page reveals.

porphyry5 10-09-2012 11:36 AM

Quote:

Originally Posted by rknichols (Post 4800357)
Most (not all) cli apps that take filenames as arguments will in the absence of any filename arguments accept a data stream, not a list of files to be opened, on stdin. I'll use the wc command (print newline, word, and byte counts for each file) as a convenient example because its output tells you exactly what files it processed.
Code:

[rkn] ~ $ wc /etc/profile /etc/fstab
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

With the filenames given as arguments, wc prints the line, word, and byte counts for each file and, since it processed more than one file, also prints the totals.
Code:

[rkn] ~ $ find /etc/profile /etc/fstab -print | wc
      2      2      24
[rkn] ~ $ echo /etc/profile /etc/fstab | wc
      1      2      24

Piping the output from find yields a completely different result since wc is just evaluating the data stream it received from find: 2 lines, 2 words, 24 characters. The result using echo is similar except that echo has sent its output on just one line.
Code:

[rkn] ~ $ cat /etc/passwd | wc /etc/profile /etc/fstab
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

The wc command completely ignored what it received on stdin since it was given filename arguments.

The xargs command reads from stdin and uses that data to generate a list of arguments to the prototype command line:
Code:

[rkn] ~ $ echo /etc/profile /etc/fstab | xargs wc
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
  85  298 2855 total

The argument list generated by xargs is simply appended to whatever arguments were given on the prototype command line:
Code:

[rkn] ~ $ echo /etc/profile /etc/fstab | xargs wc /etc/passwd
  51  93 2643 /etc/passwd
  64  182 1363 /etc/profile
  21  116 1492 /etc/fstab
 136  391 5498 total

As a side note, the wc command can be told to read a list of NUL-terminated filenames on stdin and process those files. In this, it acts much like "xargs -0":
Code:

[rkn] ~ $ find /etc/profile /etc/fstab -print0 | wc --files0-from=-
64 182 1363 /etc/profile
21 116 1492 /etc/fstab
85 298 2855 total

Using wc in this manner allows processing an unlimited number of files in a single invocation (generating a single overall total). Passing an extremely long list to xargs could result in more than one invocation of the target command if the list would exceed the kernel's limit on the maximum length of an argument list.

Thank you very much for this post, it is a very clear demonstration of what is going on in each of the various command lines, and adds a whole new dimension to the ever-useful wc command. I'm beginning to think linux apps are like onions; you may think you've reached the core of them, but if you dig a little you find a deeper layer of complexity beneath the one you've become familiar with.

