I admit to laziness: Can I "fork" a stream of data (dual use)?
This weekend I used Steve Litt's rawread scheme to duplicate SuSE 8.2.
I also followed his advice and compared the md5sums of the source and target CD-ROMs. That means reading five disks three times each (once to get the data, once to get the original md5sum, and once more after burning the new CD). Well, that adds up to 15 CD reads ...
So here are my questions: When I copy the source CD with something like
Code:
dd if=/dev/cdrom of=/my/temporary/iso bs=n count=m
how can I feed the data that dd writes to the hard disk (the "of=/my/temporary/iso" part) to md5sum at the same time? I mean, why read the source CD twice, once for the temporary image and once for md5sum?
I was wondering whether there is something like "||" for two parallel / diverging pipes, or some way to influence stdout (1>) to do that?
Actually I'd imagine that md5sum could also work in between the if= and of= of the dd statement, but that would break dd ... so I probably need a "cloned" data stream for md5sum, but how do I get that?
P.S.: While I'm at it, is there a utility to compare a set (source and target) of md5sums?
After dumping the ISO image, just enter the following command:
Code:
md5sum /my/temporary/iso
Computing the md5sum of the ISO image from the hard disk is much faster than from the CD.
The magic command to use the standard output of a program twice is tee. To send the output of dd to stdout, omit the of= option.
from man md5sum
md5sum [OPTION] --check [FILE]
DESCRIPTION
Print or check MD5 (128-bit) checksums. With no FILE, or when FILE is -, read standard input.
-c, --check
check MD5 sums against given list
So: create a file with file names and md5sums (i.e. the "given list") and run the md5sum program with the options above.
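To make the "given list" idea concrete, here is a minimal sketch, assuming GNU md5sum. The file names cd1.iso, cd2.iso and images.md5 are made up for illustration, and the tiny text files only stand in for real images.

```shell
#!/bin/sh
# Work in a scratch directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical stand-ins for two ISO images.
printf 'disk one\n' > cd1.iso
printf 'disk two\n' > cd2.iso

# Record the checksums: md5sum writes one "<hash>  <file name>" line per file.
md5sum cd1.iso cd2.iso > images.md5

# Verify the files against the list; the exit status is 0 only if all match.
if md5sum -c images.md5; then check_ok=yes; else check_ok=no; fi

# Damage one file and check again; md5sum -c should now report a failure.
printf 'x' >> cd2.iso
if md5sum -c images.md5 >/dev/null 2>&1; then check_bad=yes; else check_bad=no; fi

echo "first check passed: $check_ok, check after damage passed: $check_bad"
```

The same list file could be written on the source side and checked on the target side, which would answer the "compare a set of md5sums" question without a separate tool.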
Apparently, you like parallel processes. Try the following:
Code:
dd if=/dev/cdrom bs=n count=m | tee /my/temporary/iso | md5sum
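For anyone who wants to try the tee idea without a CD in the drive, here is a small sketch. It assumes GNU coreutils, uses a file of random data as a stand-in for /dev/cdrom, and the paths and the block size of 2048 are only illustrative choices.

```shell
#!/bin/sh
# Scratch directory; source.img plays the role of the CD (an assumption for this demo).
workdir=$(mktemp -d)
src="$workdir/source.img"
iso="$workdir/copy.iso"
dd if=/dev/urandom of="$src" bs=2048 count=64 2>/dev/null

# Read the source once: tee writes the image to disk and passes the
# same bytes on to md5sum, which hashes them as they stream by.
sum_stream=$(dd if="$src" bs=2048 2>/dev/null | tee "$iso" | md5sum | cut -d' ' -f1)

# For comparison only: hash the image that landed on disk.
sum_file=$(md5sum "$iso" | cut -d' ' -f1)

echo "stream: $sum_stream"
echo "file:   $sum_file"
```

Note that the streamed checksum describes the bytes as they were read from the source. Whether the copy on disk was written correctly is only confirmed by the second md5sum run, which reads the image back from the hard disk.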
Quote:
Originally posted by stonux
After dumping the ISO image, just enter the following command: md5sum /my/temporary/iso
Making the md5sum of the ISO image from harddisk is much faster than from CD.
But that's so ... so ... so inartistic.
And it still means reading data twice.
Moreover, it doesn't discover errors during transfer from the CD to the hard disk.
Quote:
Originally posted by stonux
The magic command to use the standard output of a program twice is tee. To send output of dd to stdout, omit the of= option.
Absolutely M*A*R*V*E*L*O*U*S
(read that as bold+italic+underlined+blinking+magenta on turquoise Background).
That's what I was looking for. And it works, yippeeee.
I was simply stumped, I didn't know where to look. Thanks a load.
This forum is fun.
Quote:
Originally posted by stonux
so: create a file with file names and md5sums (i.e. the "given list") and issue the md5sum program with the options above.
That's what I will try next. Thanks in advance.
Quote:
Originally posted by stonux
Apparently, you like parallel processes.
Who doesn't? Reading the data once and using it several times -- that's it.
Quote:
Originally posted by stonux
Try the following:
dd if=/dev/cdrom bs=n count=m | tee /my/temporary/iso | md5sum
Did that, and it worked, even with the bash script (rawread).
perhaps the tee command is what you're looking for.
I guess you can use this in a pipe...
dd if=/dev/cdrom | tee some.iso | md5sum -
?
But I'm not sure how efficient this really is. dd seems to write large blocks at once, and perhaps tee doesn't. Use the "time" command to find out which is faster.
md5sum -c is used to check files; it requires an .md5 file as input. This file contains a master list of all files and their md5 sums.
Yes, it was tee, and it is definitely faster (subjectively at least). Remember that dd reads slowly from the CD-ROM by DMA and writes to the disk by DMA, while the md5sum hashing is done by the CPU. I'd guess that in this case there is little competition for resources.
Quote:
This file contains some master list of all files, and their md5 sums.
Yes, but in what format? You can see in my last post what I used -- without success.
There it says one has to specify the file type (i.e. "." for text). Note that after the md5sum result comes
Code:
<blank><minus><blank><dot><blank><file name>
Obviously, that stuck in md5sum's gullet, so first I left out the <blank><dot>.
The next problem was the minus sign after the md5sum-result. That was part of the output of md5sum to the screen (stdout). When redirected to a file, the output was <md5sum-result><blank><filename>, which I could use directly as input for the next run of md5sum.
So, thanks a lot, stonux and yapp, for a most instructive thread.
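A note on the minus sign discussed above: as far as GNU md5sum is concerned, the "-" is printed in place of a file name whenever the data arrives on standard input (as it does at the end of the tee pipeline), and a real file name is printed when a file is named on the command line. The sketch below shows both cases and one way to turn the piped result into a line that md5sum -c accepts. It assumes GNU md5sum; image.iso and image.md5 are hypothetical names.

```shell
#!/bin/sh
# Scratch directory with a tiny stand-in for an ISO image.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'example data\n' > image.iso

# Data from standard input: md5sum knows no file name and prints "-".
line_stdin=$(md5sum < image.iso)

# File named on the command line: md5sum prints that name instead.
line_file=$(md5sum image.iso)

echo "from stdin: $line_stdin"
echo "from file:  $line_file"

# Build a check-list line from the piped result: keep the hash and
# put the real file name where the "-" was (hash, two blanks, name).
hash=$(printf '%s\n' "$line_stdin" | cut -d' ' -f1)
printf '%s  %s\n' "$hash" image.iso > image.md5

# The list can now be used as input for a check run.
md5sum -c image.md5
```

So a checksum captured from the pipe needs its "-" replaced by the image's file name before it can serve as an entry in the list for md5sum -c.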