DD with tee to multiple outputs - Permission denied
Hello!
I am quite new to Linux, and I have run across a problem using dd and tee.
Basically, I need to image around 100 32 GB USB sticks, and I am trying to find ways to maximise efficiency as I go.
I understand the basic dd if=/dev/sdx of=/dev/sdy
I recently came across using tee to enable dding to multiple outputs while still only reading from the source once.
This command works for me in that regard:
dd if=/dev/sdx | tee >(dd of=/dev/sdy) | dd of=/dev/sdz
After much searching I managed to find a Belkin powered 7 port USB hub that I have, and I was hoping to be able to use this to image 7 USBs at the same time using tee as above.
The command I am trying to use is as follows:
dd if="/media/root/Samsung D3/image.dd" | tee >(dd of=/dev/sdc) | tee >(/dev/sdd) | tee >(/dev/sde) | tee >(/dev/sdf) | tee >(/dev/sdg) | dd of=/dev/sdh
However, I get the message:
bash: /dev/sde: Permission denied
bash: /dev/sdf: Permission denied
bash: /dev/sdg: Permission denied
bash: /dev/sdd: Permission denied
If I go back and use dd to copy to one of these "denied" drives, they all work individually, so the USB sticks themselves are not locked up or broken.
I was wondering if anyone could help me in trying to pinpoint the cause of the problem. I am wondering if what I am attempting is just insane, and cannot be done?! Maybe dd cannot handle copying to this many drives full stop?
Alternatively, is it possible that even though the hub is powered by the mains, my computer is still unable to handle the total power needed?
I could not find linux drivers for the hub, but the fact that the drives all show up in fdisk -l makes me think that they are being recognised as they should be?
Any help would be much appreciated! It is nice to find a forum that states its aim as being to be friendly to all comers!
I am happy to provide more information as necessary. (The hub model is F5U307 if that helps.)
dd if="/media/root/Samsung D3/image.dd" | tee >(dd of=/dev/sdc) | tee >(/dev/sdd) | tee >(/dev/sde) | tee >(/dev/sdf) | tee >(/dev/sdg) | dd of=/dev/sdh
For drives d, e, f, and g you have left out the dd process in the process substitution, so the shell is trying to execute the device as a command rather than write to it. Since the device inode does not have execute permission, you get "Permission denied".
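To make the fix concrete, here is a sketch of the corrected pipeline with every output, including the ones inside the process substitutions, given its own `dd of=...`. So the sketch is safe to run anywhere, throwaway files stand in for the real /dev/sdc through /dev/sdh targets.

```shell
# Corrected shape of the pipeline: each >() contains a full `dd of=...`.
# Throwaway files stand in for the real block devices in this sketch.
set -e
src=$(mktemp) out1=$(mktemp) out2=$(mktemp) out3=$(mktemp)
dd if=/dev/urandom of="$src" bs=64K count=1 2>/dev/null   # fake 64 KB image

dd if="$src" 2>/dev/null \
  | tee >(dd of="$out1" 2>/dev/null) \
  | tee >(dd of="$out2" 2>/dev/null) \
  | dd of="$out3" 2>/dev/null

sleep 1   # give the process-substitution dd's a moment to finish flushing
cmp "$src" "$out1" && cmp "$src" "$out2" && cmp "$src" "$out3" \
  && echo "all targets match the source"
```

For real devices, replace the mktemp files with the /dev/sdX paths and drop the final comparison.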
A problem with using tee is that it will not use good buffering practice for device I/O.
You should be able to do "dd if=/dev/sdx | tee /dev/sdy >/dev/sdz" though, and that would write to /dev/sdy and /dev/sdz. Unfortunately, the buffering will be incorrect making the entire thing go SLOW.
If you want it to go fast try the following as a script instead:
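The script itself was not captured in this thread; the following is a hedged reconstruction based on the description that follows: independent dd copies started in the background, each with a large block size, so every copy after the first reads the source out of the kernel's page cache. Throwaway files stand in here for /dev/sdx and the output drives.

```shell
# Reconstruction (not the original script): one backgrounded dd per
# output drive, all reading the same source, large block size.
set -e
SRC=$(mktemp)                        # stands in for /dev/sdx
dd if=/dev/urandom of="$SRC" bs=1M count=4 2>/dev/null

targets=()
for i in 1 2 3; do                   # one iteration per output drive
    t=$(mktemp)
    targets+=("$t")
    dd if="$SRC" of="$t" bs=1M 2>/dev/null &   # 1 MB buffer, backgrounded
done
wait                                 # block until every copy finishes

for t in "${targets[@]}"; do
    cmp "$SRC" "$t"
done
echo "all copies match"
```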
The reason this should run faster is that the kernel will buffer the data from /dev/sdx... and if the second dd command gets in there soon enough (it should as it takes a while for a 1MB buffer to fill) then it will actually read the already buffered data - without having to re-read /dev/sdx. This also bypasses the pipes (64k maximum buffer) and repeated copies to/from kernel buffers.
NOTE: /dev/sdx, /dev/sdy and /dev/sdz all ought to be on separate controllers. Internal disks are faster and may not be a problem, but if input and output are on the same controller, it will take nearly twice as long (likewise if the two output devices share a controller). This effect is less noticeable with small buffers - but the elapsed time will still be VERY long.
Last edited by jpollard; 08-30-2015 at 06:03 AM.
Reason: removed erroneous paragraph.
Awesome - thanks guys. How did I miss dd off the others :s
With the idea for the script using & would that still work for seven at once? As in, would it still get to the seventh before the buffer filled?
Thanks for the great tip on controllers, I hadn't thought of that, but I can definitely space them out more now.
It should. The one case I did see this work (indirectly, the admin was rebuilding a lab) was for installing systems on 14 workstations at once. He said if he managed to get all 14 going within 10-15 seconds (it was network based) then it only took an hour. If not, it took two or three times as long, as the buffering flushed and reloaded, causing the server to thrash re-reading data.
In the dd case, they should all finish very nearly simultaneously (barring controller contention...)
With tee forcing a synchronization point, they all will finish that way; but I think the elapsed time will be longer owing to the smaller buffers being handled.
As a corollary to that, how much RAM do you have?
Best solution: find a machine with more than 32 GB and just let the first copy run with a large bs on the output. Every other copy will use the data in the page cache - no re-reading (best case).
Otherwise, simply adjust the size by using count, skip and seek - that way you keep the active set below, say, 70% of your actual RAM.
Untested, but seems sane.
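The count/skip/seek idea can be sketched as follows: walk the image in chunks small enough to stay resident in the page cache, writing each chunk to every target before moving on. The chunk and image sizes here are tiny so the sketch is safe to run, and files stand in for the real devices.

```shell
# Sketch: copy in chunks, each chunk written to all targets before the
# next chunk is read, keeping the active set inside the page cache.
set -e
SRC=$(mktemp) T1=$(mktemp) T2=$(mktemp)
dd if=/dev/urandom of="$SRC" bs=1M count=8 2>/dev/null   # fake 8 MB image

BS=1M       # block size
CHUNK=2     # blocks per pass; in real use, pick CHUNK*BS well under ~70% of RAM
TOTAL=8     # total blocks in the image
for ((off = 0; off < TOTAL; off += CHUNK)); do
    for t in "$T1" "$T2"; do
        # skip= moves the read offset, seek= the write offset, both in blocks
        dd if="$SRC" of="$t" bs=$BS count=$CHUNK skip=$off seek=$off \
           conv=notrunc 2>/dev/null
    done
done

cmp "$SRC" "$T1" && cmp "$SRC" "$T2" && echo "chunked copies match"
```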
Thinking along those lines... If you DO have enough memory, you could even make the original copy into a tmpfs filesystem (/tmp is frequently that) or a ramfs.
After that ALL copies would be from the cache... and yet another controller available for output.
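A minimal sketch of the staging idea, assuming /tmp is tmpfs (check with `df -T /tmp`): copy the image into RAM once, then point every dd at the staged copy. A throwaway file stands in for both the source image and the target drive.

```shell
# Stage the image in a RAM-backed filesystem, then copy from the staged file.
set -e
src=$(mktemp)                         # stands in for the image on the USB drive
dd if=/dev/urandom of="$src" bs=1M count=2 2>/dev/null

staged=/tmp/image-staged.$$           # /tmp is frequently tmpfs
cp "$src" "$staged"                   # one read of the slow source...

out=$(mktemp)
dd if="$staged" of="$out" bs=1M 2>/dev/null   # ...then every copy reads from RAM
cmp "$src" "$out" && echo "staged copy matches"
rm -f "$staged"
```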