LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   What is a command for fast parallel copying of whole directory to two different disks of notebook? (https://www.linuxquestions.org/questions/linux-software-2/what-is-a-command-for-fast-parallel-copying-of-whole-directory-to-two-different-disks-of-notebook-4175583738/)

gdpr004 07-04-2016 10:40 AM

What is a command for fast parallel copying of whole directory to two different disks of notebook?
 
I need a command that reads the files ONCE and then simultaneously copies them to multiple destinations, namely an external USB HDD and the internal HDD of my notebook. Something like "cp -vax /from/here /to/here01 /to/here02" (of course, in reality this example doesn't work, because "cp" simply can't do that). Running two instances of "cp" in parallel doesn't count, because then the files would be read twice. I have many files and a complex directory tree to copy, so just figuring out how to copy ONE file to several destinations simultaneously isn't enough.

jefro 07-05-2016 03:35 PM

Hello and welcome to LQ.

If you mean to use this for a live state, then I doubt it will work as expected. If you just want to limit reads on the original, then maybe look at the ideas here and see if they would do: http://serverfault.com/questions/137...-same-filelist

syg00 07-05-2016 08:15 PM

Quote:

Originally Posted by roboq6 (Post 5570283)
Running two instances of "cp" in parallel doesn't count, because then the files would be read twice.

Just proves once again it's a good thing (operating) system designers take a more holistic view than the users.
What counts is not how often each file/directory is read, but how often the disk is physically accessed.

Kick off two cp in background and let the system handle things appropriately.
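A minimal sketch of that suggestion, with made-up /tmp demo paths (not from the thread): both cp runs start together, so the second one is mostly served from the page cache rather than from a second physical read of the disk.

```shell
#!/bin/bash
# Two cp jobs launched in the background at the same time; the kernel
# page cache satisfies the second read without hitting the disk again.
# The /tmp/cp-demo paths are illustrative only.
SRC=/tmp/cp-demo/src
mkdir -p "$SRC" /tmp/cp-demo/dst1 /tmp/cp-demo/dst2
echo "payload" > "$SRC/file.txt"

cp -vax "$SRC/." /tmp/cp-demo/dst1/ &
cp -vax "$SRC/." /tmp/cp-demo/dst2/ &
wait   # block until both background copies complete
```

The -a flag preserves the tree and attributes and -x stays on one filesystem, matching the OP's "cp -vax" example.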

jpollard 07-11-2016 07:24 AM

Quote:

Originally Posted by syg00 (Post 5571076)
Just proves once again it's a good thing (operating) system designers take a more holistic view than the users.
What counts is not how often each file/directory is read, but how often the disk is physically accessed.

Kick off two cp in background and let the system handle things appropriately.

Yup. One place would do a complete reinstall of a lab containing 15-20 machines (I think it was). As long as all the client systems were booted within a couple of seconds, the reinstall took only an hour. If one or two were balky and needed 5 minutes to get the boot started, it could take up to four hours as everything started thrashing (they found it better to just wait an hour and then do the balky systems in a separate pass).

slackartist 07-15-2016 02:02 PM

I don't understand that. Are you talking about a PXE install?

AwesomeMachine 07-15-2016 10:19 PM

I'm pretty sure the original idea would not actually save much, if any, time, although theoretically it could.

jpollard 07-16-2016 05:53 AM

Quote:

Originally Posted by AwesomeMachine (Post 5576745)
I'm pretty sure the original idea would not actually save much, if any, time, although theoretically it could.

Nearly none.

Two copies of the same file can be made while reading the file ONCE. As long as the complete file fits in the memory cache, the second copy will read from the cache... 2/3/4/... copies started in parallel at the same time will use only the cache, even if the file is too large for it. Problems only show up when one of the destinations is REALLY slow, forcing a cache reload - and thus starting to thrash the cache.

The DISADVANTAGE of the parallel write mentioned is that the copies are now limited to the speed of the slowest device...

pan64 07-16-2016 05:55 AM

What about the command tee:
Code:

cat inputfile | tee out1 out2 .... outn
To the OP:
it would be nice if you told us whether that really helps. It depends on a lot of things, for example the size of the file, the cache, I/O speed...
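tee on its own only duplicates a single stream, but the OP wants a whole directory tree. One way to extend the idea, sketched here with made-up /tmp demo paths and assuming GNU tar plus bash process substitution: tar serializes the tree in one read pass, tee duplicates the byte stream, and two tar extractions write the two copies.

```shell
#!/bin/bash
# One read pass over the whole tree: tar -c reads the source once,
# tee splits the stream, and each tar -x writes one destination.
# >( ... ) is bash process substitution; /tmp/tee-demo paths are
# illustrative only, not from the thread.
SRC=/tmp/tee-demo/src; DST1=/tmp/tee-demo/dst1; DST2=/tmp/tee-demo/dst2
mkdir -p "$SRC" "$DST1" "$DST2"
echo "sample" > "$SRC/file.txt"

tar -C "$SRC" -cf - . | tee >(tar -C "$DST1" -xf -) | tar -C "$DST2" -xf -
sleep 1   # give the process-substitution extraction a moment to finish
```

As noted above for parallel cp, the pipeline runs at the speed of the slowest destination, since tee blocks when either consumer stalls.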

Shadow_7 07-17-2016 02:33 AM

Any reason there couldn't just be one copy of the file(s) on a network share? I suppose rsync could do what you want, but it's going to read the source for each instance - a technicality if you make one copy to a clone and then read the clone multiple times. tee would probably follow your rules more closely, if you consider each pipe as not being a read. A series of daisy-chained mirrors might also work if the network bandwidth is set up that way, but that's more a faster way than the best way, assuming no breaks in the chain. There are pros and cons whichever route is chosen.

jpollard 07-17-2016 04:43 AM

As long as all the computers start copying in a second or two, it would work just fine.

If a couple are out of sync then the server could start thrashing - depending on the size of the file(s) being copied.

gdpr004 07-17-2016 10:39 AM

Quote:

Originally Posted by syg00 (Post 5571076)
Just proves once again it's a good thing (operating) system designers take a more holistic view than the users.
What counts is not how often each file/directory is read, but how often the disk is physically accessed.

Kick off two cp in background and let the system handle things appropriately.

Okay, it seems like you're right: my OS will just use the cache to achieve better performance.

