LinuxQuestions.org
Old 07-29-2013, 06:23 AM   #1
bkarthick
Member
 
Registered: Apr 2009
Posts: 56

Rep: Reputation: 0
Need help copying huge files


Hi All,

My manager has given me the task below to complete in one week, so I need your help.

"For promoting contents which are huge in size from one Unix server to another a tool/script will have to be developed which will take a backup and push the contents between servers. You need not develop the tool, but you will have to provide a detailed document on how this script can be developed, scenarios where this can be utilized and what are the tools that are available in the market which serves this purpose"
 
Old 07-29-2013, 06:49 AM   #2
pan64
LQ Guru
 
Registered: Mar 2012
Location: Hungary
Distribution: debian i686 (solaris)
Posts: 8,133

Rep: Reputation: 2273
It depends on the devices, the network, and related factors. You could start by reading about the history of rsync (see: http://www.samba.org/~tridge/phd_thesis.pdf), but there are other backup tools as well (not to mention a single scp command).
 
Old 07-29-2013, 07:34 AM   #3
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,130
Blog Entries: 2

Rep: Reputation: 4825
For huge files I would always prefer rsync over scp, since it can resume interrupted copies, so that you don't have to start from the beginning.
 
1 member found this post helpful.
Old 07-29-2013, 09:30 AM   #4
TB0ne
LQ Guru
 
Registered: Jul 2003
Location: Birmingham, Alabama
Distribution: SuSE, RedHat, Slack, CentOS
Posts: 17,961

Rep: Reputation: 3693
Quote:
Originally Posted by bkarthick
Hi All,
My manager has given me the task below to complete in one week, so I need your help.

"For promoting contents which are huge in size from one Unix server to another a tool/script will have to be developed which will take a backup and push the contents between servers. You need not develop the tool, but you will have to provide a detailed document on how this script can be developed, scenarios where this can be utilized and what are the tools that are available in the market which serves this purpose"
So... your manager gave you a job, and you'd like US to do it for you?? How about telling us what you've come up with already, and showing us what effort you've put forth on your own?

rsync and scp have already been mentioned, but you don't provide enough detail for anyone to give you much more. Are you using a SAN? What is the bandwidth between the servers? Do you already have a backup system in place, and if so, why can't you just use it? Have you considered DRBD for that content? What do you consider 'huge in size'?
 
Old 07-29-2013, 09:40 AM   #5
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,836

Rep: Reputation: 1360
If I can get my hands on the design document, I'm sure I could whip something up real quick. But, honestly, I'd probably feel like I should be paid for it.

It's not often that people actually come out and say "I have X to do at work, what do I do?"

It might be time to re-evaluate your position if copying data from A to B is going to be a difficult project for you.
 
1 member found this post helpful.
Old 07-29-2013, 12:10 PM   #6
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
It depends on the definition of "huge".

One problem I have seen with rsync is that it has to scan the whole directory tree for new files before it even starts on the first file.

When you have 50 million files to scan... it can take several days before it even starts.

Now, copying a few 100-200 GB files is not that hard. Copying 50,000 of them might be.
 
Old 07-29-2013, 12:25 PM   #7
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,836

Rep: Reputation: 1360
Rsync is nice, as you mentioned, but for a "smaller" set of files.

If you need to copy 50,000,000 files to another machine, the fastest way I know is with 'dd', 'netcat' and 'bzip2', but there is a lot that goes into doing it, and the circumstances have to be just right.
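For reference, a hedged sketch of how such a pipeline is often wired up. The hostname, port and paths below are placeholders, netcat flags vary between implementations, and whether you stream a whole block device with dd or a directory tree with tar depends on the setup. The archive/compress stages can at least be exercised locally:

```shell
# Receiver (run first; adjust flags for your netcat variant):
#   nc -l 9999 | bzip2 -d | tar xf - -C /restore
# Sender (placeholder hostname):
#   tar cf - /data | bzip2 -1 | nc receiver.example.com 9999

# Local round trip of the same archive/compress stages, no network:
mkdir -p /tmp/ncdemo/src /tmp/ncdemo/dst
echo payload > /tmp/ncdemo/src/f.txt
tar cf - -C /tmp/ncdemo/src . | bzip2 -1 | bzip2 -d | tar xf - -C /tmp/ncdemo/dst

cat /tmp/ncdemo/dst/f.txt
```

bzip2 -1 trades compression ratio for speed; on a fast LAN the CPU cost of heavier compression often makes the transfer slower overall.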
 
Old 07-29-2013, 01:21 PM   #8
TobiSGD
Moderator
 
Registered: Dec 2009
Location: Germany
Distribution: Whatever fits the task best
Posts: 17,130
Blog Entries: 2

Rep: Reputation: 4825
Quote:
Originally Posted by szboardstretcher
Rsync is nice, as you mentioned, but for a "smaller" set of files.

If you need to copy 50,000,000 files to another machine, the fastest way I know is with 'dd', 'netcat' and 'bzip2', but there is a lot that goes into doing it, and the circumstances have to be just right.
In that case it may be a good idea to use ssh together with tar and [insert favorite compression command here], if encrypted transmission is necessary. netcat transmits the data unencrypted, AFAIK.
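A sketch of that tar-over-ssh idea. The host and paths are placeholders; the last lines run the same pipe locally, minus the ssh hop:

```shell
# Encrypted equivalent of a netcat pipeline (placeholder host/paths):
#   tar cf - /data | gzip -1 | ssh user@dest.example.com 'gzip -d | tar xf - -C /restore'
# Or let ssh do the compression itself:
#   tar cf - /data | ssh -C user@dest.example.com 'tar xf - -C /restore'

# Local round trip of the same tar|gzip pipe, without the ssh hop:
mkdir -p /tmp/sshdemo/src /tmp/sshdemo/dst
echo secret > /tmp/sshdemo/src/f.txt
tar cf - -C /tmp/sshdemo/src . | gzip -1 | gzip -d | tar xf - -C /tmp/sshdemo/dst

cat /tmp/sshdemo/dst/f.txt
```

ssh's own -C compression and an explicit gzip in the pipe are alternatives; compressing twice generally just wastes CPU.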
 
1 member found this post helpful.
Old 07-29-2013, 01:27 PM   #9
szboardstretcher
Senior Member
 
Registered: Aug 2006
Location: Detroit, MI
Distribution: GNU/Linux systemd
Posts: 3,836

Rep: Reputation: 1360
Quote:
Originally Posted by TobiSGD
In that case it may be a good idea to use ssh together with tar and [insert favorite compression command here], if encrypted transmission is necessary. netcat transmits the data unencrypted, AFAIK.
TobiSGD is 100% correct: netcat is unencrypted, but far, far FASTER than an encrypted transfer. I use it because I have a closed network and nothing above 'sensitive' as far as information goes. If I were sending over the tubes I would *at least* use SSH.

If you want to send a compressed tar over ssh, it's certainly an option, but it will be slower.

SSH supports its own compression out of the box, as well.

Last edited by szboardstretcher; 07-29-2013 at 01:29 PM.
 
Old 07-29-2013, 01:48 PM   #10
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,604

Rep: Reputation: 1241
That is why we need more information before making any full recommendations. One of the issues is "push": does this mean it needs to be done more than once? Is it a true backup being copied, or just files being copied to multiple servers? Is NFS connected between them? How many files, how large, how often?

I did develop a perl script to migrate from one 12 TB filesystem to a 16 TB one. It was not exactly fast, but there were other considerations (not changing the access time, for one) and the need to sync the two while online. rsync was too slow, some of the files could change while copying, backup/restore was too slow (a single network connection for the entire thing), and new files had to be found faster than they could be created. My system had NFS mounts to both servers, so I could use a multi-threaded search (45 minutes to scan both filesystems with 12 threads, with no updates), plus a couple of threads doing nothing but copying the files identified by the first 12. It also had a checkpoint/restart feature.

It didn't try to resume file copies, but the individual files were small enough (5-10 MB) that it didn't matter. What did matter was resuming the search threads (and the list of identified files). The NFS servers DID have to be tuned for this (I ended up using 64 NFS daemons to keep things busy).
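A much-simplified analogue of that scan-then-copy-with-workers design, using find for the scan and xargs -P for the parallel copiers (paths are illustrative; jpollard's actual script was perl over NFS with checkpointing, which this sketch does not attempt):

```shell
# Illustrative source tree (placeholder paths).
mkdir -p /tmp/pardemo/src /tmp/pardemo/dst
for i in 1 2 3 4; do echo "file $i" > /tmp/pardemo/src/f$i.txt; done

# Scan once to build the work list, then copy with 4 parallel workers.
# -print0/-0 keeps filenames with odd characters intact.
( cd /tmp/pardemo/src && find . -type f -print0 ) |
  xargs -0 -P4 -I{} cp /tmp/pardemo/src/{} /tmp/pardemo/dst/{}

ls /tmp/pardemo/dst
```

Separating the scan from the copy is the key idea: the scanners and copiers can then be tuned (and resumed) independently.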
 