LinuxQuestions.org
Old 02-24-2009, 09:23 PM   #1
dman65
Member
 
Registered: Sep 2003
Posts: 61

Rep: Reputation: 15
Copying large numbers of files


I am currently in the process of trying to migrate data from an older Linux server to a new one with double the hard drive space and a faster processor.

For the past two weeks I have been wrestling with whether my issue is that I am using the wrong software or that I have a hardware problem, and the more things I try the more confusing it becomes.

The first thing I tried was copying over a 100BaseT Ethernet connection using a crossover cable. I tried rsync. I tried mounting the remote system using Samba and then using cp and star. The most throughput I was ever able to get was 4GB an hour, or about 1MB/sec.

I have 750GB in 2.5 million files to transfer, so I decided to try something else and purchased an external USB drive. I tested this on a Windows XP machine and I was able to copy 44GB an hour. I then hooked it up to the Linux machine and reformatted it using 4 different file systems. The fastest transfer I was able to get was 10GB an hour on the Linux system. That included using star with the -no-fsync option.

I assumed that there was just something wrong with the USB drivers in that version of Linux, so I purchased a 500GB IDE drive to put in the old server. The plan was to copy the 750GB in chunks: fill the drive, install it in the new server, copy the files over, and then take it back to the old server. Unfortunately, I am currently getting about 3.5GB an hour on that drive.

Can anyone give me some ideas as to how I can get this data to copy faster? I am really wanting to do a full migration of this server over a weekend and with it taking an hour to transfer 10GB on the fastest method I have found that just won't happen, especially considering that after transferring to the external drive the data from that drive will then have to be uploaded to the new server.
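To put the rates quoted in this post in terms of the weekend window, they can be turned into rough completion times. This is a minimal sketch, assuming a steady rate and binary units (1 GB = 1024 MB); the function names are made up for illustration.

```shell
# Hours needed to move total_gb at a steady rate of gb_per_hour.
eta_hours() {
    awk -v total="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", total / rate }'
}

# Convert GB/hour to MB/sec (assumes 1 GB = 1024 MB).
gb_per_hour_to_mb_per_sec() {
    awk -v rate="$1" 'BEGIN { printf "%.2f\n", rate * 1024 / 3600 }'
}

eta_hours 750 10                # USB drive on the Linux machine
eta_hours 750 44                # same drive on Windows XP
gb_per_hour_to_mb_per_sec 4     # best rate seen over the network
```

At 10GB an hour the 750GB needs about 75 hours for one leg of the copy, against roughly 17 hours at the 44GB an hour seen on Windows XP.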
 
Old 02-24-2009, 11:02 PM   #2
jschiwal
Moderator
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,263

Rep: Reputation: 562
Are you just copying data? There are some directories you don't want to copy or back up, e.g. /proc, /sys, /tmp, /media, /mnt.

I think it would be better to use NFS instead of Samba for mounting the destination directory on the new server. The tar info manual has an example of copying a partition by piping the output of tar to another tar command.
Here is an example:
tar -C / -cf - /var /home /usr | tar -C /mnt/rootdir/ -xf -

The dash for the filename is what allows you to pipe the stream via stdout | stdin.
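The same tar-to-tar pipe can be wrapped in a small function for copying a single data directory. This is only a sketch: `copy_tree` is a name chosen here, not something from the tar manual, and it assumes both directories are on locally mounted filesystems.

```shell
# Copy the contents of directory src into directory dst with a tar pipe.
# "-f -" makes the first tar write the archive to stdout and the second
# tar read it from stdin; -p keeps permissions when extracting.
copy_tree() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    tar -C "$src" -cf - . | tar -C "$dst" -xpf -
}
```

For example, `copy_tree /data/images /mnt/rootdir/images` would recreate the tree under the second path (both paths are placeholders).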

Rather than mounting a filesystem, you could use ssh or netcat (nc) for transferring the files over a crossover cable.

Using ssh, it works best if you have public key authentication setup.

Here is an example of using tar to transfer files over an ssh tunnel:
Code:
cd Documents/; tar -C ~/Documents/ -cf - amiga* | ssh hpmedia tar -C Documents/ -xvf -
amiga-computer made of snow.jpg
amiga computer survivor.jpg
The initial cd command was only needed because I used a wildcard instead of a directory target: the shell expands amiga* in the current directory before tar changes to Documents. (I didn't want to demonstrate by copying an entire directory.)

You may want to check if all network traffic is that slow on the new server. Do you maybe have two NIC devices configured for bonding?
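For a whole directory rather than a wildcard, the ssh example above could be generalized roughly as follows. This is a sketch under assumptions: `send_tree` and the `REMOTE_SHELL` variable are names invented here, and the receiving side is assumed to have tar and mkdir available.

```shell
# Stream a directory to another host as a tar archive.
# REMOTE_SHELL is a hypothetical knob for this sketch: the command that
# runs the receiving tar, e.g. "ssh hpmedia". It is left unquoted on
# purpose so that it splits into a command and its arguments.
# Note: dst is wrapped in single quotes, so it must not contain one.
send_tree() {
    src=$1
    dst=$2
    tar -C "$src" -cf - . | $REMOTE_SHELL "mkdir -p '$dst' && tar -C '$dst' -xpf -"
}
```

With `REMOTE_SHELL="ssh hpmedia"` set, `send_tree ~/Documents Documents` would unpack the tree under Documents in the remote home directory. Setting `REMOTE_SHELL="sh -c"` runs the receiving side locally, which is a cheap way to try the pipeline before pointing it at the other machine.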

Since you are connecting via a cross-over cable, you could use rsh instead of ssh. Be sure to uninstall it after transferring the files.
 
Old 02-24-2009, 11:33 PM   #3
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 217

Rep: Reputation: 42
Could it be the disk I/O on your old server? Been a while, so my memory is a little fuzzy, but I did find a bookmark to an old O'Reilly article about hdparm. IIRC my disk reads were in the 4-5 MB/sec range before tweaking the drive settings as this article recommends.

This is what the drive gets now:
Code:
root: hdparm -t /dev/hdb

/dev/hdb:
 Timing buffered disk reads:  130 MB in  3.02 seconds =  43.02 MB/sec
Settings:
Code:
root: hdparm  /dev/hdb

/dev/hdb:
 multcount     = 16 (on)
 IO_support    =  1 (32-bit)
 unmaskirq     =  1 (on)
 using_dma     =  1 (on)
 keepsettings  =  0 (off)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 16383/255/63, sectors = 120103200, start = 0
As for the author's disclaimer, I changed my drive settings a long time ago and I've never had any problems.
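One common cause of very slow reads on an IDE drive is DMA being switched off, which shows up as `using_dma = 0` in the settings listing. A small sketch for checking that from the hdparm output follows; `dma_enabled` is a name chosen for this example, and it assumes the output format shown above.

```shell
# Read `hdparm /dev/XXX` output on stdin and report whether DMA is on.
# Exit status 0 means "using_dma = 1"; non-zero means off or not reported.
dma_enabled() {
    awk '$1 == "using_dma" { found = 1; on = ($3 == 1) }
         END { exit !(found && on) }'
}
```

Typical use would be `hdparm /dev/hdb | dma_enabled || echo "DMA is off"`. If it is off, `hdparm -d1 /dev/hdb` asks the driver to enable it; whether that is safe depends on the drive and controller, which is what the disclaimer is about.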

HTH

Norm

Last edited by norobro; 02-25-2009 at 08:38 AM. Reason: correct spelling
 
Old 02-25-2009, 08:38 AM   #4
dman65
Member
 
Registered: Sep 2003
Posts: 61

Original Poster
Rep: Reputation: 15
Thanks Norm.

I have checked hdparm on the drives I am using and they all show acceptable, though not stellar, numbers. For example:
Code:
docserver1:/mnt/removeable # hdparm -tT /dev/hdd

/dev/hdd:
 Timing cached reads:   1440 MB in  2.00 seconds = 718.31 MB/sec
 Timing buffered disk reads:  172 MB in  3.03 seconds =  56.85 MB/sec
Even at half of that speed I should be able to move 100GB an hour.
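The 100GB an hour figure follows from the buffered read rate. A quick check, assuming 1 GB = 1024 MB (the function name is just for illustration):

```shell
# Convert a sustained rate in MB/sec to GB/hour (assumes 1 GB = 1024 MB).
mb_per_sec_to_gb_per_hour() {
    awk -v rate="$1" 'BEGIN { printf "%.1f\n", rate * 3600 / 1024 }'
}

mb_per_sec_to_gb_per_hour 28.425   # half of the 56.85 MB/sec measured above
```

This prints 99.9, so raw sequential read speed is not what limits the copy to 10GB an hour or less.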

Hello jschiwal,

I am basically just trying to move a data directory, mostly compressed black-and-white TIFF images. I'm not moving any system or settings files, just the scanned images.

I will look into the nfs and rsh options.
 
Old 02-25-2009, 12:57 PM   #5
dman65
Member
 
Registered: Sep 2003
Posts: 61

Original Poster
Rep: Reputation: 15
I have been working on this some more today. I created an NFS connection between the existing server and the soon to be server and I have been trying to run star in copy mode to copy the files to the new server.

It started out copying data across the crossconnect wire at about 6GB an hour and then it just ground to a halt. I went in and watched top for a while and then ran vmstat.

It seems that whenever star is run it creates two processes. One of the processes seems to stay in an "uninterruptible sleep" and this is the one that shows up most often in top. The other process seems to be the one that actually does the copying since whenever it appears more data is copied to the new server.

Here is what I am getting:
Code:
procs -----------memory---------- ---swap-- -----io---- --system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in    cs us sy id wa
 1  0   7576   1776  88472 262204    0    0     0     9 1179   102  0  0 100  0
 0  0   7576   1528  88524 262384    0    0     0    19 1148    88  0  0 99  0
 0  0   7576   2216  88540 261740    0    0     0    10 1157    90  0  0 100  0
 0  0   7576   2088  88556 261820    0    0     0     5 1070    47  0  0 100  0
 0  0   7576   1840  88572 262064    0    0     0     5 1171    95  0  0 99  0
 0  0   7576   1712  88588 262184    0    0     0     5 1088    57  0  0 99  0
 0  0   7576   1460  88604 262384    0    0     0     6 1140    81  0  0 100  0
 0  0   7576   2096  88644 261780    0    0     0    19 1195    99  0  1 99  0
 0  0   7576   1840  88696 261992    0    0     0    19 1160    88  0  0 99  0
 0  0   7576   1512  88716 262256    0    0     0     8 1171    95  0  0 99  0
 0  0   7576   2224  88728 261536    0    0     0     5 1154    86  0  0 100  0
 0  0   7576   1568  88792 261752    0    0   593    94 1350   244  0  1 87 12

docserver1:/mnt/removeable # top
top - 14:44:01 up 18:45,  5 users,  load average: 1.01, 1.01, 1.00
Tasks:  79 total,   1 running,  78 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0% us,  0.0% sy,  0.0% ni, 100.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:    515640k total,   513920k used,     1720k free,    88448k buffers
Swap:  1028120k total,     7576k used,  1020544k free,   261568k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 7290 root       0 -20 10472 9024 1876 D  0.3  1.8   0:02.83 star
 7403 root      18   0  1968  940 1764 R  0.3  0.2   0:00.12 top
    1 root      16   0   596  148  452 S  0.0  0.0   0:01.70 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.16 ksoftirqd/0
    3 root       5 -10     0    0    0 S  0.0  0.0   0:00.10 events/0
    4 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 netlink/0
    6 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 kacpid
   24 root       5 -10     0    0    0 S  0.0  0.0   0:05.52 kblockd/0
   34 root      16   0     0    0    0 S  0.0  0.0   0:13.44 pdflush
   35 root      15   0     0    0    0 S  0.0  0.0   0:08.93 pdflush
   37 root       6 -10     0    0    0 S  0.0  0.0   0:00.00 aio/0
   36 root      15   0     0    0    0 S  0.0  0.0   0:51.58 kswapd0
  620 root      18   0     0    0    0 S  0.0  0.0   0:00.00 kseriod
 1509 root      25   0     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
 1554 root       5 -10     0    0    0 S  0.0  0.0   0:02.13 reiserfs/0
 2136 root      25   0     0    0    0 S  0.0  0.0   0:00.19 khubd



  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6130 Traci.He  16   0  9640 4264 8264 S  1.0  0.8   0:10.51 smbd
 6075 nobody    16   0  9592 3924 8220 S  0.3  0.8   0:09.53 smbd
 7290 root       0 -20 10472 9028 1876 D  0.3  1.8   0:02.99 star
 7291 root       3 -20 10732 9208 1876 S  0.3  1.8   0:06.31 star
    1 root      16   0   596  148  452 S  0.0  0.0   0:01.70 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.16 ksoftirqd/0
    3 root       5 -10     0    0    0 S  0.0  0.0   0:00.10 events/0
    4 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 netlink/0
    6 root       5 -10     0    0    0 S  0.0  0.0   0:00.00 kacpid
   24 root       5 -10     0    0    0 S  0.0  0.0   0:05.52 kblockd/0
   34 root      16   0     0    0    0 S  0.0  0.0   0:13.44 pdflush
   35 root      15   0     0    0    0 S  0.0  0.0   0:08.94 pdflush
   37 root       6 -10     0    0    0 S  0.0  0.0   0:00.00 aio/0
   36 root      15   0     0    0    0 S  0.0  0.0   0:51.58 kswapd0
  620 root      18   0     0    0    0 S  0.0  0.0   0:00.00 kseriod
 1509 root      25   0     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
I am a little confused: from what I have read, the uninterruptible sleep is supposed to happen when a process is waiting on I/O, but the vmstat output shows the CPU spending the majority of its time in the idle state rather than the I/O wait state.
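To see what a process in the D state is actually blocked on, ps can print the kernel wait channel next to the state. A minimal sketch, assuming the procps version of ps; the function names are invented for this example.

```shell
# Keep only processes in uninterruptible sleep (state starts with "D").
# Expects "PID STAT WCHAN COMMAND" lines on stdin.
only_d_state() {
    awk '$2 ~ /^D/'
}

# List D-state processes with the kernel function they are waiting in.
# Separate -o options with a trailing "=" suppress the column headers.
d_state_procs() {
    ps -e -o pid= -o stat= -o wchan= -o comm= | only_d_state
}
```

Running `d_state_procs` a few times while the copy is stalled should show whether star is waiting in a disk routine or somewhere in the NFS client code.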
 
Old 02-26-2009, 05:27 AM   #6
jschiwal
Moderator
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,263

Rep: Reputation: 562
What is the filesystem of the destination? Did you run out of inodes there? What command did you use to copy the files?
 
Old 02-26-2009, 07:07 AM   #7
dman65
Member
 
Registered: Sep 2003
Posts: 61

Original Poster
Rep: Reputation: 15
Hello jschiwal,

I used star -copy /LocalDirectory . /mnt/nfs_directory
The remote machine is using the XFS file system. It isn't out of inodes.

On a positive note, I started the tar over ssh last night and I am getting 8GB to 12GB an hour over the crossover cable. At that rate it should finish copying everything some time Sunday afternoon. Hopefully I can then do an rsync that will only take a couple of hours and finish up the project.
 
Old 02-26-2009, 06:01 PM   #8
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 217

Rep: Reputation: 42
dman65,

I hadn't heard of "star" before your post so I had to try it out. I'm experiencing the same issues as you just copying from one disk to another on the same machine. I have played around with the different switches and noticed only slight increases in throughput. For example:
Code:
star -fifostats  fs=256x1m bs=64x8k -c . | star -xp -C /mnt/hda7/kde-devel
This directory is almost 2.4 GB in size and took 14 minutes to copy.

Anyway, if you find a solution I, for one, would appreciate a post back.

Norm
 
Old 02-26-2009, 07:32 PM   #9
dman65
Member
 
Registered: Sep 2003
Posts: 61

Original Poster
Rep: Reputation: 15
Hello Norm,

If I come up with anything I will let you know. I found out about 'star' when I googled copying large numbers of files and Linux. There were a few posts on different mailing lists that said it was the fastest way to copy files. That is definitely not what I have experienced.

I have had good results with tunneling tar through an ssh connection to another machine. At this point I am just letting this job run until it finishes. I think it should be finished Saturday afternoon. I am thinking about using tar to try to do a local copy and see what kind of results I get writing to a USB drive as well as an EIDE drive.
 
  


All times are GMT -5.
