LinuxQuestions.org
Old 04-16-2011, 09:15 AM   #1
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Rep: Reputation: 0
Split a large file to download on a windows machine


Hi,

I am removing some old graphics from my server, and one of the gallery programs has created two enormous directories that I cannot even open with FTP.

I tried to tar each directory. The first came out to about 37 GB and the second keeps failing (it's bigger, one would assume).

How can I archive and split these into smaller files? I would sincerely appreciate any help you can give,

Thanks,

-Jason
 
Old 04-16-2011, 09:45 AM   #2
smallpond
Senior Member
 
Registered: Feb 2011
Location: Massachusetts, USA
Distribution: Fedora
Posts: 1,380

Rep: Reputation: 326
Make a list of all files:
Code:
ls > allfiles
Split the list into chunks of 1000 files each:
Code:
split -l1000 allfiles CHUNK
Now you have a bunch of files named CHUNKaa, etc.
Code:
for i in CHUNK* ; do tar cf $i.tar -T $i ; done
This has created a tar file for each chunk. If you don't have room to do them all at once, then you can script the loop to tar up one chunk, scp it off to some other storage, delete it, then do the next.
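
A minimal sketch of that one-chunk-at-a-time loop, run here against a scratch directory of dummy files. The `cp` into `$dest` is only a stand-in for the `scp` step, and the directory names and the chunk size of 10 are assumptions for the demo, not values from this thread. The file list is written outside the gallery directory so that it does not list itself.

```shell
#!/usr/bin/env bash
# Sketch: archive one chunk, copy it away, delete the local copy, repeat.
work=$(mktemp -d)          # scratch directory standing in for the gallery
dest=$(mktemp -d)          # stands in for the remote storage
cd "$work" || exit 1

# 25 small dummy files play the role of the images.
for n in $(seq 1 25); do echo "image $n" > "img$n.jpg"; done

# Keep the list and the chunk files outside the directory being archived.
list="$dest/allfiles"
ls -1 > "$list"
split -l 10 "$list" "$dest/CHUNK"      # 10 names per chunk for the demo

for chunk in "$dest"/CHUNK??; do
    archive="$work/$(basename "$chunk").tar"
    tar -cf "$archive" -T "$chunk" || { echo "tar failed on $chunk" >&2; continue; }
    # In real use, replace cp with something like: scp "$archive" user@otherhost:/backup/
    cp "$archive" "$dest/" && rm -f "$archive"
done

ls "$dest"
```

Only one archive exists locally at any time, so the free space needed is roughly the size of the largest chunk.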
 
Old 04-16-2011, 10:13 AM   #3
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Awesome,

Tried but for some reason near the end I got a bunch of these errors, did an ls and was able to verify that these files are there:


tar: xyz55823.jpg*: Cannot stat: No such file or directory
tar: xyz55824.jpg*: Cannot stat: No such file or directory
tar: xyz55825.jpg*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors
trackpad@trackpads.com [~/www/path/to/2658]#
 
Old 04-16-2011, 10:17 AM   #4
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Also, for some reason the first chunk file is 23 GB and all the subsequent ones are only 10 KB.
 
Old 04-16-2011, 10:23 AM   #5
stress_junkie
Senior Member
 
Registered: Dec 2005
Location: Massachusetts, USA
Distribution: Ubuntu 10.04 and CentOS 5.5
Posts: 3,873

Rep: Reputation: 331
Quote:
Originally Posted by trackpads View Post
Awesome,

Tried but for some reason near the end I got a bunch of these errors, did an ls and was able to verify that these files are there:


tar: xyz55823.jpg*: Cannot stat: No such file or directory
tar: xyz55824.jpg*: Cannot stat: No such file or directory
tar: xyz55825.jpg*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors
trackpad@trackpads.com [~/www/path/to/2658]#
You may need to run fsck on the partition when it is not mounted.

Your other problems may be related to this.
 
Old 04-16-2011, 10:33 AM   #6
jschiwal
Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 654
Firstly, from your title, I'm not certain which machine is Windows: the client or the server. Since it seems you ran tar on the server, it sounds like the client is Windows.

Did the tar command fail before it got to the split command? You may simply have some bad files, or files that were altered after you started tar.

Tar does have a volume size option, which would allow you to create separate files.

Another option is to pipe the output of tar. For example:
ssh user@server tar -C <directory> -czf - . | cat >gallery.tar.gz

You could instead use tar to replicate the files themselves:
ssh user@server tar -C <directory> -czf - . | tar -C <restore_directory> -xzvf - >logfile

If the client is a Windows machine, you could boot a live Linux distro or use Cygwin to run the ssh and tar commands.
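
The same piping idea can also produce fixed-size pieces directly, without ever writing one huge archive. The following is a rough sketch assuming GNU tar and split; the scratch directories, the dummy files, and the 1 MiB piece size are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: stream a compressed archive straight into split.
src=$(mktemp -d)      # scratch directory standing in for the gallery
out=$(mktemp -d)      # where the pieces are written

# 50 dummy files of 64 KiB each (random data, so gzip cannot shrink them much).
for n in $(seq 1 50); do head -c 65536 /dev/urandom > "$src/img$n.jpg"; done

# tar writes the archive to stdout; split cuts the stream into 1 MiB pieces.
tar -C "$src" -czf - . | split -b 1M - "$out/gallery.tar.gz.part-"

# Joining the pieces in name order gives back a readable archive.
pieces=$(ls "$out" | wc -l)
restored=$(cat "$out"/gallery.tar.gz.part-* | tar -tzf - | grep -c '\.jpg$')
echo "pieces: $pieces, files listed after rejoining: $restored"
```

For a real gallery a larger piece size, such as `-b 500M`, would be the obvious choice.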

The problem you are having is caused by the number of files in the directory rather than its total size. For example, trying "ls *.jpg" may fail with an out-of-memory or "Argument list too long" error, because the shell expands and sorts the wildcard before the command is executed. With FTP, the client may request a listing of every file in the directory so that it can sort it as well. Sorting is expensive time-wise.

One thing that is often done when a directory holds tens of thousands of files or more is to use the find command instead of ls, and to use xargs to limit the number of arguments passed to each command from the list that find produces. If you use tar without wildcard arguments, you probably won't have the problem I mentioned, because files are added as they are found.
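
As an illustration of the find and xargs approach, here is a small sketch assuming GNU find, xargs, and tar. The scratch directory, the file names, and the batch size of 10 are invented for the demo.

```shell
#!/usr/bin/env bash
# Sketch: let find stream the names instead of expanding a wildcard in the shell.
src=$(mktemp -d)      # scratch directory standing in for the gallery
out=$(mktemp -d)      # where the archive is written
for n in $(seq 1 30); do echo "image $n" > "$src/pic $n.jpg"; done

# Count the files; the pattern is quoted, so find matches it, not the shell.
count=$(find "$src" -maxdepth 1 -type f -name '*.jpg' | wc -l)
echo "found $count files"

# Hand NUL-separated names to tar on stdin; this is safe for names with spaces.
( cd "$src" && find . -maxdepth 1 -type f -name '*.jpg' -print0 \
    | tar --null -cf "$out/gallery.tar" -T - )

# xargs caps the number of arguments per invocation (10 per batch here).
batches=$(cd "$src" && find . -maxdepth 1 -type f -print0 \
    | xargs -0 -n 10 sh -c 'echo batch' _ | wc -l)
echo "xargs ran $batches batches"
```

Because nothing is sorted or collected into one command line, this scales to directories where "ls *.jpg" gives up.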
 
Old 04-16-2011, 12:07 PM   #7
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Hi,

I have not tried smallpond's solution but it looks ok to me. One thing I would change, though, would be
ls > allfiles
to
ls -1 > allfiles

This way you can ensure every tar command will archive 1000 files.

As for the size issue of the chunks, maybe you have some very long filenames? Or it ran out of output filenames and put the rest in the last chunk? But this should have given an error message.
Try this to mimic the split command
Code:
c=0;d=1;while IFS= read -r line; do printf '%s\n' "$line">>"chunk$c";if (( ++d > 1000 ));then ((++c)); d=1;fi;done < allfiles
and let us know if the results are the same.
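
To check either result, it may help to count the lines in each chunk and to look for unusually long names. This is a small sketch on made-up data; the 2500 generated names are an assumption standing in for the real listing.

```shell
#!/usr/bin/env bash
# Sketch: verify that split produced equal chunks and inspect name lengths.
work=$(mktemp -d)
cd "$work" || exit 1

# 2500 made-up names stand in for the real file list.
seq 1 2500 | sed 's/$/.jpg/' > allfiles

split -l 1000 allfiles CHUNK

# Lines per chunk: every chunk except possibly the last should hold 1000.
for f in CHUNK??; do
    printf '%s %s\n' "$f" "$(wc -l < "$f")"
done

# Length of the longest entry, to spot suspiciously long names.
longest=$(awk '{ if (length($0) > max) max = length($0) } END { print max }' allfiles)
echo "longest name: $longest characters"
```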

Finally, the file not found issue. Here is a very far-fetched idea:
My guess would be that,
- you put "double-quotes" around "$i" in smallpond's example and
- your 'ls' command is really aliased to 'ls -F' and
- the files in question were somehow mistakenly given execute permissions.

The output of
Code:
alias ls
ls -l xyz55823.jpg
will show whether my far-fetched assumption holds any truth.
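
The effect described above can be reproduced in a scratch directory. In this sketch the file name is borrowed from the error message, and the stray execute bit is set on purpose; `command ls` is one way to bypass an alias when building the file list.

```shell
#!/usr/bin/env bash
# Sketch: how `ls -F` decorates an executable file, and how to avoid it.
demo=$(mktemp -d)
echo "fake image" > "$demo/xyz55823.jpg"
chmod +x "$demo/xyz55823.jpg"          # execute bit set by mistake

decorated=$(ls -F "$demo")             # what an `ls -F` alias would put in the list
plain=$(command ls -1 "$demo")         # `command` skips aliases and shell functions

echo "ls -F         : $decorated"
echo "command ls -1 : $plain"

# Clearing the stray execute bit removes the marker as well.
chmod -x "$demo/xyz55823.jpg"
fixed=$(ls -F "$demo")
echo "after chmod -x: $fixed"
```

A name with a trailing `*` in the list does not exist on disk, which matches the "Cannot stat" errors from tar.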

Last edited by crts; 04-16-2011 at 12:20 PM. Reason: typos
 
Old 04-16-2011, 03:29 PM   #8
smoker
Senior Member
 
Registered: Oct 2004
Distribution: Fedora Core 4, 12, 13, 14, 15, 17
Posts: 2,279

Rep: Reputation: 248
From your original errors :
tar: xyz55823.jpg*: Cannot stat: No such file or directory

Is there an existing file named xyz55823.jpg*?
 
Old 04-17-2011, 02:46 PM   #9
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Original Poster
Rep: Reputation: 0
Guys, thanks again,

I got those downloaded but now have an even bigger single-file archive.

I am moving from a Linux host to a Windows host. The file I need to download is my cPanel backup, which is about 200 GB. It is a tar.gz file. Can I split that into increments of, say, 500 MB and then reassemble them on my Windows box?

Thanks again,

-Jason
 
Old 04-18-2011, 06:22 AM   #10
crts
Senior Member
 
Registered: Jan 2010
Posts: 1,604

Rep: Reputation: 446
Quote:
Originally Posted by trackpads View Post
Guys, thanks again,

I got those downloaded but now have an even bigger single-file archive.

I am moving from a Linux host to a Windows host. The file I need to download is my cPanel backup, which is about 200 GB. It is a tar.gz file. Can I split that into increments of, say, 500 MB and then reassemble them on my Windows box?

Thanks again,

-Jason
Hi,

How did you solve the previous problems you encountered, i.e. the '.jpg*: Cannot stat: No such file or directory' issue? Some feedback would be nice for others who might stumble upon the same problem.

As for your new problem, after a quick web search I found this tool:
http://stahlworks.com/dev/index.php?tool=split

From the description it sounds promising, but I have never used this tool before. So if you want to try it, make a backup first and do some test runs with smaller dummy files.
Hope this helps.
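
For comparison, the standard split and cat tools can do the same job on the Linux side. This is a minimal sketch assuming GNU coreutils; the 5 MiB dummy file and the 2 MiB piece size are stand-ins for the real backup and for `-b 500M`. On the Windows side, `copy /b` in cmd.exe can join the pieces in order (for example `copy /b backup.tar.gz.part-aa + backup.tar.gz.part-ab backup.tar.gz`), though that step is not exercised here.

```shell
#!/usr/bin/env bash
# Sketch: split a large archive into fixed-size pieces and verify reassembly.
work=$(mktemp -d)
cd "$work" || exit 1

# A 5 MiB dummy file stands in for the 200 GB backup.
head -c 5242880 /dev/urandom > backup.tar.gz

# Demo uses 2 MiB pieces; the real file would use something like -b 500M.
split -b 2M backup.tar.gz backup.tar.gz.part-

# The glob sorts part-aa, part-ab, ... so cat joins them in the right order.
cat backup.tar.gz.part-* > rejoined.tar.gz

# Compare checksums of the original and the rejoined file.
original=$(sha256sum < backup.tar.gz)
rejoined=$(sha256sum < rejoined.tar.gz)
if [ "$original" = "$rejoined" ]; then
    echo "checksums match"
else
    echo "checksums differ" >&2
fi
```

Recording the checksum before downloading gives a way to confirm on the other machine that the reassembled file is intact.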
 
Old 04-18-2011, 06:40 AM   #11
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Original Poster
Rep: Reputation: 0
I ended up using some partial work as well as just making one enormous archive of my cPanel.

How do you install the sfk thing on linux? I only have shell access, looks nifty if I can get it to work.

Thanks,

-Jason
 
Old 04-18-2011, 07:05 AM   #12
trackpads
LQ Newbie
 
Registered: Apr 2011
Posts: 10

Original Poster
Rep: Reputation: 0
ok, here is another try, it seems to be working,

Use this script to install rar: http://nixcraft.com/shell-scripting/...rar-linux.html

Then rar with a command such as 'rar a -r -v200m forum.rar forum' (the -v200m switch splits the archive into 200 MB volumes).
 
  

