LinuxQuestions.org


trackpads 04-16-2011 09:15 AM

Split a large file to download on a windows machine
 
Hi,

I am removing some old graphics from my server and one of the gallery programs has created two enormous directories that I cannot even open with FTP.

I tried to tar each directory; the first came out to about 37 GB and the second keeps failing (it's bigger, one would assume).

How can I archive and split these into smaller files? I would sincerely appreciate any help you can give,

Thanks,

-Jason

smallpond 04-16-2011 09:45 AM

Make a list of all files:
Code:

ls > allfiles
Split the list into chunks of 1000 files each:
Code:

split -l1000 allfiles CHUNK
Now you have a bunch of files named CHUNKaa, etc.
Code:

for i in CHUNK* ; do tar cf $i.tar -T $i ; done
This has created a tar file for each chunk. If you don't have room to do them all at once, then you can script the loop to tar up one chunk, scp it off to some other storage, delete it, then do the next.
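
The "one chunk at a time" variant mentioned above could look roughly like the sketch below. It is only an illustration: the demo directory, the chunk size of 10 and the ship function are hypothetical stand-ins, and the transfer step is a local move so the script runs anywhere. On a real server that step would be an scp to other storage followed by deleting the local tar file.

```shell
#!/bin/sh
# Demo setup: a throwaway directory with 25 empty files standing in for the images.
demo=$(mktemp -d)
mkdir "$demo/src" "$demo/offsite"
i=1
while [ "$i" -le 25 ]; do : > "$demo/src/img$i.jpg"; i=$((i + 1)); done

cd "$demo/src" || exit 1
ls > ../allfiles                  # list kept outside the directory, so it does not list itself
split -l 10 ../allfiles ../CHUNK  # 10 names per chunk for the demo; the thread used 1000

# Hypothetical transfer step. Here it is a plain local move; on a real server
# it might be:  scp "$1" user@otherhost:/backup/ && rm "$1"
ship() { mv "$1" "$demo/offsite/"; }

for list in ../CHUNK??; do
    tar -cf "$list.tar" -T "$list" || exit 1  # archive one chunk
    ship "$list.tar" || exit 1                # move it off before starting the next
done
```

Only one chunk's tar file exists on the source disk at any moment, which is the point when free space is tight.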

trackpads 04-16-2011 10:13 AM

Awesome,

Tried it, but for some reason near the end I got a bunch of these errors. I did an ls and was able to verify that these files are there:


tar: xyz55823.jpg*: Cannot stat: No such file or directory
tar: xyz55824.jpg*: Cannot stat: No such file or directory
tar: xyz55825.jpg*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors
trackpad@trackpads.com [~/www/path/to/2658]#

trackpads 04-16-2011 10:17 AM

Also, for some reason the first chunk file is 23gb and all the subsequent ones are only 10k.

stress_junkie 04-16-2011 10:23 AM

Quote:

Originally Posted by trackpads (Post 4326353)
Awesome,

Tried it, but for some reason near the end I got a bunch of these errors. I did an ls and was able to verify that these files are there:


tar: xyz55823.jpg*: Cannot stat: No such file or directory
tar: xyz55824.jpg*: Cannot stat: No such file or directory
tar: xyz55825.jpg*: Cannot stat: No such file or directory
tar: Error exit delayed from previous errors
trackpad@trackpads.com [~/www/path/to/2658]#

You may need to run fsck on the partition when it is not mounted.

Your other problems may be related to this.

jschiwal 04-16-2011 10:33 AM

Firstly, from your title, I'm not certain which machine is Windows: the client or the server. Since it seems you ran tar on the server, it sounds like the client is Windows.

Was it the tar command that failed, before it got to the split command? You may simply have some bad files, or files that were altered after you started tar.

GNU tar does have a multi-volume mode (-M, with -L to set the volume size), which would allow you to create separate files.

Another option is to pipe the output of tar. For example:
ssh user@server tar -C <directory> -czf - . | cat >gallery.tar.gz

You could instead use tar to replicate the files themselves:
ssh user@server tar -C <directory> -czf - . | tar -C <restore_directory> -xzvf - >logfile
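
The piping idea can also be combined with split, so the archive never has to exist as one huge file. The following is a minimal sketch under assumptions: the demo directory, the file names and the 100k piece size are all hypothetical (for a real gallery something like 500m would be more sensible).

```shell
#!/bin/sh
# Demo setup: two small random files standing in for the gallery.
demo=$(mktemp -d)
mkdir "$demo/gallery" "$demo/restore"
head -c 300000 /dev/urandom > "$demo/gallery/a.jpg"
head -c 300000 /dev/urandom > "$demo/gallery/b.jpg"

# Archive and split in one pass: tar writes to stdout, split cuts the stream
# into numbered pieces (part_aa, part_ab, ...).
tar -C "$demo/gallery" -czf - . | split -b 100k - "$demo/gallery.tar.gz.part_"

# Reassemble and unpack: concatenate the pieces in name order and feed tar.
cat "$demo"/gallery.tar.gz.part_* | tar -C "$demo/restore" -xzf -
```

The pieces are only useful together; a single piece is not a valid archive on its own.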

If the client is a Windows machine, you could run a live Linux distro or Cygwin to run the ssh and tar commands.

The problem you are having is caused by the number of files in the directory rather than by its total size. For example, trying "ls *.jpg" may fail (typically with "Argument list too long"), because the shell expands and sorts the wildcard before the command is executed. An FTP client likewise asks for a listing of every file in the directory, which may then be sorted as well, and sorting that many names takes time.

One thing that is often done when a directory holds tens of thousands of files or more is to use the find command instead of ls, and to limit the number of arguments passed to a command by using xargs to handle the list produced by find. If you use tar without wildcard arguments, you probably won't have the problem I mentioned. Files are added as they are found.
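
As a rough illustration of the find and xargs approach, the sketch below first shows xargs handing names to a command in limited batches, then feeds the same kind of list straight into tar. The directory layout, file names and batch size of 10 are hypothetical, and the -print0, -0 and --null options assume GNU (or BSD) find, xargs and tar.

```shell
#!/bin/sh
# Demo setup: 25 empty files plus a subdirectory that should be skipped.
demo=$(mktemp -d)
mkdir "$demo/gallery" "$demo/gallery/thumbs"
i=1
while [ "$i" -le 25 ]; do : > "$demo/gallery/img$i.jpg"; i=$((i + 1)); done
: > "$demo/gallery/thumbs/t.jpg"

# find prints names as it reads the directory: no shell wildcard expansion,
# no sorting. xargs -n 10 passes at most 10 names to each command run.
find "$demo/gallery" -maxdepth 1 -type f -name '*.jpg' -print0 |
    xargs -0 -n 10 sh -c 'echo "batch of $# files"' sh > "$demo/batches.txt"

# The same kind of list can go directly to tar, with no list file in between.
( cd "$demo/gallery" &&
  find . -maxdepth 1 -type f -name '*.jpg' -print0 |
      tar --null -T - -cf "$demo/gallery.tar" )
```

Using -maxdepth 1 -type f keeps subdirectories out of the list, so tar does not recurse into them when it sees their names.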

crts 04-16-2011 12:07 PM

Hi,

I have not tried smallpond's solution but it looks ok to me. One thing I would change, though, would be
ls > allfiles
to
ls -1 > allfiles

This way you can ensure every tar command will archive 1000 files.

As for the size issue of the chunks, maybe you have some very long filenames? Or it ran out of output filenames and put the rest in the last chunk? But this should have given an error message.
Try this to mimic the split command:
Code:

c=0;d=1;while read -r line; do echo "$line">>chunk$c;if (( ++d > 1000 ));then ((++c)); d=1;fi;done < allfiles
and let us know if the results are the same.

Finally, the file-not-found issue. Here is a very far-fetched idea.
My guess would be that
- you put "double-quotes" around "$i" in smallpond's example and
- your 'ls' command is really aliased to 'ls -F' and
- the files in question somehow got execute permissions falsely assigned.

The output of
Code:

alias ls
ls -l xyz55823.jpg

will show if my far-fetched assumption holds any truth :)
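
To see the effect described above in isolation, here is a small sketch. It assumes nothing about the real server: the file names are hypothetical, and one file is deliberately given execute permission to reproduce the trailing asterisk.

```shell
#!/bin/sh
# Demo setup: two empty files, one of them wrongly marked executable.
demo=$(mktemp -d)
: > "$demo/xyz1.jpg"
: > "$demo/xyz2.jpg"
chmod a+x "$demo/xyz1.jpg"

plain=$(ls "$demo")         # real names only
decorated=$(ls -F "$demo")  # -F appends * to executables; the * is not part of the name
printf 'plain:\n%s\ndecorated:\n%s\n' "$plain" "$decorated"

# One possible cleanup: clear the execute bits on the image files.
find "$demo" -maxdepth 1 -type f -name '*.jpg' -print0 | xargs -0 chmod a-x
fixed=$(ls -F "$demo")      # no trailing * any more
```

A list written by an aliased 'ls -F' would contain names like xyz1.jpg*, which tar then cannot stat. In an interactive shell, typing \ls instead of ls bypasses the alias for that one command.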

smoker 04-16-2011 03:29 PM

From your original errors:
tar: xyz55823.jpg*: Cannot stat: No such file or directory

Is there an existing file named xyz55823.jpg* ?

trackpads 04-17-2011 02:46 PM

Guys, thanks again,

I got those downloaded but now have an even bigger single-file archive.

I am moving from a Linux host to a Windows host. The file I need to download is my cPanel backup, which is about 200 GB. It is a tar.gz file. Can I split that into increments of, say, 500 MB and then reassemble them on my Windows box?

Thanks again,

-Jason

crts 04-18-2011 06:22 AM

Hi,

How did you solve the previous problems you encountered, i.e. the '.jpg*: Cannot stat: No such file or directory' issue? Some feedback would be nice for others who might stumble upon the same problem.

As for your new problem, after a quick web search I found this tool:
http://stahlworks.com/dev/index.php?tool=split

From the description it sounds promising, but I have never used this tool before. So if you want to try it, make a backup first and do some test runs with smaller dummy files.
Hope this helps.
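
For an archive that already exists, the standard split and cat tools may be enough on the Linux side. The sketch below is a dry run on a small dummy file, in the spirit of the test runs suggested above; the file names and sizes are hypothetical, and for the real backup the piece size would be something like 500m.

```shell
#!/bin/sh
# Demo setup: a 2.5 MB random file standing in for the real backup.
demo=$(mktemp -d)
head -c 2500000 /dev/urandom > "$demo/backup.tar.gz"

# Cut the file into 1 MiB pieces named backup.tar.gz.part_aa, _ab, _ac.
split -b 1m "$demo/backup.tar.gz" "$demo/backup.tar.gz.part_"

# Rejoin on Linux: the shell lists the pieces in name order.
cat "$demo"/backup.tar.gz.part_* > "$demo/rejoined.tar.gz"

# Verify the round trip byte for byte.
cmp "$demo/backup.tar.gz" "$demo/rejoined.tar.gz" && echo "pieces rejoin cleanly"

# On Windows, cmd.exe can join the downloaded pieces in binary mode:
#   copy /b backup.tar.gz.part_aa + backup.tar.gz.part_ab + backup.tar.gz.part_ac backup.tar.gz
```

With the default two-letter suffix, split can produce up to 676 pieces, which covers 200 GB in 500 MB increments (about 400 pieces). Comparing a checksum of the original with one of the rejoined file on the Windows side is a sensible extra check.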

trackpads 04-18-2011 06:40 AM

I ended up using some partial work as well as just making one enormous archive of my cPanel.

How do you install the sfk thing on Linux? I only have shell access, looks nifty if I can get it to work.

Thanks,

-Jason

trackpads 04-18-2011 07:05 AM

OK, here is another try; it seems to be working:

Use this script to install rar: http://nixcraft.com/shell-scripting/...rar-linux.html

Then rar with a command such as 'rar a -r -v200m forum.rar forum'


All times are GMT -5.