Linux script Limiting Zip archive size
Hi Everyone:
I am trying to zip a large number of wav files through a script. However, this zip file needs to be uploaded to a business partner's FTP server, which limits files to 2 GB max. Their setup requires a zip file specifically. Is there a way to script this: take the contents of a folder and zip all the files into one zip file, unless it would be more than 2 GB, in which case break it into two files, or more if necessary? I cannot use split archives for this, as each file needs to be standalone. Thanks everyone |
zipsplit splits a zip file into multiple zip files and you can specify maximum size with the -n option.
http://wiki.linuxquestions.org/wiki/Zipsplit |
I looked into zipsplit. It appears that zipsplit creates files which are dependent on each other (i.e. file1.ro1, file2.ro2, etc.), where all of the pieces are required to be present before the file can be unzipped.
Am I incorrect on this? The problem with that, if it is true, is that their system unzips a file as soon as it has completed uploading. At that point, the next file upload overwrites the prior file, as they are all required to have the same name. If I am confused, please clear me up! :) |
No - the zip files created by zipsplit are independent of each other.
As a test I created a zip file of a directory where I keep various scripts and binaries for personal use: Code:
zip bin.zip *
I then ran the following to break up the single zip into multiples: Code:
zipsplit -n 23000 bin.zip
4 zip files will be made (100% efficiency)
creating: bin1.zip
creating: bin2.zip
creating: bin3.zip
creating: bin4.zip
I then did an scp of bin2.zip to another server and ran unzip there. This extracted the files in that zip file without asking about any of the others. I then did scp of bin4.zip and unzipped it with the same results. I then did it for bin1.zip then for bin3.zip and in all four cases the files unzipped cleanly without asking for any of the others. (As noted when I did bin2.zip there were no others on the system.) |
The man page for zipsplit says it does not support files over 2G in size.
You can pre-group your wav files beforehand, and then zip them separately. They are unlikely to compress much, so grouping each 1800 M to 2 G of wav files to be zipped in one archive should work. Here is a simple Bash script you might start with: Code:
#!/bin/bash
This script will also never use the Zip64 extensions for large archives, even when the directory contents are several gigabytes (at least if the size limit is < 2G). The resulting archives should be unpackable with even old PKZIP programs. (If I remember correctly, the old ones may choke on zipsplit archives.)
If you have wildly different file sizes, you should consider writing an awk script which uses one of the greedy algorithms for solving the bin packing problem (to efficiently decide which file should go in which zip archive). A similar problem is encountered when backing up large directories to write-once media (DVD-R discs, for example).
If you need each zip file to be as close to the limit as possible, you could use the zip -O option to add each file to the archive without overwriting the old one; if the result is smaller than the limit, then try adding the next file to it. However, zip must keep the old archive intact in case the limit is exceeded, so in the worst case you'd copy almost 2G (the archive size before adding the current file) each time you add one file, and it would be quite slow at times.
I hope this gets you started, |
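The greedy bin-packing suggestion above could look roughly like the sketch below. It is first-fit decreasing (one of several greedy strategies), written as a bash function around a small awk program. The name plan_bins and the "size<TAB>path" input format are made up for this illustration, and it only plans which file goes into which archive based on the uncompressed sizes; it does not run zip.

```shell
#!/bin/bash
# plan_bins LIMIT: read "size<TAB>path" lines on stdin, print "bin<TAB>path".
# First-fit decreasing: sort by size (largest first), then put each file
# into the first bin that still has room, opening a new bin when none does.
# A file larger than LIMIT ends up alone in a bin of its own.
# Assumption: paths contain no tabs or newlines.
plan_bins() {
    sort -t "$(printf '\t')" -k1,1nr |
    awk -F '\t' -v limit="$1" '
        {
            placed = 0
            for (b = 1; b <= nbins; b++) {
                if (used[b] + $1 <= limit) { placed = b; break }
            }
            if (!placed) placed = ++nbins   # no bin had room: open a new one
            used[placed] += $1
            print placed "\t" $2
        }'
}
```

With GNU find, the input could be produced with something like find Recordings -type f -printf '%s\t%p\n' | plan_bins 1900000000, where Recordings and the limit are placeholders.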
The man page is cryptic on this. The project site's FAQ seems to suggest the limitation is on files within the archive. That is to say, I'm not sure if its limitation is on zip files larger than 2 GB, on files contained within the zip larger than 2 GB, or both.
It can't hurt to try it if you don't have any files larger than 2 GB you are zipping up or if the zip is less than 2 GB anyway. I'll have to admit I'm surprised by such a limitation on the zip file size itself. It seems the most likely users of zipsplit would be those with large zip files and these days 2 GB is nothing. |
Ok, so I am attempting to use zipsplit. However, every time I try this command:
Quote:
zipsplit warning: Entry is larger than max split size of :
zipsplit warning: use -n to set split size
zipsplit error: Entry too big to split, read, or write
I have entered multiple sizes, and nothing has worked. In addition, I tried removing the -n 23000 from the command, and still received the same error. This is a 43MB test file. Any suggestions? |
Why not use the nice looking script posted above by Nominal Animal ?
|
bin.zip was an arbitrary name I chose for my file because it was zip of a bin directory.
The errors are essentially blank, presumably because it cannot find a file named bin.zip. Odd though - I'd have thought they'd tell you file not found rather than do all that. You need to run the command on the zip filename YOU created. Since you seem to have created it frequently prior to posting I'm assuming it is NOT named bin.zip (or it is one hell of a coincidence if it is). The generic syntax for what I did is:
zipsplit -n <size> <filename>
Where you substitute the size in bytes for <size> and the filename of YOUR zip file for <filename>. |
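Since the size given to -n is in bytes, a small helper for converting human-friendly sizes can save some arithmetic. This is only a sketch: to_bytes is a made-up name, and it assumes 1024-based K, M and G suffixes on whole numbers.

```shell
#!/bin/bash
# to_bytes SIZE: print SIZE in bytes. SIZE is a whole number with an
# optional K, M or G suffix (1024-based). Anything else is rejected.
to_bytes() {
    local value=$1 number unit
    number=${value%[KkMmGg]}      # strip one trailing unit letter, if any
    unit=${value#"$number"}       # whatever was stripped
    case $number in
        ''|*[!0-9]*) echo "to_bytes: bad size: $value" >&2; return 1 ;;
    esac
    case $unit in
        '')  echo $(( 10#$number )) ;;
        K|k) echo $(( 10#$number * 1024 )) ;;
        M|m) echo $(( 10#$number * 1024 * 1024 )) ;;
        G|g) echo $(( 10#$number * 1024 * 1024 * 1024 )) ;;
    esac
}
```

It could then be used as zipsplit -n "$(to_bytes 20M)" yourfile.zip, with yourfile.zip standing in for your own archive name.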
The limitations for zipsplit are due to the fact that it is a very old 32-bit format, and you can only describe lengths of up to 2147483647 bytes exactly using a signed 32-bit integer. (Where unsigned integers are used, the limit is of course 4G, or 4294967295 bytes.)
To overcome the limitations, Zip64 extensions were developed, using 64-bit integers (theoretically, a 9223372036854775807 byte limit for signed 64-bit integers). However, zipsplit does not support those extensions. (I suspect there may be a technical reason, since the man page indicates zipsplit uses different extensions for splitting the zip archive; if those extensions are not 64-bit, then you cannot support 64-bit archives. It seems that PKZIP has only relatively recently grown support for zipsplit extensions, too.)
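For anyone who wants to check the numbers above, they can be reproduced with shell arithmetic. This is a small sketch that assumes bash with 64-bit signed arithmetic, which is what typical Linux systems have; the variable names are arbitrary.

```shell
#!/bin/bash
# Largest values representable in the integer widths discussed above.
# Assumption: the shell does its arithmetic in signed 64-bit integers.
max_s32=$(( (1 << 31) - 1 ))             # signed 32-bit
max_u32=$(( (1 << 32) - 1 ))             # unsigned 32-bit
max_s64=$(( ((1 << 62) - 1) * 2 + 1 ))   # signed 64-bit, built up without overflowing

echo "signed 32-bit:   $max_s32"
echo "unsigned 32-bit: $max_u32"
echo "signed 64-bit:   $max_s64"
```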
|
@Mensawater:
I was not actually using bin.zip in the actual command, but did not want to post the specific name of the file. It was still giving the error. I have done some reading and it appears that the newer versions of Ubuntu have a bug regarding zipsplit. @Nominal Animal I would love to use your script. I am trying to understand it, as it will be connecting to other scripts in sequence. It is a bit more complex than I am used to at this point, so any help that you could provide in understanding how it works would be great. |
Here's the script I listed above explained part by part.
If the script is called with fewer than three arguments, or the first argument is -h or --help, the script outputs some usage information to standard error: Code:
#!/bin/bash
Code:
# Base name of the zip archive to create
Code:
# Maximum size for input files for each archive
Code:
shift 2
Code:
find "$@" -type f -print0 | (
Code:
while read -d "" FILE ; do
Code:
NEWTOTAL=$(( SIZE + TOTAL ))
Code:
if [ ${#FILES[@]} -lt 1 ] || [ $NEWTOTAL -le $MAXTOTAL ]; then
Code:
INDEX=$(( INDEX + 1 ))
Code:
FILES=("$FILE")
Code:
done
Code:
if [ ${#FILES[@]} -gt 0 ]; then
Code:
elif [ $INDEX -eq 0 ]; then
Code:
echo "" >&2
Code:
else
Use the subshell exit status for the entire script. If the subshell succeeded, the script will also return success.
Code:
) |
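For readers who only see fragments of the script above, here is a rough, self-contained sketch of the same general idea: walk the files in order, start a new group whenever the next file would push the summed input sizes over the limit, and zip each group into its own archive. It is not the original script. The function names group_by_size and zip_groups are invented for this example, the limit is a plain byte count, and it assumes GNU find (for -printf) and file names without tabs or newlines.

```shell
#!/bin/bash
# group_by_size LIMIT: read "size<TAB>path" lines, print "index<TAB>path".
# A new group starts whenever adding the next file would exceed LIMIT;
# a single file larger than LIMIT still gets a group of its own.
group_by_size() {
    local limit=$1 index=1 total=0 count=0 size path
    while IFS=$'\t' read -r size path; do
        if (( count > 0 && total + size > limit )); then
            index=$(( index + 1 ))
            total=0
            count=0
        fi
        total=$(( total + size ))
        count=$(( count + 1 ))
        printf '%s\t%s\n' "$index" "$path"
    done
}

# zip_groups BASE LIMIT DIR...: create BASE-1.zip, BASE-2.zip, ...
# Each archive is written by a single zip call that reads its file list
# from standard input (zip -@).
zip_groups() {
    local base=$1 limit=$2 plan index
    shift 2
    plan=$(mktemp) || return 1
    find "$@" -type f -printf '%s\t%p\n' | group_by_size "$limit" > "$plan"
    for index in $(cut -f1 "$plan" | sort -un); do
        awk -F '\t' -v i="$index" '$1 == i { print $2 }' "$plan" |
            zip -q -@ "$base-$index.zip" || { rm -f "$plan"; return 1; }
    done
    rm -f "$plan"
}
```

A call such as zip_groups Recordings 1900000000 ./Recordings would then write Recordings-1.zip, Recordings-2.zip and so on into the current directory. The limit applies to the uncompressed input sizes, so it is worth leaving some headroom below the real upload limit.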
Wow! I really appreciate the clarification on that, Nominal Animal! The script works flawlessly. I have saved it into a script file. Can I just call it from another script with the line ./nominalanimal.sh Recordings 2G ./ ? That line should provide all the needed parameters, correct?
Don't answer that, I will figure it out myself! :) Thanks again for all of your help. |
Ok, a couple things just came up.
1. I ran this script on 5g worth of files. It split this up into three different zip files. -1 was 1.3G, -2 was 800MB, and -3 was 2.39GB. I am not sure why it did that. Any thoughts?
2. Can I run this for another directory? For instance, if I put in ./nominalanimal.sh /home/foo/bar/recordings will it create the zip files in the specified directory? And will it zip the files contained within that directory?
3. Can this script be modified so that once a file is added to the zip, the original can be moved to a different directory? |
It depends on the type of files. As stated, binary files like audio will not compress much. Since zip files also contain additional information for every file they hold, the result of zipping many such files might even be bigger than the original source. And since the final zip file size is a bit of an unknown, you will need to adjust the max size parameter.
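One way to find out whether the max size parameter needs adjusting is to check the finished archives against the real upload limit before sending anything. A minimal sketch, where oversized is a made-up helper name and the limit is given in bytes:

```shell
#!/bin/bash
# oversized LIMIT FILE...: print "size<TAB>file" for every FILE whose size
# in bytes exceeds LIMIT. Returns 0 if no file is over the limit, 1 if at
# least one is, and 2 if a file could not be read.
oversized() {
    local limit=$1 file size status=0
    shift
    for file in "$@"; do
        size=$(wc -c < "$file") || return 2
        size=$(( size ))          # normalise any padding in the wc output
        if (( size > limit )); then
            printf '%s\t%s\n' "$size" "$file"
            status=1
        fi
    done
    return $status
}
```

For example, oversized 2147483647 Recordings-*.zip would list any archive too big for a 2 GB limit; the archive names here are placeholders. If anything is listed, lower the max size parameter and run the zip script again.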
|