According to the gzip documentation, I'm afraid it is not possible unless you do this by hand with a text editor:
Originally Posted by gzip man page
If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
The easiest way to make a compressed archive of individual files in a directory is to use tar with a compression option (either gzip or bzip2):
# compress with gzip
tar zcf yourArchiveFile.tar.gz yourDirectory/
# compress with bzip2
tar jcf yourArchiveFile.tar.bz2 yourDirectory/
To then decompress the archive:
# extract the whole archive
tar zxf yourArchiveFile.tar.gz
tar jxf yourArchiveFile.tar.bz2
# extract a single file or directory from the archive
tar zxf yourArchiveFile.tar.gz pathInArchive/to/file
tar jxf yourArchiveFile.tar.bz2 pathInArchive/to/file
I dunno how many files these were or what type, but most files (jpg, executable, doc) have a distinct signature at the start of the file. Maybe you can do something with a hex editor to find the start of the file and extract them.
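To make the signature idea concrete, here is a minimal Python sketch that lists every offset in a blob where a well-known leading signature occurs. The signature table and the function name `find_signatures` are illustrative assumptions, not taken from any tool mentioned in this thread, and short signatures will also match by chance inside unrelated data, so the hits are only candidates.

```python
# Well-known leading signatures ("magic numbers") for a few formats.
# This table is a small illustrative selection, not an exhaustive list.
SIGNATURES = [
    ("png", b"\x89PNG\r\n\x1a\n"),
    ("gif", b"GIF87a"),
    ("gif", b"GIF89a"),
    ("jpg", b"\xff\xd8\xff"),
    ("zip", b"PK\x03\x04"),
    ("flv", b"FLV\x01"),
]


def find_signatures(blob):
    """Return a sorted list of (offset, type) for every signature hit."""
    hits = []
    for name, magic in SIGNATURES:
        pos = blob.find(magic)
        while pos != -1:
            hits.append((pos, name))
            pos = blob.find(magic, pos + 1)
    return sorted(hits)
```

Running it over the whole data file gives a rough map of where files probably start, which is the same information you would otherwise hunt for in a hex editor.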
I've managed to get myself into the same problem, with no alternative methods, and the data is actually quite important.
Thankfully, the types of files in the binary mass are limited to pretty much just png, gif, jpg, flv, zip, and exe, so aside from the exe, each should have an easily readable header, and some even have a terminating signature! That said, I can't find any ready-made methods or programs to extract them all at once (I don't care about the file names just yet), and there are too many files (50,000) to extract by hand in a reasonable amount of time. I'm given some hope, though, by the fact that basic image viewing software is able to identify the first image, and only the first image, in the cat'd binary mass, because it means that routines exist to fully identify the beginning and end of the contained files.
Does anyone have any helpful suggestions on programs, scripts, or routines to use?
If you have merged the gzip files themselves (each file was compressed first, and the compressed files were then concatenated), you can split them quite reliably. I'd personally write a small C program that splits the input into gzip chunks and tests with gzip -t whether each file is complete or needs more chunks. The gzip header is ten bytes long, but four of those bytes are the time stamp, so only the fixed bytes around them are useful for detection. Splitting at every header match gets every real file boundary, but it may occasionally split in the middle of compressed data too; using gzip to test whether the data file is complete handles that case beautifully. If this is your case, let me know; I can probably whip up such a program quite easily. If you used normal gzip options, then using gunzip -N on each of the split files will restore the original file names and time stamps.
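A rough Python sketch of that split-and-test idea follows (Python rather than C, purely for brevity). It assumes the whole input fits in memory and that every member uses deflate, so each one starts with the bytes 1f 8b 08. The names `is_complete_member` and `split_gzip_members` are made up for this example, and the completeness check uses zlib in place of calling gzip -t.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # ID1, ID2, and CM=8 (deflate)


def is_complete_member(chunk):
    """True if chunk is exactly one whole gzip member (similar to gzip -t)."""
    d = zlib.decompressobj(31)  # wbits 31 = 16 + 15: parse gzip header and trailer
    try:
        d.decompress(chunk)
    except zlib.error:
        return False
    return d.eof and not d.unused_data


def split_gzip_members(blob):
    """Split concatenated gzip members, tolerating false header matches."""
    # Every magic match is a candidate boundary; some may sit inside data.
    starts = []
    pos = blob.find(GZIP_MAGIC)
    while pos != -1:
        starts.append(pos)
        pos = blob.find(GZIP_MAGIC, pos + 1)
    starts.append(len(blob))

    members = []
    i = 0
    while i < len(starts) - 1:
        # Grow the chunk one candidate boundary at a time until it tests OK.
        for j in range(i + 1, len(starts)):
            chunk = blob[starts[i]:starts[j]]
            if is_complete_member(chunk):
                members.append(chunk)
                i = j
                break
        else:
            raise ValueError("no complete gzip member at offset %d" % starts[i])
    return members


if __name__ == "__main__":
    blob = gzip.compress(b"first") + gzip.compress(b"second")
    print([gzip.decompress(m) for m in split_gzip_members(blob)])
```

Each returned member can be written to its own .gz file, after which gunzip -N works on it as described above.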
However, if you first concatenated all the files and then gzipped the one bulk data chunk, the situation is much more complex. You need to do something like the above, but separately for each supported file type. If most of the files are JPEG, GIF and PNG images, you could pick them out from the data first, and then worry about the leftovers. If you are lucky, the leftover files were surrounded by images, so their boundaries are easier to find. Obviously, I'd write a program for this too; it would be pretty silly to try to do it by hand. However, in this case the program would be much more complex, simply due to the larger number of file types. Images and zip files are easy to check for completeness (there are libraries one could incorporate for this), but FLV and EXE files are a pain.
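As an illustration of picking images out of the decompressed data, here is a hedged Python sketch of header-to-footer carving for PNG, which has both a fixed 8-byte signature and a fixed IEND trailer. The function name `carve` is hypothetical, and the spans it yields are only candidates that still need to be verified.

```python
PNG_HEADER = b"\x89PNG\r\n\x1a\n"
# The IEND chunk has no data, so its type plus CRC is always these 8 bytes.
PNG_FOOTER = b"IEND\xae\x42\x60\x82"


def carve(blob, header, footer):
    """Yield (start, end) spans running from each header to the next footer."""
    pos = blob.find(header)
    while pos != -1:
        end = blob.find(footer, pos + len(header))
        if end == -1:
            break  # header without a footer: probably truncated data
        end += len(footer)
        yield pos, end
        # Continue searching after this candidate so spans do not overlap.
        pos = blob.find(header, end)
```

For example, `[blob[s:e] for s, e in carve(blob, PNG_HEADER, PNG_FOOTER)]` collects the candidate PNG files. The same function accepts other header and footer pairs, but formats with very short trailers will produce far more false positives.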
Thanks for the advice. I'm pretty sure I did the latter... It's pretty ugly. Since asking the question, I found some forensic software, scalpel, that does a pretty good job of finding the headers and footers of many known file types (gif, png, jpg, and zip included) and extracting every candidate, false positives included.
This is no easy task, and it's probably going to be multi-step... First extract the file possibilities, then verify each file to make sure it's a valid jpg/png/gif/zip/pdf. Good times.
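For the verification step, a minimal sketch for the PNG case might look like the following. It only walks the chunk list and checks each chunk's CRC up to IEND; it does not decode the pixel data, so treat a pass as "structurally plausible" rather than "definitely a good image". The function name `is_valid_png` is an assumption for this example.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_valid_png(data):
    """Walk the PNG chunk list, checking every chunk CRC, up to IEND."""
    if not data.startswith(PNG_SIGNATURE):
        return False
    pos = len(PNG_SIGNATURE)
    # Each chunk is: 4-byte length, 4-byte type, data, 4-byte CRC.
    while pos + 12 <= len(data):
        (length,) = struct.unpack(">I", data[pos:pos + 4])
        chunk_type = data[pos + 4:pos + 8]
        body_end = pos + 8 + length
        if body_end + 4 > len(data):
            return False  # chunk claims more data than is present
        (crc,) = struct.unpack(">I", data[body_end:body_end + 4])
        # The CRC covers the chunk type and the chunk data, not the length.
        if zlib.crc32(data[pos + 4:body_end]) != crc:
            return False
        if chunk_type == b"IEND":
            return True  # bytes after IEND are ignored
        pos = body_end + 4
    return False
```

For zip candidates, Python's standard `zipfile.ZipFile.testzip()` performs a comparable CRC check on every archive member.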