Old 07-12-2006, 03:53 AM   #1
dandini
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Rep: Reputation: 0
recover individual files concatenated with gzip


I dumbly managed to gzip the contents of a directory into one file, so although I can decompress it, it comes out as one long concatenated file.

Does anyone have any idea whether, and how, I might be able to extract the original individual files?

Many thanks,

Dan
 
Old 07-12-2006, 04:16 AM   #2
zeitounator
Member
 
Registered: Aug 2003
Location: Montpellier, France, Europe, World, Solar System
Distribution: Debian Sarge, Fedora core 5 (i386 and x86_64)
Posts: 262

Rep: Reputation: 30
According to the gzip documentation, I'm afraid it is not possible unless you do it by hand with an editor:
Quote:
Originally Posted by gzip man page
If you wish to create a single archive file with multiple members so that members can later be extracted independently, use an archiver such as tar or zip. GNU tar supports the -z option to invoke gzip transparently. gzip is designed as a complement to tar, not as a replacement.
The easiest way to make a compressed archive of the individual files in a directory is to use tar with a compression option (either gzip or bzip2):
Code:
# compress with gzip
tar zcf yourArchiveFile.tar.gz yourDirectory/
# compress with bzip
tar jcf yourArchiveFile.tar.bz2 yourDirectory/
To decompress the archive later:
Code:
# decompress the whole archive
tar zxf yourArchiveFile.tar.gz
tar jxf yourArchiveFile.tar.bz2
# decompress a single file or directory from the archive
tar zxf yourArchiveFile.tar.gz pathInArchive/to/file
tar jxf yourArchiveFile.tar.bz2 pathInArchive/to/file
 
Old 07-12-2006, 06:23 AM   #3
dandini
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks for your response.

It sounds as though I'm fried then. I am an idiot of the highest order.

Thanks again,

Dan
 
Old 07-12-2006, 07:08 AM   #4
jlinkels
Senior Member
 
Registered: Oct 2003
Location: Bonaire
Distribution: Debian Wheezy/Jessie/Sid, Linux Mint DE
Posts: 4,234

Rep: Reputation: 545
I don't know how many files these were or what type, but most file formats (JPEG, executables, DOC) have a distinct signature at the start of the file. Maybe you can use a hex editor to find the start of each file and extract them.
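For instance (an illustrative sketch on my part, not from the original post; blob.bin is a placeholder name), GNU grep in a bash shell can list the byte offset of every occurrence of a signature, which beats scrolling through a hex editor:
Code:
# byte offsets of every JPEG start-of-image signature (ff d8 ff)
LC_ALL=C grep -abo $'\xff\xd8\xff' blob.bin | cut -d: -f1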

jlinkels
 
Old 07-12-2006, 09:24 AM   #5
dandini
LQ Newbie
 
Registered: Jul 2006
Posts: 3

Original Poster
Rep: Reputation: 0
Many thanks for the suggestion, but the data is not mission-critical. If there had been a simple get-out I would have taken it, but it's not worth the time to pick through.

Thanks again,

Dan
 
Old 09-02-2011, 09:08 PM   #6
trajetre
LQ Newbie
 
Registered: Sep 2011
Posts: 2

Rep: Reputation: Disabled
I've managed to get myself into the same problem, with no alternative methods, and the data is actually quite important.

Thankfully, the types of files in the binary mass are limited to pretty much just png, gif, jpg, flv, zip, and exe, so aside from the exe, each should have an easily legible header, and some even have a terminating signature! That said, I can't find any prescripted methods or programs to extract them all at once (I don't care about the file names just yet), and there are too many files (50,000) to extract by hand in a reasonable amount of time. I'm given some hope, though, by the fact that basic image-viewing software can identify the first image, and only the first image, in the cat'd binary mass: it means routines exist to fully identify the beginning and end of the contained files.

Does anyone have any helpful suggestions on programs, scripts, or routines to use?

Thanks for your help!
 
Old 09-04-2011, 01:21 AM   #7
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 943
The situation depends on whether you concatenated the data before compressing, or just concatenated the compressed data. You can count the (approximate) number of gzip headers in a file using
Code:
# dump the file as space-separated hex bytes, insert a line break at each
# plausible gzip header (magic 1f 8b, method 08, flags 00, any 4-byte
# mtime, xfl 02, os 03 = Unix), then count the breaks
hexdump -ve '1/1 " %02x"' file.gz | sed -e 's| 1f 8b 08 00 .. .. .. .. 02 03|\n+|g' | grep -ce ^+
If you gzipped each file separately and then merged the results, you can split them quite reliably. I'd personally write a small C program that splits the input at gzip headers (the four wildcarded bytes in the pattern above are the time stamp, so you have a ten-byte header to detect) and uses gzip -t to check whether each piece is complete or needs more chunks. That way you catch every real file boundary, but you may occasionally split inside compressed data too; letting gzip test whether each candidate is complete handles that case beautifully. If this is your case, let me know; I can probably whip up such a program quite easily. If you used normal gzip options, then running gunzip -N on each of the split files will restore the original file names and time stamps.
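A minimal sketch of that splitting idea in shell rather than C (my addition, assuming GNU bash and coreutils; blob.gz and the member-*.gz names are placeholders): cut at each candidate header and let gzip -t decide whether the piece is a complete member or must be extended to the next candidate:
Code:
#!/bin/bash
# Split a concatenation of gzip members at candidate headers, keeping
# only splits that gzip itself accepts as complete.
blob=blob.gz
size=$(stat -c %s "$blob")

# Byte offsets of candidate headers: magic 1f 8b plus deflate method 08.
mapfile -t offs < <(LC_ALL=C grep -abo $'\x1f\x8b\x08' "$blob" | cut -d: -f1)
offs+=("$size")                # sentinel, so the last member ends at EOF

n=0 start=${offs[0]}
for ((i = 1; i < ${#offs[@]}; i++)); do
    end=${offs[i]}
    # Cut the span [start, end) out of the blob.
    tail -c +"$((start + 1))" "$blob" | head -c "$((end - start))" > part.gz
    # A false positive splits mid-data and fails the test; in that case
    # keep the same start and extend the span to the next candidate.
    if gzip -t part.gz 2>/dev/null; then
        mv part.gz "member-$n.gz"
        n=$((n + 1))
        start=$end
    fi
done
As described above, gunzip -N on each member-*.gz should then restore the original names and time stamps.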

However, if you first concatenated all the files and then gzipped the one bulk data chunk, the situation is much more complex. You need to do something like the above, but separately for each supported file type. If most of the files are JPEG, GIF and PNG images, you could pick them out of the data first and then worry about the leftovers. If you are lucky, the leftover files were surrounded by images, so their boundaries are easier to find. Obviously, I'd write a program for this too; it would be pretty silly to try to do it by hand. In this case, though, the program would be much more complex, simply due to the larger number of file types. Images and zip files are easy to check for completeness (there are libraries one could incorporate for this), but FLV and EXE files are a pain.
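To make the image case concrete, here is a rough carving sketch (again my addition, same assumptions as above; the marker pairing is naive, so expect false positives): pair each JPEG start-of-image marker with the next end-of-image marker:
Code:
#!/bin/bash
# Naive JPEG carver for the concatenated-then-gzipped case.
blob=blob.bin                  # the decompressed bulk data
mapfile -t soi < <(LC_ALL=C grep -abo $'\xff\xd8\xff' "$blob" | cut -d: -f1)
mapfile -t eoi < <(LC_ALL=C grep -abo $'\xff\xd9' "$blob" | cut -d: -f1)

n=0 j=0
for s in "${soi[@]}"; do
    # Advance to the first end marker that lies past this start marker.
    while ((j < ${#eoi[@]} && eoi[j] <= s)); do ((j += 1)); done
    ((j < ${#eoi[@]})) || break
    len=$((eoi[j] + 2 - s))    # include the 2-byte ff d9 end marker
    tail -c +"$((s + 1))" "$blob" | head -c "$len" > "img-$n.jpg"
    n=$((n + 1))
done
Anything that will not open as an image afterwards was a false positive; the other file types need the same treatment with their own signatures.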
 
Old 09-04-2011, 03:12 PM   #8
trajetre
LQ Newbie
 
Registered: Sep 2011
Posts: 2

Rep: Reputation: Disabled
Thanks for the advice. I'm pretty sure I did the latter... It's pretty ugly. Between asking the question and now, I found some forensic software, scalpel, which does a pretty good job of finding the headers and footers of many known file types (gif, png, jpg, and zip included) and extracting every possibility (which includes false positives).

This is no easy task, and it's probably going to be multi-step... First extract the candidate files, then verify each one to make sure it's a valid jpg/png/gif/zip/pdf. Good times.
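If it helps anyone landing here later, that verification pass is easy to script (a sketch on my part, assuming the carved files ended up in per-type subdirectories under output/; adjust the glob to wherever scalpel put them):
Code:
#!/bin/bash
# Weed out carving false positives by checking what file(1) thinks
# each recovered file actually is.
for f in output/*/*; do
    case $(file -b --mime-type "$f") in
        image/jpeg|image/png|image/gif|application/zip|application/pdf)
            ;;                 # plausible type: keep it
        *)
            echo "suspect: $f" # likely a false positive
            ;;
    esac
done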
 