I am trying to find and extract from a large disk (300GB) of unknown partitioning and formatting, numerous files, based on some hex search patterns.
It seems programs like ghex2 and khex attempt to load the whole (device) file into memory (I don't think I have heard of any system with 300GB of RAM...), so they fail badly.
I have tried using split to create smaller segments of /dev/hdd, but since my other drive is not this large, this bombs out due to lack of disk space.
Next I experimented with dd (512-byte blocks, segments of roughly 5GB):
#dd if=/dev/hdd of=test1 bs=512 skip=1000000 count=10000000
#hexdump -C test1
and have repeated this numerous times to create summaries of the data, looking for non-zero and non-ff data, but this is a slow old way of doing it, and very manual. If the data is repeating, hexdump only shows the first 16 bytes, then an asterisk is shown until there is a difference.
I have used grep to try to find the data I'm after, but haven't got a working pattern yet for:
0x00 0x00 0x00 0x00 0x00
nor 0x00 0x00 0x01 0xca
The offsets of the patterns within the /dev/hdd would also be useful to use with dd.
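For finding those offsets directly, GNU grep can search the raw device itself; here is a minimal sketch, assuming a GNU grep built with -P (PCRE) support, and using a small sample file in place of /dev/hdd:

```shell
# Build a tiny stand-in image with two markers (the real target would be /dev/hdd).
printf 'junk\000\000\001\312payload-one\000\000\001\312payload-two' > sample.bin

# -a: treat binary as text, -b: print the byte offset of each match,
# -o: print only the matched bytes, -P: Perl-style \xNN escapes (GNU grep).
grep -aboP '\x00\x00\x01\xca' sample.bin
```

Each output line is offset:match, so the offsets can be fed straight to dd's skip=. The five-zero-byte pattern can be written the same way, e.g. '\x00{5}'.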
I tried combining the two methods directly and using pipes and redirection, but I haven't hit upon a working combination.
#hexdump -C -s 80000 -n 1000000/dev/hdd
hexdump: stdin: Illegal seek.
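That "Illegal seek" appears to come from the missing space before /dev/hdd: with no filename argument, hexdump reads stdin, which -s cannot seek. With the space restored, -s (skip) and -n (length) work on any seekable file or device; a small demonstration on a sample file:

```shell
# 16 bytes of sample data; -s skips to byte 8, -n reads 4 bytes.
printf 'aaaabbbbccccdddd' > sample.bin
hexdump -C -s 8 -n 4 sample.bin
```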
I guess there may be tools/options that can do this for me.
Basically I want to dump data from the device file (raw hard disk), that is not all zeroes nor all ff, into separate files each, but I need to choose an offset (since my other disk isn't big enough - I can examine / erase / store elsewhere the useful data and then repeat the process further along the disk).
Even better, is there a way to extract chunks of data starting with the second pattern above, and stopping just before the next occurrence of the pattern, or copying a certain number of blocks of the search data (eg 10000 items) into each file?
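Once the start and end byte offsets of a blob are known (however obtained), GNU dd can cut it out byte-accurately while still reading large blocks; a sketch, assuming GNU dd (which supports skip_bytes/count_bytes) and a small stand-in image with hypothetical offsets:

```shell
# Stand-in image: marker at byte 4, next marker at byte 17 (real input: /dev/hdd).
printf 'AAAA\000\000\001\312MOVIEDATA\000\000\001\312BBBB' > disk.img

START=4    # byte offset of the marker opening the blob
END=17     # byte offset of the next marker
# skip_bytes/count_bytes make skip= and count= byte counts instead of block counts.
dd if=disk.img of=clip.bin bs=4M iflag=skip_bytes,count_bytes \
   skip=$START count=$((END - START)) 2>/dev/null
```

clip.bin then holds exactly the marker plus its payload, without any bs=1 crawl over the disk.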
# fsstat /dev/hdd
Cannot determine file system type
To give a better idea on what I'm doing:
a. I have a dvr that records to standard IDE harddisk.
b. Instead of editing/writing out recorded mpeg2 video to the unit's DVD recorder, I want to edit the captured video on PC first.
c. I put the disk into a linux machine: /dev/hdd
d. Neither Linux nor DOS/XP understands the partitions.
e. Linux can't mount the whole disk as various well known file systems.
f. As mentioned, I can use dd with skip and size values to grab blocks of data at a time.
g. I only have 5G of disk space free, so I need to extract mpeg2 files up to half that amount, and then temporarily store them elsewhere, until I get a chance to edit the streams. (they play fine in mplayer/vlc etc).
I can use hexdump to convert the raw data to hex/ascii text, and then grep to find frame marker [\x00\x00\x01\xca] positions. I can then re-extract the exact blob of data that I am after.
But this is tedious and painstaking. For a giggle I wrote a short java app to find the marker positions, but this seems to run extremely slowly (however, it did do the job!). Once I had extracted some usable blobs of data, I thought there must already be tools/utilities for doing exactly what I am doing; this led me here!
Are there existing tools to:
1. get a list of areas where the marker occurs (it occurs a lot, so some way to stop after a maximum byte count would be better).
2. get output of the null (\x00) parts of the disk eg 16+ bytes
3. get output of the \xff-filled parts of the disk eg 16+ bytes
4. extract blobs of data between a start marker and either
1. the 00 00's
2. the ff ff's
3. finishing just before the next marker?
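The last wish (cut from each marker to the next) can be sketched as a short shell loop: list the marker byte offsets with GNU grep, then cut each span with GNU dd. This assumes GNU grep with -P and GNU dd with byte-addressed flags; disk.img stands in for /dev/hdd:

```shell
#!/bin/sh
# One output file per marker-to-marker span.
printf 'xx\000\000\001\312one\000\000\001\312two' > disk.img

offsets=$(grep -aboP '\x00\x00\x01\xca' disk.img | cut -d: -f1)
total=$(wc -c < disk.img)     # on a real disk, cap this at your free-space budget
i=0; prev=
for off in $offsets $total; do
    if [ -n "$prev" ]; then
        dd if=disk.img of=chunk$i.bin bs=4M iflag=skip_bytes,count_bytes \
           skip=$prev count=$((off - prev)) 2>/dev/null
        i=$((i + 1))
    fi
    prev=$off
done
```

Each chunkN.bin then starts with a marker and runs up to (but not including) the next one; the last chunk runs to the chosen end offset.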
I'm also downloading helix, but that is going to take ages, so I welcome any further suggestions...
How about a shell script that uses dd to read the disk one chunk at a time while looking for the patterns you want? Because some things tend to be done in multiples of 512-byte sectors, you could just loop on the sector count, and have the script report to you when it finds the pattern, and where (eg sector + offset). You could write it to print n bytes of context whenever it finds a match, or you can just note the locations, and then go back by hand.
For 300 Gbytes, I see you starting the thing, and then going to dinner while it works....
search for pattern on drive
command-line arguments are the search pattern (as it appears in hexdump's output) and the drive designation
Example call: "finder 'ee ff' /dev/hda"

#!/bin/sh
CS=0
while :; do
    # read one sector, dump it, and keep any lines matching the pattern
    dd if="$2" bs=512 skip=$CS count=1 2>/dev/null | hexdump -C | grep "$1" > tmp
    # report the sector count + the line(s) from hexdump, which include the offset
    [ -s tmp ] && { echo "sector $CS:"; cat tmp; }
    CS=$((CS + 1))
done

This loops past the end of the disk and only stops with ctrl-C, and a match spanning a sector boundary will be missed. You need to add more sophistication to taste....
How about a simple shell script, to loop thru the drive and find patterns. This can be as complex as you want it--looking for pattern A after having found pattern B, etc.
BC=0                # initialize the block count
while :; do
    dd if="$1" bs=2048 skip=$BC count=1 2>/dev/null | hexdump -C | grep _pattern_ > tmp
    # this gives you block count and offset (from hexdump)--tmp already has newline chars, so we don't need to add them
    [ -s tmp ] && { printf '%s\n' "$BC"; cat tmp; }
    BC=$((BC + 1))
done
_pattern_ can come from a command line argument or be hard coded
$1 is the first argument---eg /dev/hda
keep the block size as a multiple of the sector size.
to have sub-loops, probably write hexdump output to file, then loop with grep so that you get the return code from grep to tell it to look for another pattern.
This obviously lacks finesse--eg it will likely loop forever. Also, you need to use the correct shell syntax....
teckk: I have now given helix a whirl; nice distro, but it doesn't seem to have the tools to analyze raw disk data when the file system is unknown (ie not NTFS/FAT/EXT3 etc).
Also tried with foremost. It seems to need an actual file rather than the /dev/hdd device file. If I had another 300G disk, I could copy the whole drive to the new drive (within an ext3 filesystem). I did this with the first 1G, but it was unable to find any mpg files--or indeed any files at all--where I had already extracted many mpeg streams manually.
I guess I'm back to either scripting or some program coding to get what I am trying to retrieve.