LinuxQuestions.org

LinuxQuestions.org (/questions/)
-   Linux - Software (https://www.linuxquestions.org/questions/linux-software-2/)
-   -   find hex patterns in raw hdd block device (https://www.linuxquestions.org/questions/linux-software-2/find-hex-patterns-in-raw-hdd-block-device-398668/)

dtimms 01-02-2006 12:21 PM

find hex patterns in raw hdd block device
 
I am trying to find, and extract, numerous files from a large disk (300 GB) of unknown partitioning and formatting, based on some hex search patterns.

It seems programs like ghex2 and khex attempt to load the whole (device) file into memory (I don't think I have heard of any system with 300GB of RAM...), so they fail badly.

I have tried using split to create smaller segments of /dev/hdd, but since my other drive is not this large, this bombs out due to lack of disk space.

Next I experimented with dd (default bs=512, so this grabs ~5 GB):
#dd if=/dev/hdd of=test1 skip=1000000 count=10000000
#hexdump -C test1
and have repeated this numerous times to create summaries of the data, looking for non-zero and non-ff data, but this is a slow, very manual way of doing it. If the data is repeating, hexdump shows only the first 16 bytes, then an asterisk until there is a difference.
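Incidentally, dd's operand for the amount to copy is count= (a number of blocks), not size=. A small runnable sketch of the skip/count mechanics (disk.img and test1 are stand-in names for the real device and output file):

```shell
# Make a small stand-in image (the real target would be /dev/hdd).
dd if=/dev/zero of=disk.img bs=512 count=16 2>/dev/null
printf 'MARK' | dd of=disk.img bs=512 seek=4 conv=notrunc 2>/dev/null

# dd copies count= blocks of bs= bytes each, starting skip= blocks into
# the input, so bs=512 skip=4 count=4 grabs 2 KB starting 2 KB in.
# (Scale skip= and count= up for the real 300 GB disk.)
dd if=disk.img of=test1 bs=512 skip=4 count=4 2>/dev/null
hexdump -C test1 | head -n 2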
I have used grep to try to find the data I'm after, but haven't got a working pattern yet for:
0x00 0x00 0x00 0x00 0x00
nor 0x00 0x00 0x01 0xca
The offsets of the patterns within the /dev/hdd would also be useful to use with dd.
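If the grep on hand is GNU grep built with PCRE support, it can search the raw device directly and print byte offsets, which slot straight into dd's skip= (with bs=1). A sketch on a small sample file (sample.bin stands in for /dev/hdd):

```shell
# -a: treat binary input as text; -b: print the byte offset of each match;
# -o: print only the matched bytes; -P: allow \xNN escapes in the pattern.
printf 'junk\x00\x00\x01\xcamore\x00\x00\x01\xcaend' > sample.bin
LC_ALL=C grep -aboP '\x00\x00\x01\xca' sample.bin | cut -d: -f1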

I tried combining the two methods directly and using pipes and redirection, but I haven't hit upon a working combination.
#hexdump -C -s 80000 -n 1000000/dev/hdd
hexdump: stdin: Illegal seek.

#hexdump -C|dd if=/dev/hdd skip=20000000 size=1000000
1&y1&z1&{1&|1&}1&~1&1&\uffff1&\uffff1&\uffff1
ie crap!
#dd if=/dev/hdd skip=20000000 size=1000000|hexdump -C

I guess there may be tools/ options that can do this for me.

Basically I want to dump the data from the device file (raw hard disk) that is not all zeroes nor all ff into separate files, but I need to be able to choose an offset (since my other disk isn't big enough; I can examine / erase / store elsewhere the useful data, and then repeat the process further along the disk).

Even better, is there a way to extract chunks of data starting at the second pattern above and stopping just before the next occurrence of the pattern, or copying a certain number of blocks of the matched data (eg 10000 items) into each file?
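On that last question, one approach is to collect the byte offset of every marker and then dd out the span between consecutive offsets. A rough sketch assuming GNU grep with -P and a POSIX shell (extract_between, offsets.txt, and chunkN.bin are made-up names):

```shell
# Cut the data between consecutive 00 00 01 ca markers into
# chunk0.bin, chunk1.bin, ...
extract_between() {
    img=$1
    # grep -abo prints "offset:match" per marker; keep just the offsets.
    LC_ALL=C grep -aboP '\x00\x00\x01\xca' "$img" | cut -d: -f1 > offsets.txt
    i=0
    prev=""
    while read -r off; do
        if [ -n "$prev" ]; then
            # bs=1 is slow but byte-exact; each chunk runs from one
            # marker up to (not including) the next.
            dd if="$img" of="chunk$i.bin" bs=1 skip="$prev" \
               count=$((off - prev)) 2>/dev/null
            i=$((i + 1))
        fi
        prev=$off
    done < offsets.txt
}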

teckk 01-02-2006 02:53 PM

I don't know what you are trying to do. Try these.

http://www.sleuthkit.org/
http://www.efense.com/helix/

dtimms 01-02-2006 09:54 PM

Thanks teckk, I have now tried the tools in sleuthkit. My major finding is that the various tools can't recognize a file system type, so they don't know what to do with it:

# dstat -v -f fat /dev/hdd 0
img_open: Type: (null) Offset: (null): NumImg: 1 Img1: /dev/hdd
raw_read_random: byte offset: 0 len: 512
dstat: Error: not a FATFS file system (cluster size)

# fsstat /dev/hdd
Cannot determine file system type
====
To give a better idea on what I'm doing:
a. I have a dvr that records to standard IDE harddisk.
b. Instead of editing/writing out recorded mpeg2 video to the unit's DVD recorder, I want to edit the captured video on PC first.
c. I put the disk into a linux machine: /dev/hdd
d. Neither Linux nor DOS/XP understands the partitions.
e. Linux can't mount the whole disk as various well known file systems.
f. As mentioned, I can use dd with skip and count values to grab blocks of data at a time.
g. I only have 5G of disk space free, so I need to extract mpeg2 files up to half that amount, and then temporarily store them elsewhere, until I get a chance to edit the streams. (they play fine in mplayer/vlc etc).
I can use hexdump to convert the raw data to hex/ascii text, and then grep to find the frame marker [\x00\x00\x01\x0c] positions. I can then re-extract the exact blob of data that I am after.

But this is tedious and painstaking. For a giggle I wrote a short java app to find the marker positions, but this seems to run extremely slowly (however, it did do the job!). Once I had extracted some usable blobs of data, I thought: I bet there are already tools / utility options for doing exactly what I am doing; this led me here!

Are there existing tools to:

1. get a list of the areas where the marker occurs (it occurs a lot, so some way to continue up to a max byte size would be better).
2. get a list of the null (\x00) runs of the disk, eg 16+ bytes
3. get a list of the \xff runs of the disk, eg 16+ bytes
4. extract blobs of data between a start marker and either
1. the 00 00's
2. the ff ff's
3. finishing just before the next marker ?
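On points 2 and 3, the same kind of grep pattern can report runs of filler bytes, again assuming GNU grep with -P (sample2.bin is a made-up test file):

```shell
# {16,} matches 16 or more of the byte; grep matching is greedy, so each
# maximal run comes out as one match.  Prints one byte offset per run.
printf 'head\xff\xff' > sample2.bin                        # ff run too short
dd if=/dev/zero bs=1 count=20 >> sample2.bin 2>/dev/null   # 20 nulls at offset 6
printf 'tail' >> sample2.bin
LC_ALL=C grep -aboP '\x00{16,}|\xff{16,}' sample2.bin | cut -d: -f1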

I'm also downloading helix, but that is going to take ages, so I welcome any further suggestions...

pixellany 01-02-2006 11:08 PM

Quick hunch:

How about a shell script that uses dd to read the disk one chunk at a time while looking for the patterns you want? Because some things tend to be done in multiples of 512-byte sectors, you could just loop on the sector count, and have the script report to you when it finds the pattern, and where (eg sector + offset). You could write it to print n bytes of context whenever it finds a match, or you can just note the locations, and then go back by hand.
For 300 Gbytes, I see you starting the thing, and then going to dinner while it works....;)

Pseudocode:
"finder"
search for pattern on drive
command-line arguments are the search pattern in hex, and the drive designation
Example call: "finder eeff /dev/hda"
CS=0
while true do;
dd if=$2 bs=512 skip=$CS count=1 | hexdump -C | grep $1 > tmp
print CS, tmp (# sector count + the line(s) from hexdump, which include the offset)
CS=CS+1
done

This is NOT the correct syntax, and as written it may only stop with ctrl-C. You need to add more sophistication to taste and translate it into real shell syntax....
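For what it's worth, a minimal working rendition of that sketch might look like the following, assuming GNU dd/hexdump/grep (finder is a made-up name, and the pattern is given the way hexdump -C prints it, eg 'ee ff'):

```shell
# Read the device one 512-byte sector at a time; report each sector whose
# hexdump contains the pattern.  Stops once dd reads past the end.
finder() {
    pattern=$1 dev=$2 cs=0
    while chunk=$(dd if="$dev" bs=512 skip="$cs" count=1 2>/dev/null | hexdump -C)
          [ -n "$chunk" ]
    do
        printf '%s\n' "$chunk" | grep -q "$pattern" && echo "pattern in sector $cs"
        cs=$((cs + 1))
    done
}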

pixellany 01-02-2006 11:22 PM

How about a simple shell script to loop through the drive and find patterns? This can be as complex as you want it--looking for pattern A after having found pattern B, etc.
eg: (pseudocode)
BC=0 (initialize the block count)
while true do;
dd if=$1 bs=2048 skip=$BC count=1 |hexdump -C |grep _pattern_ > tmp
printf BC\n\n tmp (this gives you the block count and offset (from hexdump)--tmp already has newline chars, so we don't need to add them)
BC=BC+1
done

_pattern_ can come from a command line argument or be hard coded
$1 is the first argument---eg /dev/hda

keep the block size a multiple of the sector size.

to have sub-loops, probably write hexdump output to file, then loop with grep so that you get the return code from grep to tell it to look for another pattern.

This obviously lacks finesse--eg it will likely loop forever. Also, you need to use the correct shell syntax....

pixellany 01-03-2006 08:14 AM

OOOOPPPSSS!!
due to a glitch in the web site, I thought the first of these did not get posted---sorry

dtimms 01-04-2006 05:52 AM

teckk: I have now given helix a whirl; nice distro, but it doesn't seem to have the tools to analyze raw disk data when the file system is unknown (ie not NTFS/FAT/EXT3 etc).

Also tried foremost. It seems to need an actual file rather than the /dev/hdd device file. If I had another 300 GB disk, I could copy the whole drive into a file (within an ext3 filesystem). I did this with the first 1 GB, but it was unable to find any mpg files, or any files at all, even though I had already manually extracted many mpeg streams from that area.

I guess I'm back to either scripting or some program coding to get what I am trying to retrieve.

