LinuxQuestions.org


Completely Clueless 03-13-2009 04:24 PM

whole-disk text-string scanning utility
 
Hi guys,

Can anyone recommend a Linux utility to scan an entire physical disk (only 12 GB) for selected text strings? It needs to search not just files and folders but also cluster tips (slack space) and unused space. Ideally it would show every hit and where it was found, and preferably it would have a "search and replace xyz with abc" facility. Many thanks!

CC.

chrism01 03-14-2009 12:39 AM

I think you probably want PhotoRec (http://www.cgsecurity.org/wiki/PhotoRec_Step_By_Step). It is primarily for recovering corrupt or deleted files.
I'm not sure it can do replace; there's usually no point. You recover first, then fix up if possible.
For extant files, try a loop with find and sed.
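A minimal sketch of such a find and sed loop, assuming GNU sed (for -i) and file names without embedded newlines. The scratch directory and the xyz/abc strings are made up for illustration; on a real system you would point find at the directory you care about and test on copies first.

```shell
# Hypothetical scratch directory with two sample files (illustration only).
dir=$(mktemp -d)
printf 'hello xyz world\n' > "$dir/a.txt"
printf 'nothing here\n' > "$dir/b.txt"

# grep -l lists only the files that contain the fixed string (-F),
# then sed rewrites each of those files in place.
find "$dir" -type f -exec grep -l -F 'xyz' {} + | while IFS= read -r f; do
    sed -i 's/xyz/abc/g' "$f"
done

cat "$dir/a.txt"
```

Filtering with grep first means sed only touches files that actually contain the string, so the timestamps of everything else stay untouched.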

syg00 03-14-2009 12:45 AM

If you want to hit unused space et al, you'll probably need a full-on forensic tool.
Been discussed plenty of times - there are even forensic liveCDs.

Completely Clueless 03-14-2009 11:48 AM

Quote:

Originally Posted by syg00 (Post 3475028)
If you want to hit unused space et al, you'll probably need a full-on forensic tool.
Been discussed plenty of times - there are even forensic liveCDs.

Well I have the Knoppix DVD which has a comprehensive Forensic Toolkit on it and a baffling array of other utilities, so I may well have something to do the job already. But all of those program names mean nothing to me; I need a specific pointer to a particular piece of software which will do the job. I need a program name to search for.

pixellany 03-14-2009 11:55 AM

You can always search this way:

dd if=/dev/sda bs=512 skip=<START> count=<RANGE> | hexdump -C | grep <keyword>

Replace <START> with the number of the 1st sector to search
<RANGE> with the number of sectors to search
<keyword> with the string to look for

H_TeXMeX_H 03-14-2009 12:55 PM

Quote:

Originally Posted by pixellany (Post 3475423)
You can always search this way:

dd if=/dev/sda bs=512 skip=<START> count=<RANGE> | hexdump -C | grep <keyword>

Replace <START> with the number of the 1st sector to search
<RANGE> with the number of sectors to search
<keyword> with the string to look for

Sorry, but I have to disagree. It would seem to work, but the output of hexdump -C looks something like this:

Code:

000076b0  2e 66 72 69 68 6f 73 74  2e 63 6f 6d 2f 22 20 63  |.frihost.com/" c|
000076c0  6c 61 73 73 3d 22 62 6f  74 74 6f 6d 5f 6c 69 6e  |lass="bottom_lin|
000076d0  6b 73 22 3e 46 72 69 68  6f 73 74 3c 2f 61 3e 2c  |ks">Frihost</a>,|
000076e0  20 3c 61 20 68 72 65 66  3d 22 68 74 74 70 3a 2f  | <a href="http:/|

What if I were to grep for "class" in this? It wouldn't work, because the word is split across two lines.
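One way around the line-splitting problem, sketched here under the assumption that GNU grep is available, is to skip hexdump and let grep read the raw bytes and report byte offsets itself. The scratch file below stands in for a real device such as /dev/sda; its contents are invented for the demonstration.

```shell
# Scratch image standing in for a real device (assumption for the demo).
img=$(mktemp)
# Two binary bytes, 18 digits of filler, then some HTML-like text.
printf '\000\001%018d<a class="bottom_links">' 0 > "$img"

# -a: treat binary data as text, -b: print the byte offset,
# -o: print only the matching part, -F: fixed string, not a regex.
# LC_ALL=C makes grep work on plain bytes rather than multibyte characters.
hits=$(LC_ALL=C grep -a -b -o -F 'class' "$img")
echo "$hits"    # prints 23:class
```

On a whole disk, grep still reads "lines" delimited by newline bytes, so long stretches of data with no newline may use a lot of memory; treat this as a sketch rather than a forensic-grade method.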

Probably the best solution is to write a C program and use the image of the whole disk, but I'm not sure why anyone would do this.

You could also use:

Code:

find / | grep whatever
(note that this only matches file names, not file contents) and the file-carving tool foremost.

pixellany 03-15-2009 10:21 PM

Touché!

In this particular brute-force method, you would need to try several different keywords until you established where the file was. Some different hexdump options might also help.

Completely Clueless 03-16-2009 09:18 AM

I wonder if a hex editor would do the job satisfactorily? Presumably this kind of program can 'see' *everything* on a disk?

farslayer 03-16-2009 10:06 AM

http://www.forensicswiki.org/wiki/The_Sleuth_Kit
The Sleuth Kit can search for keywords.

If that doesn't work for you, check out some of the other forensics tools available:
http://www.forensicswiki.org/wiki/Main_Page

H_TeXMeX_H 03-16-2009 12:02 PM

Hey, neat, I didn't know about this kit. Now if only it supported more filesystems.

Completely Clueless 03-16-2009 12:08 PM

Quote:

Originally Posted by farslayer (Post 3477113)
http://www.forensicswiki.org/wiki/The_Sleuth_Kit
The Sleuth Kit can search for keywords..

If that doesn't work for you check out some of the other Forensics tools available..
http://www.forensicswiki.org/wiki/Main_Page

Thanks, Farslayer. You really are a most helpful guy. I'd tip you another "thanks" but it might start to look as if you're paying me, or we're related in some way. ;-)

farslayer 03-16-2009 01:00 PM

No worries.. The check is in the mail :)

Kenhelm 03-17-2009 06:42 AM

The 'strings' command extracts text from binary data.
The following code scans /dev/sda for strings containing '.jpg'.
It has to be run as root. Use Ctrl-C to stop the command.
Code:

dd if=/dev/sda | strings -n 4 -t d | grep '\.jpg'

3005553932 ElectronicsCapacitorscapacitor_codes_filestop_img6.jpg
3112021438 Sunset2.jpg, and Sunset3.jpg.
3112022948 the pictures are saved as Sunset1.jpg, Sunset2.jpg
3119203911 http://www.perl.com/graphics/perlhome_header.jpg</

# 'grep -C 2'  adds 2 lines of context before and after
dd if=/dev/sda | strings -n 4 -t d | grep -C 2 '\.jpg'
--
3005442040 Bashlinuxcommand.orghtml_textsizeof.html
3005442088 Bashlinuxcommand.orghtml_textsizeof.README.html
3005443248 Bashlinuxcommand.orgimagesxterm.jpg
3005443292 Bashlinuxcommand.orgman_pagesa2p1.html
3005443340 Bashlinuxcommand.orgman_pagesa2ps1.html
--
3005552864 ElectronicsCapacitorscapacitor_codes_filesactuators.gif
3005552928 ElectronicsCapacitorscapacitor_codes_filesArticles.gif
3005552992 ElectronicsCapacitorscapacitor_codes_filesback_green.jpg
3005553056 ElectronicsCapacitorscapacitor_codes_filesback_stone.jpg
3005553120 ElectronicsCapacitorscapacitor_codes_filesBasics.gif
3005553180 ElectronicsCapacitorscapacitor_codes_filescp51.gif
--

-n 4 means only extract strings of 4 or more characters.
-t d means precede each extracted string with the decimal offset of its first character.
(This isn't the offset of '.jpg' unless it's at the start of the string.)
I'm using the version of 'strings' supplied with Mandriva.
The version supplied with Puppy 4.1.1 does not support '-t d' for decimal offset.
It only has '-o' which gives the offset in octal.
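The decimal offsets from '-t d' can be turned back into a sector number for dd's skip= option, which ties this in with the dd command suggested earlier in the thread. A small sketch, assuming 512-byte sectors and using the first offset from the sample output above:

```shell
offset=3005553932   # first offset in the sample output above
bs=512              # sector size, assumed to match bs=512 used earlier

sector=$((offset / bs))   # sector that contains the start of the string
within=$((offset % bs))   # position of the string inside that sector
echo "sector=$sector byte_in_sector=$within"   # sector=5870222 byte_in_sector=268

# To inspect just that sector (needs root; device name as in the post):
# dd if=/dev/sda bs=$bs skip=$sector count=1 | hexdump -C
```

A string that starts near the end of a sector runs on into the next one, so count=2 may be the safer choice when inspecting a hit.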

n.b. The dd command is dangerous; typing 'of=$device' instead of 'if=$device' can destroy the $device file system.

Completely Clueless 03-17-2009 02:59 PM

Quote:

Originally Posted by Kenhelm (Post 3478141)
n.b. The dd command is dangerous; typing 'of=$device' instead of 'if=$device' can destroy the $device file system.

Good point about the dangers of transposing your input and output files and another good reason why 'dd' should be re-written to become rather more 'intelligent.'

Thanks for the 'strings' command suggestion. I've never heard of it but will certainly check it out.

CC.

H_TeXMeX_H 03-17-2009 03:01 PM

Quote:

Originally Posted by Completely Clueless (Post 3478576)
Good point about the dangers of transposing your input and output files and another good reason why 'dd' should be re-written to become rather more 'intelligent.'

Heh heh, I doubt it. But you could probably write a wrapper script if you knew what you wanted to protect against.
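A rough sketch of what such a wrapper might look like, as a shell function. The name safe_dd and the PROTECTED_TARGETS list are invented for this example; the device names in the list are placeholders you would replace with your own.

```shell
# Hypothetical list of targets that must never be written to.
PROTECTED_TARGETS="/dev/sda /dev/sdb"

# safe_dd: refuse to run dd when of= names a protected target,
# otherwise pass all arguments through to the real dd unchanged.
safe_dd() {
    for arg in "$@"; do
        case $arg in
            of=*)
                target=${arg#of=}
                for p in $PROTECTED_TARGETS; do
                    if [ "$target" = "$p" ]; then
                        echo "safe_dd: refusing to write to $target" >&2
                        return 1
                    fi
                done
                ;;
        esac
    done
    command dd "$@"
}
```

This only does a plain string comparison, so it will not catch symlinks to a protected device or partitions such as /dev/sda1 unless they are listed too.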


All times are GMT -5.