I'm not the master himself (obviously), but why not? As long as it is the whole drive and not just a partition, dd should produce an identical bit-by-bit copy, i.e. a clone, as far as I understood this thread.
First of all, awesome material up there. Really inspired me a lot in using dd. Thumbs up x4 (hands and feet, in case you're wondering).
However, I encountered some problems with one of the techniques described in your tutorial.
Problem encountered: hexdump -C | grep 'string' fails miserably if a line break or column break happens to fall in the middle of 'string' in hexdump's output.
Test done (italics added by me):
Code:
fab@fab-laptop:~$ echo "Line long enough to wrap around in hexdump -C output" > foo.txt
fab@fab-laptop:~$ cat foo.txt | hexdump -C
00000000  4c 69 6e 65 20 6c 6f 6e  67 20 65 6e 6f 75 67 68  |Line long enough|
00000010  20 74 6f 20 77 72 61 70  20 61 72 6f 75 6e 64 20  | to wrap around |
00000020  69 6e 20 68 65 78 64 75  6d 70 20 2d 43 20 6f 75  |in hexdump -C ou|
00000030  74 70 75 74 0a                                    |tput.|
00000035
fab@fab-laptop:~$ cat foo.txt | hexdump -C | grep 'enough'
00000000  4c 69 6e 65 20 6c 6f 6e  67 20 65 6e 6f 75 67 68  |Line long enough|
fab@fab-laptop:~$ cat foo.txt | hexdump -C | grep 'output'
[No output returned due to line break]
fab@fab-laptop:~$ cat foo.txt | hexdump -C | grep '6f 75'
00000000  4c 69 6e 65 20 6c 6f 6e  67 20 65 6e 6f 75 67 68  |Line long enough|
00000010  20 74 6f 20 77 72 61 70  20 61 72 6f 75 6e 64 20  | to wrap around |
00000020  69 6e 20 68 65 78 64 75  6d 70 20 2d 43 20 6f 75  |in hexdump -C ou|
fab@fab-laptop:~$ cat foo.txt | hexdump -C | grep '6f 75 74 70'
[No output returned due to line break]
fab@fab-laptop:~$ cat foo.txt | hexdump -C | grep '6e 67'
[No output returned due to column break]
(NOTE: results are exactly the same if using 'dd if=foo.txt' in place of cat)
This will always happen as long as hexdump is invoked before grep, because hexdump (and od) always formats output, no matter what switches one uses.
So the question is: how to look for a particular string in a stream of data AND BE SURE to find it (if it's present)?
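A minimal sketch of one way around the line-break problem: dump the file as hex with od, throw away od's own spacing and line breaks, and re-space the stream as one hex pair per byte, so a pattern can no longer be split across output lines. The helper name find_bytes is made up for this illustration, and grep's -o and -b options are assumed to be available (they are in GNU grep). Since the whole dump ends up on a single line, this is only reasonable for small files, not for a 2 GB image.

```shell
#!/bin/sh
# find_bytes FILE 'xx xx ...'
# Prints the decimal byte offset of every non-overlapping occurrence of the
# given bytes in FILE. The pattern must be lowercase hex pairs separated by
# single spaces. find_bytes is a hypothetical helper, not a standard tool.
find_bytes() {
    od -An -v -tx1 "$1" |      # hex bytes only: no offset column, no '*' folding
        tr -d ' \n' |          # drop od's spacing and line breaks
        sed 's/../& /g' |      # re-space as "xx xx xx ", 3 characters per byte
        grep -o -b -- "$2" |   # prints "position:match" for each match
        while IFS=: read -r pos rest; do
            echo $(( pos / 3 ))    # 3 characters per byte in the re-spaced stream
        done
}

# Demo with the same text as the test above
tmp=$(mktemp)
echo "Line long enough to wrap around in hexdump -C output" > "$tmp"
find_bytes "$tmp" '6f 75 74 70'   # "outp", split across two lines by hexdump -C
find_bytes "$tmp" '6e 67'         # "ng", split by hexdump -C's column gap
rm -f "$tmp"
```

Because every byte takes exactly three characters in the re-spaced stream, a pattern written as space-separated pairs can only match on a byte boundary, and the offset is simply the match position divided by three. It works the same for binary patterns, including ones that contain 00.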
Also, I want to avoid regexp hell, which I'm sure could help a lot when searching for text strings, but I want a method that also works for binary data. Here's why.
Scenario: failed SD memory card with a lot of important photos on it. The SD card was reported as empty by the camera that was using it, by my Windows box with an external SD card reader, and by my Linux laptop with an embedded SD card reader.
Fire up Linux machine (Ubuntu 8.10 x86_64, kernel 2.6.27-11-generic), insert memory card into slot and mount read-only, and one dd conv=noerror later I had a working image of my memory card in my home dir.
Now, for the actual retrieval, I thought that I could dd | hexdump | grep to look for the JPEG magic bytes, retrieve all data from one to the next, put it into a file, and hope for the best.
If that doesn't work (too much garbage data?), collect a list of the offsets that contain the JPEG magic bytes, and take a look manually. It was "only" 2 GB of data, and I had plenty of time to spare and definitely the motivation to do so. So much that I set up a ramdrive large enough just for this task.
NOTE: grep usually can only look for ASCII substrings. Here's a nice trick to make it look for anything else:
Code:
cat something | grep `echo -e '\x26\x26'`
The -e switch enables interpretation of escape codes like \n, \r, and also \0NNN (octal) and \xNN (hexadecimal). Not every version of echo has this functionality. The one on my machine does.
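For shells where echo has no -e, a rough equivalent can be built with printf, whose octal escapes (\NNN) are part of POSIX; the \xNN form is an extension that not every printf understands. This is only a sketch of the same trick, with a made-up sample input, and the variable name pattern is arbitrary:

```shell
#!/bin/sh
# Octal 046 is hex 26, i.e. the '&' character.
pattern=$(printf '\046\046')

# -F makes grep treat the pattern as a fixed string instead of a regexp;
# quoting "$pattern" keeps the shell from splitting or globbing it.
printf 'foo && bar\nbaz\n' | grep -F -- "$pattern"
```

Only the line containing the two bytes is printed. Note that command substitution cannot carry a NUL byte, so a pattern built this way cannot contain \000.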
In this particular case, '\x26\x26' translates to '&&', but any hex value is valid. For the record, JPEG magic number is 'ff d8 ff e0', which doesn't translate to any valid ASCII character. Try this with any valid jpeg image:
It's not practical with longer byte sequences, but for magic numbers it is enough.
Anyway, this didn't work, because of the way hexdump formats its output. So I thought about swapping grep and hexdump in the command line, but that would have given me bogus offsets, making the actual extraction of the data pretty impractical. That could be solved by using the -a and --after-context=NUM options in grep, for example:
Code:
dd if=foo.img | grep -a -A 10 `echo -e '\xff\xd8\xff\xe0'` | hexdump -C
The line above would effectively retrieve 10 "lines" of data following the JPEG magic byte, which, redirected to any file, could give us something to work with.
-a makes grep treat a binary file as if it were text, so it outputs the actual matching content instead of simply telling us that "the binary file matches".
-A is short for --after-context=NUM, which prints NUM lines of trailing context after each matching line, in addition to the matching line itself.
So, problem solved? Not quite. First of all, I'm not sure of what constitutes a "line" when manipulating binary data, and the very concept seems quite silly to me.
Second, this doesn't let us have any real idea as to where the data actually IS in the data stream, since hexdump will be showing offsets pertaining to output generated by grep, not by dd.
Third, I really don't like using grep for a task like this, but that could just be me. Also, I'm not really sure about the effect of -a in grep: I'm afraid it could be corrupting data, by removing carriage returns and/or line feeds, which in a binary file, are not cr/lf at all.
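On the second point, GNU grep can report where a match sits in its own input: with -o and -b together it prints the byte offset of each match, so if grep reads the image directly the offsets refer to the image and not to some later stage of the pipeline. A minimal sketch, assuming GNU grep and a made-up helper name jpeg_offsets; it searches for the three bytes ff d8 ff so that it does not matter what the fourth byte is:

```shell
#!/bin/sh
# jpeg_offsets FILE
# Prints the decimal byte offset of every ff d8 ff sequence in FILE.
# jpeg_offsets is a hypothetical helper; -a, -o, -b and -F are GNU grep options.
jpeg_offsets() {
    # LC_ALL=C makes grep and cut work on raw bytes instead of multibyte characters.
    LC_ALL=C grep -a -o -b -F -- "$(printf '\377\330\377')" "$1" |
        LC_ALL=C cut -d: -f1    # keep the offset, drop the matched bytes
}

# Demo on a small fake image with two markers
tmp=$(mktemp)
printf 'AAAA\377\330\377\340BBBB\nCC\377\330\377\341' > "$tmp"
jpeg_offsets "$tmp"
rm -f "$tmp"
```

grep only reads here and never modifies the file, but it still splits its input at newline bytes (0x0a), so a pattern that itself contains 0x0a will not match this way.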
Also, I'm not really that knowledged (is this a word?), and I don't have enough programming skills to write something in an actual programming language to extract data from a binary stream. So, meh.
Any help, suggestion, critique or whatever is appreciated.
And, AwesomeMachine, thumbs up again for the great op. This post of mine wouldn't have existed without yours :P
(PS: if anybody is actually going to try this with an actual memory card with photos taken with an actual camera, remember that cameras do not store plain JPEGs, but JPEG+Exif, whose magic number is ff d8 ff e1)
Scenario: failed SD memory card with lot of important photos in it. SD was reporting empty on camera that was using it, my windows box with an external SD card reader, and my linux laptop with an embedded SD card reader.
<snip>
Have you looked into photorec, testdisk and foremost, which purportedly recover photos (and other files) from damaged filesystems:
Yes, I did in the end, for the particular task of extracting data from my memory card. The purpose of my post, however, was also to point out that dd | hexdump | grep 'string' may return false negatives under certain circumstances, and I was asking for another, not flawed method.
This works like a charm, thanks a lot. I understand that the double grep is to grab only the offsets, and not print onscreen the actual binary data. Then, with some math, one could dd again and extract
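The "some math" step could look roughly like this, assuming the start and end offsets (in bytes) are already known from whatever search was used. The helper name carve and the file names are made up for the sketch; skip= and count= are standard dd operands, counted in blocks of bs bytes:

```shell
#!/bin/sh
# carve SRC START END OUT
# Copies bytes START (inclusive) to END (exclusive) of SRC into OUT.
# carve is a hypothetical helper name, not a standard tool.
carve() {
    # With bs=1, skip and count are plain byte counts, so no block math is needed.
    dd if="$1" of="$4" bs=1 skip="$2" count=$(( $3 - $2 )) 2>/dev/null
}

# Demo: pull bytes 3..6 out of a ten-byte file
src=$(mktemp)
out=$(mktemp)
printf '0123456789' > "$src"
carve "$src" 3 7 "$out"
cat "$out"; echo
rm -f "$src" "$out"
```

bs=1 keeps the arithmetic trivial but is slow on large ranges, since dd then copies one byte per read and write.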
Quote:
Originally Posted by dr_agon
Beware: jpg files may also contain this string inside the picture (one of mine did).
...what string?
JPEG files produced by cameras do contain Exif information, which contains a bunch of ASCII strings. Are you talking about those?
Quote:
Originally Posted by dr_agon
Remark: If you include 0x00 somewhere in the expression you will not get correct results.
results.dat could then be further processed, in case there was more than one match. From your remark, I'd gather that this won't work, but I'm not currently able to test it due to not having a linux box at hand until Wednesday.
Every once in a while (and it isn't often) one comes across a genuinely valuable resource like this. The only criticism I would make is that it could benefit from a little smoothing out here and there, as at the moment it resembles some tech-head's hastily scribbled notes. If this could be done without the document losing its pithy conciseness, it would be AWESOME indeed.
Despite the above, you write VERY informatively. How about a separate, substantive article along similar lines on computer forensics? You obviously know what you're talking about!
Last edited by Completely Clueless; 04-09-2009 at 08:07 AM.
Reason: afterthought.
OS 9 IS BSD Unix, as is OS X. A little window dressing has been added by Apple. What I would do is duplicate the drive using a live Linux CD. Or, you could use rsync.
The great thing about using hexdump before grep is that grep on its own chokes on a whole drive partition: it slowly grinds almost to a halt. But I am pleased you pointed out the problem with hexdump's formatting. Obviously there are some hidden issues, especially with search strings that span more than one of hexdump's output lines.
The whole idea of grep is to find lines in an output. Otherwise there would be no way to ever locate the data. You could try Autopsy. That will deal with the formatting issue, and find the search string. Autopsy does a lot more than dd. For serious work, I highly recommend it.
Thanks for pointing out the limitations of grep with hexdump.
This is a great thread. Learned a lot about dd that I didn't know. But I still have some questions.
I need to image boxes all the time, and right now I use Acronis. It works great and is nice and all, but the issue is I have to open the machines and take out the hard drive, hook them up to another machine, then image them. This is done b/c the machines we use don't have a VGA port on them. They only have NICs, a console port, two USB ports, and power. That's it.
So, what I want to do is be able to use dd (if possible) to image these machines.
I have an Ubuntu server that I have set up to PXE boot Acronis right now. I also have a couple of Ubuntu distros on it to PXE boot if I wanted.
Can I put our OS on the PXE server, then use dd to image the target machines? The target machines have completely blank hard drives from the mfg.
When I boot the target machine (blank HD) I can PXE boot it. I was hoping I'd be able to set things up so that one of the PXE options is our OS to install.
This is kind of a mix of dd and PXE I guess.
EDIT:
What I was thinking I could do is boot a live distro from a USB CD-ROM and then use it to dd the image from my Ubuntu server on the network to the target machine booted with the live cd. Would this work?
Last edited by abrrymnvette; 04-29-2009 at 12:01 PM.
If there is no OS on the target machine, you could use Knoppix, and then ssh into it from the source machine. The only thing is, you need to know what IP Knoppix is using. It will be something in the subnet, but it could be any free address. Using all my strength, and formidable will, I can't come up with a much better solution than removing the drives and imaging them.
It is conceivable, if you're good at scripting, and good at hacking the boot process, that you could make a boot disk that would minimally get the target up, and then ssh into the source machine, and output the target's IP to a text file.
Then, you could read the text file on the source machine, ssh into the target, kill the ssh from the target to the source, run the command: 'netcat -l -p <port, i.e.: 1234> | dd of=/dev/hda bs=16065b' on the target. On the source machine, subsequently run: 'dd if=/home/sam/drive_image.bin bs=16065b | netcat <target_IP> <port>', and it should image