LinuxQuestions.org
Old 04-15-2011, 06:15 PM   #1
malloc
Member
 
Registered: Jul 2010
Posts: 111

Rep: Reputation: 4
Disassembly in GNU/Linux?


I want to be able to disassemble a binary file, modify the assembly source, then assemble the modified assembly source back into a modified binary file.

The purpose is pretty much just to play around with the Crackmes (www.crackmes.de) game.

Disassembly is easy: several tools do it, including the standard objdump with the -d argument.

However, how would you assemble an assembly source file created with objdump -d? GCC for sure doesn't want to assemble it in that format. What program, script, or arguments to GCC (none that I can think of) can be used to accomplish this?

If someone also has some good tips for tools in general for Crackmes beyond what is standard in GNU/Linux I'd love to hear about it.
 
Old 04-15-2011, 08:57 PM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
You can use GNU as, for example.

I personally don't like the AT&T syntax, so for x86 and x86-64 I'd probably use ndisasm and nasm instead.
 
Old 04-15-2011, 10:00 PM   #3
SigTerm
Member
 
Registered: Dec 2009
Distribution: Slackware 12.2
Posts: 379

Rep: Reputation: 234
Quote:
Originally Posted by malloc
I want to be able to disassemble a binary file, modify the assembly source, then assemble the modified assembly source back into a modified binary file.
As far as I know, standard practice on the Windows platform is to modify the original binary file instead of trying to reassemble a disassembled program listing, simply because modifying the original binary is easier. I don't think it should be done any differently on Linux.

Another thing is that there's this paragraph in LQ rules:
Quote:
Posts containing information about cracking, piracy, warez, fraud or any topic that could be damaging to either LinuxQuestions.org or any third party will be immediately removed.
Technically, your question is about cracking.
Although reverse engineering has its uses, and in some places it is legal to hack an application in order to make it work on your hardware, I'm not sure LQ is the right place to ask something like this.
 
Old 04-15-2011, 10:42 PM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by SigTerm
Technically, your question is about cracking.
No, it is not about cracking, since there is no copy protection, authentication, or authorization involved.
This is pure and simple reverse engineering and modification.

I don't know of any single country where reverse engineering and software modification is illegal.

In most countries, distribution of unauthorized copies (cracked or not) is illegal. In some countries it is illegal to possess or distribute software intended solely or mainly for cracking. In some countries it is illegal to run cracked software. In most countries it is illegal to run software covered by safety regulations that has been modified, but even then, modification itself is legal, running is not.

Most proprietary software has end-user license agreements that specifically prohibit reverse engineering. If you have agreed to such an agreement, reverse engineering that software is a violation of the agreement; the possible penalties depend on the agreement and country.

The way I read the LQ rules, it should be all right to discuss reverse engineering techniques, just not cracking and related stuff.

I personally find very high value in reverse engineering; just consider Kinect as an example. I would be extremely disappointed if reverse engineering was decided to be a topic non grata in LinuxQuestions.
 
Old 04-15-2011, 11:28 PM   #5
SigTerm
Member
 
Registered: Dec 2009
Distribution: Slackware 12.2
Posts: 379

Rep: Reputation: 234
Quote:
Originally Posted by Nominal Animal
In most countries, distribution of unauthorized copies (cracked or not) is illegal. In some countries it is illegal to possess or distribute software intended solely or mainly for cracking. In some countries it is illegal to run cracked software. In most countries it is illegal to run software covered by safety regulations that has been modified, but even then, modification itself is legal, running is not.
I'm not a lawyer. Anyway, if that's against the rules, then somebody will simply nuke the post.

Quote:
Originally Posted by Nominal Animal
Most proprietary software has end-user license agreements that specifically prohibit reverse engineering. If you have agreed to such an agreement, reverse engineering that software is a violation of the agreement; the possible penalties depend on the agreement and country.
An interesting situation arises when a EULA forbids modification of software but the law allows it ("a modification of software is allowed in order to make it work on user's machine"). But that's another story.

Anyway, regarding original post.
Quote:
Originally Posted by malloc
I want to be able to disassemble a binary file, modify the assembly source, then assemble the modified assembly source back into a modified binary file.
Instead of trying to "assemble back" a disassembly listing, it would be (IMO) easier to patch the original file directly, because there is no guarantee that the disassembler will correctly identify every single jump within the program and adjust the addresses accordingly when you change the listing.
The standard procedure goes as follows:
  1. disassemble original file
  2. locate point of interest (within disasm listing) you need to modify.
  3. prepare replacement command
  4. locate point of interest within original file.
  5. patch the original binary with the replacement command. You can produce hex sequences with almost any assembler, so that shouldn't be hard. If the replacement command is too short, fill the remaining bytes with NOPs. If it is too long, you'll have to borrow extra space somewhere and relocate parts of the subroutine's body into that space. For example, it might be possible to modify the size of the code segment if the executable format allows that, or there may be unused areas filled with junk.
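
Steps 3 to 5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `patch_bytes` is a hypothetical helper, the "image" is a toy byte string rather than a real executable, and the offset is a file offset that you have already worked out yourself (for a real ELF file, the "File off" column of `objdump -h` helps translate a virtual address into one). It handles only the "replacement is too short" case by padding with single-byte x86 NOPs.

```python
NOP = 0x90  # single-byte x86 NOP


def patch_bytes(image: bytes, offset: int, old_len: int, replacement: bytes) -> bytes:
    """Return a copy of image with old_len bytes at offset overwritten.

    The replacement is padded with NOPs, so the total size never changes
    and nothing after the patch moves.
    """
    if len(replacement) > old_len:
        raise ValueError("replacement is longer than the bytes it overwrites")
    if offset < 0 or offset + old_len > len(image):
        raise ValueError("patch range lies outside the image")
    padded = replacement + bytes([NOP]) * (old_len - len(replacement))
    return image[:offset] + padded + image[offset + old_len:]


# Toy image (an assumption for the demo): jne +5 (75 05), a 5-byte call, ret.
image = bytes.fromhex("7505e800000000c3")

# Replace the 2-byte conditional jump with nothing but NOPs.
patched = patch_bytes(image, 0, 2, b"")
print(patched.hex())
```

For a real file you would read it with `open(path, "rb")`, patch the bytes, and write the result to a new file so the original stays intact.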

I tried to disassemble a program and then put it back together (on Windows) by assembling the listing, and frankly it wasn't worth the effort. When you replace just one byte with something else, you will easily find out whether changing that particular byte caused a problem. However, a disassembled listing will contain absolute/relative jumps and function calls, and the disassembler may not be smart enough to replace the addresses of such calls with labels, so inserting just one extra byte might easily break the entire program (and because different commands require different numbers of bytes, this will almost certainly happen). IDA could handle this task to some extent, but if, say, the code was produced by a C++ compiler and contains virtual functions, I wouldn't expect a disassembler to create a listing that automatically adjusts virtual function addresses when you insert an extra operation somewhere. So patching the original binary would be "safer", unless you are absolutely certain that the program doesn't use some kind of weird addressing scheme that cannot be converted into labels by the disassembler.

Last edited by SigTerm; 04-15-2011 at 11:31 PM.
 
Old 04-16-2011, 12:18 PM   #6
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
I tried to disassemble a program and then put it back together (on Windows) by assembling the listing, and frankly it wasn't worth the effort.
That pretty much mirrors my own experience, except I even had the benefit of knowing quite a few details about the purpose and architecture of the code. There are just too many things that the disassembler has to guess about. If the code was originally written in a high level language, it becomes even harder.
The OP didn't say anything about the nature of the binary file, but there might be some value or advantage to be gained from the segmentation information in something like an ELF format object module. If you can get a disassembler to do a good job, then working on a tool to transform the source code into something that a standard assembler can digest might make the project more useful.
--- rod.
 
Old 04-16-2011, 02:58 PM   #7
malloc
Member
 
Registered: Jul 2010
Posts: 111

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by Nominal Animal
You can use GNU as, for example.

I personally don't like the AT&T syntax, so for x86 and x86-64 I'd probably use ndisasm and nasm instead.
Thanks but my main question and problem is not how to assemble assembly code in general, I've done that for ten years with code that I've written myself. But how would you go about assembling assembler code that has been disassembled from machine code with [insert favourite program here]?

E.g. if you use objdump, then what program can assemble its output? I've used nasm before, but the code that objdump outputs looks nothing like the syntax nasm expects (I don't just mean Intel vs. AT&T syntax, I mean the whole structure of the assembly program).
 
Old 04-16-2011, 03:05 PM   #8
malloc
Member
 
Registered: Jul 2010
Posts: 111

Original Poster
Rep: Reputation: 4
Quote:
Originally Posted by SigTerm
I'm not a lawyer. Anyway, if that's against the rules, then somebody will simply nuke the post.


An interesting situation arises when EULA forbids modification of software, and law allows it ("a modification of software is allowed in order to make it work on user's machine"). But that's another story.

Anyway, regarding original post.

Instead of trying to "assemble back" a disassembly listing, it would be (IMO) easier to patch the original file directly, because there is no guarantee that the disassembler will correctly identify every single jump within the program and adjust the addresses accordingly when you change the listing.
The standard procedure goes as follows:
  1. disassemble original file
  2. locate point of interest (within disasm listing) you need to modify.
  3. prepare replacement command
  4. locate point of interest within original file.
  5. patch the original binary with the replacement command. You can produce hex sequences with almost any assembler, so that shouldn't be hard. If the replacement command is too short, fill the remaining bytes with NOPs. If it is too long, you'll have to borrow extra space somewhere and relocate parts of the subroutine's body into that space. For example, it might be possible to modify the size of the code segment if the executable format allows that, or there may be unused areas filled with junk.

I tried to disassemble a program and then put it back together (on Windows) by assembling the listing, and frankly it wasn't worth the effort. When you replace just one byte with something else, you will easily find out whether changing that particular byte caused a problem. However, a disassembled listing will contain absolute/relative jumps and function calls, and the disassembler may not be smart enough to replace the addresses of such calls with labels, so inserting just one extra byte might easily break the entire program (and because different commands require different numbers of bytes, this will almost certainly happen). IDA could handle this task to some extent, but if, say, the code was produced by a C++ compiler and contains virtual functions, I wouldn't expect a disassembler to create a listing that automatically adjusts virtual function addresses when you insert an extra operation somewhere. So patching the original binary would be "safer", unless you are absolutely certain that the program doesn't use some kind of weird addressing scheme that cannot be converted into labels by the disassembler.
I agree, and I actually solved some Crackmes using this approach. The only tools I used were GDB and objdump.

However, there is another reason why I want to do this that I didn't mention. It would be neat to be able to take arbitrary disassembled code and combine it with large chunks of other disassembled code (with the appropriate glue code) or larger chunks of code that I've written myself.

Just modifying a few instructions and such is trivial. But what if I want to actually enlarge the program by say 500 bytes? I don't want to input that by hand in GDB with the --write argument; especially not when these modifications may not be all in the same region.

So how would you disassemble and reassemble a program on a GNU/Linux system? Besides the reason I just mentioned, I've long wondered which tools to use for this, and I'm getting to the point where I want to do it simply because I believe it shouldn't be that hard.
 
Old 04-16-2011, 03:10 PM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
Quote:
Originally Posted by malloc
I've used nasm before, but the code that objdump outputs looks nothing like the syntax nasm expects (I don't just mean Intel vs. AT&T syntax, I mean the whole structure of the assembly program).
That is the crux of the problem. It's why I said it is not a very fruitful task, and one that should include developing tools that transform what you get into something actually useful to an assembler. Note also Nominal's recommendation to use ndisasm as an alternative to objdump.

--- rod.
 
Old 04-16-2011, 05:01 PM   #10
SigTerm
Member
 
Registered: Dec 2009
Distribution: Slackware 12.2
Posts: 379

Rep: Reputation: 234
Quote:
Originally Posted by malloc
E.g. if you use objdump, then what program can assemble its output? I've used nasm before, but the code that objdump outputs looks nothing like the syntax nasm expects (I don't just mean Intel vs. AT&T syntax, I mean the whole structure of the assembly program).
AFAIK, in general you don't do that.

Quote:
Originally Posted by malloc
However, there is another reason why I want to do this that I didn't mention. It would be neat to be able to take arbitrary disassembled code and combine it with large chunks of other disassembled code (with the appropriate glue code) or larger chunks of code that I've written myself.
This can be done, but you'll have to be familiar with the executable file format. It is documented somewhere, so you can write a tool that takes the program apart into chunks of binary data without actually disassembling it, then puts the chunks back together. There may be a ready-to-use tool for this, but I'm not familiar with one.

Quote:
Originally Posted by malloc
Just modifying a few instructions and such is trivial. But what if I want to actually enlarge the program by say 500 bytes?
To make it larger, you'll have to enlarge the code segment without changing its base address and append the extra code at the end of the segment, or you could try to introduce an additional code section, and so on. This is highly dependent on the executable file format you're modifying. Also, depending on your system, you may be able to write directly into process memory, so you could make a launcher that starts the original executable and then modifies it. This isn't a "correct" approach, though.

Anyway, this has nothing to do with assembling a program. You should be looking for a way to change the sizes of the code blocks stored in the original program file without breaking it, not for a way to reassemble a disassembled listing. If you can enlarge the code "segment" stored in the executable by 500 bytes, then you'll be able to write something into those extra bytes. You don't have to disassemble the program to make it bigger. The sizes of the code and data sections are described somewhere within the program's headers or chunks, so to make the code section bigger, you'll need to modify its size and adjust the program file accordingly. Disassembly doesn't have to be involved in the process.
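
To make "described somewhere within the program's headers" a bit more concrete, here is a minimal Python sketch that reads the program headers of a little-endian ELF64 image and reports the executable loadable segments. The layout constants follow the ELF64 format; the function name `executable_segments` and the synthetic demo image are assumptions made for illustration. It only reads the sizes. Actually enlarging a segment also means shifting whatever follows it in the file and fixing up the affected offsets, which this sketch does not attempt.

```python
import struct

PT_LOAD = 1   # program header type: loadable segment
PF_X = 0x1    # segment flag: executable

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)


def executable_segments(image: bytes):
    """Yield (file_offset, vaddr, filesz, memsz) per executable PT_LOAD segment."""
    fields = EHDR.unpack_from(image, 0)
    ident, phoff, phentsize, phnum = fields[0], fields[5], fields[9], fields[10]
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr,
         _paddr, p_filesz, p_memsz, _align) = PHDR.unpack_from(
            image, phoff + i * phentsize)
        if p_type == PT_LOAD and p_flags & PF_X:
            yield p_offset, p_vaddr, p_filesz, p_memsz


# Synthetic header-only image for the demo (not a runnable program).
ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
ehdr = EHDR.pack(ident, 2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 1, 0, 0, 0)
phdr = PHDR.pack(PT_LOAD, 5, 0x1000, 0x401000, 0x401000, 0x1A, 0x1A, 0x1000)
segments = list(executable_segments(ehdr + phdr))
for off, vaddr, filesz, memsz in segments:
    print(f"offset={off:#x} vaddr={vaddr:#x} filesz={filesz:#x} memsz={memsz:#x}")
```

On a real binary, `readelf -l` prints the same information, so the script's output can be checked against it.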

Or you could make original executable load extra library and use as much space for code as you want within the library. Or you could replace original library with proxy library that redirects all calls to original but provides extra functionality. There are many ways to get extra space for code.

Last edited by SigTerm; 04-16-2011 at 05:09 PM.
 
Old 04-16-2011, 09:36 PM   #11
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Quote:
Originally Posted by malloc
Thanks but my main question and problem is not how to assemble assembly code in general
No, of course not. I meant that objdump uses the same assembly format as GNU as, since both are part of GNU binutils and use the same libraries.

You will have to write your own script tools to convert the object dumps back to compilable assembly source code.
As of right now, I don't know of any such scripts being readily available. To be honest, I think such scripts are on the border of legality in some countries, or at least very likely to be used by nefarious people I detest on principle, so I would not publish such scripts even if I had them. (But I don't, so no need to ask privately either.)

It is quite possible to write a set of scripts that take apart an ELF binary (I'm assuming we've restricted ourselves to Linux), disassembling it and dumping its data using objdump, into an assembly source file that, when assembled using as, produces the same binary (ignoring ELF file level differences). Binutils and a scripting language (I'd recommend GNU awk for this) suffice in Linux.

Quote:
Originally Posted by malloc
But how would you go about assembling assembler code that has been disassembled from machine code with [insert favourite program here]?
It depends utterly on the disassembler used.

For small changes, HT Editor (in image mode) might suffice for your needs.

To help you start understanding what you need to do to the disassembly and object dump, start by applying
Code:
sed -e '/[Ff]ile format/ d;
        /^[\t ]*\.\.\.$/ d;
        s|[0-9A-Fa-f]* <\([^>]*\)>|\1|g;
        s|^[\t ]*[0-9A-Fa-f]*:\t[0-9A-Fa-f ]*\t|\t|;
        s|^[\t ]*[0-9A-Fa-f]*:\t\([0-9A-Fa-f ]*\)[\t ]*$|\tdb \1|;
        s|^Disassembly of section \([^:]*\):$|\t.section \1|'
to objdump -d output. Obviously, that won't compile yet. You will need to fix the leftover references (see the * in the disassembly) using e.g. the symbol table, and save the contents of the non-code sections (filter objdump -s output via e.g. sed), before the end result will compile.
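
For readers who find the sed expressions hard to follow, the same clean-up can be sketched in Python, one rule per regular expression. This is a port under assumptions, not a tested replacement: `clean_listing` is a hypothetical name, the sample listing is hand-written in the usual `objdump -d` layout, and there is one deliberate deviation from the sed version: byte-only continuation lines are emitted as a GNU as `.byte` directive with `0x` prefixes and commas instead of `db`. Like the sed version, the result still needs the fixes described above before it will assemble.

```python
import re

FILE_FORMAT = re.compile(r"[Ff]ile format")
ELLIPSIS = re.compile(r"^[\t ]*\.\.\.$")
LABEL_REF = re.compile(r"[0-9A-Fa-f]* <([^>]*)>")
INSN = re.compile(r"^[\t ]*[0-9A-Fa-f]*:\t[0-9A-Fa-f ]*\t")
BYTES_ONLY = re.compile(r"^[\t ]*[0-9A-Fa-f]*:\t([0-9A-Fa-f ]*)[\t ]*$")
SECTION = re.compile(r"^Disassembly of section ([^:]*):$")


def clean_listing(text: str) -> str:
    """Strip addresses and raw bytes from objdump -d output."""
    out = []
    for line in text.splitlines():
        # Drop the "file format" banner and the "..." elision markers.
        if FILE_FORMAT.search(line) or ELLIPSIS.match(line):
            continue
        # "401000 <_start>" becomes "_start" (labels and jump/call targets).
        line = LABEL_REF.sub(r"\1", line)
        # Remove the "address:<TAB>bytes<TAB>" prefix from instruction lines.
        line = INSN.sub("\t", line, count=1)
        # Lines holding only bytes become a .byte directive.
        m = BYTES_ONLY.match(line)
        if m:
            line = "\t.byte " + ", ".join("0x" + b for b in m.group(1).split())
        line = SECTION.sub(r"\t.section \1", line)
        out.append(line)
    return "\n".join(out)


# Hand-written sample in objdump -d layout (an assumption for the demo).
sample = (
    "demo:     file format elf64-x86-64\n"
    "\n"
    "Disassembly of section .text:\n"
    "\n"
    "0000000000401000 <_start>:\n"
    "  401000:\tb8 3c 00 00 00       \tmov    $0x3c,%eax\n"
    "  401005:\te8 f6 ff ff ff       \tcall   401000 <_start>\n"
    "  40100a:\t00 00 \n"
    "\t...\n"
)
cleaned = clean_listing(sample)
print(cleaned)
```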

Personally, I'd write a GNU awk script to process the disassembly. That lets you easily track the exact offsets, allowing you to insert suitable alignment directives or byte padding to get the exact same byte sequences and alignment as the target binary. You can process and read the section information via objdump in a BEGIN rule, to get the necessary symbols and their addresses into associative arrays, so you can fix those in the disassembly. The same applies to the binary data in non-code sections.
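
The offset tracking can be illustrated with a short sketch. Python stands in for GNU awk here purely for illustration, `find_gaps` is a hypothetical name, and the sample listing is hand-written in `objdump -d` layout. The idea: each instruction line gives an address and a byte count, so the address of the next line is predictable; wherever the listing jumps ahead, padding (for example a `.skip` or `.balign` directive in GNU as) is needed to keep the bytes at the same addresses.

```python
import re

# "  401000:<TAB>b8 3c 00 00 00       <TAB>mov ..." -> address and raw bytes
LINE = re.compile(r"^\s*([0-9a-f]+):\t((?:[0-9a-f]{2} )+)")


def find_gaps(listing: str):
    """Return (address, size) pairs for byte ranges the listing skips over."""
    gaps = []
    expected = None  # address where the next instruction should start
    for line in listing.splitlines():
        if line.startswith("Disassembly of section"):
            expected = None  # do not report the jump between sections
            continue
        m = LINE.match(line)
        if not m:
            continue
        addr = int(m.group(1), 16)
        size = len(m.group(2).split())
        if expected is not None and addr > expected:
            gaps.append((expected, addr - expected))
        expected = addr + size
    return gaps


# Hand-written sample: ret ends at 0x401006, the next code starts at 0x401010.
sample = (
    "Disassembly of section .text:\n"
    "  401000:\tb8 3c 00 00 00       \tmov    $0x3c,%eax\n"
    "  401005:\tc3                   \tret\n"
    "\t...\n"
    "  401010:\t31 c0                \txor    %eax,%eax\n"
)
gaps = find_gaps(sample)
for addr, size in gaps:
    print(f"pad {size} bytes at {addr:#x}")
```

A fuller script would also load the symbol table into a dictionary keyed by address, which is the role the associative arrays play in the awk approach.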

To develop the necessary script I'd start by writing a small C program, and compiling it to both binary and assembly using
Code:
    gcc -O0 -Wall -pedantic -o binary source.c
    gcc -O0 -Wall -pedantic -S source.c
That way you can compare the results of your object dumping/parsing script to the actual assembly source.
When you get it close enough to compile the exact same binary (ignoring ELF file level differences), it should work for other binaries too.
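
One way to check "the exact same binary (ignoring ELF file level differences)" is to compare the raw section contents instead of the whole files. A minimal sketch, assuming the section bytes have already been extracted from both binaries (for example with `objcopy -O binary --only-section=.text`); `first_difference` is a hypothetical helper and the byte strings below are toy data:

```python
def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One input is a prefix of the other; they differ where it ends.
        return min(len(a), len(b))
    return None


# Toy data: the rebuilt code differs from the original in its last byte.
original = bytes.fromhex("b83c000000c3")
rebuilt = bytes.fromhex("b83c000000cc")
offset = first_difference(original, rebuilt)
print("identical" if offset is None else f"first difference at offset {offset:#x}")
```

The reported offset, added to the section's start address, points at the instruction in the disassembly where the conversion script went wrong.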
 
  

