In-memory representation of a program

addtheice · 09-14-2010, 02:01 PM

I'm looking for a way to get the in-memory representation of a program (and all linked-in code). I'm working on a static analysis tool for work that requires me to load and analyse the code of a loaded program (i.e., here is the program file to load; what does it look like once loaded?).

My Google-fu may just not be up to snuff, because I can't seem to find anything on this.

Any clue on where to look / what to read up on?

JohnGraham · 09-14-2010, 04:28 PM

It sounds like you want /proc/[pid]/mem and /proc/[pid]/maps. I've no experience using them though - `man 5 proc` might get you started.
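
For reference, each line of /proc/[pid]/maps has the form `address perms offset dev inode [pathname]`, as described in `man 5 proc`. A minimal sketch of reading it from Python follows; the names `Mapping`, `parse_maps` and `read_maps` are made up for illustration and are not part of any library.

```python
from typing import List, NamedTuple


class Mapping(NamedTuple):
    start: int   # first address of the region
    end: int     # one past the last address of the region
    perms: str   # e.g. "r-xp"
    offset: int  # offset into the backing file
    path: str    # backing file, "[heap]", "[stack]", or "" if anonymous


def parse_maps(text: str) -> List[Mapping]:
    """Parse the text of a /proc/[pid]/maps file into Mapping records."""
    mappings = []
    for line in text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        path = fields[5].strip() if len(fields) > 5 else ""
        mappings.append(
            Mapping(start, end, fields[1], int(fields[2], 16), path)
        )
    return mappings


def read_maps(pid="self") -> List[Mapping]:
    """Read and parse /proc/<pid>/maps (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return parse_maps(f.read())
```

Calling `read_maps()` with no argument lists the mappings of the Python interpreter itself; passing a pid reads another process, subject to the usual permission checks.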

cyent · 09-14-2010, 05:50 PM

Quote:

Originally Posted by JohnGraham

It sounds like you want /proc/[pid]/mem and /proc/[pid]/maps..

The exact layout of code in memory is the business of the OS and ld.so; you _shouldn't_ be writing anything that depends on that exact layout, as they are free to alter it as needed.

That said, they at least tell you what they have done in /proc/[pid]/maps.

You can use ptrace to stop the program at some appropriate point and then run around /proc/[pid] to your heart's content.

I can post a nifty chunk of code that does that if you want.

addtheice · 09-16-2010, 12:44 PM

I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is. I don't really care about how the layout works in relation to virtual vs. physical memory, nor do I care overmuch about which pages are loaded and which are not. I'm doing a data flow analysis and path flow analysis, so I need a look at the program as it is in memory and how it's linked together when loaded. Your suggestions are exactly what I needed (tally ho, to the internet I go!). I'm off to read up on this now.

I just purchased the book 'linux kernel internals'; I think that will help a lot with this endeavor. I would greatly appreciate any simple how-to code you would like to post.

Sergei Steshenko · 09-16-2010, 01:56 PM

Quote:

Originally Posted by addtheice

I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is. I don't really care about how the layout works in relation to virtual vs. physical memory, nor do I care overmuch about which pages are loaded and which are not. I'm doing a data flow analysis and path flow analysis, so I need a look at the program as it is in memory and how it's linked together when loaded. Your suggestions are exactly what I needed (tally ho, to the internet I go!). I'm off to read up on this now.

I just purchased the book 'linux kernel internals'; I think that will help a lot with this endeavor. I would greatly appreciate any simple how-to code you would like to post.

You also probably need 'objdump' - try

man objdump
objdump -s -x `which cp` | less

- the last one is an example.
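
The program headers that `objdump -x` prints are the part of the file that describes which pieces get mapped where. As a rough sketch of reading them directly, the following pulls the PT_LOAD segments out of an ELF image with Python's `struct` module. It assumes a 64-bit little-endian ELF file; the function name `load_segments` and the dictionary keys are invented for this example.

```python
import struct
from typing import List, Tuple

ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # ELF64 file header (64 bytes)
ELF64_PHDR = struct.Struct("<IIQQQQQQ")          # ELF64 program header (56 bytes)
PT_LOAD = 1                                      # segment type: loadable
PF_X, PF_W, PF_R = 1, 2, 4                       # segment permission bits


def load_segments(data: bytes) -> Tuple[int, List[dict]]:
    """Return (entry point, PT_LOAD segments) of a 64-bit little-endian ELF image."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        # e_ident[EI_CLASS] == 2 means 64-bit, e_ident[EI_DATA] == 1 means LSB
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    (_ident, _type, _machine, _version, entry, phoff, _shoff, _flags,
     _ehsize, phentsize, phnum, _shentsize, _shnum, _shstrndx) = \
        ELF64_EHDR.unpack_from(data, 0)
    segments = []
    for i in range(phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = \
            ELF64_PHDR.unpack_from(data, phoff + i * phentsize)
        if p_type != PT_LOAD:
            continue  # only loadable segments end up in the memory image
        perms = (("r" if p_flags & PF_R else "-")
                 + ("w" if p_flags & PF_W else "-")
                 + ("x" if p_flags & PF_X else "-"))
        segments.append({"offset": p_offset, "vaddr": p_vaddr,
                         "filesz": p_filesz, "memsz": p_memsz,
                         "perms": perms})
    return entry, segments
```

Reading a file with `open(path, "rb").read()` and passing the bytes to `load_segments` gives the link-time view of the layout; for position-independent executables and shared objects the `vaddr` values are relative to a base chosen at load time.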

addtheice · 09-16-2010, 02:45 PM

Quote:

Originally Posted by Sergei Steshenko

You also probably need 'objdump' - try

man objdump
objdump -s -x `which cp` | less

- the last one is an example.

Thanks. That was next on my list of things to figure out. I was also looking at addr2line for the debug information. Combining this with your suggestion should give me almost all the information I could need. Anything else will probably be very particular to my problem and will need to be invented as I go (what do you mean I have to do work? Really? Ugh!).

cyent · 09-16-2010, 05:15 PM

Quote:

Originally Posted by addtheice

I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is.

Actually, if I remember correctly, objdump will only take you as far as the layout of statically linked items. Once DLLs (shared libraries), mmap and the heap are involved, the program, as it sits on disk, has no idea where those items will be, in physical or virtual memory. Those things are resolved at run time!

And at run time, the program is given a pointer to index off. Thus the only static information in the program is "index this far off the base pointer when you are told what it is".
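
To make the "index off the base pointer" idea concrete, here is a small sketch that combines the two sources of information: the load base comes from the text of /proc/[pid]/maps, and the offset comes from the file on disk (for instance a symbol value reported by `nm -D`). The helper names `load_base` and `runtime_address` are hypothetical, and the sketch assumes the object's lowest loadable segment has link-time address 0, which is typical for shared libraries but not guaranteed.

```python
def load_base(maps_text: str, path_suffix: str) -> int:
    """Lowest mapped address of the file whose path ends with path_suffix."""
    starts = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6 and fields[5].strip().endswith(path_suffix):
            starts.append(int(fields[0].split("-")[0], 16))
    if not starts:
        raise LookupError(f"{path_suffix} is not mapped")
    return min(starts)


def runtime_address(maps_text: str, path_suffix: str, offset: int) -> int:
    """Run-time address = base chosen by the loader + link-time offset."""
    return load_base(maps_text, path_suffix) + offset
```

With address space layout randomization enabled, the base (and so the result) will usually differ from run to run, which is exactly why the file on disk can only record the offset.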

addtheice · 09-17-2010, 11:20 AM

OK, so I have all this. The last thing I need is the ability to load a child process with the program. I had assumed that I could just fork, ptrace, and exec it... but how do I load the program *without* running it? I don't need a running program (in fact, it can't be running).

I had assumed I could do something like exec it and then breakpoint it at the start, but I seem to be having an issue getting that to work. Is there any central document collection on these calls? I can find all the information I want on each of the commands I need... once I know the name of the command. But if I know what I want to do but not what the command is called, I'm basically reduced to begging for help. I would rather go and figure this stuff out myself, but I can't seem to find a good repository of this information that's all cross-linked nicely by subject rather than by function name.

theNbomr · 09-17-2010, 12:52 PM

I don't have a solution to your problem, but I think I can add a bit of perspective.

A program image in memory is normally put there by the OS, which has the privilege of manipulating protected memory. The method(s) required to do this will vary depending on the CPU architecture, available memory, etc. The OS (and I assume you are limiting this to Linux, which is already a large universe of possibilities) also creates a process, which includes an entry in a process table that is used by the scheduler to know when the process is allowed to run (among other things, I assume). It will be difficult to divorce the loading and linking of the binary image by the OS from the creation of a process. I think that the best you can hope for is to mimic the behavior of the linking loader within the heap space of your own process. As I understand it, this cannot include creation of an executable code segment (same as self-modifying code).

Having said all of this, have you dismissed the use of a debugger as the principal tool for your investigation? gdb seems to possess all of the capabilities to load and control execution of a program image. Perhaps your tool could use gdb as a code base. Depending on how polished the tool needs to be, perhaps you could find a way to use gdb as a sort of back-end, and use some tools/methods like expect or screen's stuff command to control gdb in ways that produce your desired analysis.

I don't know if "Tiny Programs", or the related "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux", is useful, but for someone with interest in a problem such as yours, it should at least be interesting.

--- rod.

addtheice · 09-17-2010, 01:28 PM

GDB does something like this when you go to debug a program. It should be possible to load a child process and then put a breakpoint at the _start routine. I know this because it's possible to load a program with GDB and place a breakpoint at the main function (which isn't exactly _start, but whatever). So it _should_ be possible. I'm OK with suspending the program and reading it while it's suspended; I just can't have the program running, or have it modify itself anywhere at the start (all values in memory must be those of initialization).

It's possible, I'm just not sure exactly how to pull it off. I can't simply recreate how the OS does it, because it would be far too easy to introduce a difference between my action and the OS's action, and the point is to build a static analysis tool for what is 'on the metal and in the environment' live.

theNbomr · 09-17-2010, 04:32 PM

But doesn't gdb just load the specified program, and then wait for you to instruct it to step through code, or examine data, whatever? I just tried a helloWorld program under gdb running in a screen session. I was able to stuff commands to it from another shell, and it responded as expected (set a breakpoint on main). This could be rolled into a script (thinking Perl, but whatever language you want would work). It is possible to read back from the screen session, and capture the output of gdb. Actually analyzing the result this way would be clumsy, but depending on what degree of sophistication you require, it could be do-able. I have used this method to perform primitive control of interactive console-mode applications in the past, and while clumsy, it does work. I am guessing you could also contrive a system of launching gdb as a child process with pipes connected to its standard IO, to issue commands to it, and read the results.
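
The last idea, driving gdb as a child process and capturing its output, might look roughly like the sketch below. It relies on gdb's `--batch` and `-ex` options and on the `info proc mappings` command; the function names `gdb_command` and `mappings_via_gdb` are invented here, and `break main` only works if the binary still has a symbol for `main`.

```python
import subprocess
from typing import List


def gdb_command(program: str, location: str = "main") -> List[str]:
    """Build a gdb command line that stops at `location` and prints the mappings."""
    return [
        "gdb", "--batch",             # run the -ex commands, then exit
        "-ex", f"break {location}",   # stop before the program's own code runs
        "-ex", "run",
        "-ex", "info proc mappings",  # gdb's view of /proc/<pid>/maps
        program,
    ]


def mappings_via_gdb(program: str, location: str = "main",
                     timeout: float = 60.0) -> str:
    """Run gdb on `program` and return everything it printed to stdout."""
    result = subprocess.run(
        gdb_command(program, location),
        stdin=subprocess.DEVNULL,  # never let gdb wait for interactive input
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return result.stdout
```

The returned text still has to be picked apart by hand, so this shares the clumsiness described above; it mostly saves the screen session.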

--- rod.

addtheice · 09-17-2010, 07:00 PM

I could. But why? I could also just look at how GDB does it and do that as well. It's using ptrace, exec and fork to do this exact thing. That's my point. I don't need all that other stuff, I just need THIS part of it. That's like asking for a coffee and someone giving you a coffee machine and ground coffee. I don't want THAT part, just the coffee.
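
For what it's worth, the fork / ptrace / exec part on its own can be sketched in a few lines. The version below is only an illustration under several assumptions: Linux, a libc reachable through `ctypes`, and permission to use ptrace. The function name `maps_at_exec` is made up. The child asks to be traced and then execs; the kernel stops it with SIGTRAP once the new image is mapped and before any of its instructions run, at which point the parent can read /proc/[pid]/maps (or /proc/[pid]/mem). At that stop the dynamic linker has not run yet, so shared libraries will not appear in the map; seeing them would require letting the child continue to a breakpoint at the program's entry point, which this sketch does not attempt.

```python
import ctypes
import os
import signal
from typing import List, Optional

PTRACE_TRACEME = 0  # request number from <sys/ptrace.h> on Linux


def maps_at_exec(path: str, argv: Optional[List[str]] = None) -> List[str]:
    """Exec `path` under ptrace, stop it at exec, return its /proc/<pid>/maps lines.

    The child is killed before it executes any instruction of the new image.
    """
    libc = ctypes.CDLL(None, use_errno=True)  # load libc before forking
    argv = argv or [path]
    pid = os.fork()
    if pid == 0:
        # Child: request tracing, then exec. A successful exec never returns
        # here; the kernel stops the new image with SIGTRAP instead.
        try:
            if libc.ptrace(PTRACE_TRACEME, 0, None, None) != -1:
                os.execv(path, argv)
        finally:
            os._exit(127)  # reached only if ptrace or exec failed
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited instead of stopping at exec")
    try:
        with open(f"/proc/{pid}/maps") as f:
            return f.read().splitlines()
    finally:
        # The tracee is still stopped; kill it and reap it.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
```

Something like `maps_at_exec("/bin/ls")` should then return the mappings of the executable, the dynamic linker and the initial stack, with memory still in its freshly loaded state.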