LinuxQuestions.org
Old 09-14-2010, 02:01 PM   #1
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Rep: Reputation: 0
In memory representation of a program


I'm looking for a way to get the in-memory representation of a program (and all linked-in code). I'm working on a static analysis tool for work that requires me to load and analyse the code of a loaded program (i.e., here is the program file to load; what does it look like once loaded?).

My Google-fu may just not be up to snuff, because I can't seem to find anything on this.

Any clue on where to look or what to read up on?
 
Old 09-14-2010, 04:28 PM   #2
JohnGraham
Member
 
Registered: Oct 2009
Posts: 467

Rep: Reputation: 139
It sounds like you want /proc/[pid]/mem and /proc/[pid]/maps. I've no experience using them though - `man 5 proc` might get you started.
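
For readers following along, here is a minimal sketch of turning /proc/[pid]/maps into structured records. The field order (address range, permissions, offset, device, inode, optional pathname) follows `man 5 proc`; the names `Mapping`, `parse_maps_line`, and `read_maps` are made up for illustration.

```python
from typing import NamedTuple, Optional


class Mapping(NamedTuple):
    start: int            # first virtual address of the region
    end: int              # one past the last address of the region
    perms: str            # e.g. "r-xp"
    offset: int           # offset into the backing file
    dev: str              # device as "major:minor"
    inode: int            # 0 for anonymous mappings
    path: Optional[str]   # backing file or a pseudo-name such as "[heap]"


def parse_maps_line(line: str) -> Mapping:
    """Parse one line of /proc/[pid]/maps into a Mapping."""
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    start, end = (int(x, 16) for x in addr.split("-"))
    path = fields[5].strip() if len(fields) > 5 else None
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


def read_maps(pid="self"):
    """Return every mapping of the given process (Linux only)."""
    with open(f"/proc/{pid}/maps") as f:
        return [parse_maps_line(line) for line in f if line.strip()]
```

Calling `read_maps()` with no argument lists the mappings of the calling process, which is a quick way to see the format before pointing it at another pid.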
 
Old 09-14-2010, 05:50 PM   #3
cyent
Member
 
Registered: Aug 2001
Location: ChristChurch New Zealand
Distribution: Ubuntu
Posts: 398

Rep: Reputation: 87
As he said...

Quote:
Originally Posted by JohnGraham View Post
It sounds like you want /proc/[pid]/mem and /proc/[pid]/maps..

The exact layout of code in memory is the business of the OS and ld.so. You _shouldn't_ write anything that depends on that exact layout, as they are free to alter it as needed.

That said, they at least tell you what they have done in /proc/[pid]/maps.

You can use ptrace to stop the program at some appropriate point and then run around /proc/[pid] to your heart's content.

I can post a nifty chunk of code that does that if you want.
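
This is not cyent's code, only a minimal sketch of the "run around /proc/[pid]" step, assuming the target is already stopped and the caller is allowed to read its memory (for example, as its tracer). /proc/[pid]/mem is addressed by virtual address, so a positioned read at a region's start address returns that region's bytes. The function name `read_region` and the `mem_path` override are hypothetical.

```python
import os


def read_region(pid, start, length, mem_path=None):
    """Return up to `length` bytes found at virtual address `start`.

    `mem_path` exists only so the function can be pointed at another
    file for experimentation; by default it reads /proc/<pid>/mem.
    """
    path = mem_path or f"/proc/{pid}/mem"
    fd = os.open(path, os.O_RDONLY)
    try:
        # pread seeks to `start` and reads in one call, leaving no
        # shared file position to worry about.
        return os.pread(fd, length, start)
    finally:
        os.close(fd)
```

The start address and length would normally come from a line of /proc/[pid]/maps. Reading an address that is not mapped, or a process you may not trace, raises `OSError`.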
 
Old 09-16-2010, 12:44 PM   #4
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Original Poster
Rep: Reputation: 0
Thanks!

I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is. I don't really care about how the layout works in relation to virtual vs. physical memory, nor do I care overmuch about which pages are loaded and which are not. I'm doing a data flow analysis and path flow analysis, so I need a look at the program as it is in memory and how it's linked together when loaded. Your suggestions are exactly what I needed (tally ho, to the internet I go!). I'm off to read up on this now.

I just purchased the book 'Linux Kernel Internals'; I think that will help a lot with this endeavor. I would greatly appreciate any simple how-to code you would like to post.
 
Old 09-16-2010, 01:56 PM   #5
Sergei Steshenko
Senior Member
 
Registered: May 2005
Posts: 4,481

Rep: Reputation: 454
Quote:
Originally Posted by addtheice
I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is. I don't really care about how the layout works in relation to virtual vs. physical memory, nor do I care overmuch about which pages are loaded and which are not. I'm doing a data flow analysis and path flow analysis, so I need a look at the program as it is in memory and how it's linked together when loaded. Your suggestions are exactly what I needed (tally ho, to the internet I go!). I'm off to read up on this now.

I just purchased the book 'Linux Kernel Internals'; I think that will help a lot with this endeavor. I would greatly appreciate any simple how-to code you would like to post.
You also probably need 'objdump' - try

man objdump
objdump -s -x `which cp` | less

- the last one is an example.
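
The loadable segments that `objdump -x` lists under "Program Header" can also be read straight from the file. The following is a rough sketch, assuming a little-endian 64-bit ELF image and using the standard Elf64_Ehdr and Elf64_Phdr field layouts; `load_segments` is a name made up for this example.

```python
import struct

ELF64_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, little-endian
ELF64_PHDR = struct.Struct("<IIQQQQQQ")            # Elf64_Phdr, little-endian
PT_LOAD = 1                                        # segment type: loadable


def load_segments(image: bytes):
    """Return (vaddr, memsz, flags) for each PT_LOAD segment of an ELF64 image."""
    header = ELF64_HEADER.unpack_from(image, 0)
    ident, phoff, phentsize, phnum = header[0], header[5], header[9], header[10]
    # ident[4] == 2 means 64-bit, ident[5] == 1 means little-endian.
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    segments = []
    for i in range(phnum):
        (p_type, p_flags, _offset, p_vaddr,
         _paddr, _filesz, p_memsz, _align) = ELF64_PHDR.unpack_from(
            image, phoff + i * phentsize)
        if p_type == PT_LOAD:
            segments.append((p_vaddr, p_memsz, p_flags))
    return segments
```

To try it on a real binary, pass the file contents, e.g. `load_segments(open("/bin/cp", "rb").read())`. For position-independent executables and shared libraries the reported addresses are relative to a load base chosen at run time.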
 
Old 09-16-2010, 02:45 PM   #6
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Original Poster
Rep: Reputation: 0

Quote:
Originally Posted by Sergei Steshenko View Post
You also probably need 'objdump' - try

man objdump
objdump -s -x `which cp` | less

- the last one is an example.
Thanks. That was next on my list of things to figure out. I also was looking at addr2line for the debug information. Combining this with your suggestion should give me almost all the information I could need. Anything else will probably be very particular to my problem and will need to be invented as I go (what do you mean I have to do work! really? ugg!)
 
Old 09-16-2010, 05:15 PM   #7
cyent
Member
 
Registered: Aug 2001
Location: ChristChurch New Zealand
Distribution: Ubuntu
Posts: 398

Rep: Reputation: 87
Quote:
Originally Posted by addtheice
I should have been a bit more particular in my phrasing. I only need a map of what the program thinks its layout is.
Actually, if I remember correctly, objdump will only take you as far as the layout of statically linked items. Once DLLs, mmap, and the heap are involved, the program, as it sits on disk, has no idea where those items will be, in physical or virtual memory. Those things are resolved at run time!

And at run time, the program is given a pointer to index off. Thus the only static information in the program is "index this far off the base pointer when you are told what it is".
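
The "index off the base pointer" idea can be written down as a one-line calculation. This is a small sketch, assuming the base comes from a file-backed line of /proc/[pid]/maps (its start address, end address, and file offset); the function name `runtime_address` is hypothetical.

```python
def runtime_address(map_start, map_end, map_offset, file_offset):
    """Virtual address at which byte `file_offset` of a mapped file appears.

    `map_start`, `map_end`, and `map_offset` are the address range and
    file offset of one file-backed mapping.
    """
    address = map_start + (file_offset - map_offset)
    if not map_start <= address < map_end:
        raise ValueError("file offset is not covered by this mapping")
    return address
```

The static part is `file_offset`; everything else is known only once the loader has placed the mapping.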

Last edited by cyent; 09-16-2010 at 05:16 PM.
 
Old 09-17-2010, 11:20 AM   #8
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Original Poster
Rep: Reputation: 0
OK, so I have all this. The last thing I need is the ability to load a child process with the program. I had assumed that I could just fork, ptrace, and exec it... but how do I load the program *without* running it? I don't need a running program (in fact, it can't be running).

I had assumed I could do something like exec it and then breakpoint it at the start, but I seem to be having an issue getting that to work. Is there any central document collection on these calls? I can find all the information I want on each of the commands I need... once I know the name of the command. But if I know what I want to do but not what the command is called, I'm basically reduced to begging for help. I would rather go and figure this stuff out myself, but I can't seem to find a good repository of this information that's cross-linked nicely by subject rather than by function name.
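
One way the fork / ptrace / exec sequence can be arranged is sketched below, under the assumption of Linux with glibc reachable through `ctypes`. A child that calls `PTRACE_TRACEME` before `execv` is stopped with SIGTRAP as soon as the exec succeeds, before the new program executes a single instruction. The function name `load_stopped` is made up, and the request value 0 for `PTRACE_TRACEME` is taken from `<sys/ptrace.h>`.

```python
import ctypes
import os
import signal

PTRACE_TRACEME = 0  # request number from <sys/ptrace.h> on Linux


def load_stopped(argv):
    """Exec argv[0] under ptrace and return its /proc/<pid>/maps text,
    captured while the child is stopped at the post-exec SIGTRAP."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.ptrace.restype = ctypes.c_long
    libc.ptrace.argtypes = [ctypes.c_int, ctypes.c_int,
                            ctypes.c_void_p, ctypes.c_void_p]
    pid = os.fork()
    if pid == 0:
        # Child: ask to be traced, then replace ourselves with the target.
        try:
            if libc.ptrace(PTRACE_TRACEME, 0, None, None) == 0:
                os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if ptrace or exec failed
    _, status = os.waitpid(pid, 0)
    if not os.WIFSTOPPED(status):
        raise RuntimeError("child exited before the exec stop")
    try:
        with open(f"/proc/{pid}/maps") as f:
            return f.read()
    finally:
        # The sketch only inspects the image, so discard the child.
        os.kill(pid, signal.SIGKILL)
        os.waitpid(pid, 0)
```

At this stop the kernel has mapped the executable and, for a dynamically linked program, the ld.so interpreter, but ld.so has not run yet, so shared libraries are not mapped. Seeing the fully linked image would mean letting the child run further, for example to a breakpoint at the program's entry point.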
 
Old 09-17-2010, 12:52 PM   #9
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
I don't have a solution to your problem, but I think I can add a bit of perspective.

A program image in memory is normally put there by the OS, which has the privilege of manipulating protected memory. The method(s) required to do this will vary depending on the CPU architecture, available memory, etc. The OS (and I assume you are limiting this to Linux, which is already a large universe of possibilities) also creates a process, which includes an entry in a process table that is used by the scheduler to know when the process is allowed to run (among other things, I assume). It will be difficult to divorce the function of loading and linking the binary image by the OS from the creation of a process. I think that the best you can hope for is to mimic the behavior of the linking loader within the heap space of your own process. As I understand it, this cannot include creation of an executable code segment (same as self-modifying code).

Having said all of this, have you dismissed the use of a debugger as the principal tool for your investigation? gdb seems to possess all of the capabilities to load and control execution of a program image. Perhaps your tool could use gdb as a code base. Depending on how polished the tool needs to be, perhaps you could find a way to use gdb as a sort of back-end, and use some tools/methods like expect or screen's stuff command to control gdb in ways that produce your desired analysis.

I don't know if Tiny Programs, or the related A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux is useful, but for someone with interest in a problem such as yours, it should at least be interesting.

--- rod.
 
Old 09-17-2010, 01:28 PM   #10
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Original Poster
Rep: Reputation: 0
GDB moves in the same direction

GDB does something like this when you go to debug a program. It should be possible to load a child process and then put a breakpoint at the _start routine. I know this because it's possible to load a program with GDB and place a breakpoint at the main function (which isn't exactly _start, but whatever). So it _should_ be possible. I'm OK with suspending the program and reading it while it's suspended; I just can't have the program running, or have it modify itself anywhere at the start (all values in memory must be those of initialization).

It's possible; I'm just not sure exactly how to pull it off. I can't simply recreate how the OS does it, because it would be far too easy to introduce a difference between my actions and the OS's, and the point is to build a static analysis tool for what is 'on the metal and in the environment' live.
 
Old 09-17-2010, 04:32 PM   #11
theNbomr
LQ 5k Club
 
Registered: Aug 2005
Distribution: OpenSuse, Fedora, Redhat, Debian
Posts: 5,399
Blog Entries: 2

Rep: Reputation: 908
But doesn't gdb just load the specified program, and then wait for you to instruct it to step through code, examine data, or whatever? I just tried a helloWorld program under gdb running in a screen session. I was able to stuff commands to it from another shell, and it responded as expected (set a breakpoint on main). This could be rolled into a script (thinking Perl, but whatever language you want would work). It is possible to read back from the screen session and capture the output of gdb. Actually analyzing the result this way would be clumsy, but depending on what degree of sophistication you require, it could be doable. I have used this method to perform primitive control of interactive console-mode applications in the past, and while clumsy, it does work. I am guessing you could also contrive a system of launching gdb as a child process with pipes connected to its standard I/O, to issue commands to it and read the results.
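
A rough sketch of the "gdb as a child process" idea, using gdb's batch mode instead of a screen session. It assumes `gdb` is on the PATH and relies only on the `--batch`, `-ex`, and `--args` options; the helper names `gdb_command` and `run_gdb` are invented for this example.

```python
import subprocess


def gdb_command(program, commands):
    """Build an argv that runs the given gdb commands non-interactively."""
    argv = ["gdb", "--batch"]
    for command in commands:
        argv += ["-ex", command]
    return argv + ["--args", program]


def run_gdb(program, commands):
    """Run gdb on `program` and return everything it printed to stdout."""
    result = subprocess.run(gdb_command(program, commands),
                            capture_output=True, text=True, check=False)
    return result.stdout
```

For instance, `run_gdb("./helloWorld", ["break main", "run", "info proc mappings"])` stops at main and prints the mapped regions, which a script can then parse. The output is meant for humans, so parsing it is as clumsy as described above.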

--- rod.
 
Old 09-17-2010, 07:00 PM   #12
addtheice
LQ Newbie
 
Registered: Sep 2010
Posts: 9

Original Poster
Rep: Reputation: 0
I could, but why? I could also just look at how GDB does it and do that as well. It's using ptrace, exec, and fork to do this exact thing. That's my point. I don't need all that other stuff; I just need THIS part of it. That's like asking for a coffee and someone giving you a coffee machine and ground coffee. I don't want THAT part, just the coffee.
 
  



Tags
analysis, debugging, memory, static


