LinuxQuestions.org


jason_m 08-25-2010 08:47 PM

Passing command line arguments
 
I'm looking for some confirmation / clarification about passing arguments to a program.

The code I'm going to post below is assembly, but I think my analysis holds generally for any language. I'm running Linux (Ubuntu 10.04), my shell is bash, and my processor is an Intel Core i3 (64-bit).

What I set out to determine is whether I can pass integer (as opposed to "C String") arguments to my program. Generally, command line args are treated as null-terminated strings. The prototype for main in C/C++ illustrates this:
Code:

int main(int argc, char* argv[])
At the end of the day though, whether you consider the value in a word/double/quad or whatever to be character data, or something else like integer data is only a matter of interpretation. 0x48692121 is both a valid character array of length 4 ("Hi!!") as well as a 32-bit integer.

To test whether I could do this or not, I wrote the following little program:
Code:

        .text
        .global _start

_start:
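        pop        %rcx  # argc
        pop        %rcx  # argv[0]: address of the first argument string
        mov        (%rcx), %rdi # load the first 8 bytes of argv[0] into rdi
       
        pop        %rcx  # argv[1]: address of the second argument string
        mov        (%rcx), %rdi # load the first 8 bytes of argv[1] into rdi

        pop        %rcx  # argv[2]: address of the third argument string
        mov        (%rcx), %rdi # load the first 8 bytes of argv[2] into rdi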
...

My understanding is that the stack pointer will be pointing at the argument count when the program begins execution. Directly "on top" of that (8 bytes toward higher memory addresses) will be the address of the first command line/passed-in parameter. "On top" of that will be the address of the second command line parameter, etc., until all of the arguments are accounted for.

So I fired this program up in gdb and gave it the following input:
Code:

(gdb) run $'\x00\x00\x00\x42' $'\x01\x02\x03\x00\x04'
I will explain the results I am seeing as well as the arguments that I chose. The first argument turns out to be the full path to the executable file. This is implicitly passed to the program and I was expecting this to be the case.

The next argument demonstrates a difficulty I am having passing in non-character data. It appears that the environment is seeing a null and not looking at the rest of the argument. That is, I don't think the 0x42 ever makes it onto the stack because it is preceded by 0x00, which is interpreted as a null terminator, ending the "string" argument.

Similarly, the last argument is a test to see if the 0x00 prevents the 0x04 from ever making it onto the stack.

So here's what gdb tells me:
Code:

Breakpoint 1, _start () at better_args.s:9
(gdb) n
_start () at better_args.s:10
(gdb) n
(gdb) p (char*)$rcx
$18 = 0x7fffffffeb2e "/home/jason/Development/asm/better_args"
(gdb) p
$19 = 0x7fffffffeb2e "/home/jason/Development/asm/better_args"
(gdb) n
_start () at better_args.s:13
(gdb) p/x {long}$rcx
$20 = 0x42524f0003020100
(gdb) p/x {long}($rcx+2)
$21 = 0x544942524f000302
(gdb) n
(gdb) n
_start () at better_args.s:16
(gdb) p/x {long}$rcx
$22 = 0x4942524f00030201
(gdb)

So, let's look one at a time:
Code:

(gdb) p (char*)$rcx
$18 = 0x7fffffffeb2e "/home/jason/Development/asm/better_args"

Here we can confirm that argv[0] is the full path and filename.

Code:

(gdb) p/x {long}$rcx
$20 = 0x42524f0003020100

This should be pointing to the second "string" argument. Curiously enough, the leftmost (most significant) byte is 0x42. However, my testing tells me this is purely by chance - I normally do not see this. For further evidence that this is just junk memory values, let's see if there is a null terminator after this "character".
Code:

(gdb) p/x {long}($rcx+2)
$21 = 0x544942524f000302

Nope, 0x49 comes next. I feel comfortable saying that *my* 0x42 never made it onto the stack. My question is: is this due to the way that bash parses the command line arguments? Could something else deliver my argument to the program? Perhaps if I set up argv[] myself in C and ran the program using execve()? Testing that is on my TODO.

Finally, if you didn't notice it in the previous output, my last argument is there, and the 0x04 didn't make it.
Code:

(gdb) p/x {long}$rcx
$22 = 0x4942524f00030201

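One takeaway for me is that the string looks "backwards" when printed this way. x86-64 is little-endian, so printing eight bytes as a single long puts the byte at the lowest address on the right; in memory the bytes are still in the order I typed them. Maybe my notion of "forwards" and "backwards" needs work. But that's no big deal.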

So that was a little long winded, but I wanted to put the facts out there as well as everything I have looked into and my understanding of the results I'm observing. Maybe this is just re-stating the obvious, but I didn't know what the results were going to be, so it wasn't obvious to me.

Where this is going is eventually another program is going to load up a little compiled binary that a parser of mine spits out. The second program basically just applies a (parsed and compiled) user-input algorithm to some data, and then returns the result back to the first program when finished. I'm thinking of using write() with the write end of a pipe to pass the result back.

I'd like to get to the point where I could write this little framework test: First program obtains a pipe, fork()s a new process, starts up the second program, passing it the write file descriptor for the pipe, and two integer arguments to add together. The second program adds the arguments together and writes the result to the pipe. Originally I started by passing string arguments and converting them to integers, which isn't all that difficult. But if possible and safe, I'd love to skip the conversion and pass the integer values directly. That is, I could pass the integer value 1, and after doing a
Code:

pop %rcx
mov (%rcx), %rsi

%rsi would contain the value 1. It is looking like I cannot do this directly. Leading bytes with all 0's, or any intermediate 0 byte will ruin the value I'm trying to pass. Is sticking with passing character data, and then parsing the integer value out of it my best course of action? What if instead of integers, it was double-precision floating point values that were being passed in? Is it still better to pass in a character representation and parse the double value out of it? I have to believe there is something better.

Any thoughts or comments on the testing I have done so far, my observations, and what I am trying to accomplish are greatly appreciated.

Thanks,

Sergei Steshenko 08-25-2010 09:58 PM

Quote:

Originally Posted by jason_m (Post 4077863)
I'm looking for some confirmation / clarification passing arguments to a program.

The code I'm going to post below is assembly, but I think my analysis holds generally for any language. I'm running linux (ubuntu 10.04), my shell is bash ...

'bash' is not black magic; it is a program written in "C" that uses the "C" standard library to start processes.

Because of this, one's options are limited to what the standard "C" library offers WRT process launching: 'man 3 exec'.

jason_m 08-26-2010 07:59 AM

So you're suggesting it is bash breaking up the arguments (and not some "higher power")? I can understand that, I just wasn't sure ahead of time if that was going to be the case or not. I figured bash still has to parse the entire string I give it, in this case to the final single quote, so maybe it would just toss those bytes in a buffer, slap a 0x00 at the ends and send the arguments off to the program.

Are you suggesting that I could accomplish this by setting up char* argv[] myself? If I have time, I'm going to try and put a program together tonight to test that out.

Maybe a more general question then is: what is the best way to pass binary data to a program? This program is going to need to know two things: (1) the file descriptor to write() back its result, and (2) where to find its input(s). Passing integers to the program was just a way for me to learn more about passing binary data. I think at the end of the day, I'll pass the address to the start of a table in memory with all of the formula inputs. All of the inputs should be at known, fixed offsets in the table once the algorithm is parsed/compiled. Should I just parse strings with these values? Or should I continue to explore passing binary data by setting up argv[] myself? Or should I be thinking about something else entirely?

Sergei Steshenko 08-26-2010 08:05 AM

You are limited to what the exec* functions do. The constraint is spelled out in the following quote from the man page:

Code:

The const char *arg and subsequent ellipses in the execl(), execlp(), and execle() functions can be thought of as arg0, arg1, ..., argn. Together they describe a list of one or more pointers to null-terminated strings that represent the argument list available to the executed program. The first argument, by convention, should point to the filename associated with the file being executed. The list of arguments must be terminated by a NULL pointer, and, since these are variadic functions, this pointer must be cast (char *) NULL.

.

jason_m 08-26-2010 10:36 PM

Below is an example using a call to execve() that accomplishes what I wanted to test.

write_test2.c:
Code:

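#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>

int main(int argc, char* argv[]) {
  pid_t identity;
  int* aptr;
  int* bptr;
  int pfd[2];

  aptr = malloc(sizeof(int));
  bptr = malloc(sizeof(int));
  if (aptr == NULL || bptr == NULL) {
    perror("malloc");
    return 1;
  }

  *aptr = 40;
  *bptr = 2;

  if (pipe(pfd) == -1) {
    perror("pipe");
    return 1;
  }

  identity = fork();
  if (identity == -1) {
    perror("fork");
    return 1;
  }

  if (identity == 0) {
    // Child process
    // Setup argv[]: each entry points at raw int bytes instead of text.
    // execve() copies each entry only up to its first zero byte.
    char* arrrgs[4]; // Pirates?
    char* envp[] = { NULL }; // empty environment
    arrrgs[0] = (char*)(&pfd[1]);
    arrrgs[1] = (char*)aptr;
    arrrgs[2] = (char*)bptr;
    arrrgs[3] = NULL;

    execve("write_test2_child", arrrgs, envp);
    perror("execve"); // only reached if execve() fails
    _exit(1);
  } else {
    int buf;
    close(pfd[1]); // parent only reads; read() sees EOF if the child dies
    if (read(pfd[0], &buf, sizeof(int)) == (ssize_t)sizeof(int))
      printf("Parent received value: %x, %d\n", buf, buf);
  }

  wait(NULL);  // Don't exit until the child is done

  printf("All done!\n");

  return 0;
}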

write_test2_child.c
Code:


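#include <stdio.h>
#include <unistd.h>

int main(int argc, char* argv[]) {
  pid_t id;

  id = getpid();

  printf("Yo, attach a debugger to: %d\n", (int)id);
  (void)getc(stdin); // pause here until a key is pressed

  printf("argc: %d\n", argc);
  if (argc < 3) {
    fprintf(stderr, "expected 3 binary arguments\n");
    return 1;
  }

  // argv[n] is a char*, so *argv[n] reads only the first byte.
  // That is enough for the small values used in this test.
  int fd = *argv[0];
  int a = *argv[1];
  int b = *argv[2];
  printf("fd: %x, %d\n", fd, fd);
  printf("a: %x, %d\n", a, a);
  printf("b: %x, %d\n", b, b);

  int rslt = a + b;
  if (write(fd, &rslt, sizeof(int)) != (ssize_t)sizeof(int)) {
    perror("write");
    return 1;
  }

  return 0;
}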

Running:
Code:


jason@c0mpy:~/Development/asm$ ./write_test2
Yo, attach a debugger to: 2481
a
argc: 3
fd: 4, 4
a: 28, 40
b: 2, 2
Parent received value: 2a, 42
All done!
jason@c0mpy:~/Development/asm$

Note that when argv[] is set up manually, argc still comes out right (the entries are counted up to the NULL pointer), but nothing enforces that argv[0] is the filename.

It is nice to know I can accomplish passing some binary data if necessary.

