LinuxQuestions.org


Angus 11-20-2009 09:39 AM

How long can my command-line be?
 
I was writing a bash script that builds a very long list of file names that are passed to a program. Unfortunately, I was getting a lot of "Argument list too long" errors. The last time I ran ./configure I was told that I could have a command line 2^31 characters long, yet I would often get this message with only 1800 parameters... sometimes. Other times I could fit 2000. I came to the conclusion that the sheer number of characters on the command line was the problem. So I thought that maybe putting a backslash and a newline after every 100 file names would work. It didn't.
So if a command-line can only have a certain number of characters, how do I find out what this number is? Can I pluck this out of an environment variable? Or is there another reason I'd be getting this?
It occurs to me that the call I was making was to tar (before I discovered --files-from) and the message was:
Quote:

/bin/tar: Argument list too long
suggesting that the message came from tar and not from bash. Is it possible that this is only a tar problem, and not a general bash problem?
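For what it's worth, here is a minimal sketch of the --files-from approach (the directory and file names are made up for illustration): because tar reads the list from a file or from standard input, the names never pass through the exec() argument list, so ARG_MAX never comes into play.

Code:

# Write the file list to a temporary file and hand it to tar.
find /path/to/data -type f -name '*.log' > /tmp/filelist.txt
tar -cf backup.tar --files-from=/tmp/filelist.txt

# Or pipe the list straight in (GNU tar reads the list from stdin when the
# file name is -); --null pairs with find -print0 for awkward file names.
find /path/to/data -type f -name '*.log' -print0 | tar -cf backup.tar --null --files-from=-
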

catkin 11-20-2009 10:27 AM

It's a kernel limit, named ARG_MAX and defined in limits.h. You can query it with getconf:
Code:

c:~$ getconf ARG_MAX
2097152

EDIT: The message appears to come from tar but probably comes from the exec* system call the shell makes to run the tar command; the kernel returns that call with an error number (E2BIG) indicating the argument list is too long.
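If you want to see that for yourself, something along these lines should show it (a sketch, assuming strace is available; the tar command and glob are only examples):

Code:

# Trace only execve calls; when the expanded ./* is too big, execve()
# fails with E2BIG ("Argument list too long") before tar ever starts.
strace -f -e trace=execve bash -c 'tar cf /tmp/test.tar ./*' 2>&1 | grep E2BIG
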

bartonski 11-21-2009 05:46 PM

Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly. I learned this the hard way: I have a number of shell functions set up which load data into shell variables, and one of them has a tendency to write a lot of data to one of my variables in certain situations (this is an ... *ahem* undocumented feature). This left no space in the argument list. I had to run 'set | less' to find the offending variables, then clear them.
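As an aside, a rough sketch that might save some scrolling through set | less: compgen -e lists the names of exported variables (bash-specific), so you can sort them by the size of their values.

Code:

# Print exported variables sorted by value length, largest last.
for name in $(compgen -e); do
    value=${!name}                       # indirect expansion: the value of $name
    printf '%10d  %s\n' "${#value}" "$name"
done | sort -n
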

The answer to this problem is to use xargs:

in a directory which contains 9000 files, use

Code:

find . -maxdepth 1 -type f | xargs gzip
rather than

Code:

gzip *
If you are doing something complicated on the command line, you may want to use a for loop instead of xargs.

Reading the xargs man page, and truly grokking its contents, is one of those things that will make you understand Linux at a deeper level.
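One refinement to the sketch above, for what it's worth: file names with spaces or newlines will confuse a plain find | xargs pipeline, so the null-delimited variants are safer, and -n lets you cap how many names each invocation gets.

Code:

# Null-delimited hand-off copes with spaces and newlines in file names.
find . -maxdepth 1 -type f -print0 | xargs -0 gzip

# Limit each gzip invocation to at most 500 file names.
find . -maxdepth 1 -type f -print0 | xargs -0 -n 500 gzip
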

catkin 11-21-2009 10:47 PM

Quote:

Originally Posted by bartonski (Post 3765260)
Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly.

That is amazing, a bizarre design decision. Do you have a script to demonstrate it?

bartonski 11-21-2009 11:49 PM

Quote:

Originally Posted by catkin (Post 3765378)
That is amazing, a bizarre design decision. Do you have a script to demonstrate it?

I tried

Code:

$ foo=$(cat /dev/urandom | cut -c -2097152 ); ls *
For some reason, this caused the shell to hang (presumably because /dev/urandom never runs dry, so the command substitution never finishes).

This worked:

Code:

$ foo=$(cat /dev/urandom | strings | head -40980); ls *
bash: /bin/ls: Argument list too long

This was done from my $HOME, which is not especially large:

Code:

$ ls * | wc -l
182


catkin 11-22-2009 12:50 AM

Thanks bartonski :)

Tried it but ...
Code:

c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin
[ls output snipped]
c:~$ echo ${#foo}
229678
c:~$ bash --version
GNU bash, version 3.1.17(2)-release (i486-slackware-linux-gnu)
Copyright (C) 2005 Free Software Foundation, Inc.


bartonski 11-22-2009 09:26 AM

Quote:

Originally Posted by catkin (Post 3765421)
Code:

c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin
[ls output snipped]
c:~$ echo ${#foo}
229678


The reason that you're not getting the error here is that you're not calling 'ls' with any arguments. If you had called 'ls *' or even 'ls foo.txt', this would have failed.

catkin 11-22-2009 10:38 AM

Quote:

Originally Posted by bartonski (Post 3765754)
The reason that you're not getting the error here is that you're not calling 'ls' with any arguments. If you had called 'ls *' or even 'ls foo.txt', this would have failed.

Sorry -- bad copy-and-paste. Initially I used ls * and it worked as shown, so I changed to ls /usr/bin/* and it still worked, but the * got lost in editing. Just tried again and here it is without any manual editing.
Code:

c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin/*
[ls output snipped]
c:~$ echo ${#foo}
229575
c:~$ ls /usr/bin/* | wc -w
5078


bartonski 11-22-2009 02:04 PM

Quote:

Originally Posted by catkin (Post 3765811)
Code:

c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin/*
[ls output snipped]
c:~$ echo ${#foo}
229575
c:~$ ls /usr/bin/* | wc -w
5078


but earlier you said

Code:

c:~$ getconf ARG_MAX
2097152

2097152 is about an order of magnitude larger than 229575. Looking at my code again:

Code:

$ foo=$(cat /dev/urandom | strings | head -40980); ls *
I'm not quite sure how I managed to get $foo over 2 meg in size. This means that the average string length would have been over 50 characters long. I guess that I just got lucky when I was filling $foo.

The way that I actually filled $foo was to run foo=$(cat /dev/urandom | strings | head -$x); ls *
I started with $x=20 and manually doubled $x until I got this to fail; I posted the first value of $x that failed for me. I'm quite certain that there's a better way to fill $foo, but I was being lazy.
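For the record, a less lazy way to fill $foo to an exact size might be something like this (a sketch; the 200000 is arbitrary, and head -c is a GNU/BSD extension):

Code:

# Build a variable of exactly 200000 'x' characters, no /dev/urandom needed.
foo=$(head -c 200000 /dev/zero | tr '\0' 'x')
echo ${#foo}    # should print 200000
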

Ok, I figured it out: this only happens if foo is exported.

here's the code I ran:

Code:

$ foo="xxx"; while [ ${#foo} -lt 4194305 ]; do echo -n  "${#foo}: "; export foo="$foo$foo"; ls * | wc -l; done
3: 125
6: 125
12: 125
24: 125
48: 125
96: 125
192: 125
384: 125
768: 125
1536: 125
3072: 125
6144: 125
12288: 125
24576: 125
49152: 125
98304: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
196608: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
393216: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
786432: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
1572864: bash: fork: Cannot allocate memory

I initialized foo to "xxx" because 2097152 is an exact power of 2, and I wanted to make sure that ${#foo} would always land strictly above or below that boundary, never exactly on it.

actually, on my system,

Code:

$ getconf ARG_MAX
131072

which is why you see bash bailing out at ${#foo} = 98304.

I guess that I have at least 32768 bytes (32768=131072-98304) worth of stuff sitting around in the argument list buffer. That seems a little odd...

(/me opens a new shell)

Code:

$ set | wc -c
2023

Nope. Not quite sure about that one.

Code:

$ echo "$(set | wc -c) + $(ls * | wc -c)" | bc
3872

Still an order of magnitude off. Dunno.
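One shortcut that might help here (a sketch, assuming GNU findutils): GNU xargs can print the argument-length limits it sees, including how much of that space the current exported environment already consumes, which avoids adding the pieces up by hand.

Code:

# Report ARG_MAX-related limits and the size of the current environment.
xargs --show-limits < /dev/null
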

bartonski 11-22-2009 02:40 PM

The comments in limits.h are somewhat revealing:

Code:

#define ARG_MAX      131072    /* # bytes of args + environ for exec() */
I wonder how you find out what the size of the exec() environment is, when executing a bash script. It obviously contains exported shell variables, but it must contain more than that.

catkin 11-23-2009 02:59 AM

Quote:

Originally Posted by bartonski (Post 3766000)
The comments in limits.h are somewhat revealing:

Code:

#define ARG_MAX      131072    /* # bytes of args + environ for exec() */
I wonder how you find out what the size of the exec() environment is, when executing a bash script. It obviously contains exported shell variables, but it must contain more than that.

The environ(3) man page confirms what you found in limits.h: "The number of bytes available for the new process' combined argument and environment lists is {ARG_MAX}".

This is generic to *n*x processes rather than specific to shell scripts. Netsearching did not turn up a good description, but IIRC Stevens' UNIX Systems Programming described how each process has kernel-space memory and process-space memory. The kernel-space memory includes ARG_MAX space for the data passed on the exec* family of system calls -- executable (path) name, arguments and environment variables. In the specific case of bash calling a bash script, one of the *exec*e calls must be used or the envars would be lost.

When you write "I wonder how you find out what the size of the exec() environment is ...": it should (TM!) be a NULL-terminated array of char*, each pointing to a null-terminated "name=value" string (or equivalent). Netsearching indicated that it is implementation-dependent whether those pointers themselves count against the ARG_MAX space, so the scheme above gives an upper limit on the space taken out of ARG_MAX for envar storage.
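A rough way to put a number on the environment's share from within bash (a sketch, assuming GNU env for the -0 option): sum the length of each "NAME=value" string plus its terminating NUL; if your system also charges the char* pointer array against ARG_MAX, add roughly one pointer's worth per entry on top.

Code:

# Approximate how many bytes the exported environment occupies in the
# exec() argument/environment space.
total=0 count=0
while IFS= read -r -d '' entry; do        # one "NAME=value" string per iteration
    total=$(( total + ${#entry} + 1 ))    # +1 for the terminating NUL byte
    count=$(( count + 1 ))
done < <(env -0)
echo "$count environment strings, about $total bytes (plus ~8 bytes per pointer)"
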

Angus 11-23-2009 08:15 AM

Quote:

Originally Posted by bartonski (Post 3765260)
Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly.

This is why script programming really gets my goat. If someone were to shoot all the interpreters out there, then I could just confine myself to C++.

catkin 11-23-2009 10:34 AM

Quote:

Originally Posted by Angus (Post 3766689)
This is why script programming really gets my goat. If someone were to shoot all the interpreters out there, then I could just confine myself to C++.

And have exactly the same issue! The line you quote (Note that this buffer containing the argument list is also shared with all of your shell variables) is not exactly correct; "all of your shell variables" should read "all of your environment variables", and it is not specific to shell scripts -- it applies to the *exec*e system calls that any executable must use to run another program, including language interpreters and compiled C++.

bartonski 11-23-2009 10:38 PM

Quote:

Originally Posted by Angus (Post 3766689)
This is why script programming really gets my goat. If someone were to shoot all the interpreters out there, then I could just confine myself to C++.

It's really a non-issue in the shell. You should be using xargs anywhere that you might risk getting an 'argument list too long' error anyway.

i92guboj 11-23-2009 11:44 PM

Quote:

Originally Posted by bartonski (Post 3767521)
It's really a non-issue in the shell. You should be using xargs anywhere that you might risk getting an 'argument list too long' error anyway.

That's the truth of it. It's part of shell programming, just like if you use C++ you are bound to classes, and if you do Lisp you are bound to lists, and so on. :)

If there's a potential problem, you should be avoiding it in the first place. Just like in C you need to care about where your pointers are going, and you need to check whether your malloc succeeded, don't you? ;)

Sure, we could have a shell with an unlimited environment and dynamic memory handling, but that is beside the point. The truth is that if someone wants bash to look like C++, then s/he should be using C++ in the first place, because at some point someone might also think of turning C++ into a bash clone mwhaha :jawa:

Shells have always been this way. There are probably billions of lines of shell code around the world, and those scripts would be much bigger in most other programming languages, even the higher-level ones.

Shell languages are the way they are for a reason: they make a lot of assumptions to make scripting much easier, and the limited environment is one of those assumptions. That's the shell way. You either use xargs, save the list to a file, or process it in a loop.
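For completeness, minimal sketches of the save-to-a-file and loop approaches (gzip is just a stand-in command here, and the temporary file name is made up):

Code:

# 1. Process the names one at a time in a loop; nothing large is ever
#    passed through exec() in a single call.
find . -maxdepth 1 -type f -print0 |
while IFS= read -r -d '' f; do
    gzip "$f"
done

# 2. Save the list to a file first and loop over it later (assumes no
#    newlines in file names).
find . -maxdepth 1 -type f > /tmp/list.txt
while IFS= read -r f; do
    gzip "$f"
done < /tmp/list.txt
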

By the way, this is neither new nor specific to Linux. I remember the "out of environment space" errors in DOS (any version) perfectly well, with both command.com and 4dos.

The only shell I can think of right now that allocates environment space dynamically is the OS/2 one. That doesn't mean there aren't others around that can do it...

bartonski 11-24-2009 06:56 AM

Quote:

Originally Posted by i92guboj (Post 3767548)
The only shell that I remember right now that allocates environment space dynamically is the os/2 one, as far as I can remember. That doesn't mean there aren't more around that can do it...

The Hurd? (/me ducks)

bartonski 11-24-2009 07:22 AM

Quote:

Originally Posted by i92guboj (Post 3767548)
Shell languages are as they are for a reason, they make a lot of assumptions to make the scripting a lot easier, and the limited environment is one of these assumptions.

on a more serious note, this is almost undoubtedly the right design decision. One of the fundamental principles of Unix is that spawning processes is computationally inexpensive.

Eric Raymond puts it best in The Art of Unix Programming

Quote:

Cooperating Processes


In the Unix experience, inexpensive process-spawning and easy inter-process communication (IPC) makes a whole ecology of small tools, pipes, and filters possible. We'll explore this ecology in Chapter 7; here, we need to point out some consequences of expensive process-spawning and IPC.
The pipe was technically trivial, but profound in its effect. However, it would not have been trivial without the fundamental unifying notion of the process as an autonomous unit of computation, with process control being programmable. As in Multics, a shell was just another process; process control did not come from God inscribed in JCL.
-- Doug McIlroy
If an operating system makes spawning new processes expensive and/or process control is difficult and inflexible, you'll usually see all of the following consequences:
  • Monster monoliths become a more natural way of programming.
  • Lots of policy has to be expressed within those monoliths. This encourages C++ and elaborately layered internal code organization, rather than C and relatively flat internal hierarchies.
  • When processes can't avoid a need to communicate, they do so through mechanisms that are either clumsy, inefficient, and insecure (such as temporary files) or by knowing far too much about each others' implementations.
  • Multithreading is extensively used for tasks that Unix would handle with multiple communicating lightweight processes.
  • Learning and using asynchronous I/O is a must.

These are examples of common stylistic traits (even in applications programming) being driven by a limitation in the OS environment.
A subtle but important property of pipes and the other classic Unix IPC methods is that they require communication between programs to be held down to a level of simplicity that encourages separation of function. Conversely, the result of having no equivalent of the pipe is that programs can only be designed to cooperate by building in full knowledge of each others' internals.

Angus 11-24-2009 08:31 AM

Quote:

Originally Posted by catkin (Post 3766840)
And have exactly the same issue!

It's been a long time since I've had that problem. Usually when I call a function wrong, the compiler lets me know right away. I used to get behavior I didn't expect when improperly using the *printf() family, but gcc has features that make that much safer.

