Linux - General
This Linux forum is for general Linux questions and discussion. If it is Linux related and doesn't seem to fit in any other forum, then this is the place.
I was writing a bash script that builds a very long list of file names to pass to a program. Unfortunately, I kept getting "Argument list too long". The last time I ran ./configure, I was told the command line could be 2^31 characters long. Yet I would often get this message with only 1800 parameters... sometimes. Other times I could fit 2000. I came to the conclusion that the sheer number of characters on the command line was the problem. So I thought that maybe if I put a backslash and a newline after every 100 file names, that would work. It didn't.
So if a command-line can only have a certain number of characters, how do I find out what this number is? Can I pluck this out of an environment variable? Or is there another reason I'd be getting this?
It occurs to me that the call I was making was to tar (before I discovered --files-from) and the message was:
Quote:
/bin/tar: Argument list too long
suggesting that the message came from tar and not from bash. Is it possible that this is only a tar problem, and not a general bash problem?
It's a kernel limit, named ARG_MAX and defined in limits.h. You can query it with getconf:
Code:
c:~$ getconf ARG_MAX
2097152
EDIT: The message appears to come from tar, but it is actually produced before tar ever runs: the exec* system call the shell uses to launch tar fails, and the kernel returns the E2BIG error ("Argument list too long"). The shell then reports the error against the command it was trying to run, which makes it look like a tar error.
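A quick way to check both numbers on a given system (getconf is POSIX; --show-limits is specific to GNU xargs, so treat that part as a sketch):

```shell
# The kernel's hard limit on args + environment for exec():
getconf ARG_MAX

# GNU xargs can show the limit it will actually use, after
# subtracting the space already taken by environment variables:
xargs --show-limits </dev/null
```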
Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly. I learned this the hard way: I have a number of shell functions set up which load data into shell variables, and one of them has a tendency to write a lot of data to one of my variables in certain situations (this is an ... *ahem* undocumented feature). This left no space in the argument list. I had to run 'set | less' to find the offending variables, then clear them.
The answer to this problem is to use xargs:
in a directory which contains 9000 files, use
Code:
find . -maxdepth 1 -type f -print0 | xargs -0 gzip
rather than
Code:
gzip *
If you are doing something complicated on the command line, you may want to use a for loop instead of xargs.
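For example, a loop like the following never passes the whole list to a single exec; the glob expands inside the shell, and each gzip invocation gets one file name (a sketch -- slower than xargs, since it costs one fork/exec per file):

```shell
# Compress every regular file in the current directory, one
# gzip run per file -- immune to ARG_MAX no matter how many
# files there are:
for f in ./*; do
    [ -f "$f" ] && gzip "$f"
done
```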
Reading the man page of xargs, and truly grokking its contents, is one of those things that will make you understand Linux at a deeper level.
Quote:
Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly.
That is amazing, a bizarre design decision. Do you have a script to demonstrate it?
Code:
c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin
[ls output snipped]
c:~$ echo ${#foo}
229678
The reason that you're not getting the error here is that you're not calling 'ls' with any arguments. If you had called 'ls *' or even 'ls foo.txt', this would have failed.
Sorry -- bad copy-and-paste. Initially I used ls * and it worked as shown, so I changed to ls /usr/bin/* and it still worked, but the * got lost in editing. Just tried again, and here it is without any manual editing.
Code:
c:~$ foo=$(cat /dev/urandom | strings | head -40980); ls /usr/bin/*
[ls output snipped]
c:~$ echo ${#foo}
229575
c:~$ ls /usr/bin/* | wc -w
5078
but earlier you said
Code:
c:~$ getconf ARG_MAX
2097152
2097152 is about an order of magnitude larger than 229575. Looking at my code again:
Code:
$ foo=$(cat /dev/urandom | strings | head -40980); ls *
I'm not quite sure how I managed to get $foo over 2 meg in size. This means that the average string length would have been over 50 characters long. I guess that I just got lucky when I was filling $foo.
The way that I actually filled $foo was to run foo=$(cat /dev/urandom | strings | head -$x); ls *
I started with $x=20, and manually doubled $x until I got this to fail. I posted the first value of $x that failed for me. I'm quite certain there's a better way to fill $foo, but I was being lazy.
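For what it's worth, here is a more direct way to build a variable of an exact size, with no randomness involved (a sketch):

```shell
# 100000 NUL bytes from /dev/zero, translated into printable 'x'
# characters, give a variable of exactly the requested length:
foo=$(head -c 100000 /dev/zero | tr '\0' x)
echo ${#foo}    # 100000
```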
Ok, I figured it out: this only happens if foo is exported.
Here's the code I ran:
Code:
$ foo="xxx"; while [ ${#foo} -lt 4194305 ]; do echo -n "${#foo}: "; export foo="$foo$foo"; ls * | wc -l; done
3: 125
6: 125
12: 125
24: 125
48: 125
96: 125
192: 125
384: 125
768: 125
1536: 125
3072: 125
6144: 125
12288: 125
24576: 125
49152: 125
98304: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
196608: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
393216: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
786432: bash: /usr/bin/wc: Argument list too long
bash: /bin/ls: Argument list too long
1572864: bash: fork: Cannot allocate memory
I initialized foo to "xxx" because 2097152 is an even power of 2, and I wanted the values of ${#foo} to fall on either side of that limit rather than land exactly on it.
Actually, on my system,
Code:
$ getconf ARG_MAX
131072
which is why you see bash bailing out at ${#foo} = 98304.
I guess that I have at least 32768 bytes (32768=131072-98304) worth of stuff sitting around in the argument list buffer. That seems a little odd...
Code:
#define ARG_MAX 131072 /* # bytes of args + environ for exec() */
I wonder how you find out what the size of the exec() environment is, when executing a bash script. It obviously contains exported shell variables, but it must contain more than that.
The environ (3) man page confirms what you found in limits.h "The number of bytes available for the new process' combined argument and environment lists is {ARG_MAX}".
This is generic for *n*x processes rather than specifically for shell scripts. Netsearching did not turn up a good description but IIRC Stevens' UNIX Systems Programming described how each process has kernel-space memory and process-space memory. The kernel-space memory includes ARG_MAX space for data passed on the (v)exec* family of system calls -- executable (path) name, arguments and environmental variables. In the specific case of bash calling a bash script one of the *exec*e calls must be used or the envars would be lost.
When you write "I wonder how you find out what the size of the exec() environment is ..." it should (TM!) be a NULL-terminated array of char* pointers, each pointing to a null-terminated "NAME=value" string (or equivalent). Netsearching indicated that it is implementation-dependent whether all these pointers count against the ARG_MAX space or not, so the scheme I have suggested places an upper limit on the space taken out of ARG_MAX for envar storage.
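A rough way to see, from the shell, how much of the ARG_MAX budget the current environment is eating (a sketch: env prints one "NAME=value" line per entry, so the trailing newline stands in for the NUL terminator, and values that themselves contain newlines will throw the entry count off):

```shell
# Approximate bytes of the "NAME=value\0" strings in the environment:
env_bytes=$(env | wc -c)
# Number of entries -- each also costs one char* in the envp array:
env_entries=$(env | wc -l)
echo "environment: $env_entries entries, roughly $env_bytes bytes"
```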
Quote:
Note that this buffer containing the argument list is also shared with all of your shell variables... if you have a lot of variables set, or if one variable is very large, you may run into 'argument list too long' relatively quickly.
This is why script programming really gets my goat. If someone were to shoot all the interpreters out there, then I could just confine myself to C++.
And have exactly the same issue! The line you quote (Note that this buffer containing the argument list is also shared with all of your shell variables) is not exactly correct; "all of your shell variables" should read "all of your environment variables", and the limit is not specific to shell scripts -- it applies to the *exec*e system calls that must be used by any executable to create a new process, including language interpreters and compiled C++.
It's really a non-issue in the shell. You should be using xargs anywhere that you might risk getting an 'argument list too long' error anyway.
That's the only truth. It's a thing of shell programming, just like if you use C++ you are bound to classes, and if you do lisp you are bound to lists, and so on.
If there's a potential problem, you should be avoiding it in the first place. Just like when you are in C you need to care about where your pointers are going, or you need to check whether your malloc succeeded, don't you?
Sure, we could get a shell with an unlimited environment and dynamic memory handling, but this is beside the point. The truth is that if someone wants bash to look like C++, then s/he should be using C++ in the first place, because at some point someone might also think of turning C++ into a bash clone mwhaha
Shells have always been this way. There are probably billions of lines of shell code around the world, and those scripts would be much bigger in most other programming languages, even the higher-level ones.
Shell languages are as they are for a reason, they make a lot of assumptions to make the scripting a lot easier, and the limited environment is one of these assumptions. The shell way is that way. You either use xargs, or save the stuff to a file, or parse it on a loop.
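The save-it-to-a-file route is exactly what the original poster stumbled on with tar's --files-from. A sketch (assuming GNU tar, and a list path outside the directory being archived so the list doesn't end up in its own archive):

```shell
# Build the file list once, then let tar read the names from the
# list instead of from its argument vector:
find . -maxdepth 1 -type f > /tmp/filelist.txt
tar -czf /tmp/backup.tar.gz --files-from=/tmp/filelist.txt
```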
By the way, this is neither new nor specific to Linux. I remember perfectly well the "out of environment space" errors in DOS (any version), with both command.com and 4dos.
The only shell I can think of right now that allocates environment space dynamically is the OS/2 one. That doesn't mean there aren't more around that can do it...