find: argument list too long
Hi all,
I've seen similar questions regarding argument list too long, but nothing exactly like my problem. My starting point is a large list of 'interesting' files, say in INTERESTINGFILES.txt. I need to search through this list a bunch of times for things like specific permissions, certain extensions, and so on. On most systems it isn't a problem to do:
Code:
INTFILES=`cat INTERESTINGFILES.txt`
Code:
for item in $INTFILES
|
What does an entry in the interestingfiles.txt file look like? And where are you getting them from? As I see it, architecturally, whatever is generating interestingfiles.txt should be using a script or code that checks each file as it is being added.
|
So I think someone wanted to save time later when looking for certain files by only doing one global find. interestingfiles.txt is a big global find that looks like:
find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob..... >>INTERESTINGFILES.txt
So instead of later doing a global find for each type of item, he can just search for it in interestingfiles.txt, which is (usually) a much smaller subset and thus faster. It's possible I could edit this original find command, but could you suggest a way that this original global find could be modified to sort files as they are found on the fly?
|
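One possible way to sort the files on the fly is to let a single find pass write each category to its own file. The following is only a sketch, assuming GNU find (the -fprint action and the "," operator are GNU extensions, not POSIX). The function name classify_in_one_pass and the output file names perms.txt, owner.txt and txt.txt are made up for illustration, and only three of the original tests are shown.

```shell
#!/bin/sh
# Sketch only: needs GNU find (-fprint and the "," operator are not POSIX).
# classify_in_one_pass ROOT OWNER OUTDIR
#   ROOT   directory to walk (the thread uses /)
#   OWNER  user name or numeric uid to match (the thread uses bob)
#   OUTDIR existing directory for the result files; keep it outside ROOT,
#          otherwise the result files show up in their own listings
classify_in_one_pass() {
    root=$1
    owner=$2
    outdir=$3
    # "," evaluates both sides for every file, so each file is tested
    # against every category and can land in more than one list.
    find "$root" \
        \( -perm -4000 -fprint "$outdir/perms.txt" \) , \
        \( -user "$owner" -fprint "$outdir/owner.txt" \) , \
        \( -name '*.txt' -fprint "$outdir/txt.txt" \)
}
```

Called as `classify_in_one_pass / bob /var/tmp/lists`, this would walk the file system once and leave one list per category, so no later pass over a combined list is needed.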
So if that is the command you are using to create the file, then every file that is in it will fit the find description in your original question:
Code:
-perm -4000
Thought of another way: what is the difference between this:
Code:
find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob >> INTERESTINGFILES.txt
Code:
find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob >> INTERESTINGFILES.txt
|
The find command has -o's, not -a's, so the files in INTERESTINGFILES.txt will match one of the requirements of the find, but not all of them. A line from INTERESTINGFILES.txt will just have a file name, the output of the global find command, matching one of the many requirements.
I now need to process this list of files and determine which of them have perm 4000, or are owned by a specific user, etc., and put them in their own variables. The best way I know to do that is with another find command, this time searching through my smaller list instead of the whole file system, e.g.:
PERMS=$(find $INTFILES -perm -4000)
BOB=$(find $INTFILES -user bob)
but sometimes $INTFILES is too large, and I get "argument list too long" with the find. In other words, from the start someone was trying to save time by pre-gathering all the interesting files we would need to examine later. If he hadn't done this, we would just be doing:
PERMS=$(find / -perm -4000)
BOB=$(find / -user bob)
EXT=$(find / -name "*.txt*")
but this is a wasteful number of global finds, and on this hardware each of them takes a very long time to complete.
|
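One way around the error while still using find on the list is to stream the list through xargs, which splits it into batches that each fit within the limit. This is a minimal sketch, assuming one path per line (so no newlines inside file names), absolute paths as produced by find /, and an xargs that supports -0 (GNU and BSD do; it is not POSIX). The name filter_list is hypothetical.

```shell
#!/bin/sh
# filter_list LISTFILE FIND_TEST...
# Prints the entries of LISTFILE (one path per line) that satisfy the
# given find tests, e.g. -perm -4000 or -user bob.
filter_list() {
    list=$1
    shift
    # xargs appends each batch of paths AFTER the fixed words, but find
    # wants paths first, so the inner sh rotates the fixed words to the end.
    tr '\n' '\0' < "$list" |
        xargs -0 sh -c '
            n=$1; shift                     # n = count of fixed words
            [ "$#" -gt "$n" ] || exit 0     # no paths in this batch
            while [ "$n" -gt 0 ]; do
                set -- "$@" "$1"; shift     # move one fixed word to the end
                n=$((n - 1))
            done
            # -prune stops find from descending into listed directories
            exec find "$@"
        ' sh "$(($# + 1))" -prune "$@"
}
```

With this, `PERMS=$(filter_list INTERESTINGFILES.txt -perm -4000)` and `BOB=$(filter_list INTERESTINGFILES.txt -user bob)` would correspond to the two assignments above, without ever expanding the whole list onto one command line.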
Very likely this would be faster using Perl or Python.
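The parameter list to a process is limited: ARG_MAX is defined as 131072 bytes in /usr/include/linux/limits.h, although newer kernels derive the effective limit from the stack size limit (getconf ARG_MAX reports the current value), and the environment counts against the same limit. So depending on the file name lengths, that could be as few as 100, or as many as around 10,000. The reason Perl or Python would be faster is that the list is read from the file rather than passed as arguments, so the array is limited only by available memory (and there is no reason to use an array anyway). A simple loop in either language can do a stat of the file, and check whatever else you want - without the fork/exec overhead of doing the equivalent using find.
|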
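To see whether a given list can be passed on a single command line at all, its size can be compared with the limit the system reports. This is a rough sketch, assuming getconf is available; the name fits_in_one_exec is made up. The estimate is optimistic, because the environment and the other words of the command share the same budget.

```shell
#!/bin/sh
# fits_in_one_exec LISTFILE
# Succeeds (exit status 0) if the list's byte count is below ARG_MAX.
# Optimistic: the environment and the command's own words also count
# against the limit, so a list close to the limit can still fail.
fits_in_one_exec() {
    limit=$(getconf ARG_MAX)
    # $(( )) strips the padding some wc implementations add
    size=$(($(wc -c < "$1")))
    [ "$size" -lt "$limit" ]
}
```

For example, `fits_in_one_exec INTERESTINGFILES.txt || echo "too big for one find call"` would flag the systems on which the list has to be processed in batches or line by line.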
Agreed. IMO this would be MUCH MUCH better as a Python script, PHP script, or a C program (if you swing that way) than a shell script.
|
I was wondering if we could back up just a little to the original information provided.
1. Error being received :- find: argument list too long
2. Information from OP :- I've seen similar questions regarding argument list too long, but nothing exactly like my problem.
3. Further information provided :- on some systems where INTERESTINGFILES.txt is very large
Now I am curious, exactly what searching did you do that didn't cover passing a very large amount of data to find (essentially the exact error message) to get said error message?
I can clearly see the reason for asking for an alternative to the loop you are using, which by the way should not be a for loop unless you can guarantee no whitespace in any path / file name, but the original premise seems highly unlikely.
|
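A whitespace-safe alternative to the for loop reads the list line by line, so nothing is word-split and nothing is passed as one huge argument list. This is a minimal sketch, assuming one path per line; the name setuid_from_list is hypothetical. It uses the shell's own -u test for the set-user-ID bit in place of find -perm -4000; unlike find, that test follows symbolic links.

```shell
#!/bin/sh
# setuid_from_list LISTFILE
# Prints every path in LISTFILE (one per line) whose set-user-ID bit is set.
setuid_from_list() {
    # IFS= keeps leading/trailing blanks, -r keeps backslashes literal
    while IFS= read -r path; do
        # -u: file exists and has its set-user-ID bit set
        if [ -u "$path" ]; then
            printf '%s\n' "$path"
        fi
    done < "$1"
}
```

Usage would be along the lines of `PERMS=$(setuid_from_list INTERESTINGFILES.txt)`; further checks, such as a `case` match on the extension, can be added inside the same loop so the list is read only once.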