LinuxQuestions.org


jeriryan 10-07-2015 10:21 AM

find: argument list too long
 
Hi all,
I've seen similar questions regarding argument list too long, but nothing exactly like my problem. My starting point is a large list of 'interesting' files, say in INTERESTINGFILES.txt. I need to search through this list a bunch of times for stuff like specific permissions, or certain extensions, and other stuff. On most systems it isn't a problem to do:

Code:

INTFILES=$(cat INTERESTINGFILES.txt)
# command substitution captures find's output in each variable
PERMS=$(find $INTFILES -perm -4000)
EXT=$(find $INTFILES -name "*.ext")

but on some systems where INTERESTINGFILES.txt is very large, the find commands fail with "argument list too long" because $INTFILES is too long. My initial fix for this is to wrap it in a loop that runs through INTERESTINGFILES.txt and does a find on each file:

Code:

# one find per listed path: avoids the size limit, but starts find once per file
for item in $INTFILES
do
    find "$item" -perm -4000 >> PERMS
done

and just work with the output of the PERMS file, but it's much slower using the loop than just doing it all in the find command (when it works). Another way I suppose would be to take INTERESTINGFILES.txt, do an ls on each file, stick that list in a new file, and do a grep whenever I need something, but that is definitely not as versatile as the find command (at least as far as checking permissions). Any suggestions other than the slow loop wrapper?
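One common alternative to the per-file loop is to let xargs batch the list: it packs as many paths into each find invocation as the limit allows, so find runs a handful of times rather than once per file. A minimal sketch, assuming GNU findutils (xargs -d and -r, find -maxdepth) and a hypothetical throwaway directory standing in for the real list:

```shell
#!/bin/sh
# Sketch: xargs splits a long list into batches that fit under the kernel's
# argument-size limit, so find runs a few times instead of once per file.
# Assumes GNU findutils; the demo files below are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Small stand-in for INTERESTINGFILES.txt (one path per line)
printf 'x\n' > "$tmp/suid_prog"
chmod 4755 "$tmp/suid_prog"
printf 'x\n' > "$tmp/notes.txt"
printf '%s\n' "$tmp/suid_prog" "$tmp/notes.txt" > "$tmp/INTERESTINGFILES.txt"

# -d '\n': one argument per line, so spaces in names survive.
# -r: run nothing if the list is empty (otherwise find would search ".").
# sh -c puts the batch of paths before find's expression via "$@".
# -maxdepth 0: test only the listed paths, never descend into directories.
PERMS=$(xargs -r -d '\n' sh -c 'find "$@" -maxdepth 0 -perm -4000' sh \
          < "$tmp/INTERESTINGFILES.txt")
echo "$PERMS"
```

The sh -c wrapper is needed because xargs appends its arguments at the end of the command line, while find wants the paths before the expression. Without -maxdepth 0, any directory in the list would be searched recursively a second time.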

szboardstretcher 10-07-2015 10:32 AM

What does an entry in the interestingfiles.txt file look like? And where are you getting them from? Because, as I see it, architecturally, whatever is making interestingfiles.txt should be using a script/code that checks each file as it is being added.

jeriryan 10-07-2015 10:41 AM

So I think someone wanted to save time later when looking for certain files by only doing one global find. interestingfiles.txt is the output of a big global find that looks like:

find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob..... >>INTERESTINGFILES.txt

So instead of later doing a global find for each type of item, he can just search for it in interestingfiles.txt, which is (usually) a much smaller subset and thus faster.

It's possible I could edit this original find command, but could you suggest a way that this original global find could be modified to sort files as they are found on the fly?
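GNU find can do that sorting during the single global pass: -fprint FILE writes each matching path to FILE, and the comma operator evaluates every clause for each file, so one path can land in several lists. A minimal sketch on a throwaway tree, assuming GNU find; the output names perms.txt and ext.txt are hypothetical:

```shell
#!/bin/sh
# Sketch: one pass over a tree, each match written to its own list as found.
# Assumes GNU find (-fprint and the "," operator); the demo tree and the
# output file names are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
mkdir "$tmp/tree" "$tmp/out"
printf 'x\n' > "$tmp/tree/suid_prog"
chmod 4755 "$tmp/tree/suid_prog"
printf 'x\n' > "$tmp/tree/notes.txt"

# "," evaluates both sides for every file regardless of the left result.
# -fprint creates (or truncates) its file when find starts, then writes one
# matching path per line. Because actions are present, nothing goes to stdout.
find "$tmp/tree" \
    \( -perm -4000 -fprint "$tmp/out/perms.txt" \) , \
    \( -name '*.txt' -fprint "$tmp/out/ext.txt" \)

cat "$tmp/out/perms.txt" "$tmp/out/ext.txt"
```

Applied to the original command, each -o condition would become its own parenthesized clause with its own -fprint file, joined by commas, e.g. one for -perm -4000, one for -name "*.txt", one for -user bob, and so on.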

szboardstretcher 10-07-2015 10:46 AM

So if that is the command you are using to create the file, then every file that is in it will fit the find description in your original question:

Code:

-perm -4000
-name "*.txt"

Because "find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob..... >>INTERESTINGFILES.txt" will only add files with permission 4000 and extension .txt to the INTERESTINGFILES.txt list.

Thought of another way: what is the difference between this:

Code:

find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob >> INTERESTINGFILES.txt
INTFILES=`cat INTERESTINGFILES.txt`
PERMS=`find $INTFILES -perm -4000`
EXT=`find $INTFILES -name "*.txt"`

and this?

Code:

find / -perm -4000 -o -name "*.txt" -o -group abc -o -user bob >> INTERESTINGFILES.txt
INTFILES=`cat INTERESTINGFILES.txt`
PERMS=`cat INTERESTINGFILES.txt`
EXT=`cat INTERESTINGFILES.txt`

I would hope I am misunderstanding your question. If I am, please provide a fake line from the INTERESTINGFILES.txt file, and explain what you want to do with it.

jeriryan 10-07-2015 11:03 AM

The find command has -o's, not -a's, so the files in INTERESTINGFILES.txt will match one of the requirements of the find, but not all of them. A line from INTERESTINGFILES.txt will just have a file name, the output of the global find command, matching one of the many requirements.

I now need to process this list of files and determine which of them have perm 4000, or are owned by a specific user, etc., and put them in their own variables. The best way I know to do that is with another find command, this time searching through my smaller list instead of the whole file system, e.g.:
PERMS=$(find $INTFILES -perm -4000)
BOB=$(find $INTFILES -user bob)
but sometimes $INTFILES is too large, and I get "argument list too long" with the find.
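The same kind of expression can classify the existing list in one batched pass: xargs keeps each find under the argument limit, and -printf tags every match with its category, so the tagged output can be split into variables afterwards. A minimal sketch, assuming GNU findutils; the PERM/USER/EXT tags are hypothetical, and the current user's numeric ID stands in for bob so the demo runs anywhere:

```shell
#!/bin/sh
# Sketch: classify every listed file in one batched pass, tagging each match,
# then split the tagged output into variables.
# Assumes GNU findutils (xargs -d/-r, find -maxdepth/-printf and ",").
# Tags and demo files are hypothetical; $(id -u) stands in for "bob".
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x\n' > "$tmp/suid_prog"; chmod 4755 "$tmp/suid_prog"
printf 'x\n' > "$tmp/notes.txt"
printf '%s\n' "$tmp/suid_prog" "$tmp/notes.txt" > "$tmp/list.txt"
owner=$(id -u)   # find -user accepts a numeric user ID

# xargs appends the paths after "$owner"; the inner script shifts the owner
# off and hands the remaining paths to find. Output from several batches
# simply concatenates on stdout.
xargs -r -d '\n' sh -c '
  u=$1; shift
  find "$@" -maxdepth 0 \
    \( -perm -4000 -printf "PERM %p\n" \) , \
    \( -user "$u" -printf "USER %p\n" \) , \
    \( -name "*.txt" -printf "EXT %p\n" \)
' sh "$owner" < "$tmp/list.txt" > "$tmp/tagged.txt"

# Strip the tags to get plain path lists (names containing newlines not handled)
PERMS=$(sed -n 's/^PERM //p' "$tmp/tagged.txt")
BOB=$(sed -n 's/^USER //p' "$tmp/tagged.txt")
EXT=$(sed -n 's/^EXT //p' "$tmp/tagged.txt")
```

-printf is used here rather than -fprint because -fprint truncates its file every time find starts, and xargs may start find several times on a long list.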

In other words, from the start someone was trying to save time by pre-gathering all the interesting files we would need to examine later. If he hadn't done this, we would just be doing:
PERMS=$(find / -perm -4000)
BOB=$(find / -user bob)
EXT=$(find / -name "*.txt")
but that is a wasteful number of global finds, and on this hardware each of them takes a very long time to complete.

jpollard 10-07-2015 11:40 AM

Very likely this would be faster using Perl or Python.

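The parameter list to a process is limited in size: /usr/include/linux/limits.h defines ARG_MAX as 131072 bytes, and that budget covers the environment as well as the arguments. (Kernels since 2.6.23 actually allow more, a quarter of the stack size limit, and getconf ARG_MAX reports the real value.) At 131072 bytes, depending on the file name lengths, that could be as few as 100 names, or as many as around 10,000.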
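To see where a given system stands, getconf ARG_MAX prints the limit (GNU xargs --show-limits < /dev/null gives a similar breakdown). The sketch below builds a hypothetical 20,000-line list and compares its rough size with that figure. It counts one byte per name for the terminating NUL and ignores the environment and per-pointer overhead, so treat the result as an estimate:

```shell
#!/bin/sh
# Sketch: rough check of whether an expanded list would fit on one command line.
# The 20000 paths are hypothetical; the environment and pointer overhead
# are ignored, so this is only an estimate.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

i=0
while [ "$i" -lt 20000 ]; do
  printf '/some/fairly/deep/directory/tree/file_%05d.txt\n' "$i"
  i=$((i + 1))
done > "$tmp/INTERESTINGFILES.txt"

limit=$(getconf ARG_MAX)
# Each newline stands in for the NUL byte that ends every argument
need=$(wc -c < "$tmp/INTERESTINGFILES.txt")
echo "limit=$limit bytes, list needs about $need bytes"
if [ "$need" -ge "$limit" ]; then
  echo "too long for a single find command"
fi
```

Each line here is 48 bytes, so the list needs about 960,000 bytes: well over the 131072 figure from the header, but typically under the 2 MB that getconf reports with an 8 MB stack limit.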

The reason Perl or Python would be faster is that there are no internal limits to the size of the array (and no reason to use an array anyway). A simple loop in either language can do a stat of the file, and check whatever else you want - without the fork/exec overhead of doing the equivalent using find.
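The single-process idea can also be sketched in plain sh, to stay with the thread's language: one loop reads the list and uses shell builtins ([ -u ] for the setuid bit, case for the extension), so nothing is forked per file. Checks such as the owner's name have no builtin test and would need stat(1), which brings back one fork per file; that is where Perl or Python pays off. The demo paths are hypothetical:

```shell
#!/bin/sh
# Sketch: read the list once and classify each path with shell builtins only.
# "[ -u ]" (setuid bit) and "case" pattern matching run inside the shell,
# so no process is forked per file. Owner checks would need stat(1) and are
# left out. Demo paths are hypothetical.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf 'x\n' > "$tmp/suid_prog"; chmod 4755 "$tmp/suid_prog"
printf 'x\n' > "$tmp/my notes.txt"
printf '%s\n' "$tmp/suid_prog" "$tmp/my notes.txt" > "$tmp/list.txt"

: > "$tmp/perms.out"
: > "$tmp/ext.out"
# IFS= and -r keep leading blanks and backslashes; one path per line
while IFS= read -r f; do
  if [ -u "$f" ]; then
    printf '%s\n' "$f" >> "$tmp/perms.out"
  fi
  case $f in
    *.txt) printf '%s\n' "$f" >> "$tmp/ext.out" ;;
  esac
done < "$tmp/list.txt"
```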

szboardstretcher 10-07-2015 11:53 AM

Agreed. IMO this would be MUCH MUCH better as a Python script, PHP script, or C program (if you swing that way) than as a shell script.

grail 10-07-2015 12:50 PM

I was wondering if we could back up just a little to the original information provided.

1. Error being received :- find: argument list too long

2. Information from OP :- I've seen similar questions regarding argument list too long, but nothing exactly like my problem.

3. Further information provided :- on some systems where INTERESTINGFILES.txt is very large


Now I am curious: exactly what searching did you do that didn't turn up anything on passing a very large amount of data to find, when that is essentially what the error message says?

I can clearly see the reason for asking for an alternative to the loop you are using (which, by the way, should not be a for loop unless you can guarantee there is no whitespace in any path or file name), but the original premise seems highly unlikely.
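A small demonstration of the whitespace problem, using a hypothetical file name: an unquoted $INTFILES is split on any whitespace (and is also subject to globbing), while a while IFS= read -r loop splits on newlines only.

```shell
#!/bin/sh
# Sketch: why "for item in $INTFILES" breaks on names containing spaces,
# compared with a line-oriented read loop. The file name is hypothetical
# and does not need to exist.
set -e
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT
printf '%s\n' "/home/bob/my notes.txt" > "$tmp/list.txt"

INTFILES=$(cat "$tmp/list.txt")
for_count=0
for item in $INTFILES; do        # unquoted expansion splits on the space too
  for_count=$((for_count + 1))   # sees "/home/bob/my" and "notes.txt"
done

read_count=0
while IFS= read -r item; do      # splits on newlines only
  read_count=$((read_count + 1))
done < "$tmp/list.txt"

echo "for loop: $for_count words, while read: $read_count line"
```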


All times are GMT -5.