LinuxQuestions.org

markraem 12-10-2007 05:49 AM

find a string in all ASCII files of a system
 
Hi,

I know that
find . -type f -exec grep -il 'string1' {} \;

will look for string1 in all regular files.
However, it will also look in binary files, which takes too much time.

I want the command to look in ASCII / text files only.

I find a lot of examples like
find . -name "*.txt" -exec grep ...

but this command restricts the search to *.txt files, and there can be other ASCII files that do not have the .txt extension and still contain the info I am looking for.


I tried to include the file command, as it tells me the type of a file, but I am struggling to integrate it into the find statement.

Can anybody help me?


The idea is to launch the entire query from / so that I can look for parameters in config files without knowing which config files are used. This helps a lot when exploring a Linux distro.

pixellany 12-10-2007 06:38 AM

You can integrate command#1 into command#2 if #1 produces exactly what #2 is looking for. In this case, I think you will need to actually write a small script.

BUT: config files are typically only in certain directories, e.g. /etc, $HOME, and a few others. Thus it seems more efficient to simply restrict the search to those directories.

jschiwal 12-10-2007 06:49 AM

I agree that you need a better idea where to look.

You could use the "file" command. It looks only at the beginning of a file. So you would pipe the output of "find" to "file" and then use sed to filter out the added info from each line found. Then you can use xargs to process that list of files. You will probably want to use -print0 in the find command and -0 in the xargs command. Also, because the list will be so long, use one of the "xargs" options (such as -n) to limit how much is processed at once.
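
A minimal sketch of that idea, assuming a "file" that accepts -b and --mime-type (the common Linux implementation does) and an xargs that accepts -0. Instead of stripping file's extra output with sed, it asks file for the MIME type only and keeps the names whose type starts with text/. The scratch directory and the file names in it are made up for the demonstration.

```shell
# Scratch tree for the demonstration (all names here are hypothetical).
demo=$(mktemp -d)
printf 'option = string1\n'    > "$demo/app.conf"
printf 'nothing here\n'        > "$demo/notes"
printf 'string1\000\001\002\n' > "$demo/blob.bin"   # NUL bytes: not a text file

# For each regular file, ask file(1) for the MIME type and print the name
# (NUL-terminated) only when the type is text/*, so grep sees text files only.
text_hits=$(
    find "$demo" -type f -exec sh -c '
        for f in "$@"; do
            case "$(file -b --mime-type -- "$f")" in
                text/*) printf "%s\0" "$f" ;;
            esac
        done' sh {} + |
    xargs -0 grep -il -- 'string1'
)
printf '%s\n' "$text_hits"
rm -rf "$demo"
```

Running file once per name is slow on a whole system, so this probably pays off only when the binary files being skipped are large.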

Besides having an idea where to look, there are some directories where you don't want to look such as /sys, /proc, /mnt/.
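
One way to skip such directories is find's -prune. The sketch below runs on a scratch tree rather than on /, so the etc, proc and sys directories inside it are stand-ins; on a real system the -path tests would name /proc, /sys and /mnt directly.

```shell
# Scratch tree standing in for a real root (hypothetical names).
root=$(mktemp -d)
mkdir -p "$root/etc" "$root/proc" "$root/sys"
printf 'param=string1\n' > "$root/etc/app.conf"
printf 'string1\n'       > "$root/proc/status"
printf 'string1\n'       > "$root/sys/info"

# -prune stops find from descending into the listed directories;
# every other regular file is printed NUL-terminated for xargs -0.
pruned_hits=$(
    find "$root" \( -path "$root/proc" -o -path "$root/sys" \) -prune \
         -o -type f -print0 |
    xargs -0 grep -l -- 'string1'
)
printf '%s\n' "$pruned_hits"
rm -rf "$root"
```

Only the file under etc is reported, because the two pruned directories are never entered.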

matthewg42 12-10-2007 07:15 AM

A few points:
  1. Linux does not enforce file name extensions as file-type identifiers. Searching for files ending in .txt will find some text files, but it will miss every text file that has a different extension or none at all.
  2. Using find's -exec option works fine, but it is very inefficient for large numbers of files. This is because, with the \; terminator, the specified command is invoked once per found file. If you have thousands of files, that means thousands of invocations of grep or whatever else you are running.
    Since grep can search through many files at once if they are listed one after the other on the command line, a better approach is to use xargs. xargs reads a list of strings from standard input and executes the given command for groups of those strings.
    Consider this example:
    Code:

    $ seq 1 10 |xargs -n 3 echo
    1 2 3
    4 5 6
    7 8 9
    10

    seq just prints the numbers 1 to 10. You can see that xargs is grouping them into threes and appending them as arguments to echo. You can do the same thing with find:
    Code:

    find / -type f |xargs grep -l "string1"
    However, there is a potential problem... If a file name contains a space, xargs will split it into two separate arguments (because the space character is an argument delimiter), so grep is handed two file names that do not exist. You can protect against this by asking find to delimit its output with an ASCII NUL using the -print0 option, and you can tell xargs to expect this using the -0 option. It makes the command a little longer but a lot more robust:
    Code:

    find / -type f -print0 |xargs -0 grep -l "string1"
  3. Linux's filesystem is arranged in a way which groups files by type. This provides a mechanism for avoiding large binary files which you don't want to search. For example, all user files should be in /home somewhere. This is where all your documents and photos and so on should reside. So if you want to search only your work, and not through the files which are installed as part of software packages, you can start your search here:
    Code:

    find /home -type f -print0 |xargs -0 grep -l "string1"
    This will save a lot of time. You might also want to avoid anything inside any directory named bin. There are several ways to do this. For example, you can use grep to remove any paths with /bin/ in them, which will stop the search from bothering with any user-installed programs (which are often placed in $HOME/bin). Remember we're using the NUL-delimited output from find. grep takes the option -z to understand this:
    Code:

    find /home -type f -print0 |grep -z -v /bin/ | xargs -0 grep -l "string1"
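
To see the difference the NUL delimiter makes, here is a small sketch on a scratch directory standing in for /home (the directory and file names are made up). It runs the plain pipeline and the -print0 / -0 pipeline side by side, with the /bin/ filter from the last command applied to the safe one. It assumes a grep that supports -z, as the command above does.

```shell
# Scratch tree (hypothetical names) with a file name containing a space
# and a bin directory that should be skipped.
home=$(mktemp -d)
mkdir -p "$home/docs" "$home/bin"
printf 'string1\n' > "$home/docs/my notes.txt"
printf 'string1\n' > "$home/bin/tool"

# Unsafe: xargs splits "my notes.txt" at the space, so grep is handed two
# names that do not exist and only bin/tool is found. Errors are discarded.
unsafe=$(find "$home" -type f | xargs grep -l -- 'string1' 2>/dev/null) || true

# Safe: NUL-delimited names survive intact; grep -z -v drops paths under /bin/.
safe=$(find "$home" -type f -print0 | grep -z -v '/bin/' |
       xargs -0 grep -l -- 'string1')

printf 'unsafe: %s\nsafe:   %s\n' "$unsafe" "$safe"
rm -rf "$home"
```

The unsafe run misses the file with the space in its name, while the safe run reports exactly that file and leaves out the one under bin.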

markraem 12-12-2007 03:41 AM

Thank you Matthewg42.

Your solution is indeed quick and is the one I was looking for.

colucix 12-12-2007 04:08 AM

You may also consider grep's -I option, which is equivalent to the long option
--binary-files=without-match. grep decides from a file's contents (typically its first bytes) whether the file is binary; if it is, grep treats it as containing no match and moves on.
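
A short sketch of -I in action, combined with -r so that grep walks the directory tree itself. The scratch directory and file names are hypothetical.

```shell
# Scratch tree (hypothetical names): one text file and one file with NUL bytes.
conf=$(mktemp -d)
printf 'server = string1\n'    > "$conf/service.conf"
printf 'string1\000\001\002\n' > "$conf/image.bin"

# -r recurses, -I treats binary files as non-matching, -l lists file names only.
grep_hits=$(grep -rIl -- 'string1' "$conf")
printf '%s\n' "$grep_hits"
rm -rf "$conf"
```

Both files contain the string, but only service.conf is listed because image.bin is classified as binary.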


All times are GMT -5.