[SOLVED] Search for the file name inside the same file

koshy · 02-28-2013, 09:25 AM

I want to go through a directory recursively and search for the name of the specific file, i.e. the same file name, and record the matches and non-matches in another file. Is there a smart way to do it without opening each file manually and verifying? Please help.

koshy

porphyry5 · 02-28-2013, 12:05 PM

Are you looking for repeated occurrences of the same filename in different subdirectories, or for the occurrence of a specific filename as part of the text content of other files?

If the latter, something like

Code:

~ $ mkdir fred
~ $ cd fred
~/fred $ touch a.txt b.txt c.txt
~/fred $ printf 'a.txt\nb.txt\n' > d.txt
~/fred $ printf 'c.txt\nb.txt\n' > e.txt
~/fred $ grep -F a.txt *
d.txt:a.txt
~/fred $

If the former, something like

Code:

~/fred $ find . -type f > ofile.txt
~/fred $ grep -F '/a.txt' ofile.txt
./a.txt
~/fred $
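
For the "repeated filename" reading, a minimal sketch that lists every base name occurring more than once under a directory. The function name dup_names is made up for illustration, and the sketch assumes file names without embedded newlines.

```shell
# dup_names DIR (hypothetical helper): print each base name that occurs
# more than once among the regular files below DIR.
dup_names() {
    find "$1" -type f |   # one path per line
        sed 's|.*/||' |   # strip everything up to the last slash
        sort |            # uniq needs sorted input
        uniq -d           # keep only names that are repeated
}
```

Something like dup_names . run from the top of the tree would then print the repeated names, one per line.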

suicidaleggroll · 02-28-2013, 12:18 PM

I'm not really sure what you're asking, could you clarify?

You have a directory, and buried in subdirectories of this directory you have a bunch of text files. Now are you trying to search for a SINGLE file name in the contents of all of these text files, or are you trying to find which text files contain their OWN name in the contents?

Either way, you then want to create a new file with a list of which of those text files contained the name you were looking for and which didn't? How do you want this new file formatted?

Medievalist · 02-28-2013, 12:34 PM

I'm not sure I understand the question, but I'll give it a shot.

If you want to check each file in a folder hierarchy to see if it contains its own name, do this:

find /path -type f -exec grep -q {} {} \; -fprint matched.txt -o -type f -print >unmatched.txt

The find command looks recursively at everything under the starting path you give it. If you start at the root, you might get screwed if you have loops in your filesystem (for example in /sys or /proc, or if you abuse shootsnap.sh) so be careful to choose a sane starting path.

The remaining switches and options to find are processed left to right with implicit AND operators. Each one is evaluated for success or failure sequentially.

The -type f switch fails for links, directories, devices, etc. and succeeds for regular plain-jane files.

The -exec spawns a grep in quiet mode, which prints nothing and stops at the first match, so it is a quick way to test whether a file contains a pattern. Note that grep treats the pattern as a regular expression unless you add -F. Grep returns failure if the string is not found or an error occurs, otherwise it returns success.

The name of the file currently being looked at will replace each set of paired curly braces, and the slash-semicolon ends the grep command we told -exec to use.

The -fprint prints the name of the file currently being worked with, if the current status is success, into an output file. (GNU find truncates the output file if it already exists, so anything left in an old matched.txt is lost when you start.)

The -o stands for OR (remember how everything before this is considered to be joined by an AND?) so whatever follows it is evaluated only if something before it has failed. This is cool because it means the grep failing to find the string is going to trigger it, but the -type f failing will also trigger it when you're recursing through directories or links, so we need to do the -type f again if we only want regular files.

The -print prints the name of the file currently being worked with, if the current status is success (which it will be, if it's a regular file and the pattern wasn't matched) and we redirect the output to a file using normal shell I/O redirection.

This scales extremely well, but it handles loony file names poorly, so you should read the find and grep man pages if you have lunatics naming your files.
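
As a quick sanity check, here is a minimal sketch that runs the same expression against a throwaway tree. It assumes GNU find (for -fprint) and mktemp; the names demo, tree, yes.xml and no.xml are made up for the example.

```shell
# Throwaway demo tree; all names here are illustrative only.
demo=$(mktemp -d)
mkdir "$demo/tree"
# yes.xml contains its own full path, no.xml does not.
printf 'this file mentions %s\n' "$demo/tree/yes.xml" > "$demo/tree/yes.xml"
printf 'nothing relevant here\n' > "$demo/tree/no.xml"

# Same expression as above. The two reports are written outside the
# scanned tree so that find does not pick them up while it runs.
find "$demo/tree" -type f -exec grep -q {} {} \; -fprint "$demo/matched.txt" \
    -o -type f -print > "$demo/unmatched.txt"

cat "$demo/matched.txt"     # the path of yes.xml
cat "$demo/unmatched.txt"   # the path of no.xml
```

The directory itself fails both -type f tests, so it lands in neither report.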

koshy · 02-28-2013, 07:30 PM

I'm sorry, I forgot to mention that I wanted to check for the name of the opened file inside the file itself.

koshy · 02-28-2013, 07:35 PM

Dear suicidaleggroll
I had forgotten to mention that I wanted to look for the name of the file being checked inside the file itself. (My files are XML files.) A plain text file is OK for the output.

suicidaleggroll · 02-28-2013, 07:44 PM

In that case you should read Medievalist's post; it looks like a good solution.

sag47 · 02-28-2013, 08:52 PM

Here's a solution similar to Medievalist's, except that it uses the base name of the file in the grep rather than the full path of the file.

Code:

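find /path -type f | while IFS= read -r line;do if grep -qF -- "$(basename "$line")" "$line";then echo "$line";else echo "$line" >&2;fi;done 1> matched.txt 2> unmatched.txt

grep -q already stops reading a file at the first match, so there is no need to add -l for speed. The -F makes grep treat the name as a fixed string rather than a regular expression, and IFS= read -r keeps leading spaces and backslashes in the names intact. The unmatched names are written with >&2, which reuses the already redirected stderr instead of reopening (and truncating) unmatched.txt on every write.

Here's the above one-liner again in a more human-readable expanded format.

Code:

find /path -type f | while IFS= read -r line;do
  # search each file for its own base name, taken literally
  if grep -qF -- "$(basename "$line")" "$line";then
    echo "$line"
  else
    echo "$line" >&2
  fi
done 1> matched.txt 2> unmatched.txt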
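
If the file names may contain spaces, backslashes or leading dashes, a variant that hands the names to a small sh script through find's -exec ... {} + avoids the read loop altogether. This is only a sketch: report_self_named is a hypothetical helper name, and the three arguments (start directory, matched report, unmatched report) are an assumed convention. Names with embedded newlines would still garble the line-based reports.

```shell
# report_self_named DIR MATCHED UNMATCHED (hypothetical helper)
# Writes files that contain their own base name to MATCHED and all
# other regular files to UNMATCHED.
report_self_named() {
    find "$1" -type f -exec sh -c '
        for f do
            name=${f##*/}    # base name without calling basename
            # grep errors (e.g. unreadable files) are discarded so they
            # do not end up in the unmatched report
            if grep -qF -- "$name" "$f" 2>/dev/null; then
                printf "%s\n" "$f"
            else
                printf "%s\n" "$f" >&2
            fi
        done' sh {} + > "$2" 2> "$3"
}
```

It would be called as report_self_named /path matched.txt unmatched.txt. Any error messages from find itself also end up in the unmatched report, since both share stderr.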