Bash scripting problem: Can't get a list of all files, including hidden ones

oxi · 02-13-2007, 11:05 PM

Hi,

I'm making a bash script to search for a word recursively using grep.

Problem is, I don't find a reliable way to get a full listing of files for the current dir.

Cases are:

Code:

for $file in `ls -a`
...

This will get me a list with all files, including hidden ones. Problem is, they are separated by spaces, and if a filename has spaces in it, I have no way to tell the chunks of that filename apart from the others.

Code:

for $file in `ls -am`
...

Nope, sorry but this one isn't good enough for me. This will get me a list of filenames separated by a comma and a space each. Then I'd make fixes to make it usable for "for". Problem is, you can create a filename called "blah, blah" this way: "$ touch blah\,\ blah". So I just can't use it. Now, if it used another character instead of a comma...

Code:

for $file in ./*
...

This gives me a list of the files, and $file gets the correct value on each iteration, but I don't get the hidden files.

Code:

for $file in ./.*
...

With this I get only the hidden files.

So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or, if there's a way to expand "./.*", then "./*", and then join them in one list. Or if I can make ls use a separator other than ", ".

I've been googling for a looooot of time and couldn't find a solution to my problem, that's why I'm asking here.

I really appreciate your help and thanks in advance pals.

Cheers

bartonski · 02-14-2007, 12:30 AM

Quote:

I'm making a bash script to search for a word recursively using grep.

Try using

Code:

find -exec grep fnord {} /dev/null \;

find takes care of traversing the directory structure and treats hidden files like any others. The '{}' in the -exec part of the expression stands for the current file.

Normally grep, when called with a single file argument, just prints the matching line of text.

Adding /dev/null as a second file argument makes grep print the file name along with the matching text.

Note that under GNU grep

grep -r

will also read through directories recursively.
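
To see what the extra /dev/null argument changes, here is a small sketch. The scratch directory and the file name notes.txt are made up for illustration only.

```shell
# Scratch directory with one sample file (names are illustrative only).
demo=$(mktemp -d)
printf 'hail fnord\n' > "$demo/notes.txt"

# One file argument: grep prints only the matching line.
one_file=$(grep fnord "$demo/notes.txt")

# Two file arguments: grep prefixes each match with the file name.
two_files=$(grep fnord "$demo/notes.txt" /dev/null)

echo "$one_file"     # hail fnord
echo "$two_files"    # <scratch dir>/notes.txt:hail fnord

rm -rf "$demo"
```

Nothing ever matches in /dev/null, so it only serves to make grep believe it was given more than one file.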

bartonski · 02-14-2007, 12:47 AM

Quote:

So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or, if there's a way to expand "./.*", then "./*", and then join them in one list.

try:

Code:

for $file in ./* ./.*
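
A fuller sketch of that loop, under a few assumptions: the scratch directory and file names are invented for the demo, the loop variable is written as file (no $ in the for line), and the two extra checks are additions to handle the . and .. entries and patterns that match nothing.

```shell
# Scratch directory with a visible, a hidden and a spaced name (illustrative).
demo=$(mktemp -d)
touch "$demo/plain.txt" "$demo/.hidden" "$demo/with space.txt"

count=0
for file in "$demo"/* "$demo"/.*; do
    # Depending on the shell version, .* can also match . and ..; skip them.
    case ${file##*/} in
        .|..) continue ;;
    esac
    # A pattern that matched nothing is left unexpanded; skip that too.
    [ -e "$file" ] || continue
    printf 'found: %s\n' "${file##*/}"
    count=$((count + 1))
done

rm -rf "$demo"
```

Because the names come from the glob and not from command output, the name with a space stays in one piece.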

colucix · 02-14-2007, 03:36 AM

Quote:

Originally Posted by oxi

Problem is, they are separated by spaces, and if a filename has spaces in it, I have no way to tell the chunks of that filename apart from the others.

If the problem is only space inside file names, you can do something like

Code:

#!/bin/bash
for file in `ls -a | sed s/\ /_/g` ; do
   if [ ! -f $file ] ; then
      file=`echo $file | sed s/_/\ /g` 
   fi
   # my commands using $file here
done

This converts spaces inside filenames to underscores ("_"); when the for loop encounters a name that does not exist as a regular file, the code inside the if condition converts the underscores back. Note: this does not rename any file, it only manipulates the names internally. By the way, I think, as bartonski said, that the find -exec solution is better.

gnashley · 02-14-2007, 04:43 AM

IFS=","

for $file in `ls -am`

Resetting the internal field separator temporarily should let you parse more easily.

mickyg · 02-14-2007, 06:51 AM

FYI, to get a list from ls you could have used 'ls -a1'.

oxi · 02-14-2007, 03:55 PM

Wow, lots of replies. I really appreciate your effort. Thank you all

To the point:

Quote:

FYI, to get a list from ls you could have used 'ls -a1'.

This doesn't work for me. It won't handle spaces in filenames properly.

Quote:

IFS=","

for $file in `ls -am`

Resetting the internal field separator temporarily should let you parse more easily.

I'm not sure I understand this one very well. It actually worked for filenames with spaces, but it didn't for filenames with commas.

Now for the ones which did work:

colucix's solution did work; however, there's something I don't like about the renaming, though I can't explain what.

This one I named b1:

Code:

#!/bin/bash

#Parameter check omitted
WORD=$1

function search () # $1 = dir to search
{
        for file in `ls -a "$1" 2> /dev/null | sed s/\ /_/g 2> /dev/null`
        do
                file="$1/$file"
                if [ "$file" = "$1/." -o "$file" = "$1/.." -o "$file" = "$1/*" ]; then
                        continue
                fi

                if [ ! -e "$file" ] ; then
                        file=`echo "$file" 2> /dev/null | sed s/_/\ /g 2> /dev/null`
                fi

                if [ -d "$file" ]; then
                        search "$file"
                        continue
                fi
                if [ -f "$file" ]; then
                        grep "$WORD" > /dev/null 2> /dev/null < "$file"
                        if [ $? -eq 0 ]; then
                                echo "$file" 2> /dev/null
                                continue
                        fi
                        continue
                fi
        done
}

search "$(pwd)"

This is an implementation for bartonski's second reply, which I named b0:

Code:

#!/bin/bash

#Parameter check omitted
WORD=$1

function search () # $1 = dir to search
{
        for file in "$1"/.* "$1"/*
        do
                if [ "$file" = "$1/." -o "$file" = "$1/.." -o "$file" = "$1/*" ]; then
                        continue
                fi
                if [ -d "$file" ]; then
                        search "$file"
                        continue
                fi
                if [ -f "$file" ]; then
                        grep "$WORD" > /dev/null 2> /dev/null < "$file"
                        if [ $? -eq 0 ]; then
                                echo "$file" 2> /dev/null
                                continue
                        fi
                        continue
                fi
        done
}

search "$(pwd)"

As for the "find" thing, I couldn't come up with the proper syntax to filter out the garbage and get just the filenames. I tried "find -exec grep cool {} /dev/null \;" as suggested, then "find -exec grep cool /dev/null \;", and then lots of combinations. Problem is, I think "find" won't let you put more than one command after a single "-exec" (at least I wasn't able to escape it), and if you use two "-exec"s, I don't know how to redirect the command streams. IMHO it's all a little messy.

Now, to be honest, using "grep -r" seems to me the best and fastest way to accomplish this. My point was to use as few commands as possible (I don't mean number of lines, but rather the total number of "external-to-bash" commands), and this way you use very few commands indeed. I like the second reply's solution though, where ./.* and ./* are used, because it's more "mechanical".

Also, b0 is slightly faster than b1. Here:

Code:

oxi@oxibox /etc $ time b0 cool
/etc/config-archive/etc/cvsd/cvsd.conf
/etc/config-archive/etc/cvsd/cvsd.conf.dist
/etc/config-archive/etc/gimp/2.0/gimprc
/etc/config-archive/etc/gimp/2.0/gimprc.dist
/etc/config-archive/etc/sensors.conf
/etc/config-archive/etc/sensors.conf.dist
/etc/cvsd/cvsd.conf
/etc/gimp/2.0/gimprc
/etc/mime.types
/etc/pcmcia/wireless
/etc/sensors.conf

real    0m21.178s
user    0m15.999s
sys     0m3.226s
oxi@oxibox /etc $ time b1 cool
/etc/config-archive/etc/cvsd/cvsd.conf
/etc/config-archive/etc/cvsd/cvsd.conf.dist
/etc/config-archive/etc/gimp/2.0/gimprc
/etc/config-archive/etc/gimp/2.0/gimprc.dist
/etc/config-archive/etc/sensors.conf
/etc/config-archive/etc/sensors.conf.dist
/etc/cvsd/cvsd.conf
/etc/gimp/2.0/gimprc
/etc/mime.types
/etc/pcmcia/wireless
/etc/sensors.conf

real    0m24.457s
user    0m17.484s
sys     0m4.936s
oxi@oxibox /etc $ time grep -r cool . 2> /dev/null | cut -d: -f1
./cvsd/cvsd.conf
./gimp/2.0/gimprc
./config-archive/etc/cvsd/cvsd.conf
./config-archive/etc/cvsd/cvsd.conf.dist
./config-archive/etc/gimp/2.0/gimprc
./config-archive/etc/gimp/2.0/gimprc.dist
./config-archive/etc/sensors.conf.dist
./config-archive/etc/sensors.conf.dist
./config-archive/etc/sensors.conf
./config-archive/etc/sensors.conf
./pcmcia/wireless
./sensors.conf
./sensors.conf
./mime.types

real    0m14.714s
user    0m12.908s
sys     0m0.593s

Only problem I see with "grep -r" is that it repeats filenames where it finds more than one occurrence, but I guess that could be filtered somehow. Nevertheless, I think I'll stick with this one for my personal needs.

Again, thank you all for your replies!

Cheers!

colucix · 02-14-2007, 04:57 PM

Quote:

Originally Posted by oxi

Only problem I see with "grep -r" is that it repeats filenames where it finds more than one occurrence, but I guess that could be filtered somehow.

Just a note on this one: the command to filter out multiple instances of the same line is uniq (it collapses adjacent duplicates, so sort the input first if repeats may not be adjacent). Cheers.

makyo · 02-14-2007, 06:51 PM

Hi.

This may be useful to get just the filename and get it only once ... cheers, makyo

Quote:

-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
-- excerpt from man grep
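
A short sketch of -l combined with -r, assuming a grep that supports -r (GNU grep does). The directory layout and the search word are invented for the demo.

```shell
# Scratch tree: one file with two matches, one without, one hidden in a subdir.
demo=$(mktemp -d)
printf 'cool\nvery cool\n' > "$demo/twice.txt"
printf 'nothing here\n'    > "$demo/none.txt"
mkdir "$demo/sub"
printf 'cool\n' > "$demo/sub/.hidden"

# -r recurses, -l prints each matching file name once; sort fixes the order.
matches=$(cd "$demo" && grep -rl cool . | LC_ALL=C sort)
echo "$matches"

rm -rf "$demo"
```

twice.txt is listed a single time even though it contains two matching lines, so no uniq or cut step is needed afterwards.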

Quigi · 02-15-2007, 02:16 PM

Quote:

Originally Posted by oxi

Only problem I see with "grep -r" is that it repeats filenames where it finds more than one occurrence, but I guess that could be filtered somehow.

What exactly do you want? I see you're throwing away the output of grep where it gives you the matching line. I think you want a list of all files containing a match, so use "grep -lr". Originally you just told us

Quote:

I'm making a bash script to search for a word recursively using grep.

I think grep does all you need, and you're not making a bash script

Regarding woes with spaces in file names:
Do NOT put spaces in file names. Your life will be more pleasant. (I know there's "C:\Program Files\" ... oops, wrong OS.) Anyway,

Quote:

colucix's solution did work; however, there's something I don't like about the renaming, though I can't explain what.

Maybe that it handles file names containing space and comma but not underscore? (nor newline)

Anyway, if you decided not to "grep -rl", I'd use find.

Code:

find -exec grep fnord {} /dev/null \;

That forks off a grep process for every file (and directory; probably not your intent), which can be slow.

To take care of space/comma/underscore/newlines in file names, use NUL (ASCII 0) as delimiter. (There are two characters that cannot occur in a file name: NUL and slash (/). The other 254 are possible.)

So:

Code:

find directory -type f -print0 | xargs -0 grep -l word_to_search_for

The action -print0 tells find to print NUL after each file name. The option -0 (that's digit zero) tells xargs that its input is NUL separated. It invokes grep with many arguments. Read your man pages (grep, find, xargs).
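
Here is a sketch of the NUL-delimited pipeline on some awkward names, assuming bash and a find/xargs pair that supports -print0 and -0. The file names and the word needle are placeholders. The while loop is an addition showing how the same stream can feed a bash loop when you need more than one command per file.

```shell
# Scratch directory with names containing a space, a comma and an underscore.
demo=$(mktemp -d)
printf 'needle\n' > "$demo/with space.txt"
printf 'needle\n' > "$demo/.hidden,comma"
printf 'hay\n'    > "$demo/plain_file"

# find emits NUL-terminated names and xargs -0 splits only on NUL,
# so none of those characters can break a name apart.
hits=$(find "$demo" -type f -print0 | xargs -0 grep -l needle | LC_ALL=C sort)
echo "$hits"

# The same stream can drive a loop, one whole name per iteration.
total=0
while IFS= read -r -d '' file; do
    total=$((total + 1))
done < <(find "$demo" -type f -print0)
echo "files seen: $total"

rm -rf "$demo"
```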

archtoad6 · 02-17-2007, 07:48 AM

I second both the use of grep -l & "don't put spaces in file names". I would go 1 step further & say, "change the spaces to underscores in the filenames you have":

Code:

# SpaceOut 	archtoad6 	Feb. 2007
for F in *' '*
do
  mv "$F" `echo $F | tr \  _`
done

or as a 1-liner:

Code:

for F in *' '*;do mv "$F" `echo $F|tr \  _`;done

I 1ce made the 2nd ver. the name of an empty file & put it in a directory of ripped songs. Then anytime I wanted to clean up that dir., I would (in an X term.) double-click on the "filename", middle click, & press the "Any" key.

Exploring "Spaced Out" Filenames
For grins, I ran the following:

Code:

locate \  |wc -l

The result? 44604 !!!
Next I filtered out the files salvaged from an old "Winders" drive: 1671, nowhere near as bad. Then I filtered out a dir. containing mostly stuff from clueless "Winders" sites & software: 202, getting better. Last I ran:

Code:

locate \  |grep -v "$CLUELESS\|\.html\|\.htm\|\.jpeg\|\.jpg" |wc -l

: 91, still not good.

I ran

Code:

locate \  |grep -v "$CLUELESS\|\.html\|\.htm\|\.jpeg\|\.jpg" |less -SN

& I was aghast at the # of folks, folks who should know better -- including WINE & Konqueror & KMail & MEPIS & Kubuntu -- that contributed to the list.

cfaj · 02-18-2007, 09:23 PM

Quote:

Originally Posted by oxi

Hi,

I'm making a bash script to search for a word recursively using grep.

Problem is, I don't find a reliable way to get a full listing of files for the current dir.

Cases are:

Code:

for $file in `ls -a`
...

for $file in `ls -am`
...

Use wildcards, not ls, to generate a list of files.

Code:

for $file in ./*
...

This gives me a list of the files, and $file gets the correct value on each iteration, but I don't get the hidden files.

Code:

for $file in ./.*
...

With this I get only the hidden files.

So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or, if there's a way to expand "./.*", then "./*", and then join them in one list. Or if I can make ls use a separator other than ", ".

I've been googling for a looooot of time and couldn't find a solution to my problem, that's why I'm asking here.

In all of your examples, you should have "for file ...", not "for $file ...".

Either:

for file in ./* ./.*

Or:

shopt -s dotglob
for file in ./*
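
A sketch of the dotglob variant in a scratch directory. The file names are invented, and nullglob is an extra assumption added here so that an empty directory yields no iterations at all.

```shell
# Scratch directory with one visible and two hidden names (illustrative).
demo=$(mktemp -d)
touch "$demo/visible" "$demo/.hidden" "$demo/.with space"

shopt -s dotglob nullglob   # * now matches dot files; no match expands to nothing
names=()
for file in "$demo"/*; do
    names+=("${file##*/}")
done
shopt -u dotglob nullglob   # switch both options off again afterwards

printf '%s\n' "${names[@]}"

rm -rf "$demo"
```

With dotglob set, * still does not match . or .., so no special-casing of those two entries is needed.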

Dark_Helmet · 02-18-2007, 09:56 PM

This kind of question has been answered a few times before. I know, because I've written a number of them

Try this:

contents of simple_script.bash:

Code:

#!/bin/bash

old_ifs=${IFS}
IFS=$'
'

for filename in $( ls -1A ) ; do
  echo "File found: ${filename}"
done

IFS=${old_ifs}

Please, please, please realize that YES the second/matching single quote to the original IFS assignment is supposed to be on the next line. There should be nothing after the first quote of that same assignment (no space, no tab--nothing other than a newline).

at the command prompt:

Code:

$ chmod u+x simple_script.bash
$ touch nospace.txt
$ touch with\ space.txt
$ touch .nospace.hidden
$ touch .with\ space.hidden
$ touch with\,comma.txt
$ touch .with\,comma.hidden
$ ./simple_script.bash
File found: .nospace.hidden
File found: nospace.txt
File found: simple_script.bash
File found: with,comma.txt
File found: .with,comma.hidden
File found: .with space.hidden
File found: with space.txt

Also realize that if you plan to feed these filenames to another program (such as grep), you need to quote them--just as though you would type them manually. Otherwise, grep will not process the with,comma.txt file properly (nor the filenames with spaces). My suggestion would be to quote them like this:

Code:

for filename in $( ls -1A ) ; do
  ...
  grep -l "some text" "${filename}"
  ...
done

And if you aren't familiar with the IFS variable, then read the bash man page (man bash).

cfaj · 02-21-2007, 12:13 AM

Quote:

Originally Posted by Dark_Helmet

This kind of question has been answered a few times before. I know, because I've written a number of them

Try this:

contents of simple_script.bash:

Code:

#!/bin/bash

old_ifs=${IFS}
IFS=$'
'

In bash, you can do:

Code:

IFS=$'\n'

Quote:

Code:

for filename in $( ls -1A ) ; do
 echo "File found: ${filename}"
done

If you do that, quoting $filename will only prevent filename expansion on any wildcards in the name; you have already split pathological filenames into words by using ls instead of a wildcard.

Quote:

Code:

IFS=${old_ifs}

...

Also realize that if you plan to feed these filenames to another program (such as grep), you need to quote them--just as though you would type them manually. Otherwise, grep will not process the with,comma.txt file properly (nor the filenames with spaces).

There would not be a problem with with,comma.txt (unless IFS contains one of the characters in the name), but there would with names containing spaces or wildcard characters or other characters special to the shell (e.g., & and |).

Yes, a variable containing a filename should always be quoted, but that is too late to prevent the word splitting that will already have occurred by using for filename in $( ls -lA ).
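
A rough sketch of where the splitting happens, assuming bash and an ls that writes names unmodified when its output is not a terminal. The two file names are made up; one contains a space and the other an embedded newline.

```shell
# Scratch directory with a spaced name and a name containing a newline.
demo=$(mktemp -d)
touch "$demo/two words.txt" "$demo/"$'line\nbreak.txt'

# Default IFS (space, tab, newline): both names are broken in two.
default_split=0
for f in $(ls -A "$demo"); do default_split=$((default_split + 1)); done

# IFS set to newline only: the spaced name survives, the newline name does not.
old_ifs=$IFS
IFS=$'\n'
newline_split=0
for f in $(ls -A "$demo"); do newline_split=$((newline_split + 1)); done
IFS=$old_ifs

# Glob results are never re-split: one iteration per file.
from_glob=0
for f in "$demo"/*; do from_glob=$((from_glob + 1)); done

echo "default IFS: $default_split, newline IFS: $newline_split, glob: $from_glob"

rm -rf "$demo"
```

Quoting "$f" inside any of these loops cannot undo the splitting, because the pieces are already separate words by the time the loop body runs.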

Quote:

My suggestion would be to quote them like this:

Code:

for filename in $( ls -1A ) ; do
  ...
  grep -l "some text" "${filename}"
  ...
done

And if you aren't familiar with the IFS variable, then read the bash man page (man bash).

Dark_Helmet · 02-22-2007, 09:56 PM

Quote:

Originally Posted by cfaj

In bash, you can do:

Code:

IFS=$'\n'

I certainly agree. I prefer the other because it's one fewer keystroke, and the fact that you press Enter/Return reinforces the idea that word-splitting occurs only on newlines.

Quote:

Originally Posted by cfaj

If you do that, quoting $filename will only prevent filename expansion on any wildcards in the name; you have already split pathological filenames into words by using ls instead of a wildcard.

I'm afraid I don't understand what you're saying here. By having the shell split on newlines, the filename variable will contain the as-displayed-by-ls filename (because the shell already did the wildcard expansion for the '*' before executing the ls command). Unless the filename contains a double-quote, then quoting it will guarantee its accuracy--whether it contains spaces, wildcards, or none of the above. Because the original issue was about spaces, the filename must be quoted to indicate to grep (or any other utility used on the file) that the filename is a single argument. For instance:

Code:

filename="my test file.txt"
...
grep ${filename}
# The above evaluates to: grep my test file.txt
# Giving three separate arguments to grep: my, test, and file.txt

grep "${filename}"
# The above evaluates to: grep "my test file.txt"
# Which gives a single, correct argument to grep
# I'm not at my box, but it may require this instead:

grep "\"${filename}\""
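
The three forms can be checked without grep. In this sketch count_args is a hypothetical helper written only for the demonstration; it reports how many arguments it received.

```shell
# Helper for illustration only: prints the number of arguments it was given.
count_args() { echo "$#"; }

filename="my test file.txt"

unquoted=$(count_args ${filename})        # split into my / test / file.txt
quoted=$(count_args "${filename}")        # one argument, spaces intact
escaped=$(count_args "\"${filename}\"")   # one argument, with extra " characters

echo "$unquoted $quoted $escaped"         # 3 1 1

# What the escaped form actually passes along:
literal=$(printf '%s' "\"${filename}\"")
echo "$literal"                           # "my test file.txt"
```

So plain "${filename}" is enough. The escaped form makes the double quotes part of the argument, which means grep would look for a file whose name itself starts and ends with a quote character.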

Quote:

Originally Posted by cfaj

There would not be a problem with with,comma.txt (unless IFS contains one of the characters in the name), but there would with names containing spaces or wildcard characters or other characters special to the shell (e.g., & and |).

Yes, a variable containing a filename should always be quoted, but that is too late to prevent the word splitting that will already have occurred by using for filename in $( ls -lA ).

Again, I'm not sure what you're getting at here. IFS does not contain one of the characters in the filename--only a single newline. If the filename has a newline in it (if that's even possible--haven't tried), then the OP is S.O.L. The last statement given in the sample script

Code:

IFS=${old_ifs}

has no bearing on anything already accomplished. It simply restores the IFS variable to pre-modification in case the OP wants to continue with some other task. It returns the shell to default operation in regards to word-splitting and was not intended to imply that it would have any effect on the "for filename" loop.
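
As an alternative to saving and restoring IFS by hand, the change can be confined to a function. This is only a sketch, assuming bash; list_entries and the found counter are hypothetical names, and the file names are invented.

```shell
found=0

# Hypothetical helper: lists a directory with newline-only word splitting.
list_entries() {
    local IFS=$'\n'      # the new value is visible only inside this function
    local entry
    for entry in $(ls -1A "$1"); do
        printf 'File found: %s\n' "$entry"
        found=$((found + 1))
    done
}

demo=$(mktemp -d)
touch "$demo/with space.txt" "$demo/.with,comma"

ifs_before=$IFS
list_entries "$demo"
ifs_after=$IFS           # same as ifs_before: nothing to restore

rm -rf "$demo"
```

This keeps the ls-based approach and its limits (a name containing a newline would still be split); it only removes the need for the old_ifs bookkeeping.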

And just to be thorough, I used a 1 (one) in the ls command--not an l (el). Makes a difference in the output