LinuxQuestions.org
Old 02-14-2007, 12:05 AM   #1
oxi
LQ Newbie
 
Registered: Oct 2006
Location: Milky Way
Distribution: Gentoo
Posts: 9

Rep: Reputation: 0
Bash scripting problem: Can't get a list of all files, including hidden ones


Hi,

I'm making a bash script to search for a word recursively using grep.

Problem is, I can't find a reliable way to get a full listing of the files in the current dir.

Cases are:


Code:
for $file in `ls -a`
...
This gets me a list of all files, including hidden ones. Problem is, they are separated by spaces, so if a filename has spaces in it, I have no way to tell the pieces of that filename apart from other filenames.


Code:
for $file in `ls -am`
...
Nope, sorry, but this one isn't good enough for me. It gives me a list of filenames separated by a comma and a space. I could then massage that into something usable by "for". Problem is, you can create a file called "blah, blah" this way: "$ touch blah\,\ blah". So I just can't use it. Now, if it used another character instead of a comma...


Code:
for $file in ./*
...
This gives me a list of the files, and $file holds the right value on each pass through the loop, but I don't get the hidden files.


Code:
for $file in ./.*
...
With this I get only the hidden files.


So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or if there's a way to expand "./.*", then "./*", and then join them into one list. Or if I can make ls use a separator other than ", ".

I've been googling for a looooot of time and couldn't find a solution to my problem, that's why I'm asking here.

I really appreciate your help and thanks in advance pals.

Cheers
 
Old 02-14-2007, 01:30 AM   #2
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
Quote:
I'm making a bash script to search for a word recursively using grep.
Try using

Code:
find -exec grep fnord {} /dev/null \;
Find takes care of traversing the directory structure, and makes no distinction between hidden and visible files. The '{}' in the -exec part of the expression stands for the current file.

Normally grep, when called with a single file argument, will just echo the matching line of text.

Adding /dev/null as a second file means that grep will echo the file name along with the matching text.

Note that under GNU grep

grep -r

will also read through directories recursively.
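
For anyone who wants to try both suggestions side by side, here is a minimal sketch that runs them in a scratch directory. The directory layout and file names (visible.txt, .hidden.txt, other.txt) are made up for illustration, "-type f" is an addition that keeps find from handing directories to grep, and "grep -r" assumes GNU grep or another grep that supports -r.

```bash
#!/bin/bash
# Scratch directory with a visible file and a hidden file in a
# subdirectory whose name contains a space (all names are illustrative).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sub dir"
echo "the fnord is here" > "$demo_dir/visible.txt"
echo "another fnord"     > "$demo_dir/sub dir/.hidden.txt"
echo "nothing to see"    > "$demo_dir/other.txt"

cd "$demo_dir" || exit 1

# Approach 1: find hands each regular file to grep. The extra /dev/null
# argument makes grep see two files, so it prefixes each match with the name.
find_output=$(find . -type f -exec grep fnord {} /dev/null \; | sort)

# Approach 2: let grep walk the tree itself (assumes a grep that supports -r).
grep_output=$(grep -r fnord . | sort)

printf '%s\n' "$find_output"

cd / && rm -rf "$demo_dir"
```

Both commands should report the same two matches, hidden file included; the find version starts one grep process per file, so it tends to be slower on large trees.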
 
Old 02-14-2007, 01:47 AM   #3
bartonski
Member
 
Registered: Jul 2006
Location: Louisville, KY
Distribution: Fedora 12, Slackware, Debian, Ubuntu Karmic, FreeBSD 7.1
Posts: 443
Blog Entries: 1

Rep: Reputation: 47
Quote:
So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or, if there's way to make "./.*", then "./*" and the join them in one list.
try:

Code:
for file in ./* ./.*
 
Old 02-14-2007, 04:36 AM   #4
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by oxi
Problem is, they are separated by spaces, and if a filename has spaces in it, I have no way to make a difference between the chunks of that filename and others.
If the only problem is spaces inside file names, you can do something like
Code:
#!/bin/bash
# Temporarily turn spaces into underscores so each name survives word splitting
for file in `ls -a | sed 's/ /_/g'` ; do
   # No such file: the name originally had spaces, so restore them
   if [ ! -f "$file" ] ; then
      file=`echo "$file" | sed 's/_/ /g'`
   fi
   # my commands using "$file" here
done
This converts spaces in filenames to underscores ("_"), and when the for loop encounters a non-existent filename, the code inside the if condition converts them back. Note: this does not rename the files; it only manages the filenames internally. By the way, I agree with bartonski that the find -exec solution is better.
 
Old 02-14-2007, 05:43 AM   #5
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,775

Rep: Reputation: 481
IFS=","

for file in `ls -am`

Temporarily resetting IFS (the internal field separator) should let you parse the list more easily.
 
Old 02-14-2007, 07:51 AM   #6
mickyg
Member
 
Registered: Oct 2004
Location: UK
Distribution: Ubuntu/Kubuntu
Posts: 245

Rep: Reputation: 30
FYI, to get a list from ls you could have used 'ls -a1'.
 
Old 02-14-2007, 04:55 PM   #7
oxi
LQ Newbie
 
Registered: Oct 2006
Location: Milky Way
Distribution: Gentoo
Posts: 9

Original Poster
Rep: Reputation: 0
Thank you everybody

Wow, lots of replies. I really appreciate your effort. Thank you all

To the point:

Quote:
FYI, to get a list from ls you could have used 'ls -a1'.
This doesn't work for me. It won't handle spaces in filenames properly.

Quote:
IFS=","

for file in `ls -am`

Temporarily resetting IFS (the internal field separator) should let you parse the list more easily.
I'm not sure I understand this one very well. It actually worked for filenames with spaces, but it didn't for filenames with commas.

Now for the ones which did work:

colucix's solution did work; however, there's something I don't like about the renaming, though I can't explain what.

This one I named b1:
Code:
#!/bin/bash

#Parameter check omitted
WORD=$1

function search () # $1 = dir to search
{
        for file in `ls -a "$1" 2> /dev/null | sed s/\ /_/g 2> /dev/null`
        do
                file="$1/$file"
                if [ "$file" = "$1/." -o "$file" = "$1/.." -o "$file" = "$1/*" ]; then
                        continue
                fi

                if [ ! -e "$file" ] ; then
                        file=`echo "$file" 2> /dev/null | sed s/_/\ /g 2> /dev/null`
                fi

                if [ -d "$file" ]; then
                        search "$file"
                        continue
                fi
                if [ -f "$file" ]; then
                        grep -- "$WORD" > /dev/null 2> /dev/null < "$file"
                        if [ $? -eq 0 ]; then
                                echo "$file" 2> /dev/null
                                continue
                        fi
                        continue
                fi
        done
}

search "$(pwd)"
This is an implementation of bartonski's second reply, which I named b0:
Code:
#!/bin/bash

#Parameter check omitted
WORD=$1

function search () # $1 = dir to search
{
        # Quote "$1" so directories with spaces in their names are not split
        for file in "$1"/.* "$1"/*
        do
                # Skip . and .., and the literal pattern left by an empty dir
                if [ "$file" = "$1/." -o "$file" = "$1/.." -o "$file" = "$1/*" ]; then
                        continue
                fi
                if [ -d "$file" ]; then
                        search "$file"
                        continue
                fi
                if [ -f "$file" ]; then
                        grep -- "$WORD" > /dev/null 2> /dev/null < "$file"
                        if [ $? -eq 0 ]; then
                                echo "$file" 2> /dev/null
                                continue
                        fi
                        continue
                fi
        done
}

search "$(pwd)"
As for the "find" thing, I couldn't come up with the proper syntax to filter out the garbage and get just the filenames. I tried "find -exec grep cool {} /dev/null \;" as suggested, then "find -exec grep cool /dev/null \;", and then lots of combinations. Problem is, I think "find" won't let you put more than one command after a single "-exec" (at least I wasn't able to escape it), and if you use two "-exec"s, I don't know how to redirect the command streams. IMHO it's all a little messy.

Now, to be honest, using "grep -r" seems to me the best and fastest way to accomplish this. My point was to use as few commands as possible (I don't mean the number of lines, but rather the total number of "external-to-bash" commands), and this way you use very few commands indeed. I like the second reply's solution though, where ./.* and ./* are used, because it's more "mechanical".

Also, b0 is slightly faster than b1. Here:

Code:
oxi@oxibox /etc $ time b0 cool
/etc/config-archive/etc/cvsd/cvsd.conf
/etc/config-archive/etc/cvsd/cvsd.conf.dist
/etc/config-archive/etc/gimp/2.0/gimprc
/etc/config-archive/etc/gimp/2.0/gimprc.dist
/etc/config-archive/etc/sensors.conf
/etc/config-archive/etc/sensors.conf.dist
/etc/cvsd/cvsd.conf
/etc/gimp/2.0/gimprc
/etc/mime.types
/etc/pcmcia/wireless
/etc/sensors.conf

real    0m21.178s
user    0m15.999s
sys     0m3.226s
oxi@oxibox /etc $ time b1 cool
/etc/config-archive/etc/cvsd/cvsd.conf
/etc/config-archive/etc/cvsd/cvsd.conf.dist
/etc/config-archive/etc/gimp/2.0/gimprc
/etc/config-archive/etc/gimp/2.0/gimprc.dist
/etc/config-archive/etc/sensors.conf
/etc/config-archive/etc/sensors.conf.dist
/etc/cvsd/cvsd.conf
/etc/gimp/2.0/gimprc
/etc/mime.types
/etc/pcmcia/wireless
/etc/sensors.conf

real    0m24.457s
user    0m17.484s
sys     0m4.936s
oxi@oxibox /etc $ time grep -r cool . 2> /dev/null | cut -d: -f1
./cvsd/cvsd.conf
./gimp/2.0/gimprc
./config-archive/etc/cvsd/cvsd.conf
./config-archive/etc/cvsd/cvsd.conf.dist
./config-archive/etc/gimp/2.0/gimprc
./config-archive/etc/gimp/2.0/gimprc.dist
./config-archive/etc/sensors.conf.dist
./config-archive/etc/sensors.conf.dist
./config-archive/etc/sensors.conf
./config-archive/etc/sensors.conf
./pcmcia/wireless
./sensors.conf
./sensors.conf
./mime.types

real    0m14.714s
user    0m12.908s
sys     0m0.593s
The only problem I see with "grep -r" is that it repeats a filename when it finds more than one occurrence in that file, but I guess that could be filtered somehow. Nevertheless, I think I'll stick with this one for my personal needs.

Again, thank you all for your replies!

Cheers!
 
Old 02-14-2007, 05:57 PM   #8
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1957
Quote:
Originally Posted by oxi
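The only problem I see with "grep -r" is that it repeats a filename when it finds more than one occurrence in that file, but I guess that could be filtered somehow.
Just a note on this one: the command to filter out repeated instances of the same line is uniq. It only collapses adjacent duplicates, which is enough here because grep prints all the matches from one file together. Cheers.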
 
Old 02-14-2007, 07:51 PM   #9
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 719

Rep: Reputation: 72
Hi.

This may be useful to get just the filename and get it only once ... cheers, makyo
Quote:
-l, --files-with-matches
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match.
-- excerpt from man grep
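
A small sketch of the difference, in case it helps. The two .conf files are made-up test data, and "-r" again assumes a grep that supports recursion.

```bash
#!/bin/bash
# Scratch file containing the search word on two separate lines (illustrative data).
demo_dir=$(mktemp -d)
printf 'cool line one\nplain line\ncool line two\n' > "$demo_dir/twice.conf"
printf 'nothing here\n' > "$demo_dir/none.conf"

cd "$demo_dir" || exit 1

# Without -l: one output line per matching line, so the name appears twice.
per_line=$(grep -r cool . | cut -d: -f1)

# With -l: each matching file is named once, and grep stops reading it
# after the first match.
per_file=$(grep -rl cool .)

printf 'per line:\n%s\nper file:\n%s\n' "$per_line" "$per_file"

cd / && rm -rf "$demo_dir"
```

With -l there is nothing left to filter, so neither cut nor uniq is needed.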
 
Old 02-15-2007, 03:16 PM   #10
Quigi
Member
 
Registered: Mar 2003
Location: Cambridge, MA, USA
Distribution: Ubuntu (Dapper and Heron)
Posts: 377

Rep: Reputation: 31
Quote:
Originally Posted by oxi
The only problem I see with "grep -r" is that it repeats a filename when it finds more than one occurrence in that file, but I guess that could be filtered somehow.
What exactly do you want? I see you're throwing away the output of grep where it gives you the matching line. I think you want a list of all files containing a match, so use "grep -lr". Originally you just told us
Quote:
I'm making a bash script to search for a word recursively using grep.
I think grep does all you need, and you're not making a bash script

Regarding woes with spaces in file names:
Do NOT put spaces in file names. Your life will be more pleasant. (I know there's "C:\Program Files\" ... oops, wrong OS ). Anyway,

Quote:
colucix's solution did work; however, there's something I don't like about the renaming, though I can't explain what.
Maybe that it handles file names containing space and comma but not underscore? (nor newline)

Anyway, if you decided not to "grep -rl", I'd use find.
Code:
find -exec grep fnord {} /dev/null \;
That forks off a grep process for every file (and directory; probably not your intent), which can be slow.

To take care of space/comma/underscore/newlines in file names, use NUL (ASCII 0) as delimiter. (There are two characters that cannot occur in a file name: NUL and slash (/). The other 254 are possible.)

So:
Code:
find directory -type f -print0 | xargs -0 grep -l word_to_search_for
The action -print0 tells find to print NUL after each file name. The option -0 (that's digit zero) tells xargs that its input is NUL separated. It invokes grep with many arguments. Read your man pages (grep, find, xargs).
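
If the matching names are needed back inside a bash script rather than on the terminal, the same NUL-delimited idea can be sketched with a read loop. This is a variant under stated assumptions, not the pipeline above: the test files and the word "needle" are made up, "read -d ''" and the "< <(...)" process substitution are bash features, and unlike xargs this version starts one grep per file.

```bash
#!/bin/bash
# Scratch tree with awkward names: a space, a comma and an embedded newline.
demo_dir=$(mktemp -d)
echo "needle" > "$demo_dir/with space.txt"
echo "needle" > "$demo_dir/with,comma.txt"
echo "needle" > "$demo_dir/"$'new\nline.txt'
echo "hay"    > "$demo_dir/no match.txt"

# Read NUL-terminated paths from find; IFS= and -r keep each name intact.
matches=()
while IFS= read -r -d '' path; do
    if grep -q needle "$path"; then
        matches+=("$path")
    fi
done < <(find "$demo_dir" -type f -print0)

echo "files containing the word: ${#matches[@]}"

rm -rf "$demo_dir"
```

Each array element is one complete path, whatever characters the name contains, so "${matches[@]}" can be passed on to other commands safely.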
 
Old 02-17-2007, 08:48 AM   #11
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 231
I second both the use of grep -l & "don't put spaces in file names". I would go 1 step further & say, "change the spaces to underscores in the filenames you have":
Code:
# SpaceOut 	archtoad6 	Feb. 2007
# Rename every file whose name contains a space, replacing spaces with underscores
for F in *' '*
do
  mv "$F" "$(echo "$F" | tr ' ' _)"
done
or as a 1-liner:
Code:
for F in *' '*;do mv "$F" "$(echo "$F"|tr ' ' _)";done
I 1ce made the 2nd ver. the name of an empty file & put it in a directory of ripped songs. Then anytime I wanted to clean up that dir., I would (in an X term.) double-click on the "filename", middle click, & press the "Any" key.


Exploring "Spaced Out" Filenames
For grins, I ran the following:
Code:
locate \  |wc -l
The result? 44604 !!!
Next I filtered out the files salvaged from an old "Winders" drive: 1671, nowhere near as bad. Then I filtered out a dir. containing mostly stuff from clueless "Winders" sites & software: 202, getting better. Last I ran:
Code:
locate \  |grep -v "$CLUELESS\|\.html\|\.htm\|\.jpeg\|\.jpg" |wc -l
: 91, still not good.

I ran
Code:
locate \  |grep -v "$CLUELESS\|\.html\|\.htm\|\.jpeg\|\.jpg" |less -SN
& I was aghast at the # of folks, folks who should know better -- including WINE & Konqueror & KMail & MEPIS & Kubuntu -- that contributed to the list.
 
Old 02-18-2007, 10:23 PM   #12
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31
Quote:
Originally Posted by oxi
Hi,

I'm making a bash script to search for a word recursively using grep.

Problem is, I don't find a reliable way to get a full listing of files for the current dir.

Cases are:


Code:
for $file in `ls -a`
...

for $file in `ls -am`
...
Use wildcards, not ls, to generate a list of files.

Quote:
Code:
for $file in ./*
...

This gives me a list of the files, which is correctly assigned each loop for $file, that is, $file value is what it is supposed to be each loop, but I don't get the hidden files


Code:
for $file in ./.*
...
With this I get only the hidden files.


So, I was wondering if I could use any kind of regular expression or glob to get both hidden and unhidden filenames. Or, if there's way to make "./.*", then "./*" and the join them in one list. Or if I can make ls use other separator than a ", ".

I've been googling for a looooot of time and couldn't find a solution to my problem, that's why i'm asking here.
In all of your examples, you should have "for file ...", not "for $file ...".

Either:

for file in ./* ./.*

Or:

shopt -s dotglob
for file in ./*
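
A minimal sketch of the dotglob variant, run in a scratch directory. The four file names are made up, and nullglob is an addition that is not part of the suggestion above; it only keeps an empty directory from yielding a literal "./*".

```bash
#!/bin/bash
# Scratch directory with visible and hidden names, some containing spaces.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/with space.txt" \
      "$demo_dir/.hidden" "$demo_dir/.with space.hidden"

cd "$demo_dir" || exit 1

# dotglob: * also matches names starting with a dot (but never . or ..).
# nullglob: a pattern with no matches expands to nothing.
shopt -s dotglob nullglob
files=( ./* )
shopt -u dotglob nullglob

for file in "${files[@]}"; do
    printf 'File found: %s\n' "$file"
done

cd / && rm -rf "$demo_dir"
```

Because the shell builds the list itself, each name arrives as a single word; no ls, sed or IFS juggling is involved.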
 
Old 02-18-2007, 10:56 PM   #13
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369
This kind of question has been answered a few times before. I know, because I've written a number of them

Try this:

contents of simple_script.bash:
Code:
#!/bin/bash

old_ifs=${IFS}
IFS=$'
'

for filename in $( ls -1A ) ; do
  echo "File found: ${filename}"
done

IFS=${old_ifs}
Please, please, please realize that YES the second/matching single quote to the original IFS assignment is supposed to be on the next line. There should be nothing after the first quote of that same assignment (no space, no tab--nothing other than a newline).

at the command prompt:
Code:
$ chmod u+x simple_script.bash
$ touch nospace.txt
$ touch with\ space.txt
$ touch .nospace.hidden
$ touch .with\ space.hidden
$ touch with\,comma.txt
$ touch .with\,comma.hidden
$ ./simple_script.bash
File found: .nospace.hidden
File found: nospace.txt
File found: simple_script.bash
File found: with,comma.txt
File found: .with,comma.hidden
File found: .with space.hidden
File found: with space.txt
Also realize that if you plan to feed these filenames to another program (such as grep), you need to quote them--just as though you would type them manually. Otherwise, grep will not process the with,comma.txt file properly (nor the filenames with spaces). My suggestion would be to quote them like this:

Code:
for filename in $( ls -1A ) ; do
  ...
  grep -l "some text" "${filename}"
  ...
done
And if you aren't familiar with the IFS variable, then read the bash man page (man bash).
 
Old 02-21-2007, 01:13 AM   #14
cfaj
Member
 
Registered: Dec 2003
Location: Toronto, Canada
Distribution: Mint, Mandriva
Posts: 221

Rep: Reputation: 31
Quote:
Originally Posted by Dark_Helmet
This kind of question has been answered a few times before. I know, because I've written a number of them

Try this:

contents of simple_script.bash:
Code:
#!/bin/bash

old_ifs=${IFS}
IFS=$'
'
In bash, you can do:

Code:
IFS=$'\n'
Quote:
Code:
for filename in $( ls -1A ) ; do
 echo "File found: ${filename}"
done
If you do that, quoting $filename will only prevent filename expansion on any wildcards in the name; you have already split pathological filenames into words by using ls instead of a wildcard.

Quote:
Code:
IFS=${old_ifs}
...

Also realize that if you plan to feed these filenames to another program (such as grep), you need to quote them--just as though you would type them manually. Otherwise, grep will not process the with,comma.txt file properly (nor the filenames with spaces).
There would not be a problem with with,comma.txt (unless IFS contains one of the characters in the name), but there would with names containing spaces or wildcard characters or other characters special to the shell (e.g., & and |).

Yes, a variable containing a filename should always be quoted, but that is too late to prevent the word splitting that will already have occurred by using for filename in $( ls -lA ).

Quote:
My suggestion would be to quote them like this:

Code:
for filename in $( ls -1A ) ; do
  ...
  grep -l "some text" "${filename}"
  ...
done
And if you aren't familiar with the IFS variable, then read the bash man page (man bash).
 
Old 02-22-2007, 10:56 PM   #15
Dark_Helmet
Senior Member
 
Registered: Jan 2003
Posts: 2,786

Rep: Reputation: 369
Quote:
Originally Posted by cfaj
In bash, you can do:
Code:
IFS=$'\n'
I certainly agree. I prefer the other because it's one fewer keystroke and the fact that you press Enter/Return reinforces the idea that word-splitting occurs only on newlines.

Quote:
Originally Posted by cfaj
If you do that, quoting $filename will only prevent filename expansion on any wildcards in the name; you have already split pathological filenames into words by using ls instead of a wildcard.
I'm afraid I don't understand what you're saying here. By having the shell split on newlines, the filename variable will contain the as-displayed-by-ls filename (because the shell already did the wildcard expansion for the '*' before executing the ls command). Unless the filename contains a double-quote, then quoting it will guarantee its accuracy--whether it contains spaces, wildcards, or none of the above. Because the original issue was about spaces, the filename must be quoted to indicate to grep (or any other utility used on the file) that the filename is a single argument. For instance:
Code:
filename="my test file.txt"
...
grep ${filename}
# The above evaluates to: grep my test file.txt
# Giving three separate arguments to grep: my, test, and file.txt

grep "${filename}"
# The above evaluates to: grep "my test file.txt"
# Which gives a single, correct argument to grep
# I'm not at my box, but it may require this instead:

grep "\"${filename}\""
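
The three forms above can be checked without grep at all. In this sketch count_args is a hypothetical helper that only reports how many arguments it received; the filename is the one from the example.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it reports how many arguments it received.
count_args() { echo "$#"; }

filename="my test file.txt"

unquoted=$(count_args $filename)      # word splitting: my / test / file.txt
quoted=$(count_args "$filename")      # one argument, spaces preserved

# Escaped inner quotes still give one argument, but the quote characters
# become part of the name, so it no longer refers to the same file.
literal_quotes=$(printf '%s' "\"${filename}\"")

echo "unquoted: $unquoted, quoted: $quoted, with escaped quotes: $literal_quotes"
```

So plain double quotes around the variable are enough; adding escaped quotes inside them changes the name that the command receives.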
Quote:
Originally Posted by cfaj
There would not be a problem with with,comma.txt (unless IFS contains one of the characters in the name), but there would with names containing spaces or wildcard characters or other characters special to the shell (e.g., & and |).

Yes, a variable containing a filename should always be quoted, but that is too late to prevent the word splitting that will already have occurred by using for filename in $( ls -lA ).
Again, I'm not sure what you're getting at here. IFS does not contain one of the characters in the filename--only a single newline. If the filename has a newline in it (if that's even possible--haven't tried), then the OP is S.O.L. The last statement given in the sample script
Code:
IFS=${old_ifs}
has no bearing on anything already accomplished. It simply restores the IFS variable to pre-modification in case the OP wants to continue with some other task. It returns the shell to default operation in regards to word-splitting and was not intended to imply that it would have any effect on the "for filename" loop.

And just to be thorough, I used a 1 (one) in the ls command--not an l (el). Makes a difference in the output
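
On the open question of newlines in filenames: they are possible, and a quick sketch shows what happens to each approach. The file names are made up, and the ls half assumes ls writes names verbatim when its output is not a terminal, as GNU ls does.

```bash
#!/bin/bash
# Scratch directory: one ordinary name and one name with an embedded newline.
demo_dir=$(mktemp -d)
touch "$demo_dir/plain.txt" "$demo_dir/"$'two\nlines.txt'

cd "$demo_dir" || exit 1

# Approach from the thread: split the output of ls on newlines only.
old_ifs=$IFS
IFS=$'\n'
ls_words=( $(ls -1A) )
IFS=$old_ifs

# Glob approach: the shell builds the list itself, one word per file.
shopt -s dotglob nullglob
glob_words=( * )
shopt -u dotglob nullglob

echo "ls produced ${#ls_words[@]} words, the glob produced ${#glob_words[@]}"

cd / && rm -rf "$demo_dir"
```

There are only two files, so any count other than two means a name was broken apart before the loop body ever saw it, and quoting inside the loop cannot repair that.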

Last edited by Dark_Helmet; 02-22-2007 at 11:35 PM.
 
  

