LinuxQuestions.org


Damarr 05-20-2010 10:05 AM

How do I exclude multiple directories in awk with find?
 
Hello :)

I found a script on webmaster world that mostly does what I need it to, but have been making modifications to tailor it to my specific needs.

Here is the original command:

Code:

find . -type f | awk '!/\/\..*/ {dir=gensub(/(.+\/).+/,"\\1","g",$0); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",d,dir_list[d]}' | sort
I made it into a bash script that will monitor directories to let me know if they are getting too big.

Code:

#!/bin/bash
#dirwatch.sh
#set up the infinite loop
while true
do

#the important stuff
find . -mindepth 2 -type f | awk '!/\/\..*/ {dir=gensub(/(.+\/).+/,"\\1","g",$0); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",dir_list[d],d}'| sort -n

echo "";

#put the time/date stamp
date
echo ----------------;

#set how long to sleep between cycles
sleep 5
done

I know that /\/\..*/ tells awk to ignore hidden directories, but how do I define more directories to ignore (e.g. temp, var, etc)? I've tried playing with -prune before the awk command, with limited success. I know there are many ways to do the same thing, but I keep running into brick walls.

Thanks!

Damarr

colucix 05-20-2010 10:31 AM

You can combine multiple regexps with the logical AND, e.g.
Code:

awk '!/\/\..*/&&!/\/var/&&!/\/etc/{ blah blah blah }'
or you can exclude them from the search in the first instance:
Code:

find / \( -wholename /var -o -wholename /etc -o -wholename /tmp \) -prune -o -mindepth 2 -type f -print | awk blah blah blah
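As a rough sketch of the first approach, the AND-ed negations can be tried on a few made-up paths instead of real find output (the sample paths and the variable names `paths` and `kept` are only assumptions for illustration):

```shell
# Hypothetical sample paths standing in for the output of find.
paths='./docs/a.txt
./var/log/x.log
./etc/hosts
./.git/config
./src/main.c'

# A line is kept only if it matches none of the three patterns.
kept=$(printf '%s\n' "$paths" | awk '!/\/\./ && !/\/var/ && !/\/etc/')
printf '%s\n' "$kept"
```

Only ./docs/a.txt and ./src/main.c should survive. Note that /\/var/ is not anchored, so a directory such as /variables would be excluded as well.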

grail 05-20-2010 11:10 AM

Just thought I would point out some redundancies in your awk:

1. dir=gensub(/(.+\/).+/,"\\1","g",$0) - sub, gsub and gensub work on $0 unless otherwise specified, so it is not required here

2. dir_list[dir]++ - as dir is only used in this one spot you could easily combine this with the previous step, ie dir_list[gensub(/(.+\/).+/,"\\1","g")]++

3. printf "%s %s\n",dir_list[d],d - two things on this one: a. as the format only adds a trailing newline, you may as well use print. b. by using print, and because you have not changed the output field separator (OFS), you can use a comma to get the space, like so - print dir_list[d],d

All these suggestions are just that, suggestions :)
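For point 3, a small sketch comparing the two forms on made-up input (the sample lines and the names `dirs`, `with_printf` and `with_print` are just assumptions, and the counting is simplified to whole lines so it does not depend on gensub):

```shell
# Hypothetical input: one directory name per line, ./a/ appearing twice.
dirs='./a/
./b/
./a/'

# printf with an explicit format string and newline.
with_printf=$(printf '%s\n' "$dirs" |
  awk '{ n[$0]++ } END { for (d in n) printf "%s %s\n", n[d], d }' | sort -n)

# print with a comma: awk inserts OFS (a space by default) and ends the line itself.
with_print=$(printf '%s\n' "$dirs" |
  awk '{ n[$0]++ } END { for (d in n) print n[d], d }' | sort -n)

printf '%s\n' "$with_print"
```

Both variables should hold the same two lines, which is why the shorter print form is enough as long as OFS is left at its default.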

Damarr 05-20-2010 11:57 AM

Awesome!

Thanks for the help guys. Colucix, I ended up adding the && to the awk statement to exclude directories. Also found out how to put a space in, if the directory had a space in it...

&&!/\/My\ Documents/

Grail, ended up incorporating all of your changes to neaten up the code. Thanks again :)

-Damarr

grail 05-20-2010 06:58 PM

Sorry, I was a bit sleepy when I looked at this, and I do now have a suggestion for your actual question.
You should just be able to use the alternation operator (the pipe) in your regex:
Code:

find <blah> | awk  '!/\/(..*|var|etc|My Documents)/{<blah>}'
And unless something funky is going on, you should not need to escape the space in My Documents in the regex.

Damarr 05-21-2010 07:18 AM

Grail,

Even though my current version works, I decided to try your new code out as it's cleaner; however, I couldn't get it to work properly.

-Damarr

grail 05-21-2010 07:32 AM

It doesn't find the directories or it finds the wrong ones?

Damarr 05-24-2010 11:53 AM

It doesn't find any.

-Damarr

EDIT: Got it, was missing a \ inside the paren

Code:

awk '!/\/(\..*|var|temp)/

colucix 05-24-2010 12:43 PM

The solution suggested by grail should be
Code:

find <blah> | awk  '!/\/(\.|var|etc|My Documents)/{<blah>}'
that is, with the escaped dot; otherwise ..* matches any sequence of one or more characters (so that every file will be excluded), since an unescaped dot means any single character. The regexp above will match any path that contains
Code:

/. OR /var OR /etc OR /My Documents
The negation does the rest.
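A quick way to see the difference is to run both patterns over a few made-up paths (the paths and the names `unescaped` and `escaped` are assumptions for this sketch only):

```shell
# Hypothetical sample paths standing in for the output of find.
paths='./docs/a.txt
./.cache/b
./var/c'

# Unescaped: "..*" is "any character, then any run of characters",
# so every path containing a slash matches and is dropped.
unescaped=$(printf '%s\n' "$paths" | awk '!/\/(..*|var)/')

# Escaped: "\." matches only a literal dot right after the slash.
escaped=$(printf '%s\n' "$paths" | awk '!/\/(\.|var)/')

printf 'unescaped kept: [%s]\n' "$unescaped"
printf 'escaped kept: [%s]\n' "$escaped"
```

The unescaped version should keep nothing, while the escaped one should keep only ./docs/a.txt.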

Damarr 05-24-2010 01:14 PM

The script has gone from 20 lines to 140 because of some different options I'm putting in (menu selection for output to shell, a log, etc.).

Since I don't want to have to change the search in each iteration, I define it at the beginning:

Code:

searchstring='!/\/(\.|archive|my folder)/'
and

Code:

awk $searchstring
It doesn't like the space in "my folder" though. However, if I just do "my" it will ignore "my folder"

It did work when it was still part of the awk statement before I moved it up to define it only in one place...

colucix 05-24-2010 02:44 PM

Double quotes should do the trick:
Code:

awk "$searchstring"
they prevent word splitting when the shell encounters a space character (or a tab or a newline) in a string. In other words the space is interpreted literally as part of the string and not as field separator.
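The effect can be made visible with a small sketch; `count_args` is a hypothetical helper written only for this illustration, and the two sample paths are made up:

```shell
searchstring='!/\/(\.|archive|my folder)/'

# Hypothetical helper: report how many arguments a command would receive.
count_args() { echo "$#"; }

unquoted=$(count_args $searchstring)     # split at the space
quoted=$(count_args "$searchstring")     # kept as a single word
echo "unquoted=$unquoted quoted=$quoted"

# With the quotes, awk receives the whole pattern as its program text.
kept=$(printf '%s\n' './my folder/a' './other/b' | awk "$searchstring")
printf '%s\n' "$kept"
```

Without the quotes awk would see only the part up to the space as its program and treat the remainder as a file name, which would explain why shortening the pattern to "my" happened to work.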

Damarr 05-24-2010 02:52 PM

Your suggestion worked, thanks a bunch :)

-damarr


All times are GMT -5.