LinuxQuestions.org
Old 05-20-2010, 10:05 AM   #1
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Rep: Reputation: 0
How do I exclude multiple directories in awk with find?


Hello

I found a script on webmaster world that mostly does what I need it to, but have been making modifications to tailor it to my specific needs.

Here is the original command:

Code:
find . -type f | awk '!/\/\..*/ {dir=gensub(/(.+\/).+/,"\\1","g",$0); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",d,dir_list[d]}' | sort
I made it into a bash script that will monitor directories to let me know if they are getting too big.

Code:
#!/bin/bash
# dirwatch.sh
# set up the infinite loop
while true
do

#the important stuff
find . -mindepth 2 -type f | awk '!/\/\..*/ {dir=gensub(/(.+\/).+/,"\\1","g",$0); dir_list[dir]++} END {for (d in dir_list) printf "%s %s\n",dir_list[d],d}'| sort -n

echo "";

#put the time/date stamp
date
echo ----------------;

#set how long to sleep between cycles
sleep 5
done
I know that /\/\..*/ tells awk to ignore hidden directories, but how do I define more directories to ignore (e.g. temp, var, etc)? I've tried playing with -prune before the awk command, with limited success. I know there are many ways to do the same thing, but I keep running into brick walls.

Thanks!

Damarr

Last edited by Damarr; 05-20-2010 at 10:07 AM.
 
Old 05-20-2010, 10:31 AM   #2
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,508

Rep: Reputation: 1957
You can add multiple regexps joined by the logical AND, e.g.
Code:
awk '!/\/\..*/&&!/\/var/&&!/\/etc/{ blah blah blah }'
or you can exclude them from the search in the first instance:
Code:
find / \( -wholename /var -o -wholename /etc -o -wholename /tmp \) -prune -o -mindepth 2 -type f -print | awk blah blah blah

Last edited by colucix; 05-20-2010 at 10:42 AM. Reason: Added find -prune solution.
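
The two approaches above can be compared on a throwaway tree. This is only a sketch: the directory and file names are made up, and -path is used where the post has -wholename (the two are synonyms in GNU find). It leaves out -mindepth because, as far as the GNU find manual describes it, tests and actions are not applied at levels shallower than -mindepth, and that would include a -prune on a top-level directory.

```shell
#!/bin/bash
# Build a small temporary tree; every name below is invented for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/docs" "$demo/var" "$demo/etc" "$demo/.cache"
touch "$demo/docs/a.txt" "$demo/docs/b.txt" "$demo/var/log.txt" \
      "$demo/etc/conf" "$demo/.cache/junk"

# Approach 1: find lists every file, awk drops the unwanted paths.
awk_filtered=$(cd "$demo" && find . -type f |
  awk '!/\/\./ && !/\/var\// && !/\/etc\//' | sort)

# Approach 2: prune the unwanted directories so find never descends
# into them. -name ".?*" matches hidden entries but not "." itself.
pruned=$(cd "$demo" && find . \( -path ./var -o -path ./etc -o -name '.?*' \) \
  -prune -o -type f -print | sort)

printf 'awk filter:\n%s\n' "$awk_filtered"
printf 'find -prune:\n%s\n' "$pruned"

rm -rf "$demo"
```

Both variables should end up holding the same two paths under ./docs. The -prune form does less work on large trees because the excluded directories are never walked.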
 
Old 05-20-2010, 11:10 AM   #3
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,564

Rep: Reputation: 1939
Just thought I would point out some redundancies in your awk:

1. dir=gensub(/(.+\/).+/,"\\1","g",$0) - sub, gsub and gensub work on $0 unless otherwise specified, so the explicit $0 is not required here

2. dir_list[dir]++ - as dir is only used in this one spot you could easily combine this with the previous step, i.e. dir_list[gensub(/(.+\/).+/,"\\1","g")]++

3. printf "%s %s\n",dir_list[d],d - two things on this one: a. as you are adding the newline yourself you may as well use print. b. by using print, and because you have not changed the output field separator (OFS), you can use a comma to get the space, like so - print dir_list[d],d

All these suggestions are just that, suggestions
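
A minimal sketch of those three points, run on made-up paths instead of real find output. Note the swap: gensub() is gawk-only, so this version uses POSIX sub() to strip the file name, which keeps it runnable under other awks as well. The idea is the same (operate on $0 by default, index the array directly, use print with a comma).

```shell
#!/bin/bash
# Sample input standing in for the output of find; the paths are made up.
paths='./docs/a.txt
./docs/b.txt
./music/song.ogg'

# sub() edits $0 in place when no target is given, so after the call $0
# holds only the directory part. print with a comma joins its arguments
# with OFS (a single space by default) and adds the newline itself.
counts=$(printf '%s\n' "$paths" |
  awk '{ sub(/[^\/]+$/, ""); dir_list[$0]++ }
       END { for (d in dir_list) print dir_list[d], d }' |
  sort -n)

printf '%s\n' "$counts"
```

With this input the result is one line per directory, count first, sorted numerically: ./music/ with 1 file, then ./docs/ with 2.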
 
Old 05-20-2010, 11:57 AM   #4
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Awesome!

Thanks for the help guys. Colucix, I ended up adding the && to the awk statement to exclude directories. Also found out how to put a space in, if the directory had a space in it...

&&!/\/My\ Documents/

Grail, ended up incorporating all of your changes to neaten up the code. Thanks again

-Damarr
 
Old 05-20-2010, 06:58 PM   #5
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,564

Rep: Reputation: 1939
Sorry, I was a bit sleepy when I looked at this, and I do now have a suggestion for your actual question.
You should just be able to use alternation (the pipe) in your regex:
Code:
find <blah> | awk  '!/\/(..*|var|etc|My Documents)/{<blah>}'
And unless something funky is going on you should not need to escape the space for My Documents in the regex
 
Old 05-21-2010, 07:18 AM   #6
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Grail,

Even though my version works, I decided to try your new code out as it's cleaner; however, I couldn't get it to work properly.

-Damarr
 
Old 05-21-2010, 07:32 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,564

Rep: Reputation: 1939
It doesn't find the directories or it finds the wrong ones?
 
Old 05-24-2010, 11:53 AM   #8
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Original Poster
Rep: Reputation: 0
It doesn't find any.

-Damarr

EDIT: Got it, was missing a \ inside the paren

Code:
awk '!/\/(\..*|var|temp)/'

Last edited by Damarr; 05-24-2010 at 12:41 PM.
 
Old 05-24-2010, 12:43 PM   #9
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,508

Rep: Reputation: 1957
The solution suggested by grail should be
Code:
find <blah> | awk  '!/\/(\.|var|etc|My Documents)/{<blah>}'
that is, with the dot escaped. Otherwise ..* matches any sequence of one or more characters (so every file would be excluded), since an unescaped dot means any single character. The regexp above will match any path that contains
Code:
/. OR /var OR /etc OR /My Documents
The negation does the rest.
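
The difference between the unescaped and the escaped dot can be seen on a few sample lines. This is just an illustration; the paths are invented and stand in for find output.

```shell
#!/bin/bash
# Made-up paths standing in for find output.
paths='./docs/a.txt
./var/log.txt
./.cache/junk
./My Documents/cv.odt'

# Unescaped: "..*" means "any character, then any run of characters",
# so a "/" followed by anything matches and every line is rejected.
unescaped=$(printf '%s\n' "$paths" | awk '!/\/(..*|var|etc|My Documents)/')

# Escaped: "\." is a literal dot, so only hidden entries and the named
# directories are rejected.
escaped=$(printf '%s\n' "$paths" | awk '!/\/(\.|var|etc|My Documents)/')

printf 'unescaped kept: [%s]\n' "$unescaped"
printf 'escaped kept:   [%s]\n' "$escaped"
```

The unescaped pattern keeps nothing, while the escaped one keeps only ./docs/a.txt. Keep in mind that these alternatives match prefixes, so /var would also exclude a directory such as /variety.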
 
Old 05-24-2010, 01:14 PM   #10
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Original Poster
Rep: Reputation: 0
The script has gone from 20 lines to 140 because of some different options I'm putting in (menu selection for output to shell, a log etc)

Since I don't want to have to change the search in each iteration I define it at the beginning

Code:
searchstring='!/\/(\.|archive|my folder)/'
and

Code:
awk $searchstring
It doesn't like the space in "my folder" though. However, if I just do "my" it will ignore "my folder"

It did work when it was still part of the awk statement before I moved it up to define it only in one place...
 
Old 05-24-2010, 02:44 PM   #11
colucix
Moderator
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,508

Rep: Reputation: 1957
Double quotes should do the trick:
Code:
awk "$searchstring"
They prevent word splitting when the shell encounters a space character (or a tab or a newline) in a string. In other words, the space is taken literally as part of the string and not as a separator between words.
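
A small sketch of what the quotes change, using the searchstring from the post above. The helper count_args and the sample paths are made up for the demonstration; the helper only reports how many arguments it received.

```shell
#!/bin/bash
searchstring='!/\/(\.|archive|my folder)/'

# Made-up paths standing in for find output.
paths='./docs/a.txt
./my folder/b.txt'

# Hypothetical helper: prints the number of arguments it was given.
count_args() { echo "$#"; }

# Deliberately unquoted: the shell splits the value at the space, so a
# command would receive two separate arguments.
unquoted_count=$(count_args $searchstring)

# Quoted: the whole pattern arrives as a single argument.
quoted_count=$(count_args "$searchstring")

# With the quotes, awk sees the complete program and filters as intended.
kept=$(printf '%s\n' "$paths" | awk "$searchstring")

printf 'unquoted: %s args, quoted: %s arg\n' "$unquoted_count" "$quoted_count"
printf 'kept: %s\n' "$kept"
```

Unquoted, awk would get only the first half of the pattern as its program and treat the rest as a file name, which is why the version without quotes fails as soon as the pattern contains a space.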
 
Old 05-24-2010, 02:52 PM   #12
Damarr
LQ Newbie
 
Registered: May 2010
Posts: 15

Original Poster
Rep: Reputation: 0
Your suggestion worked, thanks a bunch

-damarr
 
All times are GMT -5.