LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

Old 07-15-2011, 02:12 PM   #1
AndrewJS
LQ Newbie
 
Registered: Dec 2010
Posts: 16

Rep: Reputation: 0
use Awk to isolate a specific directory level...


hello

I have used Awk in the past to isolate the file name from a given path. That is to say, I may have a list of files contained in list.txt:

FIG. 1.

dir1/dir2/dir3/file1.dat
dir4/dir5/dir6/file2.dat
dir7/dir8/dir9/file3.dat
dir10/dir11/dir12/file4.dat
...and so on....

and I used the Awk command:

Code:
cat list.txt | awk -F "/" '{print $NF}'
to remove the prepended path name and so end up with list of the form:

FIG. 2.

file1.dat
file2.dat
file3.dat
file4.dat
..and so on...

I now want to do almost the exact opposite: instead of isolating the file name, I want to isolate, say, the middle directory in the list I have shown in Fig. 1. That is to say, I want to end up with output that reads:

Fig. 3.

dir2
dir5
dir8
dir11
...and so on...

Can someone please post the Awk command that would do this? (I assume it will be very similar in form to the Awk command I showed above.)
The point is, sometimes I may want to isolate the second directory, sometimes I may want to isolate the third directory or tenth or whatever - so I am hoping that if someone posts the Awk command to isolate the second level directory (to produce the output I showed in Fig.3) it should be fairly obvious by looking at the form of this command how to alter it and so isolate any other directory I want.

I hope I've been clear in what I'm asking!
 
Old 07-15-2011, 02:21 PM   #2
opnsrc
LQ Newbie
 
Registered: Dec 2005
Posts: 28

Rep: Reputation: 1
Yes, very similar, replace $NF with $2.
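
For anyone following along, a minimal sketch of that substitution. The sample data is copied from Fig. 1, the temporary file stands in for list.txt, and the variable name n is made up for illustration:

```shell
# Sample data copied from Fig. 1; "$list" is a temporary stand-in for list.txt.
list=$(mktemp)
printf '%s\n' \
  'dir1/dir2/dir3/file1.dat' \
  'dir4/dir5/dir6/file2.dat' \
  'dir7/dir8/dir9/file3.dat' \
  'dir10/dir11/dir12/file4.dat' > "$list"

# With "/" as the field separator, $2 is the second path component.
second=$(awk -F/ '{print $2}' "$list")
echo "$second"

# The level can also be passed in with -v (n is a made-up variable name),
# so the program text never has to change.
third=$(awk -F/ -v n=3 '{print $n}' "$list")
echo "$third"

rm -f "$list"
```

Note that a path starting with "/" has an empty first field, so every level shifts up by one in that case.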
 
Old 07-15-2011, 02:21 PM   #3
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 13.1
Posts: 1,329

Rep: Reputation: 254
What about checking the man page of awk, section Fields?
 
Old 07-15-2011, 02:50 PM   #4
MTK358
LQ 5k Club
 
Registered: Sep 2009
Posts: 6,443
Blog Entries: 3

Rep: Reputation: 721
Quote:
Originally Posted by AndrewJS
and I used the Awk command:

Code:
cat list.txt | awk -F "/" '{print $NF}'
Awk is really unnecessary here. First, there's the basename command which is made just for this:

Code:
$ basename path/to/file
file
Also, it's possible to do it all in bash without using a command:

Code:
path=path/to/file
echo "${path##*/}"
Quote:
Originally Posted by AndrewJS
Can someone please post the Awk command that would do this? (I assume it will be very similar in form to the Awk command I showed above.)
The point is, sometimes I may want to isolate the second directory, sometimes I may want to isolate the third directory or tenth or whatever - so I am hoping that if someone posts the Awk command to isolate the second level directory (to produce the output I showed in Fig.3) it should be fairly obvious by looking at the form of this command how to alter it and so isolate any other directory I want.

I hope I've been clear in what I'm asking!
If you understand that "$NF" gets the NFth field (NF being the number of fields on the line), then it should be really obvious.
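
A quick way to see the relationship, as a sketch on one sample line from Fig. 1 (the variable name fields is only for illustration):

```shell
# NF holds the number of fields on the current line, so $NF is the last field
# and $(NF-1) the one before it; a plain number such as $2 counts from the front.
fields=$(echo 'dir1/dir2/dir3/file1.dat' | awk -F/ '{print NF, $NF, $(NF-1), $2}')
echo "$fields"
```

This prints the field count, the file name, its parent directory and the second directory on one line, which makes it easy to pick whichever level is wanted.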
 
1 members found this post helpful.
Old 07-15-2011, 03:11 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,564

Rep: Reputation: 2901
I would add that cat is a wasted command here as well .. Just pass the file name to awk.
 
Old 07-15-2011, 03:54 PM   #6
PTrenholme
Senior Member
 
Registered: Dec 2004
Location: Olympia, WA, USA
Distribution: Fedora, (K)Ubuntu
Posts: 4,186

Rep: Reputation: 346
If, however, you want a list of the unique directory names, do something like this:

gawk -F'/' '{++directory[$3]} END {for (i in directory) {print i " (" directory[i] " files)"}}'

Here's what the output looks like:
Code:
$ ls -1 */*/*/* | gawk -F'/' '{++directory[$3]}; END {for (i in directory) {print i " (" directory[i] " files)"}}'
 (41792 files)
The Two Faces of Tomorrow (183 files)
Bennett, Nigel (1 files)
Screen Savers (18 files)
Harald (177 files)
4 1635-The Cannon Law (118 files)
Series - Belisarius (18 files)
...
I'm not sure what you're parsing. In my example, I was piping the file list from the ls command which is not a very efficient way to do this sort of thing. (An easier way would be find ./ -maxdepth 3 -mindepth 3 -type d, but you wouldn't get the count.)
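
For a plain list file like the OP's, there is no need for ls at all. Below is a minimal sketch of the same counting idiom, using made-up sample paths and counting on the second field. The output goes through sort only to make the order predictable, since the order of awk's for-in loop is unspecified:

```shell
# Made-up sample paths; "$list" stands in for a file such as list.txt.
list=$(mktemp)
printf '%s\n' 'a/x/1.dat' 'a/x/2.dat' 'a/y/3.dat' > "$list"

# n[] is an associative array keyed on the second path component.
counts=$(awk -F/ '{++n[$2]} END {for (d in n) print d, n[d]}' "$list" | sort)
echo "$counts"

rm -f "$list"
```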

Last edited by PTrenholme; 07-15-2011 at 03:55 PM.
 
Old 09-04-2011, 02:37 AM   #7
archtoad6
Senior Member
 
Registered: Oct 2004
Location: Houston, TX (usa)
Distribution: MEPIS, Debian, Knoppix,
Posts: 4,727
Blog Entries: 15

Rep: Reputation: 234
Quote:
Originally Posted by grail
I would add that cat is a wasted command here as well .. Just pass the file name to awk.
Unnecessary, yes; wasted, maybe not. I sometimes use cat this way to make the name of the file being processed stand out. IMO, this is a good programming style.
 
Old 09-04-2011, 03:53 AM   #8
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 46
Quote:
Originally Posted by archtoad6
Unnecessary, yes; wasted, maybe not. I sometimes use cat this way to make the name of the file being processed stand out. IMO, this is a good programming style.
Making the name stand out using cat like that is less important than trying to reduce overheads, plus there is the annoyance that a loop fed by a pipe runs in a subshell, so changes it makes to variables defined outside are not visible afterwards. E.g.:

Code:
var=0
cat file| while read line
do
  ((var++))
done
echo "var outside: $var"

var=0
while read line
do
  ((var++))
done < file
echo "var outside: $var"
test run:
Code:
$ bash test.sh
var outside: 0
var outside: 4
Thus, IMO, this is not a good shell scripting practice and should be avoided if possible.

Last edited by kurumi; 09-04-2011 at 03:54 AM.
 
1 members found this post helpful.
Old 09-04-2011, 07:46 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,564

Rep: Reputation: 2901
+1 to kurumi's post as my sentiments exactly.
 
Old 09-04-2011, 08:31 AM   #10
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,361
Blog Entries: 55

Rep: Reputation: 3547
Quote:
Originally Posted by kurumi
Making the name stand out using cat like that is less important than trying to reduce overheads
...which many will recognize as UUOC (aka "The Award For The Most Gratuitous Use Of The Word Cat In A Serious Shell Script").
 
Old 09-04-2011, 10:33 AM   #11
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,837

Rep: Reputation: 1981
Bash v4.2 has introduced the lastpipe shell option, which makes the last command in a pipe chain run in the current environment, ksh-style. So the variable-scope problem can now be avoided, at least. However, I think it's still better to use bash's built-in file access instead of forking off a process for the external cat.
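
A minimal sketch of the lastpipe behaviour, assuming bash 4.2 or later is installed as bash. The option only takes effect when job control is not active, so the loop is run here through bash -c (a non-interactive shell) rather than typed at a prompt. The names demo and result are made up for illustration:

```shell
# The demo script is kept in a variable and handed to a non-interactive bash,
# where job control is off and lastpipe can take effect.
demo='
shopt -s lastpipe
var=0
printf "%s\n" a b c d | while read -r line; do
  var=$((var + 1))
done
echo "var outside: $var"
'
result=$(bash -c "$demo")
echo "$result"
```

With lastpipe set, the while loop runs in the current shell, so this should report 4 rather than the 0 seen in kurumi's first loop.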

As for the OP's request, there are also several ways we can go about it inside bash.

The first and probably best is to use an array to separate the name into fields.
Code:
# IFS=/ applies to this read only, so the rest of the script keeps its default IFS
while IFS=/ read -r -a dirs; do
	echo "${dirs[1]}"		#gives you the second directory
done <file.txt
The second requires going through multiple steps of parameter expansion to extract the field you want.
Code:
while read -r dirname; do
	dirname2="${dirname#*/}"		#strip the first directory
	dirname2="${dirname2%%/*}"		#strip everything from the next slash on
	echo "$dirname2"		#gives you the second directory
done <file.txt
Finally, you can use a regular expression inside bash's [[ test to do the same.
Code:
re='([^/]+)/([^/]+)/([^/]+)/([^/]+)'
while read dirname; do
	[[ "$dirname" =~ $re ]]
	echo "${BASH_REMATCH[2]}"	#gives you the second directory
done <file.txt
I suppose you have to be careful how you construct the regex, though.
 
Old 09-04-2011, 11:01 AM   #12
Reuti
Senior Member
 
Registered: Dec 2004
Location: Marburg, Germany
Distribution: openSUSE 13.1
Posts: 1,329

Rep: Reputation: 254
Quote:
Originally Posted by unSpawn
...which many will recognize as UUOC (aka "The Award For The Most Gratuitous Use Of The Word Cat In A Serious Shell Script").
+1

If it's important to have the name of the file in question at the beginning of the statement, I would suggest defining a function for it. Inside the function you can put it at the end to feed the while loop, but in the function call it's the argument.
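
A minimal sketch of that idea. The function name second_dir and the sample data are made up for illustration:

```shell
# second_dir is a hypothetical helper: the file name is its argument in the
# call, while the redirection that feeds the loop sits at the end, inside.
second_dir() {
  while IFS=/ read -r first second rest; do
    printf '%s\n' "$second"
  done < "$1"
}

# Sample data in a temporary file standing in for list.txt.
tmp=$(mktemp)
printf '%s\n' 'dir1/dir2/dir3/file1.dat' 'dir4/dir5/dir6/file2.dat' > "$tmp"

out=$(second_dir "$tmp")
echo "$out"

rm -f "$tmp"
```

The call reads with the file name up front, and no cat or extra pipe is involved.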
 
Old 09-04-2011, 11:06 AM   #13
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,564

Rep: Reputation: 2901
David, you forgot one of my favourite array-style options:
Code:
while read -r dirs; do
    set -- ${dirs//\// }
    echo "$1"		#gives you the first directory
done <file.txt
 
Old 09-04-2011, 11:35 AM   #14
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,837

Rep: Reputation: 1981
My bad.

Actually I don't like recommending the positional parameters, at least not without a warning for the newbies. Since set overwrites any previous values, you might mess up your script if they're already in use for other things.
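
One way to limit that risk, sketched under the assumption that the splitting is moved into a function: a function call gets its own positional parameters, and the caller's are restored when it returns. The helper name nth_dir is made up, and like the loop above it relies on word splitting, so names containing spaces or glob characters would break it:

```shell
# nth_dir is a hypothetical helper: nth_dir N PATH prints the Nth component.
nth_dir() {
  n=$1
  # Turn slashes into spaces and let word splitting fill $1, $2, ...
  # This replaces only the function's own positional parameters.
  set -- $(printf '%s\n' "$2" | tr '/' ' ')
  shift "$((n - 1))"
  printf '%s\n' "$1"
}

set -- alpha beta                     # pretend these are the script's own arguments
nth_dir 2 dir1/dir2/dir3/file1.dat    # prints the second component
after="$1 $2"
echo "script arguments afterwards: $after"
```

The script's own arguments are still alpha and beta after the call, because the set inside nth_dir never touches them.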

Still, it does have the benefit of not needing to set IFS.

BTW, the UUOC award text demonstrates how you can list the filename first without the use of cat. I don't know if it's any more readable, though.
Code:
<list.txt awk -F "/" '{print $NF}'
Redirections can be defined anywhere on the line, remember?
 
  



All times are GMT -5. The time now is 02:00 PM.
