LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
10-18-2012, 02:20 AM   #1
shivaa
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Explain the awk syntax


Suppose you're using the following awk code to filter unique lines from a file:
awk '!_[$0]++' filename.txt
Could anybody explain this awk code? What do $0 and ++ do here? What does !_ do?
 
10-18-2012, 07:22 AM   #2
kabamaru
Member
Registered: Dec 2011
Location: Greece
Distribution: Slackware
"_" is actually an array name, although a little cryptic. It could easily be "myarray" or any other valid name.

Arrays in AWK are associative, i.e. they consist of key-value pairs, and you access an element's value through its key. Arrays don't need to be declared: the first time you use an array it is created, and referencing a key that doesn't exist yet creates that key too. An element that has never been assigned has an uninitialized value, which behaves as "" (empty string) in string context and as 0 in numeric context.
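
A quick way to check this yourself is the sketch below (assuming a POSIX shell and a standard awk such as gawk or mawk; the function name demo_uninit, the array arr and the key k are made up for the demo). It tests whether the key exists before and after a plain reference, then prints the never-assigned value as a string and as a number:

```shell
#!/bin/sh
# Sketch: what awk does with an array element that was never assigned.
# "demo_uninit", "arr" and "k" are hypothetical names for this demo.
demo_uninit() {
  awk 'BEGIN {
    if (!("k" in arr)) print "before: key k does not exist"
    x = arr["k"]                            # a plain reference creates arr["k"]
    if ("k" in arr) print "after: key k exists"
    printf "as string: [%s]\n", arr["k"]    # uninitialized -> ""
    printf "as number: %d\n", arr["k"] + 0  # uninitialized -> 0
    if (!arr["k"]) print "in a test it counts as false"
  }'
}

demo_uninit
```

Run as-is, it should print the five messages in order, with an empty pair of brackets on the "as string" line.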

"_[$0]" looks up the value stored under the key $0 (the entire current line). As a test, a zero (or empty string) value counts as false, and a nonzero (or nonempty) value counts as true. When a pattern has no action attached, awk runs the default action, "print $0", on every line for which the pattern is true. The leading "!" in "!_[$0]" negates the test, so a line is printed only when _[$0] is still zero or empty.

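The one-liner packs a pattern, a post-increment and an implicit action into a few characters. The sketch below (assuming a POSIX shell and any standard awk; the function names are hypothetical) writes the same filter twice: once in the thread's terse form, and once with a readable array name and the default action spelled out:

```shell
#!/bin/sh
# Sketch: two spellings of the same filter. The function names are
# hypothetical; "seen" is just a more readable array name than "_".

# Terse form from the thread: a pattern with no action, so awk
# applies the default action (print $0) whenever the pattern is true.
dedup_terse() {
  awk '!_[$0]++' "$@"
}

# Long form: the same logic with the implicit parts written out.
dedup_explicit() {
  awk '{
    if (!(seen[$0]++))   # true only the first time this line is seen
      print $0           # the default action, spelled out
  }' "$@"
}

printf 'john\nmary\njohn\n' | dedup_terse
printf 'john\nmary\njohn\n' | dedup_explicit
```

Both calls should print john and mary, each once.
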
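The "++" is a post-increment: it adds 1 to the value stored under the key, but only AFTER the old value has been handed back to AWK for the test. Since postfix "++" binds more tightly than "!", the whole pattern is read as !(_[$0]++).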
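
To see why the post-increment matters, here is a sketch (POSIX shell, standard awk; the function and variable names are hypothetical) contrasting x++ with ++x, and showing what happens if the filter used pre-increment instead:

```shell
#!/bin/sh
# Sketch: post-increment (x++) yields the OLD value, pre-increment (++x)
# yields the NEW one. Function and variable names are hypothetical.
incr_demo() {
  awk 'BEGIN {
    print a++   # 0: the old, uninitialized value is used, then a becomes 1
    print a     # 1
    print ++b   # 1: b is incremented first, then its value is used
  }'
}

# Swapping in pre-increment breaks the filter: ++_[$0] is always >= 1,
# so "!" always yields false and no line is ever printed.
pre_incr_filter() {
  awk '!++_[$0]'
}

incr_demo
printf 'john\nmary\njohn\n' | pre_incr_filter   # prints nothing
```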

Confusing? To make this clearer, let's say we have a file with the contents below:

Code:
john
mary
paul
mary
john
john
phil
paul
AWK reads the first line ("john"). It creates an array named "_" with a key "john" ($0). Because no value has been assigned to _["john"] yet:

_["john"] is uninitialized, i.e. "" (or 0)
"" evaluates to false, so _["john"]++ returns false
after the test, ++ adds 1, so now _["john"] = 1

The first line is printed: _["john"]++ returned false, and the "!" turns that into true.

Here's a visualization of AWK parsing each line:

Code:
_["john"] is 0 (or "")    returns false -> printed    _["john"] = 0 + 1 = 1
_["mary"] is 0 (or "")    returns false -> printed    _["mary"] = 0 + 1 = 1
_["paul"] is 0 (or "")    returns false -> printed    _["paul"] = 0 + 1 = 1
_["mary"] is 1            returns true  -> skipped    _["mary"] = 1 + 1 = 2
_["john"] is 1            returns true  -> skipped    _["john"] = 1 + 1 = 2
_["john"] is 2            returns true  -> skipped    _["john"] = 2 + 1 = 3
_["phil"] is 0 (or "")    returns false -> printed    _["phil"] = 0 + 1 = 1
_["paul"] is 1            returns true  -> skipped    _["paul"] = 1 + 1 = 2
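
If you'd rather let awk produce that table itself, a tracing variant is sketched here (POSIX shell, standard awk; trace_dedup and its output columns are hypothetical). For each line it records the value before the test, whether the line gets printed, and the value after the increment:

```shell
#!/bin/sh
# Sketch: have awk print the bookkeeping from the table for each line.
# trace_dedup and the column layout are hypothetical, for illustration.
trace_dedup() {
  awk '{
    before = _[$0] + 0     # value before the test (uninitialized -> 0)
    printed = !_[$0]++     # the original pattern; this also does the ++
    printf "%-8s before=%d  printed=%s  after=%d\n",
           $0, before, (printed ? "yes" : "no"), _[$0]
  }' "$@"
}

printf '%s\n' john mary paul mary john john phil paul | trace_dedup
```
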
_[$0]++ returns false only the first time a given line occurs, so each line is printed only on its first occurrence:
Code:
john
mary
paul
phil
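
Putting it all together, this sketch runs the original one-liner on the sample list from this post (assuming a POSIX shell with mktemp available; the temporary file just stands in for filename.txt):

```shell
#!/bin/sh
# Sketch: run the one-liner on the sample list from this post.
# A temporary file from mktemp stands in for the real input file.
sample=$(mktemp) || exit 1
printf '%s\n' john mary paul mary john john phil paul > "$sample"

# Keep the first occurrence of each line, in the original order.
result=$(awk '!_[$0]++' "$sample")
printf '%s\n' "$result"   # john, mary, paul, phil (one per line)

rm -f "$sample"
```

Unlike sort -u, this keeps the lines in the order they first appear and does not need sorted input.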
 
10-20-2012, 10:45 AM   #3
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
You can find another explanation for it here, as entry #43:

http://www.catonmat.net/blog/awk-one...ined-part-two/

The whole series is very educational.
All times are GMT -5.