LinuxQuestions.org
Old 11-03-2017, 09:53 AM   #1
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,814

Sorting with awk


I have a file consisting of personal first names, one name per line.

With this InFile ...
Code:
Albert
Albert
Charles
Bernard
Albert
David
Edward
Charles
It is desired to make a frequency table.

This code ...
Code:
 sort  $InFile  \
|uniq -c        \
|sort -nrk1     \
>$OutFile1
... produces the desired result ...
Code:
      3 Albert
      2 Charles
      1 Edward
      1 David
      1 Bernard
This awk ...
Code:
awk '{NameCount[$0]++}  \
  END{for (Name in NameCount)
        print NameCount[Name],Name}'  \
  $InFile >$OutFile2
... produced a table with correct counts ...
Code:
1 Edward
1 David
2 Charles
3 Albert
1 Bernard
... but it is not sorted in descending order.

I've fumbled with asort and asorti without success. Please advise.

Daniel B. Martin

.
 
Old 11-03-2017, 11:46 AM   #2
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,523
Blog Entries: 3

You have to use asort() or asorti() to create a second, sorted array and refer to that when showing your results:

Code:
#!/usr/bin/awk -f

/^[[:alnum:]]/ { NameCount[$0]++ }

END {
        n = asorti(NameCount, NameSorted)
        for ( i=1; i<=n; i++ ) {
                print NameCount[NameSorted[i]], NameSorted[i] 
        }
}
 
Old 11-03-2017, 12:30 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,814

Original Poster
Quote:
Originally Posted by Turbocapitalist
Code:
#!/usr/bin/awk -f

/^[[:alnum:]]/ { NameCount[$0]++ }

END {
        n = asorti(NameCount, NameSorted)
        for ( i=1; i<=n; i++ ) {
                print NameCount[NameSorted[i]], NameSorted[i] 
        }
}
I made changes so your code fits my bash shell. Hope I didn't
inadvertently introduce a bug...

With this InFile ...
Code:
Albert
Albert
Charles
Bernard
Albert
David
Edward
Charles
... this awk ...
Code:
awk '{NameCount[$0]++}
   END{n=asorti(NameCount,NameSorted)
      for (i=1;i<=n;i++)
      print NameCount[NameSorted[i]],NameSorted[i]}' \
$InFile >$OutFile7
... produced this OutFile ...
Code:
3 Albert
1 Bernard
2 Charles
1 David
1 Edward
The OutFile lines have been sorted on name but not count. One of my failed attempts produced the same result. Can you fix?

Daniel B. Martin

.
 
Old 11-03-2017, 12:37 PM   #4
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,523
Blog Entries: 3

Quote:
Originally Posted by danielbmartin
Can you fix?
Yes, but you are quite close already.

https://www.gnu.org/software/gawk/ma...ting-Functions

Remember that asort() and asorti() are not portable: they are gawk extensions and are not available in the other AWK interpreters.
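
Where gawk is not available, one portable route is to let awk do only the counting and hand the ordering to sort(1). The following is a minimal sketch under that assumption, not a tested replacement for the gawk approach; the function name freq_table_portable is made up for illustration, and LC_ALL=C is set only to make the ordering of tied names predictable.

```shell
#!/bin/sh
# Hypothetical helper: frequency table using only POSIX awk and sort.
freq_table_portable() {
    awk 'NF { count[$0]++ }    # NF is zero on blank lines, so they are skipped
         END { for (name in count) print count[name], name }' "$@" |
    LC_ALL=C sort -k1,1nr -k2  # count descending, then name ascending
}

# Usage, with the sample names from the first post:
printf '%s\n' Albert Albert Charles Bernard Albert David Edward Charles |
    freq_table_portable
```

The for-in loop still visits names in an arbitrary order; it is the two sort keys that fix the final order, so the result should be the same under mawk, BusyBox awk or gawk.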
 
Old 11-03-2017, 06:33 PM   #5
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Adding the highlighted line (the PROCINFO assignment) to your original code:
Code:
awk '{NameCount[$0]++}  \
  END{
       PROCINFO["sorted_in"]="@val_num_desc"
       for (Name in NameCount)
          print NameCount[Name],Name
     }' $InFile >$OutFile2
Yields:
Code:
3 Albert
2 Charles
1 Edward
1 David
1 Bernard
From here: https://www.gnu.org/software/gawk/ma...-sorted_005fin
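
One thing to keep in mind: PROCINFO["sorted_in"] is a global setting, so it stays in effect for every later for-in loop in the same program. In a longer script it may be worth saving and restoring it. A rough sketch, assuming gawk is installed; the function name freq_table_gawk and the variable names had_order and saved_order are placeholders of my own:

```shell
#!/bin/sh
# Hypothetical helper; requires gawk, since PROCINFO["sorted_in"] is a gawk extension.
freq_table_gawk() {
    gawk '
        NF { count[$0]++ }
        END {
            # Remember whether a traversal order was already configured.
            had_order = ("sorted_in" in PROCINFO)
            if (had_order)
                saved_order = PROCINFO["sorted_in"]

            PROCINFO["sorted_in"] = "@val_num_desc"   # highest count first
            for (name in count)
                print count[name], name

            # Put things back so that later for-in loops are unaffected.
            if (had_order)
                PROCINFO["sorted_in"] = saved_order
            else
                delete PROCINFO["sorted_in"]
        }' "$@"
}
```

It could be called as freq_table_gawk "$InFile" >"$OutFile2". In a short END block like the one above, the save and restore is not strictly needed; it only matters once other loops follow.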
 
Old 11-03-2017, 06:51 PM   #6
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,814

Original Poster
Quote:
Originally Posted by norobro
Code:
awk '{NameCount[$0]++}  \
  END{
       PROCINFO["sorted_in"]="@val_num_desc"
       for (Name in NameCount)
          print NameCount[Name],Name
     }' $InFile >$OutFile2
Yields:
Code:
3 Albert
2 Charles
1 Edward
1 David
1 Bernard
Lovely! Thank you!

Daniel B. Martin

.
 
Old 11-03-2017, 10:16 PM   #7
Turbocapitalist
LQ Guru
 
Registered: Apr 2005
Distribution: Linux Mint, Devuan, OpenBSD
Posts: 5,523
Blog Entries: 3

The tuning with PROCINFO is a very nice tip. The values listed there can even be passed as the third argument to asort() or asorti():

Code:
n = asorti(NameCount, NameSorted, "@val_num_desc")
or

Code:
n = asorti(NameCount, NameSorted, "@val_num_asc")
See "man gawk"

That part of the manual also describes the arguments that gawk passes to custom sorting functions, if they are used. A custom sorting function is needed if you want the output sorted by frequency and then, in the case of a tie, by name:

Code:
#!/usr/bin/awk -f

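function sort_by_value(i1, v1, i2, v2) {
        # Higher counts sort first (descending by value).
        if (v1 < v2)
                return 1
        else if (v1 > v2)
                return -1
        # Counts are tied: fall back to ascending order by name.
        if (i1 > i2)
                return 1
        else if (i1 < i2)
                return -1
        return 0
}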

/^[[:alnum:]]/ { NameCount[$0]++ }

END {
        n = asorti(NameCount, NameSorted, "sort_by_value")
        for ( i=1; i<=n; i++ ) {
                print NameCount[NameSorted[i]], NameSorted[i]
        }
}
The /^[[:alnum:]]/ pattern is needed to keep blank lines, if there are any, out of the results.

Last edited by Turbocapitalist; 11-03-2017 at 10:17 PM.
 
Old 11-04-2017, 10:51 AM   #8
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,814

Original Poster
Quote:
Originally Posted by Turbocapitalist
The tuning with PROCINFO is a very nice tip. The values listed there can even be passed to asort() or asorti().
This language construct streamlines the code.

I tried to "rep" you but LQ software requires me to "spread it around." That's an annoying limitation. A good post deserves recognition!

Daniel B. Martin
 
  



Tags
asort, asorti, awk, sorting

