LinuxQuestions.org
Old 01-17-2010, 01:55 PM   #1
staticd
LQ Newbie
 
Registered: Jan 2010
Distribution: _MANY_
Posts: 9

Rep: Reputation: 0
gawk string sorting


I have a text file containing only capital letters, with no spaces between them. I wanted to write a gawk script that pulls out only the letters that occur in each string and prints them to a new file. After much strife with the asort() function, which produced unpredictable results (gawk version 3.1.3; supposedly this is fixed in version 3.1.7), I wrote a script that gives me reliable and consistent results, so I wanted to share.

Here's the pseudo-code:

Code:
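awk 'BEGIN {
    FS = ""     # empty FS (a gawk extension): each character is its own field
    RS = "\n"
    # letters[65..90] holds "A".."Z"
    for (i = 65; i <= 90; i++)
        letters[i] = sprintf("%c", i)
}
{
    # reset the per-line counters
    for (j = 65; j <= 90; j++)
        match_letters[j] = 0
    # count how often each letter occurs on this line
    for (k = 1; k <= NF; k++)
        for (m = 65; m <= 90; m++)
            if ($k == letters[m])
                match_letters[m]++
    # print every letter that occurred, in alphabetical order
    for (n = 65; n <= 90; n++)
        if (match_letters[n] > 0)
            printf "%c", n
    printf "\n"
}' input_file > output_file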
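The idea is that you have a string with no spaces, you only want to know which letters occur in it, and you want them printed in alphabetical order. The main obstacle in awk is that there is no built-in function that returns a character's integer code, so the script loops over the codes 65 to 90 (A to Z) and turns each one into a character with sprintf("%c", ...) instead.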
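
If you do need a character's integer code, one common workaround is to build the lookup table yourself by running sprintf("%c", n) over a range of codes and storing the reverse mapping. The sketch below is only an illustration under that assumption; the function name char_codes and the array name ord are made up for this example, and the table only covers printable ASCII (32 to 126).

```shell
# Hypothetical helper: prints each character of its first argument
# together with its ASCII code.
char_codes() {
    awk -v s="$1" '
    BEGIN {
        # Build the reverse of sprintf("%c", n): character -> integer code.
        for (n = 32; n <= 126; n++)
            ord[sprintf("%c", n)] = n
        # Look up every character of the string passed in via -v.
        for (i = 1; i <= length(s); i++) {
            ch = substr(s, i, 1)
            print ch, ord[ch]
        }
    }'
}

char_codes "AZ"
```

For the input "AZ" this prints "A 65" and "Z 90", one pair per line.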

Last edited by staticd; 02-09-2010 at 10:51 PM. Reason: correct code format
 
Old 01-19-2010, 02:44 AM   #2
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Rep: Reputation: 31
Did you just want to count how many times certain characters appear in a file?


% cat a.awk
BEGIN {
    FS = ""
    RS = "\n"
}

{
    for (i = 1; i <= NF; i++) letter_cnt[$i]++
}

END {
    n = length(your_str)
    for (i = 1; i <= n; i++) {
        ch = substr(your_str, i, 1)
        if (ch in letter_cnt) print "'" ch "' : " letter_cnt[ch]
    }
}


% gawk -v your_str="ABC" -f a.awk inputfile
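
To try the counting idea on a small sample, here is a self-contained sketch. It is a variation, not the script above verbatim: it walks each line with substr() instead of relying on an empty FS, so it should also run on awks that do not split on an empty separator. The wrapper name count_chars and the variable q (holding a single quote) are assumptions made for this example.

```shell
# Hypothetical wrapper: first argument is the set of characters to report,
# any remaining arguments are input files (stdin if none are given).
count_chars() {
    wanted=$1
    shift
    awk -v your_str="$wanted" -v q="'" '
    {
        # Count every character on the line, one substr() at a time.
        for (i = 1; i <= length($0); i++)
            letter_cnt[substr($0, i, 1)]++
    }
    END {
        # Report only the requested characters, in the order they were given.
        n = length(your_str)
        for (i = 1; i <= n; i++) {
            ch = substr(your_str, i, 1)
            if (ch in letter_cnt)
                print q ch q " : " letter_cnt[ch]
        }
    }' "$@"
}

printf 'ABBA\nCAB\n' | count_chars "ABX"
```

With the two sample lines this reports 'A' : 3 and 'B' : 3; X never occurs in the input, so it is skipped.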
 
Old 01-19-2010, 03:06 AM   #3
Jerry Mcguire
Member
 
Registered: Jul 2009
Location: Hong Kong SAR
Distribution: RedHat, Fedora
Posts: 201

Rep: Reputation: 31
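Or this?

BEGIN {
    FS = ""
    n = length(your_str)
    for (i = 1; i <= n; i++) target_ch[substr(your_str, i, 1)] = 0
}

{
    # %s rather than %c: a field that looks like a number would be
    # printed by %c as the character with that code
    for (i = 1; i <= NF; i++) if ($i in target_ch) printf "%s", $i
    printf "\n"
}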
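
Note that the filter above prints matching characters in input order, repeats included. If the goal is each letter once and in a fixed order, one possible variation is to record which characters occur on the line and then walk your_str itself. This is a sketch under that assumption; the name letters_present is hypothetical, and substr() is used instead of an empty FS for portability.

```shell
# Hypothetical helper: for each input line, prints the characters of the
# first argument that occur on that line, once each, in the argument order.
letters_present() {
    wanted=$1
    shift
    awk -v your_str="$wanted" '
    {
        split("", seen)                       # clear the set for this line
        for (i = 1; i <= length($0); i++)     # record every character seen
            seen[substr($0, i, 1)] = 1
        out = ""
        for (i = 1; i <= length(your_str); i++) {
            ch = substr(your_str, i, 1)
            if (ch in seen) out = out ch      # keep wanted characters only
        }
        print out
    }' "$@"
}

printf 'HELLOWORLD\nZEBRA\n' | letters_present "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
```

Passing the alphabet as the wanted set gives DEHLORW and ABERZ for the two sample lines. A character that is repeated in the first argument would be printed once per repeat.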
 
Old 01-19-2010, 06:12 AM   #4
konsolebox
Senior Member
 
Registered: Oct 2005
Distribution: Gentoo, Slackware, LFS
Posts: 2,248
Blog Entries: 8

Rep: Reputation: 235
I just found an awk version of a favorite algorithm of mine - quicksort. I learned the algorithm from a book and I ported it to my bash scripts.

http://en.literateprograms.org/Quicksort_%28AWK%29
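
For readers who want to see the shape of a quicksort in awk without following the link, here is a minimal sketch. It is not the code from the linked page: it is a plain recursive quicksort with the last element as pivot, applied to the characters of each line, followed by a pass that drops adjacent duplicates. The names sort_letters and qsort are chosen for this example only.

```shell
# Hypothetical helper: prints the distinct characters of each input line
# in sorted order, using a hand-written quicksort instead of asort().
sort_letters() {
    awk '
    # Recursive quicksort on arr[lo..hi], with arr[hi] as the pivot.
    # i, last and tmp are local variables.
    function qsort(arr, lo, hi,    i, last, tmp) {
        if (lo >= hi) return
        last = lo
        for (i = lo; i < hi; i++)
            if (arr[i] < arr[hi]) {
                tmp = arr[i]; arr[i] = arr[last]; arr[last] = tmp
                last++
            }
        # Move the pivot between the smaller and the larger elements.
        tmp = arr[hi]; arr[hi] = arr[last]; arr[last] = tmp
        qsort(arr, lo, last - 1)
        qsort(arr, last + 1, hi)
    }
    {
        n = length($0)
        for (i = 1; i <= n; i++)
            chars[i] = substr($0, i, 1)
        qsort(chars, 1, n)
        # After sorting, duplicates are adjacent, so skip repeats.
        out = ""
        for (i = 1; i <= n; i++)
            if (i == 1 || chars[i] != chars[i - 1])
                out = out chars[i]
        print out
    }' "$@"
}

printf 'HELLOWORLD\nZEBRA\n' | sort_letters
```

This prints DEHLORW and ABERZ for the two sample lines. Unlike the fixed A to Z loop, it works for any characters, at the cost of sorting every line.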
 
Old 01-19-2010, 06:29 AM   #5
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Staticd: Your pseudo-code would be more readable if you used indentation.
 
Old 01-19-2010, 08:05 AM   #6
staticd
LQ Newbie
 
Registered: Jan 2010
Distribution: _MANY_
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Staticd: Your pseudo-code would be more readable if you used indentation.
Indeed, I used the appropriate indentation but it must have been lost in translation. I tried several times and the indentation tags never held. I think I know what I did wrong now. The indentation tags have to go before and after each line that must be indented?
 
Old 01-21-2010, 02:13 AM   #7
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
Spaces and tabs are preserved in the CODE blocks.
 
Old 01-21-2010, 09:53 AM   #8
staticd
LQ Newbie
 
Registered: Jan 2010
Distribution: _MANY_
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Spaces and tabs are preserved in the CODE blocks.
got it.
 
Old 01-21-2010, 09:55 AM   #9
staticd
LQ Newbie
 
Registered: Jan 2010
Distribution: _MANY_
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by konsolebox
I just found an awk version of a favorite algorithm of mine - quicksort. I learned the algorithm from a book and I ported it to my bash scripts.

http://en.literateprograms.org/Quicksort_%28AWK%29
Thank you! I will have to see if I can work this into my script...
 
Old 02-10-2010, 08:06 AM   #10
staticd
LQ Newbie
 
Registered: Jan 2010
Distribution: _MANY_
Posts: 9

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by jschiwal
Staticd: Your pseudo-code would be more readable if you used indentation.
Took me a while, but I got it. Thanks for being critical...
 
  


All times are GMT -5. The time now is 04:12 AM.
