LinuxQuestions.org
09-10-2008, 08:12 PM   #1
placem
Help with pattern matching, sorting data with awk/gawk or perl


I am new to awk/gawk. I am trying to re-sort some data.

Name District Start End GAP Status
Tom 6 326 332 0.24 1
Jane 6 362 368 0.24 1
Doe 6 878 884 0.24 1
Paul 6 934 940 0.24 1
Pete 6 2396 2402 0.441 1
Caleb 6 2458 2463 0.441 1
Peter 6 3153 3147 0.441 1
Pan 6 3198 3192 0.441 1
Jake 6 3270 3264 0.441 1
John 6 3497 3503 0.778 1
Chris 6 3675 3681 0.778 1
June 6 3896 3901 0.778 1
Apri 6 4031 4037 0.778 1
Jan 7 22346 22340 -0.101 -1
March 7 22662 22668 -0.101 -1
May 7 22800 22794 -0.101 -1
Dec 7 22885 22879 -0.101 -1
Feb 7 26927 26933 -0.281 -1

The list goes on. I would like to create a table summarizing this data for each district by grouping rows that share the same value in the GAP column. For each group I want the district, the range of Start and End values over which the GAP value is the same, and the corresponding Status. I also want to print the names from column 1 that fall in each range.

Example of output:

District Start End Range Status Names in range
6 326 940 326-940 1 Tom, Jane, Doe, Paul
6 2396 3264 2396-3264 1 Pete, Caleb, Peter, Pan, Jake

Thanks
Placem
 
09-10-2008, 11:32 PM   #2
chrism01
In Perl I'd use a hash keyed first on GAP. Under each GAP key, a hash of arrays (HoA) holds the names, and a hash of hashes (HoH) with named keys holds the min/max values encountered for that GAP.
Incidentally, min/max is equivalent to the range, so you don't need both.
Is it a given that neither Status nor District can change within a 'GAP'?
Your example seems to imply that.
Otherwise, you'd need to specify a rule for each occurrence.
How many records are there?
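
The question above about whether Status can change within a GAP group can be checked on the real file with a short awk pass. The following is only a sketch of one way to do that: the function name check_status is made up for illustration, and the third demo row is an invented row that deliberately conflicts with the two rows taken from the question.

```shell
# check_status: hypothetical helper name, not from the thread.
# Reads rows "Name District Start End GAP Status" on stdin and prints every
# (District, GAP) group that carries more than one distinct Status value.
check_status() {
  awk '
    $2 == "District" { next }              # skip the header line, if present
    {
      key = $2 SUBSEP $5
      if (!(key in status)) {
        status[key] = $6                   # first Status seen for this group
        order[++n] = key                   # remember first-seen order
      } else if (status[key] != $6) {
        bad[key] = 1                       # a different Status showed up
      }
    }
    END {
      for (i = 1; i <= n; i++) {
        if (order[i] in bad) {
          split(order[i], part, SUBSEP)
          print "district " part[1] ", GAP " part[2] ": mixed Status"
        }
      }
    }
  '
}

# Demo: the first two data rows are from the question; the "Bogus" row is
# invented so that the check has something to report.
check_status <<'EOF'
Name District Start End GAP Status
Tom 6 326 332 0.24 1
Jane 6 362 368 0.24 1
Bogus 6 400 406 0.24 -1
EOF
```

If this prints nothing for the real data, taking the first Status seen in each group is safe; otherwise a rule for the conflicting groups is needed.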
 
09-11-2008, 02:26 PM   #3
jan61
Hi,

I stored your sample in a file named data and wrote a little awk script for it. It is not fully functional; it only demonstrates one way to do it:
Code:
jan@jack:~/tmp/gap> cat sort_gap.awk
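# Group rows by (district, GAP); track min Start, max End and the names.
$2 == "District" { next }   # skip a header line, if the file has one
{
  key = $2 SUBSEP $5;
  if (!(key in names)) {
    # First row of a new group: record it in input order.
    if (!($2 in gaps)) {
      districts[++n_dist] = $2;
      gaps[$2] = $5;
    } else {
      gaps[$2] = gaps[$2] " " $5;
    }
    names[key] = $1;
    min_arr[key] = $3 + 0;
    max_arr[key] = $4 + 0;
    next;
  }
  names[key] = names[key] "," $1;
  if ($3 + 0 < min_arr[key]) min_arr[key] = $3 + 0;
  if ($4 + 0 > max_arr[key]) max_arr[key] = $4 + 0;
}
END {
  # Indexed loops keep input order; "for (k in arr)" order is unspecified.
  for (d = 1; d <= n_dist; d++) {
    district = districts[d];
    n_gaps = split(gaps[district], gap_vals, " ");
    for (g = 1; g <= n_gaps; g++) {
      gap = gap_vals[g];
      print district, min_arr[district,gap], "-", max_arr[district,gap], names[district,gap];
    }
  }
}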

jan@jack:~/tmp/gap> awk -f sort_gap.awk data
6 326 - 940 Tom,Jane,Doe,Paul
6 2396 - 3264 Pete,Caleb,Peter,Pan,Jake
6 3497 - 4037 John,Chris,June,Apri
7 22346 - 22879 Jan,March,May,Dec
7 26927 - 26933 Feb
hth
Jan
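
The script above leaves out the Status and Range columns from the layout requested in the first post. A possible extension is sketched below. It assumes that Status is constant within each (District, GAP) group and simply keeps the first one seen, and it prints groups in the order they first appear in the input. The function name summarize_gaps is made up for illustration.

```shell
# summarize_gaps: hypothetical helper name, not from the thread.
# Assumes Status is constant within each (District, GAP) group and keeps
# the first one seen; groups are printed in first-seen order.
summarize_gaps() {
  awk '
    $2 == "District" { next }                # skip the header line, if present
    {
      key = $2 SUBSEP $5
      if (!(key in names)) {                 # first row of a new group
        order[++n] = key
        dist[key] = $2; status[key] = $6
        lo[key] = $3 + 0; hi[key] = $4 + 0
        names[key] = $1
        next
      }
      names[key] = names[key] ", " $1
      if ($3 + 0 < lo[key]) lo[key] = $3 + 0 # smallest Start in the group
      if ($4 + 0 > hi[key]) hi[key] = $4 + 0 # largest End in the group
    }
    END {
      print "District Start End Range Status Names in range"
      for (i = 1; i <= n; i++) {
        k = order[i]
        print dist[k], lo[k], hi[k], lo[k] "-" hi[k], status[k], names[k]
      }
    }
  '
}

# Demo on the first six rows of the sample data from the question.
summarize_gaps <<'EOF'
Name District Start End GAP Status
Tom 6 326 332 0.24 1
Jane 6 362 368 0.24 1
Doe 6 878 884 0.24 1
Paul 6 934 940 0.24 1
Pete 6 2396 2402 0.441 1
Caleb 6 2458 2463 0.441 1
EOF
```

On the real file the call would be summarize_gaps < data. As in the script above, the range runs from the smallest Start to the largest End in each group, which is what the example output in the first post uses.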
 
  


All times are GMT -5.
