Help with pattern matching, sorting data with awk/gawk or perl
I am new to awk/gawk. I am trying to re-sort some data.
Name District Start End GAP Status
Tom 6 326 332 0.24 1
Jane 6 362 368 0.24 1
Doe 6 878 884 0.24 1
Paul 6 934 940 0.24 1
Pete 6 2396 2402 0.441 1
Caleb 6 2458 2463 0.441 1
Peter 6 3153 3147 0.441 1
Pan 6 3198 3192 0.441 1
Jake 6 3270 3264 0.441 1
John 6 3497 3503 0.778 1
Chris 6 3675 3681 0.778 1
June 6 3896 3901 0.778 1
Apri 6 4031 4037 0.778 1
Jan 7 22346 22340 -0.101 -1
March 7 22662 22668 -0.101 -1
May 7 22800 22794 -0.101 -1
Dec 7 22885 22879 -0.101 -1
Feb 7 26927 26933 -0.281 -1
The list goes on. I would like to create a table summarizing this data for each district by grouping the rows that share the same value in the GAP column. For each group I want the district, the range of Start and End over which the GAP value is the same, and the corresponding Status. I also want to print the names from column 1 that fall in each range.
Example of output:
District Start End Range Status Names in range
6 326 940 326-940 1 Tom, Jane, Doe, Paul
6 2396 3264 2396-3264 1 Pete, Caleb, Peter, Pan, Jake
In Perl I'd use a hash keyed first on GAP, with a hash of arrays (HoA) under that same key for the names and a hash of hashes (HoH) with named keys for the min/max values seen per GAP.
Incidentally, min/max is equivalent to the range, so you don't need both.
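Since the question asked about awk as well, here is a rough awk version of the same idea, as a sketch only. It assumes whitespace-separated input with a single header line, groups on District plus GAP, and takes Start from the first row and End from the last row of each group. That reproduces the 2396-3264 in the example output; a strict min/max would give 3270 instead, because some rows have Start greater than End. The function name summarize_gaps is made up for illustration.

```shell
# Sketch: summarise rows that share the same District and GAP.
# Assumed columns: Name District Start End GAP Status (header on line 1).
summarize_gaps() {
    awk '
    NR == 1 || NF < 6 { next }            # skip the header and short lines
    {
        key = $2 SUBSEP $5                # District + GAP identify a group
        if (!(key in first)) {            # first row seen for this group
            order[++n] = key              # remember first-seen order
            district[key] = $2
            first[key] = $3               # Start of the first row
            status[key] = $6
            names[key] = $1
        } else {
            names[key] = names[key] ", " $1
        }
        last[key] = $4                    # End of the most recent row
    }
    END {
        print "District Start End Range Status Names in range"
        for (i = 1; i <= n; i++) {
            k = order[i]
            printf "%s %s %s %s-%s %s %s\n", district[k], first[k], last[k], first[k], last[k], status[k], names[k]
        }
    }' "$@"
}
```

Call it as summarize_gaps data.txt, where data.txt is whatever file holds the table. If you really want min/max rather than first/last, compare $3 and $4 against stored values in the main block instead of overwriting them.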
Is it a given that neither Status nor District can change within a 'GAP'?
Your example seems to imply that.
Otherwise, you'd need to specify a rule for each occurrence.
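If you are not sure whether that holds for the full file, a quick check along these lines would tell you. Again only a sketch under the same assumptions about the column layout; the function name check_gap_consistency is hypothetical. It reports any row whose District or Status differs from the first row seen with the same GAP value, and exits non-zero if it finds one.

```shell
# Sketch: report rows whose District or Status differs from the first
# row seen with the same GAP value. Exit status is 1 if any are found.
check_gap_consistency() {
    awk '
    NR == 1 || NF < 6 { next }            # skip the header and short lines
    {
        gap = $5
        if (!(gap in district)) {         # first row for this GAP value
            district[gap] = $2
            status[gap] = $6
        } else if (district[gap] != $2 || status[gap] != $6) {
            printf "GAP %s: %s has District %s, Status %s (expected %s, %s)\n", gap, $1, $2, $6, district[gap], status[gap]
            bad = 1
        }
    }
    END { exit (bad ? 1 : 0) }' "$@"
}
```

No output and a zero exit status means every GAP value maps to a single District and Status, so no extra rule is needed.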
How many records are there?