LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

04-25-2012, 06:39 AM   #1
hubleo
LQ Newbie

Registered: Apr 2012
Posts: 2

Reputation: Disabled
Selecting lowest and highest values in columns 1 and 2, based on subsets in column 3


Hi,

I have a file with the following columns:

Code:
      361459	447394	  CHL1
      290282	290282	  CHL1
      361459	447394	  CHL1
      361459	447394	  CHL1
      178352861	178363529 AGA
      178352861	178363529 AGA
      178363657	178363657 AGA
Take CHL1 as an example: across all lines that have CHL1 in column 3, I want to select the lowest value in column 1 and the highest value in column 2. These two values should then be combined into a single line that looks like this:
Code:
290282 447394 CHL1

By the same principle, the line for AGA should look like this:
Code:
178352861 178363657 AGA

The whole file contains about 500 unique names in column 3; CHL1 and AGA are just 2 of those 500. Some sort of loop to run through each of these names would be perfect. I'm very new to Linux and have a bit of knowledge of very basic loops using awk, sed, grep, etc., but I'm unsure how to manipulate the file to get the output described above.

Any help would be very much appreciated!
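As a quick check on that count, here is a small awk sketch that counts the distinct names in column 3. It assumes whitespace-separated columns as in the sample above; count_names is a made-up helper name, not something from the thread:

```shell
#!/bin/bash
# Count distinct names in column 3; seen[] records each name the first time.
# count_names is a hypothetical helper name; it reads the data on stdin.
count_names() {
  awk '!seen[$3]++ { n++ } END { print n + 0 }'
}

# Sample rows from the question (two distinct names, so this prints 2).
printf '%s\n' '361459 447394 CHL1' '290282 290282 CHL1' \
  '178352861 178363529 AGA' '178363657 178363657 AGA' | count_names
```

On the real data this would be run as `count_names < infile`, where infile stands for whatever the file is actually called.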
 
04-25-2012, 06:47 AM   #2
pan64
LQ Addict

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 23,685

Reputation: 7820
That can easily be done with awk.
Use two arrays, one for the minimum and one for the maximum values. You will then have the results as minval[AGA] and maxval[AGA], minval[CHL1] and maxval[CHL1], and so on.
I would try something like this:
Code:
awk ' 
# $3 is column 3
# if not exists minval[$3] then minval[$3] = $1
# else
# if $1 < minval[$3] then minval[$3] = $1

# if not exists maxval[$3] then maxval[$3] = $2
# else
# if $2 > maxval[$3] then maxval[$3] = $2

# print minval and maxval in loop
' filename
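For reference, one way to turn the pseudocode above into runnable awk is sketched below. It keeps pan64's minval/maxval array names; the wrapper function minmax_by_name and the sample rows are illustrative additions, and the columns are assumed to be whitespace-separated:

```shell
#!/bin/bash
# pan64's pseudocode as awk: minval/maxval hold, per name in column 3,
# the lowest column-1 and highest column-2 value seen so far.
# minmax_by_name is a hypothetical wrapper; it reads a file or stdin.
minmax_by_name() {
  awk '
  {
    # if not exists minval[$3], or $1 is smaller, then minval[$3] = $1
    if (!($3 in minval) || $1 + 0 < minval[$3]) minval[$3] = $1 + 0
    # if not exists maxval[$3], or $2 is larger, then maxval[$3] = $2
    if (!($3 in maxval) || $2 + 0 > maxval[$3]) maxval[$3] = $2 + 0
  }
  END {
    # print minval and maxval in a loop
    for (name in minval) print minval[name], maxval[name], name
  }' "$@"
}

# Sample rows from the question; sort -k3,3 gives a stable, name-sorted order.
printf '%s\n' '361459 447394 CHL1' '290282 290282 CHL1' \
  '178352861 178363529 AGA' '178363657 178363657 AGA' |
  minmax_by_name | sort -k3,3
```

Because awk's `for (name in ...)` loop visits keys in an unspecified order, the output is piped through `sort -k3,3` to get a predictable result. With a file it would be `minmax_by_name filename | sort -k3,3`.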

Last edited by pan64; 04-25-2012 at 08:19 AM.
 
1 member found this post helpful.
04-25-2012, 07:09 AM   #3
anon237
LQ Veteran

Registered: Sep 2003
Posts: 10,532

Reputation: 2405
Hi,

Have a look at this:
Code:
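#!/bin/bash

# larr holds the lowest column-1 value and harr the highest column-2 value
# seen so far for each name in column 3.
awk '{
if ( larr[$3] == "" ) {
  # first line for this name: start with its own values
  larr[$3] = $1
  harr[$3] = $2
}
else {
  if ( larr[$3] >= $1 ) { larr[$3] = $1 }
  if ( harr[$3] <= $2 ) { harr[$3] = $2 }
}
}
# for (x in larr) visits the names in an unspecified order
END { for (x in larr)
     print larr[x], harr[x], x
}' infile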
Test run on the data provided:
Code:
$ ./low.high.sh 
290282 447394 CHL1
178352861 178363657 AGA
Hope this helps.
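One thing to be aware of: `for (x in larr)` visits the names in an unspecified order, so the CHL1-before-AGA order in the test run is not guaranteed. Below is a sketch of a variant that prints names in the order they first appear in the input. The order array, the count counter and the low_high_in_order function are illustrative additions, not part of the script above:

```shell
#!/bin/bash
# Same min/max idea, but names are printed in first-seen order.
# low_high_in_order is a hypothetical function; it reads a file or stdin.
low_high_in_order() {
  awk '
  !($3 in larr) {
    # first line for this name: remember it and start from its own values
    order[++count] = $3
    larr[$3] = $1; harr[$3] = $2
    next
  }
  {
    if ($1 + 0 < larr[$3] + 0) larr[$3] = $1
    if ($2 + 0 > harr[$3] + 0) harr[$3] = $2
  }
  END {
    for (i = 1; i <= count; i++)
      print larr[order[i]], harr[order[i]], order[i]
  }' "$@"
}

# Sample rows from the question: prints the CHL1 line, then the AGA line.
printf '%s\n' '361459 447394 CHL1' '290282 290282 CHL1' \
  '178352861 178363529 AGA' '178363657 178363657 AGA' | low_high_in_order
```

If alphabetical order is preferred instead, piping the original script's output through `sort -k3,3` is the simpler fix.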
 
2 members found this post helpful.
04-25-2012, 07:22 AM   #4
hubleo
LQ Newbie

Registered: Apr 2012
Posts: 2

Original Poster
Reputation: Disabled

Hi, thank you both for your quick replies! I didn't think I'd get responses that quickly!

pan64, I tried your method but it threw up a couple of errors - I probably needed to change some extra things in the script!
Drunna, that was fantastic. I just ran it on my data and the whole 500 lines I needed came out perfectly. Many thanks for your help
 
04-25-2012, 07:24 AM   #5
anon237
LQ Veteran

Registered: Sep 2003
Posts: 10,532

Reputation: 2405
You're welcome
 
04-25-2012, 07:25 AM   #6
whizje
Member

Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Reputation: 141
Maybe someone knows how to suppress all the newlines in awk except the last one.
Code:
bash-4.1$ grep "CHL1" test | sort -n | awk 'NR == 1 { print $1" "}END{ print $2" "$3}'| tr -d "\n";echo
290282 447394 CHL1
bash-4.1$
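A possible answer, assuming a POSIX-style awk: `printf` does not add a newline, so printing the first value with `printf` keeps everything on one line and removes the need for `tr -d "\n"` and the trailing `echo`. The function name join_first_last is made up for this sketch:

```shell
#!/bin/bash
# printf prints without a trailing newline, so the first line's column 1
# and the last line's columns 2 and 3 end up on one output line.
# join_first_last is a hypothetical name; it reads sorted rows on stdin.
join_first_last() {
  awk 'NR == 1 { printf "%s ", $1 } END { print $2, $3 }'
}

# Sample rows from the question; grep and sort -n as in the post above.
printf '%s\n' '361459 447394 CHL1' '290282 290282 CHL1' \
  '361459 447394 CHL1' '178352861 178363529 AGA' |
  grep "CHL1" | sort -n | join_first_last
```

Two caveats with this pipeline: `grep "CHL1"` also matches longer names that contain CHL1, and after `sort -n` the last line holds the highest column-1 value, whose column 2 is not necessarily the highest column-2 value for that name (rows `100 500 X` and `200 300 X` would give 300, not 500).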

Last edited by whizje; 04-25-2012 at 07:43 AM.
 
04-25-2012, 07:40 AM   #7
whizje
Member

Registered: Sep 2008
Location: The Netherlands
Distribution: Slackware64 current
Posts: 594

Reputation: 141
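The names could be captured with the following (uniq -f2 skips the first two fields and only merges adjacent lines, so this relies on the file being grouped by name):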
Code:
bash-4.1$ uniq -f2 test
      361459	447394	  CHL1
      178352861	178363529 AGA
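Since the original question asked for a loop over the names, here is a sketch that combines the two ideas: collect the distinct names, then reduce each group in awk. It uses `sort -u` rather than `uniq` (which only merges adjacent lines) and matches column 3 exactly with `$3 == n` instead of grep. The function name low_high_per_name and the temporary sample file are illustrative:

```shell
#!/bin/bash
# For each distinct name in column 3, print: lowest col 1, highest col 2, name.
# low_high_per_name is a hypothetical helper; it takes the data file as $1.
low_high_per_name() {
  file=$1
  awk '{ print $3 }' "$file" | sort -u | while read -r name; do
    awk -v n="$name" '
      $3 == n {
        # first matching line: start from its own values
        if (count++ == 0) { lo = $1 + 0; hi = $2 + 0 }
        if ($1 + 0 < lo) lo = $1 + 0
        if ($2 + 0 > hi) hi = $2 + 0
      }
      END { if (count) print lo, hi, n }' "$file"
  done
}

# Sample rows from the question, written to a temporary file.
sample=$(mktemp)
printf '%s\n' '361459 447394 CHL1' '290282 290282 CHL1' \
  '178352861 178363529 AGA' '178363657 178363657 AGA' > "$sample"
low_high_per_name "$sample"
rm -f "$sample"
```

This rereads the file once per name (about 500 passes for the full data), so the single-pass array approach from earlier in the thread will be faster on large files; the loop version is mainly easier to follow step by step.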
 
04-25-2012, 08:08 AM   #8
pan64
LQ Addict

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 23,685

Reputation: 7820
Quote:
Originally Posted by hubleo
Hi, thank you both for your quick replies! I didn't think I'd get responses that quickly!

pan64, I tried your method but it threw up a couple of errors - I probably needed to change some extra things in the script!
Drunna, that was fantastic. I just ran it on my data and the whole 500 lines I needed came out perfectly. Many thanks for your help
druuna just implemented what I explained (exactly), but didn't use my pseudo code.

Last edited by pan64; 04-25-2012 at 08:20 AM.
 
04-25-2012, 08:17 AM   #9
anon237
LQ Veteran

Registered: Sep 2003
Posts: 10,532

Reputation: 2405
Hi pan64,
Quote:
Originally Posted by pan64
drunna just implemented what I explained (exactly).
drunna didn't do anything.... druuna did, but he did not use your pseudo code

BTW: There's an error in your pseudo code:
Code:
# if not exists maxval[$3] then minval[$3] = $2
You might want to fix that for future reference.
 
04-25-2012, 08:30 AM   #10
pan64
LQ Addict

Registered: Mar 2012
Location: Hungary
Distribution: debian/ubuntu/suse ...
Posts: 23,685

Reputation: 7820
Sorry, fixed.
 
  


All times are GMT -5.
