LinuxQuestions.org
Go Back   LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
Programming This forum is for all programming questions.
The question does not have to be directly related to Linux and any language is fair game.

06-08-2006, 12:04 PM #1
moo-cow
Member
 
Registered: Mar 2006
Distribution: Debian
Posts: 105

Rep: Reputation: 26
Sort File by Field - but with a Twist! ;)


I have a file like this:

saegh iubiae iabezu PATTERN cbizge atvet faw
efenmi PATTERN beub htp rubwi riwbr
iauebiubg ubneiu PATTERN aoihgr zvezg
...

I want to sort the lines of the file with the field to the right of PATTERN as the sort key. The correctly sorted example file would look like this:

iauebiubg ubneiu PATTERN aoihgr zvezg
efenmi PATTERN beub htp rubwi riwbr
saegh iubiae iabezu PATTERN cbizge atvet faw

Any idea how to accomplish this?
Thanks!
moo-cow
 
06-08-2006, 01:17 PM #2
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
You could use the following to copy the expression following PATTERN to the beginning of each line:
Code:
sed -e "s/\(.*PATTERN \([^ ]\+\).*\)/\2 \1/"
Then sort the result and remove the copied part again using "cut" (cut -d' ' -f2- keeps everything from the second field on).
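A minimal sketch of the complete sed / sort / cut pipeline on the sample lines from the question. It assumes every line contains "PATTERN" followed by a single space and a key word; a line without PATTERN would pass through sed unchanged and then lose its first word to cut. The file name input.txt is hypothetical, [^ ][^ ]* is the portable spelling of GNU sed's [^ ]\+, and LC_ALL=C is an added choice so that sort compares plain bytes instead of locale collation rules.

```shell
#!/bin/sh
# Hypothetical sample file built from the lines in the question.
cat > input.txt <<'EOF'
saegh iubiae iabezu PATTERN cbizge atvet faw
efenmi PATTERN beub htp rubwi riwbr
iauebiubg ubneiu PATTERN aoihgr zvezg
EOF

# 1. sed copies the word after "PATTERN " to the front of the line
#    (\2 is the key, \1 is the whole original line).
# 2. sort orders the decorated lines, i.e. by the copied key first.
# 3. cut drops the copied key: everything after the first space is kept.
sed -e 's/\(.*PATTERN \([^ ][^ ]*\).*\)/\2 \1/' input.txt | LC_ALL=C sort | cut -d' ' -f2-
```

On the sample this prints the three lines in the order the question asks for (aoihgr, beub, cbizge). Because the whole original line follows the key, lines with the same key stay together and are never duplicated.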
 
06-08-2006, 01:19 PM #3
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Nice little challenge there.

This works but may not be the most elegant solution:

Code:
for NEXTWORD in $(awk -FPATTERN '{print $2}' filename | awk '{print $1}' | sort)
do grep " PATTERN $NEXTWORD " filename
done
In the above "filename" would be replaced by whatever your file's name is. "PATTERN" would be whatever your pattern is.

NEXTWORD is an arbitrary name for the variable - you can call it BILLYBOB or anything else you prefer.

awk -FPATTERN '{print $2}' prints whatever follows PATTERN on each line (strictly, the text between the first PATTERN and the next one, or the end of the line). This of course starts with the next word following PATTERN. (-F tells awk to use PATTERN as the field separator instead of white space.)

This is then piped into a second awk, which prints only the first word of that output, i.e. the word you want to sort on. (This awk splits on white space because, as noted above, that is awk's default. If your sort key itself could contain white space, you'd have to pick a different delimiter.)

It then sorts the list of next words alphabetically using the sort command.
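To see what the loop actually iterates over, here is a small sketch that runs just the key-extraction stages on the sample lines from the question. The file name input.txt is a hypothetical stand-in for "filename".

```shell
#!/bin/sh
# Hypothetical sample file built from the lines in the question.
cat > input.txt <<'EOF'
saegh iubiae iabezu PATTERN cbizge atvet faw
efenmi PATTERN beub htp rubwi riwbr
iauebiubg ubneiu PATTERN aoihgr zvezg
EOF

# Stage 1: everything after PATTERN (note the leading space, e.g. " cbizge atvet faw").
awk -FPATTERN '{print $2}' input.txt

# Stages 2 and 3: keep only the first word of that remainder, then sort the keys.
awk -FPATTERN '{print $2}' input.txt | awk '{print $1}' | sort
```

The second command yields the keys aoihgr, beub and cbizge, one per line, which become the successive values of NEXTWORD.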

Finally, for each word produced by the awk/awk/sort combo, it greps for any line in which that word directly follows your PATTERN (and for good measure puts a space before PATTERN and after the word so it doesn't accidentally match a word embedded in a longer one).
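One side effect of the space padding is that a line is missed when the key is its last word (no trailing space) or when PATTERN starts the line (no leading space). A possible variant, offered as an assumption rather than part of the original answer, uses grep -E so that a line start or end can stand in for the spaces. It also assumes the keys are plain words without regex metacharacters; input.txt is a hypothetical test file.

```shell
#!/bin/sh
# Hypothetical test lines: key in the middle, key at line end, PATTERN at line start.
printf '%s\n' 'efenmi PATTERN beub htp' 'xyz PATTERN beub' 'PATTERN beub qq' > input.txt

NEXTWORD=beub
# Original form: needs a space on both sides, so only the first line matches.
grep " PATTERN $NEXTWORD " input.txt
# Variant: (^| ) and ( |$) also accept the start or end of the line.
grep -E "(^| )PATTERN $NEXTWORD( |\$)" input.txt
```

The first grep finds one line; the variant finds all three while still rejecting keys embedded in longer words.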

This will work fine as long as each word following PATTERN occurs only once in your file. If one occurs twice, those lines will still be ordered correctly relative to the other keys, but the two lines themselves may not be in the order you want.
 
06-08-2006, 01:37 PM #4
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
A small remark on jlightner's solution: It seems to me that a NEXTWORD appearing twice will also make each corresponding line appear twice in the result, as the file will be grepped twice for NEXTWORD. You can avoid this by piping the output of "sort" through "uniq".
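A minimal sketch of the loop with that fix applied, so each distinct key is grepped only once. The sample data is a four-line file in which the key "beub" appears twice; input.txt is a hypothetical file name.

```shell
#!/bin/sh
# Hypothetical sample file with a repeated key ("beub").
cat > input.txt <<'EOF'
saegh iubiae iabezu PATTERN cbizge atvet faw
efenmi PATTERN beub htp rubwi riwbr
iauebiubg ubneiu PATTERN aoihgr zvezg
efenmi PATTERN beub faw zvezg
EOF

# uniq collapses the adjacent duplicate keys produced by sort,
# so the grep for "beub" runs once and prints both of its lines once each.
for NEXTWORD in $(awk -FPATTERN '{print $2}' input.txt | awk '{print $1}' | sort | uniq)
do
    grep " PATTERN $NEXTWORD " input.txt
done
```

This prints four lines: the aoihgr line, the two beub lines in their original file order, then the cbizge line. "sort -u" is a standard equivalent of "sort | uniq" here.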
 
06-11-2006, 06:12 PM #5
moo-cow
Member
 
Registered: Mar 2006
Distribution: Debian
Posts: 105

Original Poster
Rep: Reputation: 26
Works great, thanks for your help!
 
06-12-2006, 09:30 AM #6
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Quote:
Originally Posted by spirit receiver
A small remark on jlightner's solution: It seems to me that a NEXTWORD appearing twice will also make each corresponding line appear twice in the result, as the file will be grepped twice for NEXTWORD. You can avoid this by piping the output of "sort" through "uniq".
It won't appear twice unless it is in the file twice. I think you're confusing this with the standard "ps -ef |grep WORD" solution where you have to remember to grep out the word grep itself. As an FYI I had tested it against his example before posting it.

Restated: my solution can beat up your solution
 
06-12-2006, 09:48 AM #7
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
I was talking about the following effect:

This is the content of the file to be sorted:
Code:
saegh iubiae iabezu PATTERN cbizge atvet faw
efenmi PATTERN beub htp rubwi riwbr
iauebiubg ubneiu PATTERN aoihgr zvezg
efenmi PATTERN beub faw zvezg
Note that there are two lines with the key "beub". But if your script is applied, it will return four lines with that key:
Code:
iauebiubg ubneiu PATTERN aoihgr zvezg
efenmi PATTERN beub htp rubwi riwbr
efenmi PATTERN beub faw zvezg
efenmi PATTERN beub htp rubwi riwbr
efenmi PATTERN beub faw zvezg
saegh iubiae iabezu PATTERN cbizge atvet faw
Only if I add "uniq" as stated above does the output look as follows, which is probably what was intended:
Code:
iauebiubg ubneiu PATTERN aoihgr zvezg
efenmi PATTERN beub htp rubwi riwbr
efenmi PATTERN beub faw zvezg
saegh iubiae iabezu PATTERN cbizge atvet faw
To sum up: I win.
 
06-12-2006, 09:57 AM #8
MensaWater
LQ Guru
 
Registered: May 2005
Location: Atlanta Georgia USA
Distribution: Redhat (RHEL), CentOS, Fedora, CoreOS, Debian, FreeBSD, HP-UX, Solaris, SCO
Posts: 7,831
Blog Entries: 15

Rep: Reputation: 1669
Quote:
To sum up: I win.
Only in a Judo sort of way - you used my awk against me instead of your sed

Actually you made a good point. I was confused by you saying "NEXTWORD twice" because I was thinking you meant I used the variable twice - you meant the word the variable represented could have appeared twice.
 
06-12-2006, 11:26 AM #9
spirit receiver
Member
 
Registered: May 2006
Location: Frankfurt, Germany
Distribution: SUSE 10.2
Posts: 424

Rep: Reputation: 33
Quote:
Originally Posted by jlightner
Only in a Judo sort of way
I even considered using Voodoo at first, so you should be happy with that.
 
All times are GMT -5.