LinuxQuestions.org
Old 10-09-2009, 02:50 PM   #1
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Rep: Reputation: 0
Linux sorting unique help?


I am grepping through a file and want to keep only the unique entries, ordered by where they first appear in the file. For instance:

ABCDEFG
1234567

ABCDEFG
1234567

ABCDEFGH
234456

ABCDEFGH
234456

I want the output to be
ABCDEFG
1234567
ABCDEFGH
234456

How would I do this?
 
Old 10-09-2009, 03:14 PM   #2
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,576
Blog Entries: 31

Rep: Reputation: 1195
What have you tried?
 
Old 10-09-2009, 03:17 PM   #3
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 102
Quote:
Originally Posted by mbag102
I am grepping through a file and want to keep only the unique entries, ordered by where they first appear in the file. For instance:

ABCDEFG
1234567

ABCDEFG
1234567

ABCDEFGH
234456

ABCDEFGH
234456

I want the output to be
ABCDEFG
1234567
ABCDEFGH
234456

How would I do this?
If you really want what you say you want, then the order in which they are printed out is the same as their order in the file. But since you have posted, I know what you say you want is not what you really want, so you need to explain what you really want, in plain English.
 
Old 10-09-2009, 03:18 PM   #4
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by catkin
What have you tried?
I have tried using a simple grep -u, but that rearranges the ordering of my pairs. In some cases it will give me:

ABCDEFG
223456
ABCDEFGH
1234567

I need some sort of command that preserves order. I was thinking of using grep to go from line x to line y and pick out all the uniques in that range, but I'm not entirely sure how this would work, or whether it is even possible.

Last edited by mbag102; 10-09-2009 at 03:22 PM.
 
Old 10-09-2009, 03:20 PM   #5
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Original Poster
Rep: Reputation: 0
"If you really want what you say you want, then the order in which they are printed out is the same as their order in the file. But since you have posted, I know what you say you want is not what you really want, so you need to explain what you really want, in plain English."

I was told that grep -u doesn't care about the order in which lines are found. Is that true, or does it print the uniques in the order in which they are first discovered?

Last edited by mbag102; 10-09-2009 at 03:21 PM.
 
Old 10-09-2009, 03:40 PM   #6
ramram29
Member
 
Registered: Jul 2003
Location: Miami, Florida, USA
Distribution: Debian
Posts: 848
Blog Entries: 1

Rep: Reputation: 47
I've done this before. You can create a while loop to read this content from a text file; loop once for the first occurrence (ABCDEFG) and a second time for the second occurrence (1234567). Reset the counter with an if statement after the second loop, then keep going until you reach the end of the text file. Put the results in variables, then append them to another temporary text file, with the two results side by side, separated by a space. At the end, run the sort command.
 
Old 10-09-2009, 03:45 PM   #7
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by ramram29
I've done this before. You can create a while loop to read this content from a text file; loop once for the first occurrence (ABCDEFG) and a second time for the second occurrence (1234567). Reset the counter with an if statement after the second loop, then keep going until you reach the end of the text file. Put the results in variables, then append them to another temporary text file, with the two results side by side, separated by a space. At the end, run the sort command.
Oh, that makes sense. Thanks a lot, ramram. I'll try that out.
 
Old 10-09-2009, 04:08 PM   #8
johnsfine
LQ Guru
 
Registered: Dec 2007
Distribution: Centos
Posts: 5,286

Rep: Reputation: 1181
There are plenty of examples of sed or other methods for adding and removing line numbers from files. (I don't use such things myself so I don't know which is best, but many versions are easy to find with google).
Then use sort twice.

1) Add line numbers.
2) sort -u --key=2
3) sort -n
4) Remove the line numbers.
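
The four steps above can be sketched as a single pipeline. This is only a minimal sketch under stated assumptions: the function name dedupe_keep_order is hypothetical, cat -n and cut are one choice among many for adding and removing the line numbers, and the approach relies on GNU sort keeping the first line of each run of equal keys when -u is used (other sort implementations may not guarantee that).

```shell
# Hypothetical helper: remove duplicate lines from stdin while keeping
# each line at the position of its first appearance.
dedupe_keep_order() {
    # 1) Add line numbers ("     1<TAB>text").
    cat -n |
        # 2) Sort on the text only (field 2 onward, TAB-separated) and keep
        #    one line per distinct text. -s keeps the sort stable, so the
        #    survivor is assumed to be the earliest occurrence (GNU sort).
        LC_ALL=C sort -t "$(printf '\t')" -s -u -k2 |
        # 3) Restore the original order using the line numbers.
        LC_ALL=C sort -n |
        # 4) Remove the line numbers again.
        cut -f2-
}
```

Usage would look like "dedupe_keep_order < infile > outfile". Note that this treats every line independently, so it suits the first example in the thread better than multi-line groups.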
 
Old 10-09-2009, 04:10 PM   #9
john test
Member
 
Registered: Jul 2008
Distribution: ubuntu 9.10
Posts: 527
Blog Entries: 1

Rep: Reputation: 35
Can you read each line into sequentially numbered variables?
Append variable 1 to a file.
Compare variable 2 to variable 1; if not equal, append it to the file.
Compare variable 3 to variables 1 and 2; if not equal to either, append it to the file.
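
The idea above (append a line only if it differs from every line already kept) can be sketched as a loop. This is a minimal sketch, not necessarily what the poster had in mind: instead of numbered variables it checks each new line against the output file with grep, and the function name dedupe_lines and its two arguments are hypothetical.

```shell
# Hypothetical helper: copy lines from input file $1 to output file $2,
# skipping any line that has already been written.
dedupe_lines() {
    : > "$2"    # start with an empty output file
    # Read line by line; the "|| [ -n ... ]" part also handles a final
    # line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # -F: fixed string, -x: match the whole line, -q: quiet.
        if ! grep -Fxq -e "$line" "$2"; then
            printf '%s\n' "$line" >> "$2"
        fi
    done < "$1"
}
```

Because the output file is rescanned for every input line, this is slow on large files, but it preserves the order of first appearance without any sorting.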
 
Old 10-09-2009, 05:20 PM   #10
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Let me rephrase the question. I don't think I have given you guys a proper description of what is really going on.

I am ultimately trying to sort multiple lines that stay together.

Here is a better example.

John Doe
Phone number = (123) 456 7890
Address = 1 main street USA


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA


Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

I need the output:


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

Does this make a little more sense? I am grouping multiple lines together, and the groups must all be sorted uniquely.
 
Old 10-09-2009, 08:04 PM   #11
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 333

Rep: Reputation: 141
From http://sed.sourceforge.net/sed1line.txt
Code:
SELECTIVE DELETION OF CERTAIN LINES:
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'
But trying it gave me the error
'sed: -e expression #1, char 34: Invalid range end'

However, after the range [ -~] was replaced with the character class [[:print:]], it worked with both of your examples:
Code:
sed -n 'G; s/\n/&&/; /^\([[:print:]]*\n\).*\n\1/d; s/\n//; h; P'
EDIT
Problem: if two otherwise unique line-groups contain an identical line, that line is deleted from one of the line-groups. So to use this sed method, each line-group has to be temporarily put onto one line, for example as comma-separated fields.
Code:
sed -n '/./{:a N;/\n$/!s/\n/,/; ta; s/\n$//;p}' infile |
sed -n 'G; s/\n/&&/; /^\([[:print:]]*\n\).*\n\1/d; s/\n//; h; P'|
sed 's/,\|$/\n/g' > outfile
The first sed creates comma-separated fields:
John Doe,Phone number = (123) 456 7890,Address = 1 main street USA
John Doe,Phone number = (123) 456 7890,Address = 1 main street USA
Jane Doe,Phone number = (098) 765 4321,Address = 9 side street USA
Jane Doe,Phone number = (098) 765 4321,Address = 9 side street USA

The second sed deletes duplicate lines.
The third sed converts the commas back to newlines and puts a blank line after each line-group:-
John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA
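
For comparison, the same line-group de-duplication can be sketched with awk instead of sed. This is a different technique from the pipeline above, offered only as a sketch: it assumes line-groups are separated by truly empty lines (no stray spaces), and the function name dedupe_blocks is hypothetical.

```shell
# Hypothetical helper: print each blank-line-separated group of lines
# only the first time it appears, in order of first appearance.
dedupe_blocks() {
    # RS = ""      : "paragraph mode", one record per line-group
    # ORS = "\n\n" : print a blank line after each group
    # !seen[$0]++  : true only the first time a given group is seen
    awk 'BEGIN { RS = ""; ORS = "\n\n" } !seen[$0]++' "$@"
}
```

Usage would look like "dedupe_blocks infile > outfile". Since whole groups are compared, two different groups that happen to share one identical line are both kept intact.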

Last edited by Kenhelm; 10-09-2009 at 11:45 PM.
 
Old 10-09-2009, 09:47 PM   #12
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 102
Quote:
Originally Posted by mbag102
Let me rephrase the question. I don't think I have given you guys a proper description of what is really going on.

I am ultimately trying to sort multiple lines that stay together.

Here is a better example.

John Doe
Phone number = (123) 456 7890
Address = 1 main street USA


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA


Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

I need the output:


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

Does this make a little more sense? I am grouping multiple lines together, and the groups must all be sorted uniquely.
That's more like it. Now I have two more questions -- (1) is the above a hypothetical example or is it the actual format of your file? If the latter, the problem is trivial to solve.

Question (2): as with all such sorts, you need to formally specify the criteria for the sort. Is it last name, first name, address, or something else?

The thing about computer programming is that you need to be precise in your thinking and your descriptions.
 
Old 10-10-2009, 01:12 AM   #13
mbag102
LQ Newbie
 
Registered: Oct 2009
Posts: 7

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by lutusp
That's more like it. Now I have two more questions -- (1) is the above a hypothetical example or is it the actual format of your file? If the latter, the problem is trivial to solve.

Question (2): as with all such sorts, you need to formally specify the criteria for the sort. Is it last name, first name, address, or something else?

The thing about computer programming is that you need to be precise in your thinking and your descriptions.

This is a hypothetical example. If I gave the exact example it would be very confusing for you to figure out exactly what was going on. The post above yours looks like it could be pretty close to what I am looking for. I wish I was a little better with awk and sed because I know they are very powerful tools for problems like this.
 
Old 10-10-2009, 01:17 AM   #14
jstephens84
Senior Member
 
Registered: Sep 2004
Location: Nashville
Distribution: Manjaro, RHEL, CentOS
Posts: 2,098

Rep: Reputation: 102
What about using the uniq command? This is what man says about uniq, which I believe might help you out.

uniq

Uniquify files, write out the unique lines from the given InputFile.
If an InputFile of `-' (or nothing) is given, then uniq will read from standard input.
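
One caveat worth checking before relying on uniq here: it only collapses duplicate lines that are adjacent. A minimal sketch with made-up input shows the difference:

```shell
# Adjacent duplicates are collapsed: prints "a" then "b".
printf 'a\na\nb\n' | uniq

# Non-adjacent duplicates are kept: prints "a", "b", "a".
printf 'a\nb\na\n' | uniq
```

In the examples from this thread the repeated lines are separated by other lines, so uniq on its own would leave them in place unless the input is sorted first, which in turn loses the original order.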
 
Old 10-10-2009, 02:49 AM   #15
lutusp
Member
 
Registered: Sep 2009
Distribution: Fedora
Posts: 835

Rep: Reputation: 102
Quote:
Originally Posted by mbag102
This is a hypothetical example. If I gave the exact example it would be very confusing for you to figure out exactly what was going on.
In that case, you are on your own. Without the exact same data you are trying to process, we cannot help you.
 
  


All times are GMT -5. The time now is 12:39 AM.
