
LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Linux sorting unique help? (https://www.linuxquestions.org/questions/linux-newbie-8/linux-sorting-unique-help-760830/)

mbag102 10-09-2009 01:50 PM

Linux sorting unique help?
 
I am grepping through a file and want to sort the unique entries by where they first appear in the file. For instance:

ABCDEFG
1234567

ABCDEFG
1234567

ABCDEFGH
234456

ABCDEFGH
234456

I want the output to be
ABCDEFG
1234567
ABCDEFGH
234456

How would I do this?

catkin 10-09-2009 02:14 PM

What have you tried?

lutusp 10-09-2009 02:17 PM

Quote:

Originally Posted by mbag102 (Post 3713771)
I am grepping through a file and want to sort the unique entries by where they first appear in the file. For instance:

ABCDEFG
1234567

ABCDEFG
1234567

ABCDEFGH
234456

ABCDEFGH
234456

I want the output to be
ABCDEFG
1234567
ABCDEFGH
234456

How would I do this?

If you really want what you say you want, then the order in which they are printed out is the same as their order in the file. But since you have posted, I know what you say you want is not what you really want, so you need to explain what you really want, in plain English.

mbag102 10-09-2009 02:18 PM

Quote:

Originally Posted by catkin (Post 3713796)
What have you tried?

I have tried using a simple grep -u, but that readjusts the ordering of my pairs. In some cases it will give me

ABCDEFG
223456
ABCDEFGH
1234567

I need some sort of command that cares about order. I was thinking of using grep to go from line x to line y and having it do all the uniques, but I'm not entirely sure how this would work or if it is even possible.

mbag102 10-09-2009 02:20 PM

"If you really want what you say you want, then the order in which they are printed out is the same as their order in the file. But since you have posted, I know what you say you want is not what you really want, so you need to explain what you really want, in plain English."

I was told that grep -u doesn't care about the order in which matches are found. Is that true, or does it print the uniques in the order in which they are first discovered?

ramram29 10-09-2009 02:40 PM

I've done this before. You can create a while loop to read this content from a text file: loop once for the first occurrence (ABCDEFG) and a second time for the second occurrence (1234567). Reset the counter with an if statement after the second pass, then keep going until you reach the end of the text file. Put the results in variables, then append them to another temporary text file with the two values side by side, separated by a space. At the end, run the sort command.
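A rough sketch of that idea in bash (the file names here are just placeholders, and it assumes each record is exactly two non-blank lines separated by blank lines):
Code:

: > pairs.tmp                       # start with an empty temporary file
count=0
pair=""
while IFS= read -r line; do
    [ -z "$line" ] && continue      # skip the blank separator lines
    pair="$pair$line "              # put the lines of one record side by side with a space
    count=$((count + 1))
    if [ "$count" -eq 2 ]; then     # after the second line the record is complete
        echo "$pair" >> pairs.tmp
        pair=""
        count=0                     # reset the counter for the next record
    fi
done < infile
sort -u pairs.tmp                   # sort the one-line records and drop duplicates

Note that a plain sort at the end puts the records in alphabetical order; keeping them in the order they first appear needs something extra, such as the line-numbering trick suggested later in the thread.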

mbag102 10-09-2009 02:45 PM

Quote:

Originally Posted by ramram29 (Post 3713831)
I've done this before. You can create a while loop to read this content from a text file: loop once for the first occurrence (ABCDEFG) and a second time for the second occurrence (1234567). Reset the counter with an if statement after the second pass, then keep going until you reach the end of the text file. Put the results in variables, then append them to another temporary text file with the two values side by side, separated by a space. At the end, run the sort command.

Oh, that makes sense. Thanks a lot, ramram. I'll try that out.

johnsfine 10-09-2009 03:08 PM

There are plenty of examples of sed or other methods for adding and removing line numbers from files. (I don't use such things myself, so I don't know which is best, but many versions are easy to find with Google.)
Then use sort twice, as sketched after the steps below.

1) Add line numbers.
2) sort -u --key=2
3) sort -n
4) Remove the line numbers.
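
One possible rendering of those four steps as a pipeline (an untested sketch; "infile" is a placeholder and GNU coreutils are assumed):
Code:

nl -ba infile |      # 1) number every line, including blank ones (tab separated)
  sort -s -k2 -u |   # 2) sort on the text (field 2 onward); -u keeps one copy per unique line, -s keeps the earliest line number
  sort -n |          # 3) sort numerically on the line numbers to restore the original order
  cut -f2-           # 4) strip the line numbers again

Because this works line by line, the blank lines count as duplicates of each other too, so all but one of them disappear.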

john test 10-09-2009 03:10 PM

Can you read each line into sequentially numbered variables,
append variable 1 to a file,
compare variable 2 to variable 1 and append it to the file if not equal,
then compare variable 3 to variables 1 and 2 and append it to the file if not equal, and so on?
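
The same idea (remember every line already seen and only keep the ones that are new) is what the common awk one-liner below expresses; it keeps the first occurrence of each line in its original order (file names are placeholders):
Code:

# print a line only the first time it is seen, preserving input order
awk '!seen[$0]++' infile > outfile

Like the other per-line approaches, it treats every line independently, so it will not keep multi-line records together, and it collapses repeated blank lines into one.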

mbag102 10-09-2009 04:20 PM

Let me rephrase the question. I don't think I have given you the proper description of what is really going on.

I am ultimately trying to sort multiple lines that stay together.

Here is a better example.

John Doe
Phone number = (123) 456 7890
Address = 1 main street USA


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA


Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

I need the output to be:


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

Does this make a little more sense? I am grouping multiple lines together, and those groups must all be sorted uniquely.

Kenhelm 10-09-2009 07:04 PM

From http://sed.sourceforge.net/sed1line.txt
Code:

SELECTIVE DELETION OF CERTAIN LINES:
# delete duplicate, nonconsecutive lines from a file. Beware not to
# overflow the buffer size of the hold space, or else use GNU sed.
sed -n 'G; s/\n/&&/; /^\([ -~]*\n\).*\n\1/d; s/\n//; h; P'

But trying it gave me the error:
'sed: -e expression #1, char 34: Invalid range end'

However, after it was modified, it worked with both of your examples:
Code:

sed -n 'G; s/\n/&&/; /^\([[:print:]]*\n\).*\n\1/d; s/\n//; h; P'
EDIT
Problem: if two otherwise unique line-groups contained an identical line, it would be deleted from one of the line-groups. So to use this sed method, each line-group first has to be temporarily put onto one line, for example as comma-separated fields.
Code:

sed -n '/./{:a N;/\n$/!s/\n/,/; ta; s/\n$//;p}' infile |
sed -n 'G; s/\n/&&/; /^\([[:print:]]*\n\).*\n\1/d; s/\n//; h; P'|
sed 's/,\|$/\n/g' > outfile

The first sed creates comma separated fields:-
John Doe,Phone number = (123) 456 7890,Address = 1 main street USA
John Doe,Phone number = (123) 456 7890,Address = 1 main street USA
Jane Doe,Phone number = (098) 765 4321,Address = 9 side street USA
Jane Doe,Phone number = (098) 765 4321,Address = 9 side street USA

The second sed deletes duplicate lines.
The third sed converts the commas back to newlines and puts a blank line after each line-group:-
John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

lutusp 10-09-2009 08:47 PM

Quote:

Originally Posted by mbag102 (Post 3713932)
Let me rephrase the question. I don't think I have given you the proper description of what is really going on.

I am ultimately trying to sort multiple lines that stay together.

Here is a better example.

John Doe
Phone number = (123) 456 7890
Address = 1 main street USA


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA


Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

I need the output to be:


John Doe
Phone number = (123) 456 7890
Address = 1 main street USA

Jane Doe
Phone number = (098) 765 4321
Address = 9 side street USA

Does this make a little more sense? I am grouping multiple lines together, and those groups must all be sorted uniquely.

That's more like it. Now I have two more questions -- (1) is the above a hypothetical example or is it the actual format of your file? If the latter, the problem is trivial to solve.

Question (2): as with all such sorts, you need to formally specify the criteria for the sort. Is it last name, first name, address, or something else?

The thing about computer programming is that you need to be precise in your thinking and your descriptions.

mbag102 10-10-2009 12:12 AM

Quote:

Originally Posted by lutusp (Post 3714102)
That's more like it. Now I have two more questions -- (1) is the above a hypothetical example or is it the actual format of your file? If the latter, the problem is trivial to solve.

Question (2): as with all such sorts, you need to formally specify the criteria for the sort. Is it last name, first name, address, or something else?

The thing about computer programming is that you need to be precise in your thinking and your descriptions.


This is a hypothetical example. If I gave the exact example it would be very confusing for you to figure out exactly what was going on. The post above yours looks like it could be pretty close to what I am looking for. I wish I was a little better with awk and sed because I know they are very powerful tools for problems like this.

jstephens84 10-10-2009 12:17 AM

What about using the uniq command? This is what man says about uniq, which I believe might help you out.

uniq

Uniquify files, write out the unique lines from the given InputFile.
If an InputFile of `-' (or nothing) is given, then uniq will read from standard input.
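
For what it's worth, uniq only drops duplicate lines that are adjacent, so it is normally combined with sort; a minimal example (the file name is a placeholder):
Code:

sort infile | uniq        # equivalent to sort -u infile; duplicates must be adjacent before uniq sees them

Of course sorting first reorders the file, which is exactly what the original poster is trying to avoid.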

lutusp 10-10-2009 01:49 AM

Quote:

Originally Posted by mbag102 (Post 3714203)
This is a hypothetical example. If I gave the exact example it would be very confusing for you to figure out exactly what was going on.

In that case, you are on your own. Without the actual data you are trying to process, we cannot help you.

