[SOLVED] CUT | SORT | UNIQ -D | Line number of original file?
Hi, I'm trying to work this out somehow. I hope it isn't too challenging for others; I have just started to learn shell scripting.
Below is my requirement.
I'm extracting unique lines and duplicate lines, grouped by the fields selected with cut, along with the number of times each duplicate occurs per group. The challenge is that once I have sorted the data and separated the unique lines from the duplicates (with their counts), I need the line number of each duplicate's first occurrence in the original file, tagged somewhere in the output below. I'm not sure I have explained what I need clearly. Any thoughts on how to print the line number? With uniq's ignore-chars, the ignored characters of the collapsed duplicates never make it into the output.
I need to extract the duplicates and their counts, grouping by fields 1, 2, 5 and 14. Each group then goes into a database: the full record of the first duplicate occurrence, with the duplicate count tagged in another column. For this I cut the four fields mentioned, sort, and find the duplicates using uniq -d; for the counts I used -c. Having sorted out the duplicates and their counts, I need the output in the form below.
Here 3 would be the number of repeated duplicates for the fields 1, 2, 5 and 14, and the rest of the fields can come from any of the duplicate rows.
This way the duplicates are removed from the original file and shown in the format above.
The remaining lines in the original file are the unique ones; they go through as they are...
As for what I have done so far: let me not confuse things. This needs a different point of view, and my brain keeps clinging to my own approach. I need a cigar.
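For reference, here is a minimal sketch of the pipeline being described, assuming a tab-delimited file named data.txt (both the name and the delimiter are assumptions):
# Sketch only: group by fields 1, 2, 5 and 14 and count the duplicates.
cut -f1,2,5,14 data.txt | sort | uniq -cd
# This prints "count key" for every key seen more than once, but by this
# point the original line numbers are already gone, which is the problem above.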
The first field contains the line number of the duplicate line; the second field contains the line number of the original. Now, for each line number in field 1, tag it with the value of field 2. You could do this in a for loop (sketched after the scripts below), or generate a sed or awk script from these values.
# Keep only the non-first duplicates, take their line numbers (field 1),
# and turn each number N into the sed command "Nd" (delete line N):
grep -v -f firstdupes alldupes | cut -f1 | sed 's# *\([[:digit:]]*\)#\1d#' > removedupes.sed
sed -f removedupes.sed test
Then you could create a separate script where, instead of being deleted, each of those lines is printed:
# Same idea, but emit "Np" (print line N) and run sed with -n:
grep -v -f firstdupes alldupes | cut -f1 | sed 's# *\([[:digit:]]*\)#\1p#' > savedupes.sed
sed -n -f savedupes.sed test
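And a sketch of the for-loop alternative mentioned above, assuming alldupes holds whitespace-separated pairs of duplicate line number and first-occurrence line number as described; tagdupes.sed and the tag text are hypothetical:
# For each "dup first" pair, emit a sed command that tags the duplicate
# line with the line number of its first occurrence.
while read dup orig; do
    printf '%ss/$/ [first at line %s]/\n' "$dup" "$orig"
done < alldupes > tagdupes.sed
sed -f tagdupes.sed test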
If you have a problem explaining the problem, you will have an even harder time finding a solution. Try to define the problem (to yourself) so it is crystal clear. After that, the solution will be easier to find.
Your posted sample is small, and doesn't have duplicates. Maybe this one would be better:
I would rather solve this in perl or awk. You can read the lines, then parse and sort them however you want. A comment: we do not need to sort the lines (in perl) to find duplicates and count occurrences; we just need to read the lines and use a counter. Finally, you can select lines by counter or sort them by any key. I also don't know whether the first occurrence matters, but that is just one additional variable, so one additional line in the script (storing the line numbers is similar).
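To illustrate that counter approach, a minimal awk sketch; the tab delimiter, the field numbers and the file name data.txt are assumptions carried over from the question:
# Count each key without sorting; remember the line number (NR) and the
# full record of the key's first occurrence.
awk -F'\t' '
{
    key = $1 FS $2 FS $5 FS $14
    if (!(key in count)) { first[key] = NR; rec[key] = $0 }
    count[key]++
}
END {
    # first-occurrence line number, duplicate count, full record
    for (key in count)
        if (count[key] > 1)
            print first[key], count[key], rec[key]
}' data.txt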
Last edited by pan64; 04-21-2012 at 12:20 PM.