LinuxQuestions.org

Seventh 01-07-2008 10:11 AM

Quick question about uniq
 
I hope this is the right forum for this, if not, my apologies.

I'm using 'uniq' to pull unique lines out of a text file. I have a list that's about 8000 lines long, and I just want to strip out the duplicates.

I'm running it as 'uniq -i -u input.txt > output.txt', which should ignore case and print only unique lines. However, the output file still shows duplicates. I'm not sure, but I have a feeling it might be due to whitespace trailing or preceding the entries in the original list.

If anyone could shed some light on how I would go about outputting only the unique lines, or how to ignore whitespace, or suggest a better tool for doing just that, I'd really appreciate it. Thanks in advance. :)

makyo 01-07-2008 10:27 AM

Hi.

I'm guessing that you are running into this situation:
Quote:

The input need not be sorted, but repeated input lines are detected
only if they are adjacent. If you want to discard non-adjacent
duplicate lines, perhaps you want to use `sort -u'.
-- excerpt from info coreutils uniq

If you do not wish to sort the file, then you will need to use awk, perl, etc., to read the file, mark the duplicates and then print the unique items.
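
For illustration, here is a minimal sketch of both routes, assuming a POSIX shell with awk and sort available. The function names dedupe_keep_order and dedupe_sorted are made up for this example. Note also that 'uniq -u' prints only lines that are not repeated at all, so it drops every copy of a duplicated line rather than keeping one; that may not be what is wanted here.

```shell
# Hypothetical helper: drop duplicate lines without sorting.
# Comparison ignores case; the first occurrence of each line is kept
# and the original order is preserved.
dedupe_keep_order() {
    awk '!seen[tolower($0)]++' "$@"
}

# Hypothetical helper: if the original order does not matter, sort
# case-insensitively (-f) and keep one line of each equal run (-u).
dedupe_sorted() {
    sort -f -u "$@"
}
```

Either one could then be run as, for example, 'dedupe_keep_order input.txt > output.txt'.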

If you cannot do that yourself, then I suggest you search the forums. If you still cannot find something, then I trust that someone will stop in and provide such a script.

However, if your duplicate lines are adjacent, then perhaps it is the whitespace, in which case we might need to normalize it -- e.g. turn all runs of spaces and tabs into a single space -- and the tr command might help with that. It may turn out that we'd need to see a sample ... cheers, makyo
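
A rough sketch of that whitespace normalization, assuming the stray characters are ordinary spaces and tabs (the function name normalize_ws is made up for this example, and it reads from standard input only):

```shell
# Hypothetical helper: normalize whitespace on each line of stdin.
normalize_ws() {
    # 1. turn every tab into a space
    # 2. squeeze each run of spaces down to a single space
    # 3. strip any space left at the start or end of the line
    tr '\t' ' ' | tr -s ' ' | sed 's/^ *//; s/ *$//'
}
```

The cleaned output could then be fed to a de-duplication step, for example 'normalize_ws < input.txt | sort -f -u > output.txt'.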

