LinuxQuestions.org
01-07-2008, 10:11 AM #1
Seventh
Member
 
Registered: Dec 2003
Location: Boston, MA
Distribution: Redhat / Debian
Posts: 269

Rep: Reputation: 30
Quick question about uniq


I hope this is the right forum for this; if not, my apologies.

I'm using 'uniq' to pull unique lines out of a text file. I have a list that's about 8000 lines long, and I just want to strip out the duplicates.

I'm running it as 'uniq -i -u input.txt > output.txt', which should ignore case and only print unique lines. However, the output file still shows duplicates. I'm not sure, but I have a feeling it might be due to whitespace trailing or preceding the entries in the original list.

If anyone could shed some light on how to output only unique lines, or how to ignore whitespace, or suggest a better tool for doing just that, I'd really appreciate it. Thanks in advance.
 
01-07-2008, 10:27 AM #2
makyo
Member
 
Registered: Aug 2006
Location: Saint Paul, MN, USA
Distribution: {Free,Open}BSD, CentOS, Debian, Fedora, Solaris, SuSE
Posts: 735

Rep: Reputation: 76
Hi.

I'm guessing that you are running into this situation:
Quote:
The input need not be sorted, but repeated input lines are detected
only if they are adjacent. If you want to discard non-adjacent
duplicate lines, perhaps you want to use `sort -u'.
-- excerpt from info coreutils uniq
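To make the quoted behavior concrete, here is a small Python model of how `uniq` compares only adjacent lines. It is an illustration, not the coreutils implementation; the function name `uniq_like` and its parameters are hypothetical stand-ins for `-i` and `-u`. It also highlights a detail relevant to the original question: with `-u`, `uniq` prints only lines that have no adjacent duplicate (every copy of a repeated line is dropped), whereas plain `uniq` keeps one copy of each run.

```python
from itertools import groupby

def uniq_like(lines, ignore_case=False, unique_only=False):
    """Model of uniq: collapse runs of *adjacent* equal lines.

    ignore_case  ~ uniq -i (compare lines case-insensitively)
    unique_only  ~ uniq -u (print only lines whose run has length 1)
    """
    key = (lambda s: s.casefold()) if ignore_case else None
    out = []
    for _, run in groupby(lines, key=key):
        run = list(run)
        if unique_only and len(run) > 1:
            continue  # -u drops every copy of a repeated line
        out.append(run[0])  # keep the first line of each run, as uniq does
    return out

data = ["apple", "Banana", "apple", "banana", "cherry"]

# Unsorted: no two equal lines are adjacent, so nothing is collapsed
print(uniq_like(data, ignore_case=True))
# → ['apple', 'Banana', 'apple', 'banana', 'cherry']

# Sorting case-insensitively first brings the duplicates together
print(uniq_like(sorted(data, key=str.casefold), ignore_case=True))
# → ['apple', 'Banana', 'cherry']
```

Note that with `unique_only=True` the sorted example would print only `['cherry']`, which is why `-u` can look like it "loses" lines when the goal is simply to remove duplicates.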
If you do not wish to sort the file, then you will need to use awk, perl, etc., to read the file, mark the duplicates and then print the unique items.
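For the no-sort route described here, a script only has to remember which lines it has already seen. The following is a minimal sketch in Python (standing in for the awk or perl mentioned above); the function name `dedupe_keep_order` and its options are hypothetical. By default it keeps the first occurrence of each line in its original position; `once_only=True` instead mimics `uniq -u` across the whole file, printing only lines that occur exactly once anywhere.

```python
from collections import Counter

def dedupe_keep_order(lines, ignore_case=False, once_only=False):
    """Remove duplicates anywhere in the input, without sorting.

    once_only=False: keep the first occurrence of each line.
    once_only=True:  keep only lines that appear exactly once in the
                     whole input (like uniq -u, but for non-adjacent
                     duplicates too).
    """
    def key(s):
        return s.casefold() if ignore_case else s

    if once_only:
        # First pass: count every line; second pass: keep the singletons
        counts = Counter(key(s) for s in lines)
        return [s for s in lines if counts[key(s)] == 1]

    seen = set()
    out = []
    for s in lines:
        k = key(s)
        if k not in seen:  # first time this line appears: keep it
            seen.add(k)
            out.append(s)
    return out
```

The same "mark what you've seen" idea is the well-known awk idiom `awk '!seen[$0]++' input.txt`, or `awk '!seen[tolower($0)]++' input.txt` to ignore case.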

If you cannot do that yourself, then I suggest you search the forums. If you still cannot find something, then I trust that someone will stop in and provide such a script.

However, if the duplicate lines are adjacent, then perhaps it is the whitespace. In that case, we might need to normalize the whitespace, e.g. turn all runs of spaces and tabs into a single space; the command tr might help with that. It may turn out that we'd need to see a sample of the input.

cheers, makyo
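The whitespace normalization suggested above can be sketched in Python as follows: trim leading and trailing spaces and tabs, squeeze each internal run of them into one space, and only then compare lines (case-insensitively by default). This assumes the stray whitespace consists only of spaces and tabs and that writing out the cleaned-up line is acceptable; the names `normalize` and `dedupe_normalized` are hypothetical. In the shell, `tr -s '[:blank:]' ' '` squeezes runs of blanks in a similar way, though it leaves a single leading or trailing space where one existed.

```python
import re

# A run of one or more spaces and/or tabs
_BLANKS = re.compile(r"[ \t]+")

def normalize(line):
    """Trim leading/trailing spaces and tabs, then squeeze internal runs to one space."""
    return _BLANKS.sub(" ", line.strip(" \t"))

def dedupe_normalized(lines, ignore_case=True):
    """Keep the first occurrence of each line, comparing normalized text."""
    seen = set()
    out = []
    for raw in lines:
        clean = normalize(raw)
        key = clean.casefold() if ignore_case else clean
        if key not in seen:
            seen.add(key)
            out.append(clean)  # emit the cleaned-up version of the line
    return out

sample = ["foo bar", "  foo   bar", "Foo\tbar  ", "baz"]
print(dedupe_normalized(sample))
# → ['foo bar', 'baz']
```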

Last edited by makyo; 01-07-2008 at 10:41 AM. Reason: Add whitespace information
 
  

