LinuxQuestions.org
Old 08-22-2013, 03:58 AM   #1
ls_milkyway
LQ Newbie
 
Registered: Aug 2013
Distribution: BT5R2
Posts: 28

Rep: Reputation: Disabled
uniq command not able to remove duplicate entries


Hi,

Removing duplicates with sort followed by uniq is not working on my file, which contains blacklisted URLs. For example:

10.44.56.78
10.44.56.78
10.wrs.org
10.wrs.org
baby.us
baby.us
zym.com
zym.com
.
.
.
So...on

The command
uniq input.txt > output.txt results in:

10.44.56.78
10.44.56.78
10.wrs.org
10.wrs.org
baby.us
zym.com
.
.
.
So...on

whereas, I want output:

10.44.56.78
10.wrs.org
baby.us
zym.com
.
.
.
So...on

Can you please suggest how to remove these duplicates (IP addresses or integer values)?

Thanks in advance.
 
Old 08-22-2013, 04:03 AM   #2
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,021
Blog Entries: 1

Rep: Reputation: 2110
Quote:
Originally Posted by ls_milkyway View Post
The command
uniq input.txt > output.txt results in:
The '>' redirect is optional; uniq also accepts the output file as a second argument:

Code:
uniq input.txt output.txt
Both forms write the same result to output.txt, so the redirect itself is not what leaves the duplicates in place.
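
A quick way to compare the two forms side by side is sketched below. The file names and the temporary directory are made up for illustration; the sample data has plain Unix line endings, so it says nothing about the original poster's file.

```shell
# Sample data: adjacent duplicates with plain Unix line endings
workdir=$(mktemp -d)
printf 'baby.us\nbaby.us\nzym.com\nzym.com\n' > "$workdir/input.txt"

# Form 1: the shell redirects uniq's standard output to a file
uniq "$workdir/input.txt" > "$workdir/out_redirect.txt"

# Form 2: uniq itself writes to the file given as its second argument
uniq "$workdir/input.txt" "$workdir/out_arg.txt"

# cmp -s is silent and exits 0 when the two files are identical
if cmp -s "$workdir/out_redirect.txt" "$workdir/out_arg.txt"
then
        echo "both forms produce the same output"
fi
```

If the two outputs match on your own file as well, the leftover duplicates have to come from the data rather than from how the output is written.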

Last edited by astrogeek; 08-22-2013 at 04:12 AM.
 
Old 08-22-2013, 04:20 AM   #3
ls_milkyway
LQ Newbie
 
Registered: Aug 2013
Distribution: BT5R2
Posts: 28

Original Poster
Rep: Reputation: Disabled
OK! Thanks for the quick response.
So you mean that ">" was simply redirecting the input?
But I still don't understand why the duplicate entries of baby.us and
zym.com were removed.
 
Old 08-22-2013, 04:27 AM   #4
Firerat
Senior Member
 
Registered: Oct 2008
Distribution: Debian Jessie / sid
Posts: 1,471

Rep: Reputation: 444
It should work... so are the lines really 'uniq'?

Let's look at uniq --help.

It tells you that uniq only detects repeated lines when they are adjacent, and it also hints that you can use sort -u instead.

So try
Code:
sort -u
Assuming that still doesn't work, we go back to my cryptic question.

Since you mention blacklists, I'm assuming you want to weed out duplicates from several lists to make one big list.
So it is possible that some of those lists have DOS EOLs (end of line, \r\n) while others have Unix EOLs (\n).

Confirm with

Code:
cat -A input.txt
You may see some lines ending in $ and some in ^M$ (^M is how cat -A displays a carriage return).

So try
Code:
tr -d "\r" < input.txt | sort -u > output.txt
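
To see why the carriage return matters, here is a small sketch on made-up data (the temporary directory and the two counters are only for illustration). Each entry appears once with a Unix EOL and once with a DOS EOL, so the lines look identical on screen but differ by one byte.

```shell
workdir=$(mktemp -d)
# Two copies of each entry: one with Unix EOL (\n), one with DOS EOL (\r\n)
printf 'baby.us\nbaby.us\r\nzym.com\nzym.com\r\n' > "$workdir/input.txt"

# uniq compares whole lines, so "baby.us" and "baby.us\r" count as different
before=$(uniq "$workdir/input.txt" | wc -l)

# Strip every carriage return first, then sort and de-duplicate
tr -d '\r' < "$workdir/input.txt" | sort -u > "$workdir/output.txt"
after=$(wc -l < "$workdir/output.txt")

echo "lines kept by uniq alone: $before"
echo "lines kept after tr + sort -u: $after"
```

On this sample uniq alone keeps all four lines, while the tr pipeline leaves one line per entry.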
 
Old 08-22-2013, 04:31 AM   #5
shm0
Member
 
Registered: Aug 2012
Location: Bahrain
Distribution: Slackware
Posts: 58

Rep: Reputation: 16
Quote:
Originally Posted by ls_milkyway View Post
Can you please suggest how to remove these duplicates (IP addresses or integer values)?

Thanks in advance.
You probably have an extra space or tab, or some other hidden characters, on some of the lines. I tried your list and it works fine, but when I added an extra space to one of the lines, that line appeared twice in the output. So remove the extra whitespace and check again.
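
One possible way to find and strip such trailing whitespace is sketched below, on invented sample data. The [[:space:]] class also matches a stray carriage return, so the same sed expression covers DOS line endings too.

```shell
workdir=$(mktemp -d)
# Made-up sample: the second "baby.us" has a trailing space,
# the second "zym.com" a trailing tab
printf 'baby.us\nbaby.us \nzym.com\nzym.com\t\n' > "$workdir/input.txt"

# Count the lines that end in a whitespace character
suspect=$(grep -c '[[:space:]]$' "$workdir/input.txt")
echo "lines with trailing whitespace: $suspect"

# Strip trailing whitespace, then sort and de-duplicate
sed 's/[[:space:]]*$//' "$workdir/input.txt" | sort -u > "$workdir/output.txt"
```

Running cat -A on the file shows the same thing visually: a space or ^I just before the closing $.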
 
Old 08-22-2013, 04:40 AM   #6
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,021
Blog Entries: 1

Rep: Reputation: 2110
Quote:
Originally Posted by ls_milkyway View Post
OK! Thanks for the quick response.
So you mean that ">" was simply redirecting the input?
But I still don't understand why the duplicate entries of baby.us and
zym.com were removed.
As Firerat says, there may be different line endings if the entries come from different sources.

Also, now that I think about it, it is not entirely clear whether your file is sorted. Are you running sort on some sources and then redirecting the output to the file for uniq? Or are you running sort on the file and expecting the file itself to end up sorted? (sort writes the sorted lines to standard output; it does not change the input file.)

Can you verify that the file used by uniq is actually sorted... sanity check.
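
A sketch of that sanity check, using sort -c on a deliberately unsorted, made-up file (the paths and the status variable are only for illustration):

```shell
workdir=$(mktemp -d)
printf 'zym.com\nbaby.us\nzym.com\nbaby.us\n' > "$workdir/input.txt"

# sort -c only checks: it exits 0 if the file is already sorted,
# and non-zero (with a message on stderr) otherwise
if sort -c "$workdir/input.txt" 2>/dev/null
then
        status="sorted"
else
        status="not sorted"
fi
echo "input.txt is $status"

# sort never rewrites its input file, so send the result somewhere else
sort "$workdir/input.txt" > "$workdir/sorted.txt"
uniq "$workdir/sorted.txt" > "$workdir/output.txt"
```

If sort -c reports the file as sorted and uniq still leaves duplicates, the lines differ in some invisible way.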
 
Old 08-22-2013, 04:55 AM   #7
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=14, FreeBSD_10{.0|.1|.2}
Posts: 4,021
Blog Entries: 1

Rep: Reputation: 2110
I offer the following script to strip DOS (^M) line endings from the file.

Copy and paste it into a file (I name it undos), make it executable, then run ./undos filename.txt.

NOTE: USE AT YOUR OWN RISK!! It edits the file in place, and it will not touch the file until you add the --really option to confirm!

But it should work well enough for this...

Code:
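#!/bin/bash
# Quick util to strip \r (DOS line endings) from a text file, in place.
# Usage: ./undos filename --really

if [[ $# -eq 0 ]]
then
        echo "Usage: $0 filename --really"
        exit 1
fi

# Look for the --really confirmation flag among the arguments
ok=0
for what in "$@"
do
        if [[ $what == '--really' ]]
        then
                ok=1
        fi
done

if [[ $ok -eq 1 ]]
then
        # GNU sed: -i edits the file in place; \r matches a carriage return
        sed -i 's/\r//g' "$1"
else
        echo "You are about to strip characters from a file, --really to continue!"
fi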
So, to be clear....

Code:
sort sourcefile(s) > sorted.txt
./undos sorted.txt --really
uniq sorted.txt outfile.txt
That should get you there.
 
Old 08-22-2013, 05:02 AM   #8
ls_milkyway
LQ Newbie
 
Registered: Aug 2013
Distribution: BT5R2
Posts: 28

Original Poster
Rep: Reputation: Disabled
Thanks Firerat you got it!

Yes, astrogeek, file was sorted using sort command

The lines have DOS EOLs, so they need to be converted with tr or dos2unix.
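
For anyone who wants to confirm the conversion, here is a rough sketch that counts DOS line endings before and after tr. The sample file and variable names are invented; dos2unix would be an alternative for the conversion step if it is installed.

```shell
workdir=$(mktemp -d)
printf 'baby.us\r\nbaby.us\r\nzym.com\r\n' > "$workdir/input.txt"

# Build a literal carriage return without relying on bash-only quoting
cr=$(printf '\r')

# Count the lines that end in a carriage return
dos_lines=$(grep -c "${cr}\$" "$workdir/input.txt")
echo "lines with DOS EOL before: $dos_lines"

# Convert with tr, then count again; grep -c exits non-zero when the
# count is 0, hence the "|| true"
tr -d '\r' < "$workdir/input.txt" > "$workdir/unix.txt"
remaining=$(grep -c "${cr}\$" "$workdir/unix.txt" || true)
echo "lines with DOS EOL after: $remaining"
```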

Thanks to all.
 
Old 08-22-2013, 06:18 AM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,424

Rep: Reputation: 2823
How about changing the tool:
Code:
awk '!_[$1]++' RS="[\n\r]+" file
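
A note on the one-liner above: a regular expression in RS is an extension found in gawk and mawk (POSIX leaves a multi-character RS unspecified), and $1 keys on the first whitespace-separated field rather than on the whole line. A variant that avoids both assumptions might look like the sketch below; the sample file and the array name seen are made up for illustration.

```shell
workdir=$(mktemp -d)
# Unsorted sample with mixed Unix and DOS line endings
printf 'zym.com\r\nbaby.us\nzym.com\nbaby.us\r\n' > "$workdir/input.txt"

# sub() drops a trailing carriage return from the current line.
# seen[$0]++ is 0 (false) the first time a line appears, so !seen[$0]++
# is true only once per distinct line, and only then is the line printed.
awk '{ sub(/\r$/, "") } !seen[$0]++' "$workdir/input.txt" > "$workdir/output.txt"
```

Unlike sort -u, this keeps the lines in their original order and does not need sorted input.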
 
  


All times are GMT -5.
