LinuxQuestions.org
Linux - Newbie This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

10-16-2012, 03:37 AM   #1
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Reputation: 286
Extract only unique values from a file


I have a large log file which contains a list of IP addresses (only a list of IP addresses, nothing else), like this:

more <logfile.txt>
10.199.1.1
10.199.1.2
10.199.1.3
10.199.1.1
10.199.1.5
10.199.1.3
10.199.1.4
10.199.1.4
And so on...


But I want to extract only the unique values, i.e. the unique IP addresses, from this list. I have tried the sort -u and uniq commands as filters, but every time I am out of luck.
I am surprised that even after using sort -u, uniq or uniq -u, the values are still repeating! Is there any way to sort this out? Anything in awk? Thanks a lot!

Last edited by shivaa; 10-16-2012 at 03:41 AM.
 
10-16-2012, 04:05 AM   #2
syg00
LQ Veteran

Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Reputation: 4120
"sort -u" works for me - what do you get? And what system are you using? "uniq" is a bit unique ...
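To make the "uniq is a bit unique" remark concrete, here is a minimal sketch, assuming a POSIX shell and the eight sample addresses from the first post written to a temporary file. `sort -u` collapses every duplicate, whereas `uniq` only collapses duplicates on consecutive lines, so it needs sorted input; `uniq -u` goes further and prints only the lines that are not repeated at all, which would explain some addresses "disappearing" rather than being de-duplicated.

```shell
# Sample data from the first post, in a throwaway temp file.
f=$(mktemp)
printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.3 10.199.1.1 \
    10.199.1.5 10.199.1.3 10.199.1.4 10.199.1.4 > "$f"

# sort -u: sorts, then keeps one copy of each distinct line.
sorted_unique=$(sort -u "$f")

# uniq alone: only collapses duplicates that sit on consecutive lines.
uniq_only=$(uniq "$f")

# sort | uniq: same result as sort -u.
sort_then_uniq=$(sort "$f" | uniq)

# uniq -u on sorted input: prints only lines that occur exactly once.
only_once=$(sort "$f" | uniq -u)

printf 'sort -u:\n%s\n\n' "$sorted_unique"
printf 'uniq without sort:\n%s\n\n' "$uniq_only"
printf 'uniq -u after sort:\n%s\n' "$only_once"
rm -f "$f"
```

On the sample data, plain `uniq` only merges the two adjacent 10.199.1.4 lines, while `sort -u` and `sort | uniq` both leave one copy of each of the five addresses.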
 
10-16-2012, 04:08 AM   #3
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Reputation: 286
Quote:
Originally Posted by syg00
"sort -u" works for me - what do you get? And what system are you using? "uniq" is a bit unique ...
To get unique values I have to use sort -u twice, i.e. filename | sort -u | sort -u
I think it's because IP addresses consist of 4 numbers, so the sort command gets a little confused about which number it should sort by. That is why it's leaving duplicate values.
But I want something simpler, so that I don't need to use sort twice.
 
10-16-2012, 04:24 AM   #4
syg00
LQ Veteran

Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 21,125

Reputation: 4120
Answer my questions - specifically. Waffling will get you nowhere. I already told you "sort -u" worked for me on that limited data.
 
10-16-2012, 04:30 AM   #5
colucix
LQ Guru

Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Reputation: 1983
Code:
awk '!_[$1]++' logfile.txt
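For readers unfamiliar with the idiom: `_` is simply an (oddly named) awk array indexed by the first field. `_[$1]++` returns the count before incrementing it, so it is 0 (false) the first time an address is seen; `!` turns that into true, which triggers awk's default action of printing the line. Unlike `sort -u`, this keeps the first occurrence of each address in its original order and never sorts. The sketch below is an illustration, not colucix's own code: it renames the array to `seen` (an arbitrary choice), shows a long-hand equivalent, and assumes a POSIX-compatible awk plus the sample data from the first post.

```shell
f=$(mktemp)
printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.3 10.199.1.1 \
    10.199.1.5 10.199.1.3 10.199.1.4 10.199.1.4 > "$f"

# Same idiom with a readable array name:
# seen[$1]++ yields 0 on the first sighting, so !0 is true and the line prints.
short=$(awk '!seen[$1]++' "$f")

# Long-hand equivalent of the one-liner.
long=$(awk '{
    if (!($1 in seen)) {   # first time this address appears
        print
    }
    seen[$1]++             # remember it for later lines
}' "$f")

printf '%s\n' "$short"
rm -f "$f"
```

Both forms print 10.199.1.1, .2, .3, .5 and .4, in order of first appearance. Typing the `!` version at some interactive prompts can trip over history expansion, as the later posts in this thread show.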
 
10-16-2012, 04:40 AM   #6
ntubski
Senior Member

Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Reputation: 2081
Quote:
Originally Posted by meninvenus
I think it's because IP addresses consist of 4 numbers, so the sort command gets a little confused about which number it should sort by. That is why it's leaving duplicate values.
The sort command does string sorting by default; this will look wrong if you want the IPs in numeric order, but it won't matter for removing duplicates.
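If the addresses should also come out in numeric order, one sketch using standard POSIX `sort` options is to split on `.` with `-t.` and compare each of the four fields numerically. With `-u` and explicit keys, `sort` treats lines whose keys compare equal as duplicates, which here means numerically equal addresses. The sample addresses below are made up purely to show where plain string order and numeric order differ.

```shell
f=$(mktemp)
# Hypothetical addresses chosen so that string order and numeric order differ.
printf '%s\n' 10.199.1.10 10.199.1.9 10.199.1.10 10.199.2.1 10.199.1.100 > "$f"

# Default: character-by-character comparison, so 10.199.1.10 lands before 10.199.1.9.
by_string=$(sort -u "$f")

# Field-wise numeric comparison of the four dot-separated parts.
by_number=$(sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n "$f")

printf 'string order:\n%s\n\n' "$by_string"
printf 'numeric order:\n%s\n' "$by_number"
rm -f "$f"
```

Either way the duplicate 10.199.1.10 is removed; only the order changes, which is ntubski's point.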
 
10-16-2012, 04:47 AM   #7
chrism01
LQ Guru

Registered: Aug 2004
Location: Sydney
Distribution: Rocky 9.2
Posts: 18,358

Reputation: 2751
I'm with syg00; sort -u works perfectly on that data; I even get them in order ...
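One possible explanation for `sort -u` appearing to leave duplicates, not confirmed anywhere in this thread: lines that look identical on screen but differ in invisible characters, such as a trailing space or a Windows-style carriage return. `sort -u` correctly treats those as different strings. The sketch below shows how to spot and strip them; the sample file is fabricated for illustration, and the C locale is set only so the comparison is strictly byte by byte.

```shell
f=$(mktemp)
# Fabricated sample: line 2 ends in a space, line 3 in a carriage return.
printf '10.199.1.1\n10.199.1.1 \n10.199.1.1\r\n' > "$f"

# In the C locale sort compares byte by byte, so these are three distinct lines.
naive=$(LC_ALL=C sort -u "$f" | wc -l)

# Make the invisible characters visible: CR shows as ^M, each line end as $.
cat -ve "$f"

# Strip trailing spaces, tabs and CRs, then de-duplicate.
clean=$(awk '{ sub(/[ \t\r]+$/, ""); print }' "$f" | sort -u)

printf '%s\n' "$clean"
rm -f "$f"
```

If `cat -ve` on the real log shows `^M$` or spaces before the `$`, cleaning the lines first and then running a single `sort -u` should be enough.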
 
10-16-2012, 05:15 AM   #8
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Reputation: 286
Quote:
Originally Posted by colucix
Code:
awk '!_[$1]++' logfile.txt
It's giving an error: _[ event not found. Did you check it on your side?
Could you test it again and rectify?

Last edited by shivaa; 10-16-2012 at 05:17 AM.
 
10-16-2012, 10:16 AM   #9
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Reputation: 3191
Quote:
It's giving an error: _[ event not found. Did you check it on your side?
Could you test it again and rectify?
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?
 
10-16-2012, 10:26 AM   #10
shivaa
Senior Member

Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800

Original Poster
Blog Entries: 4

Reputation: 286
Quote:
Originally Posted by grail
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?
I have tried it on both Linux and Solaris.
RHEL 5, awk version 3.1.5
Solaris 10, awk version unknown (I can't find it)
==========
more /home/jack/logfile.txt | awk '!_[$1]++'
_[: Event not found.
================
Perhaps it's treating "_[" after "!" as a previously run command, which it can't find, and so it throws the error... Am I right? I can use '\!_[$1]++' instead, but that doesn't work on Solaris either.
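The exact wording `_[: Event not found.` looks like csh/tcsh rather than bash. In csh a `!` is subject to history substitution even inside single quotes, whereas in bash single quotes already protect it. A workaround that does not depend on the shell at all is to leave `!` out of the program: `seen[$1]++ == 0` is true exactly on the first sighting of an address, so `nawk 'seen[$1]++ == 0' logfile.txt` can be typed at a csh prompt as is. On Solaris 10, `/usr/bin/awk` is the old awk, and `nawk` or `/usr/xpg4/bin/awk` are the usual choices for programs like this; that may be why the `\!` version still failed there, though the thread doesn't confirm it. The sketch below wraps the one-liner in Bourne-shell syntax so it can be tested; it picks whichever of those two awks exists, falls back to plain `awk`, and uses the arbitrary array name `seen`.

```shell
f=$(mktemp)
printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.3 10.199.1.1 \
    10.199.1.5 10.199.1.3 10.199.1.4 10.199.1.4 > "$f"

# Prefer a POSIX-style awk where one exists (Solaris: /usr/xpg4/bin/awk or nawk).
AWK=awk
for candidate in /usr/xpg4/bin/awk nawk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done

# No "!" anywhere, so csh/tcsh history substitution has nothing to expand.
unique=$("$AWK" 'seen[$1]++ == 0' "$f")

printf '%s\n' "$unique"
rm -f "$f"
```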
 
10-16-2012, 01:04 PM   #11
ntubski
Senior Member

Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Reputation: 2081
It looks like history expansion; it's only on by default for interactive use (at the command prompt). You can turn it off with
Code:
set +o histexpand
I really think this should always be the default; nobody uses history expansion any more.
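To illustrate the point that history expansion only applies at an interactive prompt: the same one-liner placed in a script file runs unchanged, whatever the user's login shell is. This is a sketch assuming `/bin/sh` is a Bourne-compatible shell; the script name `uniq_ips.sh` is hypothetical. For bash users, `set +H` is the short form of `set +o histexpand`; neither command exists in csh/tcsh.

```shell
dir=$(mktemp -d)
printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.1 > "$dir/logfile.txt"

# Hypothetical helper script; scripts run non-interactively,
# so the shell performs no history expansion on "!".
cat > "$dir/uniq_ips.sh" <<'EOF'
#!/bin/sh
# Print each address (first field) only the first time it appears.
awk '!seen[$1]++' "$1"
EOF

# Run it with sh (or chmod +x it and call it directly).
result=$(sh "$dir/uniq_ips.sh" "$dir/logfile.txt")

printf '%s\n' "$result"
rm -rf "$dir"
```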
 
10-17-2012, 09:56 AM   #12
grail
LQ Guru

Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Reputation: 3191
I would add that using more here is redundant. See colucix's example for the appropriate way to run the command.
 
  



Tags
awk, sort




All times are GMT -5.
