10-16-2012, 03:37 AM
|
#1
|
|
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,636
|
Extract only unique values from a file
I have a large log file that contains only a list of IP addresses, nothing else, like this:
more <logfile.txt>
10.199.1.1
10.199.1.2
10.199.1.3
10.199.1.1
10.199.1.5
10.199.1.3
10.199.1.4
10.199.1.4
And so on...
But I want to extract only the unique values, i.e. the distinct IP addresses, from this list. I have tried the sort -u and uniq commands as filters, but every time I am out of luck.
I am surprised that even after using sort -u, uniq, or uniq -u, the values still repeat! Is there any way to sort this out? Anything with awk? Thanks a lot!
Last edited by shivaa; 10-16-2012 at 03:41 AM.
|
|
|
|
10-16-2012, 04:05 AM
|
#2
|
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 11,234
|
"sort -u" works for me. What do you get? And what system are you using? "uniq" is a bit unique ...
|
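For anyone comparing the three filters mentioned so far, here is a minimal sketch using the sample addresses from the first post. The scratch file and variable names are made up for the illustration. It assumes the lines are byte-for-byte identical where they look identical.

```shell
# Sample data from the first post, written to a scratch file.
tmp=$(mktemp)
printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.3 10.199.1.1 \
              10.199.1.5 10.199.1.3 10.199.1.4 10.199.1.4 > "$tmp"

# uniq alone only collapses duplicates that sit on adjacent lines,
# so 10.199.1.1 and 10.199.1.3 still appear twice here.
uniq_only=$(uniq "$tmp")

# Sort first so duplicates become adjacent, then uniq removes them all.
sorted_uniq=$(sort "$tmp" | uniq)

# sort -u does both steps in one command.
sort_u=$(sort -u "$tmp")

# uniq -u is different again: on sorted input it prints only the lines
# that occur exactly once.
only_once=$(sort "$tmp" | uniq -u)

printf 'uniq alone:\n%s\n\nsort -u:\n%s\n\nsort | uniq -u:\n%s\n' \
    "$uniq_only" "$sort_u" "$only_once"
rm -f "$tmp"
```

So uniq on unsorted input and uniq -u can both give surprising results, while sort -u should return each address once.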
|
|
|
10-16-2012, 04:08 AM
|
#3
|
|
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,636
Original Poster
|
Quote:
Originally Posted by syg00
"sort -u" works for me. What do you get? And what system are you using? "uniq" is a bit unique ...
|
To get unique values I have to use sort -u twice, i.e. filename | sort -u | sort -u
I think it's because IP addresses are four-part numbers, so the sort command gets a little confused about which part it should sort on. That is why it leaves duplicate values.
But I want something simple, so that I don't need to run sort twice.
|
|
|
|
10-16-2012, 04:24 AM
|
#4
|
|
LQ Veteran
Registered: Aug 2003
Location: Australia
Distribution: Lots ...
Posts: 11,234
|
Answer my questions - specifically. Waffling will get you nowhere. I already told you "sort -u" worked for me on that limited data.
|
|
|
|
10-16-2012, 04:30 AM
|
#5
|
|
Moderator
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.4 OpenSuSE 12.2
Posts: 9,897
|
Code:
awk '!_[$1]++' logfile.txt
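A rough sketch of how that one-liner works, with the array renamed from `_` to `seen` for readability (the name is arbitrary) and the sample addresses from the first post fed in through a variable instead of logfile.txt:

```shell
# Sample data from the first post.
input=$(printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.3 10.199.1.1 \
                       10.199.1.5 10.199.1.3 10.199.1.4 10.199.1.4)

# seen[$1]++ yields the count *before* incrementing: 0 on first sight.
# !0 is true, so the default action (print the line) runs only the first
# time each value of field 1 appears. Input order is kept; nothing is sorted.
result=$(printf '%s\n' "$input" | awk '!seen[$1]++')

printf '%s\n' "$result"
```

Unlike sort -u, this keeps the addresses in the order of first appearance, at the cost of holding every distinct value in memory.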
|
|
|
|
10-16-2012, 04:40 AM
|
#6
|
|
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,023
|
Quote:
Originally Posted by meninvenus
I think it's because IP addresses are four-part numbers, so the sort command gets a little confused about which part it should sort on. That is why it leaves duplicate values.
|
The sort command does string sorting by default; the result will look wrong if you want the IPs in numeric order, but it won't matter for removing duplicates.
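To illustrate the point, a small sketch with a few made-up addresses where string order and numeric order differ. The per-field numeric keys are one common way to order dotted quads, not something suggested earlier in the thread.

```shell
# Hypothetical addresses chosen so that string and numeric order differ.
input=$(printf '%s\n' 10.199.1.10 10.199.1.9 10.199.1.2 10.199.1.10 10.199.1.9)

# Plain string sort: "10" sorts before "2" and "9", but duplicates are
# still identical strings, so -u removes them correctly.
string_sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

# Numeric sort on each dot-separated field, also with -u.
numeric_sorted=$(printf '%s\n' "$input" |
    LC_ALL=C sort -u -t . -k1,1n -k2,2n -k3,3n -k4,4n)

printf 'string order:\n%s\n\nnumeric order:\n%s\n' \
    "$string_sorted" "$numeric_sorted"
```

Both commands return the same three addresses; only the order changes.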
|
|
|
|
10-16-2012, 04:47 AM
|
#7
|
|
Guru
Registered: Aug 2004
Location: Brisbane
Distribution: Centos 6.4, Centos 5.9
Posts: 15,022
|
I'm with syg00; sort -u works perfectly on that data; I even get them in order ...
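If sort -u really does seem to leave duplicates on the full log, one possible cause (an assumption, not confirmed anywhere in this thread) is that the lines differ invisibly, for example by trailing blanks or carriage returns. A sketch for that case, using made-up data:

```shell
# Three lines that look identical on screen but differ in trailing bytes
# (hypothetical data, not taken from the original log).
tmp=$(mktemp)
printf '10.199.1.1\n10.199.1.1 \n10.199.1.1\r\n' > "$tmp"

# sort -u compares whole lines byte for byte, so all three survive.
raw_count=$(LC_ALL=C sort -u "$tmp" | wc -l | tr -d ' ')

# Delete carriage returns and trailing whitespace, then deduplicate.
clean=$(tr -d '\r' < "$tmp" | sed 's/[[:space:]]*$//' | LC_ALL=C sort -u)

printf 'distinct raw lines: %s\ncleaned result: %s\n' "$raw_count" "$clean"
rm -f "$tmp"
```

Running od -c on a few lines of the real file would show whether any such stray bytes are present.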
|
|
|
|
10-16-2012, 05:15 AM
|
#8
|
|
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,636
Original Poster
|
Quote:
Originally Posted by colucix
Code:
awk '!_[$1]++' logfile.txt
|
It's giving an error: _[ event not found. Did you check it on your side?
Could you test it again and rectify?
Last edited by shivaa; 10-16-2012 at 05:17 AM.
|
|
|
|
10-16-2012, 10:16 AM
|
#9
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,328
|
Quote:
It's giving an error: _[ event not found. Did you check it on your side?
Could you test it again and rectify?
|
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?
|
|
|
|
10-16-2012, 10:26 AM
|
#10
|
|
Senior Member
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,636
Original Poster
|
Quote:
Originally Posted by grail
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?
|
I have tried it on both Linux and Solaris.
RHEL 5, where the awk version is 3.1.5
Solaris 10, where I can't find the awk version.
==========
more /home/jack/logfile.txt | awk '!_[$1]++'
_[: Event not found.
================
Perhaps it is treating the "_[" after "!" as a previously run command, which it can't find, and so it throws the error... Am I right? I can use '\!_[$1]++' instead, but that is also not working on Solaris.
|
|
|
|
10-16-2012, 01:04 PM
|
#11
|
|
Senior Member
Registered: Nov 2005
Distribution: Debian
Posts: 2,023
|
It looks like history expansion, which is on by default only for interactive use (from the command prompt). You can turn it off. I really think that should always be the default setting; nobody uses history expansion any more.
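One way to sidestep the problem in any shell is to write the awk test without a "!" at all. The sketch below is an equivalent rewrite of the one-liner, not what was originally posted; the array name `seen` is arbitrary. As far as I can tell, the "Event not found." wording matches what csh-family shells print, and those expand "!" even inside single quotes, but that is my reading of the message rather than something stated in the thread.

```shell
# A shortened version of the sample data.
input=$(printf '%s\n' 10.199.1.1 10.199.1.2 10.199.1.1 10.199.1.3 10.199.1.2)

# Same logic as '!_[$1]++', written as an explicit comparison so the
# command line contains no "!" for the shell to expand.
result=$(printf '%s\n' "$input" | awk 'seen[$1]++ == 0')

printf '%s\n' "$result"

# In an interactive bash session, history expansion can instead be
# switched off with:  set +H
```

The comparison form prints a line only when its count was still zero, which is exactly the first occurrence.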
|
|
|
|
10-17-2012, 09:56 AM
|
#12
|
|
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 6,328
|
I would add that the use of more in this case is quite redundant. See colucix's example for the appropriate way to execute the example.
|
|
|
|