LinuxQuestions.org (/questions/)
-   Linux - Newbie (https://www.linuxquestions.org/questions/linux-newbie-8/)
-   -   Extract only unique values from a file (https://www.linuxquestions.org/questions/linux-newbie-8/extract-only-unique-values-from-a-file-4175432430/)

shivaa 10-16-2012 03:37 AM

Extract only unique values from a file
 
I have a large log file which contains a list of IP addresses (only a list of IP addresses, nothing else), like this:

Code:

more logfile.txt
10.199.1.1
10.199.1.2
10.199.1.3
10.199.1.1
10.199.1.5
10.199.1.3
10.199.1.4
10.199.1.4

And so on...


But I want to extract only the unique values, i.e. IP addresses, from this list. I have tried sort -u and the uniq command as filters, but every time I am out of luck :(.
I am surprised that even after using sort -u or uniq or uniq -u, the values are repeating!! So is there any way to sort it out? Anything from awk? Thanks a lot!
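For reference, these were roughly the filters I tried (a sketch; my exact invocations may have differed):

Code:

sort -u logfile.txt        # sort and keep one copy of each line
sort logfile.txt | uniq    # uniq drops only *adjacent* duplicates, so input must be sorted
uniq -u logfile.txt        # prints only lines that never repeat at all - a different thing!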

syg00 10-16-2012 04:05 AM

"sort -u" works for me - what do you get ?. And what system are you using ?. "uniq" is a bit unique ... :p

shivaa 10-16-2012 04:08 AM

Quote:

Originally Posted by syg00 (Post 4806887)
"sort -u" works for me - what do you get ?. And what system are you using ?. "uniq" is a bit unique ... :p

In order to get unique values I have to use sort -u twice, i.e. filename | sort -u | sort -u
I think it's because IP addresses are 4-digit numbers, and thus the sort command is getting a little confused about which digit it should sort by. That is why it's leaving duplicate values.
But I want something simple, so that I need not use sort twice.

syg00 10-16-2012 04:24 AM

Answer my questions - specifically. Waffling will get you nowhere. I already told you "sort -u" worked for me on that limited data.

colucix 10-16-2012 04:30 AM

Code:

awk '!_[$1]++' logfile.txt
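This works because of the standard awk de-duplication idiom: _[$1] starts out at zero (false), so the negated test is true only the first time an address appears and the default action prints the line; later occurrences have already incremented the counter and are skipped. An equivalent sketch with a more descriptive array name (every line here is a single field, so $0 and $1 behave the same):

Code:

awk '!seen[$0]++' logfile.txt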

ntubski 10-16-2012 04:40 AM

Quote:

Originally Posted by meninvenus (Post 4806889)
I think it's because IP addresses are 4-digit numbers, and thus the sort command is getting a little confused about which digit it should sort by. That is why it's leaving duplicate values.

The sort command does string sorting by default; this will look wrong if you want the IPs in order, but it won't matter for removing duplicates.
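For example (a sketch, assuming GNU sort, though the -t/-k flags are POSIX): in string order 10.199.1.10 sorts before 10.199.1.2, yet duplicates still collapse; keying each octet numerically gives true address order:

Code:

sort -u logfile.txt                                  # string order; duplicates still removed
sort -u -t. -k1,1n -k2,2n -k3,3n -k4,4n logfile.txt  # split on "." and sort each octet numerically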

chrism01 10-16-2012 04:47 AM

I'm with syg00; sort -u works perfectly on that data; I even get them in order ...

shivaa 10-16-2012 05:15 AM

Quote:

Originally Posted by colucix (Post 4806916)
Code:

awk '!_[$1]++' logfile.txt

It's giving an error: _[: Event not found. Did you check it on your side?
Could you test it again and rectify?

grail 10-16-2012 10:16 AM

Quote:

It's giving an error: _[: Event not found. Did you check it on your side?
Could you test it again and rectify?
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?

shivaa 10-16-2012 10:26 AM

Quote:

Originally Posted by grail (Post 4807263)
Instead of us checking, why don't you provide the exact error, what system you are running it on and what version and type of awk (mawk, nawk, gawk, awk ...) you are using?

I have tried it on both Linux and Solaris.
RHEL 5, awk version 3.1.5.
Solaris 10, awk version I can't find.

Code:

more /home/jack/logfile.txt | awk '!_[$1]++'
_[: Event not found.

It's perhaps treating "_[" after "!" as some previously run command, which it can't find, and thus throwing the error... Am I right? Although I can use '\!_[$1]++', that's also not working on Solaris.

ntubski 10-16-2012 01:04 PM

It looks like history expansion; it's only on by default for interactive use (from the command prompt). You can turn it off with
Code:

set +o histexpand

I really think this should be the default setting always; nobody uses history expansion anymore.
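At an interactive bash prompt the effect looks something like this (a sketch; the exact error text varies by shell, and on csh/tcsh the usual workaround is the \! escape shivaa already tried):

Code:

$ echo "!_"             # unescaped ! inside double quotes triggers history expansion
bash: !_: event not found
$ set +o histexpand     # disable history expansion for this session
$ echo "!_"             # the literal string now goes through untouched
!_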

grail 10-17-2012 09:56 AM

I would add that the use of more in this case is quite redundant. See colucix's example above for the appropriate way to run the command.
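That is, pass the file straight to awk instead of piping more into it (using the path from shivaa's post):

Code:

awk '!_[$1]++' /home/jack/logfile.txt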

