Extract only unique values from a file
I have a large log file which contains a list of IP addresses (only a list of IP addresses, nothing else), like this:
Code:
more <logfile.txt>
10.199.1.1
10.199.1.2
10.199.1.3
10.199.1.1
10.199.1.5
10.199.1.3
10.199.1.4
10.199.1.4 |
And so on... But I want to extract only the unique values, i.e. the unique IP addresses, from this list. I have tried the sort -u and uniq commands as filters, but every time I am out of luck :(. I am surprised that even after using sort -u, uniq, or uniq -u, the values are still repeating! So is there any way to sort this out? Anything from awk? Thanks a lot! |
"sort -u" works for me - what do you get ?. And what system are you using ?. "uniq" is a bit unique ... :p
|
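For reference, a minimal sketch of the approach syg00 is describing, assuming the addresses really are one per line in a file called logfile.txt (the filename here is only an example):
Code:
# print each distinct line once; the ordering is lexical, not numeric
sort -u logfile.txt

# equivalent two-step form: uniq only drops *adjacent* duplicates,
# so the input has to be sorted first
sort logfile.txt | uniq |
Note that uniq -u does something different again: it prints only the lines that never repeat at all, which may be part of why it seemed to misbehave.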
Quote:
I think it's because IP addresses are made up of four numbers, and thus the sort command gets a little confused about which part it should sort on. That is why it's leaving duplicate values. But I want something simple, so I don't have to use sort twice. |
Answer my questions - specifically. Waffling will get you nowhere. I already told you "sort -u" worked for me on that limited data.
|
Code:
awk '!_[$1]++' logfile.txt |
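For anyone unfamiliar with that awk idiom, a commented version of what it does (same example filename as above):
Code:
# _ is an associative array keyed on the first field (the address).
# _[$1]++ evaluates to 0 the first time an address is seen, so
# !_[$1]++ is true and the default action (print the line) runs;
# on later occurrences the count is non-zero and the line is skipped.
# Input order is preserved and no sorting is needed.
awk '!_[$1]++' logfile.txt |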
I'm with syg00; sort -u works perfectly on that data; I even get them in order ...
|
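One aside on ordering: plain sort compares the addresses as text, so for example 10.199.1.10 sorts before 10.199.1.2. That makes no difference to removing duplicates, but if genuinely numeric ordering is ever wanted, GNU sort can be told to treat each dot-separated octet as a numeric key. A sketch:
Code:
# sort each octet numerically, then drop duplicates (GNU sort)
sort -t. -k1,1n -k2,2n -k3,3n -k4,4n -u logfile.txt |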
Quote:
Could you test it again and rectify? |
Quote:
RHEL 5 and awk version is 3.1.5. Solaris 10 and awk version I can't find.
==========
more /home/jack/logfile.txt | awk '!_[$1]++'
_[: Event not found.
================
It's perhaps treating "_[" after "!" as some previously run command, which it can't find, and thus throwing the error... Am I right? I can use '\!_[$1]++' instead, but that's also not working on Solaris. |
It looks like history expansion; it's only on by default for interactive use (from the command prompt). You can turn it off with
Code:
set +o histexpand |
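A quick usage sketch, assuming a bash shell (histexpand is the same option bash exposes as the -H flag):
Code:
set +o histexpand            # stop the shell expanding ! before awk sees it
awk '!_[$1]++' logfile.txt   # the ! now reaches awk untouched
set -o histexpand            # re-enable history expansion afterwards if wanted |
If the shell is actually csh/tcsh (the "Event not found" wording suggests it might be), set +o histexpand will not exist there; switching to a Bourne-style shell or running the command from a script avoids the problem. On Solaris it may also be worth trying nawk or /usr/xpg4/bin/awk instead of the default /usr/bin/awk, which is a much older awk.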
I would add that the use of more in this case is quite redundant. See colucix's example for the appropriate way to run the command.
|
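To make that concrete, a small sketch of the two invocations (logfile.txt again just an example name):
Code:
# redundant: more does nothing but copy the file into the pipe
more logfile.txt | awk '!_[$1]++'

# simpler: let awk open the file itself
awk '!_[$1]++' logfile.txt |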