LinuxQuestions.org

shivaa 10-13-2012 12:06 AM

Get only uniq content from a file
 
I have a very large log file containing more than 500,000 (5 lakh) entries of different host IPs. I want to get only the unique values out of the list of IPs, but the uniq command is not helping me.
I used more <logfile> | uniq -u, but it still gives me repeated lines, i.e. the same IP addresses show up repeatedly in the output. Can anybody help with this?
Thanks a lot.

druuna 10-13-2012 02:28 AM

Without knowing what the entries actually look like it will be hard to point you to a working solution.

If using uniq doesn't solve the problem then I have to assume that the lines are not the same. The IP addresses might be, but other info on that line differs. One thing that comes to mind first: Is there a time-stamp present in those lines?

As I already mentioned, without more info (what the lines look like, which entries should be shown or hidden, etc.) we cannot assist you.

shivaa 10-13-2012 02:54 AM

Suppose some hosts connect to my server, and my server maintains a log file that records the IP address of each host that connects to it. A host can connect to my server many times, and each time the server notes its IP address in its log file. So the log file contains many entries for the same IP, and I want to list each IP only once. For example:
10.199.1.2
10.199.1.3
10.199.1.4
10.199.1.5
10.199.1.2
10.199.1.3
10.199.1.1
10.199.1.2
10.199.1.2
10.199.1.1

And I want:
10.199.1.2
10.199.1.3
10.199.1.4
10.199.1.5
10.199.1.1

But unfortunately, uniq is not helping me. Meanwhile, I have found a solution using the sort -u filter. If you know any other way, please suggest it.

druuna 10-13-2012 03:11 AM

The reason uniq doesn't work the way you expect is given in the uniq man page:
Quote:

Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use `sort -u' without `uniq'.
The following would have worked (plain uniq rather than uniq -u, since -u prints only the lines that are never repeated):
Code:

sort logfile | uniq
But, as you already figured out, sort can do both and the uniq part isn't needed:
Code:

sort -u logfile
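
To make the difference between these commands concrete, here is a small sketch that runs them on the sample IPs posted earlier in the thread. The sample function is a hypothetical helper added only for illustration; it stands in for the real log file.

```shell
# Hypothetical helper: prints the sample IPs from the thread's example.
sample() {
    printf '%s\n' 10.199.1.2 10.199.1.3 10.199.1.4 10.199.1.5 10.199.1.2 \
                  10.199.1.3 10.199.1.1 10.199.1.2 10.199.1.2 10.199.1.1
}

# uniq alone collapses only adjacent duplicates, so most repeats survive.
sample | uniq

# Sorting first makes duplicates adjacent, so uniq can collapse them all.
sample | sort | uniq

# Same result in a single step.
sample | sort -u

# uniq -u is different: it keeps only lines that occur exactly once,
# which here means just 10.199.1.4 and 10.199.1.5.
sample | sort | uniq -u
```

Note that both sort-based variants print the addresses in sorted order, not in the order they first appeared in the log.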

grail 10-13-2012 04:46 AM

Or an awk alternative:
Code:

awk '!_[$0]++' file
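
For readers new to awk, a spelled-out sketch of the same idea may help. It assumes the sample IPs posted earlier in the thread; the sample function and the array name seen are hypothetical and chosen only for readability (the one-liner above uses _ as the array name).

```shell
# Hypothetical helper: prints the sample IPs from the thread's example.
sample() {
    printf '%s\n' 10.199.1.2 10.199.1.3 10.199.1.4 10.199.1.5 10.199.1.2 \
                  10.199.1.3 10.199.1.1 10.199.1.2 10.199.1.2 10.199.1.1
}

# seen[$0]++ yields the count *before* incrementing: 0 the first time a
# line is read. Negating it is therefore true only on the first occurrence,
# and awk's default action for a true pattern is to print the line.
sample | awk '!seen[$0]++'

# The same logic written out as an explicit action.
sample | awk '{ if (seen[$0] == 0) print; seen[$0]++ }'
```

Unlike sort -u, this keeps the lines in the order of their first appearance and needs no sorting, at the cost of holding every distinct line in memory.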

shivaa 10-13-2012 08:46 AM

Awk is magical, it always works!!
Thanks everyone. It's solved.


All times are GMT -5.