Need a shell script to filter out unique hostnames from two text files.

Thaidog · 11-01-2011, 09:47 AM

I have two huge lists of server names, and I need to find the servers that appear in only one of them. The lists are jumbled up and the hostnames are not in order, so a diff will not work by itself... any ideas on how best to tackle this?

thesnow · 11-01-2011, 10:07 AM

Can you post samples/examples of the file(s)?

If it is just one column of server names, you could use "sort" piped to "uniq" to get shorter, ordered lists.

grail · 11-01-2011, 10:19 AM

I would also ask: what have you tried? This seems to be a fairly trivial task on the surface, unless you have left out some further details?

Thaidog · 11-01-2011, 12:19 PM

The two lists would look something like:

List 1
hostname 2
hostname abc
hostname 1

List 2
hostname 1
hostname 2

In this case I would only need output for hostname abc - if I used diff I would get:

$ diff testhost1 testhost2
1,2c1,3
< hostname 1
< hostname 2
\ No newline at end of file
---
> hostname 2
> hostname abc
> hostname 1
\ No newline at end of file

And if I sort the files I would still get issues with diff if there are more servers in one list, which there are...

thesnow · 11-01-2011, 12:41 PM

Code:

[root@lm:~/testing]$ cat list1
hostname 2
hostname abc
hostname 1
[root@lm:~/testing]$ cat list2
hostname 1
hostname 2
[root@lm:~/testing]$ cat list1 list2 | sort | uniq -u
hostname abc
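
One caveat: the pipeline above assumes each hostname appears at most once per file, and that every file ends with a newline (your diff output says yours do not, so cat could glue the last line of list1 onto the first line of list2). A minimal sketch that sidesteps both, assuming only POSIX sort and uniq; the function name unique_hosts is made up for illustration:

```shell
# unique_hosts FILE1 FILE2
# Print the lines that occur in exactly one of the two files.
# (unique_hosts is a hypothetical helper name, not an existing tool.)
unique_hosts() {
    # sort -u sorts and de-duplicates each file on its own, so a host
    # repeated inside one file is not mistaken for one that is in both.
    # Sorting each file separately also means a missing final newline
    # cannot merge lines across the two files.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}
```

With the list1 and list2 shown above, "unique_hosts list1 list2" prints "hostname abc", the same as the plain pipeline.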

David the H. · 11-01-2011, 12:48 PM

uniq -u = print only lines that are not repeated (uniq compares adjacent lines only, hence the sort first)

Code:

cat file1 file2 | sort | uniq -u

Edit: dangnabit! Beaten to the answer...

crts · 11-01-2011, 01:58 PM

@OP: If your sample is representative then the suggested solutions will do fine. However, if your actual data also has varying elements, e.g. timestamps etc., and you want to ignore those fields when checking for *equal* lines then you might have to consider a slightly different solution. What I mean is, consider the following data:

Code:

$ cat file1
hostname1 [22:22]
hostname2 [23:23]
hostname3 [00:00]
$ cat file2
hostname1 [22:00]
hostname3 [00:30]

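Now, if you do not care about the timestamps and only hostname2 should be printed, then for the above data you could do the following. rev reverses each line so that the timestamp becomes the first field, uniq -f 1 skips that field when comparing, and the second rev restores the original text: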

Code:

$ cat file1 file2 | sort | rev | uniq -f 1 -u | rev
hostname2 [23:23]

If you have a variable number of columns then you'd probably have to switch to an 'awk' solution. Let us know your exact requirement.
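
As a starting point, here is a rough awk sketch. It assumes the hostname is always the first whitespace-separated field and that anything after it can be ignored; the function name unique_by_host is made up for illustration, so adjust the key ($1) to whatever your real data needs:

```shell
# unique_by_host FILE1 FILE2
# Print lines whose first field (assumed to be the hostname) appears
# in only one of the two files, however many other columns follow.
# (unique_by_host is a hypothetical helper name.)
unique_by_host() {
    awk '
        # Skip blank lines; count each hostname at most once per file
        # and remember the first full line seen for it.
        NF && !seen[FILENAME, $1]++ {
            files[$1]++
            if (!($1 in line)) line[$1] = $0
        }
        # A hostname counted for exactly one file is unique to it.
        END {
            for (h in files)
                if (files[h] == 1) print line[h]
        }
    ' "$1" "$2" | sort    # awk array order is unspecified, so sort the result
}
```

For file1 and file2 above, "unique_by_host file1 file2" prints "hostname2 [23:23]". Because hostnames are counted once per file, duplicates inside a single file do not hide a host that is missing from the other one.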