LinuxQuestions.org
05-06-2012, 06:44 AM   #1
pkramerruiz
LQ Newbie
IP to DNS converter that can handle a huge number of entries


Hi guys!
Last week I began collecting unwanted IP addresses, and last night I finished.
The result is a 22.9 MB text file on my desktop with exactly 1,631,301 entries, one IP address per line.

I'm looking desperately for a script that:
* Can handle this large number of entries.
* Detects whether an entry is not an IP address (the line contains a letter, or is not four numbers separated by three dots, like x.x.x.x or xx.xx.xx.xx) and replaces the whole line with the words NO IP.
* Replaces each IP address with its DNS name.
* Produces no error output for IP addresses that have no name; those lines are simply left intact.
* If possible, does not put too much load on the DNS servers with so many requests. I would much rather put a heavy load on my 64-bit PC.

Last edited by pkramerruiz; 05-06-2012 at 06:48 AM. Reason: possible misunderstanding
 
05-06-2012, 03:55 PM   #2
Nominal Animal
Senior Member
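dig is your best bet, but rather than looking the addresses up one line at a time, I'd take a different approach and split the job into a few separate steps.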

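First, I'd use awk to keep only the valid IPv4 addresses from your list, skipping loopback, private, link-local and multicast addresses (they are not worth looking up in public DNS), and convert each one to the reversed in-addr.arpa name used for reverse DNS lookups (192.0.2.7, for example, becomes 7.2.0.192.in-addr.arpa.):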
Code:
awk '#
    BEGIN {
        RS="[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"
        FS="[.]"
    }

    # Only lines of exactly four dot-separated decimal numbers, each 0-255.
    # (A plain NF==4 test would also let through entries like 1.2.3.4x.)
    /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ && $1<=255 && $2<=255 && $3<=255 && $4<=255 {

        # Loopback address?
        if ($1 == 127) next

        # Private address?
        if ($1 == 10) next
        if ($1 == 172 && $2 >= 16 && $2 <= 31) next
        if ($1 == 192 && $2 == 168) next

        # Link-local address?
        if ($1 == 169 && $2 == 254) next

        # Multicast address?
        if ($1 >= 224 && $1 <= 239) next

        # Skip duplicates, so each address is looked up only once.
        if (seen[$0]++) next

        # This seems like a real IP address.
        printf("%d.%d.%d.%d.in-addr.arpa.\n", $4, $3, $2, $1)
    }
' original-file > ipv4.list
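Before running the filter on all 1.6 million lines, it may be worth trying the same idea on a few tricky inputs. The sketch below wraps the filtering logic in a shell function; the name ptr_names is hypothetical, and it assumes only a POSIX awk, so it trims whitespace and carriage returns with gsub instead of relying on a multi-character RS (POSIX leaves that behaviour unspecified).

```shell
# Hypothetical helper: a portable (POSIX awk) version of the filter above.
# Reads addresses on stdin and prints one in-addr.arpa. name per distinct,
# valid, non-special IPv4 address.
ptr_names() {
    awk -F. '
        { gsub(/^[ \t\r]+|[ \t\r]+$/, "") }                  # trim spaces, tabs, CR
        !/^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ { next }          # not four dotted numbers
        $1+0 > 255 || $2+0 > 255 || $3+0 > 255 || $4+0 > 255 { next }
        $1+0 == 127 || $1+0 == 10 { next }                    # loopback, private 10/8
        $1+0 == 172 && $2+0 >= 16 && $2+0 <= 31 { next }      # private 172.16/12
        $1+0 == 192 && $2+0 == 168 { next }                   # private 192.168/16
        $1+0 == 169 && $2+0 == 254 { next }                   # link-local
        $1+0 >= 224 && $1+0 <= 239 { next }                   # multicast
        seen[$0]++ { next }                                   # already queued once
        { printf("%d.%d.%d.%d.in-addr.arpa.\n", $4, $3, $2, $1) }
    '
}
```

For example, feeding it 8.8.8.8 (with a DOS line ending), 10.0.0.1, foo, 1.2.3.4x, 256.1.1.1, a padded duplicate of 8.8.8.8 and 192.0.2.7 leaves only two names: 8.8.8.8.in-addr.arpa. and 7.2.0.192.in-addr.arpa. On the real file it would be run as `ptr_names < original-file > ipv4.list`.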
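Now you can use dig to go through the IPv4 address list in batch mode (-f). It is about the most lightweight option there is. If you want to reduce the load on your name servers, install a local caching resolver such as dnscache (from djbdns), so the queries go directly to the authoritative name servers instead of through your normal name servers; but I would not bother. Ask for PTR records, since that is the record type holding the hostname in a reverse lookup (an ANY query sent through a caching resolver may return only whatever happens to be cached). The command to run is
Code:
dig +noall +answer -t PTR -f ipv4.list > ipv4.lookup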
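If you would rather spread the queries out over time than send them all in one run, one option is to split ipv4.list into chunks and pause between them. This is a minimal sketch, assuming split and sleep plus mktemp -d (available on Linux); the function name lookup_in_chunks and the default chunk size and pause are hypothetical, untuned choices. Each chunk is queried for PTR records, the same way as the single command.

```shell
# Hypothetical helper: run the batch lookup chunk by chunk, pausing between
# chunks so the resolvers do not receive every query in one burst.
#   $1 = query list (e.g. ipv4.list)     $2 = output file (e.g. ipv4.lookup)
#   $3 = lines per chunk (default 10000) $4 = pause in seconds (default 5)
lookup_in_chunks() {
    list=$1 out=$2 chunk=${3:-10000} pause=${4:-5}
    tmp=$(mktemp -d) || return 1
    # Four-letter suffixes (part.aaaa, part.aaab, ...) leave room for many chunks.
    split -a 4 -l "$chunk" "$list" "$tmp/part."
    : > "$out"                          # start from an empty output file
    for part in "$tmp"/part.*; do
        [ -e "$part" ] || continue      # empty list: the glob matched nothing
        dig +noall +answer -t PTR -f "$part" >> "$out"
        sleep "$pause"
    done
    rm -rf "$tmp"
}
```

`lookup_in_chunks ipv4.list ipv4.lookup 10000 5` would then take the place of the single dig command. The answers are appended in the same format, so the following step does not change.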
When the lookups complete, convert the results into simple "address hostname" pairs that are easier to process:
Code:
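awk '#
    BEGIN {
        RS = "[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"
        FS = "[\t\v\f ]+"
    }

    # Answer lines look like: 4.3.2.1.in-addr.arpa. 86400 IN PTR host.example.com.
    # Classless delegations (RFC 2317) answer with a CNAME first, then a PTR for
    # the CNAME target; remember which query name each CNAME target belongs to.
    $4 == "CNAME" {
        alias[$NF] = $1
        next
    }

    $4 == "PTR" {
        owner = $1
        if (owner in alias) owner = alias[owner]
        if (split(owner, ip, ".") < 6) next
        name = $NF
        sub(/\.$/, "", name)
        printf("%d.%d.%d.%d %s\n", ip[4], ip[3], ip[2], ip[1], name)
    }
' ipv4.lookup > ipv4.names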
At this point, ipv4.names lists every IPv4 address that has a name, together with the matching hostname. Now you can repeat the filtering step you did first, except this time use the name list to classify each address:
Code:
awk -v names="ipv4.names" '#
    BEGIN {
        RS="[\t\v\f ]*(\r\n|\n\r|\r|\n)[\t\v\f ]*"
        FS="[\t\v\f ]+"

        # Load the "address hostname" pairs from the previous step.
        while ((getline < names) > 0)
            if (NF == 2)
                name[$1] = $2
        close(names)
    }

    (NF > 1) {
        printf("%s BAD_INPUT\n", $1)
        next
    }

    (NF == 1) {
        # Must be exactly four dot-separated decimal numbers, each 0-255.
        if ($1 !~ /^[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/) {
            printf("%s NO_IP\n", $1)
            next
        }
        split($1, ip, ".")
        if (ip[1] > 255 || ip[2] > 255 || ip[3] > 255 || ip[4] > 255) {
            printf("%s NO_IP\n", $1)
            next
        }

        if (ip[1] == 127) {
            printf("%s LOOPBACK\n", $1)
            next
        }

        if ((ip[1] == 10) ||
            (ip[1] == 172 && ip[2] >= 16 && ip[2] <= 31) ||
            (ip[1] == 192 && ip[2] == 168)) {
            printf("%s PRIVATE\n", $1)
            next
        }

        if (ip[1] == 169 && ip[2] == 254) {
            printf("%s LINK_LOCAL\n", $1)
            next
        }

        if (ip[1] >= 224 && ip[1] <= 239) {
            printf("%s MULTICAST\n", $1)
            next
        }

        # Normalize (e.g. 010 -> 10) so the key matches ipv4.names.
        addr = sprintf("%d.%d.%d.%d", ip[1], ip[2], ip[3], ip[4])
        if (addr in name)
            printf("%s KNOWN %s\n", $1, name[addr])
        else
            printf("%s UNKNOWN\n", $1)
    }
' original-file > final-results
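In the final-results file, the first column is the original entry (for BAD_INPUT, its first word), the second column is the classification (NO_IP, BAD_INPUT, LOOPBACK, PRIVATE, LINK_LOCAL, MULTICAST, KNOWN or UNKNOWN), and when the second column is KNOWN, the third column holds the hostname.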
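The first post asked for the hostname to replace the address in place, with NO IP for entries that are not addresses and every other line left intact. If you want exactly that layout, final-results can be collapsed back into a single column. A minimal sketch, assuming the column layout just described; the function name to_requested_format is hypothetical, and loopback, private, link-local, multicast and unknown addresses are all printed unchanged.

```shell
# Hypothetical helper: turn final-results lines ("entry CLASS [hostname]")
# into the one-column format requested in the first post.
to_requested_format() {
    awk '
        $2 == "KNOWN"                      { print $3; next }       # address -> hostname
        $2 == "NO_IP" || $2 == "BAD_INPUT" { print "NO IP"; next }  # not an address
                                           { print $1 }             # leave address as is
    ' "$@"
}
```

Usage: `to_requested_format final-results > converted-list`. Blank lines in original-file produce no line in final-results, so they will not appear in the converted list either.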

Note: The above scriptlets have not been thoroughly tested, so there might be typos.
 
  


All times are GMT -5.