LinuxQuestions.org
LinuxAnswers - the LQ Linux tutorial section.
Go Back   LinuxQuestions.org > Blogs > Musings on technology, philosophy, and life in the corporate world
User Name
Password

Notices

Hi. I'm a Unix Administrator, mathematics enthusiast, and amateur philosopher. This is where I rant about that which upsets me, laugh about that which amuses me, and jabber about that which holds my interest most: Unix.
Rate this Entry

GeoIP for serious nerds

Posted 09-13-2011 at 12:41 PM by rocket357
Updated 09-13-2011 at 01:01 PM by rocket357

I'm working on a project to make a pretty graphical display of customers who are having speed issues so we can quickly isolate potential internet issues. I'd admin'd a (very minor) open source project called "GeoXPlanet", which was based on a project I was assigned in college, both of which used GeoIP technology heavily, so I have a bit of experience with geolocation and graphing networks. It's fun stuff, really, it is.

I have this machine:

Code:
top - 12:18:39 up 124 days,  7:10,  2 users,  load average: 1.01, 0.67, 0.42
Tasks: 142 total,   1 running, 141 sleeping,   0 stopped,   0 zombie
Cpu0  :  4.2%us,  0.6%sy,  0.0%ni, 83.0%id, 11.9%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu1  :  4.3%us,  1.2%sy,  0.0%ni, 86.0%id,  8.1%wa,  0.0%hi,  0.3%si,  0.0%st
Cpu2  :  3.8%us,  1.0%sy,  0.0%ni, 83.4%id, 11.1%wa,  0.0%hi,  0.6%si,  0.0%st
Cpu3  :  3.8%us,  1.0%sy,  0.0%ni, 85.9%id,  9.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  16465852k total, 16375656k used,    90196k free,    10524k buffers
Swap:  2907724k total,    17520k used,  2890204k free, 15573164k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                                     
 4555 postgres  20   0 2159m 2.1g 2.1g D   18 13.2   3:19.77 postgres: postgres sp_logs 192.168.1.16(41078) SELECT        
 6128 jstone    20   0 1450m 139m 8040 S    1  0.9   0:08.45 java -Xmx1024M -Dart.version=3 -Djdbc.audit.url=jdbc:jtds:sql
   47 root      15  -5     0    0    0 S    1  0.0 648:24.26 [kswapd0]                                                    
 6142 postgres  20   0 2151m  65m  62m S    1  0.4   0:03.86 postgres: jstone boxi_aggregator 192.168.1.201(48603) idle
This is a single quad-core Intel Xeon E5450 @ 3.00GHz with 16 GB RAM and 16x 1TB SATA disks...not your typical desktop (but a far cry from a "typical" PostgreSQL machine around here...it was retired from production use two years ago because it is too slow to keep up with the big dogs). This machine houses the performance data for every single pagehit my company has received in the past year and a half:

Code:
sp_logs=# select sum(max_log_id) from sp5_log_completed;
    sum     
------------
 1032657771
(1 row)
It also houses the report data for our BusinessObjects XI installation. This machine uses a lot of disk space!

Alrighty...so this database holds pagehit information, and in each record we store the ip address of the remote machine. Simple enough, right? Using that, we can geolocate customers who are having speed issues and see if there's a common router on the internet or ISP#1->ISP#2 hop that is causing the issue.

I've used MaxMind before, so I think I'll go with them again. Download the GeoLiteCity CSV from http://www.maxmind.com/app/geolitecity and uncompress it. Next up you want to do this:

Code:
$ cd /path/to/unzippedGeoLiteCity
$ sed '1,2d' GeoLiteCity-Location.csv > Location-NOHEADERS.csv
$ sed '1,2d' GeoLiteCity-Blocks.csv > Blocks-NOHEADERS.csv
$ psql -U postgres
postgres=# \c sp_logs;
sp_logs=# SET client_encoding TO 'LATIN1';
sp_logs=# CREATE TABLE IpBlocks ( startIpNum bigint, endIpNum bigint,
        locId int );
sp_logs=# CREATE TABLE IpLocations ( locId int, country varchar(2),
        region varchar(2), city varchar(50), postalCode varchar(10),
        latitude double precision, longitude double precision,
        dmaCode varchar(20), areaCode varchar(10) );
sp_logs=# COPY IpBlocks FROM '/path/to/unzippedGeoLiteCity/Blocks-NOHEADERS.csv' CSV;
sp_logs=# COPY IpLocations FROM '/path/to/unzippedGeoLiteCity/Location-NOHEADERS.csv' CSV;
sp_logs=# CREATE INDEX iploc_loc_lat_lon_idx ON iplocations(locid, latitude, longitude);
sp_logs=# CREATE INDEX blks_ip_idx ON ipblocks(startipnum, endipnum);
sp_logs=# CREATE FUNCTION lookup_ip(ip varchar(16)) RETURNS RECORD AS $$
DECLARE 
encoded_ip bigint; 
sql_result record; 
BEGIN
SELECT  
    split_part(ip,'.',1)::bigint*16777216 + 
    split_part(ip,'.',2)::bigint*65536 + 
    split_part(ip,'.',3)::bigint*256 + 
    split_part(ip,'.',4)::bigint 
  INTO encoded_ip;
SELECT 
    l.latitude, 
    l.longitude 
  FROM ipblocks b 
  JOIN iplocations l ON l.locid = b.locid 
    WHERE b.startipnum <= encoded_ip 
        AND b.endipnum >= encoded_ip 
  INTO sql_result;
RETURN sql_result; 
END;
$$ LANGUAGE plpgsql;
And then have fun:


Code:
sp_logs=# \timing on
Timing is on.
sp_logs=# select lookup_ip('8.8.8.8'); -- One of Google's public DNS servers...
      lookup_ip      
---------------------
 (37.4192,-122.0574)
(1 row)

Time: 1.825 ms
sp_logs=#
I'll edit this post again when I get a chance to add in the xplanet section...but for now, enjoy.
Posted in Uncategorized
Views 15593 Comments 5
« Prev     Main     Next »
Total Comments 5

Comments

  1. Old Comment
    thanks
    by the way, that is openbsd, right?
    did you ever test net/disk performance difference between openbsd/linux? or something like that with linux and other UNIX-like free systems? have some results to share?
    Posted 09-14-2011 at 04:15 AM by Web31337 Web31337 is offline
    Updated 09-14-2011 at 04:17 AM by Web31337
  2. Old Comment
    Actually no, that machine running PostgreSQL is a Debian box. Here's an OpenBSD "top" dump (my username and domain were altered by me =):

    Code:
    load averages:  0.63,  0.29,  0.20                                                                               <user>.<domain> 08:43:24
    68 processes:  64 idle, 1 zombie, 3 on processor
    CPU0 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU1 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU2 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU3 states:  3.6% user,  0.0% nice,  0.0% system,  0.0% interrupt, 96.4% idle
    CPU4 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU5 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU6 states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  100% idle
    CPU7 states: 14.3% user,  0.0% nice,  0.0% system,  0.0% interrupt, 85.7% idle
    Memory: Real: 366M/3644M act/tot Free: 320M Cache: 2906M Swap: 0K/3830M
    
      PID USERNAME PRI NICE  SIZE   RES STATE     WAIT      TIME    CPU COMMAND
    21000 <user>     2    0 3580K 4596K sleep/1   poll    218:57  2.00% mp3blaster
    14063 <user>     2    0 1024K 2196K sleep/3   poll      1:18  0.05% top
    11574 _sndio     2  -20  712K 1244K sleep/2   poll     25:31  0.00% /usr/bin/aucat -l -r 48000
    13412 <user>     2    0  345M  260M sleep/4   poll     17:27  0.00% xxxterm
    Anyways, I have done benchmarking between Linux and {Free,Open}BSD, and Linux wins hands down for PostgreSQL...at least on our hardware. I probably could've coaxed considerably more performance out of FreeBSD, but I'm not as familiar with FreeBSD as I am with OpenBSD and Linux, so I dunno. The benchmarks were also done before OpenBSD on amd64 had BIGMEM enabled by default, so I'm sure the results would be better for OpenBSD now (so honestly I should re-benchmark when I get a reasonably powerful machine available again).
    Posted 09-14-2011 at 08:48 AM by rocket357 rocket357 is offline
    Updated 09-14-2011 at 08:51 AM by rocket357
  3. Old Comment
    To add to the benchmarking comment, it's been my experience that *BSD tends to lag behind in terms of disk I/O when compared to Linux, but tends to be highly competitive in terms of net I/O...so obviously a database benchmark will favor Linux unless it's a very small database that can be cached in RAM. A while back we tried migrating to FreeBSD for PostgreSQL (dynamic I/O priority is what we were hoping would give FreeBSD the edge), but the disk I/O couldn't keep up enough to make the migration a win.

    When I benchmark again I'll re-read the FreeBSD handbook to make sure I'm up to speed on tuning so I can post some numbers.
    Posted 09-14-2011 at 09:01 AM by rocket357 rocket357 is offline
  4. Old Comment
    Sounds interesting. So, openbsd basically should win if using network application that resides entirely in RAM?
    I have my hands full, but I was planning on looking to BSDs side, OpenBSD/NetBSD/FreeBSD all seem interesting to me, in terms of taking them to high-load road.
    btw is there something for openbsd like grsecurity for linux?
    Posted 09-14-2011 at 10:06 AM by Web31337 Web31337 is offline
  5. Old Comment
    I said it'd be competitive, not a hands-down win for *BSD. The BSD's tend to be more stable and secure, but that comes at a cost of higher learning curve and performance reduction (performance reduction is debateable).

    grsecurity basically enables the following:
    1. /proc hardening (doesn't exist in OpenBSD base)
    2. chroot restrictions
    3. PaX
    4. Kernel Stack Randomization
    5. RBAC

    OpenBSD approaches security in a slightly different manner:
    1. GCC hardening (so all binaries are built with pointers before buffers, random canary values, etc... for stack protection)
    2. strlcpy and strlcat
    3. W^X (Memory pages cannot be writeable and executable at the same time)
    4. Privilege Separation (the "listening" portion of a network daemon runs as an unprivileged user)
    5. loader randomization (libraries are loaded randomly, address space is randomized, etc...)
    6. Integrated cryptography (swap is encrypted, for instance)
    7. Systrace sandboxing
    8. network stack randomization

    The big difference is that these features are built-in to the core system and on by default in OpenBSD, not bolted on afterwards...and their security track record shows it.

    The one big issue I'd say that OpenBSD has is that it doesn't have any sort of advanced ACL capabilities (just stock ugo-rwx permissions). The philosophy is to prevent vulnerabilities, not contain them after they occur. It's like two people going into a contagious disease room in a hospital, one chooses to wear a rubber suit (OpenBSD) and one chooses to get vaccinations beforehand. Ideally, you'd do both, but if you could only choose one...I'd wear the suit (and check it thoroughly) without thinking twice.
    Posted 09-14-2011 at 10:44 AM by rocket357 rocket357 is offline
    Updated 09-14-2011 at 10:57 AM by rocket357
 

  



All times are GMT -5. The time now is 09:30 PM.

Main Menu
Advertisement

My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
identi.ca: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration