[SOLVED] Retrieve specific data from html (alternative who is)
Programming — This forum is for all programming questions. The question does not have to be directly related to Linux, and any language is fair game.
Welcome to LinuxQuestions.org, a friendly and active Linux Community.
I have been working with the whois tool for some time, but sometimes the whois database is not very accurate about the country for a specific IP.
The tcpiputils.com website looks very accurate, and we can retrieve the IP data as HTML with wget by downloading the same page the browser displays.
But using grep on the downloaded data to retrieve the country is a mess, because depending on the IP, the HTML can display the city and then the country in the same field.
If you download this HTML you will see that line 134 holds the field with the city, country, etc.
That line is a mess to pick anything out of, and there is no unique reference for grep to latch onto. And there is another issue ahead: for other IPs the country can be on a different line, and it may show only the country with no city name before it.
The easiest value to extract, I believe, is the two-letter country code.
Using the two-letter code I can look up the full country name in a country list I have here.
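That last lookup step can be sketched as follows. This is only a minimal sketch: the "CODE<TAB>Name" layout and the sample entries are assumptions, so adapt it to whatever country list you already keep.

```shell
#!/bin/sh
# Hedged sketch: map a two-letter country code to the full name
# using a local list. The tab-separated "CODE<TAB>Name" layout and
# the sample entries are assumptions.
list=$(mktemp)
printf 'PT\tPortugal\nUS\tUnited States\nSG\tSingapore\n' > "$list"
code=SG
# Print the second column of the row whose first column matches $code.
awk -F'\t' -v c="$code" '$1 == c { print $2 }' "$list"   # prints: Singapore
rm -f "$list"
```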
Here you go. This was kind of a fun little script to write. It was easiest to use the PCRE (Perl-Compatible Regular Expressions) mode of grep to search for the appropriate lines. I've included an option to select either the full or the short country name.
EDIT: Please let me know if anything isn't clear.
EDIT2: I'm sorry, full_country is actually the state name.
EDIT3: I updated it to get the full country name.
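The script itself is not reproduced in this copy of the thread, but the grep -P idea it describes can be sketched like this. The flag-image markup in the sample line is an assumption about the page; point the pattern at whatever unique token the real HTML carries near the country field (requires GNU grep built with PCRE support):

```shell
#!/bin/sh
# Hedged sketch of the grep -P (PCRE) approach. The
# /images/flags/xx.png markup is an assumed example of the page's
# HTML, not the site's confirmed layout.
html='<td><img src="/images/flags/pt.png" alt="flag"> Lisbon, Portugal</td>'
# \K discards everything matched so far; the lookahead (?=\.png)
# anchors on the file extension without consuming it.
printf '%s\n' "$html" \
  | grep -oP 'flags/\K[a-z]{2}(?=\.png)' \
  | tr 'a-z' 'A-Z'    # prints: PT
```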
Thank you very much for the code. In my script I already use whois, but this one will kick in if the whois output is not reliable or is out of date.
I believe many people will use the code you wrote in the future.
Thanks again
scasey
The problem with the whois tool is that sometimes it is not very accurate, especially if the server is located in one country and the person who registered it is in another. In that case whois will return multiple countries; for one IP here I got three: US, CN (China), and SG (Singapore).
Also, whois is sometimes overloaded and the query can time out.
The code "Individual" wrote gives you an alternative way to get the country name for an IP without using whois, and the dnsutils website can also reverse an IP to a hostname and return a lot of other information that whois cannot get.
Anyone who uses it can also use it to pull other values from the webpage that you normally cannot get with whois.
Since pedropt only wants the country, it is probably easier to use the whois program. But now he has options to choose from.
Don't get me wrong... your script is most impressive. I just wanted to point out that the website is only displaying what whois returns.
I use whois to get the reporting address for spam reporting, and it works almost all the time. There are issues with KoreaNIC, and sometimes with JPNIC, and I have to go to the relevant web pages for those...sometimes.
Here's a script I use to pull the contact information:
Quote:
Originally Posted by scasey
Don't get me wrong... your script is most impressive. I just wanted to point out that the website is only displaying what whois returns.
No offense taken.
Quote:
Originally Posted by scasey
I use whois to get the reporting address for spam reporting, and it works almost all the time. There are issues with KoreaNIC, and sometimes with JPNIC, and I have to go to the relevant web pages for those...sometimes.
Here's a script I use to pull the contact information:
Quote:
Originally Posted by pedropt
The problem with the whois tool is that sometimes it is not very accurate, especially if the server is located in one country and the person who registered it is in another; whois will return multiple countries. For one IP here I got three: US, CN (China), and SG (Singapore).
Yes, I've seen that. As said, I mostly use whois to identify the reporting address for the IP that delivered spam to my servers...in which case I only need to know the upstream provider/owner of that IP.
When that doesn't work, I go directly to the managing Network Information Center's website (although, I have also bookmarked https://dnslytics.com -- it certainly can be useful.)
Quote:
Originally Posted by scasey
Yes, I've seen that. As said, I mostly use whois to identify the reporting address for the IP that delivered spam to my servers...in which case I only need to know the upstream provider/owner of that IP.
When that doesn't work, I go directly to the managing Network Information Center's website (although, I have also bookmarked https://dnslytics.com -- it certainly can be useful.)
I personally don't contact the upstream provider, because my server receives a lot of exploitation attempts and denial-of-service traffic from the same subnet as a given offending IP.
My solution was to implement some rules to deal with DoS attacks, then look in the logs at what a specific IP has been doing; depending on that, my script can block it directly in the firewall.
This way I don't have to worry about that IP again.
Sometimes a whole subnet is trying to hack the server. For example: one day I get a port scan from 192.168.1.30 and block it in the firewall; the next day I get another port scan or a DoS from 192.168.1.35, which gets the same treatment; then two days later there is an exploitation attempt from yet another IP in the same subnet, say 192.168.1.50. The best move here is to block the subnet as a whole. From what I have noticed, most attacks come from other public-facing websites, so I think those sites were hacked somehow and the attacker is using them as remote shells, redirecting the work through a different IP -- or else the owner of the website simply decided to hammer something on the web.
Definitely the best way is to block in the firewall; otherwise I would have to contact ISPs every day about abusive IPs on their networks, and I don't have time for that.
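The subnet-blocking step described above can be sketched with iptables. This is only a sketch, not pedropt's actual script: it prints the rule by default and only inserts it when run with "apply" (which requires root). The subnet is the example from the post.

```shell
#!/bin/sh
# Hedged sketch: block a whole /24 after repeated abuse from the
# same subnet. Pass "apply" to actually insert the rule (needs root);
# otherwise the rule is just printed for inspection.
subnet=192.168.1.0/24
rule="iptables -I INPUT -s $subnet -j DROP"
if [ "$1" = "apply" ]; then
    $rule
else
    printf '%s\n' "$rule"
fi
```

To make the block survive a reboot, save the ruleset with your distribution's tool (iptables-save, netfilter-persistent, etc.).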
I definitely agree about blocking netblocks, and do that routinely, but only for email, using ucspi-tcp, which can be configured to drop connections on port 25. We currently block about 75% of connection attempts from spamming providers, mostly in other countries.
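The ucspi-tcp setup described above boils down to a tcprules file that denies SMTP connections from listed netblocks. A minimal sketch, with placeholder netblocks (the real list is site-specific), looks like:

```shell
#!/bin/sh
# Hedged sketch of a ucspi-tcp rules file: prefix matches ending in
# ":deny" drop the connection, and the bare ":allow" is the default.
# The netblocks are documentation-range placeholders.
cat > smtp.rules <<'EOF'
192.0.2.:deny
198.51.100.:deny
:allow
EOF
# Compile and serve port 25 with it (commented out -- needs ucspi-tcp):
#   tcprules smtp.cdb smtp.tmp < smtp.rules
#   tcpserver -x smtp.cdb 0 25 /path/to/smtpd
grep -c ':deny' smtp.rules    # prints: 2
rm -f smtp.rules
```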
We've automated reporting such that we can supply a Perl script with the reporting address, the delivering IP address, and the name of the Maildir file containing the spam. The script then composes and sends an email to the provider. In our experience, the vast majority of providers welcome the reports, as the reports allow them to address the source of the UCE and thereby avoid being blacklisted.
We used to do all of that automatically, but the program we'd found to do the contact lookups stopped working and was no longer maintained. Before that happened, we'd built the very effective block list mentioned above, though, so we get relatively little UCE anymore... a small enough amount that we can manage doing the lookups and reporting manually.
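The Perl script scasey mentions is not shown in the thread; a hypothetical shell sketch of the same reporting step, taking the three inputs he lists, could look like this. Pipe the output to sendmail -t to actually send it.

```shell
#!/bin/sh
# Hypothetical sketch only -- not the actual Perl reporting script.
# Inputs: reporting address, delivering IP, spam file (as described
# in the post). Writes the report message on stdout.
to=${1:-abuse@example.net}
ip=${2:-192.0.2.1}
spamfile=${3:-/dev/null}
{
  printf 'To: %s\n' "$to"
  printf 'Subject: UCE report for %s\n\n' "$ip"
  printf 'The attached message was delivered to our servers from %s.\n\n' "$ip"
  cat "$spamfile"
}   # append "| sendmail -t" to send for real
```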