LinuxQuestions.org
08-12-2014, 05:15 PM   #1
centosser
LQ Newbie
 
Registered: Aug 2014
Posts: 6

Rep: Reputation: Disabled
Parse out only specific characters from web page


Hi. I am using curl to download a specific web page. Basically, I need the CustName: value from a query run on whois.domaintools.com.

If you type in an IP address, it returns the organization name. I need only this information saved in a plain text file. I tried using grep -E, but it gets messy because there are many &nbsp; entities after CustName. Also, the page content is returned as one long line, so grepping for CustName returns that whole line. The information I need is followed by a line break tag, '<br>', and I need to stop grabbing text at that point.

So what I do is run

Code:
curl -s http://whois.domaintools.com/ip.addr.of.domain > file
Then I run

Code:
grep -E -o "CustName.{120}" file
The 120 is the number of characters to grab after CustName. Many of these are &nbsp; entities, so I use 120 to make sure I grab everything. Since the data is all on one line, a <br> comes right after the information I need. The Address section follows CustName, and that is not what I need. I would like only the information up to the <br>.

Here is an example of the output of the above command:

Code:
grep -E -o "CustName.{120}" file
242:CustName:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;Akamai&nbspTechnologies<br/>Address:&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbs
As you can see, the only information I want is Akamai Technologies. How can I parse out this data in the most efficient way? Thank you for any help.
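
One way to stop at the tag instead of counting characters is to match everything that is not a "<" after CustName:, then clean up the &nbsp entities. This is only a minimal sketch: the function name extract_custname is made up for illustration, and it assumes the page keeps the "CustName:...<br/>" layout shown in the output above.

```shell
# Hypothetical helper: read HTML on stdin, print the CustName value(s).
extract_custname() {
    # grep -o prints only the matched text; [^<]* stops at the first "<",
    # so the match ends just before <br> or <br/>.
    grep -o 'CustName:[^<]*' |
        sed -e 's/^CustName://' \
            -e 's/&nbsp;\{0,1\}/ /g' \
            -e 's/^ *//' -e 's/  */ /g' -e 's/ *$//'
}

# Offline check using a shortened copy of the output quoted above.
sample='242:CustName:&nbsp;&nbsp;&nbsp;Akamai&nbspTechnologies<br/>Address:&nbsp;&nbsp;'
printf '%s\n' "$sample" | extract_custname

# Against the live page (same curl command as in the post):
# curl -s http://whois.domaintools.com/ip.addr.of.domain | extract_custname > file
```

The sed pattern treats the semicolon as optional because the quoted output contains both "&nbsp;" and "&nbsp" without one.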
 
08-12-2014, 06:04 PM   #2
unSpawn
Moderator
 
Registered: May 2001
Posts: 29,415
Blog Entries: 55

Rep: Reputation: 3600
If you use 'elinks -dump', you get rendered text output that you could pass to grep or awk. Or else you could pipe the curl output through a parser like the one at http://www.devshed.com/c/a/apache/logging-in-apache/2/ (see "Listing 3-1. A Simple Script to Use As a Filter").
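
A rough sketch of the elinks route, assuming elinks is installed and that the rendered text puts "CustName:" and its value on one line separated by ordinary spaces. The function name custname_from_text and the sample text are made up for illustration.

```shell
# Hypothetical helper: print the CustName value from rendered text on stdin.
custname_from_text() {
    # Drop everything through "CustName:" plus padding, keep the value,
    # and trim trailing whitespace. Lines without CustName: print nothing.
    sed -n 's/^.*CustName:[[:space:]]*\(.*[^[:space:]]\)[[:space:]]*$/\1/p'
}

# Offline demonstration with made-up rendered text standing in for the page:
printf '   CustName:       Akamai Technologies   \n   Address:\n' | custname_from_text

# Against the live page:
# elinks -dump http://whois.domaintools.com/ip.addr.of.domain | custname_from_text > file
```

Because the browser renders the markup first, there are no tags or entities left to strip, which is the main appeal over grepping the raw HTML.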
 
08-12-2014, 06:25 PM   #3
centosser
LQ Newbie
 
Registered: Aug 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
Hi. I'm not sure what parser means here. I saved the output of the curl command to a file and just ran grep from there. Sorry, I'm new to parsing in Apache.
 
08-12-2014, 07:37 PM   #4
norobro
Member
 
Registered: Feb 2006
Distribution: Debian Sid
Posts: 792

Rep: Reputation: 331
Perhaps I'm missing something here, but why not use "whois"?
Code:
whois ip.addr.of.domain | grep OrgName
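
The grep above prints the whole "OrgName:" line, label included. To keep only the name, something like the following might do. It is a sketch that assumes the whois output uses an "OrgName:" or "CustName:" label at the start of a line, which varies between registries; org_from_whois is a made-up name and the sample input is illustrative.

```shell
# Hypothetical helper: print the first OrgName/CustName value from whois text on stdin.
org_from_whois() {
    # Strip the label and the padding after the colon, print the rest, stop.
    awk '/^(OrgName|CustName):/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }'
}

# Offline demonstration with illustrative whois-style text:
printf 'NetRange:       (elided)\nOrgName:        Akamai Technologies\n' | org_from_whois

# With the real command:
# whois ip.addr.of.domain | org_from_whois > file
```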
 
08-13-2014, 03:37 PM   #5
centosser
LQ Newbie
 
Registered: Aug 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
The problem is that whois does not seem to work with every IP address.
 
  

