LinuxQuestions.org
08-29-2003, 12:54 PM   #1
patsnip
LQ Newbie
 
Registered: Aug 2003
Posts: 3

Rep: Reputation: 0
Need help with grep, trying to parse/filter a file...


Hi there - hope this is an appropriate place to ask for help with this...

I need to extract business name and address information (ONLY) from an HTML file, in order to build a database eventually. In other words, I want to strip all the HTML and other extraneous material from the file and produce a neat file containing only the information I want. I'm thinking a well-designed grep line or two will do this for me.

Am I deluded, or is there another, more obvious way to achieve the same end?

Any advice appreciated.

Here are some sample lines from the HTML file I'm trying to work from:

post
 
08-29-2003, 12:58 PM   #2
patsnip
LQ Newbie
 
Registered: Aug 2003
Posts: 3

Original Poster
Rep: Reputation: 0
Sorry, it seems the HTML won't appear... I hope someone can help intuitively...

cheers
 
08-29-2003, 01:00 PM   #3
SaTaN
Member
 
Registered: Aug 2003
Location: Suprisingly in Heaven
Posts: 223

Rep: Reputation: 33
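I suppose this Perl code will do...
Give the name of the file that contains the HTML as the first argument.
use strict;
use warnings;

# Open the HTML file named on the command line.
open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!\n";
while (<$fh>)
{
    s/<[^>]*>//g;   # strip tags that start and end on this line
    s/\s+/ /g;      # squeeze whitespace, newline included, to a single space
    print;
}
close($fh);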
 
08-29-2003, 01:05 PM   #4
david_ross
Moderator
 
Registered: Mar 2003
Location: Scotland
Distribution: Slackware, RedHat, Debian
Posts: 12,047

Rep: Reputation: 67
Try using lynx in dump mode (lynx -dump). That should strip out the HTML for you, and then you can awk/cut/grep the data more easily.
I don't know why you can't post HTML, unless you need to have 5 posts first. E-mail me the HTML if you want me to post it in the meantime.
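
A minimal sketch of the lynx-then-grep idea, assuming lynx is installed. The file name listing.html, the function names, and the address keywords in the grep pattern are made up for illustration, since the sample HTML never made it into the thread:

```shell
#!/bin/sh
# Sketch only: html_to_text and pick_addresses are hypothetical helper names,
# and the grep pattern is a guess at what an address line might contain.

# Render an HTML file as plain text.
# -dump writes the formatted page to stdout; -nolist omits the list of links.
html_to_text() {
    lynx -dump -nolist "$1"
}

# Keep only lines that look like addresses and trim their leading indentation.
pick_addresses() {
    grep -E 'Street|Road|Avenue' | sed 's/^[[:space:]]*//'
}

# Usage (listing.html is a placeholder file name):
#   html_to_text listing.html | pick_addresses > addresses.txt
```

The pattern will need adjusting once you see what the dumped text actually looks like; running html_to_text on its own first is probably the quickest way to find out.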
 
08-29-2003, 02:33 PM   #5
patsnip
LQ Newbie
 
Registered: Aug 2003
Posts: 3

Original Poster
Rep: Reputation: 0
Thanks SaTaN and David... I think I have it now!

cheers,
Jason
 
  


All times are GMT -5.
