LinuxQuestions.org
Old 10-17-2003, 04:06 PM   #1
Rezon
LQ Newbie
 
Registered: Oct 2003
Posts: 8

Rep: Reputation: 0
Parsing text from an HTML file


I have an assignment, and I am looking for some suggestions....

I am given an HTML file and have to write a script in the Bash shell to parse out some information (e.g., an IP address).

My only problem is that sed, grep, and similar commands grab the whole line that the pattern is found on. In this case, each line of the HTML file is very long; there are probably only about 3 lines in the whole file.

So using these commands doesn't get me very far. I still have a large block of text stored in a new file, but it does me no good.

Any suggestions?

Thanks

Rez
 
Old 10-17-2003, 04:14 PM   #2
Kurt M. Weber
Member
 
Registered: Oct 2003
Distribution: Slackware
Posts: 335

Rep: Reputation: 36
My suggestion:
Use grep to find the line in the file, then pipe the output from that to sed to remove the tags and other extraneous junk.
 
Old 10-17-2003, 04:48 PM   #3
Rezon
LQ Newbie
 
Registered: Oct 2003
Posts: 8

Original Poster
Rep: Reputation: 0
How do I tell sed not to delete the whole line, though? As I understand it, if sed finds, say, &nbsp in the line, it deletes the entire line by default.

Thanks.
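
For what it's worth, sed only drops a whole line when the d command is used; the s command replaces just the text that matched and leaves the rest of the line alone. A minimal sketch, assuming a made-up sample line (the HTML and the IP in it are hypothetical, not taken from the assignment file, and the function names are only for illustration):

```shell
#!/bin/sh
# Hypothetical sample line; the real assignment file is not shown in the thread.
line='<td>Address:&nbsp;10.0.0.1</td>'

# s/pattern/replacement/g rewrites only the matched text and keeps the rest.
replace_nbsp() {
    printf '%s\n' "$1" | sed 's/&nbsp;/ /g'
}

# /pattern/d is the form that deletes every line containing the pattern.
delete_nbsp_lines() {
    printf '%s\n' "$1" | sed '/&nbsp;/d'
}

replace_nbsp "$line"        # the line survives, with &nbsp; turned into a space
delete_nbsp_lines "$line"   # prints nothing, because the line is deleted
```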
 
Old 10-17-2003, 04:56 PM   #4
Kurt M. Weber
Member
 
Registered: Oct 2003
Distribution: Slackware
Posts: 335

Rep: Reputation: 36
I'm not too keen on regular expression syntax anymore, but basically tell it to replace everything that matches a regular expression denoting an HTML tag with nothing.
 
Old 10-17-2003, 05:04 PM   #5
Rezon
LQ Newbie
 
Registered: Oct 2003
Posts: 8

Original Poster
Rep: Reputation: 0
I was actually thinking of first deleting anything in between < >...

sed 's/<*>//` thefile

I'm not too sure about the syntax either, unfortunately; we started learning sed this week. Let me know if this is right...

Thanks
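
A hedged note on that pattern: in a regular expression, * means "zero or more of the preceding character", so <*> matches a run of < characters followed by a single >, not a whole tag. Without a trailing g, sed also stops after the first match on each line. A sketch of one common alternative, assuming no tag contains a literal > inside an attribute value (the sample line and the function name are made up):

```shell
#!/bin/sh
# Hypothetical sample; not taken from the assignment file.
sample='<b>IP:</b> <i>24.24.5.22</i>'

# <[^>]*> : a '<', then any run of characters that are not '>', then '>'.
# The trailing g repeats the substitution for every tag on the line.
strip_tags() {
    printf '%s\n' "$1" | sed 's/<[^>]*>//g'
}

strip_tags "$sample"
```

Note that the command needs matching single quotes on both sides of the expression; a backtick at the end would leave the quote open.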
 
Old 10-17-2003, 05:40 PM   #6
suprax
LQ Newbie
 
Registered: Oct 2003
Posts: 1

Rep: Reputation: 0
I am doing the same assignment, and I was able to use grep -o [0-9A-Z][0-9A-Z]- and so on for 5 more pairs. This works fine, but it's the IPs that I can't get. If I use [0-9][0-9][0-9].[0-9] and continue that for the rest of the IP, the search gives me 255.255.255.255 fine, but it won't give me anything that has fewer digits in the IP (24.24.5.22).

My question is: is it possible to put some sort of OR statement between the [0-9]s for each part of the IP, so that it can match [0-9][0-9] for some of them? If I can get this, that's all I need.

Thanks.
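
One way to avoid spelling out every digit count is an interval expression: [0-9]{1,3} means "one to three digits". A rough sketch, assuming a grep that supports both -o (already used above) and -E for extended syntax, as GNU grep does. The sample text is invented, and the pattern does not check that each number is 255 or less:

```shell
#!/bin/sh
# Invented sample text standing in for one long line of the HTML file.
sample='<td>255.255.255.255</td><td>24.24.5.22</td>'

# \. is an escaped dot; an unescaped . would match any character.
# (\.[0-9]{1,3}){3} repeats "dot plus one to three digits" three times.
extract_ips() {
    printf '%s\n' "$1" | grep -oE '[0-9]{1,3}(\.[0-9]{1,3}){3}'
}

extract_ips "$sample"   # prints each match on its own line
```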
 
Old 10-18-2003, 12:09 AM   #7
SaTaN
Member
 
Registered: Aug 2003
Location: Suprisingly in Heaven
Posts: 223

Rep: Reputation: 33
Rezon :::
I think this should do it for you

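# Perl: strip HTML tags from the file named on the command line.
use strict;
use warnings;

open(my $fh, '<', $ARGV[0]) or die "Cannot open $ARGV[0]: $!";
while (my $line = <$fh>) {
    $line =~ s/<[^>]*>//g;   # remove anything between < and >
    $line =~ s/\s+/ /g;      # collapse whitespace runs (newlines too) into one space
    print $line;
}
close($fh);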

If you don't understand anything in this feel free to ask me

suprax::

I think | does that for you [the pipe character | works like an "or"]
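
To illustrate the "or": with grep, a bare | means alternation only in extended syntax (grep -E or egrep); in basic syntax GNU grep wants it written as \|. A small sketch with invented input, where each part of the address is "three digits or two digits or one digit" (the variable and function names are just for illustration):

```shell
#!/bin/sh
# One octet: three digits OR two digits OR one digit.
# Longest alternative is listed first only for readability.
octet='([0-9][0-9][0-9]|[0-9][0-9]|[0-9])'

# Four octets separated by literal dots (\. escapes the dot).
extract_ips_alt() {
    printf '%s\n' "$1" | grep -oE "$octet\.$octet\.$octet\.$octet"
}

extract_ips_alt 'hosts: 24.24.5.22 and 255.255.255.255'
```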
 
  


All times are GMT -5.