LinuxQuestions.org
08-20-2004, 09:28 AM   #1
gsphanikumar6
Extract text from an HTML file


Hey guys,
I am a newbie. I want to write a script that takes a directory containing a set of HTML files as input, scans each file, removes the HTML tags within the body of each page, and writes every word of the remaining plain text to a file, one word per line.
--------------------------------------------
Thanks in advance

Last edited by gsphanikumar6; 08-20-2004 at 12:16 PM.
 
08-20-2004, 01:05 PM   #2
david_ross
Welcome to LQ.

Try this:
Code:
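#!/usr/bin/perl
use strict;
use warnings;

# Read HTML on stdin, strip the tags, and print one word per line.
# Tags that span more than one input line are not handled.
while (my $line = <STDIN>) {
    # Replace each tag with a space so words on either side stay separate.
    $line =~ s/<[^>]*>/ /g;
    # split ' ' breaks on runs of whitespace and skips empty fields,
    # so no blank lines are printed.
    print "$_\n" for split ' ', $line;
}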
It reads from stdin and writes to stdout.
 
08-20-2004, 01:11 PM   #3
gsphanikumar6
Thanks for the reply, david_ross.
But I don't know Perl. I would be thankful if you could write the code in bash.
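
For a bash version, here is a minimal sketch rather than a definitive answer. It assumes the standard tr and sed tools are installed; the function name extract_words and the file names in the usage line are made up for illustration and do not come from the thread. It strips anything that looks like a tag, so the contents of script and style blocks and entities such as &amp; are left as they are.

```shell
#!/bin/bash
# Sketch: strip HTML tags from every .html file in a directory and
# write the remaining words, one per line, to an output file.
# extract_words is a hypothetical helper name used for illustration.

extract_words() {
    local dir=$1 out=$2 f
    : > "$out"                          # start with an empty output file
    for f in "$dir"/*.html; do
        [ -f "$f" ] || continue         # skip if the glob matched nothing
        # Join the file into one line so tags split across lines still match,
        # and end it with a newline so sed sees a complete line.
        { tr '\n' ' ' < "$f"; echo; } |
            sed -e 's/<[^>]*>/ /g' |    # replace each tag with a space
            tr -s '[:space:]' '\n' |    # runs of whitespace become one newline
            sed -e '/^$/d' >> "$out"    # drop any empty lines that remain
    done
}
```

Usage would look something like: extract_words ./pages words.txt
Joining each file into a single line before stripping is a design choice: it lets tags that span several lines be matched, at the cost of feeding sed one very long line per file.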
 
  

