LinuxQuestions.org
04-21-2004, 06:08 PM   #1
linuxfond
Member
 
Registered: Jan 2003
Location: Belgium
Distribution: Mandrake 9.2
Posts: 475

extract text portions from html files


Hello,
I have a daunting task: there are 350 HTML files on my hard disk, and each of them contains a portion of text that I want to extract and save into a text file:
<li>E-mail: <a href="mailto:abc@abc.abc">abc@abc.abc</a></li>
Is there an easy way to do this, such as a shell script or a command-line tool, that will extract the email addresses?
Once this is done, I have to extract other text data and save it to a file.
Thanks in advance.
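For the original question, a minimal sketch is a grep pipeline wrapped in a small shell function. It assumes every address appears in a double-quoted href="mailto:..." link exactly like the sample line above. The function name extract_emails and the output file emails.txt are hypothetical, and the -o and -h options of grep are not POSIX, though GNU and BSD grep both support them.

```shell
#!/bin/sh
# extract_emails is a hypothetical helper name, not an existing tool.
# It prints the address from every href="mailto:..." link found in the
# files given as arguments, one address per line.
# Assumption: addresses appear in double-quoted mailto: links, as in the
# sample line; an optional "?subject=..." suffix is cut off.
extract_emails() {
    # -o: print only the matched text; -h: do not prefix file names
    grep -ho 'mailto:[^"?]*' "$@" |
        sed 's/^mailto://'    # strip the scheme, leaving the bare address
}

# Usage, run in the directory that holds the HTML files:
#   extract_emails *.html > emails.txt
```

Addresses written as plain text or obfuscated (for example "abc at abc dot abc") are not found by this pattern. The resulting list can contain duplicates; piping it through sort -u removes them.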
 
04-21-2004, 06:46 PM   #2
linuxfond
Member
 
Registered: Jan 2003
Location: Belgium
Distribution: Mandrake 9.2
Posts: 475

Original Poster
Remove duplicate lines

I downloaded a program from the Web that runs under Windows.
Now I have a text file with more than 4,000 email addresses, and more than half of them are duplicates. How can I remove the duplicates?
Please help.
N.B. I know it looks as if I am going to spam these addresses, but I really am not going to.

Last edited by linuxfond; 04-21-2004 at 06:48 PM.
 
04-21-2004, 11:21 PM   #3
itsme86
Senior Member
 
Registered: Jan 2004
Location: Oregon, USA
Distribution: Slackware
Posts: 1,246

man uniq

You need to run your text file through sort first, because uniq only removes duplicate lines that are adjacent (sort file.txt | uniq > newfile.txt).
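A few equivalent ways to drop the duplicates, sketched as shell functions whose names (dedupe_sorted, dedupe_keep_order, dedupe_ignore_case) are hypothetical. sort -u gives the same result as sort | uniq, while the awk variant keeps the lines in their original order. Treating addresses that differ only in letter case as duplicates is an assumption; strictly, only the domain part of an address is case-insensitive.

```shell
#!/bin/sh
# Hypothetical helpers; each prints the de-duplicated lines of file $1.

# Same result as: sort "$1" | uniq  (sorted output, one copy per line)
dedupe_sorted() {
    sort -u "$1"
}

# Keeps the first occurrence of each line, in the original order:
# awk prints a line only while its counter in seen[] is still 0.
dedupe_keep_order() {
    awk '!seen[$0]++' "$1"
}

# Like dedupe_keep_order, but "Abc@x.org" and "abc@x.org" count as the
# same line (assumption: letter case in addresses is insignificant).
dedupe_ignore_case() {
    awk '!seen[tolower($0)]++' "$1"
}

# Usage:
#   dedupe_keep_order file.txt > newfile.txt
```

If the list was written by the Windows program with CRLF line endings, otherwise identical addresses can differ by a trailing carriage return; running tr -d '\r' < file.txt > unix.txt first avoids that.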
 
04-28-2004, 11:00 AM   #4
linuxfond
Member
 
Registered: Jan 2003
Location: Belgium
Distribution: Mandrake 9.2
Posts: 475

Original Poster

Thank you. It worked!
 
  


All times are GMT -5.
