LinuxQuestions.org
Old 12-16-2011, 06:35 PM   #1
Yaazkal
LQ Newbie
 
Registered: Dec 2011
Posts: 2

Rep: Reputation: Disabled
Need to modify a lot of html files


Hello, I have about 3400 files in a tree structure (about 80% are html files).


1. I need to modify every HTML file to remove the style from <p> tags and old things like the font attribute, and add another style.
2. I need to change the root of all links in the HTML, e.g. change /old/path/ to /new/path in the href attribute.
3. I need to remove some links, e.g. links that point to google.com need to be removed, so
Code:
<a href="http://www.google.com">as google said</a>
should be only "as google said".

Is there any software that can do this for me?
Is it possible to make a script?

I have no knowledge of scripting, and for this work I think a script could be the fastest way... does anybody want to help me?

Thanks !
 
Old 12-16-2011, 06:56 PM   #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Rep: Reputation: 211
Interesting problem. If you had fewer tasks to accomplish, I would ordinarily recommend spending the time to write a couple of complex sed commands. Given the size of the HTML store, and the number of modifications you want to make, it's probably time to write a program leveraging an HTML Parser library.

What languages do you know?

Perl has the HTML::Parser module. http://search.cpan.org/~gaas/HTML-Parser-3.69/Parser.pm

C has libxml2 with an HTMLParser module. http://laurentparenteau.com/blog/200...xml2-tutorial/

Java has the Swing HTML parser. http://java.sun.com/products/jfc/tsc...les/bookmarks/

Python also has an HTMLParser. http://docs.python.org/library/htmlparser.html
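
A minimal sketch of the parser-library approach, using Python's standard html.parser module. It is not a finished tool: the constants OLD_ROOT, NEW_ROOT, NEW_P_STYLE and BLOCKED_HOST, the Rewriter class and the rewrite_html function are all hypothetical names chosen for illustration, and the replacement style is an assumption because the thread does not say what the new style should be. The parser re-emits every tag it sees, so quoting and whitespace inside tags may be normalized in the output.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# All four values are placeholders; adjust them to the real site.
OLD_ROOT = "/old/path/"
NEW_ROOT = "/new/path/"
NEW_P_STYLE = "margin: 0 0 1em"  # hypothetical replacement style
BLOCKED_HOST = "google.com"


class Rewriter(HTMLParser):
    """Re-emit the parsed HTML, changing only the tags named in the thread."""

    def __init__(self):
        # convert_charrefs=False passes entities such as &amp; through as-is.
        super().__init__(convert_charrefs=False)
        self.out = []
        self.dropped_links = []  # one flag per open <a>: True if it was removed

    @staticmethod
    def _is_blocked(href):
        try:
            host = urlsplit(href).hostname or ""
        except ValueError:  # malformed URL: leave the link alone
            return False
        return host == BLOCKED_HOST or host.endswith("." + BLOCKED_HOST)

    def _rewrite(self, tag, attrs):
        """Return the new attribute list, or None if the tag should be dropped."""
        if tag == "font":
            return None  # task 1: drop <font>, keep its contents
        if tag == "a":
            if self._is_blocked(dict(attrs).get("href") or ""):
                return None  # task 3: unwrap links to the blocked host
            attrs = [  # task 2: move links to the new root
                (n, NEW_ROOT + v[len(OLD_ROOT):])
                if n == "href" and v and v.startswith(OLD_ROOT) else (n, v)
                for n, v in attrs
            ]
        if tag == "p":  # task 1: replace any inline style on <p>
            attrs = [(n, v) for n, v in attrs if n != "style"]
            attrs.append(("style", NEW_P_STYLE))
        return attrs

    @staticmethod
    def _render(tag, attrs, close=""):
        parts = [tag]
        for name, value in attrs:
            parts.append(name if value is None else f'{name}="{escape(value)}"')
        return "<" + " ".join(parts) + close + ">"

    def handle_starttag(self, tag, attrs):
        new_attrs = self._rewrite(tag, attrs)
        if tag == "a":
            self.dropped_links.append(new_attrs is None)
        if new_attrs is not None:
            self.out.append(self._render(tag, new_attrs))

    def handle_startendtag(self, tag, attrs):
        new_attrs = self._rewrite(tag, attrs)
        if new_attrs is not None:
            self.out.append(self._render(tag, new_attrs, " /"))

    def handle_endtag(self, tag):
        if tag == "font":
            return
        if tag == "a" and self.dropped_links and self.dropped_links.pop():
            return  # closing tag of a link that was removed
        self.out.append(f"</{tag}>")

    # Everything else is copied through unchanged.
    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")


def rewrite_html(text):
    parser = Rewriter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)
```

Calling rewrite_html on the contents of one file returns the modified markup as a string. Trying it on copies of a few files and diffing the result before touching the whole tree would be prudent.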
 
Old 12-16-2011, 07:23 PM   #3
Yaazkal
LQ Newbie
 
Registered: Dec 2011
Posts: 2

Original Poster
Rep: Reputation: Disabled
Hello jhwilliams, thanks for the answer!

I'll take a look at sed and see if I can do it that way; if not, I'll check maybe something in Java or C#.

If anybody wants to give me examples or show me "the light", I will appreciate it.
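
As one possible example, here is a rough Python sketch of a sed-style substitution applied to every .html file under a directory. It only covers the link-root change (task 2), and it uses a regular expression rather than a real HTML parser, so it assumes the href values are quoted and start exactly with /old/path/. The names replace_link_root and process_tree are made up for this illustration.

```python
import re
from pathlib import Path

# Matches href=" or href=' followed by the old root (assumed to be /old/path/).
HREF_OLD_ROOT = re.compile(r'(href\s*=\s*["\'])/old/path/')


def replace_link_root(text, new_root="/new/path/"):
    """Swap the old link root for the new one inside href attributes."""
    return HREF_OLD_ROOT.sub(lambda m: m.group(1) + new_root, text)


def process_tree(root, transform, encoding="utf-8"):
    """Apply transform to every .html file under root; return changed paths."""
    changed = []
    for path in sorted(Path(root).rglob("*.html")):
        # newline="" keeps the file's original line endings intact.
        with path.open(encoding=encoding, newline="") as handle:
            original = handle.read()
        updated = transform(original)
        if updated != original:
            with path.open("w", encoding=encoding, newline="") as handle:
                handle.write(updated)
            changed.append(path)
    return changed
```

Something like process_tree("site-copy", replace_link_root) would rewrite the files in place, where "site-copy" stands for a backup copy of the tree. The encoding is assumed to be UTF-8; older pages may need a different value.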
 
Old 12-16-2011, 07:39 PM   #4
jschiwal
LQ Guru
 
Registered: Aug 2001
Location: Fargo, ND
Distribution: SuSE AMD64
Posts: 15,733

Rep: Reputation: 682
If it is the text of the HTML files that you want, consider the html2txt command. There are also xmlto and xsltproc, which you could use to process the HTML files and transform them to text.
 
  

