LinuxQuestions.org
05-19-2008, 08:21 AM   #1
craig467
Member
 
Registered: Jun 2005
Location: Maine
Distribution: Red Hat 9
Posts: 65

Rep: Reputation: 15
regex HTML Character Entities


I am looking for a regex statement that will search the string:

Code:
&lt;div style=&quot;margin-left:50;&quot;&gt;&lt;blockquote&gt;This is the text that would be the quote that would go here.&lt;/blockquote&gt;&lt;/div&gt;
and convert the HTML character entities (&lt;, &gt;, and &quot;) into their respective characters (<, >, ") for ONLY the div tags, leaving the other tags alone with their HTML character entities intact.

Can someone help, please? I have no idea how to tackle this.

Thanks.
 
05-19-2008, 08:38 AM   #2
pixellany
LQ Veteran
 
Registered: Nov 2005
Location: Annapolis, MD
Distribution: Mint
Posts: 17,809

Rep: Reputation: 743
No time at the moment to test anything....

This general form in SED will be useful:
sed 's/&lt;\(div.*\);&gt/<\1>/' oldfile > newfile

This uses a backreference to capture everything between &lt and &gt, and then insert it between < and >. You may need to escape one or more of the special characters.

Good sed tutorial here: http://www.grymoire.com/Unix/Sed.html
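A minimal sketch of how the backreference idea could be finished off, assuming a POSIX sed (GNU or BSD), that each div tag sits on one line, and that &quot; is the only entity inside a div tag's attributes. The function name decode_div_tags is hypothetical. One caveat with `.*`: it is greedy, so on a line holding several tags it can run past the div tag's own &gt; to a later one. Matching `[^&]` plus whole &quot; entities keeps the match inside a single tag, and a small loop then turns the &quot; entities left inside the decoded tag into quote characters.

```shell
#!/bin/sh
# Hypothetical helper: decode &lt; &gt; &quot; only in <div ...> and </div> tags.
# Assumes each div tag is on one line and its attributes contain no entity
# other than &quot;. Other tags (e.g. &lt;blockquote&gt;) are left encoded.
decode_div_tags() {
    # 1st expression: &lt;div ...&gt; / &lt;/div&gt; -> <div ...> / </div>,
    #   where the tag body is non-& characters or whole &quot; entities.
    # Loop (:q ... tq): replace one &quot; inside a decoded <div ...> per pass
    #   until none remain.
    sed -e 's/&lt;\(\/\{0,1\}div[^&]*\(&quot;[^&]*\)*\)&gt;/<\1>/g' \
        -e ':q' \
        -e 's/\(<div[^>]*\)&quot;/\1"/' \
        -e 'tq'
}

# Usage on the question's string (quote text shortened):
printf '%s\n' '&lt;div style=&quot;margin-left:50;&quot;&gt;&lt;blockquote&gt;quote text&lt;/blockquote&gt;&lt;/div&gt;' | decode_div_tags
# prints: <div style="margin-left:50;">&lt;blockquote&gt;quote text&lt;/blockquote&gt;</div>
```

To process a whole file, run it as `decode_div_tags < oldfile > newfile`.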
 
05-20-2008, 09:04 AM   #3
craig467
Member
 
Registered: Jun 2005
Location: Maine
Distribution: Red Hat 9
Posts: 65

Original Poster
Rep: Reputation: 15
Thanks, Pixellany. I am using CGI, and before your post I tried
(&lt;)/*div[^(&gt;)]*(&gt;)
as the matching string, but it only got me the closing tag. I will try your suggestion and let you know how it turns out.
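The "only the closing tag" result can be reproduced with the attempted pattern, assuming the smiley filter ate the sequence ";)" after each entity (so the pattern was `(&lt;)/*div[^(&gt;)]*(&gt;)`) and treating it as a POSIX extended regex. `[^(&gt;)]` is a bracket expression: it excludes the single characters ( & g t ; ) rather than the string "&gt;", so in the opening tag it stops at the "t" of "style". This sketch assumes a grep that supports `-o` (GNU and BSD grep do); the alternative pattern at the end is an illustration, not the thread's own fix.

```shell
#!/bin/sh
# The question's string with its entities in place (quote text shortened).
input='&lt;div style=&quot;margin-left:50;&quot;&gt;&lt;blockquote&gt;quote text&lt;/blockquote&gt;&lt;/div&gt;'

# Attempted pattern, semicolons restored (assumed reconstruction).
# [^(&gt;)] is a set of single characters, not "anything except &gt;".
attempt='(&lt;)/*div[^(&gt;)]*(&gt;)'

# Prints only &lt;/div&gt;: in the opening tag the class stops at the
# "t" in "style", and no "&gt;" follows at that point.
printf '%s\n' "$input" | grep -E -o "$attempt"

# Allowing any non-& character or a whole &quot; entity matches both tags.
fixed='&lt;/?div([^&]|&quot;)*&gt;'
printf '%s\n' "$input" | grep -E -o "$fixed"
```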
 
05-20-2008, 09:06 AM   #4
craig467
Member
 
Registered: Jun 2005
Location: Maine
Distribution: Red Hat 9
Posts: 65

Original Poster
Rep: Reputation: 15
Sorry, the smiley faces should be semicolons and parentheses. I did not know how to stop that.
 
05-20-2008, 10:38 AM   #5
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
There's a "Disable smilies in text" option in the advanced view. :)
 
05-20-2008, 12:09 PM   #6
gnashley
Amigo developer
 
Registered: Dec 2003
Location: Germany
Distribution: Slackware
Posts: 4,928

Rep: Reputation: 612
Saw this (here, I believe) the other day:

Code:
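# HTML-escape < and >, then percent-encode a set of URI-reserved characters.
# & is percent-encoded first so the & of the new &lt;/&gt; entities survives.
# In a sed replacement a bare & means "the matched text", hence \& below.

brackets () {
    sed -e '
        s/&/%26/g
        s/</\&lt;/g
        s/>/\&gt;/g
        s,/,%2f,g
        s/?/%3f/g
        s/:/%3a/g
        s/@/%40/g
        s/=/%3d/g
        s/+/%2b/g
        s/\$/%24/g
        s/,/%2c/g
        s/ /%20/g
        '
}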
 
05-20-2008, 06:59 PM   #7
osor
HCL Maintainer
 
Registered: Jan 2006
Distribution: (H)LFS, Gentoo
Posts: 2,450

Rep: Reputation: 78
Quote:
Originally Posted by gnashley
Saw this (here, I believe) the other day
Unfortunately, this produces both HTML and URI escape sequences, and is not of much use to the OP since:
  1. It will make global changes instead of changing only the div tags.
  2. It will do undesired URI encoding as well (e.g., all spaces in the original file will become %20).
  3. It does the reverse of the desired operation.
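For contrast with points 1 and 3, a minimal sketch of the reverse direction (decoding) is below; decode_entities is a hypothetical name. It decodes every tag on the line, which is exactly the problem in point 1, so on the question's string the blockquote tags would come out decoded too. &amp; is handled last, and written as `\&` in the replacement, because a bare & in a sed replacement stands for the whole matched text.

```shell
#!/bin/sh
# Hypothetical global decoder: turns &lt; &gt; &quot; &amp; back into < > " &
# everywhere on each line (no restriction to div tags).
decode_entities() {
    # &amp; goes last so that "&amp;lt;" becomes "&lt;" rather than "<".
    sed -e 's/&lt;/</g' \
        -e 's/&gt;/>/g' \
        -e 's/&quot;/"/g' \
        -e 's/&amp;/\&/g'
}

printf '%s\n' '&lt;b&gt;&amp;lt;&lt;/b&gt;' | decode_entities
# prints: <b>&lt;</b>
```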
 
  

