LinuxQuestions.org
Old 07-11-2009, 05:24 AM   #1
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178
Blog Entries: 4

Rep: Reputation: 138
BBCode replacement technique in PHP - using regular expressions


I am working on implementing a simple BBCode system for the comment forms in my blog software. For various reasons I want to avoid direct HTML. I am currently using a straightforward search and replace, as in:

[ b ] - > <b>
[ i ] - > <i>
[ code ] - > <code>
[ quote ] - > <blockquote>

etc

The problem with this approach is that I cannot really do error checking as there is no way to determine if tags are properly closed etc.

This can lead to bugs on the web page. For example a single commenter who does not close a code block can render the rest of the page in ugly fixed width font.

So is there a simple yet safe way to implement BBCode using regular expressions? Since I'm using PHP and server-side scripting, I really don't want to implement a whole lexer/parser for this.

Yet I'm sure regular expressions can handle this. Can anybody help me out here? Any tips or pointers?

What I want to do is simply something like

[ b ]sometext here[ /b ] to be replaced with <b>sometext here</b>

But I don't want the replacement to happen if there is no end tag. All pointers and hints gratefully accepted.

(spaces used to avoid BBCode on this forum)

Last edited by vharishankar; 07-11-2009 at 05:48 AM.
 
Old 07-11-2009, 05:42 AM   #2
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
I think it will be safe if you put the user's input in e.g. a <div> or a <p>. That way the browser should ignore tags that are still open.

On a side note:
I think it's better to replace [ b ] with a <span class="myclass"> than with <b>. Nowadays CSS is the preferred way, so you can separate content from formatting.
 
Old 07-11-2009, 05:46 AM   #3
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
Actually, most browsers will let unclosed tags inside the <div> or <p> spill over outside it because of incorrect implementation. But even otherwise, I'd prefer a cleaner solution to this.

As for using <span class= >, I use it extensively to mark up special text which has some contextual meaning, but for normal inline markup of ordinary text, I still prefer the plain bold and italic tags.

I have searched the web for this, but I couldn't find a regexp-based BBCode parser to my liking. I prefer to use POSIX regexps rather than Perl regexps, as I am more comfortable with them.

Last edited by vharishankar; 07-11-2009 at 05:48 AM.
 
Old 07-11-2009, 04:47 PM   #4
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
What about the BBCode extension of PHP? It seems to me that since BBCode allows nesting this is a case where regexps really aren't appropriate.
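
To make the nesting point concrete, here is a minimal sketch of a stack-based converter in plain PHP that needs no extension. It is not from the thread: the function name bbcode_nested() and the tag map are assumptions for illustration, and it handles only the four attribute-free tags discussed here. Note one design choice that differs from what the original poster asked for: an unclosed tag is closed automatically at the end of the text rather than left unconverted.

```php
<?php
// Hypothetical helper (not from the thread): converts nested BBCode with a stack.
function bbcode_nested(string $text): string
{
    // BBCode tag name => HTML element name (assumed mapping, taken from post #1)
    $map = ['b' => 'b', 'i' => 'i', 'code' => 'code', 'quote' => 'blockquote'];

    // Split into plain-text pieces and tag tokens; the capture group keeps the tags.
    $tokens = preg_split(
        '/(\[\/?(?:b|i|code|quote)\])/i',
        $text,
        -1,
        PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY
    );

    $out = '';
    $stack = []; // names of the currently open tags, innermost last

    foreach ($tokens as $tok) {
        if (preg_match('/^\[(\/?)(b|i|code|quote)\]$/i', $tok, $m)) {
            $name = strtolower($m[2]);
            if ($m[1] === '') {
                // Opening tag: remember it and emit the HTML opener.
                $stack[] = $name;
                $out .= '<' . $map[$name] . '>';
            } elseif ($stack && end($stack) === $name) {
                // Closing tag that matches the innermost open tag.
                array_pop($stack);
                $out .= '</' . $map[$name] . '>';
            } else {
                // Stray or mis-nested closing tag: leave it visible as text.
                $out .= htmlspecialchars($tok);
            }
        } else {
            // Ordinary text: escape it so raw HTML cannot get through.
            $out .= htmlspecialchars($tok);
        }
    }

    // Close whatever the commenter forgot to close, innermost first.
    while ($stack) {
        $out .= '</' . $map[array_pop($stack)] . '>';
    }
    return $out;
}
```

Because every opener is pushed and every leftover is popped at the end, a forgotten closing tag cannot spill formatting into the rest of the page.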
 
Old 07-11-2009, 09:44 PM   #5
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
Hi ntubski, unfortunately I may not be able to use PHP extensions, because I am hosting on a shared hosting provider and I have no control over which version of PHP is installed and which extensions are available.

However, I am leaning towards "growing my own" regexp for the moment. My needs are pretty simple and straightforward and too much advanced error handling is not needed. All I want to check for is whether every opening tag has a closing tag.
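
The check described above (does every opening tag have a closing tag?) can be approximated by simply counting occurrences. The following is only a sketch under that assumption; unbalanced_tags() is a made-up name, and counting cannot detect tags that appear in the wrong order, such as a closing tag placed before its opener.

```php
<?php
// Hypothetical helper (not from the thread): lists the tags whose opening
// and closing counts differ. It does not verify order or nesting.
function unbalanced_tags(string $text, array $tags = ['b', 'i', 'code', 'quote']): array
{
    $bad = [];
    $lower = strtolower($text); // treat [B] and [b] as the same tag

    foreach ($tags as $tag) {
        $open  = substr_count($lower, '[' . $tag . ']');
        $close = substr_count($lower, '[/' . $tag . ']');
        if ($open !== $close) {
            $bad[] = $tag;
        }
    }
    return $bad;
}
```

A comment form could, for instance, reject the submission or skip the conversion whenever the returned array is not empty.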
 
Old 07-12-2009, 10:58 AM   #6
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
After thinking a lot about the pros and cons of different approaches, I've implemented a simple regexp rule that is not perfect, but at least matches opening and closing tags and prevents the formatting from overflowing. It's very basic and I am using it only for simple tags like bold, italic, code and quote. I am not implementing any tag that requires attributes or nested elements, like lists.

Reg exp I used:
PHP Code:
$str_to_replace = eregi_replace ("\[b\](.+)\[\/b\]", "<b>\\1</b>", $str_to_replace);
$str_to_replace = eregi_replace ("\[i\](.+)\[\/i\]", "<i>\\1</i>", $str_to_replace);
// ...
// etc.
It's very trivial though. Can you see anything wrong with it? So far it seems to be reasonably OK. I can live with improperly nested tags as I can always correct them manually. Writing full-fledged BBCode grammar rules and a parser in PHP is probably too big an overhead for a small application.

Last edited by vharishankar; 07-12-2009 at 11:00 AM.
 
Old 07-12-2009, 11:12 AM   #7
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
Just make sure that it works with a multi-line text segment like shown below.
Code:
[ b ]Hi harishankar
hope it works

regards
WimS
[ /b ]
 
Old 07-12-2009, 11:15 AM   #8
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
Yes, it does work. Thanks for the suggestion. I didn't think of that before, but it seems to work.

You can check my blog here for the comments section where I implemented the code:
http://harishankar.org/blog/entry.ph...ed-to-comments

Thanks again.
 
Old 07-12-2009, 03:06 PM   #9
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780

Rep: Reputation: 2081
I posted a case on your blog that doesn't quite work:
Code:
[b]bold[/b] [i]and[/i] [b]beautiful[/b]
Should render as
bold and beautiful

But shows up as
bold [/b] and [b]beautiful
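
The failing case can be reproduced in a few lines. This sketch uses preg_replace() rather than eregi_replace(), because the ereg functions were deprecated in PHP 5.3 and removed in PHP 7; the greedy pattern is otherwise the same as the one in post #6. The variable names are assumptions for illustration.

```php
<?php
$input = '[b]bold[/b] [i]and[/i] [b]beautiful[/b]';

// Greedy: (.+) extends to the LAST [/b], so a single match swallows
// everything between the first [b] and the final [/b].
$greedy = preg_replace('/\[b\](.+)\[\/b\]/i', '<b>$1</b>', $input);

// Lazy: (.+?) stops at the FIRST [/b] that lets the match succeed,
// so each [b]...[/b] pair is replaced separately.
$lazy = preg_replace('/\[b\](.+?)\[\/b\]/i', '<b>$1</b>', $input);

echo $greedy, "\n"; // <b>bold[/b] [i]and[/i] [b]beautiful</b>
echo $lazy, "\n";   // <b>bold</b> [i]and[/i] <b>beautiful</b>
```

In the greedy result the inner [/b] and [b] are left as literal text inside one long bold run, which matches what showed up on the blog.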
 
Old 07-12-2009, 08:49 PM   #10
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
Hmm... thanks for the test. I think that the regular expression requires a bit of tweaking. I am not sure why it is not matching the first [/b].

It is being greedy. Is there any way to make that expression non-greedy? Adding a question mark at the end like (.+?) gives me an error in eregi_replace.

Code:
Warning: eregi_replace() [function.eregi-replace]: REG_BADRPT in /home/hari/public_html/harishankar.org/blog/Functions.php on line 1176

Last edited by vharishankar; 07-12-2009 at 09:13 PM.
 
Old 07-12-2009, 09:57 PM   #11
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
I fixed the issue by using PCRE instead of POSIX regular expressions:

There's no apparent way to prevent greedy parsing in ereg functions in PHP.

PHP Code:
$patterns = array ("/\[b\](.+?)\[\/b\]/i",
            "/\[i\](.+?)\[\/i\]/i",
            "/\[quote\](.+?)\[\/quote\]/i",
            "/\[code\](.+?)\[\/code\]/i"
);
$replacements = array ("<b>$1</b>",
            "<i>$1</i>",
            "<blockquote>$1</blockquote>",
            "<code>$1</code>"
);

$bb_str = preg_replace ($patterns, $replacements, $str);

Last edited by vharishankar; 07-12-2009 at 09:59 PM.
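
Two details are worth checking when moving to PCRE. First, in PCRE the dot does not match a newline unless the s modifier is set, so the multi-line test from post #7 needs /is rather than /i. Second, since the goal is to avoid direct HTML, the input should be escaped before the tags are converted. The sketch below shows one way to combine both; bbcode_to_html() is a hypothetical wrapper, not the code running on the blog.

```php
<?php
// Hypothetical wrapper (not from the thread) around the PCRE approach.
function bbcode_to_html(string $str): string
{
    // Escape raw HTML first so only the tags generated below reach the page.
    $str = htmlspecialchars($str, ENT_QUOTES, 'UTF-8');

    // BBCode tag name => HTML element name (same four tags as in the thread)
    $tags = ['b' => 'b', 'i' => 'i', 'quote' => 'blockquote', 'code' => 'code'];

    foreach ($tags as $bb => $html) {
        // i = case-insensitive, s = let "." match newlines too;
        // (.+?) is lazy, so each pair is matched separately.
        $pattern = '/\[' . $bb . '\](.+?)\[\/' . $bb . '\]/is';
        $str = preg_replace($pattern, '<' . $html . '>$1</' . $html . '>', $str);
    }
    return $str;
}
```

A tag without its closing partner never matches the pattern, so it stays in the output as literal text instead of opening an HTML element.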
 
Old 07-13-2009, 12:05 AM   #12
Wim Sturkenboom
Senior Member
 
Registered: Jan 2005
Location: Roodepoort, South Africa
Distribution: Ubuntu 12.04, Antix19.3
Posts: 3,794

Rep: Reputation: 282
Regular expressions are by default greedy and will try to match as much as possible. And there is a way, but maybe not in php. From man re_syntax
Quote:
*? +? ?? {m}? {m,}? {m,n}?
non-greedy quantifiers, which match the same possibilities, but prefer the smallest number rather than the largest number of matches (see MATCHING)
PS
OK re-reading your last post and I see that you found that.

Last edited by Wim Sturkenboom; 07-13-2009 at 12:09 AM. Reason: added PS; I must learn to read
 
Old 07-13-2009, 12:15 AM   #13
vharishankar
Senior Member
 
Registered: Dec 2003
Distribution: Debian
Posts: 3,178

Original Poster
Blog Entries: 4

Rep: Reputation: 138
Quote:
Originally Posted by Wim Sturkenboom
Regular expressions are by default greedy and will try to match as much as possible. And there is a way, but maybe not in php. From man re_syntax


PS
OK re-reading your last post and I see that you found that.
Thanks. Actually the problem was that the ereg functions don't accept the non-greedy quantifiers. I thought of using ereg because they tend to be simpler.

Luckily preg functions work as well or better in most cases without significant overhead. PCRE is certainly more complex than POSIX regular expressions, but I think it is more featureful.
 
  

