[SOLVED] BBCode replacement technique in PHP - using regular expressions
I am working on implementing a simple BBCode system for my comment forms in my blog software. For various reasons I want to avoid direct HTML. Currently using straightforward search and replace as in
The problem with this approach is that I cannot really do error checking as there is no way to determine if tags are properly closed etc.
This can lead to bugs on the web page. For example a single commenter who does not close a code block can render the rest of the page in ugly fixed width font.
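To make the failure concrete, here is a minimal sketch assuming a plain str_replace-based converter. The function name naive_bbcode and the tag-to-HTML mapping are made up for illustration; this is not the code from the post.

```php
<?php
// Hypothetical naive converter: each tag is swapped independently,
// so nothing checks that an opening tag has a matching closing tag.
function naive_bbcode(string $text): string
{
    $search  = ['[b]', '[/b]', '[code]', '[/code]'];
    $replace = ['<b>', '</b>', '<pre>', '</pre>'];
    // Escape raw HTML first, then swap the BBCode tags.
    return str_replace($search, $replace, htmlspecialchars($text));
}

// The commenter forgot [/code], so the <pre> element is never closed.
$unclosed = naive_bbcode('See [code]echo 1; and then some prose');
echo $unclosed, "\n"; // See <pre>echo 1; and then some prose
```

Everything the page prints after that comment ends up inside the open <pre>, which is the fixed-width spill described above.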
So is there a simple yet safe way to implement BBCode using regular expressions? Since I'm using PHP and server-side scripting, I really don't want to implement a whole lexer/parser algorithm for this.
Yet I'm sure regular expressions can handle this. Can anybody help me out here? Any tips or indications.
What I want is simply something like this:
[ b ]sometext here[ /b ] to be replaced with <b>sometext here</b>
But I don't want to implement it if there is no end tag. All pointers and hints gratefully accepted.
(spaces used to avoid BBCode on this forum)
Last edited by vharishankar; 07-11-2009 at 05:48 AM.
I think it will be safe if you put the user's input in e.g. a <div> or a <p>. That way the browser should ignore tags that are still open.
On a side note:
I think it's better to replace [ b ] with a <span class="myclass"> than with <b>. Nowadays the preferred way is to use CSS so you can separate content from formatting.
Actually most browsers will spill over the tags in the <div> or <p> even outside it because of incorrect implementation. But even otherwise, I'd prefer a cleaner solution to this.
As for using <span class= >, I use it extensively to mark up special text which has some contextual meaning, but for normal inline markup of ordinary text, I still prefer the plain bold and italic tags.
I have searched the web for this, but I couldn't find a regexp-based BBCode parser to my liking. I prefer POSIX regexps to Perl regexps, as I am more comfortable with them.
Last edited by vharishankar; 07-11-2009 at 05:48 AM.
Hi ntubski, unfortunately I may not be able to use PHP extensions, because I am hosting on a shared hosting provider and I have no control over which version of PHP is installed and which extensions are available.
However, I am leaning towards "growing my own" regexp for the moment. My needs are pretty simple and straightforward and too much advanced error handling is not needed. All I want to check for is whether every opening tag has a closing tag.
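One rough way to do that check, sketched under the assumption that tags carry no attributes: count the opening and closing forms of each tag and compare. The helper name tags_balanced and the default tag list are hypothetical.

```php
<?php
// Hypothetical helper: true when every opening tag has a closing tag.
// It only compares counts, so it does not detect bad ordering or nesting.
function tags_balanced(string $text, array $tags = ['b', 'i', 'code', 'quote']): bool
{
    $lower = strtolower($text); // treat [B] and [b] alike
    foreach ($tags as $tag) {
        $opens  = substr_count($lower, '[' . $tag . ']');
        $closes = substr_count($lower, '[/' . $tag . ']');
        if ($opens !== $closes) {
            return false;
        }
    }
    return true;
}

var_dump(tags_balanced('[b]bold[/b] and [i]italic[/i]')); // bool(true)
var_dump(tags_balanced('[code]never closed'));            // bool(false)
```

A comment that fails the check could be rejected or shown with its tags left as plain text.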
After thinking a lot about the pros and cons of different approaches, I've implemented a simple regexp rule that is not perfect, but at least matches opening and closing tags and prevents the formatting from overflowing. It's very basic and I am using it only for simple tags like bold, italic, code and quote. I am not implementing any tag that requires attributes or nested elements, like lists.
Can you see anything wrong with it? So far it seems to be reasonably OK. I can live with improperly nested tags as I can always correct them manually. Writing full-fledged BBCode grammar rules and a parser in PHP is probably too big an overhead for a small application.
Last edited by vharishankar; 07-12-2009 at 11:00 AM.
Hmm... thanks for the test. I think that the regular expression requires a bit of tweaking. I am not sure why it is not matching the first [/b].
It is being greedy. Is there any way to make that expression non-greedy? Adding a question mark at the end like (.+?) gives me an error in eregi_replace.
Code:
Warning: eregi_replace() [function.eregi-replace]: REG_BADRPT in /home/hari/public_html/harishankar.org/blog/Functions.php on line 1176
Last edited by vharishankar; 07-12-2009 at 09:13 PM.
Regular expressions are greedy by default and will try to match as much as possible. There is a way around that, but maybe not in PHP. From man re_syntax:
Quote:
*? +? ?? {m}? {m,}? {m,n}?
non-greedy quantifiers, which match the same possibilities, but prefer the smallest number rather than the largest number of matches (see MATCHING)
PS
OK, re-reading your last post, I see that you found that.
Last edited by Wim Sturkenboom; 07-13-2009 at 12:09 AM.
Reason: added PS; I must learn to read
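The difference between the greedy and non-greedy quantifiers discussed above can be seen with PHP's preg_replace, which accepts the ? suffix. This is only an illustration with a made-up input string, not the pattern from the original post.

```php
<?php
$input = '[b]one[/b] and [b]two[/b]';

// Greedy: (.+) runs to the last [/b], so both pairs collapse into one match.
$greedy = preg_replace('#\[b\](.+)\[/b\]#i', '<b>$1</b>', $input);

// Non-greedy: (.+?) stops at the first [/b], so each pair is matched separately.
$lazy = preg_replace('#\[b\](.+?)\[/b\]#i', '<b>$1</b>', $input);

echo $greedy, "\n"; // <b>one[/b] and [b]two</b>
echo $lazy, "\n";   // <b>one</b> and <b>two</b>
```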
Thanks. Actually the problem was that the ereg functions don't accept the ? suffix that makes a quantifier non-greedy. I had wanted to use the ereg functions because they tend to be simpler.
Luckily preg functions work as well or better in most cases without significant overhead. PCRE is certainly more complex than POSIX regular expressions, but I think it is more featureful.
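For anyone finding this thread later, here is a minimal sketch of the preg-based approach, assuming only the simple attribute-less tags mentioned earlier (bold, italic, code, quote). The function name bbcode_to_html and the tag-to-element mapping are my own assumptions, not the exact code that ended up in the blog software. Note also that the ereg family was deprecated in PHP 5.3 and removed in PHP 7, so preg is the only option on current PHP versions anyway.

```php
<?php
// Hypothetical converter: a tag is replaced only when its closing tag is present.
function bbcode_to_html(string $text): string
{
    // Escape raw HTML first so only the tags generated below reach the page.
    $html = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

    // Assumed tag-to-element mapping; adjust to taste.
    $map = ['b' => 'b', 'i' => 'i', 'code' => 'pre', 'quote' => 'blockquote'];

    foreach ($map as $bb => $el) {
        // Flags: i = case-insensitive, s = let . match newlines too.
        $pattern     = '#\[' . $bb . '\](.+?)\[/' . $bb . '\]#is';
        $replacement = '<' . $el . '>$1</' . $el . '>';
        $html = preg_replace($pattern, $replacement, $html);
    }
    return $html;
}

echo bbcode_to_html('[b]bold[/b] then [code]unclosed'), "\n";
// <b>bold</b> then [code]unclosed
```

An unclosed tag simply stays on the page as literal text, so it cannot spill formatting into the rest of the page. Improperly nested tags are still not detected by this approach.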