Regex help

TheCrow33 · 01-08-2011, 06:42 PM

I'm currently working on a website for my employer and trying to use a tool called Rereplacer to make my life easier. Anyway, the idea is that I'm going to create one article with a few keywords, like PRODUCTNAME, that will be replaced based on the value held in the <title> tags.

Rereplacer just uses regular expressions to replace given strings with other strings. I'm fairly certain I can take just the text within the <title> tags and replace the word PRODUCTNAME with it. Does anyone know how that could be done?

Just for simplicity's sake, let's assume this is the format of the answer I'm looking for:

To replace: "regexReplaceStr"
Replace with: "replacementStr"

Thanks for any help.

P.S. I wouldn't normally ask such a skiddie question, but my boss is quite impatient, and if I didn't ask, I'd be looking for a new job.

AlucardZero · 01-08-2011, 07:48 PM

This doesn't need a regular expression, and I have no clue how Rereplacer expects its input.

If I were using sed, it would be:

Code:

sed "s/<title>regexReplaceStr/<title>replacementStr/g"
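As AlucardZero says, a regular expression is not strictly required if you can run your own code. Here is a minimal Python sketch using only string methods, assuming the page is a single string with exactly one lowercase <title> element without attributes; fill_product_name is a hypothetical name, and this is not how ReReplacer itself works.

```python
def fill_product_name(html: str, placeholder: str = "PRODUCTNAME") -> str:
    """Replace every placeholder after the title with the <title> text.

    Plain string searching, no regular expressions. Assumes a single
    lowercase <title>...</title> pair without attributes.
    """
    start = html.find("<title>")
    end = html.find("</title>", start)
    if start == -1 or end == -1:
        return html  # no title element: leave the page unchanged
    title = html[start + len("<title>"):end]
    # Only the part from </title> onwards is edited, so the title stays as-is.
    head, body = html[:end], html[end:]
    return head + body.replace(placeholder, title)


page = "<html><head><title>Widget 3000</title></head><body>Buy PRODUCTNAME. PRODUCTNAME rocks.</body></html>"
print(fill_product_name(page))
```

Because str.replace handles every occurrence in one call, no repeated passes are needed here.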

Also, I'm not sure what you mean by skiddie (script kiddie). Are you DDoSing someone?

Nominal Animal · 01-08-2011, 08:16 PM

The first Google result for rereplacer is the Joomla ReReplacer description, complete with links to examples and regular expression cheat sheets; there's even a link to the forum, which has a separate category for ReReplacer. Is this the software you're using? If so, why did you ask here?

There's even a post, answered by the developer, about the inverse problem (changing the title based on text in a div). Adapted for your question, the search pattern would be

Code:

(<title>)(.*)(</title>.*)PRODUCTNAME

and the replacement

Code:

\1\2\3\2

with 'Search area' set to 'Everywhere'. This should replace PRODUCTNAME everywhere with whatever you have in your "title" element.
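The same pattern and replacement can be tried outside ReReplacer. This is a minimal Python sketch using the standard re module, not ReReplacer itself; the sample page is made up, and flags=re.DOTALL is an added assumption so that . can cross line breaks (whether ReReplacer needs an equivalent setting is not established in this thread).

```python
import re

PATTERN = r"(<title>)(.*)(</title>.*)PRODUCTNAME"
REPLACEMENT = r"\1\2\3\2"

page = """<html>
<head><title>Widget 3000</title></head>
<body>
<p>The PRODUCTNAME is our best seller.</p>
</body>
</html>"""

# DOTALL lets "." match newlines, so one match can span the whole document.
result = re.sub(PATTERN, REPLACEMENT, page, flags=re.DOTALL)
print(result)
```

Without flags=re.DOTALL, re.sub finds no match here and returns page unchanged, because the title and PRODUCTNAME sit on different lines.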

Nominal Animal

TheCrow33 · 01-09-2011, 09:44 AM

Thank you, Nominal Animal. I had indeed found that post on the ReReplacer website, which is the only reason I knew this could be done with a regular expression. The only reason I did not post on their website is that just signing up for their forums seemed to cost money (they ask for a "billing address" during registration). I also figured you guys would be quicker to respond, and free. Anyway, after I found that post I tried to manipulate that regex to do what I wanted, but like I said, I don't have much knowledge of regular expressions and couldn't successfully do it. And quite frankly, that regex confused the hell out of me.

AlucardZero: no, I'm not DDoSing anyone, and quite frankly that seems a stupid question. The original question was skiddie-like (yes, script kiddie) because I was asking someone else to do the work and hand me an answer. I also did not expect you to know how ReReplacer expects input; that is why I gave the format in which I expected an answer.

AlucardZero · 01-09-2011, 11:21 AM

We have a different definition of skiddie then.

TheCrow33 · 01-09-2011, 01:25 PM

Quote:

Originally Posted by AlucardZero

We have a different definition of skiddie then.

I don't think it's our definitions that differ; rather, I was saying the question was skiddie-ish in nature. I would expect any skiddie to ask for an answer handed to them on a silver platter, with no explanation of how it works (as I did in this post) and without doing any actual work toward a working solution. That's all I meant.

Nominal Animal · 01-09-2011, 01:59 PM

Quote:

Originally Posted by TheCrow33

Anyway, after I found that post I tried to manipulate that regex to do what I wanted, but like I said, I don't have much knowledge of regular expressions and couldn't successfully do it. And quite frankly, that regex confused the hell out of me.

Well, it would have saved time if you had told us that. Let me explain how it works by splitting it into pieces. But first, some tips:

Characters listed in square brackets [ ] are alternatives. Ranges like A-Z are supported. If the first character is a caret, ^, the set is inverted: any character except the listed ones matches.
Period . matches any character (in most engines, any character except a newline, unless the engine's "dot-all" option is enabled).
Asterisk * means any number (zero or more) of the preceding character or subexpression.
Question mark ? means zero or one of the preceding character or subexpression.
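Parentheses () define a subexpression which can be referred to in the replacement. The first subexpression is referred to as \1, the second as \2, and so on.
Subexpressions can be nested. In PCRE-style engines, every subexpression is numbered by the position of its opening parenthesis, so a nested one gets a number too and shifts the numbers of the ones after it. To group without numbering, write (?:...) instead of (...).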
Within subexpressions, vertical bar | separates alternative subexpressions. For example, (abc|def) matches either abc or def.
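The tips above can be tried interactively. Here is a short Python sketch using the standard re module; Python's syntax is close to, but not guaranteed identical with, the engine ReReplacer uses, and the sample strings are made up. Note that in Python nested subexpressions get numbers too, counted by their opening parentheses from the left.

```python
import re

# [ ] lists alternatives; ranges work; a leading ^ inverts the set.
print(bool(re.fullmatch(r"[A-Z]", "Q")), bool(re.fullmatch(r"[^A-Z]", "Q")))  # True False

# "." matches any single character; in Python not a newline unless re.DOTALL is set.
print(bool(re.fullmatch(r"a.c", "a\nc")), bool(re.fullmatch(r"a.c", "a\nc", flags=re.DOTALL)))  # False True

# "*" means zero or more of the preceding item, "?" zero or one.
print(bool(re.fullmatch(r"ab*", "a")), bool(re.fullmatch(r"ab*", "abbb")))  # True True
print(bool(re.fullmatch(r"colou?r", "color")), bool(re.fullmatch(r"colou?r", "colour")))  # True True

# Parentheses capture; \1 and \2 refer to the captures in the replacement.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")
print(swapped)  # world hello

# "|" separates alternatives inside a subexpression.
alternatives = re.findall(r"(abc|def)", "abc xyz def")
print(alternatives)  # ['abc', 'def']

# Nested groups are numbered by the position of their opening parenthesis.
nested = re.match(r"((a)(b))c", "abc").groups()
print(nested)  # ('ab', 'a', 'b')
```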

Let's examine the expression (<title>)(.*)(</title>.*)PRODUCTNAME:

The first subexpression, (<title>) starts the match with the title tag: <title>. Since it's also the first subexpression in parentheses, we can refer to it in the replacement as \1.
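If you wish to be careful, you can use (<[Tt][Ii][Tt][Ll][Ee](?:>|[\n\t\v\f\r ][^>]*>)) instead, to match uppercase title tags and title tags with attributes.
(Note the inner subexpression: it will match either an immediate >, or whitespace followed by anything up to the first >. It is written as (?:...), a non-capturing subexpression, so that it does not get a number of its own; in PCRE-style engines a plain (...) there would become \2 and shift the title and the rest to \3 and \4.)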
The second subexpression, (.*), matches anything (or nothing). It is a greedy match, so it will contain everything up to but not including the last match of the following subexpression. If there were no following subexpression or characters, it would match till the end of the document.
The third subexpression, (</title>.*), is the tricky one. Not only does it match the title end tag, but also everything up to but not including the last match of what follows it. (Again, you might wish to use (</[Tt][Ii][Tt][Ll][Ee]>.*) instead.)
Finally, there is the target identifier, which we wish to replace: PRODUCTNAME .
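To see what each subexpression actually captures, here is a Python sketch with a made-up page that uses an uppercase title tag with an attribute. It uses the careful tag variants from the list above, with the inner subexpression written as the non-capturing (?:...); in Python, as in PCRE, a plain inner (...) would itself be numbered and push the title text to \3, breaking the \1\2\3\2 replacement. re.DOTALL is an assumption so that . crosses line breaks.

```python
import re

page = '<HTML><HEAD><TITLE lang="en">Widget 3000</TITLE></HEAD>\n<BODY>Buy PRODUCTNAME today.</BODY></HTML>'

# Careful variant: any letter case, optional attributes. The inner group is
# non-capturing (?:...), so \1, \2 and \3 keep their meaning.
careful = (
    r"(<[Tt][Ii][Tt][Ll][Ee](?:>|[\n\t\v\f\r ][^>]*>))"  # \1: start tag
    r"(.*)"                                              # \2: title text
    r"(</[Tt][Ii][Tt][Ll][Ee]>.*)"                       # \3: end tag and text up to the placeholder
    r"PRODUCTNAME"                                       # the placeholder itself
)

m = re.search(careful, page, flags=re.DOTALL)
for number, captured in enumerate(m.groups(), start=1):
    print(number, repr(captured))
# 1 '<TITLE lang="en">'
# 2 'Widget 3000'
# 3 '</TITLE></HEAD>\n<BODY>Buy '

result = re.sub(careful, r"\1\2\3\2", page, flags=re.DOTALL)
print(result)

# With a capturing inner group there would be four groups instead of three.
print(re.compile(careful.replace("(?:", "(", 1)).groups)  # 4
```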

The way this works is quite simple. The match starts at the beginning of the title element, and contains everything up to the end of the string to be replaced. (Specifically, the third subexpression will contain most of your HTML document.)
ReReplacer will replace any match with \1\2\3\2, which means: the title start tag (\1), the title (\2), the title end tag plus most of the HTML document up to but not including the string being replaced (\3), and finally the title again (\2), which takes the place of PRODUCTNAME.
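The \1\2\3\2 description can be checked piece by piece. This Python sketch (standard re module, a made-up page with two placeholders, re.DOTALL assumed so . crosses line breaks) rebuilds the replacement by hand from the captured groups and compares it with re.sub. It also shows that one pass replaces only the last PRODUCTNAME, because the greedy third subexpression runs up to the last occurrence.

```python
import re

PATTERN = r"(<title>)(.*)(</title>.*)PRODUCTNAME"

page = "<title>Widget 3000</title>\n<p>PRODUCTNAME, PRODUCTNAME!</p>"

m = re.search(PATTERN, page, flags=re.DOTALL)
# Build \1\2\3\2 by hand from the captured pieces.
by_hand = m.group(1) + m.group(2) + m.group(3) + m.group(2)
# Text before and after the match is kept unchanged by re.sub.
manual = page[:m.start()] + by_hand + page[m.end():]

one_pass = re.sub(PATTERN, r"\1\2\3\2", page, flags=re.DOTALL)
print(repr(one_pass))  # '<title>Widget 3000</title>\n<p>PRODUCTNAME, Widget 3000!</p>'
```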

I don't know whether ReReplacer applies this repeatedly. If you find that only the last occurrence of PRODUCTNAME is replaced with the title text, simply add this replacement multiple times (so it is applied multiple times), once for each possible occurrence of PRODUCTNAME.
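Copying the rule several times amounts to applying the substitution until nothing changes. A Python sketch of that idea; apply_until_stable and its max_passes limit are hypothetical names, not ReReplacer settings, and re.DOTALL is assumed so . crosses line breaks.

```python
import re

PATTERN = re.compile(r"(<title>)(.*)(</title>.*)PRODUCTNAME", flags=re.DOTALL)


def apply_until_stable(text: str, max_passes: int = 100) -> str:
    """Re-run the single-pass replacement until the text stops changing.

    Each pass replaces the last remaining PRODUCTNAME. max_passes guards
    against input that never settles (e.g. PRODUCTNAME inside the title).
    """
    for _ in range(max_passes):
        new_text = PATTERN.sub(r"\1\2\3\2", text)
        if new_text == text:
            return new_text
        text = new_text
    raise RuntimeError("replacement did not settle; is PRODUCTNAME in the title?")


page = "<title>Widget 3000</title>\n<p>PRODUCTNAME, PRODUCTNAME, PRODUCTNAME!</p>"
print(repr(apply_until_stable(page)))
```

Three placeholders take three replacing passes plus one final pass that finds no match.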

Note that PRODUCTNAME itself should never occur in the title. If it does, you'll get rather interesting but unwanted results.
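A Python sketch of that problem case, with a made-up page whose title contains the placeholder (standard re module, re.DOTALL assumed so . crosses line breaks). The title text is copied into the body, and because it contains PRODUCTNAME, a pass never reduces the number of placeholders.

```python
import re

PATTERN = r"(<title>)(.*)(</title>.*)PRODUCTNAME"

# Placeholder accidentally left in the title.
page = "<title>Buy PRODUCTNAME</title>\n<p>PRODUCTNAME</p>"

after_one_pass = re.sub(PATTERN, r"\1\2\3\2", page, flags=re.DOTALL)
print(repr(after_one_pass))  # '<title>Buy PRODUCTNAME</title>\n<p>Buy PRODUCTNAME</p>'
# The copied title brings the placeholder back, so the count does not drop.
print(page.count("PRODUCTNAME"), after_one_pass.count("PRODUCTNAME"))  # 2 2
```

Running the substitution again gives <p>Buy Buy PRODUCTNAME</p>, and so on, so repeated application never finishes.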

Hope this helps,

Nominal Animal