dwhitney67 11-30-2011 10:27 AM

Help needed with understanding (Java) Regular Expression
I'm maintaining a Java application that queries a POP server for email replies. I have very limited experience reading regular expressions, and was wondering if someone could help me understand the following statement:

Pattern subjStart = Pattern.compile("^\\s*(?:[Rr][Ee]:\\s*" + SUBJECT_PREFIX + " )?([0-9A-Z]+)\\s*(?:.*)?");
The SUBJECT_PREFIX is a string that can be anything, including an empty string. Pattern is from the java.util.regex package.

firstfire 11-30-2011 12:13 PM


First, inspect these two links: 1, 2 (scroll to "Regular Expressions, Literal Strings and Backslashes").

According to second link:

In literal Java strings the backslash is an escape character. The literal string "\\" is a single backslash. In regular expressions, the backslash is also an escape character. The regular expression \\ matches a single backslash. This regular expression as a Java string, becomes "\\\\". That's right: 4 backslashes to match a single one.
So, the Java string "\\s" is the regex \s, a predefined character class equal to [ \t\n\x0B\f\r] -- a whitespace character.
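
To make the backslash doubling concrete, here is a small sketch. The class and method names are made up for illustration; only Pattern.matches comes from java.util.regex.

```java
import java.util.regex.Pattern;

public class BackslashDemo {
    // Java source "\\s" is the two-character string \s (the whitespace class).
    static final String WHITESPACE = "\\s";
    // Java source "\\\\" is the two-character string \\ (regex for one literal backslash).
    static final String ONE_BACKSLASH = "\\\\";

    // True when the whole input matches the regex.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(WHITESPACE.length());                 // 2
        System.out.println(matchesWhole(WHITESPACE, "\t"));      // true
        System.out.println(matchesWhole(WHITESPACE, "x"));       // false
        System.out.println(ONE_BACKSLASH.length());              // 2
        System.out.println(matchesWhole(ONE_BACKSLASH, "\\"));   // true
    }
}
```

The string lengths show what the regex engine actually receives once the Java compiler has processed the escapes.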

From the first link:

Greedy quantifiers
    X?       X, once or not at all
Special constructs (non-capturing)
    (?:X)    X, as a non-capturing group
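Therefore (?:.*)? means an optional (note the second ?) block of zero or more arbitrary characters. A non-capturing group groups a subexpression without storing the text it matched, so it does not get a group number and avoids the overhead of a capture.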
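
A quick way to see the difference is to compare group counts. This is only a sketch with two toy patterns of my own (they are not from the original statement):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonCapturingDemo {
    // Toy patterns for illustration: same shape, but the optional block
    // is capturing in one and non-capturing in the other.
    static final Pattern CAPTURING = Pattern.compile("(Re: )?(\\w+)");
    static final Pattern NON_CAPTURING = Pattern.compile("(?:Re: )?(\\w+)");

    // Returns group 1 if the whole input matches, otherwise null.
    // Also null when group 1 exists but did not take part in the match.
    static String firstGroup(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(CAPTURING.matcher("").groupCount());     // 2
        System.out.println(NON_CAPTURING.matcher("").groupCount()); // 1
        System.out.println(firstGroup(CAPTURING, "Re: hello"));     // "Re: "
        System.out.println(firstGroup(NON_CAPTURING, "Re: hello")); // hello
        System.out.println(firstGroup(NON_CAPTURING, "hello"));     // hello
    }
}
```

With the non-capturing form, group 1 is always the part you care about, whether or not the optional block was present.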

The same goes for "(?:[Rr][Ee]:\\s*" + SUBJECT_PREFIX + " )?": it is an optional block, for example "Re: <SUBJECT_PREFIX> " or "RE: <SUBJECT_PREFIX> " etc. Note the literal space that the pattern requires after the prefix.

The only capturing group here is '([0-9A-Z]+)' -- one or more digits or uppercase letters A-Z.
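
Putting the pieces together, here is a sketch that runs the expression from the question against a few subjects. The prefix value "TICKET", the class and method names, and the use of find() are my assumptions; the original post does not show how the Pattern is used.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubjectDemo {
    // Hypothetical value; the question only says it can be any string.
    static final String SUBJECT_PREFIX = "TICKET";

    // Same expression as in the question. The prefix is concatenated as-is,
    // so any regex metacharacters in it are interpreted as regex syntax.
    static Pattern subjStart(String prefix) {
        return Pattern.compile(
            "^\\s*(?:[Rr][Ee]:\\s*" + prefix + " )?([0-9A-Z]+)\\s*(?:.*)?");
    }

    // Returns capture group 1, or null when the subject does not match.
    static String extractId(String prefix, String subject) {
        Matcher m = subjStart(prefix).matcher(subject);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractId(SUBJECT_PREFIX, "Re: TICKET AB12 some text")); // AB12
        System.out.println(extractId(SUBJECT_PREFIX, "  re:TICKET 12345"));         // 12345
        System.out.println(extractId(SUBJECT_PREFIX, "AB12 rest"));                 // AB12
        // The optional block fails without the prefix, so the capture
        // starts at the beginning and grabs only the uppercase "R".
        System.out.println(extractId(SUBJECT_PREFIX, "Re: hello"));                 // R
        System.out.println(extractId(SUBJECT_PREFIX, "hello"));                     // null
    }
}
```

The fourth case is worth keeping in mind: a reply whose subject lacks the expected prefix still matches, because the "Re: ..." block is optional and "R" alone satisfies [0-9A-Z]+.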

Hope, I am correct and this will help.

dwhitney67 11-30-2011 02:07 PM


Originally Posted by firstfire (Post 4538342)
Hope, I am correct and this will help.

I will read over the information from the links you provided. Thanks a lot for dissecting the regex string I provided earlier. With the information you supplied, and that found within the Java API site, it shouldn't be too hard to grasp the layout of the regex.

Again, thanks for your help.

All times are GMT -5.