regex
Could someone please help me figure out the regex formula for this string?
City, ST 12345
City can be more than one word, and ST is always any two capital letters. Since the city name can be almost anything but is always followed by a comma, it might be a good idea to start matching at the comma. Thanks. I would really appreciate it if the member darkcrimson could get in touch with me regarding this thread. Thanks again to everyone.
Basic (and more) regex info: http://www.regular-expressions.info/ . You should read through it if you want to learn regex.
We need to know where you are using this regular expression, e.g. grep or a Perl script, because different flavors of regex engines have different capabilities. We also need to know what you want to do with the match: just print it, or parse out the city, state, and zip?
Perl-Compatible Regular Expression:
Code:
^(.+?), ([A-Z]{2}) (\d{5})$
That's:
- ^ start of line
- (.+?) one or more characters, non-greedy, captured
- , literal comma, then a literal space
- ([A-Z]{2}) two capital letters, captured
- a literal space
- (\d{5}) five digits, captured
- $ end of line
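Since the thread does not say which tool is in use, here is a minimal sketch that assumes Python's re module, which accepts this pattern unchanged. The helper name parse_address and the sample addresses are made up for illustration.

```python
import re

# Pattern from the post above: anchored at both ends, non-greedy city,
# two capital letters, five digits, single literal spaces in between.
ADDRESS_RE = re.compile(r"^(.+?), ([A-Z]{2}) (\d{5})$")


def parse_address(line):
    """Return (city, state, zip) or None if the line does not match.

    parse_address is a hypothetical helper name, not something from the thread.
    """
    match = ADDRESS_RE.match(line)
    return match.groups() if match else None


# Made-up sample input for illustration.
print(parse_address("Salt Lake City, UT 84101"))  # ('Salt Lake City', 'UT', '84101')
print(parse_address("Salt Lake City, ut 84101"))  # None: state is not upper case
```

Because the spaces in this pattern are literal, a line with two spaces before the zip code would not match; that may or may not be what you want.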
You have done yourself the favor of describing how the regex should match the input. Having done this much is most of the work; the rest is just translating the long-form description to the concise regex version. AlucardZero has already mentioned the distinction between regex implementations in different tools and languages, so I'll just use Perl as an example.
You said 'anything but is always followed by a comma', which I will translate to the more accurate 'at least one of anything, followed by a comma'. Happily, there is an almost direct translation of these words to regex code.
Code:
.+,
. (any single character)
+ (at least one of the preceding)
, (literal comma)
Then, you didn't mention the whitespace, but it's there, and whitespace can sometimes come in multiples, so as long as we specify at least one, we'll be robust in how we match:
Code:
\s+
\s (any whitespace character)
+ (at least one of the preceding)
Then, you said 'always any two capital letters'. Nice, concise, and once again, it translates directly to regex code:
Code:
[A-Z][A-Z]
Now, more whitespace, as before, followed by what many would guess to be a US zip code of five digits. For the five digits, I will agree with AlucardZero's example:
Code:
\s+[0-9]{5}
Putting the pieces together in Perl, with parentheses added to capture each part:
Code:
$address =~ m/(.+),\s+([A-Z][A-Z])\s+([0-9]{5})/;
--- rod.
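For comparison, the same unanchored pattern can be tried outside Perl. This is only a sketch, assuming Python's re module; find_address is a hypothetical helper name and the inputs are made up.

```python
import re

# Python spelling of the Perl match above; re.search scans the whole string,
# much as an unanchored m// does in Perl.
LOOSE_RE = re.compile(r"(.+),\s+([A-Z][A-Z])\s+([0-9]{5})")


def find_address(text):
    """Return (city, state, zip) from the first match, or None.

    find_address is a hypothetical helper name used only for this sketch.
    """
    match = LOOSE_RE.search(text)
    return match.groups() if match else None


# Made-up inputs: extra whitespace is tolerated, and trailing text is ignored
# because nothing anchors the pattern to the end of the string.
print(find_address("New York,   NY    10001"))  # ('New York', 'NY', '10001')
print(find_address("Boston, MA 02110-1234"))    # ('Boston', 'MA', '02110')
```

Note that the greedy (.+) also absorbs any text before the city, so "Mail to: Boston, MA 02110" would give a city of "Mail to: Boston". If partial matches are unwanted, adding ^ and $ as in the earlier reply is one option.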