Meaning of regular expression

Faki · 04-15-2022, 01:10 PM

Would like to understand the meaning of the following regular expression.

Code:

([^)]*)

pan64 · 04-15-2022, 01:13 PM

You can use www.regex101.com in such cases; it is an excellent site.

Faki · 04-15-2022, 01:20 PM

Have applied it to the following code but still having difficulty.

Code:

(interactive
   (list (read-regexp "Regex: ")
	 (region-beginning)
	 (region-end) ))

Using www.regex101.com gave

Code:

([^)]*)
gm
1st Capturing Group ([^)]*)
Match a single character not present in the list below [^)]
* matches the previous token between zero and unlimited times, as many times as possible, giving back as needed (greedy)
) matches the character ) with index 41 in decimal (29 in hex, 51 in octal) literally (case sensitive)

Counting the number of matches gives 3.

pan64 · 04-15-2022, 01:22 PM

Is there a question here? You can also enter your sample text (on that page) and see how it works.

Faki · 04-15-2022, 01:35 PM

The question is still this. What does the following expression match? And how does it match three times on the mentioned example?

Code:

([^)]*)

pan64 · 04-15-2022, 01:41 PM

Ok, so go thru:

Code:

(       # grouping
[^)]    # Match a single character not present in the list below = anything but )
*       # matches the previous token between zero and unlimited times, as many times as possible
)       # closing paren of the group

so it means altogether any number of any char but ).
But there can be another solution:

Code:

(
any number of any char but )
)

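It depends on your regex engine, i.e. whether a paren is taken as a literal paren or used for grouping. In grep's default basic syntax an unescaped paren is literal; in extended (-E) or Perl-style (-P) syntax it groups.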
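
A minimal sketch of the two readings, assuming GNU grep (basic regex by default, extended regex with -E). The sample string and the variable names are made up for illustration.

```shell
# Hypothetical sample line, not taken from the thread.
line='abc(def)ghi'

# Basic regex (grep default): ( and ) are literal characters,
# so this finds "(", any run of non-")" characters, then ")".
bre_out=$(printf '%s\n' "$line" | grep -o '([^)]*)')

# Extended regex (-E): ( and ) only group, so the pattern is
# effectively [^)]* and finds each non-empty run of non-")" characters.
ere_out=$(printf '%s\n' "$line" | grep -Eo '([^)]*)')

printf 'BRE:\n%s\n' "$bre_out"   # (def)
printf 'ERE:\n%s\n' "$ere_out"   # abc(def, then ghi on the next line
```

The same pattern text gives different output only because of how the parentheses are read; the bracket expression [^)] means "anything but )" in both cases.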

sundialsvcs · 04-15-2022, 01:50 PM

There are two parts to it:

First, the expression is looking for "zero or more repetitions of" one of two possible characters.

Then, by enclosing the group in parentheses, it is identifying it as a string that can be returned to the caller of the string-matching function, giving the caller the exact sequence of characters (possibly in this case "empty string") that were matched. A regular expression can have any number of these parenthesized groups, each of which is returned as an element in an array of values.

So this gives you not only the yes/no fact that "there was a successful match," but also exactly what matched in various designated portions of the input string. You can be provided not only with the matching string value, but also with exactly where in the input string it was found.
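
As a rough illustration of what a capture group hands back, here is a sketch assuming GNU grep and a sed that supports -E. The input string and variable names are invented for the example. In sed, \1 is the text captured by the first group; grep -b adds the byte offset of each match.

```shell
# Hypothetical input, chosen only for illustration.
input='f(x, y) = z'

# Extended regex: \( and \) are literal parens, the unescaped pair is
# capture group 1, and \1 in the replacement is the text it captured.
captured=$(printf '%s\n' "$input" | sed -E 's/.*\(([^)]*)\).*/\1/')

# Basic regex with -o and -b: print each match with its 0-based byte offset.
located=$(printf '%s\n' "$input" | grep -ob '([^)]*)')

echo "$captured"   # x, y
echo "$located"    # 1:(x, y)
```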

teckk · 04-15-2022, 03:49 PM

Quote:

The question is still this. What does the following expression match?

Why don't you try it and see for yourself?

Code:

text="
abcd()efg()hij&
12345
5789()abc
"

grep -Eo '([^)]*)' <<< "$text"

grep -Po '([^)]*)' <<< "$text"

grep -Eo '([^0-9]*)' <<< "$text"

echo "$%^&()[]{}123()345()789" | grep -o '([^)]*)'

echo "$%^&()[]{}123()345()789" | grep -Eo '([^)]*)'

etc.

Faki · 04-16-2022, 09:44 AM

I cannot understand how the following gives

Result:

Code:

()
()
()

Code:

echo "$%^&()[]{}123()345()789" | grep -o '([^)]*)'

I understand you want me to try for myself, but am not making sense of the results.

allend · 04-16-2022, 10:11 AM

With the -o option, grep prints only the matched (non-empty) parts of matching lines, with each such part on a separate output line.
Consider

Code:

bash-5.1$ echo "$%^&(x)[]{}123(xx)345(xxx)789" | grep -o '[&35]([^)]*)'
&(x)
3(xx)
5(xxx)

That regex says: match any one of the three characters &, 3, or 5, followed by a left parenthesis, followed by zero or more characters that are not a right parenthesis, followed by a right parenthesis.
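
The same reading explains the three () lines from the earlier command. A sketch, assuming grep's default basic-regex syntax; the variable names are only for illustration.

```shell
# Single quotes keep the shell from touching $ and the other symbols.
line='$%^&()[]{}123()345()789'

# Literal "(", zero or more characters that are not ")", literal ")".
# The only places that fits are the three empty pairs "()".
matches=$(printf '%s\n' "$line" | grep -o '([^)]*)')

echo "$matches"
# ()
# ()
# ()
```

Nothing sits between the parentheses in that input, so the "zero or more" part matches zero characters each time.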

dugan · 04-16-2022, 10:12 AM

Would it be easier to understand if you just simplified the regex to this?

Code:

[^)]*

You're not doing anything with the capturing groups, so you can take out the outer brackets.

I also don't get where you're seeing that it matches "three times". It matches every run of characters that aren't a closing parenthesis (whitespace included), so you should typically get a lot more than three matches.

Quote:

Originally Posted by sundialsvcs

First, the expression is looking for "zero or more repetitions of" one of two possible characters.

That's wrong. When a caret appears as the first character between two square brackets, it's a negation character.

Faki, obviously, the regex you wrote is wrong and it's not doing what you intended, whatever that is (you forgot to tell us). Were you looking for something to match the insides of parentheses?

rknichols · 04-16-2022, 03:55 PM

Quote:

Originally Posted by dugan

Would it be easier to understand if you just simplified the regex to this?

Code:

[^)]*

You're not doing anything with the capturing groups, so you can take out the outer brackets.

If these are basic (not extended) regular expressions, then those "outer brackets" are taken literally, and the expression

Code:

([^)]*)

means "a left parenthesis, followed by any number (zero or more) of characters that are not a right parenthesis, and finally a right parenthesis."

This almost matches the innermost level of a multi-level parenthesized expression, which seems to be what was intended, but if so, the regex isn't quite right. If that were indeed the intent, this seems to work:

Code:

 grep '([^()]*)'

It is useful to include the "--color" option so that you can see just what is being matched.

Note also that grep processes each input line separately, so a multi-line block of code such as in post #3 might not yield the expected result.
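
A small sketch of both points, assuming GNU grep with its default basic-regex syntax. The nested sample string and the variable names are hypothetical.

```shell
# Hypothetical nested input.
nested='a(b(c)d)e'

# [^)]* may run across an inner "(", so the match starts at the outer one.
loose=$(printf '%s\n' "$nested" | grep -o '([^)]*)')

# [^()]* excludes both parens, so only the innermost pair can match.
strict=$(printf '%s\n' "$nested" | grep -o '([^()]*)')

# grep works line by line: a pair split across two lines never matches.
# "|| true" only hides grep's non-zero exit status when nothing matches.
split_count=$(printf '(a\nb)\n' | grep -c '([^()]*)' || true)

echo "$loose"         # (b(c)
echo "$strict"        # (c)
echo "$split_count"   # 0
```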

teckk · 04-16-2022, 04:34 PM

Yup, that's nifty.

Code:

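text="(Mary)(had)a(little)lamb(.)x(123)"

# basic regex: literal parens, so each (...) pair is printed
grep -o '([^()]*)' <<< "$text"

# note: in a regex, [!0-9] is not a negation; it matches "!" or a digit
grep -o '([^()]*[!0-9])' <<< "$text"

# likewise, [!A-Za-z] matches "!" or a letter
grep -o '([^()]*[!A-Za-z])' <<< "$text"

# extended regex: the parens group, so runs of non-paren characters are printed
grep -Eo '([^()]*)' <<< "$text"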

sundialsvcs · 04-17-2022, 07:38 PM

Quote:

Originally Posted by dugan

That's wrong. When a caret appears as the first character between two square brackets, it's a negation character.

Correct as noted. "Me bad. Oopsie!" The correct interpretation of [^)] in this case is: "is not a right parenthesis."

sundialsvcs · 04-17-2022, 07:45 PM

Here's something to consider: "a regular expression is a very tiny computer program, condensed into a single pregnant line of text." And this just might be the very best way to consider it ... because this actually is how the software implementations of "regular-expression handlers" approach the problem: they "compile" the expression into an intermediate form, then they "execute" it.

The "regex language" is not actually terribly complicated, at least in most cases. But, it is "extremely compact," and there are very many tutorials and practice-websites which can help you to understand it.

FYI: It is also useful to notice that there are many libraries of "pre-debugged regular expressions" out there, for various languages. So, before you spend too much time trying to "puzzle out" your particular problem as though it were brand-new, look to see if it hasn't already been solved. (The "regular expression syntax" used by most languages is also usually the same, so you might be able to "steal" a workable regex from some other language's library.)