LinuxQuestions.org


915086731 08-21-2011 08:20 AM

Some questions about using regular expressions, asking for help!
 
Please see the following code,
Code:

[saturn@saturn-pc new]$ [[ "aab" =~ ab ]] && echo "ok" || echo "error";
ok
[saturn@saturn-pc new]$ [[ "aab" =~ "ab" ]] && echo "ok" || echo "error";
ok
[saturn@saturn-pc new]$

"aab" should not match ab , Can you tell me why?

Code:

[saturn@saturn-pc new]$ [[ "aab" =~ a*b ]] && echo "ok" || echo "error"
ok
[saturn@saturn-pc new]$ [[ "aab" =~ *ab ]] && echo "ok" || echo "error"
error
[saturn@saturn-pc new]$

Why does *ab not match "aab"?
Thanks!

syg00 08-21-2011 08:33 AM

Quote:

Originally Posted by 915086731 (Post 4449274)
"aab" should not match ab , Can you tell me why?

Can you tell us why you think it should not? Perhaps explain what you think "=~" means.
Quote:

Why does *ab not match "aab"? Thanks!

Again, tell us why you think it should. Regex is not (shell) globbing.

jlinkels 08-21-2011 08:34 AM

In a regular expression, '*' does not mean "match any character zero or more times". It means "repeat the preceding item zero or more times".

This is confusing, because in filename matching (globbing) it does mean that. If you tried the ls command, ls *ab would match a file named aab.

What matches any character in a regular expression is '.' (period). So matching any character zero or more times would be '.*'. To find it at the start of a line, use '^.*'.
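
To make the difference concrete, here is a small sketch that tests the same string once as a glob and once as a regex. The variable names (s, re, glob_result, regex_result) are only for illustration, and it assumes bash, since [[ ]] is not available in plain sh.

```shell
#!/usr/bin/env bash
# Compare glob matching (==) with regex matching (=~) on the same string.
s="aab"

# Glob: '*' stands for any run of characters, so *ab matches "aab".
if [[ $s == *ab ]]; then glob_result="ok"; else glob_result="error"; fi

# Regex: '.' is "any character" and '*' repeats the preceding item,
# so '.*ab' is the regex counterpart of the glob '*ab'.
re='.*ab'
if [[ $s =~ $re ]]; then regex_result="ok"; else regex_result="error"; fi

echo "glob *ab:   $glob_result"
echo "regex .*ab: $regex_result"
```

Both tests should report ok here. The regex is kept in a variable so that the shell does not get a chance to treat the '*' as a glob character.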

jlinkels

colucix 08-21-2011 08:42 AM

Quote:

Originally Posted by 915086731 (Post 4449274)
Please see the following code,
Code:

[saturn@saturn-pc new]$ [[ "aab" =~ ab ]] && echo "ok" || echo "error";
ok
[saturn@saturn-pc new]$ [[ "aab" =~ "ab" ]] && echo "ok" || echo "error";
ok
[saturn@saturn-pc new]$

"aab" should not match ab , Can you tell me why?

On the contrary, aab matches the regular expression ab, since it does contain the substring ab. If you want to match only the ab literal string, you have to insert word boundaries in the regular expression:
Code:

$ [[ "aab" =~ \\bab\\b ]] && echo "ok" || echo "error"
error
$ [[ "ab" =~ \\bab\\b ]] && echo "ok" || echo "error"
ok

Here \b is the word boundary specification, and the preceding backslash escapes that backslash so that bash passes it on correctly.

Note that this quoting behaviour only applies up to bash version 3.1. To do the same in bash 3.2 and newer, you have to set the compat31 option:
Code:

shopt -s compat31
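
If the goal is simply "the whole string must be ab", another option is to anchor the regex with ^ and $ instead of using word boundaries. The following is a minimal sketch, assuming bash; the function name matches_whole is made up for this example. Keeping the regex in a variable and expanding it unquoted is a common way to sidestep the quoting differences between bash versions.

```shell
#!/usr/bin/env bash
# ^ anchors the match at the start of the string, $ at the end.
re='^ab$'

# Hypothetical helper: succeeds only if the whole argument is "ab".
matches_whole() {
    [[ $1 =~ $re ]]
}

matches_whole "aab" && echo "ok" || echo "error"   # prints "error"
matches_whole "ab"  && echo "ok" || echo "error"   # prints "ok"
```

This is not identical to the word boundary version: an anchored regex rejects a string such as "x ab y", while a word boundary match would accept it.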

915086731 08-21-2011 08:50 AM

Thanks syg00
I think ab is only a fixed string; it does not contain any metacharacters such as . or * or ?, so I am very puzzled.

Code:

[saturn@saturn-pc new]$ [[ "aab" =~ ab ]] && echo "ok" || echo "error"
ok
[saturn@saturn-pc new]$ [[ "aaaaab" =~ ab ]] && echo "ok" || echo "error"
ok            !! ab seems to have the same effect as a*b !!
[saturn@saturn-pc new]$ [[ "aaaaab" =~ a*b ]] && echo "ok" || echo "error"
ok
[saturn@saturn-pc new]$ [[ "aabaab" =~ a*b ]] && echo "ok" || echo "error"
ok            !! a*b means repeat "a" zero or more times, but it seems to match any character one or more times !!


ghostdog74 08-21-2011 09:48 AM

A regular expression is not the same as shell globbing. If you want to match exactly "ab", use a fixed-string comparison. In the shell, you can do that with a simple case/esac, without using a regular expression:
Code:

case "aab" in
"ab" ) echo "ok";;
*) echo "not ok";;
esac

Or use an if/else statement with the = sign.
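
A rough sketch of the if/else variant, assuming bash; the function name is_exactly_ab is hypothetical and only wraps the test so it can be called twice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exact string comparison, no regex involved.
is_exactly_ab() {
    # Quoting the right-hand side makes [[ ]] compare it literally
    # instead of treating it as a glob pattern.
    if [[ $1 = "ab" ]]; then
        echo "ok"
    else
        echo "not ok"
    fi
}

is_exactly_ab "aab"   # prints "not ok"
is_exactly_ab "ab"    # prints "ok"
```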

theNbomr 08-21-2011 10:11 AM

Quote:

Originally Posted by ghostdog74 (Post 4449336)
Regular expression is not the same as shell globbing..

Exactly correct, and it is not the same as string comparison either. A regular expression tries to match any part of the string it is tested against. The regex 'ab' matches if any substring of 'aab' is 'ab', and since there is such a substring, the test evaluates to true.
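
One way to see which part of the string was matched is bash's BASH_REMATCH array, whose element 0 holds the text matched by the whole regex. A small sketch, assuming bash 3.0 or newer; the function name show_match is invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the substring that the regex matched.
show_match() {
    local s=$1 re=$2
    if [[ $s =~ $re ]]; then
        echo "${BASH_REMATCH[0]}"
    else
        echo "(no match)"
    fi
}

show_match "aab"    'ab'     # prints "ab"
show_match "aaaaab" 'ab'     # prints "ab"
show_match "aabaab" 'a*b'    # prints "aab"
show_match "aab"    '^ab$'   # prints "(no match)"
```

In the first three cases only part of the string is matched, which is why the earlier tests printed ok. Only the anchored regex in the last call requires the whole string to be ab.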
--- rod.

915086731 08-21-2011 10:37 AM

Thanks all


All times are GMT -5.