[SOLVED] bash regexp string compare stopped working
Not sure why it worked previously, but inside double quotes the asterisk is treated literally. For correct pattern matching you can try
Code:
if [[ ${array[${last}]} =~ screenpc.PRODUCTION.* ]]
Note, however, that in this case the trailing .* is totally unnecessary, since you get the same result by matching only the string screenpc.PRODUCTION. It would make sense if you wanted to match two parts of the string with arbitrary text between them, for example:
Code:
if [[ ${array[${last}]} =~ screenpc.PRODUCTION.*something ]]
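A minimal runnable sketch of the two cases above, assuming bash 3.2 or later. The array contents are hypothetical sample data standing in for the original poster's array, and the literal 2010 is only a stand-in for the `something` placeholder:

```shell
#!/usr/bin/env bash
# Hypothetical sample data standing in for the poster's array.
array=("screenpc.DEV.20100101" "screenpc.PRODUCTION.20100115")
last=$(( ${#array[@]} - 1 ))
entry=${array[${last}]}

# Unquoted pattern: the right-hand side of =~ is an extended regex.
[[ $entry =~ screenpc.PRODUCTION.* ]] && echo "matches with trailing .*"

# The trailing .* may match nothing, so dropping it gives the same result.
[[ $entry =~ screenpc.PRODUCTION ]] && echo "matches without trailing .*"

# Here .* is useful: it bridges two required parts ('2010' stands in for 'something').
[[ $entry =~ screenpc.PRODUCTION.*2010 ]] && echo "matches with .* in the middle"
```

The first two tests succeed on the same string, which is the point: the trailing .* adds nothing.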
Edit: as an aside, the compat31 option works for me (even with the quoted pattern). I enabled it with shopt -s compat31.
Edit: after a little searching I found the rule, introduced in bash 3.2, that changes the behavior with respect to previous versions. From the Bash Reference Manual:
Quote:
An additional binary operator, ‘=~’, is available, ... Any part of the pattern may be quoted to force it to be matched as a string.
and from the change log of Bash 3.2:
Quote:
Quoting the string argument to the [[ command's =~ operator now forces string matching, as with the other pattern-matching operators.
This means that if the entire pattern (or part of it) is enclosed in quotes, the quoted part is matched as a literal string, not as a regular expression.
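A small sketch of the difference, assuming bash 3.2 or later with compat31 left at its default (off); the test string is a hypothetical sample. Keeping the regex in a variable and expanding it unquoted is a common way to avoid quoting surprises:

```shell
#!/usr/bin/env bash
# Assumes bash >= 3.2 and compat31 off (the default).
s="screenpc.PRODUCTION.20100115"

# Whole pattern quoted: matched as a literal string, '*' is a literal asterisk.
if [[ $s =~ "screenpc.PRODUCTION.*" ]]; then quoted=yes; else quoted=no; fi

# Unquoted pattern: interpreted as an extended regular expression.
if [[ $s =~ screenpc.PRODUCTION.* ]]; then unquoted=yes; else unquoted=no; fi

# Regex stored in a variable and expanded unquoted: also treated as a regex.
re='screenpc.PRODUCTION.*'
if [[ $s =~ $re ]]; then fromvar=yes; else fromvar=no; fi

echo "quoted: $quoted, unquoted: $unquoted, from variable: $fromvar"
# prints: quoted: no, unquoted: yes, from variable: yes
```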
It also works if I drop the .*, as you said, although this still confuses me. The left-hand string does have more characters (e.g. screenpc.PRODUCTION.20100115), so I thought the final .* would be needed for the regexp to actually match.
Are you implying that =~ is true if the right-hand side matches anything within the left-hand side, i.e. implicitly .*matchthis.*?
To clarify, with the =~ operator the right-hand side is treated as an extended regular expression. In regexp syntax, the two characters .* together mean "zero or more occurrences of any single character", so in practice .* matches anything, including the null string.
In this case the expression
Code:
screenpc.PRODUCTION.*
matches any string containing "screenpc?PRODUCTION" where the question mark means any character, for example:
Code:
somethingherescreenpc.PRODUCTIONsomethingelse
any text screenpc.PRODUCTION any text
screenpcZPRODUCTION
9999screenpc3PRODUCTION9999
and so on. You get the same result if you omit the trailing .*, since it matches any character (dot) repeated zero or more times (asterisk), so it is satisfied even by nothing at all.
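To check these examples yourself, here is a sketch (bash 3.2 or later) that tests each string against the pattern with and without the trailing .*; the last string, screen.PRODUCTION, is an added counterexample that lacks the "pc":

```shell
#!/usr/bin/env bash
re='screenpc.PRODUCTION.*'        # pattern from the post
re_short='screenpc.PRODUCTION'    # same pattern without the trailing .*

for s in "somethingherescreenpc.PRODUCTIONsomethingelse" \
         "any text screenpc.PRODUCTION any text" \
         "screenpcZPRODUCTION" \
         "9999screenpc3PRODUCTION9999" \
         "screen.PRODUCTION"; do
    long=no; short=no
    [[ $s =~ $re ]] && long=yes
    [[ $s =~ $re_short ]] && short=yes
    printf '%-48s with .*: %-3s without .*: %s\n' "$s" "$long" "$short"
done
```

The two columns always agree: the first four strings match both patterns and the last matches neither.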
The question is: do you want to match a string like
Code:
screenpc.PRODUCTION.20100115
where a literal dot followed by an eight-digit date MUST appear after screenpc.PRODUCTION? In that case a more refined regular expression could be:
Code:
if [[ ${array[${last}]} =~ screenpc\.PRODUCTION\.[0-9]{8} ]]
where the dots are escaped so that they match only a literal dot, and eight consecutive digits must follow the second dot.
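A sketch of the refined pattern in use (bash 3.2 or later). The helper name is_match and the test strings are hypothetical; the regex is kept in a single-quoted variable so the backslashes reach the regex engine unchanged:

```shell
#!/usr/bin/env bash
# Escaped dots match only '.'; [0-9]{8} requires eight digits in a row.
re='screenpc\.PRODUCTION\.[0-9]{8}'

# Hypothetical helper: succeeds if $1 contains a match for $re.
is_match() { [[ $1 =~ $re ]]; }

for s in "screenpc.PRODUCTION.20100115" \
         "screenpcZPRODUCTION.20100115" \
         "screenpc.PRODUCTION.201001" \
         "xxscreenpc.PRODUCTION.20100115yy"; do
    if is_match "$s"; then echo "match:    $s"; else echo "no match: $s"; fi
done
```

The second string fails because Z is not a literal dot, the third because it has only six digits; the fourth still matches because nothing anchors the pattern.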
Moreover, if you want to match the exact string, with no other characters before or after it, you can use anchors, as in
Code:
if [[ ${array[${last}]} =~ ^screenpc\.PRODUCTION\.[0-9]{8}$ ]]
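The effect of the anchors can be seen with a sketch like this (bash 3.2 or later; the helper is_exact and the sample strings are illustrative only):

```shell
#!/usr/bin/env bash
# ^ anchors the start of the string, $ anchors the end.
re='^screenpc\.PRODUCTION\.[0-9]{8}$'

# Hypothetical helper: succeeds only if the whole of $1 matches.
is_exact() { [[ $1 =~ $re ]]; }

for s in "screenpc.PRODUCTION.20100115" \
         "xxscreenpc.PRODUCTION.20100115" \
         "screenpc.PRODUCTION.201001150"; do
    if is_exact "$s"; then echo "exact match: $s"; else echo "rejected:    $s"; fi
done
```

Only the first string is accepted: the second has leading text and the third has a ninth digit.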
Finally, in bash 3.2 and later you can also write this expression as
Code:
if [[ ${array[${last}]} =~ ^"screenpc.PRODUCTION."[0-9]{8}$ ]]
where the part between double quotes is matched literally (dots included), while the rest is still a regular expression. Hope this clarifies.
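A sketch of this mixed form (bash 3.2 or later; the helper is_stamp and the sample strings are hypothetical). The second string is rejected because the quoted dots no longer stand for an arbitrary character:

```shell
#!/usr/bin/env bash
# Quoted part: literal text. Unquoted part (^, [0-9]{8}, $): still a regex.
is_stamp() { [[ $1 =~ ^"screenpc.PRODUCTION."[0-9]{8}$ ]]; }

for s in "screenpc.PRODUCTION.20100115" "screenpcZPRODUCTION.20100115"; do
    if is_stamp "$s"; then echo "match:    $s"; else echo "no match: $s"; fi
done
```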
Hi grail! Actually, the question mark was my own notation, not regexp syntax. Thanks for pointing it out; I should have chosen another character, or maybe another color, to avoid confusion.
I'm still somewhat confused, so I must be missing something. Here is how I interpreted the matching logic; could you tell me where I'm going off track?
aabbcc =~ aabbcc matches trivially since the strings are identical
aabbcc =~ aabb.* matches -- the 'aabb' sections match exactly, then the .* matches the 'cc' section
aabbcc =~ aabb does not match -- the 'aabb' sections match but there is nothing in the regexp to match the 'cc' portion on the left
aabbcc =~ bbcc does not match -- the 'bbcc' sections match, but there is nothing in the regexp to match the 'aa' portion
aabbcc =~ .*bbcc matches -- the .* matches the leading 'aa', and then the 'bbcc' sections match
aabbcc =~ .*bb.* matches -- the first .* matches the 'aa', then the 'bb' sections match, then the trailing .* matches the 'cc'
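These six cases are easy to try directly. A sketch, assuming bash 3.2 or later (the helper try_match is hypothetical); the three anchored patterns at the end are added for comparison:

```shell
#!/usr/bin/env bash
# Hypothetical helper: $2 is expanded unquoted, so it is used as a regex.
try_match() { [[ $1 =~ $2 ]]; }

s="aabbcc"
for re in 'aabbcc' 'aabb.*' 'aabb' 'bbcc' '.*bbcc' '.*bb.*' \
          '^aabb$' '^bbcc$' '^aabb.*$'; do
    if try_match "$s" "$re"; then
        echo "$s =~ $re : match"
    else
        echo "$s =~ $re : no match"
    fi
done
```

All six of the original cases report a match. Only the anchored ^aabb$ and ^bbcc$ fail, which is the whole-string behavior the list above expects for aabb and bbcc.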
So in the actual case, I would expect these results:
screenpc.PRODUCTION.20100908 =~ screenpc.PRODUCTION.* matches -- the 'screenpc' matches, the first '.' matches any character (which just happens to also be a '.' in the original string), then 'PRODUCTION' matches, and finally the '.*' matches any set of trailing characters -- '.20100908' in this case.
screenpc.PRODUCTION.20100908 =~ screenpc.PRODUCTION does not match -- the 'screenpc.PRODUCTION' section matches as above, but then there is nothing in the regexp to match the '.20100908' portion of the original string.
If the last case does in fact produce a match, then I would think that the definition of the '=~' operator needs to be stated as:
"the regexp on the right matches A SUBSTRING in the string on the left"
i.e. it is more a 'search for a substring' than 'the regexp matches the whole string on the left'. That may in fact be the actual definition, and the book I looked at is simply imprecise.
Sorry to belabor the point, but since the system does in fact appear to match the last example as you said, it's clear that I'm missing something fundamental and would like to get it straight.
Actually, you are missing one key point: a regular expression is a kind of search pattern. You have a string of any length and a regular expression that describes a sequence of characters to be searched for inside the string. In other words, a string matches a regular expression when it contains at least one sequence of characters (a substring) described by the regular expression.
Hence a regular expression does not have to describe the entire string. You can, however, refine it to match only the string (or set of possible strings) you want.
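Bash stores the part of the string that actually matched in the BASH_REMATCH array, which makes this search behavior visible. A sketch (bash 3.2 or later; show_match and the inputs are illustrative):

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which substring of $1 the regex $2 matched.
show_match() {
    if [[ $1 =~ $2 ]]; then
        # BASH_REMATCH[0] holds the portion of $1 matched by the whole regex.
        echo "'$1' =~ '$2' matched substring: '${BASH_REMATCH[0]}'"
    else
        echo "'$1' =~ '$2' no match"
    fi
}

show_match "screenpc.PRODUCTION.20100908" 'screenpc.PRODUCTION'
show_match "screenpc.PRODUCTION.20100908" 'screenpc.PRODUCTION.*'
show_match "aabbcc" 'bb'
```

Without .* the matched substring is just screenpc.PRODUCTION; with it, the match extends to the end of the string. Either way the test succeeds.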
As an example of refining a regular expression, the pattern
Code:
.
matches any string except the null string, i.e. any string of one or more characters. If you want to match strings of at least three characters, you would use
Code:
...
If you want to match a string of exactly three characters, you have to use anchors to match the beginning and the end of the string:
Code:
^...$
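The three patterns discussed here (., ..., and the anchored ^...$) can be compared side by side. A sketch, assuming bash 3.2 or later; the helper classify and the sample strings are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints "yes"/"no" for each of the three patterns.
classify() {
    local s=$1 one=no three=no exact=no
    [[ $s =~ . ]]     && one=yes     # at least one character
    [[ $s =~ ... ]]   && three=yes   # at least three characters
    [[ $s =~ ^...$ ]] && exact=yes   # exactly three characters
    echo "$one $three $exact"
}

for s in "" "ab" "abc" "abcd"; do
    printf '%-8s %s\n' "\"$s\"" "$(classify "$s")"
done
```

Only "abc" passes all three; "abcd" passes . and ... but fails ^...$ because the anchors leave no room for a fourth character.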