LinuxQuestions.org
Old 10-20-2010, 09:28 AM   #1
jcrowley
LQ Newbie
 
Registered: Mar 2006
Posts: 17

Rep: Reputation: 0
bash regexp string compare stopped working


Have a bash script which contains a line like this:

if [[ ${array[${last}]} =~ "screenpc.PRODUCTION.*" ]]

which WORKED as expected in bash 4.0.33 and now fails in 4.1.2

Instrumented the script to print the value of the left-hand side and it is exactly what is expected.

As noted above, this has been working fine until we installed Fedora 13 (kernel 2.6.33), and now it fails.

Tried setting shell 'extglob' to On with same results.

Did something change? Are there other shell/bash options that need to be set?

Thanks for any help -- this has the whole installation stopped!
 
Old 10-20-2010, 10:16 AM   #2
jcrowley
LQ Newbie
 
Registered: Mar 2006
Posts: 17

Original Poster
Rep: Reputation: 0
also tried compat31

Turned on this shell option -- still getting incorrect results.
 
Old 10-20-2010, 10:43 AM   #3
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Not sure why it worked previously, but inside double quotes the asterisk (like every other regexp character) is treated literally. For the pattern to be used as a regular expression, drop the quotes:
Code:
if [[ ${array[${last}]} =~ screenpc.PRODUCTION.* ]]
but please note that in this case the trailing .* is unnecessary, since you get the same result by matching only the string screenpc.PRODUCTION. It would make sense if you wanted to match something that follows after arbitrary characters, for example:
Code:
if [[ ${array[${last}]} =~ screenpc.PRODUCTION.*something ]]
Edit: a side note: the compat31 option works for me (even with the quoted pattern). I enabled it using shopt -s compat31.

Edit: after a little searching I found the rule introduced in bash 3.2 that changes the behavior with respect to previous versions. From the bash reference manual:
Quote:
An additional binary operator, ‘=~’, is available, ... Any part of the pattern may be quoted to force it to be matched as a string.
and from the change log of Bash 3.2:
Quote:
Quoting the string argument to the [[ command's =~ operator now forces string matching, as with the other pattern-matching operators.
This means that if the entire pattern (or part of it) is embedded in quotes, it is treated as a string (not a pattern anymore).
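To see the rule in action, here is a minimal sketch. It assumes bash 3.2 or later with no compat shell option set, and the sample value is only an illustration modelled on the file name mentioned in this thread. It compares the quoted pattern, the unquoted pattern, and a pattern kept in a variable (a common way to sidestep quoting problems):

```shell
#!/usr/bin/env bash
# Illustrative value only; substitute whatever your array element holds.
value="screenpc.PRODUCTION.20100115"

# Quoted: every character is literal, so the value would have to
# contain the two characters ".*" after "PRODUCTION" to match.
if [[ $value =~ "screenpc.PRODUCTION.*" ]]; then quoted=match; else quoted=nomatch; fi

# Unquoted: treated as an extended regular expression.
if [[ $value =~ screenpc.PRODUCTION.* ]]; then unquoted=match; else unquoted=nomatch; fi

# Pattern stored in a variable and expanded WITHOUT quotes:
# also treated as a regular expression.
re='screenpc.PRODUCTION.*'
if [[ $value =~ $re ]]; then from_var=match; else from_var=nomatch; fi

echo "quoted=$quoted unquoted=$unquoted from_var=$from_var"
```

Keeping the regexp in a variable means the shell quoting lives in the assignment, while the unquoted expansion on the right of =~ is still read as a pattern.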

Last edited by colucix; 10-20-2010 at 11:09 AM.
 
Old 10-20-2010, 02:40 PM   #4
jcrowley
LQ Newbie
 
Registered: Mar 2006
Posts: 17

Original Poster
Rep: Reputation: 0
That works -- thanks.

Also works dropping the .* as you said, although this still confuses me. The left-side string does have more characters (e.g. screenpc.PRODUCTION.20100115), so I thought the final .* would be needed so that the regexp actually matched.

You are implying that the =~ will be true if the right-hand side matches anything within the left-hand side? i.e. implicitly .*matchthis.*
 
Old 10-20-2010, 05:00 PM   #5
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Quote:
Originally Posted by jcrowley
Also works dropping the .* as you said, although this still confuses me. The left-side string does have more characters (e.g. screenpc.PRODUCTION.20100115), so I thought the final .* would be needed so that the regexp actually matched.
To clarify, the =~ operator implies that the right-hand side is an extended regular expression. Under the regexp rules the two characters .* together mean "zero or more occurrences of any single character". In practice it matches anything, including the null string.

In this case the expression
Code:
screenpc.PRODUCTION.*
matches any string containing "screenpc?PRODUCTION" where the question mark means any character, for example:
Code:
somethingherescreenpc.PRODUCTIONsomethingelse
any text screenpc.PRODUCTION any text
screenpcZPRODUCTION
9999screenpc3PRODUCTION9999
and so on. The same happens if you omit the .* at the end, since it matches any character (dot) that appears zero or more times (asterisk).
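A quick way to convince yourself is to run both forms of the pattern over a few strings. This is only a sketch: the sample strings are made up for illustration and the variable names are arbitrary.

```shell
#!/usr/bin/env bash
# Made-up sample strings; the last one has no character between
# "screenpc" and "PRODUCTION", so the dot has nothing to match.
samples=(
  "somethingherescreenpc.PRODUCTIONsomethingelse"
  "screenpcZPRODUCTION"
  "screenpc.PRODUCTION.20100115"
  "screenpcPRODUCTION"
)

with_tail='screenpc.PRODUCTION.*'
without_tail='screenpc.PRODUCTION'

results=()
for s in "${samples[@]}"; do
  if [[ $s =~ $with_tail ]]; then a=yes; else a=no; fi
  if [[ $s =~ $without_tail ]]; then b=yes; else b=no; fi
  results+=("$a/$b")
  printf '%-48s with .*: %-3s without .*: %s\n' "$s" "$a" "$b"
done
```

Both columns should always agree, because a trailing .* can match the empty string and therefore never changes whether a match is found.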

The question is: do you want to match a string like
Code:
screenpc.PRODUCTION.20100115
where a literal dot followed by an eight-digit date MUST appear after screenpc.PRODUCTION? In that case a more refined regular expression could be:
Code:
if [[ ${array[${last}]} =~ screenpc\.PRODUCTION\.[0-9]{8} ]]
where the dots are escaped to match their literal meaning, and eight digits must follow the second dot.

Moreover, if you want to match the exact string without any other character before or after the string itself, you can use anchors as in
Code:
if [[ ${array[${last}]} =~ ^screenpc\.PRODUCTION\.[0-9]{8}$ ]]
Finally, in bash >= 3.2 we can write this expression as
Code:
if [[ ${array[${last}]} =~ ^"screenpc.PRODUCTION."[0-9]{8}$ ]]
where the part between double quotes has to be interpreted as literal (dots included). Hope this clarifies.
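The refined patterns can be tried out with a small helper. This is a sketch, not part of the original script: check and check_quoted are hypothetical helper names, and the test strings are invented.

```shell
#!/usr/bin/env bash
# check STRING REGEX: prints "match" or "no match".
check() {
  if [[ $1 =~ $2 ]]; then echo "match"; else echo "no match"; fi
}

# Same test, but with the literal part written in double quotes
# (bash >= 3.2), as in the last form above.
check_quoted() {
  if [[ $1 =~ ^"screenpc.PRODUCTION."[0-9]{8}$ ]]; then echo "match"; else echo "no match"; fi
}

unanchored='screenpc\.PRODUCTION\.[0-9]{8}'
anchored='^screenpc\.PRODUCTION\.[0-9]{8}$'

check "screenpc.PRODUCTION.20100115"         "$unanchored"   # match
check "screenpcXPRODUCTION.20100115"         "$unanchored"   # no match: X is not a literal dot
check "old-screenpc.PRODUCTION.20100115.bak" "$unanchored"   # match: found inside the string
check "old-screenpc.PRODUCTION.20100115.bak" "$anchored"     # no match: extra text around it
check "screenpc.PRODUCTION.2010011"          "$anchored"     # no match: only seven digits
check_quoted "screenpc.PRODUCTION.20100115"                  # match
check_quoted "screenpcXPRODUCTION.20100115"                  # no match
```

Storing the regexp in a single-quoted variable keeps the backslashes intact, and expanding it unquoted after =~ lets bash treat it as a pattern.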
 
Old 10-20-2010, 08:43 PM   #6
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Although I do this with great trepidation, I need to make an amendment to colucix's post:
Quote:
matches any string containing "screenpc?PRODUCTION" where the question mark means any character, for example:
A question mark means 0 or 1 of the previous character, so it will match the following:
Code:
screenpcPRODUCTION
screenpPRODUCTION

# but not
screenpc.PRODUCTION
# As this now has 2 characters between p and P
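For anyone who wants to verify this, here is a small sketch with the question mark used as real regexp syntax. For contrast it also shows the == operator, where the right-hand side is a shell glob and ? stands for exactly one character, which is close to the informal notation being discussed. The variable names are arbitrary.

```shell
#!/usr/bin/env bash
# As a regexp, "c?" means: the letter c, zero or one time.
re='screenpc?PRODUCTION'

results=()
for s in "screenpcPRODUCTION" "screenpPRODUCTION" "screenpc.PRODUCTION"; do
  if [[ $s =~ $re ]]; then r="match"; else r="no match"; fi
  results+=("$r")
  echo "regexp  $s: $r"
done

# With ==, the unquoted right-hand side is a glob pattern instead,
# and ? matches any single character (the whole string must match).
if [[ "screenpc.PRODUCTION" == screenpc?PRODUCTION ]]; then
  glob_result="match"
else
  glob_result="no match"
fi
echo "glob    screenpc.PRODUCTION: $glob_result"
```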
 
Old 10-21-2010, 01:44 AM   #7
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Hi grail! Actually the question mark was a personal notation (not regexp syntax). Thank you for pointing it out anyway; I should have chosen another character or maybe another color to avoid confusion.
 
Old 10-21-2010, 07:22 AM   #8
jcrowley
LQ Newbie
 
Registered: Mar 2006
Posts: 17

Original Poster
Rep: Reputation: 0
Still somewhat confused, so I'm missing something. Here's how I interpreted the matching logic -- could you tell me where I'm off track?

aabbcc =~ aabbcc matches trivially since the strings are identical

aabbcc =~ aabb.* matches -- the 'aabb' sections match exactly, then the .* matches the 'cc' section

aabbcc =~ aabb does not match -- the 'aabb' sections match but there is nothing in the regexp to match the 'cc' portion on the left

aabbcc =~ bbcc does not match -- the 'bbcc' sections match, but there is nothing in the regexp to match the 'aa' portion

aabbcc =~ .*bbcc matches -- the .* matches the leading 'aa', and then the 'bbcc' sections match

aabbcc =~ .*bb.* matches -- the first .* matches the 'aa', then the 'bb' sections match, then the trailing .* matches the 'cc'

So in the actual case, I would expect these results:

screenpc.PRODUCTION.20100908 =~ screenpc.PRODUCTION.* matches -- the 'screenpc' matches, the first '.' matches any character (which just happens to also be a '.' in the original string), then 'PRODUCTION' matches, and finally the '.*' matches any set of trailing characters -- '.20100908' in this case.

screenpc.PRODUCTION.20100908 =~ screenpc.PRODUCTION does not match -- the 'screenpc.PRODUCTION' section matches as above, but then there is nothing in the regexp to match the '.20100908' portion of the original string.


If the last case does in fact produce a match, then I would think that the definition of the '=~' operator needs to be stated as:

"the regexp on the right matches A SUBSTRING in the string on the left"

i.e. it's more a 'search for a string' as opposed to 'the regexp matches the string on the left' -- which may in fact be the actual definition, and the book I've looked at is imprecise.

Sorry to belabor the point, but since the system does in fact appear to match the last example as you said, it's clear that I'm missing something fundamental and would like to get it straight.

Thanks
 
Old 10-21-2010, 08:49 AM   #9
colucix
LQ Guru
 
Registered: Sep 2003
Location: Bologna
Distribution: CentOS 6.5 OpenSuSE 12.3
Posts: 10,509

Rep: Reputation: 1983
Actually you are missing a key point: a regular expression is a kind of search pattern. You have a string of any length and a regular expression that describes a sequence of characters to be searched for inside the string. In other words, a string matches a regular expression when some part of it contains a sequence of characters described by the regular expression.

Hence it is not mandatory to write a regular expression that matches the entire string. Nevertheless you can refine the regular expression to match only the string (or a set of possible strings) you want.
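Applied to the aabbcc cases from the previous post, a short sketch (the list of patterns and the variable names are chosen here for illustration) shows that every unanchored pattern finds a match, and that only anchors force the match to start or end at a specific place:

```shell
#!/usr/bin/env bash
subject="aabbcc"

# The six patterns from the question, plus two anchored variants.
patterns=('aabbcc' 'aabb.*' 'aabb' 'bbcc' '.*bbcc' '.*bb.*' '^aabb$' '^bbcc')

results=()
for re in "${patterns[@]}"; do
  if [[ $subject =~ $re ]]; then r="match"; else r="no match"; fi
  results+=("$r")
  printf '%s =~ %-8s %s\n' "$subject" "$re" "$r"
done
```

The first six lines report a match because each pattern is found somewhere inside aabbcc. The last two do not: ^aabb$ requires the whole string to be aabb, and ^bbcc requires the string to start with bbcc.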

An example of regular expression refinement: the following:
Code:
.
matches any string except the null string. This means that any string of one or more characters is matched. If you want to match a string of at least three characters you will use
Code:
...
If you want to match a string whose length is exactly three characters, you have to use anchors to match the beginning and the end of the string:
Code:
^...$
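These three patterns can be checked against strings of increasing length. The following is a sketch with invented test strings; the variable names are arbitrary.

```shell
#!/usr/bin/env bash
any='.'          # at least one character
three='...'      # at least three characters
exact='^...$'    # exactly three characters

results=()
for s in "" "a" "ab" "abc" "abcd"; do
  if [[ $s =~ $any ]];   then r1=yes; else r1=no; fi
  if [[ $s =~ $three ]]; then r2=yes; else r2=no; fi
  if [[ $s =~ $exact ]]; then r3=yes; else r3=no; fi
  results+=("$r1/$r2/$r3")
  printf '%-6s  . : %-3s  ... : %-3s  ^...$ : %s\n' "'$s'" "$r1" "$r2" "$r3"
done
```

Only the anchored pattern rejects the four-character string, since without anchors the three dots are free to match any three consecutive characters inside it.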
Better now?
 
Old 10-21-2010, 09:22 AM   #10
jcrowley
LQ Newbie
 
Registered: Mar 2006
Posts: 17

Original Poster
Rep: Reputation: 0
Yes, that is exactly the clarification I needed.

Thanks.
 
Old 10-21-2010, 12:03 PM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3192
Glad you got there. Please mark the thread as SOLVED now that you have a solution.
 
  

