LinuxQuestions.org


ansh007 08-26-2016 08:01 AM

regex pattern matching, with open braces or quotes being closed
 
Dear members,
I need some help with pattern matching, shell or perl, anything would help.

Problem: I need to write a function/module that recognizes whether a regex is well formed. For example, (abc*) is valid, but (abc* is not. What I mean is that every opening brace or quote must also be closed.
Where I need it: I have a script that updates regex strings. Say there is a regex (abc???124{123??}?). If a user updates it to (abc???124{xy3??}?), the script should accept it, but (abc???124{ab23???) should fail.
It would be great if we could use some built-in functionality as well.

Thanks in advance :)

HMW 08-27-2016 01:19 AM

Well, one way to do it would be to check that you have the same number of opening and closing brackets by looping over the string and checking each character.

You can do this in any language.
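
A minimal sketch of that loop in bash, in case it helps. The function name count_balanced is made up for illustration, and it only compares counts per bracket type; it knows nothing about order or escaping.

```shell
#!/bin/bash
# Hypothetical helper (not from any library): succeeds only if each
# bracket type has as many closers as openers. Counts only; it ignores
# the order of the brackets and any backslash escapes.
count_balanced() {
    local s=$1 i ch
    local round=0 curly=0 square=0
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}                     # character at position i
        case $ch in
            '(') round=$((round + 1)) ;;
            ')') round=$((round - 1)) ;;
            '{') curly=$((curly + 1)) ;;
            '}') curly=$((curly - 1)) ;;
            '[') square=$((square + 1)) ;;
            ']') square=$((square - 1)) ;;
        esac
    done
    # Every counter must be back at zero.
    (( round == 0 && curly == 0 && square == 0 ))
}

count_balanced '(abc???124{123??}?)' && echo "counts match"
count_balanced '(abc???124{ab23???)' || echo "counts differ"
```

Comparing counts per type is a little stricter than checking that the total is even, but it still cannot tell whether the brackets are in a sensible order.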

Best regards,
HMW

HMW 08-27-2016 02:42 AM

Hi again!

I just did this with a bash script using a loop and the modulus operator. Have to rush out now, but give it a try on your own and I'll get back to you if you need assistance.

Best regards,
HMW

Edit:
Here's an example run of my little script:
Code:

[HMW@ArchLinux LQ]$ ./valReg.sh "(abc???124{ab23???)"
Not matching brackets!
[HMW@ArchLinux LQ]$ echo $?
1
[magnus@ArchLinux LQ]$ ./valReg.sh "(abc???124{123??}?)"
Matching brackets.
[HMW@ArchLinux LQ]$ echo $?
0

You can also ditch the loop and use grep in combination with modulus; this is perhaps a better alternative.

ansh007 08-29-2016 12:57 AM

Thanks HMW
 
Dear HMW, Thanks.
This looks similar to what I am looking for.
Would you like to share your little code :) ?

pan64 08-29-2016 02:39 AM

To check a regexp, I would suggest compiling it.
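
One possible way to do that from bash, sketched under the assumption that the patterns are meant to be POSIX extended regular expressions: let grep compile the pattern against empty input and look at its exit status. The function name regex_compiles is hypothetical. A pattern written for another engine (Perl, for instance) would need that engine's own compile step instead.

```shell
#!/bin/bash
# Hypothetical helper: ask grep to compile the pattern as a POSIX ERE.
# grep exits 0 on a match, 1 on no match, and 2 or higher on an error,
# which includes a pattern it cannot compile. Reading /dev/null means
# there is never a match, so a valid pattern gives exit status 1.
regex_compiles() {
    local status
    grep -E -e "$1" /dev/null >/dev/null 2>&1
    status=$?
    (( status < 2 ))
}

regex_compiles '(abc*)' && echo "compiles"
regex_compiles '(abc'   || echo "does not compile"
```

Note that the engine's idea of "valid" may differ from a plain bracket count; some engines may treat a stray { as a literal character, for example.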

HMW 08-29-2016 02:45 AM

Quote:

Originally Posted by ansh007 (Post 5597623)
Dear HMW, Thanks.
This looks similar to what I am looking for.
Would you like to share your little code :) ?

Hi ansh007!

The thing with this forum is that we're happy to help you help yourself. I have more or less already given you the answer to your question.

By using the modulus operator you can easily figure out whether you have an EVEN number of characters or not:
Code:

bc <<< 10%2
0
bc <<< 7%2
1

So, what you have to do is to find the characters you are looking for, and then process that information with modulus; if the answer is 0 (zero) - you have an even (correct) number of chars. I have already given you two different ways to approach this:
1. You can use a loop to check each character in a string (doable in any programming language).
2. You can use grep, if you can (and want to) use bash.

Why don't you try it out yourself first. If you are stuck on something I'll be more than happy to help you out.

Best regards,
HMW

rtmistler 08-29-2016 06:30 AM

I've been silently subscribed this whole time because I was kind of thinking:

"Cool ... a regex syntax checker! ... Well, if one comes out of this question, maybe I'll learn a bit."

But then I realized that Bash (or any other scripting language) is itself a syntax checker! In other words, if you get it wrong, it will tell you, and it tries to tell you where the problem is. So I'm not so sure that there's a benefit here except as fun, or as an exercise/assignment.

This is why we test our code. And note that a syntax checker is not a range or input/output tester; it is more like a compiler that parses code. You can have correct syntax and still be wrong, for example by allowing a divide by zero or invalid input.

That said, I agree with HMW's recommendation: you can tally the number and types of open brackets and then validate that you have the same number of close brackets of each type. The problem there is also one of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because it has the same number of opens and closes, but it's still incorrect.
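
To illustrate the placement problem, here is a rough sketch of a check that tracks nesting instead of counts, by keeping a stack of the closers it still expects. The name brackets_nested is made up, and escapes and character classes are deliberately left out.

```shell
#!/bin/bash
# Hypothetical helper: succeeds only if (), {} and [] are properly nested.
# The "stack" is a string of the closers still expected, innermost last.
# Backslash escapes and character classes are NOT handled here.
brackets_nested() {
    local s=$1 stack='' i ch
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        case $ch in
            '(') stack+=')' ;;
            '{') stack+='}' ;;
            '[') stack+=']' ;;
            ')' | '}' | ']')
                # The closer must match the most recently opened bracket.
                [[ -n $stack && ${stack: -1} == "$ch" ]] || return 1
                stack=${stack%?}        # pop the matched closer
                ;;
        esac
    done
    # Anything left on the stack was opened but never closed.
    [[ -z $stack ]]
}

brackets_nested '(abc???124{123??}?)' && echo "nested correctly"
brackets_nested '{a+]b}|c['           || echo "rejected"
```

With this approach {a+]b}|c[ is rejected at the ], because the bracket opened last was a {.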

HMW 08-29-2016 07:23 AM

Quote:

Originally Posted by rtmistler (Post 5597682)
The problem there is also one of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because it has the same number of opens and closes, but it's still incorrect.

Actually, that is just one of many problems! What if you want to search for a literal '(' or '{' character in your regex? Then this check will fail although the regex is correct.
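
One partial workaround, sketched here only as an assumption about how it could be handled: remove every backslash-escaped character before counting, in a dialect where a backslash makes the bracket literal (ERE or Perl style). The name strip_escapes is hypothetical, and brackets that are literal because they sit inside a character class such as [(] are still not covered.

```shell
#!/bin/bash
# Hypothetical helper: drop every backslash-escaped character, so that
# \( or \{ no longer looks like an opening bracket to a later check.
# The sed expression deletes each backslash together with the one
# character that follows it.
strip_escapes() {
    printf '%s\n' "$1" | sed 's/\\.//g'
}

strip_escapes 'price \(USD\) (abc)'
```

The output of strip_escapes can then be fed to whichever bracket check is in use.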

Personally, I just looked at it as a nice little exercise, because I am a hacker, but as rtmistler has pointed out, this is of very limited use in reality.

Best regards,
HMW

ansh007 08-30-2016 03:03 PM

Guys, thanks for staying subscribed to the thread.
I seem to have solved the issue, thanks to your suggestions. It may fail on certain test cases, which I will fix as I go.
I am putting the whole code here; please suggest anything I should improve. Thanks again :)

Code:

#!/bin/bash

## Function arrch : to check if the 1st brace pair is valid
## (counts the closers of type $1 that come after position $c)
arrch() {
    local n=$((c + 1))
    local array="${arr[*]:$n}"
    local seek="$1"
    local ret=0
    local element
    element=$(echo "$array" | grep -o "$seek" | wc -l)

    if [[ $element -eq 0 ]]; then
        echo "$element"
        echo " It is invalid "
        exit 1
    else
        ret=$element
    fi

    return $ret
}

## Function arrchh : to check if the other brace pairs are valid
## ($1 is the opener to look for, $2 is the closer count from arrch)
arrchh() {
    local n=$((c + 1))
    local array="${arr[*]:$n}"
    local seek="$1"
    ret="$2"
    if [ "$ret" -gt 1 ]; then
        var1=$((ret - 1))
        var2=$(echo "$array" | grep -o "$seek" | wc -l)
        echo "$var2"
        if [ "$var1" -eq "$var2" ]; then
            echo "${seek} is cool"
        else
            echo "openers and closers are not same for ${seek}"
            exit 1
        fi
    fi
}

## To check if the script is called with accurate arguments
## For testing purpose, arg1 is the regex we are going to test

if [[ $# -ne 1 ]]
then
    echo "$0 usage: $0 <expression>"
    exit 1
fi

exp=$1 ## exp is the regex we are validating which shouldn't contain "/" as it won't be a file name in that case

## Test the string directly; an unquoted $exp would be glob-expanded by the shell
if [[ $exp == */* ]]; then
    echo " ${exp} is an invalid file name - it contains / "
    exit 2
fi


## example exp='abc???123??[0-9]{2,}[a-zA-Z]{2,}(abc)' ## abc???123??[0-9]{2,[a-zA-Z]{2,}}(abc)
## Keep only the bracket characters; "$exp" is quoted so the shell does not glob-expand it
expr=$(printf '%s' "$exp" | tr -cd '[]{}()<>') ## []{}[]{}() ## []{[]{}()
END=${#expr}

## Putting the characters of $expr into an array arr

for ((c=1; c<=END; c++ )); do
    arr[c]=${expr:c-1:1}
done

## echo ${arr[@]}
## Checking if the number of elements in the array is even. If not, there is definitely an extra open/closed brace
modulo=$(( ${#arr[@]} % 2 ))
if [ "$modulo" -eq 1 ]; then
    echo " The Pattern is improper"
    exit 3
elif [[ ${arr[1]} == ']' || ${arr[1]} == '}' || ${arr[1]} == ')' || ${arr[1]} == '>' ]]; then
    echo " The Pattern is improper - Cannot start with closed braces"
    exit 4
fi

for ((c=1; c<=$END; c++ )); do

    option=${arr[${c}]}
    echo "option ${option}"

    case "$option" in

        '[' )
            arrch ']'
            ret="$?"
            echo "$ret" ##
            arrchh '\[' "$ret"
            ;;

        '{' )
            arrch '}'
            ret="$?"
            arrchh '{' "$ret"
            ;;

        '(' )
            arrch ')'
            ret="$?"
            arrchh '(' "$ret"
            ;;

        '<' )
            arrch '>'
            ret="$?"
            arrchh '<' "$ret"
            ;;

        *)
            echo "Logic to be built"
            ;;
    esac
done
echo " Pattern ok "
exit 0

rtmistler 08-30-2016 07:06 PM

Great work!

Do you have a set of valid and invalid test regex strings?
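
Something along these lines could serve as a starting point. It is only a sketch: the strings are the ones already mentioned in this thread, the expected verdicts are the ones stated for them here, and the run_cases function and array names are made up for illustration.

```shell
#!/bin/bash
# Strings taken from this thread, grouped by the verdict the posters
# said they expected for them.
valid_cases=('(abc*)' '(abc???124{123??}?)' '(abc???124{xy3??}?)')
invalid_cases=('(abc*' '(abc???124{ab23???)' '{a+]b}|c[')

# Hypothetical harness: "$@" is the validator command under test. It must
# exit 0 for a pattern it accepts and non-zero for one it rejects.
# Prints one line per wrong verdict and returns the number of them.
run_cases() {
    local p failures=0
    for p in "${valid_cases[@]}"; do
        if ! "$@" "$p" >/dev/null 2>&1; then
            echo "wrongly rejected: $p"
            failures=$((failures + 1))
        fi
    done
    for p in "${invalid_cases[@]}"; do
        if "$@" "$p" >/dev/null 2>&1; then
            echo "wrongly accepted: $p"
            failures=$((failures + 1))
        fi
    done
    return "$failures"
}

# Example call, assuming the validator is a script in the current directory:
# run_cases ./valReg.sh
```

The validator should be a separate script or command; a shell function that calls exit would end the harness as well.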

HMW 08-31-2016 01:07 AM

Impressive!

Makes my code look a little... small!

Anyway, here is what I threw together. Bear in mind that this has no error handling and whatnot, I just did it to test my logic and math (the latter is in all honesty not great!).

Code:

#!/bin/bash

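# Count the (, ), { and } characters and keep the remainder of that count divided by 2.
# "$1" is quoted so that the shell does not glob-expand characters such as ? and *.
evenOdd=$(printf '%s\n' "$1" | grep -Eo '\(|\)|\{|\}' | wc -l | awk '{ print $1%2 }')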

if (( $evenOdd != 0 )); then
    echo "Not matching brackets!"
    exit 1
fi

echo "Matching brackets."
exit 0


pan64 08-31-2016 02:10 AM

I still suggest using a regexp compiler. That way you use the regular regexp engine to check the expression instead of reinventing it (not to mention that you will not be able to do that properly).


All times are GMT -5.