[SOLVED] regex pattern matching, with open braces or quotes being closed
Dear members,
I need some help with pattern matching; shell or Perl, anything would help.
Problem: I need to write a function/module that recognizes whether a regex is well formed. For example, (abc*) is valid, but (abc* is not. What I mean is that every opening brace or quote must have a matching close.
Where I need it: I have a script that updates regex strings. Say there is a regex (abc???124{123??}?). If a user updates it to (abc???124{xy3??}?), the script should accept it, but (abc???124{ab23???) should fail.
It would be great if we could use some built-in functionality as well.
I just did this with a bash script using a loop and the modulus operator. I have to rush out now, but give it a try on your own and I'll get back to you if you need assistance.
Dear HMW, Thanks.
This looks similar to what I am looking for.
Would you like to share your little code ?
Hi ansh007!
The thing with this forum is that we're happy to help you help yourself. I have more or less already given you the answer to your question.
By using the modulus operator you can easily figure out whether you have an EVEN number of characters or not:
Code:
bc <<< 10%2
0
bc <<< 7%2
1
So, what you have to do is to find the characters you are looking for, and then process that information with modulus; if the answer is 0 (zero) - you have an even (correct) number of chars. I have already given you two different ways to approach this:
1. You can use a loop to check each character in a string (doable in any programming language).
2. You can use grep if you want to (can) use bash.
Why don't you try it out yourself first? If you get stuck on something I'll be more than happy to help you out.
Best regards,
HMW
Last edited by HMW; 08-29-2016 at 04:35 AM.
Reason: Spelling...
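A minimal sketch of the counting idea described above, assuming a grep that supports -o (as the scripts in this thread already do). The function name is_even_count and its arguments are made up for illustration and are not HMW's actual code. It counts every occurrence of the given characters and applies the modulus test:

```shell
# is_even_count STRING CHARS
# Succeeds when the characters listed in CHARS occur an even number of
# times in STRING. CHARS is placed inside a grep bracket expression,
# so pass something like '()' or '{}' or '"'.
is_even_count() {
    local n
    # grep -o prints each match on its own line; wc -l counts them.
    n=$(printf '%s\n' "$1" | grep -o "[$2]" | wc -l)
    [ $(( n % 2 )) -eq 0 ]
}
```

For example, is_even_count '(abc*)' '()' succeeds and is_even_count '(abc*' '()' fails. Note that a parity test alone also accepts ')abc(', because it only looks at the total and not at the order.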
I've been silently subscribed this whole time because I was kind of thinking:
"Cool ... a regex syntax checker! ... Well, if one comes out of this question, maybe I'll learn a bit."
But then I realized, "BASH" (or any other script language) is a syntax checker! In other words, you get it wrong, it will tell you and it tries to tell you where the problem is. So I'm not so sure that there's a benefit here except for fun, or an exercise/assignment.
This is why we test our code. Note that a syntax checker is not a range or input/output tester; it is more like a compiler parsing code. Your syntax can be correct and you can still be wrong if you end up allowing things like divide by zero or invalid input.
That said, I agree with HMW's recommendation: you can tally the number and types of open brackets and then validate that you have the same number of close brackets of the same type. The problem there is one also of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because there are the same amount of opens and closes, but it's still incorrect.
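One way to catch the placement problem is to track nesting with a stack instead of comparing totals. The following is only a sketch under that assumption; the function name balanced is hypothetical, and it deliberately ignores regex details such as backslash escapes and bracket expressions:

```shell
# balanced STRING
# Succeeds when every (, [ and { in STRING is closed by the matching
# character in the correct order. The "stack" is a plain string that
# holds the closers we still expect, most recent first.
balanced() {
    local rest=$1 stack='' ch
    while [ -n "$rest" ]; do
        ch=${rest%"${rest#?}"}   # first character of what is left
        rest=${rest#?}           # drop that character
        case $ch in
            '(') stack=")$stack" ;;
            '[') stack="]$stack" ;;
            '{') stack="}$stack" ;;
            ')'|']'|'}')
                # A closer must match the most recently opened bracket.
                case $stack in
                    "$ch"*) stack=${stack#?} ;;
                    *) return 1 ;;
                esac ;;
        esac
    done
    # Anything left on the stack was opened but never closed.
    [ -z "$stack" ]
}
```

With this, balanced '[a+b]' succeeds, while balanced '{a+]b}|c[' fails even though its opens and closes add up to the same total.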
The problem there is one also of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because there are the same amount of opens and closes, but it's still incorrect.
Actually, that is just one of many problems! What if you want to search for a literal '(' or '{' character in your regex, then this check will fail although the regex is correct.
Personally, I just looked at it as a nice little exercise, because I am a hacker, but as rtmistler has pointed out, this is of very limited use in reality.
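For the literal-character case, one possible workaround (an assumption for illustration, not something posted in this thread) is to drop backslash-escaped characters before counting. The helper name strip_escapes is hypothetical, and brackets inside a bracket expression such as [(] would still be miscounted:

```shell
# strip_escapes STRING
# Prints STRING with every backslash-escaped character removed, so that
# an escaped bracket such as \( is not counted as an opener.
strip_escapes() {
    # The sed pattern \\. matches one backslash plus the character after it.
    printf '%s\n' "$1" | sed 's/\\.//g'
}
```

The output can then be fed to whatever bracket check is in use. An escaped backslash followed by a real bracket is handled too, because the pair \\ is consumed first and the bracket stays.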
Guys, thanks for being subscribed to the thread.
Apparently I have solved the issue, thanks to your suggestions. It may fail under certain test cases, which I will fix by and by.
I am putting the whole code here; please suggest anything I should improve. Thanks again.
#!/bin/bash
## Function arrch : to check if the 1st brace pair is valid
arrch() {
n=`expr $c + 1`
local array="${arr[@]:$n}"
local seek="$1"
local ret=0
element=$(printf '%s\n' "$array" | grep -oF -- "$seek" | wc -l)  ## quoted and -F so characters like * or [ are taken literally
if [[ $element -eq 0 ]]; then
echo $element
echo " It is invalid "
exit 1
else
ret=$element
fi
return $ret
}
## Function arrchh : to check if the other brace pairs are valid
arrchh(){
n=`expr $c + 1`
local array="${arr[@]:$n}"
local seek="$1"
ret="$2"
if [ "$ret" -gt 1 ] ; then
var1=`expr $ret - 1`
var2=$(printf '%s\n' "$array" | grep -oF -- "$seek" | wc -l)  ## quoted and -F so characters like * or [ are taken literally
echo $var2
if [ $var1 -eq $var2 ]; then
echo "${seek} is cool"
else
echo "openers and closers are not same for ${seek}"
exit 1
fi
fi
}
## To check if the script is called with accurate arguments
## For testing purpose, arg1 is the regex we are going to test
if [[ $# != 1 ]]
then
echo "$0 usage: $0 <expression>"
exit 1
fi
exp=$1 ## exp is the regex we are validating which shouldn't contain "/" as it won't be a file name in that case
if [[ "$exp" == */* ]]; then  ## pattern test avoids an unquoted echo, which would glob-expand characters such as *
echo " ${exp} is an invalid file name - it contains / "
exit 2
fi
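## Putting the characters of $exp into an array arr
END=${#exp}  ## number of characters in the expression (END was never set)
for ((c=1; c<=END; c++ )); do
arr[${c}]=${exp:c-1:1}  ## was $expr, which is undefined; the variable is exp
done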
## echo ${arr[@]}
## Checking if the number of elements in the array is even. If not, the script assumes there is an extra open/closed brace
modulo=$(( ${#arr[@]} % 2 ))
if [ $modulo -eq 1 ]; then
echo " The Pattern is improper"
exit 3
elif [ "${arr[1]}" = ']' ] || [ "${arr[1]}" = '}' ] || [ "${arr[1]}" = ')' ] || [ "${arr[1]}" = '>' ]; then
echo " The Pattern is improper - Cannot start with closed braces"
exit 4
fi
Anyway, here is what I threw together. Bear in mind that this has no error handling and whatnot, I just did it to test my logic and math (the latter is in all honesty not great!).
I still suggest that you use a regexp compiler: let the regular regexp engine check the expression instead of reinventing it (not to mention that you will not be able to do that properly).
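A rough sketch of that suggestion, assuming the regexes are meant to be POSIX extended regular expressions as understood by grep -E (the thread never says which dialect is in use, and the function name is_valid_regex is made up for illustration). grep exits with status 2 when it cannot compile the pattern, so the engine itself does the checking:

```shell
# is_valid_regex PATTERN
# Succeeds when grep -E can compile PATTERN. grep exits with 0 (match)
# or 1 (no match) for a usable pattern and with 2 on an error such as
# an unmatched parenthesis.
is_valid_regex() {
    local status
    # Search the empty file /dev/null; only the exit status matters.
    grep -E -e "$1" /dev/null >/dev/null 2>&1
    status=$?
    [ "$status" -ne 2 ]
}
```

Here is_valid_regex '(abc*)' succeeds and is_valid_regex '(abc*' fails, and an escaped literal such as 'a\(b' is accepted, which a simple character count would reject. Other dialects (Perl, for instance) have their own rules, so the check should use the same engine that will later run the pattern.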