LinuxQuestions.org
08-26-2016, 08:01 AM   #1
ansh007
LQ Newbie
regex pattern matching, with open braces or quotes being closed


Dear members,
I need some help with pattern matching, shell or perl, anything would help.

Problem: I need to write a function/module that recognizes whether a regex is well formed. For example, (abc*) is valid, but (abc* is not. What I mean is that every opening brace or quote must have a matching close.
Where I need it: I have a script that updates regex strings. Say there is a regex (abc???124{123??}?). If a user updates it to (abc???124{xy3??}?) the script should accept it, but (abc???124{ab23???) should fail.
It would be great if this could be done with built-in functionality.

Thanks in advance
 
08-27-2016, 01:19 AM   #2
HMW
Member
Well, one way to do it would be to check that you have the same number of opening and closing brackets by looping over the string and checking each character.

You can do this in any language.

Best regards,
HMW
 
08-27-2016, 02:42 AM   #3
HMW
Member
Hi again!

I just did this with a bash script using a loop and the modulus operator. I have to rush out now, but give it a try on your own and I'll get back to you if you need assistance.

Best regards,
HMW

Edit:
Here's an example run of my little script:
Code:
[HMW@ArchLinux LQ]$ ./valReg.sh "(abc???124{ab23???)"
Not matching brackets!
[HMW@ArchLinux LQ]$ echo $?
1
[HMW@ArchLinux LQ]$ ./valReg.sh "(abc???124{123??}?)"
Matching brackets.
[HMW@ArchLinux LQ]$ echo $?
0
You can also ditch the loop and use grep in combination with modulus; this is perhaps a better alternative.

Last edited by HMW; 08-27-2016 at 04:58 AM.
 
08-29-2016, 12:57 AM   #4
ansh007
LQ Newbie
Thanks HMW

Dear HMW, Thanks.
This looks similar to what I am looking for.
Would you like to share your little script?
 
08-29-2016, 02:39 AM   #5
pan64
LQ Addict
To check a regexp, I would suggest compiling it.
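
One way to try this from bash, as a rough sketch: the `[[ string =~ regex ]]` test returns exit status 2 when the pattern is not a syntactically valid POSIX extended regular expression, so matching against an empty string can serve as a compile check. The function name `regex_compiles` is made up for illustration. This only validates ERE syntax as the local C library understands it, which may not be the dialect the patterns in the first post are written in.

```shell
#!/bin/bash
# Hypothetical helper: succeed if $1 compiles as a POSIX ERE.
regex_compiles() {
    local re=$1 rc
    # Status 0 = match, 1 = no match, 2 = the pattern failed to compile.
    [[ '' =~ $re ]]
    rc=$?
    (( rc != 2 ))
}

# Try a few patterns, including the two from the first post.
for re in '(abc*)' '(abc*' '[0-9]{2,}' '[0-9'; do
    if regex_compiles "$re"; then
        echo "compiles: $re"
    else
        echo "invalid:  $re"
    fi
done
```

Because the regex engine itself does the parsing, escaped brackets and bracket expressions are handled without any extra logic.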
 
08-29-2016, 02:45 AM   #6
HMW
Member
Quote:
Originally Posted by ansh007
Dear HMW, Thanks.
This looks similar to what I am looking for.
Would you like to share your little script?
Hi ansh007!

The thing with this forum is that we're happy to help you help yourself. I have more or less already given you the answer to your question.

By using the modulus operator you can easily figure out whether you have an EVEN number of characters or not:
Code:
bc <<< 10%2
0
bc <<< 7%2
1
So, what you have to do is to find the characters you are looking for, and then process that information with modulus; if the answer is 0 (zero) - you have an even (correct) number of chars. I have already given you two different ways to approach this:
1. You can use a loop to check each character in a string (doable in any programming language).
2. You can use grep if you want to (can) use bash.

Why don't you try it out yourself first? If you get stuck on something I'll be more than happy to help you out.

Best regards,
HMW

Last edited by HMW; 08-29-2016 at 04:35 AM. Reason: Spelling...
 
08-29-2016, 06:30 AM   #7
rtmistler
Moderator
I've been silently subscribed this whole time because I was kind of thinking:

"Cool ... a regex syntax checker! ... Well, if one comes out of this question, maybe I'll learn a bit."

But then I realized that "BASH" (or any other scripting language) is itself a syntax checker! In other words, if you get it wrong, it will tell you, and it tries to tell you where the problem is. So I'm not so sure that there's a benefit here except for fun, or as an exercise/assignment.

This is why we test our code. And note that a syntax checker is not a range or input/output tester; it is more like a compiler, which interprets code. Your syntax can be correct and you can still be wrong if you end up allowing things like divide by zero or invalid input.

That said, I agree with HMW's recommendation: you can tally the number and types of open brackets and then validate that you have the same number of close brackets of the same type. The problem there is also one of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because there are the same number of opens and closes, but it's still incorrect.
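
The placement problem can be handled by tracking order as well as counts. The following is a minimal sketch, not anyone's posted script: it keeps a stack of the closers it still expects, so a closer of the wrong type or in the wrong position is rejected. The name `brackets_balanced` is hypothetical, and the function treats every bracket as syntax, so it knows nothing about escaped brackets or brackets inside a bracket expression.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if every bracket in $1 is closed
# by the matching type, in the right order.
brackets_balanced() {
    local s=$1 stack='' ch want i
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        case $ch in
            '(') stack+=')' ;;          # remember which closer we expect
            '[') stack+=']' ;;
            '{') stack+='}' ;;
            ')' | ']' | '}')
                [[ -n $stack ]] || return 1      # closer with nothing open
                want=${stack: -1}                # most recently expected closer
                [[ $ch == "$want" ]] || return 1 # wrong type or wrong place
                stack=${stack%?}                 # pop it
                ;;
        esac
    done
    [[ -z $stack ]]                     # anything left over was never closed
}

for re in '(abc???124{123??}?)' '(abc???124{ab23???)' '{a+]b}|c['; do
    if brackets_balanced "$re"; then
        echo "balanced:   $re"
    else
        echo "unbalanced: $re"
    fi
done
```

With this approach {a+]b}|c[ is rejected at the first ], even though its opens and closes add up to an even total.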
 
08-29-2016, 07:23 AM   #8
HMW
Member
Quote:
Originally Posted by rtmistler
The problem there is also one of placement, because [a+b] means something different than [], and {a+]b}|c[ might pass because there are the same number of opens and closes, but it's still incorrect.
Actually, that is just one of many problems! What if you want to search for a literal '(' or '{' character in your regex? Then this check will fail although the regex is correct.
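
One partial workaround, sketched under the assumption that a literal bracket is always written with a backslash in front of it: drop every backslash-escaped character before counting. The helper name `strip_escaped` is made up for illustration, and it still does not cover a bracket that is literal because it sits inside a bracket expression such as [(].

```shell
#!/bin/bash
# Hypothetical helper: print $1 with every backslash-escaped character removed.
strip_escaped() {
    local s=$1 out='' ch i
    for (( i = 0; i < ${#s}; i++ )); do
        ch=${s:i:1}
        if [[ $ch == '\' ]]; then
            i=$(( i + 1 ))   # also skip the character being escaped
            continue
        fi
        out+=$ch
    done
    printf '%s\n' "$out"
}

# The escaped parentheses disappear; the real group is kept.
strip_escaped 'price\(usd\) (abc)'   # prints: priceusd (abc)
```

The stripped string can then be handed to whichever bracket count or balance check is in use.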

Personally, I just looked at it as a nice little exercise, because I am a hacker, but as rtmistler has pointed out, this is of very limited use in reality.

Best regards,
HMW
 
08-30-2016, 03:03 PM   #9
ansh007
LQ Newbie
Guys, thanks for being subscribed to the thread.
It looks like I have solved the issue; thanks for your suggestions. It may fail under certain test cases, which I will fix as I go.
I am putting the whole code here; please suggest anything I should improve. Thanks again.



Code:
#!/bin/bash

## Function arrch: count the closing braces of type $1 that follow position $c; exit if there are none
arrch() {

n=`expr $c + 1`
local array="${arr[@]:$n}"
local seek="$1"
local ret=0
element=`echo $array | grep -o "$seek" |wc -l`

if [[ $element -eq 0 ]]; then
echo $element
echo " It is invalid "
exit 1
else
ret=$element
fi

return $ret
}

## Function arrchh: check that the openers of type $1 after position $c number one fewer than the closers counted by arrch ($2)
arrchh(){
n=`expr $c + 1`
local array="${arr[@]:$n}"
local seek="$1"
ret="$2"
if [ "$ret" -gt 1 ] ; then
var1=`expr $ret - 1`
var2=`echo $array | grep -o "$seek" |wc -l`
echo $var2
if [ $var1 -eq $var2 ]; then
echo "${seek} is cool"
else
echo "openers and closers are not same for ${seek}"
exit 1
fi
fi
}

## To check if the script is called with accurate arguments
## For testing purpose, arg1 is the regex we are going to test

if [[ $# != 1 ]]
then
echo "Usage: $0 <expression>"
exit 1
fi

exp=$1 ## exp is the regex we are validating which shouldn't contain "/" as it won't be a file name in that case

if [[ $exp == */* ]]; then
echo " ${exp} is an invalid file name - it contains / "
exit 2
fi


## example exp='abc???123??[0-9]{2,}[a-zA-Z]{2,}(abc)' ## abc???123??[0-9]{2,[a-zA-Z]{2,}}(abc)
## Quote $exp so the shell does not glob-expand ? and * before tr sees them
expr=`echo "$exp" |tr -cs '[]{}()<>' x | tr -cd '[]{}()<>'` ## []{}[]{}() ## []{[]{}()
END=${#expr} ## number of bracket characters kept

## Putting the characters of $expr into an array arr

for ((c=1; c<=END; c++ )); do
arr[c]=${expr:c-1:1}
done

## echo ${arr[@]}
## Check whether the number of elements in the array is even. If not, there is definitely an extra open or close brace
modulo=$(( ${#arr[@]} % 2 ))
if [ $modulo -eq 1 ]; then
echo " The Pattern is improper"
exit 3
elif [ "${arr[1]}" == ']' ] || [ "${arr[1]}" == '}' ] || [ "${arr[1]}" == ')' ] || [ "${arr[1]}" == '>' ]; then
echo " The Pattern is improper - Cannot start with closed braces"
exit 4
fi

for ((c=1; c<=$END; c++ )); do

option=${arr[${c}]}
echo "option ${option}"

case "$option" in

'[' )

arrch ']'
ret="$?"
echo $ret ##
arrchh '\[' "$ret"

;;

'{' )
arrch '}'
ret="$?"
arrchh '{' "$ret"
;;

'(' )
arrch ')'
ret="$?"
arrchh '(' "$ret"
;;

'<' )
arrch '>'
ret="$?"
arrchh '<' "$ret"
;;



*)
echo "Logic to be built"
;;
esac
done
echo " Pattern ok "
exit 0
 
08-30-2016, 07:06 PM   #10
rtmistler
Moderator
Great work!

Do you have a set of valid and invalid test regex strings?
 
08-31-2016, 01:07 AM   #11
HMW
Member
Impressive!

Makes my code look a little... small!

Anyway, here is what I threw together. Bear in mind that this has no error handling and whatnot, I just did it to test my logic and math (the latter is in all honesty not great!).

Code:
#!/bin/bash

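# Count the ( ) { } characters and take the total modulo 2; "$1" is quoted so ? and * are not glob-expanded
evenOdd=$(echo "$1" | grep -Eo '\(|\)|\{|\}' | wc -l | awk '{ print $1%2 }')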

if (( $evenOdd != 0 )); then
    echo "Not matching brackets!"
    exit 1
fi

echo "Matching brackets."
exit 0
 
08-31-2016, 02:10 AM   #12
pan64
LQ Addict
I still suggest using a regexp compiler, so that the regular regexp engine checks the expression instead of you reinventing it (not to mention that you will not be able to do that properly).
 
  

