LinuxQuestions.org
Old 09-19-2008, 06:46 PM   #1
picobyte
LQ Newbie
 
Registered: Apr 2005
Distribution: slackware, knoppix
Posts: 17

Rep: Reputation: 0
testgrep


This script matches a string against a regexp and, if the match fails, shows where in the regexp the failure occurs.

Save it as testgrep and make it executable (chmod +x testgrep).

For instance, run

testgrep "^http://www\.[a-zA-Z0-9.]*\.(com|olg|net)/$" "http://www.linuxquestions.org/"

and you'll get
Code:
^http://www\.[a-zA-Z0-9.]*\.(com|olg|net)/$
                            ^
Or try

testgrep "^h[ea]l*(w?[eo][r ]|e?ld){2,3}$" "hello world"

which matches, so the script only echoes the pattern and prints no caret.

There might be errors in it; please tell me if it misbehaves.
Comments welcome.
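As a rough illustration of what the script automates, the same check can be done by hand by feeding grep -E ever longer prefixes of the pattern and watching for the first one that stops matching. The helper name try_prefix is made up for this sketch and is not part of testgrep.

```shell
# try_prefix PREFIX STRING: report whether STRING still matches the partial regex
# (hypothetical helper, only for illustration)
try_prefix() {
  if printf '%s\n' "$2" | grep -qE -- "$1"; then
    printf 'ok:   %s\n' "$1"
  else
    printf 'FAIL: %s\n' "$1"
  fi
}

url='http://www.linuxquestions.org/'
try_prefix '^http://www\.' "$url"
try_prefix '^http://www\.[a-zA-Z0-9.]*\.' "$url"
try_prefix '^http://www\.[a-zA-Z0-9.]*\.(com|olg|net)' "$url"
```

The first two prefixes match and the third fails, so the problem lies in the (com|olg|net) group, which is where the caret in the output above points.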

Code:
#!/bin/bash

# (C) Roel Kluin, 2008 GPL v.2

# usage: testgrep "regex" "teststring"
# output: echoes the regex and marks with ^ where grep -E stops matching
# growing prefixes of the regex are tested from the left

# print a caret under column $pt of the echoed regex; esc=2 tells the main loop to stop
point_it()
{
  while [ $pt -ne 0 ]; do
    echo -n " ";
    let "pt--";
  done
  echo "^";
  esc=2;
}


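# main
i=0;    # index of the current character in the regex
sb=0;   # open square brackets
rb=0;   # open round brackets
cb=0;   # open curly brackets
pt=0;   # column the caret will point at
esc=0;  # 1: previous character was a backslash, 2: caret printed, stop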
echo "$1"
while [ $i -ne ${#1} ]; do
  if [ $esc -eq 2 ]; then
    break;
  elif [ $esc -eq 1 ]; then
    esc=0;
  elif [ "${1:$i:1}" = "]" ]; then
    [ $sb -lt 1 -o $cb -gt 0 ] && point_it;
    let "sb--";
  elif [ "${1:$i:1}" = ")" ]; then
    [ $rb -lt 1 ] && point_it;
    let "rb--";
  elif [ "${1:$i:1}" = "}" ]; then
    [ $cb -lt 1 -o $sb -gt 0 ] && point_it;
    let "cb--";
  elif [ "${1:$i:1}" = "\\" ]; then
    esc=1;
    [ $sb -gt 0 -o $cb -gt 0 ] && point_it;
  elif [ "${1:$i:1}" = "[" ]; then
    [ $cb -gt 0 ] && point_it;
    let "sb++";
  elif [ "${1:$i:1}" = "(" ]; then
    [ $sb -gt 0 -o $cb -gt 0 ] && point_it;
    let "rb++";
  fi
  let "i++";
  if [ $sb -ne 0 -o $rb -ne 0 -o $esc -ne 0 -o $cb -ne 0 ]; then
    continue;
  elif [ "${1:$i:1}" = "?" ]; then
    [ $sb -gt 0 -o $cb -gt 0 ] && point_it;
    continue;
  elif [ "${1:$i:1}" = "*" ]; then
    [ $sb -gt 0 -o $cb -gt 0 ] && point_it;
    continue;
  elif [ "${1:$i:1}" = "{" ]; then
    [ $sb -gt 0 ] && point_it;
    let "cb++";
    continue;
  fi
  # test the prefix read so far; printf is safer than echo for strings such as "-n"
  printf '%s\n' "$2" | grep -qE -- "${1:0:$i}"
  [ $? -ne 0 ] && point_it;
  pt=$i;
done
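# unbalanced brackets or a trailing backslash, unless a caret was already printed
[ $esc -ne 2 ] && [ $sb -ne 0 -o $rb -ne 0 -o $esc -eq 1 -o $cb -ne 0 ] && point_it;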

Last edited by picobyte; 09-20-2008 at 01:10 PM. Reason: clarification
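One detail of the script may be worth spelling out: it looks one character ahead and skips the grep test when the next character is ?, * or {. The reason, as far as I can tell, is that a prefix cut off right before such a quantifier is stricter than the full pattern. The sketch below shows the effect with a made-up pattern; the helper name matches is an assumption, not something from the script.

```shell
# matches REGEX STRING: exit status 0 when grep -E finds a match
# (hypothetical helper, only for illustration)
matches() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

matches 'colou?r' 'color' && echo "full pattern matches"
matches 'colou'   'color' || echo "prefix cut before the ? does not"
matches 'colou?'  'color' && echo "prefix including the ? matches again"
```

A + needs no such care, because it requires at least one occurrence: whenever the pattern with the + matches, the prefix without it matches too.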
 
Old 09-19-2008, 08:15 PM   #2
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
I didn't really take a good look at it, but it sounds like a replacement for something like test $( echo string | egrep -c '^regex$' ) -eq 1? Certainly more graceful.
ta0kira

Last edited by ta0kira; 09-19-2008 at 08:18 PM.
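For comparison, here is a small sketch of the plain yes/no check mentioned above, once by counting matching lines and once by using grep's exit status. It uses grep -E instead of egrep, since recent GNU grep releases warn that egrep is obsolescent. The function names count_check and quiet_check are invented for this example.

```shell
# count_check REGEX STRING: the counting approach, with grep -E in place of egrep
count_check() {
  test "$(printf '%s\n' "$2" | grep -cE -- "$1")" -eq 1
}

# quiet_check REGEX STRING: the same answer from grep's exit status alone
quiet_check() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

re='^h[ea]l*(w?[eo][r ]|e?ld){2,3}$'
count_check "$re" 'hello world' && echo "count says: match"
quiet_check "$re" 'hello world' && echo "status says: match"
quiet_check "$re" 'hello word'  || echo "status says: no match"
```

Either way the answer is only match or no match, with no hint about which part of the pattern is responsible.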
 
Old 09-20-2008, 04:33 AM   #3
picobyte
LQ Newbie
 
Registered: Apr 2005
Distribution: slackware, knoppix
Posts: 17

Original Poster
Rep: Reputation: 0
No, I don't think so. I'll illustrate its usage with an example.

Say I want to find all instances in the Linux kernel of:
Code:
x = kmalloc(sizeof(y), GFP_KERNEL)
I came up with this, which I fed to my (bash) terminal:
Code:
al_="[A-Za-z_]";
an_="[A-Za-z0-9_]";
int="[0-9]"
hex="[a-f0-9]"

#whitespace
s="[[:space:]]*";
S="[[:space:]]+";

# to match something like 1ul, floats or hexes as well:
D="$int*\.?$int+x?$hex*[uUlL]{0,3}[fF]?"

# can be used for a variable/function name
V="$al_+$an_*";

# same, but also catches variables that are members or arrays
w="($V|${V}\[$s$an_*$s\]|$V\.|$V->)+"

# match the end of the line, including comments
cendl="$s(\/[*\/].*)?$"

git grep -E "$s$w$s=${s}kmalloc$s\(${s}sizeof$s\($s$w$s\)$s,GFP_KERNEL$s\)$s;$cendl"
But it only finds one match, so there's obviously something wrong with my pattern. Looking around, I find this in drivers/atm/nicstar.c, line 920:
Code:
  scq = kmalloc(sizeof(scq_info), GFP_KERNEL);
which should match. So I run
Code:
testgrep "$s$w$s=${s}kmalloc$s\(${s}sizeof$s\($s$w$s\)$s,GFP_KERNEL$s\)$s;$cendl" "  scq = kmalloc(sizeof(scq_info), GFP_KERNEL);"
and it shows me that the match fails before GFP_KERNEL.

Solution: the optional whitespace before GFP_KERNEL is missing:
Code:
git grep -E "$s$w$s=${s}kmalloc$s\(${s}sizeof$s\($s$w$s\)$s,${s}GFP_KERNEL$s\)$s;$cendl"
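The fix can be checked without a kernel tree by piping the sample line into grep -E. This is only a sketch: it rebuilds the variables that the pattern actually uses, writes the slashes in cendl without backslashes (an assumption on my part, since ERE does not need them escaped), and uses a made-up helper called matches.

```shell
al_='[A-Za-z_]'
an_='[A-Za-z0-9_]'
s='[[:space:]]*'
V="$al_+$an_*"
w="($V|${V}\[$s$an_*$s\]|$V\.|$V->)+"
cendl="$s(/[*/].*)?\$"

# the two patterns differ only in the optional whitespace after the comma
before="$s$w$s=${s}kmalloc$s\(${s}sizeof$s\($s$w$s\)$s,"
after="GFP_KERNEL$s\)$s;$cendl"
broken="$before$after"
fixed="$before$s$after"

line='  scq = kmalloc(sizeof(scq_info), GFP_KERNEL);'

# matches REGEX STRING: exit status 0 when grep -E finds a match (hypothetical helper)
matches() {
  printf '%s\n' "$2" | grep -qE -- "$1"
}

matches "$broken" "$line" || echo "original pattern: no match"
matches "$fixed"  "$line" && echo "corrected pattern: match"
```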
 
Old 09-20-2008, 12:30 PM   #4
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Oh, I see now. It took me a while to get it. So you feed the script an expression and a string, knowing it isn't a match, then your script tells you where exactly the discrepancy is since you only have match/no-match otherwise? That sounds very useful.
ta0kira
 
Old 09-20-2008, 12:52 PM   #5
picobyte
LQ Newbie
 
Registered: Apr 2005
Distribution: slackware, knoppix
Posts: 17

Original Poster
Rep: Reputation: 0
Yes, instead of success/failure, this script shows where the failure occurs. And thank you, yes, I hope it is useful.

I can think of a few improvements which I may add when I have time:

* also indicate, from the right, where the discrepancy ends.
* display the discrepant section in colour, instead of just pointing at it.
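The colour idea could look roughly like the sketch below, which prints one span of the pattern in red using ANSI escape codes. The function highlight_span and its arguments are hypothetical; in the script the span would presumably run from pt to i at the moment the grep test fails. It avoids bash-only substring syntax so that it also runs under plain sh.

```shell
# highlight_span PATTERN START END
# Print PATTERN with the characters START..END-1 (0-based) in red.
# Runs in a subshell, so its variables do not leak into the caller.
highlight_span() (
  pat=$1
  head=$(printf "%.$2s" "$pat")           # first START characters
  rest=${pat#"$head"}                     # everything after them
  mid=$(printf "%.$(($3 - $2))s" "$rest") # the span to colour
  tail=${rest#"$mid"}                     # whatever follows the span
  printf '%s\033[31m%s\033[0m%s\n' "$head" "$mid" "$tail"
)

# columns 28..40 hold the (com|olg|net) group of the first example
highlight_span '^http://www\.[a-zA-Z0-9.]*\.(com|olg|net)/$' 28 41
```

On a terminal that understands ANSI colours, this shows the group (com|olg|net) in red and the rest of the pattern unchanged.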
 
Old 09-20-2008, 01:06 PM   #6
ta0kira
Senior Member
 
Registered: Sep 2004
Distribution: FreeBSD 9.1, Kubuntu 12.10
Posts: 3,078

Rep: Reputation: Disabled
Maybe break the expression up into a tree using tabs or spaces and show how the string matches up alongside it?
Code:
'^h[ea]l*'        ^'hell'
(
  'w?[eo][r ]'    'o '{1}    'wor'{2}
  'e?ld'                     'ld'{2}
){2,3}            'o '{1}    'world'{2}
'$'               $
That's a rash example, but you get the idea.
ta0kira

Last edited by ta0kira; 09-20-2008 at 01:45 PM.
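A very rough approximation of the right-hand column of that tree can be had with grep -o, which prints each substring a pattern matches. Note that -o is an extension found in GNU and BSD grep rather than part of POSIX, and that each piece is matched on its own here, without the context of the rest of the expression. The helper show_piece is a made-up name for this sketch.

```shell
# show_piece REGEX STRING: print, in quotes, each substring of STRING that REGEX matches
# (hypothetical helper; relies on the non-POSIX grep -o option)
show_piece() {
  printf '%s\n' "$2" | grep -oE -- "$1" | sed "s/.*/'&'/"
}

show_piece '^h[ea]l*' 'hello world'
show_piece 'w?[eo][r ]|e?ld' 'hello world'
```

For "hello world" the first call prints 'hell' and the second prints 'o ', 'wor' and 'ld' on separate lines, which lines up with the pieces in the tree above.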
 
  



Tags
bash, grep, match, regex, regexp

