LinuxQuestions.org


gfarrell 08-16-2010 03:27 PM

Perl regex not matching across multiple lines despite ms flags
 
Hi all,

I have written a regular expression (tested in regexpal and regextester alpha something) with which I want to replace something like:

Code:

myFunction: function myFunction(arg1, arg2) {
moreCode = someCode() + someMoreCode();
return moreCode();
}

into
Code:

myFunction: (function myFunction ...
})

The regular expression I've used is:
Code:

([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}
which is the most versatile matching pattern I need, and the replacement pattern is:
Code:

$1: (function $1($2) {$3})
So I wrote a bash script to do this for me:
Code:

#! /bin/bash

for file in `find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name *.js`;
do
        echo "Processing $file"
        cp $file $file.bak
        perl -pi -e "s/([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}/\1: (function \1(\2) {\3})/gms" $file;
done

but it only matches functions that fit on a single line, even though my tests in online JavaScript regex testers showed it matching across multiple lines, and even though I'm using the m and s flags (which should make it multi-line, no?)

This is really frustrating me so I'd love some help.

konsolebox 08-16-2010 05:12 PM

We should first verify that it's not a syntax problem. Try this mod:
Code:

#! /bin/bash

while read file; do
        echo "Processing $file"
        cp "$file" "$file.bak"
        perl -pi -e 's/([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}/\1: (function \1(\2) {\3})/gms' "$file";
done < <(exec find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js')

Notice how arguments are properly placed in quotes.
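A small sketch of why the quoting matters. It uses a throwaway directory created with `mktemp`; the directory and the file name `my script.js` are made up for the demo. The unquoted `for file in $(find ...)` loop (equivalent to the backtick form) splits a path containing a space into two words, while reading `find`'s output line by line keeps it whole. The unquoted `-name *.js` has a related problem: if the current directory contains any `.js` files, the shell expands the glob before `find` ever sees it.

```shell
#!/bin/bash
# Hypothetical demo directory with one file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/my script.js"

# Unquoted $(find ...) in a for loop: word splitting breaks the path in two.
split_count=0
for file in $(find "$dir" -name '*.js'); do
    split_count=$((split_count + 1))
done
echo "for loop saw $split_count words"

# Reading find's output line by line keeps each path in one piece.
read_count=$(find "$dir" -name '*.js' | {
    n=0
    while IFS= read -r file; do
        n=$((n + 1))
    done
    echo "$n"
})
echo "while loop saw $read_count path(s)"

rm -r "$dir"
```

The `IFS= read -r` form also stops `read` from trimming leading blanks and interpreting backslashes in file names.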

gfarrell 08-16-2010 05:29 PM

Quote:

Originally Posted by konsolebox (Post 4068227)
We should first verify that it's not a syntax problem. Try this mod:

Thanks for the reply, but it had the same effect as before: it only transforms single-line functions, e.g. from
Code:

myFunction: function myFunction(arg) {...},
to
Code:

myFunction: (function myFunction(arg) {...}),
whereas I am trying to match all function definitions, including the multi-line ones...

konsolebox 08-16-2010 05:58 PM

Try this one
Code:

#! /bin/bash

while read file; do
        echo "Processing $file"
        cp "$file" "$file.bak"
        perl -pi -e 'undef $/; s/([[:alnum:]_]+):\s?function\s?\1?\(([[:alnum:]_\s,]*)\) {([^}]*)}/\1: (function \1(\2) {\3})/gs' "$file";
done < <(exec find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js')

The only catch is that this method is limited to functions that don't contain another '}' inside the body.

David the H. 08-16-2010 06:17 PM

Aargh, konsolebox just beat me to almost the exact same solution. And I spent all morning trying to figure it out, since I have little direct experience with perl. :banghead: (Indeed, I still don't know exactly what undef $/ does, only that it seems to work.)

One thing I can add though is that I read that \w equals [[:alnum:]_], so you should be able to use that instead. This is what I came up with in the end:
Code:

perl -pe 'undef $/; s/(\w+):\s?(function\s?\1?([^}]+)})/\1: (\2)/gms'

konsolebox 08-16-2010 06:43 PM

That was only a difference in time. I also don't know much about Perl regexes (I hadn't even noticed \w), so I searched the web for answers. That icon's really funny btw. :D

David the H. 08-16-2010 06:50 PM

No sweat there. It was a good learning experience for me. It's just a bit of a letdown to finally figure something out, only to be beaten by just a few minutes.

Now I know of a new way to handle multi-line matches. If only sed would introduce similar flags....

konsolebox 08-16-2010 07:28 PM

I searched again. It appears that it can also be done in sed:
Code:

sed -n '
        # if the first line copy the pattern to the hold buffer
        1h
        # if not the first line then append the pattern to the hold buffer
        1!H
        # if the last line then ...
        $ {
                # copy from the hold to the pattern buffer
                g
                # do the search and replace
                s/\([[:alnum:]_]\+\):[[:blank:]]\?function[[:blank:]]\?\1\?(\([[:alnum:]_[:blank:],]*\)) {\([^}]*\)}/\1: (function \1(\2) {\3})/g
                # print
                p
        }
' file

reference: http://austinmatzko.com/2008/04/26/s...h-and-replace/
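The `1h`, `1!H` and `$ { g ... }` lines are what make this work: every line is accumulated in the hold space, and the substitution only runs once the whole file is back in the pattern space, where the line breaks can be matched as `\n`. A minimal sketch of just that idiom, assuming GNU sed (for `\n` in the regex) and a made-up input in which an empty function body spans two lines:

```shell
#!/bin/bash
# Hypothetical input: "{" on one line and the matching "}" on the next.
out=$(printf 'foo: function foo() {\n}\nbar();\n' | sed -n '
    # first line: copy it to the hold space
    1h
    # every other line: append it to the hold space (sed adds the newline)
    1!H
    # last line: fetch the whole file and substitute across the line break
    $ {
        g
        s/{\n}/{}/g
        p
    }
')
echo "$out"
```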

gfarrell 08-17-2010 04:51 AM

Hi guys,

I really appreciate all the responses, but the regex I originally supplied:
Code:

([A-Za-z0-9_]+):\s?function\s?\1?\(([A-Za-z0-9_\ ,]*)\) {(\n?((.*)\n\t?)*)}
actually matched all of these functions properly:
Code:

        aFunction: function aFunction() {
                for(var i = 0; i<10; i++) {
                        document.write(i);
                }
                someStuff();
                //This is a comment...
        },
        anotherFunction: function anotherFunction(myArg) {},
        yetAnotherOne: function yetAnotherOne() {}

The problem is not with the regex but with Perl, because it only matched anotherFunction and yetAnotherOne, not aFunction. Could it be that Perl is having issues with my newline and tab characters?

konsolebox 08-17-2010 06:48 AM

But even setting aside whether Perl properly recognizes newline and tab characters, this expression could still match more than just a single function:
Code:

{(\n?((.*)\n\t?)*)}
Code:

... {
...
}
...
... {
...
}


grail 08-17-2010 07:29 AM

Thought I would put in my 2c, if not in Perl:
Code:

awk 'BEGIN{RS=""}/function/{sub(/function/,"(&");$NF=$NF ")"}1' file

gfarrell 08-17-2010 07:47 AM

Quote:

Originally Posted by konsolebox (Post 4068825)
But even setting aside whether Perl properly recognizes newline and tab characters, this expression could still match more than just a single function:

Try it here: http://www.regextester.com/. It works fine both on "Javascript" and "Preg".

@grail: You'll have to explain that to me...

grail 08-17-2010 08:58 AM

Quote:

@grail: You'll have to explain that to me...
No probs :)

It sets the Record Separator to the empty string, which makes awk treat blank lines as the separator between records, and then searches for any record containing the word function. Once found, it prepends an opening bracket to the start of the word 'function' and appends a closing bracket to the last field. The 1 at the end prints every record it reads, so you can write all the contents to a new file. If you want to replace the original, you will need to move the new file over it when done.
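A run on a made-up input in which two functions are separated by a blank line shows one side effect worth knowing about: assigning to `$NF` makes awk rebuild the record from its fields joined with `OFS` (a single space), and with `RS=""` a newline always counts as a field separator, so each multi-line function gets flattened onto one line. Appending to `$0` instead leaves the layout alone, and setting `ORS` to two newlines keeps a blank line between records. The variant is only a sketch with the same limits as the original; for instance, a record ending in `},` gets the `)` after the comma.

```shell
#!/bin/bash
# Hypothetical input: two functions separated by a blank line.
input='foo: function foo(a) {
return a;
}

bar: function bar() {}'

# grail's command: assigning to $NF rebuilds the record with single spaces.
flat=$(printf '%s\n' "$input" | awk 'BEGIN{RS=""}/function/{sub(/function/,"(&");$NF=$NF ")"}1')

# Variant: append to $0, which keeps the original newlines,
# and print a blank line after each record.
kept=$(printf '%s\n' "$input" | awk 'BEGIN{RS=""; ORS="\n\n"}/function/{sub(/function/,"(&"); $0 = $0 ")"}1')

printf '%s\n---\n%s\n' "$flat" "$kept"
```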

konsolebox 08-17-2010 09:52 AM

@grail So that's the power of RS. I never thought awk could do that. That should help me improve my compiler :D.

--- a wrong idea ---

gfarrell 08-17-2010 10:01 AM

Quote:

Originally Posted by grail (Post 4068953)

So if I wanted to make it even more limited so it only matched the pattern
Code:

someName: function someName... }
?


All times are GMT -5.