LinuxQuestions.org

Perl regex not matching across multiple lines despite ms flags (https://www.linuxquestions.org/questions/programming-9/perl-regex-not-matching-across-multiple-lines-despite-ms-flags-826600/)

gfarrell 08-17-2010 10:26 AM

Sorted it out in AWK I think (thanks for pointing me towards it, worked out how to use it after a couple of mins):
Code:

awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' "$file"
The "level 1" stuff is because the class opens with a curly brace so we look for all "level 1" functions (class methods).
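To see what "level 1" means in practice, here is a minimal sketch. The `trace_depth` name and the sample input are made up for illustration and are not part of the script above. It applies the same per-line tests as the one-liner and prints the level reached after each line:

```shell
# Hypothetical helper: show the brace level awk has reached after each line.
# Like the one-liner, a line counts once no matter how many braces it holds.
trace_depth() {
    awk '
        /[{]/ { lvl++ }                  # line contains an opening brace
        /[}]/ { lvl-- }                  # line contains a closing brace
        { printf "%d\t%s\n", lvl, $0 }   # level after this line, then the line
    '
}

# Made-up sample: an object literal with one method.
printf '%s\n' 'var K = {' '  a: function a() {' '    return 1;' '  },' '}' | trace_depth
```

In the one-liner the `function` test runs before `LVL++`, so a method's opening line is examined while the level is still 1, and its closing `},` brings the level back to 1. Those are the two places where the parentheses are added.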

If you spot any problems with this please let me know.

[EDIT]

Spotted a problem, corrected below. Seems to have worked, thanks guys.
Code:

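#!/bin/bash

# Back up every .js file, then wrap its "level 1" functions (class methods) in parentheses.
# The glob and all variables are quoted so the shell neither expands *.js early nor splits paths.
find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js' -print0 |
while IFS= read -r -d '' file
do
        echo "Processing $file"
        cp "$file" "/Users/gid/Desktop/uikitBackup/$(basename "$file")"
        mv "$file" "$file.tmp"
        awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1 && INFN==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' "$file.tmp" > "$file"
        rm "$file.tmp"
done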


konsolebox 08-17-2010 10:38 AM

Odd. Neither of the scripts you presented seems to be working.
Code:

awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' ...
awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1 && INFN==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' ...

Can you show us the output?

gfarrell 08-17-2010 10:47 AM

Quote:

Originally Posted by konsolebox (Post 4069091)
Odd. Neither of the scripts you presented seems to be working.
Code:

awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' ...
awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1 && INFN==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' ...

Can you show us the output?

Yeah sure, one of my class files is interfering with it somehow (and I can't work it out) but it's working in almost all circumstances.

Script:
Code:

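#!/bin/bash

# Back up every .js file, then wrap its "level 1" functions (class methods) in parentheses.
# The glob and all variables are quoted so the shell neither expands *.js early nor splits paths.
find /Users/gid/Sites/uikit_testing/app/webroot/js/ -name '*.js' -print0 |
while IFS= read -r -d '' file
do
        echo "Processing $file"
        cp "$file" "/Users/gid/Desktop/uikitBackup/$(basename "$file")"
        mv "$file" "$file.tmp"
        awk 'BEGIN{LVL=0; INFN=0} /function/ {INFN=1; if(LVL==1) sub(/function/, "(&");} /{/ {LVL++} /}/ {LVL--; if(LVL==1 && INFN==1) sub(/}/, "&)"); if(LVL==1) INFN=0;}1' "$file.tmp" > "$file"
        rm "$file.tmp"
done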

Input:
Code:

/**
 * UIKit initialiser class
*/

var UIKit = {
        init: function init() {
                this.registry.each(function(entry, key) {
                        $$(entry.selector).each(function(item, index){
                                if(item.hasClass("no-replace") || $chk(item.retrieve("uikit"))) {
                                        return;
                                }
                                if(eval("window."+entry["class"]) !== undefined) {
                                        var ui = eval("new " + entry["class"] + "(item)");
                                        item.store("uikit", ui);
                                }
                        });
                }, this);
        },
       
        registry: new Hash(),
       
        register: function register(className, selector) {
                this.registry.set(className, {
                        "class":        className,
                        "selector":    selector
                });
        },
       
        enhance: function enhance(element, uiclass) {
                if(!(element.hasClass("no-replace") || $chk(element.retrieve("uikit")))) {
                        if(!$chk(uiclass)) {
                                for(var name in this.registry) {
                                        var item = this.registry[name];
                                        if($$(item.selector).contains(element)) {
                                                uiclass = eval(item["class"]);
                                                break;
                                        }
                                }
                               
                                if(!$chk(uiclass)) { return false; }
                        }
                        var ui = new uiclass(element);
                        element.store("uikit", ui);
                        return true;
                }
                return false;
        }
}

/**
 * UIKit retrieval function.
 *
 * @param      string|element  el.    The DOM element or element id of the element for which you want the UIKit to be retrieved.
 * @return      UI|false        The UI derived object or false if none is set.
*/

function $UI(el) {
        //First check if el is an instanceof UI
        if(el instanceof UI) {
                return el;
        } else {
                //Otherwise get it
                el = $(el);
               
                if(!el) {
                        throw new UIKitException("Invalid argument for $UI.");
                }
               
                if(!$chk(el.retrieve("uikit"))) {
                        throw new UIKitException("Passed object does not have an associated UI object.");
                } else {
                        return el.retrieve("uikit");
                }
        }
}

Output:
Code:

/**
 * UIKit initialiser class
*/

var UIKit = {
        init: (function init() {
                this.registry.each(function(entry, key) {
                        $$(entry.selector).each(function(item, index){
                                if(item.hasClass("no-replace") || $chk(item.retrieve("uikit"))) {
                                        return;
                                }
                                if(eval("window."+entry["class"]) !== undefined) {
                                        var ui = eval("new " + entry["class"] + "(item)");
                                        item.store("uikit", ui);
                                }
                        });
                }, this);
        }),
       
        registry: new Hash(),
       
        register: (function register(className, selector) {
                this.registry.set(className, {
                        "class":        className,
                        "selector":        selector
                });
        }),
       
        enhance: (function enhance(element, uiclass) {
                if(!(element.hasClass("no-replace") || $chk(element.retrieve("uikit")))) {
                        if(!$chk(uiclass)) {
                                for(var name in this.registry) {
                                        var item = this.registry[name];
                                        if($$(item.selector).contains(element)) {
                                                uiclass = eval(item["class"]);
                                                break;
                                        }
                                }
                               
                                if(!$chk(uiclass)) { return false; }
                        }
                        var ui = new uiclass(element);
                        element.store("uikit", ui);
                        return true;
                }
                return false;
        })
}

/**
 * UIKit retrieval function.
 *
 * @param        string|element        el.        The DOM element or element id of the element for which you want the UIKit to be retrieved.
 * @return        UI|false        The UI derived object or false if none is set.
*/

function $UI(el) {
        //First check if el is an instanceof UI
        if(el instanceof UI) {
                return el;
        } else {
                //Otherwise get it
                el = $(el);
               
                if(!el) {
                        throw new UIKitException("Invalid argument for $UI.");
                }
               
                if(!$chk(el.retrieve("uikit"))) {
                        throw new UIKitException("Passed object does not have an associated UI object.");
                } else {
                        return el.retrieve("uikit");
                }
        })
}

As you can see, it works quite well (I think...)

grail 08-17-2010 11:03 AM

Whatever you put between the // is what will be looked for, so if you had:
Code:

someName: function someName... }

someOtherName: function someOtherName... }

You could change it so:
Code:

awk 'BEGIN{RS=""}/^someName:/{sub(/function/,"(&");$NF=$NF ")"}1' file
This will only change the one with 'someName:' at the start of the line :)

gfarrell 08-17-2010 11:19 AM

Quote:

Originally Posted by grail (Post 4069129)
Whatever you put between the // is what will be looked for, so if you had:
Code:

someName: function someName... }

someOtherName: function someOtherName... }

You could change it so:
Code:

awk 'BEGIN{RS=""}/^someName:/{sub(/function/,"(&");$NF=$NF ")"}1' file
This will only change the one with 'someName:' at the start of the line :)

Thanks for that, unfortunately not versatile enough (which is why I was using that nice regex pattern). I worked it out (see earlier post).

konsolebox 08-17-2010 11:25 AM

@gfarrell OK, the code seems to work fine, except that it still doesn't work on lines that contain multiple }'s. I did have an idea beforehand that this could also be done in awk, but this is really the limit I was expecting.

@grail There's a problem with the RS="" method if a section contains blank lines: awk splits records at blank lines, so such a section is broken into several records.
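A rough sketch of one way around the "multiple braces on a line" limit: count the braces instead of testing for their presence. In awk, `gsub` returns the number of replacements it made, so running it on a copy of the line gives the counts. The `count_depth` name and the sample input are assumptions for illustration only:

```shell
# Hypothetical helper: track brace depth by counting every brace on a line.
count_depth() {
    awk '
        {
            line = $0                          # work on a copy, keep $0 intact
            opens  = gsub(/[{]/, "", line)     # gsub returns how many it replaced
            closes = gsub(/[}]/, "", line)
            lvl += opens - closes
            printf "%d\t%s\n", lvl, $0
        }
    '
}

# Made-up sample with two closing braces on one line.
printf '%s\n' 'var K = {' '  a: function a() {' '    if (x) {' '      y(); }},' '}' | count_depth
```

This only fixes the depth bookkeeping. It still cannot say which of several braces on a line closes the method, and it still counts braces that sit inside strings or comments.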

gfarrell 08-17-2010 11:30 AM

Quote:

Originally Posted by konsolebox (Post 4069151)
@gfarrell OK, the code seems to work fine, except that it still doesn't work on lines that contain multiple }'s. I did have an idea beforehand that this could also be done in awk, but this is really the limit I was expecting.

@grail There's a problem with the RS="" method if a section contains blank lines: awk splits records at blank lines, so such a section is broken into several records.

I think you just worked out my problem, multiple braces in a line! Thanks =]

(Not that I know how to fix it (or really need to now)).

The other problem I got was in comment doc-blocks but it was largely not a problem.

grail 08-17-2010 11:33 AM

So using the data you provided, the following worked, but with two exceptions:
Code:

awk 'BEGIN{RS="";ORS="\n\n"}/: function/{sub(/function/,"(&");sub(/},$/,"}),")}1' file
The exceptions:

1. Although some lines in the input appear blank, they actually contain whitespace, so I had to remove that whitespace in vim before running the command.

2. The enhance function is not terminated the same way as the others, i.e. it does not finish with }, and so would need to be changed manually.
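The first exception could probably be handled in the pipeline rather than in vim. A minimal sketch, swapping in sed for the manual edit (the `count_paragraphs` name and the sample input are hypothetical): strip trailing whitespace so that whitespace-only lines become truly empty, then let awk's paragraph mode split on them.

```shell
# Hypothetical helper: normalise whitespace-only lines, then report how many
# records awk sees in paragraph mode (RS="").
count_paragraphs() {
    sed 's/[[:space:]]*$//' | awk 'BEGIN { RS = "" } END { print NR }'
}

# Made-up sample: two sections separated by a line holding only spaces.
sample='a: {\n},\n   \nb: {\n},\n'

printf "$sample" | awk 'BEGIN { RS = "" } END { print NR }'   # without the sed step
printf "$sample" | count_paragraphs                           # with the sed step
```

Without the sed step the whole sample is a single record, because a line of spaces is not an empty line as far as RS="" is concerned. The same mechanism is why a genuinely blank line inside a function body would split that function into two records.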

I will see if I can come up with a more full solution tomorrow as it's 2am and I am tired :(

grail 08-17-2010 11:37 AM

Just ran yours too and found that it adds an extra closing round bracket at the end of your $UI function.
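For what it's worth, the Output posted earlier does show a `})` on the line that closes the else block inside $UI. A hand trace of the posted script suggests a possible cause: INFN is set by any line containing `function`, including the top-level `function $UI(el) {` at level 0, so the first brace that brings the level back to 1 gets a `)` even though no `(` was added. A sketch of one possible guard, with made-up names and sample input, not the poster's code:

```shell
# Hypothetical variant: only remember a function that was actually wrapped.
wrap_methods() {
    awk '
        /function/ {
            if (lvl == 1) {              # only functions that start at level 1
                sub(/function/, "(&")    # "(&" gives "(function"
                infn = 1
            }
        }
        /[{]/ { lvl++ }
        /[}]/ {
            lvl--
            if (lvl == 1 && infn) {      # closing brace of a wrapped method
                sub(/[}]/, "&)")
                infn = 0
            }
        }
        1                                # print every line
    '
}

# Made-up sample: one method inside an object, then a top-level function.
printf '%s\n' 'var K = {' '  a: function a() {' '    return 1;' '  }' '}' '' \
    'function f(x) {' '  if (x) {' '    return 1;' '  }' '}' | wrap_methods
```

This keeps the per-line brace test of the original, so it shares its limits on lines that hold several braces.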

gfarrell 08-17-2010 11:44 AM

Quote:

Originally Posted by grail (Post 4069167)
Just ran yours too and found that it adds an extra closing round bracket at the end of your $UI function.

Not when I ran it it didn't...

konsolebox 08-17-2010 11:49 AM

OK, here are just some sad points. I'm sorry to have to tell you these.

I already had the same problem for my own purposes, but I gave up and started to think about solving it in another language, like perl or parrot. Awk really has its limits. It's not only about counting the total number of braces and then making a deduction at every level of recursion. It's also about knowing whether a brace is part of an ordinary string, or part of ... etc. Also, what if there are 4 braces in a line, 3 of which are part of the current function while the 4th is part of the container block holding the function? How can you tell that it's part of the container block when you're only in the context of the function?

This can probably still be solved, but only by imitating a real language parser. Doing that in awk is, IMO, no longer practical: you would have to read single characters rather than phrases or lines, since otherwise you can't tell where compound statements or blocks end or are separated, etc.

P.S. Maybe using another script that's similar to HTML TIDY for awk scripts then using your methods will do the trick.
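To give an idea of what "reading single chars" would look like, here is a rough sketch that walks each line character by character and ignores braces inside quoted strings. The `depth_outside_strings` name and the sample input are invented for illustration, and the scanner is deliberately incomplete: it assumes strings do not span lines and knows nothing about comments or regex literals.

```shell
# Hypothetical helper: brace depth, ignoring braces inside "..." or '...'.
depth_outside_strings() {
    awk '
        {
            n = length($0)
            for (i = 1; i <= n; i++) {
                c = substr($0, i, 1)
                if (quote != "") {                      # inside a string literal
                    if (c == "\\") {
                        i++                             # skip the escaped character
                    } else if (c == quote) {
                        quote = ""                      # closing quote
                    }
                } else if (c == "\"" || c == "\047") {  # \047 is a single quote
                    quote = c                           # opening quote
                } else if (c == "{") {
                    lvl++
                } else if (c == "}") {
                    lvl--
                }
            }
            quote = ""      # assumption: string literals never span lines
            printf "%d\t%s\n", lvl, $0
        }
    '
}

# Made-up sample: braces inside strings should not change the depth.
printf '%s\n' 'var o = {' '  a: "}}",' "  b: '{'" '}' | depth_outside_strings
```

Even this much already behaves like a small tokenizer, which is the point being made above: each extra case (comments, regex literals, multi-line strings) moves it closer to a real parser.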

gfarrell 08-17-2010 12:03 PM

Quote:

Originally Posted by konsolebox (Post 4069178)
OK, here are just some sad points. I'm sorry to have to tell you these.

I already had the same problem for my own purposes, but I gave up and started to think about solving it in another language, like perl or parrot. Awk really has its limits. It's not only about counting the total number of braces and then making a deduction at every level of recursion. It's also about knowing whether a brace is part of an ordinary string, or part of ... etc. Also, what if there are 4 braces in a line, 3 of which are part of the current function while the 4th is part of the container block holding the function? How can you tell that it's part of the container block when you're only in the context of the function?

This can probably still be solved, but only by imitating a real language parser. Doing that in awk is, IMO, no longer practical: you would have to read single characters rather than phrases or lines, since otherwise you can't tell where compound statements or blocks end or are separated, etc.

P.S. Maybe using another script that's similar to HTML TIDY for awk scripts then using your methods will do the trick.

While I realise that the method I used was not perfect (the regex I used was, in that it worked during testing, just not in a bash script with perl), it worked for my purpose and so I'm happy enough with it. In the code files I was parsing, none of those problems came up except in one file, which I did manually; it still saved me time on the other 29 files. I really can't be bothered to write a proper code parser because then, as you say, I'd basically be writing an interpreter, and I have absolutely no interest in doing that.

I hope you manage to work out the problems you were encountering but for me it's done its job.

grail 08-17-2010 12:04 PM

OK, I am really going this time, but I did run yours several times and could not get it to stop printing the extra bracket :(
This is a little untidy due to tiredness but seems to work :) You can take any bits that help you ;)
Code:

awk '/: function/{sub(/function/,"(function");f=1}
    f{if(/{/)a++;
      if(/}/)a--;
      if(!a){sub(/}/,"&)");
              f=0}
}1' file


David the H. 08-17-2010 07:58 PM

Well, this conversation really moved beyond me while I was away. Time to bow out, I think.
Quote:

Originally Posted by konsolebox (Post 4068329)
I searched again. It appears that it can also be done in sed:

Yes, of course I know it's possible. However, as you just demonstrated, it takes a lot of mucking about with the hold buffer to build up the line before you can run the regex on it. But if sed had s & m switches similar to perl's (hmm, that doesn't sound right...), then you'd be able to simply choose to treat the newlines like any other character straight out of the starting gate. Much easier to grasp conceptually and more flexible overall.
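As a small illustration of the "build up the line first" step, here is a sketch using the common N-loop idiom, which gathers the input in the pattern space rather than the hold buffer used in the sed solution referred to above. The `join_brace_lines` name, the substitution, and the sample input are made up for the example:

```shell
# Hypothetical helper: slurp all input into the pattern space, then run one
# substitution that matches across what used to be line boundaries.
join_brace_lines() {
    # :a    define a label
    # N     append the next input line (and its newline) to the pattern space
    # $!ba  unless this is the last line, branch back to the label
    sed -e ':a' -e 'N' -e '$!ba' -e 's/{\n[[:space:]]*}/{}/g'
}

# Made-up sample: an empty block split over two lines.
printf 'var o = {\n}\n' | join_brace_lines
```

Only after the loop has finished does the regex get to see the newlines, which is the extra ceremony being complained about.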

Time to learn more about perl, I guess. :)

ghostdog74 08-17-2010 08:35 PM

Code:

awk -vRS="}" '/myFunction:/{
    gsub(/.*myFunction:/,"myFunction: (")
    gsub(/{.[^}]*/,"...\n");
    print $0RT")"
}' file



All times are GMT -5.