LinuxQuestions.org


apottere 08-01-2013 12:02 PM

Bash/awk extract function from file.
 
I'm trying to write a one liner that can extract a certain bash function from a file. This code will extract the function test() from test.sh:

Code:

cat test.sh | awk '/test()/ {start=1} /{/ {if (start == 1) { line=$0; while(sub("{", "", line)) brackets++}} /}/ {if (start == 1) { line2=$0; while(sub("}", "", line2)) brackets--}} {if (start == 1) if (brackets > 0) print $0; else { print; exit}}'
This uses awk to balance brackets after the function name is found, and quits when they're balanced. Is there a prettier way of doing this? It works perfectly fine; I just didn't know if there was something "better" that I overlooked.

Edit: "perfectly fine" is not necessarily correct, commented brackets will still get counted.
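
For readability, the same brace-counting idea can be laid out as a multi-line awk script. This is only a sketch under assumptions: the helper name extract_fn and its argument order are made up for illustration, and it keeps the limitation noted above (brackets inside comments, strings, and heredocs are still counted).

```shell
# extract_fn NAME FILE (hypothetical helper): print the definition of shell
# function NAME from FILE by counting braces from the line that declares it.
extract_fn() {
    awk -v name="$1" '
        # Start at a line that looks like "NAME()" or "function NAME".
        # The line is padded with spaces so no regex anchors are needed.
        !started && (" " $0 " ") ~ ("[ \t]" name "[ \t]*[(][)]|[ \t]function[ \t]+" name "[ \t({]") {
            started = 1
        }
        started {
            line = $0
            opened = gsub(/[{]/, "", line)   # gsub returns the number of matches
            closed = gsub(/[}]/, "", line)
            if (opened > 0) seen = 1         # the opening brace may come on a later line
            depth += opened - closed
            print
            if (seen && depth <= 0) exit     # braces balanced: end of the function
        }
    ' "$2"
}
```

Usage would be something like: extract_fn test test.sh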

jpollard 08-01-2013 02:44 PM

Welll... it won't handle the case where the brace ("{") may be inside quotes...

Firerat 08-01-2013 02:55 PM

Personally, I use sed for things like this:

Code:

sed '/test()/,/^}$/!d' Infile

apottere 08-01-2013 05:58 PM

Quote:

Code:

sed '/test()/,/^}$/!d' Infile

This doesn't do anything, and what it looks like it's trying to do is just as buggy as the original. What if there's a bracket on its own line halfway through the function?

Quote:

Welll... it won't handle the case where the brace ("{") may be inside quotes...
Yeah, this too. I just can't think of anything better.

Firerat 08-01-2013 08:46 PM

Quote:

Originally Posted by apottere (Post 5001231)
This doesn't do anything, and what it looks like it's trying to do is just as buggy as the original. What if there's a bracket on its own line halfway through the function?


Yeah, this too. I just can't think of anything better.

???

Code:

mkdir Wtf
cd Wtf
cat > test.sh << "EOF"
#!/bin/bash
function test() {
echo foo
}

test2(){
echo muppet
}
echo bar
EOF
sed '/test()/,/^}$/!d' test.sh

works just fine

konsolebox 08-02-2013 01:24 AM

@Firerat: That won't work with functions like
Code:

function x {
    {
        true
        false
    } >/path/file.xyz
}

And it can't be consistent with one-line functions:
Code:

function x { true; }
I think a real parser that would recognize general shell-scripting syntax and read the whole script consistently would be a good solution to this. Try other scripting languages perhaps. I favor Ruby.
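
One parser that already understands the syntax is bash itself: after a file has been sourced, the builtin declare -f prints a function's definition. A minimal sketch, assuming it is acceptable to source the file (which also runs any top-level commands in it) and that a reformatted definition is good enough, since bash prints its own normalized layout rather than the original text. The helper name print_fn is hypothetical.

```shell
# print_fn NAME FILE (hypothetical helper): let bash parse FILE, then print
# its definition of NAME. Caveat: sourcing FILE executes its top-level
# commands; their output is discarded here, but side effects still happen.
# Pass FILE with a path (e.g. ./test.sh) so that source does not search PATH.
print_fn() {
    bash -c 'source "$2" >/dev/null 2>&1; declare -f "$1"' _ "$1" "$2"
}
```

Usage would be something like: print_fn test ./test.sh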

Edit: Oh yes, your code could work with functions written in a regular format, but some do this:
Code:

function x {
{
    true
    false
}
}

And some do this as well:
Code:

if tests; then
    # create this version of a function
    function x {
        echo a
    }
else
    # do this instead
    function x {
        echo b
    }
fi

Even this:
Code:

function x {
    xyz << EOF
{
some texts
}
around
EOF
}


grail 08-02-2013 02:44 AM

From a language-independent point of view, I would suggest counting the braces: once the left and right counts match, and assuming the function itself is valid code,
you should be at the end of your function.

As konsolebox has pointed out, there are several different formats you may need to contend with to have the count correct :)

grail 08-02-2013 03:35 AM

So had a quick think about it, this is the sort of thing I would look at:

Code:

ruby -ne 'BEGIN{l=r=0};if $_ =~ /test/; until l > 0 && l == r; l+=$_.scan(/{/).size;r+=$_.scan(/}/).size;print $_;gets;end;break;end' file

Firerat 08-02-2013 04:41 AM

Quote:

Originally Posted by konsolebox (Post 5001382)
@Firerat: That won't work with functions like
..

ahh, I see now

apottere 08-02-2013 08:39 AM

Fixed my original with awk:

Code:

cat test.sh | awk -v sq="'" '/test()/ {line=$0; gsub("\"[^\"]*\"", "", line); gsub(sq"[^"sq"]*"sq, "", line); gsub("#.*$", "", line); if( sub("test()", "", line) ) {start=1; brackets=0}} /function test/ {line=$0; gsub("\"[^\"]*\"", "", line); gsub(sq"[^"sq"]*"sq, "", line); gsub("#.*$", "", line); if( sub("test()", "", line) ) {start=1; brackets=0}} /{/ {if (start == 1) { line=$0; gsub("\"[^\"]*\"", "", line); gsub(sq"[^"sq"]*"sq, "", line); gsub("#.*$", "", line); while(sub("{", "", line)) { brackets++ }}} /}/ {if (start == 1) { line2=$0; gsub("\"[^\"]*\"", "", line2); gsub(sq"[^"sq"]*"sq, "", line2); gsub("#.*$", "", line2); while(sub("}", "", line2)) brackets--}} {if (start == 1) if (brackets > 0) print $0" - "brackets; else { print $0" - "brackets" - EXITING"; exit}}'

I think
Code:

gsub("\"[^\"]*\"", "", line); gsub(sq"[^"sq"]*"sq, "", line); gsub("#.*$", "", line);
could be factored into a function, but this will strip out all strings and then all comments before checking for the function name or brackets. I've tried it with all the tests here and it seems to work.
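
A sketch of what factoring the three gsub calls into one awk function might look like. The names extract_fn_stripped and strip are made up for illustration, and the stripping is as naive as in the one-liner: for example, the "#" in ${#var} is also treated as the start of a comment.

```shell
# extract_fn_stripped NAME FILE (hypothetical helper): brace counting on a
# copy of each line with strings and comments removed; original lines print.
extract_fn_stripped() {
    awk -v name="$1" -v sq="'" '
        # strip(s): remove double-quoted strings, then single-quoted strings,
        # then everything from "#" to the end of the line.
        function strip(s) {
            gsub(/"[^"]*"/, "", s)
            gsub(sq "[^" sq "]*" sq, "", s)
            gsub(/#.*$/, "", s)
            return s
        }
        # Look for "NAME()" or "function NAME" in the stripped, padded line.
        !started && (" " strip($0) " ") ~ ("[ \t]" name "[ \t]*[(][)]|[ \t]function[ \t]+" name "[ \t({]") {
            started = 1
        }
        started {
            code = strip($0)
            opened = gsub(/[{]/, "", code)
            closed = gsub(/[}]/, "", code)
            if (opened > 0) seen = 1
            depth += opened - closed
            print
            if (seen && depth <= 0) exit
        }
    ' "$2"
}
```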

jpollard 08-02-2013 08:51 AM

That should work nearly all the time. The most unusual cases will be where the braces are escaped (happens with find commands, and xargs).

apottere 08-02-2013 08:55 AM

I'll code this out later today, but would:

1. Removing everything inside backticks and $()
2. Removing all occurrences of \{ and \}

cover all cases? Is there any other way to escape a bracket?

Edit: I'll also have to remove escaped quotes before I remove strings, too.
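
A rough sketch of that extra stripping, assuming everything of interest sits on one line. The filter name strip_line is hypothetical. It removes any backslash-escaped character first (which covers \{, \} and escaped quotes), then backticks and $(), then strings and comments. It is not a complete answer: command substitutions that span several lines or contain plain parentheses are not handled, and a backslash inside single quotes is wrongly treated as an escape.

```shell
# strip_line (hypothetical filter): read lines on stdin and print them with
# escapes, command substitutions, strings and comments removed, so that only
# "real" braces are left for counting.
strip_line() {
    awk -v sq="'" '
        {
            s = $0
            gsub(/\\./, "", s)                 # backslash plus the next character
            gsub(/`[^`]*`/, "", s)             # backtick command substitution
            # Innermost $( ... ) first; repeat to peel nested ones.
            while (gsub(/[$][(][^()]*[)]/, "", s) > 0) { }
            gsub(/"[^"]*"/, "", s)             # double-quoted strings
            gsub(sq "[^" sq "]*" sq, "", s)    # single-quoted strings
            gsub(/#.*$/, "", s)                # comments
            print s
        }
    '
}
```

Usage would be something like: strip_line < test.sh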

jpollard 08-02-2013 09:18 AM

The only remaining problem would be typos in the file... These could cause problems with counting (missing quote, done keyword, close paren...). These don't HAVE to be detected by the shell defining the functions... until the function is actually invoked. But they could cause problems for the scanner.

BTW, as for other ways of escaping--- yes. "here" documents are another...

apottere 08-02-2013 10:17 AM

Dang, forgot about those. Also, I'm not going to worry about typos, if it's broken then it'll break hard. Heredocs on the other hand...

Edit: If there's no other way to escape from a heredoc other than the specified string on a line by itself, it may not be that bad after all.
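
A sketch of how heredoc bodies could be skipped while counting, relying on exactly that rule: once a line contains <<WORD (or <<-WORD, or a quoted WORD), nothing is counted until WORD appears on a line by itself. The helper name extract_fn_heredoc is made up, strings and comments are not stripped here to keep it short, and only one heredoc per line is recognized. A shift inside arithmetic such as $((a<<b)) would be misread as a heredoc.

```shell
# extract_fn_heredoc NAME FILE (hypothetical helper): brace counting that
# ignores heredoc bodies. Every line of the function is still printed.
extract_fn_heredoc() {
    awk -v name="$1" -v sq="'" '
        !started && (" " $0 " ") ~ ("[ \t]" name "[ \t]*[(][)]|[ \t]function[ \t]+" name "[ \t({]") {
            started = 1
        }
        !started { next }
        { print }
        # Inside a heredoc body: count nothing, wait for the delimiter line.
        delim != "" {
            line = $0
            if (tabs) sub(/^\t+/, "", line)    # <<- allows leading tabs
            if (line == delim) delim = ""
            next
        }
        {
            # Find << not preceded by another < (so <<< is left alone).
            padded = " " $0
            if (match(padded, "[^<]<<-?[ \t]*[\"" sq "]?[A-Za-z_][A-Za-z_0-9]*")) {
                word = substr(padded, RSTART, RLENGTH)
                tabs = (word ~ /<<-/)
                sub(/^.<<-?[ \t]*/, "", word)  # drop the operator
                sub("^[\"" sq "]", "", word)   # drop an opening quote
                delim = word
            }
            code = $0
            opened = gsub(/[{]/, "", code)
            closed = gsub(/[}]/, "", code)
            if (opened > 0) seen = 1
            depth += opened - closed
            if (seen && depth <= 0) exit
        }
    ' "$2"
}
```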

konsolebox 08-02-2013 10:37 AM

You should also consider that awk generally processes text line by line and not character by character, unlike other languages. An example is my attempt to parse scripts like this. It works and is fairly complex, but is still limited.

compiler.gawk


All times are GMT -5.