Old 03-20-2010, 01:21 AM   #16
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208

Quote:
Originally Posted by grail View Post
btw. I copied catkin's original code above and it only returned the first 2 correct entries for me
That was probably because the quoted character in tab = " " became a space when I copy-pasted it into the OP; it was a tab in the original. A better (more legible) technique would have been to use tab = "\t".
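As a small illustration of the tab = "\t" point, here is a sketch (the function name count_tab_fields is made up for this example) that splits a line on a tab written as an escape sequence, so a copy-paste cannot silently turn the separator into a space:

```shell
# Hypothetical helper: count the tab-separated fields in a string.
# The separator is written as "\t", so it survives copy-paste intact.
count_tab_fields() {
    printf '%s\n' "$1" | awk 'BEGIN { tab = "\t" } { print split($0, parts, tab) }'
}

# The space inside "two three" is not a separator, so this prints 2
count_tab_fields "$(printf 'one\ttwo three')"
```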
 
Old 03-20-2010, 01:29 AM   #17
catkin
Thanks again for all your interest and suggestions

Having moved the goal posts by requiring support for backslash escapes in quoted strings (in accordance with Bacula's usage), here's my current version. It incorporates many suggestions from this thread but can probably still be improved. For maintainability, it uses an extra { }, to allow room for the "# Backslash escape" comment.
Code:
#!/usr/bin/awk -f

BEGIN {
    FS = "[ \t]*=[ \t]*"
    IGNORECASE = 1
}

{
    gsub( /[ \t]*/, "", $1 )                # Remove any spaces and tabs from keyword
    if ( $1 == "ArchiveDevice" ) {
        if ( substr( $2, 1, 1 ) == "\"" )
        {                                   # Value is a quoted string
            value = ""
            $2 = substr ( $2, 2, match( $2, /[^\\]\"/ ) - 1 )
            for ( i = 1; i <= length( $2 ); i++ )
            {
                char = substr( $2, i, 1 )
                if ( char != "\\" ) value = value char
                else
                {                           # Backslash escape
                    if ( substr( $2, i + 1, 1 ) == "\\" )
                    {                       # Escaped \ so keep one
                        value = value "\\"
                        i++
                    }
                }
            }
        }
        else
        {                                   # Value is unquoted
            sub( /[ \t#].*$/, "", $2 )      # Strip from the first space, tab or # to end of line
            value = $2
        }
        print value
    }
}
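The escape handling in the inner loop can also be tried in isolation. The following is only a sketch of the same rule (a backslash is dropped and the character after it is kept literally, so \\ yields one \ and \" yields "); the function name unescape is an assumption for illustration and is not part of the script above:

```shell
# Hypothetical helper: apply the backslash-escape rule to one string.
unescape() {
    printf '%s\n' "$1" | awk '
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "\\") {          # backslash: take the next character literally
                i++
                c = substr($0, i, 1)  # empty if the backslash was the last character
            }
            out = out c
        }
        print out
    }'
}

unescape '#/dev/cdrom  \\ \"  X'
```

Treating "keep the next character" as the single rule gives the same result as the two-branch version for these inputs, because an escaped backslash is just one more "next character".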
 
Old 03-20-2010, 02:00 AM   #18
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,007

Rep: Reputation: 3191
Well, I am still 6 for 6 after adding backslash support (assuming that from 4 onwards you accept e.g. 4./dev/cdromX)

Code:
#!/usr/bin/awk -f

BEGIN {
    FS="[ \t]*=[ \t]*"
    OFS="="
    IGNORECASE=1
}

{
    if($2 ~ /".*#.*"/)
        sub(/#/,"",$2)

    gsub(/[ \t\"\\]+|#.*$/,"")

    if($1~/archivedevice/)
        print $2
}
 
Old 03-20-2010, 02:55 AM   #19
catkin
Quote:
Originally Posted by grail View Post
Well I am still 6 for 6 with adding backslash support (assuming from 4 onwards you accept eg 4./dev/cdromX)

Thanks grail

I tried it but got
Code:
1./dev/sr0
2./dev/cdrom
3./dev/cdrom
4./dev/cdromX
5./dev/cdromX
6./dev/cdromX
Latest test input file attached FYI.

EDIT: expected test output is
Code:
/mnt/floppy
1./dev/sr0
2./dev/cdrom
3./dev/cdrom
4.      /dev/cdrom       X
5.      #/dev/cdrom      X
6.      #/dev/cdrom  \ "         X
Attached Files
File Type: txt ArchiveDevice.test.in.txt (913 Bytes, 13 views)

Last edited by catkin; 03-20-2010 at 02:58 AM.
 
Old 03-20-2010, 04:44 AM   #20
grail
hmmm ... I realise we should be looking for your solution but I guess I am curious as to whether or not the output
you desire is of any use? (not trying to be difficult by the way, just trying to understand)
 
Old 03-20-2010, 05:05 AM   #22
catkin
Quote:
Originally Posted by grail View Post
hmmm ... I realise we should be looking for your solution but I guess I am curious as to whether or not the output
you desire is of any use? (not trying to be difficult by the way, just trying to understand)
Good question. No -- it does not have any practical application beyond testing this script. Nobody in their right mind would call a device or a mount point '/dev/cdrom \ " X', but it would be a valid name, so they could. Any character except / (the path component separator) and NUL can be used in a Linux file name, including backspace and newline. Pushing the test data to extremes that are highly improbable in real life tests the robustness of the code.

Some people advise that spaces should never be used in file names, and they were relatively rare until GUI file managers made it easy to choose names to suit the user. Names including quotes and other exotica are not rare, for example "Brian's CV". Even in a command-line-only environment, very bizarre file names can be created, such as a single backspace (hence the HOWTOs about deleting such files). Systems software, including backup software, must handle such file names without issue.
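To make the point about odd names concrete, here is a rough sketch that creates a few such files in a scratch directory and counts them safely. The names and the count_files helper are invented for the example, and it assumes a find that supports -print0 (GNU and BSD find do):

```shell
# Hypothetical demo: create awkward file names in a scratch directory
demo_dir=$(mktemp -d)
touch "$demo_dir/Brian's CV" \
      "$demo_dir/$(printf 'two\nlines')" \
      "$demo_dir/ \\ \" "

# Count files by counting NUL terminators, so a newline in a name cannot miscount
count_files() {
    find "$1" -type f -print0 | tr -dc '\000' | wc -c | tr -d ' '
}

count_files "$demo_dir"    # prints 3
rm -rf "$demo_dir"
```

Counting lines of ordinary find or ls output would be thrown off by the name containing a newline, which is the kind of failure the extreme test data is meant to expose.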
 
Old 03-20-2010, 06:30 AM   #23
grail
Rightio ... I see where you are coming from and probably need the ghostdog ledge to get the escaping of \ to show up but
the below has your format you requested for all 6:

Code:
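#!/usr/bin/awk -f

BEGIN {
    FS="[ \t]*=[ \t]*"
    OFS="="
    IGNORECASE=1            # gawk extension: case-insensitive regex matching
    f = 1                   # f: this record has not been handled yet
    g = 0                   # g: this record was handled by the quoted-string rule
}

# Quoted value: print the text between the first and last double quote
f && match($2, /".*"/){
    keep = substr($2, 2, (RLENGTH - 2))
    gsub(/\\+/, "", keep)   # drop all backslashes (an escaped \ is lost too)
    print keep
    f = 0
    g = 1
}

# Unquoted value: strip spaces, tabs, quotes, backslashes and any comment
f{
    gsub(/[ \t\"\\]+|#.*$/,"")

    if( $1 ~ /archivedevice/ )
        print $2

}

# Reset the flags for the next record
g{
    f = 1
    g = 0
}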
Probably some redundancy that someone can tell me about too
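On the redundancy question, one possible simplification is to let awk's next statement do the job of the f and g flags: once a rule has handled a record, next skips the remaining rules. The sketch below is not the script above. It is a variant under a few assumptions: it uses tolower() instead of the gawk-only IGNORECASE, it checks the keyword before looking at the value, and the function name extract_archive_device is made up for the example.

```shell
# Hypothetical variant: same two branches, with "next" instead of flags.
extract_archive_device() {
    awk -F'[ \t]*=[ \t]*' '
    {
        key = tolower($1)
        gsub(/[ \t]+/, "", key)                 # spaces in keywords are ignored
    }
    key != "archivedevice" { next }             # not the keyword we want
    match($2, /".*"/) {                         # quoted value
        value = substr($2, RSTART + 1, RLENGTH - 2)
        gsub(/\\/, "", value)                   # drop backslashes, as the flag version does
        print value
        next                                    # skip the unquoted rule
    }
    {                                           # unquoted value
        sub(/[ \t]*#.*$/, "", $2)               # strip any trailing comment
        print $2
    }' "$@"
}

printf '%s\n' 'Archive Device = /dev/sr0   # comment' \
              'ArchiveDevice = "  /dev/cdrom  X"  # trailing' \
              'Media Type = DVD' | extract_archive_device
```

Like the original, the greedy /".*"/ match would be fooled by a double quote inside a trailing comment, so this is a tidier layout of the same idea and not a more robust parser.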
 
Old 03-20-2010, 07:35 AM   #25
catkin
Quote:
Originally Posted by grail View Post
Rightio ... I see where you are coming from and probably need the ghostdog ledge to get the escaping of \ to show up but
the below has your format you requested for all 6:

Probably some redundancy that someone can tell me about too
That works except for the missing \ you already know about.

There's some redundancy in match($2, /".*"/). We know the input is syntactically valid for Bacula, so if the value after the = and any spaces and tabs begins with a ", the closing " is also present. It is therefore only necessary to check whether the value begins with a quote: match($2, /^"/).
 
Old 03-20-2010, 08:30 AM   #26
grail
Hi catkin

Your last bit of info there is not quite right with regard to my script, as it needs the full match to work.
If I only look for " at the start, then RLENGTH (the length of the matched text) is always 1. By using the whole regex ".*", the match covers everything from the opening quote to the closing quote, and that RLENGTH is what substr() needs to extract the value.

e.g. where $2 contains "5. #/dev/cdrom X" then RLENGTH = 20
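The difference between the two patterns is easy to check directly. This is a minimal sketch with a made-up helper, rlength_of, that reports RLENGTH for a given string and regex; the sample input is hypothetical and is not taken from the attached test file:

```shell
# Hypothetical helper: print RLENGTH after matching regex $2 against string $1.
rlength_of() {
    printf '%s\n' "$1" | awk -v re="$2" '{ match($0, re); print RLENGTH }'
}

rlength_of '"/dev/cdrom X"  # comment' '^"'      # anchored quote only: prints 1
rlength_of '"/dev/cdrom X"  # comment' '".*"'    # whole quoted string: prints 14
```

With the anchored pattern RLENGTH cannot be used to find the closing quote, so a script that only tests for the leading quote needs some other way to locate the end of the value.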
 
Old 03-20-2010, 08:39 AM   #27
catkin
Quote:
Originally Posted by grail View Post
Hi catkin

Your last bit of info there is not quite right with regard to my script as it needs the full match to work.
If I only look at the start for " then the value for RLENGTH is equal to the largest find, in this case always only
1, however, by using the whole regex ".*" it then looks at all characters that make that regex true.

eg. where $2 contains "5. #/dev/cdrom X" then RLENGTH = 20
Sorry, grail -- I missed that RLENGTH use.
 
Old 03-23-2010, 05:40 AM   #28
catkin
In case anyone is interested, this thread helped in developing a bash function with embedded awk to parse lines from Bacula .conf files. Here it is. Suggestions for doing it more elegantly appreciated.
Code:
#--------------------------
# Name: parse_conf_line
# Purpose: parses a conf file line
# Usage:
#   $1: line to parse
# Global variables set: keyword, keyword_org, conf_values[]
#--------------------------
function parse_conf_line {

    fct "${FUNCNAME[0]}" "started. \$1: '$1'"

    local line

    line=$1

    #echo DEBUG: $LINENO: eval "$( echo "$line" | $awk '
    eval "$( echo "$line" | $awk '
        BEGIN {
            #FS = "[ \t]"
            squote = "\047"
            n_values=0
        }

        {
            # Strip any comment and any spaces+tabs before it
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            # This is not so easy because a # within a
            # quoted string does not introduce a comment and
            # an escaped " (that is \") does not terminate a
            # quoted string.
            in_string = 0                                   # False
            for ( i = 1; i <= length( $0 ); i++ )
            {
                char = substr( $0, i, 1 )
                if ( char == "#" && in_string == 0 )
                {
                    $0 = substr( $0, 1, i - 1 )
                    sub( /[ \t]*$/, "", $0 )
                    break
                }
                else if ( char == "\"" )
                {
                    if ( in_string == 0 ) in_string = 1
                    else if ( substr( $0, i - 1, 1 ) != "\\" ) in_string = 0
                }
            }

            # Get keyword and value(s) string
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            split( $0, array, /[ \t]*=[ \t]*/ )
            keyword = array[1]
            print "keyword_org=" squote keyword squote
            keyword = tolower( keyword )
            gsub( /[ \t]*/, "", keyword )                   # Remove any spaces and tabs from keyword
            print "keyword=" keyword

            # Get individual values from value(s) string
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            values_string = array[2]
            while ( length( values_string ) > 0 )
            {
                value = ""
                if ( substr( values_string, 1, 1 ) == "\"" )
                {                                           # Value is a quoted string
                    buf = substr( values_string, 2, match( values_string, /[^\\]"/ ) - 1 )
                    # Strip quoted string just taken
                    values_string = substr( values_string, length( buf ) + 3 )
                    # Copy to value, processing any escapes
                    for ( i = 1; i <= length( buf ); i++ )
                    {
                        char = substr( buf, i, 1 )
                        if ( char != "\\" ) value = value char
                        else
                        {                                   # Backslash escape
                            if ( substr( buf, i + 1, 1 ) == "\\" )
                            {                               # Escaped \ so keep one
                                value = value "\\"
                                i++
                            }
                        }
                    }
                }
                else
                {                                           # Value is unquoted
                    value = values_string
                    sub( /[ \t].*$/, "", value )            # Strip anything after space or tab
                    # Strip value string just taken
                    values_string = substr( values_string, length( value ) + 1 )
                }

                # Write shell script variable assignment
                # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                print "conf_values[" n_values++ "]=" squote value squote
                if ( n_values > 10 ) exit

                # Clean up for the next loop pass
                # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                sub( /^[ \t]*/, "", values_string )         # Strip leading spaces and tabs
            }
        }' \
    )"

    fct "${FUNCNAME[0]}" 'returning'

}  # end of function parse_conf_line
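One thing that may be worth guarding against when the awk output is fed to eval: a value containing a single quote (such as Brian's CV) would end the quoted string early. A common shell idiom is to replace each ' with '\'' before wrapping the value in single quotes. The sketch below shows only that step; shell_quote_demo and conf_value are names invented for the example and are not part of the function above.

```shell
# Hypothetical helper: emit a shell assignment that is safe to eval
# even when the value contains single quotes.
shell_quote_demo() {
    printf '%s\n' "$1" | awk '
    BEGIN { squote = "\047" }
    {
        out = ""
        for (i = 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            # A single quote becomes: close quote, backslash-quote, reopen quote
            if (c == squote) c = squote "\\" squote squote
            out = out c
        }
        print "conf_value=" squote out squote
    }'
}

eval "$(shell_quote_demo "Brian's CV")"
printf '%s\n' "$conf_value"
```

The same per-character loop could be folded into the place where conf_values[] and keyword_org are printed, at the cost of a few more lines.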
 
Old 03-23-2010, 08:04 AM   #29
grail
Hey catkin

Good work on the finished script. Thought I would just throw my hand in one more time to include
your extra slashes and your values waiting to be eval'ed:

Code:
#!/usr/bin/awk -f

BEGIN {
	FS="[ \t]*=[ \t]*"
	OFS="="
	IGNORECASE=1
	f = 1
	g = 0
	cnt = 0
}

{ 
	key_org = "'"$1"'"
	key = tolower($1)
	gsub(/[ \t]+/,"",key)
}

f && match($2, /".*"/){

	value = substr($2, 2, (RLENGTH - 2))

	if (value ~ /\\\\/)
		gsub(/\\\\/, "SAVE", value)

	gsub(/\\+/, "", value)
	gsub("SAVE", "\\", value)

	f = 0
	g = 1
}

f{
	gsub(/[ \t\"\\]+|#.*$/,"",$2)

	if( key ~ /archivedevice/ )
		value = $2

}

g{
	f = 1
	g = 0
}

{
	print "keyword_org="key_org
	print "keyword="key
	print "conf_values["++cnt"]='"value"'"
}
Thanks for the learning
 
Old 03-23-2010, 11:10 AM   #30
catkin
Quote:
Originally Posted by grail View Post
Thanks for the learning
Glad you're enjoying the challenge. New test file attached FYI. It adds to the previous test file by including:
  • Comments not following an "=".
  • Multiple values after a "keyword =".
  • "Resource definition" stanzas starting with a "JobCycle {" line and ending with a "}" line. For these, the awk should set keywords "jobcycle{" and "}".
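For the stanza lines, one way to get the keyword is to take everything before the first = or # and then remove spaces and tabs. The sketch below does only that; it is an assumption about how the requirement could be met and not the tested solution, it relies on the keyword part never containing = or #, and the name conf_keyword is made up:

```shell
# Hypothetical helper: print the normalised keyword of one conf line,
# or nothing for blank and comment-only lines.
conf_keyword() {
    printf '%s\n' "$1" | awk '
    {
        line = $0
        sub(/^[ \t]+/, "", line)                # leading spaces and tabs
        if (line ~ /^#/ || line == "") next     # comment-only or blank line
        sub(/[ \t]*[=#].*$/, "", line)          # cut at the first = or #
        gsub(/[ \t]+/, "", line)                # spaces in keywords are ignored
        print tolower(line)
    }'
}

conf_keyword 'JobCycle {'                # prints jobcycle{
conf_keyword '  }  # end of stanza'      # prints }
conf_keyword 'Archive Device = "a#b"'    # prints archivedevice
```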
Attached Files
File Type: txt parse_conf_line.test.conf.txt (1.3 KB, 8 views)
 
  



Tags
awk, bacula


