BTW, I copied catkin's original code above and it only returned the first 2 correct entries for me.
That was probably because the quoted character in tab = " " became a space when I copy-pasted it into the OP; it was a tab in the original. A more legible technique would have been to use tab = "\t".
Thanks again for all your interest and suggestions
Having moved the goal posts by requiring support for backslash escapes in quoted strings (in accordance with Bacula's usage), here's my current version. It incorporates many suggestions from this thread but can probably still be improved. For maintainability, it uses an extra { }, to allow room for the "# Backslash escape" comment.
Code:
#!/usr/bin/awk -f
BEGIN {
    FS = "[ \t]*=[ \t]*"
    IGNORECASE = 1 # gawk extension; other awks treat this as an ordinary variable
}
{
    gsub( /[ \t]*/, "", $1 ) # Remove any spaces and tabs from keyword
    if ( $1 == "ArchiveDevice" ) {
        if ( substr( $2, 1, 1 ) == "\"" )
        { # Value is a quoted string
            value = ""
            $2 = substr( $2, 2, match( $2, /[^\\]\"/ ) - 1 )
            for ( i = 1; i <= length( $2 ); i++ )
            {
                char = substr( $2, i, 1 )
                if ( char != "\\" ) value = value char
                else
                { # Backslash escape
                    if ( substr( $2, i + 1, 1 ) == "\\" )
                    { # Escaped \ so keep one
                        value = value "\\"
                        i++
                    }
                }
            }
        }
        else
        { # Value is unquoted
            sub( /[ \t#].*$/, "", $2 ) # Strip from the first space, tab or # to end of line
            value = $2
        }
        print value
    }
}
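A note for anyone reusing the script above: match( $2, /[^\\]\"/ ) takes the closing quote to be the first " that is not preceded by a backslash, so a value ending in an escaped backslash, such as "C:\\", finds no closing quote at all. The sketch below is one possible alternative, not part of the original script: it walks the string once, copies whatever character follows a backslash, and stops at the first unescaped ". The parse_quoted wrapper name and the sample values are made up for illustration.

```shell
# Hypothetical helper: print the unescaped content of a quoted value.
# Assumes the input line starts with the opening double quote.
parse_quoted() {
    printf '%s\n' "$1" | awk '{
        value = ""
        # Start after the opening quote; stop at the first unescaped quote
        for ( i = 2; i <= length( $0 ); i++ ) {
            char = substr( $0, i, 1 )
            if ( char == "\"" ) break
            if ( char == "\\" ) {
                # Backslash escape: keep the next character literally
                i++
                char = substr( $0, i, 1 )
            }
            value = value char
        }
        print value
    }'
}

parse_quoted '"/dev/cdrom \\ \" X" # comment'
parse_quoted '"C:\\"'
```

The first call prints /dev/cdrom \ " X and the second prints C:\ so both the escaped quote and the trailing escaped backslash survive.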
hmmm ... I realise we should be looking for your solution but I guess I am curious as to whether or not the output
you desire is of any use? (not trying to be difficult by the way, just trying to understand)
Good question. No -- it does not have any practical application beyond testing this script. Nobody in their right mind would call a device or a mount point '/dev/cdrom \ " X' but it would be a valid name, so they could. Any character except / (the path component separator) and NUL can be used in the name of a Linux file, including backspace and newline. Pushing the test data to extremes that are highly improbable in real life tests the robustness of the code.
Some people advise that spaces should never be used in file names, and they were relatively rare until GUI file managers made it easy to choose names to suit the user. Names including quotes and other exotica are not rare, for example "Brian's CV". Even in a command-line-only environment, very bizarre file names can be created, such as a single backspace (hence the HOWTOs about deleting such files). Systems software, including backup software, must handle such file names without issue.
Rightio ... I see where you are coming from and probably need the ghostdog ledge to get the escaping of \ to show up but
the below has your format you requested for all 6:
Code:
#!/usr/bin/awk -f
BEGIN {
    FS="[ \t]*=[ \t]*"
    OFS="="
    IGNORECASE=1
    f = 1
    g = 0
}
f && match($2, /".*"/){
    keep = substr($2, 2, (RLENGTH - 2))
    gsub(/\\+/, "", keep)
    print keep
    f = 0
    g = 1
}
f{
    gsub(/[ \t\"\\]+|#.*$/,"")
    if( $1 ~ /archivedevice/ )
        print $2
}
g{
    f = 1
    g = 0
}
Probably some redundancy that someone can tell me about too
That works except for the missing \ you already know about.
There's some redundancy in match($2, /".*"/): we know the input is syntactically valid for Bacula, so if the value after the = and any spaces+tabs begins with a " then the closing " is also present. It is therefore only necessary to check whether it begins with a quote: match($2, /^"/).
Your last bit of info there is not quite right with regard to my script, as it needs the full match to work.
If I only look for " at the start, then RLENGTH is the length of the matched text, which in this case is always only
1. By using the whole regex ".*", however, the match covers all the characters that make that regex true, so RLENGTH spans the whole quoted string.
eg. where $2 contains "5. #/dev/cdrom X" then RLENGTH = 20
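To see the difference in RLENGTH for yourself, here is a small sketch; the show_rlength wrapper and the sample lines are only for illustration. It prints RLENGTH for the anchored regex, RLENGTH for the full regex, and the text that substr() extracts using the latter.

```shell
# Hypothetical helper: compare RLENGTH for the two regexes on one line.
show_rlength() {
    printf '%s\n' "$1" | awk '{
        match( $0, /^"/ );   anchored = RLENGTH   # only the opening quote
        match( $0, /".*"/ ); full = RLENGTH       # leftmost-longest quoted span
        print anchored, full, substr( $0, 2, full - 2 )
    }'
}

show_rlength '"/dev/cdrom X" # comment'
show_rlength '"a" # "b"'
```

The first call prints 1 14 /dev/cdrom X. Because regex matching is leftmost-longest, a " inside a trailing comment extends the match, as the second call shows (it prints 1 9 a" # "b), so the full-match approach relies on comments not containing quotes.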
In case anyone is interested, this thread helped in developing a bash function with embedded awk to parse lines from Bacula .conf files. Here it is. Suggestions for doing it more elegantly appreciated.
Code:
#--------------------------
# Name: parse_conf_line
# Purpose: parses a conf file line
# Usage:
#    $1: line to parse
# Global variables envalued: keyword, keyword_org, conf_values[]
#--------------------------
function parse_conf_line {
    fct "${FUNCNAME[0]}" "started. \$1: '$1'"
    local line
    line=$1
    #echo DEBUG: $LINENO: eval "$( echo "$line" | $awk '
    eval "$( echo "$line" | $awk '
        BEGIN {
            #FS = "[ \t]"
            squote = "\047"
            n_values=0
        }
        {
            # Strip any comment and any spaces+tabs before it
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            # This is not so easy because a # within a
            # quoted string does not introduce a comment and
            # an escaped " (that is \") does not terminate a
            # quoted string.
            in_string = 0 # False
            for ( i = 1; i <= length( $0 ); i++ )
            {
                char = substr( $0, i, 1 )
                if ( char == "#" && in_string == 0 )
                {
                    $0 = substr( $0, 1, i - 1 )
                    sub( /[ \t]*$/, "", $0 )
                    break
                }
                else if ( char == "\"" )
                {
                    if ( in_string == 0 ) in_string = 1
                    else if ( substr( $0, i - 1, 1 ) != "\\" ) in_string = 0
                }
            }
            # Get keyword and value(s) string
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            split( $0, array, /[ \t]*=[ \t]*/ )
            keyword = array[1]
            print "keyword_org=" squote keyword squote
            keyword = tolower( keyword )
            gsub( /[ \t]*/, "", keyword ) # Remove any spaces and tabs from keyword
            print "keyword=" keyword
            # Get individual values from value(s) string
            # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            values_string = array[2]
            while ( length( values_string ) > 0 )
            {
                value = ""
                if ( substr( values_string, 1, 1 ) == "\"" )
                { # Value is a quoted string
                    buf = substr( values_string, 2, match( values_string, /[^\\]"/ ) - 1 )
                    # Strip quoted string just taken
                    values_string = substr( values_string, length( buf ) + 3 )
                    # Copy to value, processing any escapes
                    for ( i = 1; i <= length( buf ); i++ )
                    {
                        char = substr( buf, i, 1 )
                        if ( char != "\\" ) value = value char
                        else
                        { # Backslash escape
                            if ( substr( buf, i + 1, 1 ) == "\\" )
                            { # Escaped \ so keep one
                                value = value "\\"
                                i++
                            }
                        }
                    }
                }
                else
                { # Value is unquoted
                    value = values_string
                    sub( /[ \t].*$/, "", value ) # Strip anything after space or tab
                    # Strip value string just taken
                    values_string = substr( values_string, length( value ) + 1 )
                }
                # Write shell script variable assignment
                # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                print "conf_values[" n_values++ "]=" squote value squote
                if ( n_values > 10 ) exit
                # Clean up for the next loop pass
                # ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                sub( /^[ \t]*/, "", values_string ) # Strip leading spaces and tabs
            }
        }' \
    )"
    fct "${FUNCNAME[0]}" 'returning'
} # end of function parse_conf_line
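One thing to watch with the eval approach: the awk wraps each value in single quotes, so a value that itself contains a single quote (the "Brian's CV" example from earlier in the thread) would end the quoted string early when the shell evaluates it. A possible hardening, sketched here under the assumption that values are single-line, is to replace every ' in the value with '\'' before printing. The emit_assignment helper and the variable names are hypothetical, not part of the function above.

```shell
# Hypothetical helper: print NAME='VALUE' with VALUE safely single-quoted.
# $1: shell variable name, $2: raw value (assumed to contain no newline)
emit_assignment() {
    printf '%s\n' "$2" | awk -v name="$1" '
    {
        # Split on single quotes and rejoin with the four characters
        # quote, backslash, quote, quote (written with \047 because the
        # awk program itself sits inside shell single quotes)
        n = split( $0, parts, "\047" )
        quoted = parts[1]
        for ( i = 2; i <= n; i++ )
            quoted = quoted "\047\\\047\047" parts[i]
        print name "=\047" quoted "\047"
    }'
}

eval "$( emit_assignment device "Brian's CV" )"
echo "$device"
```

This prints Brian's CV. Building the string with split() and concatenation avoids relying on how a particular awk treats backslashes in a gsub() replacement.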
Good work on the finished script. Thought I would just throw my hand in one more time to include
your extra slashes and your values waiting to be eval'ed:
Code:
#!/usr/bin/awk -f
BEGIN {
    FS="[ \t]*=[ \t]*"
    OFS="="
    IGNORECASE=1
    f = 1
    g = 0
    cnt = 0
}
{
    key_org = "'"$1"'"
    key = tolower($1)
    gsub(/[ \t]+/,"",key)
}
f && match($2, /".*"/){
    value = substr($2, 2, (RLENGTH - 2))
    if (value ~ /\\\\/)
        gsub(/\\\\/, "SAVE", value)
    gsub(/\\+/, "", value)
    gsub("SAVE", "\\", value)
    f = 0
    g = 1
}
f{
    gsub(/[ \t\"\\]+|#.*$/,"",$2)
    if( key ~ /archivedevice/ )
        value = $2
}
g{
    f = 1
    g = 0
}
{
    print "keyword_org="key_org
    print "keyword="key
    print "conf_values["++cnt"]='"value"'"
}
Glad you're enjoying the challenge. New test file attached FYI. It adds to the previous test file by including:
Comments not following an "=".
Multiple values after a "keyword ="
"Resource definition" stanzas starting with a "JobCycle {" line and ending with a "}" line. For these, the awk should set keywords "jobcycle{" and "}".