[SOLVED] extract directory and xml data to create a comma delimited file

David the H. · 05-09-2012, 12:51 PM

Actually, I don't think you need to make it that complex. The parent tag appears to be unique, and there's only one "value" in it, so it should be quite simple to grab.

Code:

xmlparts=$( xml sel -t -v '//hudson.model.StringParameterValue/value' -o , -v /build/result -o , -v '/build/culprits/string[last()]' build.xml "$dir/$file" )

Two slashes (//) in front of an element match it at any depth below whatever path is specified in front of them, or anywhere in the document when nothing precedes them. In any case, you can always provide the full tree path to the exact entry you want.
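A minimal sketch of the difference, assuming xmlstarlet is installed. The sample file is a hypothetical, cut-down build.xml invented for illustration, and the version string v1.2 is made up. Both expressions should select the same value:

```shell
#!/bin/sh
# Hypothetical miniature build.xml; only the nesting matters here.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<build>
  <actions>
    <hudson.model.ParametersAction>
      <parameters>
        <hudson.model.StringParameterValue>
          <name>tagName</name>
          <value>v1.2</value>
        </hudson.model.StringParameterValue>
      </parameters>
    </hudson.model.ParametersAction>
  </actions>
  <result>SUCCESS</result>
</build>
EOF

# The binary is called "xmlstarlet" on some systems and "xml" on others.
xs=$(command -v xmlstarlet || command -v xml || true)

if [ -n "$xs" ]; then
    # Short form: match the element at any depth.
    short=$("$xs" sel -t -v '//hudson.model.StringParameterValue/value' "$tmp")
    # Long form: spell out the whole path from the root.
    full=$("$xs" sel -t -v '/build/actions/hudson.model.ParametersAction/parameters/hudson.model.StringParameterValue/value' "$tmp")
    printf 'short=%s\nfull=%s\n' "$short" "$full"
else
    echo "xmlstarlet not found, skipping" >&2
fi

rm -f "$tmp"
```

The short form is handy when the parent tag is unique, as it is here. If the same tag could show up in several places, the full path is the safer choice.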

Now do we have everything?

Edit: BTW, it looks like the only reason your last command is failing is that it uses single quotes around tagName rather than double quotes. Since the outer quotes are also single, the inner ones pair up with them and leave the tagName string unprotected by the shell. So no quotes actually get passed to the command.
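You can see the effect without xmlstarlet at all by letting printf show what the shell actually hands over. This is only a sketch, and the shortened XPath is just for illustration:

```shell
#!/bin/sh
# Inner single quotes close and reopen the outer pair, so the shell
# removes them and tagName reaches the command without any quotes.
single=$(printf '%s' '//StringParameterValue[name='tagName']/value')

# Double quotes inside single quotes are ordinary characters and survive.
double=$(printf '%s' '//StringParameterValue[name="tagName"]/value')

printf '%s\n' "$single" "$double"
```

The first line printed contains [name=tagName], which XPath reads as a comparison against a child element called tagName rather than against the string "tagName". The second line keeps the quotes intact.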

j-me · 05-09-2012, 02:16 PM

That is interesting. It works. I'm just trying to make sure I understand it, and I kind of do now. I learned a ton during this process.
The expression

Code:

/build/actions/hudson.model.ParametersAction/parameters/hudson.model.StringParameterValue[name="tagName"]/value

returns an error.
Entity: line 23: parser error : Couldn't find end of Start Tag value-of line 23
udson.model.ParametersAction/parameters/hudson.model.StringParameterValue[name="

That is why I went with [name='tagName'], and it worked until I tried to include the /description ... I think the brackets make the single quotes "local".

I believe now that provides what the requirements were. Thank you so very much.

ntubski · 05-09-2012, 10:05 PM

Quote:

Originally Posted by j-me

Code:

/build/actions/hudson.model.ParametersAction/parameters/hudson.model.StringParameterValue[name="tagName"]/value

returns an error.
Entity: line 23: parser error : Couldn't find end of Start Tag value-of line 23
udson.model.ParametersAction/parameters/hudson.model.StringParameterValue[name="

Older versions of xmlstarlet (1.0.3 and earlier) create an XSLT document as a string and then parse it. Double quotes are special in XML, so the parser gets tripped up (1.0.4 and later will escape these args).
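A rough sketch of what goes wrong, using plain string building in the shell. The value-of element is taken from the error message above; the exact template xmlstarlet generates is an assumption here, simplified to a single line:

```shell
#!/bin/sh
expr='//hudson.model.StringParameterValue[name="tagName"]/value'

# Naive string building: the first inner " ends the select attribute early,
# so an XML parser cannot find the end of the start tag.
naive="<xsl:value-of select=\"$expr\"/>"

# Escaping & first and then " keeps the attribute value in one piece.
escaped=$(printf '%s' "$expr" | sed -e 's/&/\&amp;/g' -e 's/"/\&quot;/g')
safe="<xsl:value-of select=\"$escaped\"/>"

printf '%s\n' "$naive" "$safe"
```

In the first line of output the attribute stops right after [name=, which is exactly where the quoted error message cuts off. In the second line the quotes show up as &quot; and the attribute stays whole.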

Nominal Animal · 05-10-2012, 08:57 AM

XML files should always be processed using XML tools, as the posters above have shown.

Still, if the build.xml files are as limited in format as the example shown, then it is certainly possible to parse them with plain awk. I would personally consider this only if using the proper tools was too slow or burdensome.

Anyway, here is the awk script:

Code:

#!/usr/bin/awk -f
BEGIN {
    RS = "[\t\n\v\f\r ]*<"
    FS = "[\t\n\v\f\r ]*>[\t\n\v\f\r ]*"

    element = ""    # XML element name
    content = ""    # Immediate content to element
    isopen  = 0

    parents = 0     # Number of parent elements
    parent[0] = ""  # Current element
    parent[1] = ""  # Parent element to current element
}

#
# Per-file initialization
#
(FNR == 1) {

    # Per-file initialization.
    # Current file name (and path) is in FILENAME.

    result = ""
    description = ""
    timestamp = ""
    basename = FILENAME
    sub(/^.*\/Deploy_/, "Deploy_", basename)
    split(basename, path, "/")

}

#
# XML processing
#

{
    if (isopen) {
        isopen = 0
        if (length(element) > 0) {
            for (i = parents; i >= 0; i--)
                parent[i+1] = parent[i]
            parents++
        }
    }

    if (NF < 1)
        next

    if ($1 ~ /^[!?]/)
        next

    if (NF > 2) {
        printf("Spurious > after %s.\n", $2) > "/dev/stderr"
        exit(1)
    }

    element = $1
    content = $2

    sub(/^[\t\n\v\f\r ]+/, "", content)
    sub(/[\t\n\v\f\r ]+$/, "", content)
    # To combine all runs of whitespace in content to single spaces, add
    #   gsub(/[\t\n\v\f\r ]+/, " ", content)

    if (element ~ /^\//) {
        sub(/^\/+/, "", element)
        sub(/[\t\n\v\f\r ].*$/, "", element)
        if (parent[1] != element) {
            printf("%s: Element not open.\n", element) > "/dev/stderr"
            exit(1)
        }
        for (i = 1; i < parents; i++)
            parent[i] = parent[i+1]
        delete parent[parents]
        parents--
        next
    }

    if (element ~ /\/$/)
        sub(/\/+$/, "", element)
    else
        isopen = 1

    sub(/[\t\n\v\f\r ].*$/, "", element)
    parent[0] = element
}

# Actual processing starts here. Available:
#   parents         The number of parent elements for current node
#   parent[0]       The current element name
#   parent[1]       The name of closest parent element
#   parent[parents] The name of the root element
#   content         Immediate content following the element.
#                   Does not include content after any child elements,
#                   even if they do belong to the current element.

(parents == 1 && parent[1] == "build" && parent[0] == "description") {
    description = content
    next
}

(parents == 1 && parent[1] == "build" && parent[0] == "result") {
    result = content
    next
}

(parents == 2 && parent[2] == "build" && parent[1] == "culprits" && parent[0] == "string") {
    printf("%s,%s,%s,%s,%s\n", path[1], path[3], description, result, content)
}

Run it using

Code:

./above-script.awk jobs/*/*/*/build.xml

or, if you have too many files for a single command,

Code:

find jobs/ -mindepth 4 -maxdepth 4 -name build.xml -print0 | xargs -r0 ./above-script.awk

to get the expected output.
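If it is unclear how the RS and FS settings in the script carve the XML into records, this small sketch may help. It assumes an awk that accepts a regular expression as RS (gawk and mawk do), and the sample XML and the name jdoe are made up for illustration:

```shell
#!/bin/sh
# Hypothetical sample input; real build.xml files contain much more.
sample='<build>
  <result>SUCCESS</result>
  <culprits>
    <string>jdoe</string>
  </culprits>
</build>'

# Each record starts right after a "<", and ">" splits it into
# the tag ($1) and the text that immediately follows it ($2).
records=$(printf '%s\n' "$sample" | awk '
BEGIN {
    RS = "[ \t\n\r]*<"
    FS = "[ \t\n\r]*>[ \t\n\r]*"
}
NF > 0 { printf("tag=[%s] text=[%s]\n", $1, $2) }
')

printf '%s\n' "$records"
```

Among the lines printed are tag=[result] text=[SUCCESS] and tag=[/result] text=[]. The full script works on the same records and uses the opening and closing tags to keep track of the parent elements.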