[SOLVED] Splitting command output and parsing to an array.

vlazdir · 06-06-2013, 03:59 AM

Hi all,

I am fairly new to shell programming and my question is probably not that difficult.

This has to be done in korn shell.

I run a command that outputs an arbitrary number of lines with two words on each line. The two words on each line belong together and will be used as two variables at a later stage in the script.

Here is some code:

Code:

# /opt/NTAP/SANToolkit/bin/sanlun lun show | awk 'NR>1{ print $1 $2}' | sed -e s/:// | awk -F/ '{ print $1 " " $3 }' | sort -u
filer1 volume1
filer1 volume2
filer2 volume1
#

I want the first word to eventually become a variable that will be used together with the second word in a variable like this:

Code:

`/usr/bin/ssh -i ${SSH_KEY} ${USER}@${FILER} snap list -n ${VOL_NAME}`

What it does is log in to a storage system and get info on snapshots. $FILER and $VOL_NAME vary according to the output of the command above.

I have tried many things that didn't work, and they all seem to boil down to the same problem: the output from the command seems to end up on one line, and I can't split it up the way I want.

Code:

        test=`/opt/NTAP/SANToolkit/bin/sanlun lun show | awk 'NR>1{ print $1 $2}' | sed -e s/:// | awk -F/ '{ print $1 " " $3 }' | sort -u`

        i=0
        for LINE in `echo ${test}`; do
                tmp[$i]=$LINE
        done

        print "tmp0: ${tmp[0]}"
        print "tmp1: ${tmp[1]}"
        print "tmp2: ${tmp[2]}"

Output:

Code:

# ./script.sh
tmp0: vfiler1 t63ld_cs11
vfiler1 t63ld_cs12
vfiler2 t63ld_cs12
tmp1:
tmp2:
#

Does anyone know how to help me?

Regards
Erik

NevemTeve · 06-06-2013, 05:14 AM

You want to use an associative array? Are you sure the Korn shell supports that? If not, you might want to use an actual programming language, such as awk, Perl or PHP.

grail · 06-06-2013, 10:38 AM

hmmm ... well, not knowing the 'sanlun' command ... are you saying that the data it delivers is infinite, or merely that there are many lines returned?

Also, there is absolutely no reason to call awk, sed, awk ... ever (at least that I can think of). If you would provide example output data I am pretty sure we could find something better.

As for creating your array, why not simply feed the output of your command into a while loop and read the 2 variables into whatever array structure you like?
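A minimal sketch of that suggestion, assuming ksh93 or bash and the two-word lines shown in the first post. The array names FILER and VOL_NAME are illustrative, and the here-document stands in for the real command output. Feeding the loop from a here-document rather than a pipe keeps it in the current shell, so the arrays are still set after "done" (in bash, the body of "cmd | while read" runs in a subshell and its assignments are lost).

Code:

```shell
# Sketch: read two whitespace-separated words per line into two parallel
# indexed arrays (ksh93 or bash). The here-document is sample data standing
# in for the real command output.
i=0
while read -r filer vol; do
    FILER[$i]=$filer        # first word on the line
    VOL_NAME[$i]=$vol       # second word on the line
    i=$((i + 1))
done <<EOF
filer1 volume1
filer1 volume2
filer2 volume1
EOF

echo "read $i lines; first pair: ${FILER[0]} ${VOL_NAME[0]}"
```

Each index j then pairs ${FILER[$j]} with ${VOL_NAME[$j]} for the ssh ... snap list -n command.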

sundialsvcs · 06-06-2013, 11:13 AM

If it were me, I would use another programming language. You probably have both Perl and PHP installed on your machine, and either of them could be used to implement a command. That's the beauty of the #!commandname ("shebang") line at the top of the file: when the script is executed, the system reads that line and runs the named interpreter on the script, and no one (but you...) will be the wiser.

Both of these languages have built-in functions for splitting strings, and they have full implementations of associative arrays ("hashes").

Very much ... use the right tool for the job. The Korn shell is the only one that reasonably attempts to provide a programming-language in its "shell scripting" language. Bash, IMHO, doesn't. But, thanks to "shebang," you don't have to.

David the H. · 06-06-2013, 03:12 PM

Quote:

Originally Posted by vlazdir

I want the first word to eventually become a variable that will be used together with the second word in a variable like this:

Associative arrays are definitely more suited for that, as suggested. The syntax is pretty much the same in ksh93 and bash, although there are probably minor differences.

How can I use variable variables (indirect variables, pointers, references) or associative arrays?
http://mywiki.wooledge.org/BashFAQ/006

And Don't Read Lines With For! Always use a while+read loop when processing files or command input. And read can do all the line splitting for you.

Code:

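typeset -A user    # associative array (ksh93, bash 4+)

# read splits each line into its two words
while read -r filer volume ; do
    user["$filer"]=$volume    # keyed by filer name
done < <( input commands )    # replace "input commands" with the real pipeline

# "${!user[@]}" expands to the list of keys
for filer in "${!user[@]}"; do
    echo "User: $filer has volume: ${user[$filer]}"
done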
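One caveat with keying on the filer name: in the sample output in the first post, filer1 appears with two volumes, so each new line overwrites the previous value for that key. A minimal sketch of one way around that, assuming ksh93 or bash 4+ (for typeset -A and +=): append each volume to a space-separated list per filer, then split the list again when looping. The array name vols is illustrative.

Code:

```shell
# Sketch: collect every volume per filer instead of keeping only the last one.
# Needs ksh93 or bash 4+ (typeset -A, +=); the here-document is sample data.
typeset -A vols
while read -r filer volume; do
    vols[$filer]+="$volume "          # append to this filer's list
done <<EOF
filer1 volume1
filer1 volume2
filer2 volume1
EOF

for filer in "${!vols[@]}"; do        # key order is not guaranteed
    # Unquoted on purpose so the list splits into words; assumes volume
    # names contain no whitespace or glob characters.
    for volume in ${vols[$filer]}; do
        echo "Filer: $filer has volume: $volume"
    done
done
```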

NevemTeve · 06-07-2013, 03:37 AM

Yes, bash does have great extended features! The OP has ksh though.

David the H. · 06-07-2013, 06:49 AM

As I pointed out, both shells have associative arrays, and the above code will work in either one (although there was a minor quoting error, which I have now corrected).

In fact, most of the advanced features available in bash are borrowed directly from ksh, although there are occasional differences in syntax (e.g. bash prefers declare over typeset, but it also accepts typeset as a synonym). I estimate that about 85% of code written for bash will run unaltered in ksh. You just have to double-check the documentation to catch the other 15%.

PS: In case it wasn't clear enough from my last post, the reason the OP's output appears to be a single line is the improper use of the for loop (not to mention the Useless Use Of Echo). After the variable/command substitution is expanded, the shell's word splitting breaks the result on all whitespace, including newlines, so the loop processes each word individually rather than each line. (The loop also never increments $i, so every iteration writes to tmp[0].)
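A small demonstration of the difference, using made-up two-line input: an unquoted expansion in a for list is split on every blank and newline, while a while+read loop consumes one line at a time and splits it into the named variables.

Code:

```shell
# Sketch: compare word splitting in a for loop with line-by-line reading.
data='filer1 volume1
filer2 volume2'

words=0
for w in $data; do            # unquoted: split on spaces AND newlines
    words=$((words + 1))
done

lines=0
while read -r filer vol; do   # one iteration per line, two fields each
    lines=$((lines + 1))
done <<EOF
$data
EOF

echo "for loop iterations: $words"       # 4
echo "while-read iterations: $lines"     # 2
```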

And by the way: $(..) is highly recommended over `..`.
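One practical reason for that recommendation, shown with a throwaway example: $(...) nests without any extra escaping, whereas a backtick substitution nested inside another must be escaped with backslashes.

Code:

```shell
# Sketch: the same nested command substitution written both ways.
new=$(echo "outer $(echo inner)")
old=`echo "outer \`echo inner\`"`
echo "$new"   # outer inner
echo "$old"   # outer inner
```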

vlazdir · 06-07-2013, 07:17 AM

Hi,

I have to use ksh because that's the only shell that I know exists on absolutely all of our servers. We have a large pool of *ix and *ux systems, and I want this to work on all platforms that I might need it for in the future.

Anyway, I found a solution. David the H., I've tried yours but can't seem to get it working. I get: syntax error at line 46 : `<<' unmatched.

Here's my solution. It might not be very compact, but it works:

Code:

GetFilerVolume () {   # This function gets all the variables that I need later in the script.
        sanlun_cmd=`/opt/NTAP/SANToolkit/bin/sanlun lun show | awk 'NR>1{ print $1 $2}' | sed -e s/:// | awk -F/ '{ print $1 " " $3 }' | sort -u | tr -s ' ' '\t'`

        i=0
        print "${sanlun_cmd}" | while read LINE;
                do
                        TESTARR[$i]=${LINE}
                        FILER[$i]=$(echo ${TESTARR[$i]} | awk '{print $1}')
                        VOL_NAME[$i]=$(echo ${TESTARR[$i]} | awk '{print $2}')
                        i=$(($i+1))
                done
}
SnapList () {
        GetFilerVolume    # Call the function above to get variables.

        ANTALL="${#FILER[*]}"
        j=0
        while ((0<=j<=${ANTALL}));do
                echo "Snapshots on ${FILER[$j]}"
                filer_cmd=$(/usr/bin/ssh -i ${SSH_KEY} ${USER}@${FILER[$j]} snap list -n ${VOL_NAME[$j]})
                print "${filer_cmd}\n";
                j=$(($j+1))
                        if [[ ${j} -eq ${ANTALL} ]]; then
                        exit 0
                        fi
        done
}
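For comparison, a hedged sketch of the same loop written with an arithmetic for, which stops on its own after the last index, so it needs neither the 0<=j<=${ANTALL} test (inside ((...)) that is evaluated as (0<=j)<=ANTALL, not as a range check) nor the exit 0. It assumes ksh93 or bash (ksh88 would need set -A for the array assignments). The sample arrays are made up, SSH_KEY's default path is a placeholder, and the loop only prints the ssh command instead of running it.

Code:

```shell
# Sketch: loop over parallel arrays with an arithmetic for loop (ksh93/bash).
# Sample values are hypothetical; SSH_KEY's default is a placeholder path.
FILER=(filer1 filer1 filer2)
VOL_NAME=(volume1 volume2 volume1)
SSH_KEY=${SSH_KEY:-/path/to/key}

for ((j = 0; j < ${#FILER[@]}; j++)); do
    echo "Snapshots on ${FILER[j]}"
    # Dry run: print the command rather than executing it.
    echo /usr/bin/ssh -i "$SSH_KEY" "${USER}@${FILER[j]}" snap list -n "${VOL_NAME[j]}"
done
```

Once the printed commands look right, the second echo can be dropped so the ssh call actually runs.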

Thanks for all your help guys!

BR
Erik

NevemTeve · 06-07-2013, 07:58 AM

That's another bashism:

Code:

while read L; do echo "file: $L"; done < <(ls -1)

David the H. · 06-07-2013, 08:19 AM

Except that ksh supports process substitution too. At least the standard ksh88 and ksh93 do.

Do note that the substitution itself is <(..), which acts like a file when used in a command. The extra < is just a regular shell redirect. The only caveat is that there must be a space between them. Otherwise the shell will confuse it with a here document.
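A quick way to see both points, assuming bash or ksh93 (process substitution is not part of POSIX sh). The loop reads from "< <(...)", and because the loop is not part of a pipeline it runs in the current shell, so the counter survives in bash as well. The printf is a stand-in for the real command.

Code:

```shell
# Sketch (bash or ksh93): "< <(cmd)" is an ordinary < redirect whose source
# is the process substitution <(cmd). Note the space between the two.
count=0
while read -r filer vol; do
    count=$((count + 1))
done < <(printf '%s\n' 'filer1 volume1' 'filer2 volume1')
echo "lines read: $count"    # the loop ran in this shell, so count is 2

# <(cmd) expands to a file name, so it can also be passed as an argument.
first=$(head -n 1 <(printf '%s\n' a b))
echo "$first"                # a
```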

Note from the link that there are other options for doing the same thing as well. The most portable is probably a simple command substitution inside a heredoc:

Code:

while read x ; do
   echo "$x"
done <<EOF
$( command )
EOF

But in any case please do not do this:

Code:

sanlun_cmd=`/opt/NTAP/SANToolkit/bin/sanlun lun show | awk 'NR>1{ print $1 $2}' | sed -e s/:// | awk -F/ '{ print $1 " " $3 }' | sort -u | tr -s ' ' '\t'`

If the command outputs multiple lines or entries that you intend to process individually, do not store them in a single, scalar variable. Loop over the output directly, and either process it then or store it in an array for later use.

You can also move it into a function and run that when you need it. But you still need to loop over the output.
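A sketch of that layout, using a hypothetical function name, filer_volumes. Here it just prints sample lines; in practice its body would be the sanlun pipeline. Each time the data is needed, the function's output is looped over with while read (fed through the portable heredoc form above) instead of being stored in one scalar.

Code:

```shell
# Sketch: wrap the data-producing pipeline in a function, then loop over it.
filer_volumes() {
    # Stand-in for the real sanlun | awk | sort -u pipeline (sample data).
    printf '%s\n' 'filer1 volume1' 'filer1 volume2' 'filer2 volume1'
}

n=0
while read -r filer vol; do
    [ -n "$filer" ] || continue    # empty output would still yield one blank line
    echo "would list snapshots for volume $vol on $filer"
    n=$((n + 1))
done <<EOF
$(filer_volumes)
EOF
```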

Your chain of awk|sed|tr can almost certainly also be condensed down into a single awk command, BTW, although sort may still have to be used. I'd have to see an example of the sanlun output before I'd be confident about rewriting it myself however.

vlazdir · 06-07-2013, 09:15 AM

Yes, I agree the sanlun command is rather ugly with all the awks, sed and tr. If you're able to compress it into one awk and one sort -u, I'll be impressed. Don't struggle too much with it, as what I have works fine, though it would be interesting to see how it can be done. Here is the actual output from the sanlun command without any alteration:

Code:

bash-3.00# /opt/NTAP/SANToolkit/bin/sanlun lun show
controller:               lun-pathname                               device filename                   adapter  protocol          lun size         lun state
    filer1:  /vol/volume11/nmpool0_1/nmpool1.lun  /dev/rdsk/c4t60A9800043346552575A564443744851d0s2  qlc1     FCP           30g (32212254720)    GOOD
    filer1:  /vol/volume11/nmpool0_2/nmpool2.lun  /dev/rdsk/c4t60A9800043346552575A564443754869d0s2  qlc1     FCP           30g (32212254720)    GOOD
    filer1:  /vol/volume12/rpool/rpool1.lun       /dev/rdsk/c4t60A9800043346552575A5644436C472Dd0s2  qlc1     FCP           50g (53687091200)    GOOD
    filer2:  /vol/volume12/nmpool0_3/nmpool3.lun  /dev/rdsk/c4t60A980004334655268345644446F4951d0s2  qlc1     FCP           30g (32212254720)    GOOD
    filer2:  /vol/volume12/nmpool0_4/nmpool4.lun  /dev/rdsk/c4t60A98000433465526834564444703863d0s2  qlc1     FCP           30g (32212254720)    GOOD
bash-3.00#

What I need to get out of this is the controller name without its colon (filer1, filer2) and the volume name (volume11, volume12). The sed in the sanlun command I use is merely there to get rid of the colon.

erik

David the H. · 06-07-2013, 09:43 AM

Not a problem at all. As I suspected, it's a relatively simple operation with several possible solutions.

I think the safest and cleanest is probably this:

Code:

awk 'NR>1{ sub(/:$/,"",$1) ; split($2,a,"/") ; print $1 , a[3]}' | sort -u
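To check it against the sample output above without sanlun, the header and the five data lines can be pasted into a here-document. The trailing columns are dropped here because the command only looks at $1 and $2; the function name sample_output is just for illustration.

Code:

```shell
# Sketch: the awk command above run on an abridged copy of the sample
# sanlun output (trailing columns dropped; only $1 and $2 are used).
sample_output() {
    cat <<'EOF'
controller:               lun-pathname
    filer1:  /vol/volume11/nmpool0_1/nmpool1.lun
    filer1:  /vol/volume11/nmpool0_2/nmpool2.lun
    filer1:  /vol/volume12/rpool/rpool1.lun
    filer2:  /vol/volume12/nmpool0_3/nmpool3.lun
    filer2:  /vol/volume12/nmpool0_4/nmpool4.lun
EOF
}

result=$(sample_output | awk 'NR>1{ sub(/:$/,"",$1) ; split($2,a,"/") ; print $1 , a[3]}' | sort -u)
echo "$result"
# filer1 volume11
# filer1 volume12
# filer2 volume12
```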

But if the output is always exactly regular, then you could even go with something much simpler:

Code:

awk -F'[ :/]+' 'NR>1{ print $2 , $4 }' | sort -u

This is assuming there are no whitespace characters in the desired strings either, which could complicate things a bit. In a similar vein, I also don't think you need to insert tab characters, unless you need to tell read to delimit the columns separately from regular spaces. But you can easily insert one into the print command if necessary.
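The simpler field-separator version can be checked the same way. Because the data lines start with spaces, the first field is empty, which is why the controller ends up in $2 and the volume in $4. The tab variant below ("\t" in print) is an optional variation, not part of the original command.

Code:

```shell
# Sketch: the -F'[ :/]+' variant on abridged sample lines.
# For "    filer1:  /vol/volume11/..." the fields are:
#   $1="" $2="filer1" $3="vol" $4="volume11"
lines='controller:               lun-pathname
    filer1:  /vol/volume11/nmpool0_1/nmpool1.lun
    filer2:  /vol/volume12/nmpool0_3/nmpool3.lun'

spaced=$(printf '%s\n' "$lines" | awk -F'[ :/]+' 'NR>1{ print $2 , $4 }' | sort -u)
tabbed=$(printf '%s\n' "$lines" | awk -F'[ :/]+' 'NR>1{ print $2 "\t" $4 }' | sort -u)
echo "$spaced"
echo "$tabbed"
```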

grail · 06-07-2013, 12:41 PM

Actually awk can do the uniq for you too

Code:

awk -F"[ :/]+" 'NR > 1 && !_[$2,$4]++{print $2,$4}'
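The "!seen[key]++" idiom prints a pair only the first time it appears, so no sort -u is needed when first-seen order is acceptable. A sketch on abridged sample lines; it uses the comma subscript seen[$2,$4], which POSIX awk supports, whereas an array-of-arrays form such as _[$2][$4] needs GNU awk 4.0 or later. The array name seen is arbitrary.

Code:

```shell
# Sketch: de-duplicate in awk itself; pairs come out in first-seen order.
lines='controller:               lun-pathname
    filer1:  /vol/volume11/nmpool0_1/nmpool1.lun
    filer1:  /vol/volume11/nmpool0_2/nmpool2.lun
    filer2:  /vol/volume12/nmpool0_3/nmpool3.lun'

pairs=$(printf '%s\n' "$lines" | awk -F'[ :/]+' 'NR > 1 && !seen[$2,$4]++ { print $2, $4 }')
echo "$pairs"
# filer1 volume11
# filer2 volume12
```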

David the H. · 06-09-2013, 04:01 AM

Yeah, I had thought of that, after I finished my last post.

What it can't do easily though is actually sort the input, if that happens to be important. I mean, it can be done, but the command gets so much longer that it's simpler just to pipe it through sort.

But if the command output is already sorted, as it appears to be, then just emulating uniq is no sweat.

Here's my first version modified in the same way:

Code:

awk 'NR>1{ sub(/:$/,"",$1) ; split($2,a,"/") ; if (!_[$1,a[3]]++) { print $1 , a[3] }}'

No doubt grail can make it shorter.