LinuxQuestions.org - Nested loop questions and using a counter to increment array output


Hey all. I work in a simulations environment. I'm trying to write a bash script that will read fields from a .csv file into an array, the first field being an identifying number and the second field being a corresponding URL. There are about 1600 of these number/URL combinations in the .csv file that I'm reading from. Once that is done, I want it to parse a text file and match the number; when it has a match, I want it to enter the corresponding URL into a particular line in the text file. The script I have written (with the help of the people on this forum a while back) does this well, but now I have a lot more data to parse. I think the script itself is explanatory enough to see what I'm doing. What I would like to do is cut it down to one while loop nested inside another loop so that I don't have 1600 or so elif statements. I can't figure out how to increment the array index on each pass. For instance, the first cycle would find the number that matches ${record1[2]} and input the URL stored in ${record1[3]}. The next cycle would match ${record1[4]} and input the URL in ${record1[5]}, and so on. Does that make sense?
The code is below and a sample .csv and text file are attached.
Thanks for any and all help!

Code:

#!/bin/bash

echo
echo "This script will insert URLs in the force planning file."

## This section identifies the .csv file
echo
echo "Please enter the path to the .csv file you will be using"

cnt=0
while true
do
    read -r csv
    if [[ -f "$csv" || -f ../"$csv" ]]
    then
        echo "File exists"
        break
    else
        echo "Try again"
    fi

    ((cnt+=1))
    if [[ $cnt -eq 5 ]]
    then
        echo "exceeded 5 tries. quitting. Do you know what you're doing?"
        exit
    fi
done

echo
echo "Please enter the name of the text file you would like to change"

## This section is for identifying the text file
cnt=0
while true
do
    read -r fplan
    if [[ -f "$fplan" || -f ../"$fplan" ]]
    then
        echo "File exists"
        break
    else
        echo "Try again"
    fi

    ((cnt+=1))
    if [[ $cnt -eq 5 ]]
    then
        echo "exceeded 5 tries. quitting. Do you know what you're doing?"
        exit
    fi
done

echo
echo "Please wait while the task completes"
echo "Patience is a virtue, possess it if you can"
echo "Seldom found in women, never in a man"

sleep 2

var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.

IFS=',' record1=( ${var1} )  ## Sets the field delimiter and reads the csv file into an array

sleep 2

xterm -e tail -f $PWD/complete &  ## Opens a tail window to see the progress

## This section will parse the csv data into the fplan text file in the proper locations
while read -r LINE
do
    if [[ "${LINE}" =~ "SystemDeclarationData" ]]
    then
        echo "$LINE"
        read -r LINE
        if [[ ${LINE} == ${record1[2]} ]]
        then
            echo "$LINE"
            count=0
            while read -r LINE
            do
                if [[ $count == 23 ]]
                then
                    echo "${record1[3]}"
                    break
                else
                    echo "$LINE"
                    ((count+=1))
                fi
            done
        elif [[ ${LINE} == ${record1[4]} ]]
        then
            echo "$LINE"
            count=0
            while read -r LINE
            do
                if [[ $count == 23 ]]
                then
                    echo "${record1[5]}"
                    break
                else
                    echo "$LINE"
                    ((count+=1))
                fi
            done
        elif [[ ${LINE} == ${record1[6]} ]]
        then
            echo "$LINE"
            count=0
            while read -r LINE
            do
                if [[ $count == 23 ]]
                then
                    echo "${record1[7]}"
                    break
                else
                    echo "$LINE"
                    ((count+=1))
                fi
            done
        elif [[ ${LINE} == ${record1[8]} ]]
        then
            echo "$LINE"
            count=0
            while read -r LINE
            do
                if [[ $count == 23 ]]
                then
                    echo "${record1[9]}"
                    break
                else
                    echo "$LINE"
                    ((count+=1))
                fi
            done
        else
            echo "$LINE"
        fi
    else
        echo "$LINE"
    fi
done < "$fplan" > $PWD/complete

sleep 5

echo
echo
echo "backing up original text file"

cp "$fplan" "$fplan".bak

sleep 5
echo
echo "renaming the changed file to work with the sim environment"

mv $PWD/complete "$fplan"

sleep 3
echo
echo "operation completed"
echo
echo

killall xterm

#prep your loop vars
evenctr=2
let oddctr="10#$evenctr+1"

# within your loop use
key="${record1[$evenctr]}"   # the number to match
url="${record1[$oddctr]}"    # the URL that goes with it

let evenctr="10#$evenctr+2"
let oddctr="10#$evenctr+1"

#you will need a way to break the loop, something like this maybe
#I didn't test this
if [ -z "${record1[$oddctr]}" ]; then break; fi

I won't write it for you, but this should get you started.

Russ
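Putting Russ's two counters into a complete loop, a sketch might look like the following. It assumes, as in the original script, that the number/URL pairs start at index 2; the sample array and the helper name lookup_url are made up for illustration.

```bash
#!/bin/bash
# Sample data (made up): elements 0 and 1 stand in for whatever precedes the
# pairs in the original record1; number/URL pairs start at index 2.
record1=(id url 101 http://example.com/a 102 http://example.com/b 103 http://example.com/c)

# lookup_url NUMBER (hypothetical helper): print the URL paired with NUMBER,
# stepping through the array two elements at a time.
lookup_url() {
    local key=$1 evenctr oddctr
    for (( evenctr = 2; evenctr + 1 < ${#record1[@]}; evenctr += 2 )); do
        oddctr=$(( evenctr + 1 ))
        if [[ ${record1[evenctr]} == "$key" ]]; then
            printf '%s\n' "${record1[oddctr]}"
            return 0
        fi
    done
    return 1    # no pair has that number
}

lookup_url 102    # prints http://example.com/b
```

In the main loop, the whole elif chain could then become a single call such as url=$(lookup_url "$LINE"), followed by a check of its exit status. Each call still scans the array from the start, which is fine for a few thousand pairs.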

Heh. No worries, I don't want it written for me; I just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, and it didn't make sense to me at first.
Many thanks for the reply!

Can you tell me what the 10# does in let evenctr="10#$evenctr+2"?

Perhaps I'm not looking in the right place, but I didn't find it.
Thanks again!

This would be so much easier if you use gawk (GNU awk) for the lookup and replacement. Consider this skeleton script:

Code:

#!/bin/bash

PLAN=./sample.fplan
CSV=./test.csv

NEW=./sample.fplan.temp

if ! gawk -v "csv=$CSV" '
    BEGIN {
        # Records (lines) are separated by some form of a newline.
        RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

        # Fields are separated by a comma. Eat whitespace around commas.
        FS="[\t ]*,[\t ]*"

        # Read the CSV file. If the first field only contains digits,
        # and there are at least two fields in the record,
        # add the second field to a lookup table keyed by the first field.
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then
    echo "Error processing $PLAN or $CSV." >&2
    exit 1
else
    if ! mv -b --suffix=.old "$NEW" "$PLAN" ; then
        echo "Cannot replace $PLAN with the new one." >&2
        exit 1
    fi
fi

echo "Done!" >&2
exit 0

This does not have the user interface features. You will still need to ask for the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.
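For the temporary file, one common approach (an assumption, not part of the script above) is mktemp with a template next to the plan file, so that the final mv is a rename within the same directory. A trap removes the leftover file if the script stops early. The default ./sample.fplan is just the example name used above.

```bash
#!/bin/bash
PLAN=${PLAN:-./sample.fplan}    # example name; normally the answer to a prompt

# Create a unique, empty temporary file in the same directory as the plan.
NEW=$(mktemp "${PLAN}.XXXXXX") || exit 1

# If the script exits before "mv" moves $NEW into place, delete the leftover.
# After a successful mv, $NEW no longer exists and rm -f does nothing.
trap 'rm -f "$NEW"' EXIT

echo "Temporary output file: $NEW"
```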

Note that the gawk script replaces any complete token (separated by whitespace) matching a key in the CSV file. It does not have to be at the start of the line, or even the only thing on the line. I assumed that would be more useful to you. If, however, you only wish to check the first word on each line, change the check loop in gawk into

Code:

# Check if the first field in this record (line) is a lookup key.
# If yes, replace the first field with the lookup value.
if ($1 in lookup)
    $1 = lookup[$1]

Hope this helps,

Nominal Animal
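One side effect of the token-replacing loop is worth knowing: assigning to any field makes awk rebuild $0 from the fields joined by OFS (a single space by default), so on a line that gets a replacement, runs of spaces and tabs collapse. Lines with no match are printed exactly as read. A small check using plain awk and a made-up key:

```bash
#!/bin/bash
# One made-up key; the first input line has three spaces and a tab in it,
# the second line contains no key at all.
out=$(printf 'x   42\ty\na   b\n' | awk '
    BEGIN { lookup["42"] = "URL" }
    {
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]
        print $0
    }')
echo "$out"    # first line rebuilt as "x URL y"; second line keeps its spacing
```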

Quote:

Originally Posted by aSingularity (Post 4246297)

The 10# makes sure the math is done in decimal. Sometimes, when you're cutting numbers out of text to use, you get leading zeros, and the number will be taken as octal.

I don't think you need it in your case; I just copied some code from one of my scripts that did need it and left it in.

It can be let evenctr="$evenctr+2"
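A quick illustration of the 10# prefix, and of why the let has to stay. Note that a value such as 08 or 09 is not just misread but rejected by bash as an invalid octal number, which is the usual way this bites.

```bash
#!/bin/bash
n=010                      # e.g. a zero-padded number cut out of a text field

octal=$(( n ))             # leading 0 means octal: 8
decimal=$(( 10#$n ))       # 10# forces base 10: 10
echo "$octal $decimal"     # prints "8 10"

# let (or $(( ))) is what makes the right-hand side arithmetic.
evenctr=2
let evenctr="$evenctr+2"   # evenctr is now 4
text="$evenctr+2"          # plain assignment: just the string "4+2"
echo "$evenctr $text"      # prints "4 4+2"
```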

@nominal animal: gawk is something I will have to look into; I've never really played with it. I do believe that I am going to play with your script, though. I can see quite a few things I could use gawk for. Thank you for the reply.

@rustek: appreciate the response. I've been playing with your suggestions and have integrated a version of them not only in this script, but in another also. Thanks for the help!!

Quote:

This does not have the user interface features. You will still need to ask the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.

This can be done within the BEGIN block using getline from "-" (standard input):

Code:

BEGIN {
    printf "Please enter the path to the .csv file you will be using: "
    getline csv_file_name < "-"
}

As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, like:

Code:

echo "Please enter the path to the .csv file you will be using"
read -r csv
cnt=1    # one attempt has already been made

until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    echo "$csv could not be found. Please try again"
    read -r csv
done

if (( cnt > 5 ))
then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit 1
fi

Also I am curious about your sed command:

Code:

var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.

If I understand the format of the data file to be:

Code:

123,data

456,data2

If you run the above it would return the following:

Code:

123,data456,data2

Which I would have thought is not what you want.
Maybe it could just be as simple as:

Code:

set -f                             # URLs can contain ? or *; don't let the shell glob them
records=($(sed 's/,/ /g' "$csv"))
set +f
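Since every key is a number and every value a URL, another option (not one proposed in the thread) is a bash 4 associative array, which turns each lookup into a single subscript instead of a walk over all ~1600 pairs. A minimal sketch, assuming the CSV holds number,URL on each line; the array name url_for and the sample data are made up:

```bash
#!/bin/bash
# Needs bash 4 or later for associative arrays (declare -A).
declare -A url_for

# Made-up sample CSV: number,URL per line (the ? shows globbing is not an issue).
csv=$(mktemp)
printf '%s\n' 'id,url' '101,http://example.com/a?x=1' '102,http://example.com/b' > "$csv"

while IFS=, read -r num url; do
    [[ $num =~ ^[0-9]+$ ]] || continue   # skip the header and any junk lines
    url_for[$num]=${url%$'\r'}           # drop a trailing CR from DOS-style files
done < "$csv"
rm -f "$csv"

echo "${url_for[102]}"                   # prints http://example.com/b
```

Because IFS=, is set only for the read command, it does not leak into the rest of the script the way a standalone IFS=',' assignment does.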

And lastly, maybe for the looping you could try something like:

Code:

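exec 3< "$fplan"                     # read the text file on fd 3; stdin stays free

while read -ru3 LINE
do
    if [[ "${LINE}" =~ "SystemDeclarationData" ]]
    then
        echo "$LINE"
        read -ru3 LINE
        echo "$LINE"

        found=0
        # Pairs start at index 2, as in record1: number at cnt, URL at cnt+1.
        for (( cnt = 2; cnt < ${#records[@]} && !found; cnt += 2 ))
        do
            if [[ $LINE == "${records[cnt]}" ]]
            then
                found=1
                # Copy the next 23 lines unchanged...
                for (( cnt2 = 0; cnt2 < 23; cnt2++ ))
                do
                    read -ru3 LINE
                    echo "$LINE"
                done
                # ...then replace the 24th line with the URL, as the original script does.
                read -ru3 LINE
                echo "${records[cnt+1]}"
            fi
        done
    else
        echo "$LINE"
    fi
done > "$PWD/complete"

exec 3<&-                            # close fd 3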

I'm beginning to really like awk for stuff like this. Here's a question, though. This portion of the code replaces the matched lookup key with its lookup value, and the result ends up in $0. How would I skip lines? For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16. The next pattern in the lookup table would be matched on line 18, so I would want the output of print $0 to appear 10 lines below that, on line 28, and so on. In C# I would just use Console.ReadLine. I tried using getline and the NR variable; unfortunately, my lightbulb is very dim today.

Code:

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then

You will probably need to set your NR value so that the next read of the file will increment it to the value you are looking for.
Something like:

Code:

if (found what you want)

    NR += 9

So if you are currently at line 6, this sets NR to 15; when the next line is read, NR becomes 16 and you can print what you require.
You will probably also need to add something to the if that lets you know this is the time to print.

I know I have not given the solution exactly, but figured you might enjoy playing to try and find it.

If you get stuck just let us know :)
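Before relying on NR arithmetic, note that assigning to NR only renumbers records: awk still reads every input line in order, so nothing is skipped, and a test for NR == 16 would fire on what is really input line 7. A quick demonstration with seq and plain awk:

```bash
#!/bin/bash
# Twenty numbered input lines; bump NR by 9 when record 6 is read.
out=$(seq 20 | awk 'NR == 6 { NR += 9 } { print NR ": " $0 }')
printf '%s\n' "$out" | sed -n '6,8p'
# 15: 6
# 16: 7
# 17: 8
```

All 20 input lines are still printed; only the numbers attached to them change.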

Quote:

Originally Posted by grail (Post 4246711)

As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, ...

I disagree with this: you've repeated the read statement, and repeating code is never a good idea. You might later rename the csv variable and forget to change one of the reads, leading to a bug that only shows up when the user makes a typo.
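One way to satisfy both points (a single read, and no infinite loop with a break) is to move the prompt into a function that loops a fixed number of times. A sketch under a few assumptions: ask_file is a made-up name, the prompt and error messages go to stderr so that $(...) captures only the answer, and unlike the original scripts it does not also accept ../$path.

```bash
#!/bin/bash
# ask_file PROMPT [MAX_TRIES]: hypothetical helper. Prints the path of an existing
# file on stdout, or returns 1 after MAX_TRIES failed answers (or at end of input).
ask_file() {
    local prompt=$1 max_tries=${2:-5} try path
    for (( try = 1; try <= max_tries; try++ )); do
        printf '%s: ' "$prompt" >&2          # prompt on stderr, not captured by $(...)
        read -r path || return 1             # end of input: give up
        if [[ -f $path ]]; then
            printf '%s\n' "$path"
            return 0
        fi
        echo "$path could not be found. Please try again" >&2
    done
    return 1
}
```

It would be used as csv=$(ask_file "Please enter the path to the .csv file you will be using") || { echo "exceeded 5 tries. quitting." >&2; exit 1; }, and the same call works for the text file.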

Alright sir, I am a little stuck. I have tried quite a few combinations of tests. I was reading through the GNU awk user's guide and now I think I am overcomplicating things. So far I have managed either to not change the output at all or to clear the document; nothing in between. For what it's worth, the CSV file I'm playing with has 1555 rows and 2 columns, and the text document has 3,636,436 lines. Yes, they are deliberately massive. Some lines contain text and some are blank. Thanks to Nominal Animal's initial little gawk script, I have picked up some very useful stuff, and a few good hours of entertainment. Now I'm going to ask for another hint. I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
Thanks again, fellas.

Quote:

I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.

I might need a little more information on what 'destroying the data' means.
First, awk does not change the original file, so there should be no data loss from that point of view.

Are you referring to the fact that the skipped lines are not in the new file (because I thought that was the point)?

Quote:

I disagree with this, you've repeated the read statement, repeating code is never a good idea.

Agreed that repeating is not a good idea. However, I did not wish to make the code so obscure that it confused the situation,
and I was following the user's existing format, which asked the question outside the loop. Without the repetition it could look like:

Code:

cnt=0
until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    (( cnt > 1 )) && echo "$csv could not be found. Please try again"
    echo "Please enter the path to the .csv file you will be using"
    read -r csv
done

Quote:

Originally Posted by grail (Post 4252601)

As far as destroying the data goes, a few of the times I tried, I ended up with completely blank output. The part I am having a hard time with (I think) is this: in my mind, I have already made the match, so I should be able to query the line number of the match (it's my understanding that NR increments as the file is parsed), then once I hit the correct number of lines (NR += 9), print the data from the lookup table. It seems to me that it should be simple. This is how I was doing it in my original bash script. Perhaps I'm thinking too much along those lines?

Code:

count=0
while read -r LINE
do
    if [[ $count == 23 ]]
    then
        echo "${record1[3]}"
        break
    else
        echo "$LINE"
        ((count+=1))
    fi
done

P.S. Regarding the sed command you asked about above: I know next to nothing about sed. That line was on a page of "useful sed one-liners" and looked like it did what I was looking for, so I incorporated it. ;)
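For reference, here is what that one-liner does, checked against grail's two-line example above. This uses GNU sed syntax; BSD sed does not accept a semicolon after a label. The result confirms grail's concern: "data" and "456" run together, so splitting on commas afterwards yields 123, data456 and data2.

```bash
#!/bin/bash
# GNU sed: ":a" defines a label, "N" appends the next line to the pattern space,
# "$!ba" jumps back to a unless this is the last line, and "s/\n//g" then
# deletes every newline from the accumulated text.
joined=$(printf '123,data\n456,data2\n' | sed ':a;N;$!ba;s/\n//g')
echo "$joined"                    # prints 123,data456,data2

# tr gives the same result without the loop.
joined_tr=$(printf '123,data\n456,data2\n' | tr -d '\n')
```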

Quote:

Originally Posted by aSingularity (Post 4252425)

For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16.

Use a line counter or replacement line array. A line counter can hold only one pending replacement, so I suggest using an array.

Change this

Code:

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then

to this

Code:

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"

        # No line replacements to do yet.
        split("", replacement)
    }

    ($1 in lookup) {
        # Replace 10th line following this one with the lookup value.
        replacement[NR + 10] = lookup[$1]
        # Empty this line.
        $0 = ""
    }

    (NR in replacement) {
        # This is a replaced line.
        print replacement[NR]
        delete replacement[NR]
        next
    }

    {
        # Output current line.
        print $0
    }' "$PLAN" > "$NEW" ; then

This adds an initially empty array, replacement, which holds the replacements for future records.

The first new section checks whether the first field is something to look up; if so, it schedules the 10th following record to be replaced by the lookup value, then empties the current record. (You can add next here if you want to drop this line from the output entirely, but note that the replacements count input records, not output records; be careful with your record counts.)

The second section checks whether the current record is in the replacement array. If so, it prints the replacement and removes the entry from the array to save memory. Note the next statement: it skips directly to the next record; without it, the third section would also run and print the original line.

The third section just prints out the current record.

Did this help?

Nominal Animal
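A small, self-contained check of the replacement-array approach, using plain awk (the same program runs under gawk), a made-up one-entry CSV and a generated 20-line plan whose line 3 holds the key:

```bash
#!/bin/bash
# Made-up test data: key 5 maps to a URL, and line 3 of a 20-line plan is "5".
csv=$(mktemp); plan=$(mktemp)
trap 'rm -f "$csv" "$plan"' EXIT
printf '5,http://example.com/five\n' > "$csv"
for i in $(seq 1 20); do
    if (( i == 3 )); then echo 5; else echo "line$i"; fi
done > "$plan"

result=$(awk -v csv="$csv" '
    BEGIN {
        FS = ","
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2
        close(csv)
        FS = " "                       # default whitespace splitting for the plan
        split("", replacement)
    }
    ($1 in lookup) { replacement[NR + 10] = lookup[$1]; $0 = "" }
    (NR in replacement) { print replacement[NR]; delete replacement[NR]; next }
    { print $0 }
' "$plan")

printf '%s\n' "$result" | sed -n '13p'   # prints http://example.com/five
```

Line 3 comes out empty, line 13 is the URL, and every other line passes through unchanged.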

Hey Nominal ... I wonder if we are missing the forest for the trees here :)
It appears that the number of the line to be replaced is always $1 from the first file plus a constant (please correct me if I am wrong, OP).
So this would mean your awk could do the following:

Code:

if ! gawk -v "csv=$CSV" '
BEGIN {
    # Records (lines) are separated by some form of a newline.
    RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

    # Fields are separated by a comma. Eat whitespace around commas.
    FS="[\t ]*,[\t ]*"

    # Read the CSV file. If the first field only contains digits,
    # and there are at least two fields in the record,
    # add the second field to a lookup table keyed by the first field plus 10,
    # that is, by the number of the line to replace.
    while ((getline < csv) > 0)
        if ($1 ~ /^[0-9]+$/ && NF >= 2)
            lookup[$1 + 10] = $2

    # Reset field separators to linear whitespace for the text file.
    FS="[\t ]+"
}

# Swap in the lookup value for listed line numbers; the pattern 1 prints every line.
(NR in lookup){ $0 = lookup[NR] }1' "$PLAN" > "$NEW" ; then
    echo "Error processing $PLAN or $CSV." >&2
    exit 1
fi
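To see grail's shortcut in isolation: the CSV key is treated as a line number, the lookup table is keyed by that number plus 10, and the trailing 1 is an always-true pattern whose default action prints the record. This only fits if the numbers in the CSV really are line numbers in the plan, which is grail's assumption above. A sketch with made-up data, using plain awk:

```bash
#!/bin/bash
# Made-up data: CSV key 3 is taken to be a line number, so line 3 + 10 = 13
# of a 20-line input is replaced.
csv=$(mktemp)
trap 'rm -f "$csv"' EXIT
printf '3,http://example.com/three\n' > "$csv"

result=$(seq 20 | awk -v csv="$csv" '
    BEGIN {
        FS = ","
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1 + 10] = $2    # key: number of the line to replace
        close(csv)
        FS = " "
    }
    (NR in lookup) { $0 = lookup[NR] }  # swap in the lookup value
    1                                   # always-true pattern: print every line
')
printf '%s\n' "$result" | sed -n '12,14p'
```

Unlike the replacement-array version, the keyed line itself (line 3 here) is left untouched.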