LinuxQuestions.org


aSingularity 02-02-2011 11:41 AM

Nested loop questions and using a counter to increment array output
 
Hey all. I work in a simulations environment. I'm trying to write a bash script that will read fields from a .csv file into an array, the first field being an identifying number and the second field being a corresponding URL. There are about 1600 of these number/URL combinations in the .csv file that I'm reading from. Once that is done, I want it to parse a text file and match the number; when it has a match, I want it to enter the corresponding URL into a particular line in the text file. The script I have written (with the help of the people on this forum a while back) does this well, but now I have a lot more data to parse. I think the script itself is explanatory enough to see what I'm doing. What I would like to do is cut it down to one while loop nested inside another loop so that I don't have 1600 or so elif statements. I can't figure out how to increment the array index. For instance, the first cycle would find the number that matches ${record1[2]} and input the URL stored in ${record1[3]}; the next cycle would match ${record1[4]} and input the URL in ${record1[5]}, and so on. Does that make sense?
The code is below and a sample .csv and text file are attached.
Thanks for any and all help!

Code:

#!/bin/bash

echo
echo "This script will insert URLs in the force planning file."




## This section identifies the .csv file
echo
echo "Please enter the path to the .csv file you will be using"

cnt=0
while true
do
read -r csv
 if [[ -f "$csv" || -f ../"$csv" ]]
  then
  echo "File exists"
  break
  else
  echo "Try again"
 fi

  ((cnt+=1))
  if [[ $cnt -eq 5 ]]
    then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
  fi
done

echo
echo "Please enter the name of the text file you would like to change"

## This section is for identifying the text file
cnt=0
while true
do
read -r fplan
 if [[ -f "$fplan" || -f ../"$fplan" ]]
  then
  echo "File exists"
  break
  else
  echo "Try again"
 fi

  ((cnt+=1))
  if [[ $cnt -eq 5 ]]
    then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
  fi
done

echo
echo "Please wait while the task completes"
echo "Patience is a virtue, possess it if you can"
echo "Seldom found in women, never in a man"

sleep 2                                         

var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.

IFS=',' record1=( ${var1} )  ## Sets the field delimiter and reads the csv file into an array

sleep 2

xterm -e tail -f $PWD/complete &  ## Opens a tail window to see the progress

## This section will parse the csv data into the fplan text file in the proper locations
while read -r LINE
do                                                             
    if [[ "${LINE}" =~ "SystemDeclarationData" ]]             
    then
        echo "$LINE"
        read -r LINE 
        if [[ ${LINE} == ${record1[2]} ]]                 
        then                                                   
            echo "$LINE"                                       
            count=0
            while read -r LINE
            do
            if [[ $count == 23 ]]
            then
            echo "${record1[3]}"
            break
            else echo "$LINE"
            ((count+=1))
            fi
            done
        elif [[ ${LINE} == ${record1[4]} ]]           
        then                                                   
            echo "$LINE"                                       
            count=0
            while read -r LINE
            do
            if [[ $count == 23 ]]
            then
            echo "${record1[5]}"
            break
            else echo "$LINE"
            ((count+=1))
            fi
            done
        elif [[ ${LINE} == ${record1[6]} ]]                         
        then                                                   
            echo "$LINE"                                       
            count=0
            while read -r LINE
            do
            if [[ $count == 23 ]]
            then
            echo "${record1[7]}"
            break
            else echo "$LINE"
            ((count+=1))
            fi
            done
        elif [[ ${LINE} == ${record1[8]} ]]                       
        then                                                   
            echo "$LINE"                                       
            count=0
            while read -r LINE
            do
            if [[ $count == 23 ]]
            then
            echo "${record1[9]}"
            break
            else echo "$LINE"
            ((count+=1))
            fi
            done
        else echo "$LINE"
        fi
    else echo "$LINE"
    fi

done < "$fplan" > $PWD/complete                                                   

sleep 5

echo
echo
echo  "backing up original text file"

cp "$fplan" "$fplan".bak

sleep 5
echo
echo "renaming the changed file to work with the sim environment"

mv $PWD/complete "$fplan"

sleep 3
echo
echo "operation completed"
echo
echo

killall xterm


rustek 02-02-2011 01:52 PM

#prep your loop vars
evenctr=2
let oddctr="10#$evenctr+1"

# within your loop use
${record1[$evenctr]}
${record1[$oddctr]}

let evenctr="10#$evenctr+2"
let oddctr="10#$evenctr+1"

#you will need a way to break the loop, something like this maybe
#I didn't test this
if [ -z "${record1[$oddctr]}" ]; then break; fi

I won't write it for you, but this should get you started.
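For anyone reading this later, a minimal sketch of the counter idea. It assumes the layout of the original script (first id in record1[2], its URL in record1[3], and so on); the function name find_url and the sample data are hypothetical. One for loop stepping the index by 2 takes the place of the whole elif chain:

```shell
#!/bin/bash
# Sketch: walk a flat array of id,url pairs with one counter instead of one
# elif per pair. Assumes the original layout: record1[2] is the first id and
# record1[3] its URL; indices 0 and 1 are skipped.

# find_url ID: print the URL paired with ID; return 1 if ID is not present.
# (find_url is a hypothetical helper name.)
find_url() {
    local id=$1 i
    for (( i = 2; i + 1 < ${#record1[@]}; i += 2 )); do
        if [[ ${record1[i]} == "$id" ]]; then
            printf '%s\n' "${record1[i+1]}"
            return 0
        fi
    done
    return 1
}

# Hypothetical sample data, for illustration only.
record1=(header1 header2 101 http://example.com/a 102 http://example.com/b)

find_url 102    # prints http://example.com/b
```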

Russ

aSingularity 02-02-2011 03:35 PM

Heh. No worries, I don't want it written for me; I just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, and it didn't make sense to me at first.
Many thanks for the reply!

Can you tell me what the 10# does in let evenctr="10#$evenctr+2"?

Perhaps I'm not looking in the right place, but I didn't find it.
Thanks again!

Nominal Animal 02-02-2011 04:00 PM

This would be so much easier if you use gawk (GNU awk) for the lookup and replacement. Consider this skeleton script:
Code:

#!/bin/bash

PLAN=./sample.fplan
CSV=./test.csv

NEW=./sample.fplan.temp

if ! gawk -v "csv=$CSV" '
    BEGIN {
        # Records (lines) are separated by some form of a newline.
        RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

        # Fields are separated by a comma. Eat whitespace around commas.
        FS="[\t ]*,[\t ]*"

        # Read the CSV file. If the first field only contains digits,
        # and there are at least two fields in the record,
        # add the second field to a lookup table keyed by the first field.
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then

    echo "Error processing $PLAN or $CSV." >&2

    exit 1
else

    if ! mv -b --suffix=.old "$NEW" "$PLAN" ; then

        echo "Cannot replace $PLAN with the new one." >&2

        exit 1
    fi
fi

echo "Done!" >&2

exit 0

This does not have the user interface features. You will still need to ask the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.

Note that the gawk script replaces any complete token (separated by whitespace) matching a key in the CSV file. It does not have to be at the start of the line, or even the only thing on the line. I assumed that would be more useful to you. If, however, you only wish to check the first word on each line, change the check loop in the gawk script to
Code:

        # Check if the first field in this record (line) is a lookup key.
        # If yes, replace the first field with the lookup value.
        if ($1 in lookup)
            $1 = lookup[$1]

Hope this helps,
Nominal Animal

rustek 02-02-2011 05:30 PM

Quote:

Originally Posted by aSingularity (Post 4246297)
Heh. No worries, I don't want it written for me; I just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, and it didn't make sense to me at first.
Many thanks for the reply!

Can you tell me what the 10# does in let evenctr="10#$evenctr+2"?

Perhaps I'm not looking in the right place, but I didn't find it.
Thanks again!

The 10# makes sure the math is done in decimal. Sometimes when you're cutting numbers out of strings you get leading zeros, and the number will be taken as octal.

I don't think you need it in your case; I just copied some code from one of my scripts that did need it and left it in.

It can be let evenctr="$evenctr+2"
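A short illustration of both points, assuming bash. The value 09 is a made-up example of a zero-padded number cut out of a string, and the variable names are only for the demo:

```shell
#!/bin/bash
# In bash arithmetic a leading zero means octal, so "09" would be rejected
# with "value too great for base". Prefixing 10# forces base 10.
n=09
sum=$(( 10#$n + 1 ))
echo "$sum"             # prints 10

# A plain assignment does no arithmetic; it just stores the text.
evenctr=2
text="$evenctr+2"
echo "$text"            # prints 2+2

# let (or (( ))) evaluates the expression.
let evenctr="$evenctr+2"
echo "$evenctr"         # prints 4
```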

aSingularity 02-02-2011 07:11 PM

@Nominal Animal: gawk is something I will have to look into; I've never really played with it. I do believe that I am going to play with your script, though. I can see quite a few things I could use gawk for. Thank you for the reply.

@rustek: I appreciate the response. I've been playing with your suggestions and have integrated a version of them not only in this script, but in another one as well. Thanks for the help!!

grail 02-03-2011 12:44 AM

Quote:

This does not have the user interface features. You will still need to ask the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.
This can be done within the BEGIN using getline from "-":
Code:

BEGIN{
    printf "Please enter the path to the .csv file you will be using: "
    getline csv_file_name < "-"
}

As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, like:
Code:

echo "Please enter the path to the .csv file you will be using"
read -r csv
cnt=0

until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    echo "$csv could not be found. Please try again"
    read -r csv
done

if (( cnt > 5 ))
then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
fi

Also I am curious about your sed command:
Code:

var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.
If I understand correctly, the format of the data file is:
Code:

123,data
456,data2

If you run the above it would return the following:
Code:

123,data456,data2
Which I would have thought is not what you want.
Maybe it could just be as simple as:
Code:

records=($(sed 's/,/ /g' $csv))
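As an alternative to the sed version, a sketch that lets read split each line on the comma, so every id and URL stays one array element even if a URL contains spaces or glob characters such as ? (an unquoted $(...) is subject to word splitting and pathname expansion). The function name load_records is hypothetical, and with no header row the first id lands at index 0 rather than 2:

```shell
#!/bin/bash
# Sketch: read "id,url" lines into one flat array: id, url, id, url, ...
# (load_records is a hypothetical helper name.)
load_records() {
    local id url
    records=()
    # The "|| [[ -n $id ]]" keeps a final line that has no trailing newline.
    while IFS=, read -r id url || [[ -n $id ]]; do
        records+=("$id" "${url%$'\r'}")   # also drop a DOS carriage return
    done < "$1"
}
```

Usage would be load_records "$csv", after which ${records[0]} is the first id and ${records[1]} its URL.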
And lastly, maybe for the looping you could try something like:
Code:

exec 3< "$fplan"

while read -ru3 LINE
do
    if [[ "${LINE}" =~ "SystemDeclarationData" ]]
    then
        echo "$LINE"
        read -ru3 LINE
        echo "$LINE"

        found=0
        for (( cnt = 2; cnt < ${#records[@]} && found == 0; cnt += 2 ))
        do
            if [[ $LINE == "${records[cnt]}" ]]
            then
                found=1
                # copy the next 23 lines unchanged
                for (( cnt2 = 0; cnt2 < 23; cnt2++ ))
                do
                    read -ru3 LINE
                    echo "$LINE"
                done
                # read the 24th line and print the URL in its place
                read -ru3 LINE
                echo "${records[cnt+1]}"
            fi
        done
    else
        echo "$LINE"
    fi
done > "$PWD/complete"
exec 3<&-
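If bash 4 or later is available, an associative array (a different technique from the flat array above) avoids scanning all ~1600 pairs for every id. A sketch, assuming the layout described in the first post: a line containing SystemDeclarationData, then a line holding only the id, with the URL going on the 24th line after the id line, as in the original script. The function names load_urls and rewrite_plan are hypothetical:

```shell
#!/bin/bash
# Sketch, assuming bash >= 4 for associative arrays.
declare -A url_for

# load_urls CSV: fill url_for[id]=url from "id,url" lines (hypothetical name).
load_urls() {
    local id url
    while IFS=, read -r id url || [[ -n $id ]]; do
        [[ $id =~ ^[0-9]+$ ]] && url_for[$id]=${url%$'\r'}
    done < "$1"
}

# rewrite_plan: copy stdin to stdout; after each SystemDeclarationData line,
# if the next line is a known id, copy 23 lines and replace the 24th with
# that id's URL (hypothetical name).
rewrite_plan() {
    local line url i
    while IFS= read -r line; do
        printf '%s\n' "$line"
        [[ $line == *SystemDeclarationData* ]] || continue
        IFS= read -r line || break
        printf '%s\n' "$line"
        [[ -n $line ]] || continue                  # empty key is not allowed
        [[ -n ${url_for[$line]+set} ]] || continue  # id not in the CSV
        url=${url_for[$line]}
        for (( i = 0; i < 23; i++ )); do            # copy the next 23 lines
            IFS= read -r line || break 2
            printf '%s\n' "$line"
        done
        IFS= read -r line                           # discard the 24th line...
        printf '%s\n' "$url"                        # ...and print the URL instead
    done
}
```

Usage would be along the lines of load_urls "$csv" followed by rewrite_plan < "$fplan" > "$PWD/complete".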


aSingularity 02-08-2011 05:19 PM

I'm beginning to really like awk for stuff like this. Here's a question, though. This portion of the code replaces the matched lookup key with the lookup value and prints the result ($0). How would I skip lines? For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16. The next pattern in the lookup table would be matched on line 18, so I would want the output of print $0 to appear 10 lines below that, on line 28, and so on. In C# I would just use Console.ReadLine. I tried using getline and the NR variable; unfortunately, my lightbulb is very dim today.

Code:

{
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then


grail 02-08-2011 07:41 PM

You will probably need to set your NR value so that the next read of the file will increment it to the value you are looking for.
Something like:
Code:

if (found what you want)
    NR += 9

So if you are currently at line 6, this sets NR to 15, and the next record read will have NR equal to 16, at which point you can print what you require.
You will probably also need to add something to the if that tells you this is the time to print.

I know I have not given the solution exactly, but figured you might enjoy playing to try and find it.

If you get stuck just let us know :)
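For anyone finding this thread later: assigning to NR only changes the record counter awk reports; it does not move the read position, so the record read after line 6 is still physical line 7. One way to skip ahead without losing the lines in between is to read them with getline and print them as you go. A sketch, assuming "id,url" CSV lines and that the key is the first field of the matching line; replace_tenth is a hypothetical name, and only POSIX awk features are used, so gawk works but is not required:

```shell
#!/bin/bash
# Sketch: keep every line, but replace the line 10 below each matched key.
# usage: replace_tenth CSV PLAN   (hypothetical helper name)
replace_tenth() {
    awk -v csv="$1" '
        BEGIN {
            FS = ","
            while ((getline < csv) > 0)
                if (NF >= 2) lookup[$1] = $2
            close(csv)
            FS = " "                  # back to default whitespace splitting
        }
        ($1 in lookup) {
            url = lookup[$1]
            print                     # keep the matched line (say, line 6)
            for (i = 1; i < 10; i++) {
                if ((getline line) <= 0) exit
                print line            # lines 7..15 pass through unchanged
            }
            if ((getline line) > 0)   # read line 16 ...
                print url             # ... and print the URL in its place
            next
        }
        { print }
    ' "$2"
}
```

Lines consumed by getline are not checked for keys themselves, which fits the case described above where the next key is on line 18.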

ntubski 02-08-2011 08:51 PM

Quote:

Originally Posted by grail (Post 4246711)
As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, ...

I disagree with this: you've repeated the read statement, and repeating code is never a good idea. You might later rename the csv variable and forget to change one of the reads, leading to a bug that only shows up when the user makes a typo.
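A sketch of a prompt loop that keeps the single read inside the loop, so there is nothing to keep in sync. The helper name ask_file and its arguments are hypothetical; unlike the original it checks only the path as typed (no ../ fallback), and it prints prompts to stderr:

```shell
#!/bin/bash
# ask_file PROMPT VARNAME [TRIES]: prompt up to TRIES times (default 5) and
# store the first existing file path in the caller's variable VARNAME.
# VARNAME must not be one of this function's own locals.
# (ask_file is a hypothetical helper name.)
ask_file() {
    local prompt=$1 tries=${3:-5} answer n
    for (( n = 1; n <= tries; n++ )); do
        echo "$prompt" >&2
        IFS= read -r answer || return 1      # EOF: give up
        if [[ -f $answer ]]; then
            printf -v "$2" '%s' "$answer"    # assign to the caller's variable
            return 0
        fi
        echo "$answer could not be found. Please try again" >&2
    done
    echo "Exceeded $tries tries. Quitting." >&2
    return 1
}
```

Usage would be ask_file "Please enter the path to the .csv file you will be using" csv || exit 1, and the same call with fplan for the text file.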

aSingularity 02-08-2011 09:41 PM

Alright sir, I am a little stuck. I have tried quite a few combinations of tests. I was reading through the GNU Awk User's Guide, and now I think I am overcomplicating things. So far I have managed either to not change the output at all or to clear the document; nothing in between. For what it's worth, the csv file I'm playing with has 1555 rows and 2 columns. The text document has 3,636,436 lines. Yes, they are deliberately massive. Some lines contain text and some are blank. Thanks to Nominal Animal's initial little gawk script, I have managed to pick up some very useful stuff, and a few good hours of entertainment. Now I'm going to ask for another hint: I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
Thanks again fellas.

grail 02-08-2011 09:52 PM

Quote:

I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
I might need a little more information on what 'destroying the data' means.
First point is that awk does not change the original file, so there should be no data loss from that point of view.

Are you referring to the fact that the skipped lines are not in the new file (because I sort of thought this was the point)?

Quote:

I disagree with this, you've repeated the read statement, repeating code is never a good idea.
Agreed that repeating is not a good idea. However, I did not wish to make the code so obscure that it confused the situation,
and I was following the format the user had at the time, asking the question outside the loop. Without the repeated read it could look like:
Code:

cnt=0
until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    (( cnt > 1 )) && echo "$csv could not be found. Please try again"
    echo "Please enter the path to the .csv file you will be using"
    read -r csv
done


aSingularity 02-08-2011 10:04 PM

Quote:

Originally Posted by grail (Post 4252601)
I might need a little more information on what 'destroying the data' means?
First point is that awk does not change the original file so there should be no data loss from this point of view.

Are you referring to the fact that the skipped lines are not in the new file (cause I sort of thought this was the point)?

As far as destroying the data goes, a few times that I tried, I ended up with completely blank output. The part I am having a hard time with (I think) is this: in my mind, I have already made the match, so I should be able to query the line number of the match (it's my understanding that NR increments as the file is parsed), and then, once I hit the correct number of lines (NR += 9), print the data from the lookup table. It seems to me that it should be simple. This is how I was doing it in my original bash script. Perhaps I'm thinking too much along those lines?

Code:

count=0
while read -r LINE
do
    if [[ $count == 23 ]]
    then
        echo "${record1[3]}"
        break
    else
        echo "$LINE"
        ((count+=1))
    fi
done

P.S. Regarding the sed command you asked about above: I know next to nothing about sed. That line was on a page of "useful sed one-liners" and looked like it did what I was looking for, so I incorporated it. ;)

Nominal Animal 02-08-2011 11:11 PM

Quote:

Originally Posted by aSingularity (Post 4252425)
For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16.

Use a line counter or replacement line array. A line counter can hold only one pending replacement, so I suggest using an array.

Change this
Code:

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then

to this
Code:

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"

        # No line replacements to do yet.
        split("", replacement)
    }

    ($1 in lookup) {
        # Replace 10th line following this one with the lookup value.
        replacement[NR + 10] = lookup[$1]
        # Empty this line.
        $0 = ""
    }

    (NR in replacement) {
        # This is a replaced line.
        print replacement[NR]
        delete replacement[NR]
        next
    }

    {
        # Output current line.
        print $0
    }' "$PLAN" > "$NEW" ; then

This adds an initially empty array, replacement, which holds the replacements for future records.

The first new section will check if the first field is something to look up; if yes, it adds the 10th following record to be replaced by the lookup value, then empties the record. (You can add next here if you want to skip this line from output, but note that the replacements count input records, not output records; be sharp with your record counts.)

The second section checks if the current record is in the replacement array. If yes, it prints the replacement and removes the entry from the array to save memory. Note the next statement: it skips directly to the next record. Without it, the third section would also run and print the current record.

The third section just prints out the current record.

Did this help?
Nominal Animal

grail 02-09-2011 04:04 AM

Hey Nominal ... I wonder if we are missing the forest for the trees here :)
It appears that the line to be replaced is always $1 from the first file plus a constant (please correct me if I am wrong, OP).
So your awk could do the following:
Code:

if ! gawk -v "csv=$CSV" '
BEGIN {
    # Records (lines) are separated by some form of a newline.
    RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

    # Fields are separated by a comma. Eat whitespace around commas.
    FS="[\t ]*,[\t ]*"

    # Read the CSV file. If the first field only contains digits,
    # and there are at least two fields in the record,
    # store the second field keyed by the target line number
    # (the first field plus 10).
    while ((getline < csv) > 0)
        if ($1 ~ /^[0-9]+$/ && NF >= 2)
            lookup[$1 + 10] = $2

    # Reset field separators to linear whitespace for the text file.
    FS="[\t ]+"
}

# Replace the whole line if its line number is a key; the 1 prints every line.
(NR in lookup) { $0 = lookup[NR] }
1' "$PLAN" > "$NEW" ; then
    echo "Error processing $PLAN or $CSV." >&2
    exit 1
fi



All times are GMT -5.