LinuxQuestions.org
Old 02-02-2011, 11:41 AM   #1
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Rep: Reputation: 0
Nested loop questions and using a counter to increment array output


Hey all. I work in a simulations environment. I'm trying to write a bash script that reads fields from a .csv file into an array, the first field being an identifying number and the second field being the corresponding URL. There are about 1600 of these number/URL combinations in the .csv file that I'm reading from. Once that is done, I want it to parse a text file and match the number; when it finds a match, I want it to enter the corresponding URL on a particular line of the text file. The script I have written (with the help of the people on this forum a while back) does this well, but now I have a lot more data to parse. I think the script itself is explanatory enough to see what I'm doing. What I would like to do is cut it down to one while loop nested inside another loop, so that I don't have 1600 or so elif statements. I can't figure out how to increment the index used to read from the array. For instance, the first cycle would find the number that matches ${record1[2]} and input the URL stored in ${record1[3]}. The next cycle would match ${record1[4]} and input the URL in ${record1[5]}, and so on. Does that make sense?
The code is below and a sample .csv and text file are attached.
Thanks for any and all help!

Code:
#!/bin/bash

echo
echo "This script will insert URLs in the force planning file."




## This section identifies the .csv file
echo 
echo "Please enter the path to the .csv file you will be using"

cnt=0
while true
do
read -r csv
 if [[ -f "$csv" || -f ../"$csv" ]]
   then
   echo "File exists"
   break
   else
   echo "Try again"
 fi

  ((cnt+=1))
  if [[ $cnt -eq 5 ]]
    then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
  fi
done

echo 
echo "Please enter the name of the text file you would like to change"

## This section is for identifying the text file
cnt=0
while true
do
read -r fplan
 if [[ -f "$fplan" || -f ../"$fplan" ]]
   then
   echo "File exists"
   break
   else
   echo "Try again"
 fi

  ((cnt+=1))
  if [[ $cnt -eq 5 ]]
    then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
  fi
done

echo
echo "Please wait while the task completes"
echo "Patience is a virtue, possess it if you can"
echo "Seldom found in women,  never in a man"

sleep 2                                          

var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters. 

IFS=',' record1=( ${var1} )  ## Sets the field delimiter and reads the csv file into an array

sleep 2

xterm -e tail -f $PWD/complete &  ## Opens a tail window to see the progress

## This section will parse the csv data into the fplan text file in the proper locations
while read -r LINE
do                                                              
    if [[ "${LINE}" =~ "SystemDeclarationData" ]]              
    then
	echo "$LINE"
	read -r LINE  
	if [[ ${LINE} == ${record1[2]} ]]                   
	then                                                    
	    echo "$LINE"	                                
	    count=0
	    while read -r LINE
	    do
	    if [[ $count == 23 ]]
	    then
	    echo "${record1[3]}"
	    break
	    else echo "$LINE"
	    ((count+=1))
	    fi
	    done
	elif [[ ${LINE} == ${record1[4]} ]]            
	then                                                    
	    echo "$LINE"	                                
	    count=0
	    while read -r LINE
	    do
	    if [[ $count == 23 ]]
	    then
	    echo "${record1[5]}"
	    break
	    else echo "$LINE"
	    ((count+=1))
	    fi
	    done
	elif [[ ${LINE} == ${record1[6]} ]]                          
	then                                                    
	    echo "$LINE"	                                
	    count=0
	    while read -r LINE
	    do
	    if [[ $count == 23 ]]
	    then
	    echo "${record1[7]}"
	    break
	    else echo "$LINE"
	    ((count+=1))
	    fi
	    done
	elif [[ ${LINE} == ${record1[8]} ]]                         
	then                                                    
	    echo "$LINE"	                                
	    count=0
	    while read -r LINE
	    do
	    if [[ $count == 23 ]]
	    then
	    echo "${record1[9]}"
	    break
	    else echo "$LINE"
	    ((count+=1))
	    fi
	    done
	else echo "$LINE"
	fi
    else echo "$LINE"
    fi

done < "$fplan" > $PWD/complete                                                     

sleep 5

echo
echo
echo  "backing up original text file"

cp "$fplan" "$fplan".bak

sleep 5
echo
echo "renaming the changed file to work with the sim environment"

mv $PWD/complete "$fplan"

sleep 3
echo
echo "operation completed"
echo
echo

killall xterm
Attached Files
File Type: txt sample.fplan.txt (485 Bytes, 5 views)
File Type: txt test.csv.txt (86 Bytes, 5 views)
 
Old 02-02-2011, 01:52 PM   #2
rustek
Member
 
Registered: Jan 2010
Location: Melbourne, IA, USA
Distribution: Ubuntu
Posts: 93

Rep: Reputation: 8
#prep your loop vars
evenctr=2
let oddctr="10#$evenctr+1"

# within your loop use
${record1[$evenctr]}
${record1[$oddctr]}

let evenctr="10#$evenctr+2"
let oddctr="10#$evenctr+1"

#you will need a way to break the loop, something like this maybe
#I didn't test this
if [ -z "${record1[$oddctr]}" ]; then break; fi

I won't write it for you, but this should get you started.

Russ
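
For readers who want to see the two-counter idea on something small, here is a minimal sketch. The array contents and the function name lookup_url are made up for illustration; it only assumes, as in the original script, that the key/URL pairs start at index 2 of record1. It is not the full solution, just the stepping pattern.

```shell
#!/bin/bash
# Hypothetical flattened CSV data: indexes 0 and 1 hold a header,
# key/URL pairs start at index 2 (as in the original script).
record1=(id url 101 http://example.com/a 102 http://example.com/b)

# lookup_url KEY: walk the array two elements at a time and print the
# URL stored right after the matching key. (Function name is illustrative.)
lookup_url() {
    local key=$1 evenctr=2 oddctr
    while (( evenctr < ${#record1[@]} )); do
        oddctr=$(( evenctr + 1 ))
        if [[ $key == "${record1[evenctr]}" ]]; then
            printf '%s\n' "${record1[oddctr]}"
            return 0
        fi
        evenctr=$(( evenctr + 2 ))
    done
    return 1    # ran off the end of the array without a match
}

lookup_url 102
```

Bounding the loop by ${#record1[@]} avoids needing a separate "is the next element empty" test to break out.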
 
Old 02-02-2011, 03:35 PM   #3
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Original Poster
Rep: Reputation: 0
Heh. No worries, I don't want it written for me, just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, so it didn't make sense to me at first.
Many thanks for the reply!

Can you tell me what the 10# does in let evenctr="10#$evenctr+2"?

Perhaps I'm not looking in the right place, but I didn't find it.
Thanks again!

Last edited by aSingularity; 02-02-2011 at 03:54 PM.
 
Old 02-02-2011, 04:00 PM   #4
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
This would be so much easier if you use gawk (GNU awk) for the lookup and replacement. Consider this skeleton script:
Code:
#!/bin/bash

PLAN=./sample.fplan
CSV=./test.csv

NEW=./sample.fplan.temp

if ! gawk -v "csv=$CSV" '
    BEGIN {
        # Records (lines) are separated by some form of a newline.
        RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

        # Fields are separated by a comma. Eat whitespace around commas.
        FS="[\t ]*,[\t ]*"

        # Read the CSV file. If the first field only contains digits,
        # and there are at least two fields in the record,
        # add the second field to a lookup table keyed by the first field.
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then

    echo "Error processing $PLAN or $CSV." >&2

    exit 1
else

    if ! mv -b --suffix=.old "$NEW" "$PLAN" ; then

        echo "Cannot replace $PLAN with the new one." >&2

        exit 1
    fi
fi

echo "Done!" >&2

exit 0
This does not have the user interface features. You will still need to ask for the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.

Note that the gawk script replaces any complete token (separated by whitespace) matching a key in the CSV file. It does not have to be at the start of the line, or even the only thing on the line. I assumed that would be more useful to you. If, however, you only wish to check the first word on each line, change the check loop in gawk into
Code:
# Check if the first field in this record (line) is a lookup key.
# If yes, replace the first field with the lookup value.
if ($1 in lookup)
    $1 = lookup[$1]
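
To make the difference between the two variants concrete, here is a small sketch on toy data. The key 101, the URL, and the temporary file names are invented for the example, and it uses plain awk with a simplified CSV reader rather than the full script above.

```shell
#!/bin/bash
# Toy data (hypothetical key and URL, not taken from the attached files).
tmp=$(mktemp -d)
printf '101,http://example.com/a\n' > "$tmp/test.csv"
printf '101 first\nsecond 101\n' > "$tmp/plan.txt"

# Shared BEGIN block: load the CSV into a lookup table keyed by the first column.
load='BEGIN { while ((getline line < csv) > 0) { split(line, f, ","); lookup[f[1]] = f[2] } }'

# Variant 1: replace a matching token anywhere on the line.
any_field=$(awk -v csv="$tmp/test.csv" "$load"'
    { for (i = 1; i <= NF; i++) if ($i in lookup) $i = lookup[$i]; print }' "$tmp/plan.txt")

# Variant 2: replace only when the first field matches.
first_field=$(awk -v csv="$tmp/test.csv" "$load"'
    { if ($1 in lookup) $1 = lookup[$1]; print }' "$tmp/plan.txt")

printf 'any field:\n%s\n\nfirst field only:\n%s\n' "$any_field" "$first_field"
rm -rf "$tmp"
```

With this input, only the first variant touches the line "second 101", because the key is not the first word there.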
Hope this helps,
Nominal Animal

Last edited by Nominal Animal; 03-21-2011 at 07:26 AM.
 
Old 02-02-2011, 05:30 PM   #5
rustek
Member
 
Registered: Jan 2010
Location: Melbourne, IA, USA
Distribution: Ubuntu
Posts: 93

Rep: Reputation: 8
Quote:
Originally Posted by aSingularity View Post
Heh. No worries, I don't want it written for me, just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, so it didn't make sense to me at first.
Many thanks for the reply!

Can you tell me what the 10# does in let evenctr="10#$evenctr+2"?

Perhaps I'm not looking in the right place, but I didn't find it.
Thanks again!
The 10# makes sure the math is done in decimal. Sometimes when you're cutting numbers out of text to use them, you get leading zeros, and the number would otherwise be taken as octal.

I don't think you need it in your case, I just copied some code from one of my scripts that did need it and left it in.

it can be let evenctr="$evenctr+2"
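
A quick sketch of the leading-zero effect, in case it helps. The variable names and the sample value 017 are only for illustration.

```shell
#!/bin/bash
# A number cut out of some text, with a leading zero (illustrative value).
field="017"

# Without a base prefix, bash arithmetic reads a leading zero as octal:
# 017 octal is 15 decimal, so this yields 16.
as_octal=$(( field + 1 ))

# With 10# the digits are read as decimal 17, so this yields 18.
as_decimal=$(( 10#$field + 1 ))

echo "octal reading: $as_octal, decimal reading: $as_decimal"

# A value such as 08 or 09 is worse: it is not valid octal at all, so
# arithmetic on it without the 10# prefix fails with an error.
```

The counters in this thread start at 2 and never carry a leading zero, which is why the prefix is optional here.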
 
Old 02-02-2011, 07:11 PM   #6
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Original Poster
Rep: Reputation: 0
@nominal animal: gawk is something I will have to look into. Never really played with it. I do believe that I am going to play with your script though; I can see quite a few things I could use gawk for. Thank you for the reply.

@rustek: appreciate the response. I've been playing with your suggestions and have integrated a version of them not only in this script, but in another also. Thanks for the help!!
 
Old 02-03-2011, 12:44 AM   #7
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,425

Rep: Reputation: 1876
Quote:
This does not have the user interface features. You will still need to ask for the file names ($PLAN, $CSV) and construct a temporary file name ($NEW) before the gawk script is run.
This can be done within the BEGIN using getline from "-":
Code:
BEGIN{
    printf "Please enter the path to the .csv file you will be using: "
    getline csv_file_name < "-"
}
As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, like:
Code:
echo "Please enter the path to the .csv file you will be using"
read -r csv
cnt=0

until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    echo "$csv could not be found. Please try again"
    read -r csv
done

if (( cnt > 5 ))
then
    echo "exceeded 5 tries. quitting. Do you know what you're doing?"
    exit
fi
Also I am curious about your sed command:
Code:
var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.
If I understand the format of the data file to be:
Code:
123,data
456,data2
If you run the above it would return the following:
Code:
123,data456,data2
Which I would have said is not what you want?
Maybe it could just be simple like:
Code:
records=($(sed 's/,/ /g' "$csv"))
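
Another way to fill the array without joining the lines at all is to let read split each row on the comma. This is only a sketch: the sample rows are the two from the example above, written to a temporary file, and it assumes no field contains a comma. Unlike the word-splitting version, it also keeps a URL intact if it happens to contain spaces.

```shell
#!/bin/bash
# Sample CSV with the two rows used in the example above.
csv=$(mktemp)
printf '123,data\n456,data2\n' > "$csv"

# Read one row at a time; IFS=, applies only to the read command,
# so the global field separator is left alone.
records=()
while IFS=, read -r key value; do
    records+=("$key" "$value")
done < "$csv"
rm -f "$csv"

# Keys end up at even indexes and values at odd indexes, starting from 0.
printf '%s\n' "${#records[@]}" "${records[2]}" "${records[3]}"
```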
And lastly, maybe for the looping you could try something like:
Code:
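# Open the plan file on file descriptor 3 so every read uses the same stream.
exec 3< "$fplan"

while read -ru3 LINE
do
    if [[ $LINE =~ SystemDeclarationData ]]
    then
        echo "$LINE"
        read -ru3 LINE
        echo "$LINE"

        # Keys sit at even indexes from 2 on, each followed by its URL.
        for (( cnt = 2; cnt < ${#records[@]}; cnt += 2 ))
        do
            if [[ $LINE == "${records[cnt]}" ]]
            then
                # Copy the next 23 lines through unchanged.
                for (( cnt2 = 0; cnt2 < 23; cnt2++ ))
                do
                    read -ru3 LINE
                    echo "$LINE"
                done
                # Consume the line being replaced and print the URL instead.
                read -ru3 LINE
                echo "${records[cnt+1]}"
                break
            fi
        done
    else
        echo "$LINE"
    fi
done > "$PWD/complete"
exec 3<&-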
 
Old 02-08-2011, 05:19 PM   #8
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Original Poster
Rep: Reputation: 0
I'm beginning to really like awk for stuff like this. Here's a question though. This portion of the code replaces the matched lookup key with the lookup value and prints the line held in $0. How would I skip lines? For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16. The next pattern in the lookup table would be matched on line 18, so I would want the output of print $0 to appear 10 lines below that, on line 28, and so on. In C# I would just use Console.ReadLine. I tried using getline and the NR variable; unfortunately, my lightbulb is very dim today.

Code:
{
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then
 
Old 02-08-2011, 07:41 PM   #9
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,425

Rep: Reputation: 1876
You will probably need to set your NR value so that the next read of the file will increment it to the value you are looking for.
Something like:
Code:
if (found what you want)
    NR += 9
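So this will set NR to 15 if currently at line 6; the next record read is then numbered 16 (assigning to NR only changes the counter, it does not skip any input lines), and at that point you can print what you require.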
You will probably also need to add something to the if that lets you know this is the time to print.

I know I have not given the solution exactly, but figured you might enjoy playing to try and find it.

If you get stuck just let us know
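
One thing worth checking while experimenting: assigning to NR renumbers the records but does not skip any input. A tiny sketch with three made-up input lines shows the effect.

```shell
#!/bin/bash
# Bump NR by 9 on the first record, then print every record with its number.
out=$(printf 'a\nb\nc\n' | awk 'NR == 1 { NR += 9 } { print NR ": " $0 }')
echo "$out"
```

All three lines still come out, numbered 10, 11 and 12, so any skipping has to be done by counting records or by calling getline explicitly.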
 
Old 02-08-2011, 08:51 PM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Debian
Posts: 2,395

Rep: Reputation: 814
Quote:
Originally Posted by grail View Post
As for your bash code, personally I find an infinite loop that is later broken out of using an if to be pointless.
Just put your testing in the while loop section, ...
I disagree with this: you've repeated the read statement, and repeating code is never a good idea. You might later rename the csv variable and forget to change one of the reads, leading to a bug that only shows up when the user makes a typo.
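
One way to get a single read and a bounded number of tries is to wrap the prompt in a function. This is only a sketch: the function name ask_for_file and its two arguments are invented here, and it reuses the same "file here or in the parent directory" test as the original script.

```shell
#!/bin/bash
# ask_for_file PROMPT MAX_TRIES
# Prompts on stderr, reads names from stdin, and prints the first name
# that refers to an existing file. Returns 1 after MAX_TRIES failures.
# (Function name and interface are illustrative.)
ask_for_file() {
    local prompt=$1 max=$2 name tries=0
    while (( tries < max )); do
        echo "$prompt" >&2
        read -r name || return 1        # end of input: give up
        if [[ -f $name || -f ../$name ]]; then
            printf '%s\n' "$name"
            return 0
        fi
        echo "$name could not be found. Please try again" >&2
        tries=$(( tries + 1 ))
    done
    echo "exceeded $max tries. quitting." >&2
    return 1
}

# Intended use in the script (reads from the terminal):
#   csv=$(ask_for_file "Please enter the path to the .csv file you will be using" 5) || exit 1
#   fplan=$(ask_for_file "Please enter the name of the text file you would like to change" 5) || exit 1
```

The same function serves both prompts, so the read and the retry limit each appear only once.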
 
Old 02-08-2011, 09:41 PM   #11
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Original Poster
Rep: Reputation: 0
Alright sir, I am a little stuck. I have tried quite a few combinations of tests. I was reading through the GNU awk user's guide and now I think I am overcomplicating things. So far I have managed either to change the output not at all or to clear the document, with nothing in between. Now for what it's worth, the csv file I'm playing with has 1555 rows and 2 columns. The text document has 3,636,436 lines. Yes, they are deliberately massive. Some lines contain text and some are blank. Thanks to Nominal Animal's initial little gawk script, I have managed to pick up some very useful stuff, and a few good hours of entertainment. Now I'm going to ask for another hint. I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
Thanks again fellas.
 
Old 02-08-2011, 09:52 PM   #12
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,425

Rep: Reputation: 1876
Quote:
I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
I might need a little more information on what 'destroying the data' means?
First point is that awk does not change the original file so there should be no data loss from this point of view.

Are you referring to the fact that the skipped lines are not in the new file (cause I sort of thought this was the point)?

Quote:
I disagree with this, you've repeated the read statement, repeating code is never a good idea.
Agreed that repeating is not a good idea, however, I did not wish to make the code too obscure so as to confuse the situation
and I was following the similar format the user had at the time to ask the question outside the loop:
Code:
cnt=0
until [[ -f $csv || -f ../$csv ]] || (( cnt++ == 5 ))
do
    (( cnt > 1 )) && echo "$csv could not be found. Please try again"
    echo "Please enter the path to the .csv file you will be using"
    read -r csv
done
 
Old 02-08-2011, 10:04 PM   #13
aSingularity
LQ Newbie
 
Registered: Mar 2010
Posts: 13

Original Poster
Rep: Reputation: 0
Quote:
Originally Posted by grail View Post
I might need a little more information on what 'destroying the data' means?
First point is that awk does not change the original file so there should be no data loss from this point of view.

Are you referring to the fact that the skipped lines are not in the new file (cause I sort of thought this was the point)?
As far as destroying the data goes, a few times that I have tried I ended up with completely blank output. The part that I am having a hard time with (I think) is this: in my mind, I have already made the match, so I should be able to query the line number of the match (it's my understanding that NR increments as the file is parsed) and then, once I hit the correct number of lines (NR += 9), print the data from the lookup table. It seems to me that it should be simple. This is how I was doing it in my original bash script. Perhaps I'm thinking too much along those lines?

Code:
 count=0
	    while read -r LINE
	    do
	    if [[ $count == 23 ]]
	    then
	    echo "${record1[3]}"
	    break
	    else echo "$LINE"
	    ((count+=1))
	    fi
P.S. In reference to the sed command you asked about above, I know next to nothing about sed. That line was on a page of "useful sed one-liners" and looked like it did what I was looking for, so I incorporated it.

Last edited by aSingularity; 02-08-2011 at 10:07 PM.
 
Old 02-08-2011, 11:11 PM   #14
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 942
Quote:
Originally Posted by aSingularity View Post
For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16.
Use a line counter or replacement line array. A line counter can hold only one pending replacement, so I suggest using an array.

Change this
Code:
        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {
        # Check if any of the fields in this record (line)
        # is a lookup key. If yes, replace with the lookup value.
        for (i = 1; i <= NF; i++)
            if ($i in lookup)
                $i = lookup[$i]

        # Output the (possibly modified) line.
        print $0
    }' "$PLAN" > "$NEW" ; then
to this
Code:
        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"

        # No line replacements to do yet.
        split("", replacement)
    }

    ($1 in lookup) {
        # Replace 10th line following this one with the lookup value.
        replacement[NR + 10] = lookup[$1]
        # Empty this line.
        $0 = ""
    }

    (NR in replacement) {
        # This is a replaced line.
        print replacement[NR]
        delete replacement[NR]
        next
    }

    {
        # Output current line.
        print $0
    }' "$PLAN" > "$NEW" ; then
This adds an initially empty array, replacement, which contains the replacements for future records.

The first new section will check if the first field is something to look up; if yes, it adds the 10th following record to be replaced by the lookup value, then empties the record. (You can add next here if you want to skip this line from output, but note that the replacements count input records, not output records; be sharp with your record counts.)

The second section checks if the current record is in the replacement array. If yes, it prints the replacement, and removes the entry from the array to save memory. Note the next statement; it skips directly to the next record; normally the third section would also be run.

The third section just prints out the current record.
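
Here is the same idea on a four-line toy input, as a rough sketch. The key 101, the URL, and the offset of 2 (instead of 10) are made up to keep the example short, and unlike the script above it leaves the matched line in place rather than emptying it.

```shell
#!/bin/bash
# Toy plan: the key is on line 1, and the line to replace is 2 lines below it.
out=$(printf '101\nline a\nOLD\nline b\n' | awk -v offset=2 '
    BEGIN { lookup["101"] = "http://example.com/a" }    # hypothetical table entry

    # Key found: schedule a replacement for a later record.
    ($1 in lookup) { replacement[NR + offset] = lookup[$1] }

    # Scheduled record reached: print the replacement instead of the line.
    (NR in replacement) { print replacement[NR]; delete replacement[NR]; next }

    # Everything else passes through unchanged.
    { print }')
echo "$out"
```

The line "OLD" is swapped for the URL while the lines in between pass through untouched. Because entries are deleted once used, the array only ever holds the replacements that are still pending.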

Did this help?
Nominal Animal

Last edited by Nominal Animal; 03-21-2011 at 07:08 AM.
 
Old 02-09-2011, 04:04 AM   #15
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,425

Rep: Reputation: 1876
Hey Nominal ... I wonder if we are missing the forest for the trees here
It appears that the line to be replaced is always $1 from first file plus a constant away (please correct me if I am wrong OP).
So this would mean your awk could do the following:
Code:
if ! gawk -v "csv=$CSV" '
BEGIN {
    # Records (lines) are separated by some form of a newline.
    RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

    # Fields are separated by a comma. Eat whitespace around commas.
    FS="[\t ]*,[\t ]*"  

    # Read the CSV file. If the first field only contains digits,
    # and there are at least two fields in the record,
    # add the second field to a lookup table keyed by the first field.
    while ((getline < csv) > 0) 
        if ($1 ~ /^[0-9]+$/ && NF >= 2)         
            lookup[$1 + 10] = $2

    # Reset field separators to linear whitespace for the text file.
    FS="[\t ]+"
}   

(NR in lookup){ $0 = lookup[NR] }1' "$PLAN" > "$NEW" ; then

    echo "Error processing $PLAN or $CSV." >&2

    exit 1
fi
 
  


All times are GMT -5. The time now is 09:05 PM.
