Nested loop questions and using a counter to increment array output
Hey all. I work in a simulations environment. I'm trying to write a bash script that will read fields from a .csv file into an array, the first field being an identifying number and the second field being a corresponding URL. There are about 1600 of these number/URL combinations in the .csv file that I'm reading from. Once that is done, I want it to parse a text file and match the number; when it has a match, I want it to enter the corresponding URL into a particular line in the text file. The script I have written (with the help of the people on this forum a while back) does this well, but now I have a lot more data to parse. I think the script itself is explanatory enough to see what I'm doing. What I would like to do is cut it down to one while loop nested inside another loop so that I don't have 1600 or so elif statements. I can't figure out how to increment the index used to read from the array. For instance, the first cycle would find the number that matches ${record1[2]} and input the URL stored in ${record1[3]}. The next cycle would match ${record1[4]} and input the URL in ${record1[5]}, and so on. Does that make sense?
The code is below and a sample .csv and text file are attached. Thanks for any and all help! Code:
#!/bin/bash |
#prep your loop vars
evenctr=2
let oddctr="10#$evenctr+1"
# within your loop use ${record1[$evenctr]} and ${record1[$oddctr]}
let evenctr="10#$evenctr+2"
let oddctr="10#$evenctr+1"
# you will need a way to break the loop, something like this maybe
# (I didn't test this)
if [ -z "${record1[$oddctr]}" ]; then break; fi
I won't write it for you, but this should get you started. Russ |
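For anyone following along, here is a minimal sketch of how the counter idea above might look as a complete loop. The contents of record1 are made up for illustration (two placeholder entries, then number/URL pairs starting at index 2, as in the original question), and the matched array is a hypothetical stand-in for whatever the real script does with each pair.

```shell
#!/bin/bash
# Hypothetical data: indexes 0 and 1 are placeholders, pairs start at index 2.
record1=(x x 101 http://example.com/a 102 http://example.com/b)

evenctr=2      # index of the current identifying number
matched=()     # collects "number=url" strings, for illustration only

# Keep going until the even index runs off the end of the array.
while [ -n "${record1[$evenctr]}" ]; do
    oddctr=$(( evenctr + 1 ))          # the URL sits right after its number
    matched+=("${record1[$evenctr]}=${record1[$oddctr]}")
    echo "number ${record1[$evenctr]} -> url ${record1[$oddctr]}"
    evenctr=$(( evenctr + 2 ))         # step to the next pair
done
```

The loop condition doubles as the break test, so no separate "if ... break" is needed in this form.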
Heh. No worries, I don't want it written for me, just need some guidance.
It took me a minute, but I see what you're doing there. This is really my first time using an array, so it didn't make sense to me at first. Many thanks for the reply! Can you tell me what the 10# does in let evenctr="10#$evenctr+2"? Perhaps I'm not looking in the right place, but I didn't find it. Thanks again! |
This would be so much easier if you use gawk (GNU awk) for the lookup and replacement. Consider this skeleton script:
Code:
#!/bin/bash
Note that the gawk script replaces any complete token (separated by whitespace) matching a key in the CSV file. It does not have to be at the start of the line, or even the only thing on the line. I assumed that would be more useful to you. If, however, you only wish to check the first word in the file, change the check loop in gawk into
Code:
# Check if the first field in this record (line) is a lookup key. Nominal Animal |
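Since the script itself did not survive in the post above, here is a rough sketch of the token-replacement idea it describes. This is not the original script: the sample files, the replace_tokens function name, and the url array are all assumptions made for illustration, and it is written to run under any POSIX awk (gawk included).

```shell
#!/bin/bash
# Hypothetical sample files standing in for the real CSV and text file.
csv=$(mktemp)
txt=$(mktemp)
printf '101,http://example.com/a\n102,http://example.com/b\n' > "$csv"
printf 'id 101 here\nnothing\n102\n' > "$txt"

# replace_tokens CSV TEXT: print TEXT with every whitespace-separated token
# that matches a CSV key replaced by the corresponding value.
replace_tokens() {
    awk '
        # First file (the CSV): remember each key and its value.
        NR == FNR { if (split($0, f, ",") >= 2) url[f[1]] = f[2]; next }
        # Second file: swap any field that is a known key, then print the line.
        { for (i = 1; i <= NF; i++) if ($i in url) $i = url[$i]; print }
    ' "$1" "$2"
}

result=$(replace_tokens "$csv" "$txt")
echo "$result"
rm -f "$csv" "$txt"
```

One thing to watch: assigning to a field makes awk rebuild the line with single spaces between fields, so runs of whitespace on changed lines get collapsed. Lines without a match are printed untouched.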
Quote:
I don't think you need it in your case; I just copied some code from one of my scripts that did need it and left it in. It can be let evenctr="$evenctr+2" |
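For reference, the 10# prefix tells bash arithmetic to read the digits that follow as base 10. It only matters when a value can have a leading zero, which bash would otherwise treat as octal. A small sketch, with a made-up value in n:

```shell
#!/bin/bash
# Hypothetical counter value with a leading zero, as might come from a text field.
n=010

# Without a base prefix, bash arithmetic reads a leading 0 as octal: 010 is 8.
as_octal=$(( n + 1 ))

# With 10#, the digits are read as decimal: 010 is 10.
as_decimal=$(( 10#$n + 1 ))

echo "octal reading:   $as_octal"    # prints 9
echo "decimal reading: $as_decimal"  # prints 11
```

A counter that starts at 2 and only ever has 2 added to it never gets a leading zero, which is why the prefix is harmless but unnecessary in this thread's loop.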
@nominal animal: gawk is something I will have to look into. I've never really played with it. I do believe that I am going to play with your script though; I can see quite a few things I could use gawk for. Thank you for the reply.
@rustek: I appreciate the response. I've been playing with your suggestions and have integrated a version of them not only in this script, but in another also. Thanks for the help!! |
Quote:
Code:
BEGIN{
Just put your testing in the while loop section, like:
Code:
echo "Please enter the path to the .csv file you will be using"
Code:
var1=`cat $csv | sed ':a;N;$!ba;s/\n//g'` ## Identifies the csv file and removes the newline characters.
Code:
123,data
Code:
123,data456,data2
Maybe it could just be simple like:
Code:
records=($(sed 's/,/ /g' $csv))
Code:
exec 3<&0 |
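As an alternative to joining the lines with sed, here is a hedged sketch that reads the CSV one row at a time, so each number stays next to its URL in the array. The sample file contents reuse the 123,data / 456,data2 rows from above; the variable names id and url are assumptions.

```shell
#!/bin/bash
# Hypothetical two-row CSV matching the sample rows shown above.
csv=$(mktemp)
printf '123,data\n456,data2\n' > "$csv"

records=()
# Split each row on the comma; the || test also picks up a last row
# that has no trailing newline.
while IFS=, read -r id url || [ -n "$id" ]; do
    records+=("$id" "$url")
done < "$csv"

# Walk the array two entries at a time: number, then its url.
for (( i = 0; i < ${#records[@]}; i += 2 )); do
    echo "number ${records[i]} -> url ${records[i+1]}"
done
rm -f "$csv"
```

Unlike the unquoted records=($(...)) form, this does not word-split on spaces inside a field, though it still assumes there are no commas inside the first field.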
I'm beginning to really like awk for stuff like this. Here's a question, though. This portion of the code replaces the matched lookup key with the string located in $0. How would I skip lines? For example, I match the pattern on line 6. I want the contents of $0 to be printed ten lines below line 6, on line 16. The next pattern in the lookup table would be matched on line 18, so I would want the output of print $0 to appear 10 lines below that, on line 28, and so on. In C# I would just use Console.ReadLine. I tried using getline and the NR variable; unfortunately, my lightbulb is very dim today.
Code:
{ |
You will probably need to set your NR value so that the next read of the file will increment it to the value you are looking for.
Something like: Code:
if (found what you want) You will probably also need to add something to the if that lets you know this is the time to print. I know I have not given the solution exactly, but figured you might enjoy playing to try and find it. If you get stuck just let us know :) |
Quote:
|
Alright sir, I am a little stuck. I have tried quite a few combinations of tests. I was reading through the GNU awk user's guide and now I think I am overcomplicating things. So far I have managed either to change the output not at all or to clear the document, nothing in between. Now, for what it's worth, the csv file I'm playing with has 1555 rows and 2 columns. The text document has 3,636,436 lines. Yes, they are deliberately massive. Some lines contain text and some are blank. Thanks to Nominal Animal's initial little gawk script, I have managed to pick up some very useful stuff, and a few good hours of entertainment. Now I'm going to ask for another hint. I can't for the life of me get it to print after a specified number of lines without destroying the data that it's skipping over.
Thanks again fellas. |
Quote:
First point is that awk does not change the original file so there should be no data loss from this point of view. Are you referring to the fact that the skipped lines are not in the new file (cause I sort of thought this was the point)? Quote:
and I was following the similar format the user had at the time to ask the question outside the loop: Code:
cnt=0 |
Quote:
Code:
count=0 |
Quote:
Change this Code:
# Reset field separators to linear whitespace for the text file. Code:
# Reset field separators to linear whitespace for the text file.
The first new section will check if the first field is something to look up; if yes, it adds the 10th following record to be replaced by the lookup value, then empties the record. (You can add next here if you want to skip this line from output, but note that the replacements count input records, not output records; be sharp with your record counts.)
The second section checks if the current record is in the replacements array. If yes, it prints the replacement, and removes the entry from the array to save memory. Note the next statement; it skips directly to the next record; normally the third section would also be run.
The third section just prints out the current record.
Did this help? Nominal Animal |
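Because the code blocks in the post above were cut off, here is a rough reconstruction of the three sections as described. It is a sketch under assumptions, not the original script: the delayed_replace function, the pending array name, and the sample files are invented, and the offset is passed as a parameter (2 here, so the sample stays short; the thread uses 10).

```shell
#!/bin/bash
# Hypothetical sample files: one lookup row and a four-line text file.
csv=$(mktemp)
txt=$(mktemp)
printf '101,http://example.com/a\n' > "$csv"
printf '101\na\nb\nc\n' > "$txt"

# delayed_replace OFFSET CSV TEXT
delayed_replace() {
    awk -v offset="$1" '
        # Load the lookup table from the first file (the CSV).
        NR == FNR { if (split($0, f, ",") >= 2) url[f[1]] = f[2]; next }
        # Section 1: first field is a key, so schedule the value for the
        # record OFFSET lines further on, then empty this record.
        ($1 in url) { pending[FNR + offset] = url[$1]; $0 = "" }
        # Section 2: this record is scheduled, so print the replacement
        # instead, forget the entry, and skip section 3.
        (FNR in pending) { print pending[FNR]; delete pending[FNR]; next }
        # Section 3: print the current record.
        { print }
    ' "$2" "$3"
}

result=$(delayed_replace 2 "$csv" "$txt")
echo "$result"
rm -f "$csv" "$txt"
```

With the sample data, the matched line 1 comes out empty and line 3 ("b") is replaced by the URL. The line counts refer to input records of the text file (FNR), as the post points out.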
Hey Nominal ... I wonder if we are missing the forest for the trees here :)
It appears that the line to be replaced is always $1 from first file plus a constant away (please correct me if I am wrong, OP). So this would mean your awk could do the following: Code:
if ! gawk -v "csv=$CSV" ' |