LinuxQuestions.org

Nested loop questions and using a counter to increment array output (https://www.linuxquestions.org/questions/programming-9/nested-loop-questions-and-using-a-counter-to-increment-array-output-860259/)

Nominal Animal 02-09-2011 11:22 AM

grail, how did you come up with that?

In the original script, when a key is encountered, the script outputs the key, copies the next 22 records to standard output, reads the 23rd record, but outputs the value instead. So I think the closest awk equivalent is actually
Code:

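gawk -v "csv=$CSV" '
    BEGIN {
        # Records are separated by any form of newline.
        RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

        # Read the CSV file into a lookup table keyed by the numeric first field.
        FS="[\t ]*,[\t ]*"
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2

        # The plan file uses whitespace-separated fields.
        FS="[\t ]+"
    }

    {
        if ($1 in lookup) {
            value = lookup[$1]

            # Print the key and the next 22 records, reading one record ahead.
            for (i = 1; i <= 23; i++) {
                print $0
                if ((getline) <= 0)
                    break
            }

            # The 23rd record after the key has been read; print the value instead.
            print value
            next
        }

        print $0
    }' "$PLAN" > "$NEW"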

I have a cold, so I'm a bit scattered; what do you think about this?
Nominal Animal

aSingularity 02-09-2011 08:09 PM

This is exactly what I was trying to accomplish. It is also nowhere near the things that I was coming up with. I never even thought about using an array.

Code:

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"

        # No line replacements to do yet.
        split("", replacement)
    }

    ($1 in lookup) {
        # Replace 10th line following this one with the lookup value.
        replacement[NR + 10] = lookup[$1]
        # Empty this line.
        $0 = ""
    }

    (NR in replacement) {
        # This is a replaced line.
        print replacement[NR]
        delete replacement[NR]
        next
    }

    {
        # Output current line.
        print $0
    }' "$PLAN" > "$NEW" ; then

Now, correct me if I'm wrong, but looking at this, this is what I see:
replacement[NR+10] is loading the current line number + 10 into an array. The = lookup[$1] is telling it what initial line number to use? What is the purpose of the replaced line section, is it just counting up lines?
And one last question, is there an awk resource that y'all would recommend? For some reason the syntax gets me, not sure why I'm having a hard time with it.

EDIT: Nominal thinks he's scattered; I didn't realize there was a second page to this thread until I hit the submit button on this post. As far as the original script goes, I'm not sure you really want to base too much off that; it's a rookie's work, heh. Either way, Nominal's suggestion quoted in this post is what I'm shooting for. This may be a dumb question, but doesn't quoting out the $0 = "" in the above snippet accomplish about the same thing as the print $0 that is above the getline? I can't run that snippet at the moment, so I may be way off.

Nominal Animal 02-09-2011 09:20 PM

Quote:

Originally Posted by aSingularity (Post 4253630)
replacement[NR+10] is loading the current line number + 10 into an array. The = lookup[$1] is telling it what initial line number to use?

No. replacement[NR+10] = lookup[$1] means: Use $1 (the first field) to look up the replacement text from the lookup array. Save it in the replacement array, under key (current line + 10).

Note that NR is an automatic variable, the number of the current record.

Quote:

Originally Posted by aSingularity (Post 4253630)
What is the purpose of the replaced line section, is it just counting up lines?

Whenever you get to a record number that has been listed in the replacement array, the replaced line section will output the value from the replacement array instead (and delete the entry to save memory).

You see, the idea in this implementation is that the replacement array contains the desired contents for some future records. Whenever there is a trigger text, it just adds the desired future replacement to the array. Processing always advances one input record at a time, and the output record is decided at the last possible moment.

When your original script encounters the triggering input, it flushes the next 23 lines to get to the place where it must do the replacement. My awk script just notes that the replacement must be done in the future, and progresses normally one record at a time, skipping nothing.
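To make the "note it now, replace it later" idea concrete, here is a minimal sketch with made-up data. It is not the script from this thread: the single hard-coded lookup entry stands in for the CSV file, and the offset variable is a hypothetical parameter set to 3 so the demo stays short (the real script uses a fixed 10).

```shell
#!/bin/sh
# Demo of the deferred-replacement idea with invented input.
# "offset" is a hypothetical variable; the script in this thread hard-codes 10.
deferred=$(printf '%s\n' 1001 a b c d | awk -v offset=3 '
    BEGIN { lookup["1001"] = "Alice" }     # stand-in for the CSV lookup table

    # Trigger: remember the replacement for a future record, blank this one.
    ($1 in lookup) {
        replacement[NR + offset] = lookup[$1]
        $0 = ""
    }

    # A record that was scheduled earlier: print the stored value instead.
    (NR in replacement) {
        print replacement[NR]
        delete replacement[NR]
        next
    }

    # Everything else passes through unchanged.
    { print $0 }
')
printf '%s\n' "$deferred"
```

The key on record 1 is blanked, records 2 and 3 pass through, and record 4 (1 + 3) is printed as Alice. No record is ever skipped or read ahead.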

Quote:

Originally Posted by aSingularity (Post 4253630)
And one last question, is there an awk resource that y'all would recommend?

The GNU Awk User's Guide has served me extremely well. I use the one-page version constantly when writing awk scripts. It is quite verbose, and I've never read it all the way through myself, so I guess starting with the sample programs might be a good idea, using the manual in parallel to learn the syntax, built-in functions, and so on. Awk does have somewhat surprising syntax; I tend to forget the gsub() and split() argument order, for example.
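As a quick reminder of the two argument orders mentioned above, here is a tiny sketch using only standard awk built-ins. The strings are made up for the demo.

```shell
#!/bin/sh
order_demo=$(awk 'BEGIN {
    # gsub(regexp, replacement, target): the string being changed comes last.
    text = "a-b-c"
    count = gsub(/-/, "+", text)

    # split(string, array, separator): the string being split comes first.
    n = split("x,y,z", parts, ",")

    print count, text, n, parts[1] parts[2] parts[3]
}')
printf '%s\n' "$order_demo"
```

gsub() modifies its target in place and returns the number of substitutions, while split() fills the array and returns the number of pieces.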

Quote:

Originally Posted by aSingularity (Post 4253630)
doesn't quoting out the $0 = "" in the above snippet accomplish about the same thing as the print $0 that is above the getline?

I don't think so. You see, $0 = "" will clear the current record. Since there is no next, gawk will check whether any of the other rules match. If the replacement rule does match, then the entire record is replaced by a previously stored string. Otherwise, the final rule will print the current record. In this case it will print an empty record (just a newline), because the record was cleared earlier.

If you omit $0 = "", then the lookup key will be left in the output. (Unless it is replaced because of a previous lookup match exactly ten records earlier.)
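A small side-by-side sketch of that difference, under stated assumptions: the input is invented, one hard-coded lookup entry replaces the CSV file, the offset is 2 instead of 10, and the clear flag and run_demo function are hypothetical names used only to switch the $0 = "" step on and off.

```shell
#!/bin/sh
# run_demo CLEAR: CLEAR=1 keeps the $0 = "" step, CLEAR=0 omits it.
run_demo() {
    printf '%s\n' 1001 a b c | awk -v clear="$1" '
        BEGIN { lookup["1001"] = "Alice" }   # stand-in for the CSV lookup table

        ($1 in lookup) {
            replacement[NR + 2] = lookup[$1]
            if (clear)
                $0 = ""
        }

        (NR in replacement) {
            print replacement[NR]
            delete replacement[NR]
            next
        }

        { print $0 }
    '
}

with_clear=$(run_demo 1)
without_clear=$(run_demo 0)

printf 'with $0 = "":\n%s\n\n' "$with_clear"
printf 'without $0 = "":\n%s\n' "$without_clear"
```

With the clearing step, the first output line is empty; without it, the key 1001 stays in the output. In both cases the third line becomes Alice.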

The surprising thing about your need is that the triggering text (the first fields in the CSV files) is not itself replaced, but just triggers a later replacement, identified by the line/record number. And the triggering text itself is a number.

Normally, the triggering text itself is replaced. Well, sometimes you can have a prearranged set of conflicting replacements (like aaaa->bbbb and aaaa->cccc), and have the trigger select/activate them.

At this point, I'd recommend you take a hard look at your file generation mechanism. Is it forged in iron, or can you revamp it?
If you can, I'd recommend simplifying the replacement mechanism. For example, if instead of counting lines you could just replace marker strings in the input (like, say, <{blah}>), based on a replacement CSV file, this would be much closer to the usual methods. For future extension and compatibility, you could by default replace all <{...}> strings, where ... contains letters and numbers, with an empty string; you'd never need to worry about seeing those in the final output. This would be much more robust, too. It is too easy to mistake seven empty lines for eight empty lines, in my opinion.
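A rough sketch of what such marker replacement could look like, assuming a two-column "name, value" CSV file. The file names, the marker names, and the file contents are all invented for the demo; only the <{...}> marker shape comes from the suggestion above.

```shell
#!/bin/sh
# Hypothetical file names and contents, invented for this sketch.
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf '%s\n' 'pilot, Jane Doe' 'unit, 7th Squadron' > "$tmpdir/names.csv"
printf '%s\n' 'Pilot: <{pilot}>' 'Unit: <{unit}>' 'Notes: <{unused}>' > "$tmpdir/plan.txt"

marker_output=$(awk -v "csv=$tmpdir/names.csv" '
    BEGIN {
        # Load "name, value" pairs into the lookup table.
        FS = "[\t ]*,[\t ]*"
        while ((getline < csv) > 0)
            if (NF >= 2)
                lookup[$1] = $2
        close(csv)
    }
    {
        out = ""
        rest = $0
        # Replace every <{name}> marker; unknown names become empty strings.
        while (match(rest, /<[{][A-Za-z0-9]+[}]>/)) {
            name = substr(rest, RSTART + 2, RLENGTH - 4)
            out = out substr(rest, 1, RSTART - 1)
            if (name in lookup)
                out = out lookup[name]
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
    }' "$tmpdir/plan.txt")

printf '%s\n' "$marker_output"
```

Because the markers carry their own names, inserting or deleting a line in the template cannot shift a replacement onto the wrong line.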
Nominal Animal

grail 02-09-2011 10:27 PM

Quote:

grail, how did you come up with that?
My bad ... got caught up with the line number stuff and thought maybe it was the identifying number from the lookup file.
That was just me being scattered (and I have no excuse :) )

I have been looking over the original script and assuming I have not missed anything, none of the lines from the original file are removed.
Hence where you have $0="", I am not sure this is required.

Also, you could probably forgo the 'next' command and simply assign replacement[NR] to $0 prior to deleting from the array.
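A minimal sketch of how those two suggestions might look together (no $0 = "" and no next). The input data, the single hard-coded lookup entry, and the offset of 2 are assumptions for the demo; the script under discussion uses NR + 10 and a CSV file.

```shell
#!/bin/sh
# Demo data and the offset of 2 are assumptions; the thread uses NR + 10.
no_next=$(printf '%s\n' 1001 a b c | awk '
    BEGIN { lookup["1001"] = "Alice" }     # stand-in for the CSV lookup table

    # Schedule the replacement; the key line itself is left untouched.
    ($1 in lookup) { replacement[NR + 2] = lookup[$1] }

    # Overwrite the record in place instead of printing and using next.
    (NR in replacement) {
        $0 = replacement[NR]
        delete replacement[NR]
    }

    # A single print rule now handles every record.
    { print $0 }
')
printf '%s\n' "$no_next"
```

Since the lookup rule runs before the replacement rule, a substituted value is not itself checked as a key in this arrangement.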

aSingularity 02-09-2011 11:00 PM

Quote:

Originally Posted by Nominal Animal (Post 4253663)
The surprising thing about your need is that the triggering text (the first fields in the CSV files) is not itself replaced, but just triggers a later replacement, identified by the line/record number. And the triggering text itself is a number.

Sir, you and Grail and everyone else have been a huge help. I really appreciate the time you put into your posts.
As background to possibly clear things up as to my needs, the main piece of software that I am forced to use is a bit antiquated. It uses quite a few text-based configuration files, and the software itself leaves a lot to be desired as far as building exercises goes. For instance, the software doesn't allow for naming individuals; it identifies every entity in the game with a number, which is necessary to tie the simulation federates together. The text file itself is basically a template, but the software GUI used to build the scenario doesn't allow for manipulation of quite a few of the fields.

In the case of certain scenarios, we may need to be able to manipulate entities in the game by name (the most common), by unit, or by some other identifying string. The people that use our sims generally come in with predefined rosters. In order to provide that functionality we have to manipulate the config files manually, which is a fairly painstaking process. So far my little bash scripts have worked fairly well, but as exercises get more detailed I don't think they're going to suffice. I could do it with C#, but our systems are locked down and don't allow for the installation of Mono to make this work.

Thanks to the efforts of people like Grail and yourself, I know which direction I need to go to put together something that will be a lot more useful to me in the long run. My plan is to expand upon the basic framework that you gave me and put together something that will be able to manipulate the fields that the software doesn't allow for, without having to reconfigure the script for specific purposes. I'm sure I'll have plenty more questions, heh.

tl;dr: The text file is huge and carved in stone. Entities from the simulation are identified by number. By using that number as a key I can manipulate data dealing with specific entities and the simulation will still recognize them. :)

Nominal Animal 02-10-2011 02:16 PM

Thank you for the information.

You may find it easier to manipulate the output file as an array. For example:
Code:

#!/bin/bash

PLAN=./sample.fplan
CSV=./test.csv

NEW=./sample.fplan.temp

if ! gawk -v "csv=$CSV" -v "plan=$PLAN" '
    BEGIN {
        # Records (lines) are separated by some form of a newline.
        RS="[\v\f]*(\r|\n|\r\n|\n\r)[\v\f]*"

        # Fields are separated by a comma. Eat whitespace around commas.
        FS="[\t ]*,[\t ]*"

        # Read the CSV file. If the first field only contains digits,
        # and there are at least two fields in the record,
        # add the second field to a lookup table keyed by the first field.
        while ((getline < csv) > 0)
            if ($1 ~ /^[0-9]+$/ && NF >= 2)
                lookup[$1] = $2

        # Reset field separators to linear whitespace for the text file.
        FS="[\t ]+"
    }

    {  # Read the plan into line array.
        line[NR] = $0
    }

    END {
        # Whenever a line in the plan matches a key in the
        # lookup array, replace the tenth following line
        # with the lookup value.

        # Reverse order since replacements are done *afterwards*.
        for (i = NR; i > 0; i--)
            if (line[i] in lookup)
              line[i + 10] = lookup[line[i]]

        # Output the modified plan.
        for (i = 1; i <= NR; i++)
            print line[i]

    }' "$PLAN" > "$NEW" ; then

    echo "Error processing $PLAN or $CSV." >&2

    exit 1
else

    if ! mv -b --suffix=.old "$NEW" "$PLAN" ; then

        echo "Cannot replace $PLAN with the new one." >&2

        exit 1
    fi
fi

echo "Done!" >&2

exit 0

Note that this is much easier to understand; all the work is done in the first loop in the END section. The only negative side is that gawk reads both the CSV file and the text file fully into memory, but since these files are at most megabytes in size, that is not an issue; you surely have enough memory to handle them.

I personally like to write my utility scripts as efficient in principle as possible, so I don't normally recommend the above approach, but in your case, I do believe this may very well be the most sensible approach.

Since you can supply variables to gawk using the -v option, you can do pretty complex logic in the END section if you want to, without the script getting too hairy. Feel free to keep the questions coming!
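For instance, the distance between key and replaced line could be passed in with -v rather than hard-coded. This is only a sketch under assumptions: offset is a hypothetical variable name, the input is made up, and one hard-coded lookup entry stands in for the CSV file.

```shell
#!/bin/sh
# "offset" and the inline lookup entry are assumptions made for this demo.
end_demo=$(printf '%s\n' 1001 a b c | awk -v offset=2 '
    BEGIN { lookup["1001"] = "Alice" }     # stand-in for the CSV lookup table

    # Read the whole input into the line array.
    { line[NR] = $0 }

    END {
        # Walk backwards so a freshly written value is never treated as a key.
        for (i = NR; i > 0; i--)
            if (line[i] in lookup)
                line[i + offset] = lookup[line[i]]

        # Output the modified lines; the line count never changes.
        for (i = 1; i <= NR; i++)
            print line[i]
    }')
printf '%s\n' "$end_demo"
```

Running the same script with a different -v offset value changes which following line is replaced, without touching the awk code.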
Nominal Animal

grail 02-10-2011 07:24 PM

Quote:

The only negative side is that gawk reads both the CSV file and the text file fully into memory
I would have thought that your previous script would be preferable, as it would only have the csv file and portions of the other file in memory (as these are the parts stored in an array) ... Just my 2 cents :)

Nominal Animal 02-10-2011 07:55 PM

Quote:

Originally Posted by grail (Post 4254597)
I would have thought that your previous script would be preferable, as it would only have the csv file and portions of the other file in memory (as these are the parts stored in an array) ... Just my 2 cents :)

You are right, except in this specific case the output format is so fragile that it's safer to do the replacements in memory. This way you don't need to worry about lines being removed or inserted by accident.

Sometimes the less efficient method is much easier to maintain in the long term. In my experience, when dealing with text files and legacy data or legacy formats, easy maintenance is much more important than efficiency. Well, as long as the efficiency is not abysmal! And in this case, the latter script is quite efficient, even if it "unnecessarily" reads both files into memory first.
Nominal Animal

grail 02-10-2011 09:28 PM

Fair call :)

james- 01-19-2012 02:06 PM

I have a similar problem to this and was wondering if anyone could help?

I have an until loop that goes through all the staff IDs in a global array. The loop generates an array for each employee and stores a number of values in it. This is the code used to create the array:

Code:

declare -a emp_$emp_count[$value_count]=$value
This gives me an array which I can retrieve all the values from using $emp_1[@] or $emp_2[@]. The trouble I am having is I want to create another loop that loops through all the employees. This is what I tried:

Code:

"Employee $emp_count has the values: ${emp_$emp_count[@]}"
I thought it may work with backticks but the following code does not work either:

Code:

"Employee $emp_count has the values: ${emp_1=`$emp_count`[@]}"
Any help would be greatly appreciated!


All times are GMT -5.