LinuxQuestions.org - awk comparing 2 rows and counting

secret

03-12-2013 07:59 AM

awk comparing 2 rows and counting

Hi. I have a file that looks like this:

1 1 2 3
2 1 3 2
3 1 2 3
4 2 1 3

and so on. Column 1 is a time and the remaining columns are temperatures that may swap.
I'd like to compare line 1 with line 2 in columns 2 to 3: if the value is the same, count 0; if it differs, count 1. Then compare lines 2 and 3, and so on until the end. In the end I want to know how many times the value in each of columns 2-4 changed from one line to the next. I'd like to use an awk script, but I have no clue how to define this.
awk '{current = $NF; getline; if ($NF == current) print "match"; else print "mismatch"}' file
I found this in another thread. It compares the lines and tells you whether it's a match or a mismatch. Instead of that output, I'd like a count at the end of how many mismatches there have been, I think.
Thanks for the help :D

millgates

03-12-2013 08:07 AM

You'll have to "remember" the values from the previous line in each iteration. So, for each line:
1) compare the values stored from the previous iteration to the values from the current line.
2) if the numbers are different, increment the counter.
3) copy the values from the current line to the variables so you can access them in the next iteration.

You will also have to think about how to treat the first line.
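The three steps above can be sketched for a single column as follows. This is only a minimal illustration, not necessarily what the posters had in mind: the function name count_col2_changes and the variable names prev and count are made up for this example, and the first line is handled by simply not comparing it to anything.

```shell
# Sample data from the first post: time in column 1, temperatures in columns 2-4.
sample='1 1 2 3
2 1 3 2
3 1 2 3
4 2 1 3'

# Hypothetical helper: count how often column 2 differs from the line before.
count_col2_changes() {
    awk '
        NR > 1 && $2 != prev { count++ }   # steps 1 and 2: compare, then count
        { prev = $2 }                      # step 3: remember for the next line
        END { print count + 0 }            # + 0 prints 0 instead of an empty line
    '
}

printf '%s\n' "$sample" | count_col2_changes
```

With the sample data, column 2 reads 1, 1, 1, 2, so this prints 1.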

secret

03-12-2013 08:33 AM

OK, sorry for the German. What I said was: I think maybe like this?

awk ' {current=$NF ; getline ; if ($NF != current)} print "++; else 0"

Though the 0 could mean that it's not adding anything but just writing a zero when there is no change.
The first row can't be compared to anything before it, so I guess the value should be 0? I've never programmed anything before, so I'm kind of confused :/

Maybe more like this?

awk '{current=$NF ; getline ; if($NF!=current{count++})} print "count"

grail

03-12-2013 09:12 AM

OK ... so putting that into Google Translate helped a little :)

So getline is not needed at all. NR is the current line number, so it can be used to tell whether you are on line 1 or elsewhere.

To print something once you have finished reading your data, you need to investigate the END{} clause.
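As a small illustration of NR and END (a sketch only; the function name show_nr_end and the printed labels are invented for this example):

```shell
# Hypothetical helper showing where NR and END come into play.
show_nr_end() {
    awk '
        NR == 1 { print "first line:", $0 }    # runs only for line 1
        NR > 1  { print "line " NR ":", $0 }   # runs for every later line
        END     { print "lines read:", NR }    # runs once, after the last line
    '
}

printf '1 1 2 3\n2 1 3 2\n3 1 2 3\n' | show_nr_end
```

In the END block, NR still holds the number of the last line read, which is why it can double as a total line count there.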

Here is the link to the manual online which I recommend reading:

http://www.gnu.org/software/gawk/man...ode/index.html

Read over millgates' information again and use the page above; it should be fairly straightforward.

secret

03-12-2013 10:55 AM

As I understand it:
current=$NF defines that the line read at the moment is stored as a variable called NF so when the next line is read it can compare the new current line with the line before.
Then I have to tell it to actually compare with $NF!=current and somehow tell it to count +1 if that is true. ++ is the same as +1 right?
Then I want to keep track of the total count and have the total printed. I don't need to know which lines matched and which didn't; I only need a total count.
Also, it would be nice if this could be done in one step for each column, so that I get a count for each column. To be honest, I understood what should be done even before I knew what awk was, BUT the manual doesn't say anything about counts (not that I saw, anyway). You have to understand that I have never programmed anything before. So, aside from the getline, how should my idea be modified?
It should look like this at the moment:

awk '{current=$NF , if($NF != current {count++})} END{print count}'

But this still doesn't answer how the first line should be treated, nor whether this works for each column individually.

grail

03-12-2013 11:12 AM

Quote:

current=$NF defines that the line read at the moment is stored as a variable called NF so when the next line is read it can compare the new current line with the line before.

Incorrect. NF is the number of fields in a row, which is determined by FS (the field separator). As you have not changed FS, it has its default value, which splits fields on any run of contiguous whitespace.
So in the example, current is set to the value stored in the last field. With your example data, the first line would store the number 3 in current, as that is the last field.
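The difference between NF and $NF can be checked directly on the first line of the example data. This is just a quick sketch; the function name show_nf is made up here.

```shell
# Hypothetical helper: NF is the field count, $NF is the content of the last field.
show_nf() {
    awk '{ print "NF =", NF, "and $NF =", $NF }'
}

printf '1 1 2 3\n' | show_nf
```

For the line "1 1 2 3" this prints NF = 4 and $NF = 3: four fields, the last of which contains 3.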

Quote:

++ is the same as +1 right?

Correct:

Code:

count++

# is the same as

count = count + 1

So you are kind of on the right track but need to see my information above about NF.

Your current syntax is not right, though. Try issuing the following:

Code:

awk 'NR > 1 && $NF != current{count++}{current = $NF}NR == 1{next}END{print count}' file

See if that helps you in the correct direction.

secret

03-12-2013 11:35 AM

My tutor pointed out that I should just try things step by step, and gave me the same hint as you about NF (>.<)
So I changed it slowly by trial and error and came to

awk 'BEGIN {count=0;var=0}{if ($2!=var) count++; var=$2} END {print count-1}' input

and it worked, yay. My head hurts.
So thank you all very much for your patience :D
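One thing to be aware of: the command above counts the first line as a change whenever its value differs from the initial var=0 and then subtracts 1 at the end, so it appears to rely on the first value in column 2 never being 0. A possible variant, sketched here under the assumption that column 1 is always the time and every other column is a temperature, skips the first line instead and keeps one counter per column, as asked earlier in the thread. The function name count_changes_per_column and the array names prev and count are invented for this sketch.

```shell
# Sample data from the first post.
sample='1 1 2 3
2 1 3 2
3 1 2 3
4 2 1 3'

# Hypothetical helper: one change counter per column, starting at column 2.
count_changes_per_column() {
    awk '
        NR > 1 {                               # nothing to compare on line 1
            for (i = 2; i <= NF; i++)
                if ($i != prev[i]) count[i]++
        }
        {
            for (i = 2; i <= NF; i++) prev[i] = $i   # remember this line
            if (NF > cols) cols = NF                 # widest line seen so far
        }
        END {
            for (i = 2; i <= cols; i++)
                print "column " i ": " (count[i] + 0)
        }
    '
}

printf '%s\n' "$sample" | count_changes_per_column
```

With the four sample lines this reports 1 change for column 2, 3 for column 3 and 2 for column 4.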

chrism01

03-12-2013 08:36 PM

Quote:

My Tutor pointed out i should just try things step by step ..

and that is the secret of successful programming, unless you are already highly experienced, in which case you can often get away with larger changes :)

All times are GMT -5.