LinuxQuestions.org

vgr12386 06-12-2009 07:26 AM

using variables in awk
 
Hi, I would like to add an extra dimension to a question I asked previously...
I'm not quite sure how to use different variables inside awk.
Summary: I have bad data, for instance capital letters in the middle of a word.
I identified the errors, made a list, and put it in a file. Some errors are checked against a condition and, depending on the result, the 2nd or 3rd value has to replace the actual value in the file.

error_correction.txt

Incorrect,Correct,Maybe
VeNOM,Venom,Venemous
nos,NOS,N2O
.
.
.



My data file looks like this:
data.txt:

vgr,bugatti veron,,3.5,Maybe,6,.......,....
vgr,lamborgini,,3.5,nos,6,.......,....
abc,bugatti veron,,3.5,N20,6,.......,.......
.
.
.
.



I need to replace the terms in the 5th field with those from the list, after checking with an if condition whether to pick the 2nd or the 3rd column from the error_correction.txt file.
How do I do this using awk?

Reference to previous question:
http://www.linuxquestions.org/questi...-field-730433/

colucix 06-12-2009 07:37 AM

Please, show us the awk code you're using now. Also, what is the condition to choose the 2nd or the 3rd field from error_correction.txt?

vgr12386 06-12-2009 08:10 AM

Currently this is the simplest form of the code: [http://www.linuxquestions.org/questi...field-730433/]

Current code:

awk -F"," 'FNR==NR{a[$1]=$2;next}
( $5 in a ){
$5=a[$5]; #This assigns the value from the 2nd column of the error_correction file
}' error_correction file


I want to add this condition to the awk code:

{
  x = substr($1, 2, 5)
  if ( x == "JB007" )
  {
    # Assign the value from the 3rd column of the error_correction file
  }
  else
  {
    $5 = a[$5]  # Assign the value from the 2nd column of the error_correction file
  }
}


I think I may have to use another array like b[$1]=$3 in the FNR==NR block, but I'm not quite sure how to look it up.

vgr12386 06-12-2009 09:16 AM

Is it possible to use functions in awk?

druuna 06-12-2009 09:25 AM

Hi,

Yes you can.

Take a look here: gawk manual - 8.2 User-Defined Functions
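
For readers who want a quick picture before opening the manual, here is a minimal sketch of a user-defined function applied to the 5th field. The function name fix_case and the sample line are assumptions made up for illustration; they do not come from the thread.

```shell
# fix_case() is a hypothetical helper name, not something from the thread.
out=$(printf 'vgr,lamborgini,,3.5,VeNOM,6\n' | awk -F, -v OFS=, '
# Return the word with its first letter in upper case and the rest in lower case.
function fix_case(word) {
    return toupper(substr(word, 1, 1)) tolower(substr(word, 2))
}
# Rewrite field 5 through the helper and print the rebuilt line.
{ $5 = fix_case($5); print }')
echo "$out"
```

Functions are defined at the top level of the awk program, next to the pattern-action rules, and can then be called from any action.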

vgr12386 06-12-2009 09:42 AM

Cool...
And hey, is it possible to use a switch/case as well?
I tried --enable-switch but it didn't work...

druuna 06-12-2009 09:45 AM

Hi again,

From the same manual (!!): 6.4.5 The switch Statement

You could have found that one yourself ;)

vgr12386 06-12-2009 09:54 AM

Yup :)
I saw it and tried it out, but it didn't work :(
It said something about it working only on version 3.1.3 of gawk.
Do you know of any other way?

colucix 06-12-2009 01:37 PM

If I correctly interpret your requirements (as explained in post #3 and in your previous thread) following the code posted by ghostdog74, this should do the trick:
Code:

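awk 'BEGIN { FS = ","; OFS = "," }

# First file (error_correction.txt): store both replacement columns.
FNR == NR {
  a[$1] = $2
  b[$1] = $3
  next
}

# Second file: replace field 5 when it is a known incorrect value.
( $5 in a ) {
  if ( substr($1,2,5) == "JB007" )
    $5 = b[$5]
  else
    $5 = a[$5]
}

# Print every line of the second file.
FNR < NR' error_correction.txt input_file > output_file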

Check it on your real example and look at the official manual suggested by druuna to correctly interpret this code.
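
To see what the code above does, the sketch below runs the same logic on a few sample lines in a scratch directory. The row starting with xJB007 is an assumption added so that the 3rd-column branch is exercised; the thread's own sample data contains no such row.

```shell
# Scratch files; the xJB007 row is made up for illustration.
tmp=$(mktemp -d)
cat > "$tmp/error_correction.txt" <<'EOF'
Incorrect,Correct,Maybe
VeNOM,Venom,Venemous
nos,NOS,N2O
EOF
cat > "$tmp/data.txt" <<'EOF'
vgr,lamborgini,,3.5,nos,6
xJB007,bugatti veron,,3.5,nos,6
abc,bugatti veron,,3.5,N20,6
EOF

out=$(awk 'BEGIN { FS = ","; OFS = "," }
# First file: remember column 2 in a[] and column 3 in b[], keyed by column 1.
FNR == NR { a[$1] = $2; b[$1] = $3; next }
# Second file: field 5 is a known incorrect value, so pick the replacement.
($5 in a) {
    if (substr($1, 2, 5) == "JB007")
        $5 = b[$5]
    else
        $5 = a[$5]
}
# Print every line of the second file.
FNR < NR' "$tmp/error_correction.txt" "$tmp/data.txt")
echo "$out"
rm -rf "$tmp"
```

The first line gets NOS (2nd column), the xJB007 line gets N2O (3rd column), and the last line is printed unchanged because N20 (with a zero) is not a key in the list.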

colucix 06-12-2009 01:39 PM

Quote:

Originally Posted by vgr12386 (Post 3571789)
Yup :)
I saw it and tried it out, but it didn't work :(
It said something about it working only on version 3.1.3 of gawk.
Do you know of any other way?

As you've read, it is an experimental feature that is not enabled by default in older versions. If you want to try it, you have to compile gawk from source, adding --enable-switch in the configure step, or alternatively update to a more recent version of gawk. Anyway, I don't think you really need it, unless your requirements changed again! ;)
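
As for "any other way": a plain if / else if chain is a portable stand-in for switch that works in any awk without recompiling. The sketch below is only an illustration; the input words and the variable name label are assumptions, not part of the thread.

```shell
# Illustrative input; label is a hypothetical variable name.
out=$(printf 'nos\nVeNOM\nother\n' | awk '
{
    # Each branch plays the role of one case; the final else is the default.
    if ($1 == "nos")
        label = "NOS"
    else if ($1 == "VeNOM")
        label = "Venom"
    else
        label = $1
    print label
}')
echo "$out"
```

For a long list of values, the array lookup used in the code above scales better than a chain of comparisons, which is why switch is not really needed here.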

vgr12386 06-15-2009 04:47 AM

Hey colucix,
guess what, I'm back :D
I'm still not quite familiar with awk!
I'm not sure how to compare the values of two files and pick up various columns separately.

In the code that you wrote:
Quote:

Code:

awk 'BEGIN{ FS=","; OFS=","
}

FNR == NR {a[$1] = $2
          b[$1] = $3
        next
}

( $5 in a ){ <-- What if there are multiple similar values? For example, there are 3 entries for nos in the error correction file, along with an extra column which is to be matched against the data file.
  if ( substr($1,2,5) == "JB007" )
    $5 = b[$5]
  else
    $5 = a[$5]
}

FNR < NR' error_correction.txt input_file > output_file



vgr12386 06-15-2009 08:00 AM

Anyone around?

Code:

( $5 in a ){
  if ( substr($1,2,5) == "JB007" )
    $5 = b[$5]
  else
    $5 = a[$5]
}

If $5 occurs more than once in a, how do I make it loop to search for the second occurrence?

vgr12386 06-17-2009 08:07 AM

knock knock

crabboy 06-23-2009 08:48 AM

vgr, you have 3 threads running regarding the same awk problems. Perhaps you should check your other threads for replies and ask new questions there.

vgr12386 06-24-2009 04:19 AM

Quote:

Originally Posted by crabboy (Post 3583372)
vgr, you have 3 threads running regarding the same awk problems. Perhaps you should check your other threads for replies and ask new questions there.

Well, the questions are different; it's just that I have used similar data.

All I wanted to know was how to loop through repeated values in 2 separate columns present in two different files.
What was happening in the case above was that it checked for the first occurrence only and then exited the loop :(
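
A note on the repeated values: an awk array holds each key only once, so with a[$1] = $2 a later row with the same first column overwrites the earlier one, and ( $5 in a ) is a single lookup rather than a loop. One possible approach is to index the arrays by two columns at once. The sketch below assumes, purely for illustration, that the extra column is the 4th column of error_correction.txt and that it must match the 1st field of the data file; the thread never states which columns are involved, and the sample rows are made up.

```shell
tmp=$(mktemp -d)
# Hypothetical layout: the 4th column tells the repeated "nos" rows apart.
cat > "$tmp/error_correction.txt" <<'EOF'
nos,NOS,N2O,vgr
nos,Nitrous,N2O,abc
EOF
cat > "$tmp/data.txt" <<'EOF'
vgr,lamborgini,,3.5,nos,6
abc,bugatti veron,,3.5,nos,6
xyz,bugatti veron,,3.5,nos,6
EOF

out=$(awk 'BEGIN { FS = ","; OFS = "," }
# First file: key on (incorrect value, extra column) so repeats do not overwrite each other.
FNR == NR { a[$1, $4] = $2; b[$1, $4] = $3; next }
# Second file: look up the pair (field 5, field 1); no loop is needed.
{
    if (($5, $1) in a)
        $5 = (substr($1, 2, 5) == "JB007") ? b[$5, $1] : a[$5, $1]
    print
}' "$tmp/error_correction.txt" "$tmp/data.txt")
echo "$out"
rm -rf "$tmp"
```

With this input the vgr line gets NOS, the abc line gets Nitrous, and the xyz line keeps nos because no (nos, xyz) pair exists in the list.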


All times are GMT -5.