LinuxQuestions.org - [SOLVED] Awk Problem in Deleting Fields from Lines

Physicsphdsophia

11-15-2011 11:46 AM

Awk Problem in Deleting Fields from Lines

Hi Everyone,

I am new to this forum :).
I am stuck on some awk programming.
Basically, I have a huge data file (1000*16384).

For each odd line (lines 1, 3, ..., counting from 1), I want to delete the entries that are smaller than, say, 0.1. I also want to delete the entries in the even lines (lines 2, 4, ...) that occupy the index positions of the entries deleted from the previous line (so if field i=5 was deleted from line 1 because it is smaller than 0.1, then field i=5 in line 2 should also be deleted, regardless of its value). Here is my attempt:

awk '{ for (j=1; j<=NR; j=j+2) { for (i=1; i<=NF; i++) { { if ($i<0.1) { sub($i,"") ; c = i ; NR == k} }; NR==k+1 ; sub($c,"") }} } } } print }' dataFile1.output > dataFile2.output

Thanks in advance.

colucix

11-15-2011 12:01 PM

Could you please post an example of the input and the output you want to obtain? That would make the problem much clearer. In any case, your code doesn't take advantage of awk's strengths. The loop:

Code:

for (j=1; j<=NR; j=j+2) ...

means that, for each line of input, awk cycles over the odd numbers from 1 to the current line number in steps of 2; it does not cycle over the lines themselves. Awk reads one line at a time and executes every rule (a pattern followed by an action enclosed in braces) on that line. To distinguish between odd and even lines, you might do something like:

Code:

NR % 2 == 0 {
  # This is an even line
}

NR % 2 == 1 {
  # This is an odd line
}
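
As a quick check of the two points above, here is a small sketch that runs both constructs on a made-up four-line input. The sample lines `a` to `d` and the shell variable names are illustrative assumptions, not part of the thread. The first command shows that the `for (j=1; j<=NR; j=j+2)` loop merely counts up to the current line number on every line. The second tags each line by parity using `NR`.

```shell
# Hypothetical 4-line sample input, for illustration only
sample=$(printf 'a\nb\nc\nd\n')

# The j-loop runs once per input line and only counts odd numbers up to NR
loop_out=$(printf '%s\n' "$sample" | awk '{
  s = ""
  for (j = 1; j <= NR; j = j + 2) s = s " " j
  print "line " NR ": j =" s
}')

# Pattern-based selection: each rule fires only on lines of the given parity
parity_out=$(printf '%s\n' "$sample" | awk '
  NR % 2 == 1 { print "odd:  " $0 }
  NR % 2 == 0 { print "even: " $0 }
')

printf '%s\n\n%s\n' "$loop_out" "$parity_out"
```

On line 3, for example, the loop visits j = 1 and j = 3 but still only ever sees the contents of line 3, which is why the selection has to be done with patterns on `NR` instead.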

Physicsphdsophia

11-15-2011 12:09 PM

Thanks colucix for your quick reply.

Yes, here is the sort of input and output I mean.
Input:

0.20 0.30 0.05 0.22 0.12 0.07 0.08 0.14...
20.8 20.6 20.4 20.2 20.0 19.8 19.6 19.4
0.16 0.25 0.31 0.02 0.19 0.04 0.28 0.12
20.8 20.6 20.4 20.2 20.0 19.8 19.6 19.4
...

Output:
0.20 0.30 0.22 0.12 0.14...
20.8 20.6 20.2 20.0 19.4
0.16 0.25 0.31 0.19 0.28 0.12
20.8 20.6 20.4 20.0 19.6 19.4
...

colucix

11-15-2011 12:18 PM

Good. I would try something like this. On odd lines: print the fields greater than 0.1 and store (remember) the indexes of the printed fields. On even lines: print only the fields whose indexes were stored above. Translated into awk:

Code:

NR % 2 == 1 {
  #
  # This is an odd line
  #
  for ( i = 1; i <= NF; i++ )
    if ( $i > 0.1 ) {
      printf "%s ", $i
      #
      #  We want to preserve the i-th field in the next (even) line, so
      #  we store it as index of the array _
      #
      _[i]++
    }
  printf "\n"
}

NR % 2 == 0 {
  #
  # This is an even line
  #
  for ( i = 1; i <= NF; i++ )
    if ( i in _ )
      printf "%s ", $i
  printf "\n"
  #
  #  Forget about previously stored indexes now!
  #
  delete _
}

The delete statement is essential: after every pair of lines the stored indexes are forgotten, and the array _ is rebuilt from scratch when the next odd line is read. Hope this helps.
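
For anyone who wants to try the approach end to end, here is a minimal sketch that applies the same logic to the first pair of sample lines posted earlier in the thread. It is a variation rather than the exact script above. The `min` variable, the `keep` array name, the `sep` handling that avoids a trailing space, and the `>=` comparison are all assumptions of this sketch. The `>=` keeps values exactly equal to the threshold, following the original request to delete only entries smaller than 0.1.

```shell
# First pair of sample lines from the thread (the trailing "..." is left out)
input=$(printf '%s\n' \
  '0.20 0.30 0.05 0.22 0.12 0.07 0.08 0.14' \
  '20.8 20.6 20.4 20.2 20.0 19.8 19.6 19.4')

# min is a hypothetical threshold variable; keep holds the surviving field indexes
result=$(printf '%s\n' "$input" | awk -v min=0.1 '
  NR % 2 == 1 {                      # odd line: decide which fields survive
    sep = ""
    for (i = 1; i <= NF; i++)
      if ($i >= min) {               # keep values not smaller than the threshold
        printf "%s%s", sep, $i
        sep = " "
        keep[i] = 1                  # remember this position for the next line
      }
    printf "\n"
  }
  NR % 2 == 0 {                      # even line: print only remembered positions
    sep = ""
    for (i = 1; i <= NF; i++)
      if (i in keep) {
        printf "%s%s", sep, $i
        sep = " "
      }
    printf "\n"
    delete keep                      # forget the indexes before the next pair
  }
')

printf '%s\n' "$result"
```

With this sample pair the sketch prints `0.20 0.30 0.22 0.12 0.14` followed by `20.8 20.6 20.2 20.0 19.4`, which matches the output requested in the thread. To process a whole file, the same awk program can be given the file name as an argument instead of reading from the pipe.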

Physicsphdsophia

11-17-2011 09:26 AM

Thanks a lot colucix, it worked!

All times are GMT -5.