awk multiple column into single column

MTK358 · 07-16-2010, 07:46 AM

Code:

#!/usr/bin/env perl

open my $file, $ARGV[1] or die "Could not open file";
while (<$file>) # iterate over the lines
{
    s/D/e/g;
    for (split) # iterate over the columns in the current line
    {
        $_ *= 219474.6306726;
        print "$_\n";
    }
}
close $file;

I have no idea if this will work, I'm still a bit new to Perl, but I couldn't resist.

colucix · 07-16-2010, 07:59 AM

Quote:

Originally Posted by MTK358

I have no idea if this will work, I'm still a bit new to Perl, but I couldn't resist.

Nice attempt! I'm not a Perl expert either... but I'm afraid it parses the file row by row, whereas we need a trick to parse by columns.

grail · 07-16-2010, 08:00 AM

Well, I was hoping you would work the line number issue out, but you were on the right track in using NR:

Code:

awk '{for(i=1;i<=NF;i++)if(arr[i] ~ /./)arr[i]=arr[i]"\n"NR" "$i;else arr[i]=NR" "$i}END{for(x=1;x<=length(arr);x++)printf("%s\n",arr[x])}' in_file

I am not sure how this affects your double precision.

BTW, I realise you were giving us a dumbed-down example, but as you can see, for future questions floating-point precision can have a large impact if we don't know about it early on.

Edit: Also, maybe you could show us some actual input? (changed to protect the data of course if necessary)
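
On the precision point: a minimal sketch, assuming a POSIX awk, of how the output formatting rather than the arithmetic can drop digits. awk computes in double precision, but `print` formats non-integer numbers with OFMT, whose default is `%.6g`. The function name `scale_demo`, the sample value, and the `%.12f` format are illustrative assumptions, not part of the original request.

```shell
#!/bin/sh
# scale_demo: hypothetical helper. Scales one Fortran-style value
# (D exponent) and prints the result two ways.
scale_demo() {
    printf '1.0000D-05\n' | awk '
    BEGIN { factor = 219474.6306726 }
    {
        gsub(/D/, "E")            # 1.0000D-05 -> 1.0000E-05
        v = $1 * factor
        print v                   # formatted with the default OFMT (%.6g)
        printf "%.12f\n", v       # explicit format keeps more digits
    }'
}

scale_demo
```

The first line is limited to six significant digits; the second shows the digits that `print` alone would hide. Setting OFMT in a BEGIN block is another way to get the same effect for every `print`.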

colucix · 07-16-2010, 08:27 AM

Hi grail! Your code is neat as always, but why not just use a counter to print the line number? Just another suggestion (a lot of work for the original poster), but I would end up with something like this:

Code:

BEGIN { factor = 219474.6306726 }

{
  gsub(/D/,"E")
  
  for ( i=1; i<=NF; i++ )
     array[++count] = $i
}

END {
  for ( i=1; i<=count/NR; i++ ) {
    for ( j=0; j<NR; j++ ) {
      print ++c, array[i+j*NF] * factor
    }
  }
}

MTK358 · 07-16-2010, 09:12 AM

Quote:

Originally Posted by colucix

Nice attempt! I'm not a Perl expert either... but I'm afraid it parses the file row by row, whereas we need a trick to parse by columns.

No, the while loop goes through the file line by line, all instances of 'D' are replaced with 'e', and the for loop goes through the current line column by column.

It makes heavy use of the implicit $_ variable.

grail · 07-16-2010, 09:18 AM

Quote:

Originally Posted by colucix

but why not just use a counter to print the line number?

Hey Colucix

Based on the OP's original request, they wanted the line numbers to match the line each column value came from, whereas your code has an ever-increasing count, i.e. if there are 3 rows and 3 columns it would be

1 a1
2 a2
3 a3
1 b1
2 b2
...
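
The numbering described above can be sketched on that 3x3 example: buffer every field under its (row, column) pair, then walk the columns in the END block and restart the row counter for each one. The function name `columns_with_rownum` and the `cell` and `ncols` variables are hypothetical, and the sketch assumes the whole file fits in memory.

```shell
#!/bin/sh
# columns_with_rownum: hypothetical helper. Reads whitespace-separated
# rows on stdin and prints "row value" pairs one column at a time.
columns_with_rownum() {
    awk '
    {
        if (NF > ncols) ncols = NF        # remember the widest row
        for (i = 1; i <= NF; i++)
            cell[NR, i] = $i              # buffer by (row, column)
    }
    END {
        for (c = 1; c <= ncols; c++)      # outer loop: columns
            for (r = 1; r <= NR; r++)     # inner loop: rows, restarts at 1
                if ((r, c) in cell)       # skip fields missing in short rows
                    print r, cell[r, c]
    }'
}

printf 'a1 b1 c1\na2 b2 c2\na3 b3 c3\n' | columns_with_rownum
```

Keying the array on the row and column separately avoids any arithmetic on NF in the END block, at the cost of holding every field in memory.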

grail · 07-16-2010, 09:40 AM

So after a little plagiarism from colucix (hope you don't mind):

Code:

BEGIN { factor = 219474.6306726 }

{
    gsub(/D/,"E")

    for(i=1;i<=NF;i++)
    {
        if(arr[i])
            arr[i]=arr[i]"\n"

        arr[i]=arr[i]NR" " $i * factor
    }
}

END{
    for(x=1;x<=NF;x++)
        print arr[x]
}

colucix · 07-16-2010, 10:08 AM

Quote:

Originally Posted by grail

Based on the OP's original request, they wanted the line numbers to match the line each column value came from, whereas your code has an ever-increasing count

Uh, sorry... I missed that part. A little modification to my previous code:

Code:

BEGIN { factor = 219474.6306726 }

{
  gsub(/D/,"E")
  
  for ( i=1; i<=NF; i++ )
     array[++count] = $i
}

END {
  for ( i=1; i<=count/NR; i++ ) {
    for ( j=0; j<NR; j++ ) {
      print j+1, array[i+j*NF] * factor
    }
  }
}

Quote:

Originally Posted by grail

So after a little plagiarism from colucix (hope you don't mind):

That's about it!

grail · 07-16-2010, 10:29 AM

@colucix - I still like the simplicity of your sequence, but I forgot to ask ... what does the pipe into 'n1' do?

Edit: Ignore last part ... I am an idiot .. it is the letter 'l' ...

colucix · 07-16-2010, 10:38 AM

Quote:

Originally Posted by MTK358

No, the while loop goes through the file line by line, all instances of 'D' are replaced with 'e', and the for loop goes through the current line column by column.

It makes heavy use of the implicit $_ variable.

Yup. But it should not print the split fields immediately, otherwise you get the columns interleaved, in the order they are read from left to right, e.g.

Code:

$ cat test.pl
#!/usr/bin/env perl
open $file, $ARGV[0] or die "Could not open file: $!";
while (<$file>) {
    s/D/e/g;
    for (split)
    {
        $_ *= 219474.6306726;
        print "$_\n";
    }
}
close $file or die "$file: $!";
$ cat file
1.0000D-05  2.0000D-05  3.0000D-05
1.0000D-05  2.0000D-05  3.0000D-05
1.0000D-05  2.0000D-05  3.0000D-05
$ ./test.pl file
2.194746306726
4.389492613452
6.584238920178
2.194746306726
4.389492613452
6.584238920178
2.194746306726
4.389492613452
6.584238920178
$

whereas it should be

Code:

2.194746306726
2.194746306726
2.194746306726
4.389492613452
4.389492613452
4.389492613452
6.584238920178
6.584238920178
6.584238920178

colucix · 07-16-2010, 10:40 AM

Quote:

Originally Posted by grail

Edit: Ignore last part ... I am an idiot .. it is the letter 'l' ...

He he!

Anyway, as you pointed out, it was wrong in this context.

grail · 07-16-2010, 10:47 AM

Quote:

Originally Posted by colucix

it was wrong in this context

Yeah, but a little ++counter fixed that (mainly because I couldn't work out how to use the sequence numbers again as the counter)

Code:

eval cat $(seq -f "<(awk '{gsub(/D/,\"E\"); print ++n\" \"$%.0f''*''219474.6306726}' file)" 1 6)
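
For anyone unpicking that one-liner: `seq -f` generates one `<(awk ...)` process substitution per column, and `eval cat` then concatenates their outputs in column order. A plainer sketch of the same idea runs awk once per column in a loop. The function name `print_columns` and the `col` variable are hypothetical, `NR` stands in for the `++n` counter, and `printf "%.12f"` is an assumed format chosen so that the default OFMT does not shorten the result.

```shell
#!/bin/sh
# print_columns FILE NCOLS: hypothetical helper that reads FILE once per
# column and prints "row scaled-value" for that column.
print_columns() {
    file=$1
    ncols=$2
    col=1
    while [ "$col" -le "$ncols" ]; do
        awk -v col="$col" '{
            gsub(/D/, "E")        # Fortran D exponent -> E, every field
            printf "%d %.12f\n", NR, $col * 219474.6306726
        }' "$file"
        col=$((col + 1))
    done
}
```

With the six columns assumed in the command above, this would be called as `print_columns file 6`. Like the one-liner, it reads the file once per column, so it trades speed for not having to buffer anything.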

colucix · 07-16-2010, 11:05 AM

Great!

Just a little note (then I will stop... promised): why do you use a quoted space in the print statement? Comma-separated arguments are printed as OFS-separated strings. I find the comma more elegant and quicker to type.
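
The difference can be sketched as follows: `print a, b` joins its arguments with OFS (a single space by default), while `print a " " b` hard-codes the separator through string concatenation. The function name `ofs_demo`, the sample input, and the `:` separator are illustrative assumptions.

```shell
#!/bin/sh
# ofs_demo: hypothetical helper comparing a quoted space with
# comma-separated print arguments.
ofs_demo() {
    # Default OFS: both forms print the same line.
    printf 'x y\n' | awk '{ print NR " " $1; print NR, $1 }'
    # With OFS set to ":", only the comma form changes.
    printf 'x y\n' | awk -v OFS=':' '{ print NR " " $1; print NR, $1 }'
}

ofs_demo
```

So the comma form is shorter to type and also lets the separator be changed in one place.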

MTK358 · 07-16-2010, 11:36 AM

Quote:

Originally Posted by colucix

Yup. But it should not print the split fields immediately, otherwise you get the columns interleaved, in the order they are read from left to right, e.g.

Code:

$ cat test.pl
#!/usr/bin/env perl
open $file, $ARGV[0] or die "Could not open file: $!";
while (<$file>) {
    s/D/e/g;
    for (split)
    {
        $_ *= 219474.6306726;
        print "$_\n";
    }
}
close $file or die "$file: $!";
$ cat file
1.0000D-05  2.0000D-05  3.0000D-05
1.0000D-05  2.0000D-05  3.0000D-05
1.0000D-05  2.0000D-05  3.0000D-05
$ ./test.pl file
2.194746306726
4.389492613452
6.584238920178
2.194746306726
4.389492613452
6.584238920178
2.194746306726
4.389492613452
6.584238920178
$

whereas it should be

Code:

2.194746306726
2.194746306726
2.194746306726
4.389492613452
4.389492613452
4.389492613452
6.584238920178
6.584238920178
6.584238920178

I didn't realize that.

Code:

#!/usr/bin/env perl

use Fcntl qw(SEEK_SET); # import the SEEK_SET constant used by seek below

open my $file, $ARGV[1] or die "Could not open file";
for my $col (0 .. 2)
{
    seek $file, 0, SEEK_SET;
    
    while (<$file>)
    {
        my @cols = split;
        $_ = $cols[$col];
        s/D/e/g;
        $_ *= 219474.6306726;
        print "$_\n";
    }
}
close $file;

colucix · 07-16-2010, 12:58 PM

Nice! It works for me. Only the line numbers are missing.

Edit: I just noticed that the first argument in Perl is $ARGV[0].