LinuxQuestions.org


bldcerealkiller 11-03-2011 10:39 AM

Converting columns to lines using AWK
 
Hi everybody,

I need to convert columns into rows in my file using awk.

The file looks like:

6 5 7 8
6 5 7 8
6 5 7 8

The output should be like this:
6 6 6
5 5 5
7 7 7
8 8 8

or this

6 6 6 5 5 5 7 7 7 8 8 8

Thanks in advance for your reply and sorry if this is a repost.

Cheers

Nominal Animal 11-03-2011 12:01 PM

For each record, loop over the fields, appending each field to an array indexed by the field number. Remember the maximum number of fields, so you'll know how many to print later on. Do not print anything yet. Then, in an end rule, print each value of the array separately:
Code:

awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
      if (NF > n) n = NF }
 END { for (i = 1; i <= n; i++) sub(/^  */, "", f[i]) ;
      for (i = 1; i <= n; i++) print f[i] }
    ' infile >outfile

The first loop in the end rule removes the superfluous leading spaces; it is simpler than not adding the leading space. I added extra semicolons, so you can put the entire scriptlet on one line if you wish.

Oh, and this seems to work for left-aligned triangular matrices too.
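
For the single-line form asked for in the first post, a minimal sketch is to join the per-column strings in the end rule instead of printing them one per line. The wrapper name transpose_one_line is only a label for this example, not something from the thread:

```shell
# transpose_one_line: hypothetical wrapper name used for this sketch.
# Collects each column as before, then joins all columns on a single line.
transpose_one_line() {
    awk '{ for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
           if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) out = out f[i] ;
           sub(/^  */, "", out) ;
           print out }
        ' "$@"
}

# Demo on the sample data from the first post
printf '6 5 7 8\n6 5 7 8\n6 5 7 8\n' | transpose_one_line
# prints: 6 6 6 5 5 5 7 7 7 8 8 8
```

Each f[i] still starts with a space, so concatenating them gives properly separated values and only the very first space has to be removed.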

David the H. 11-03-2011 03:00 PM

The gawk user guide has an example script that does exactly this, here:

http://www.gnu.org/software/gawk/man...mensional.html


PS: Please use [code][/code] tags around your code (including example text), to preserve formatting and to improve readability.

bldcerealkiller 11-04-2011 06:38 AM

Quote:

Originally Posted by Nominal Animal (Post 4515009)

Thank you very much for the answer, my friend, but it seems that I'm getting problems with the last column. This is the output using your script:

6 6 6
5 5 5
7 7 7
8

Any suggestion?

P.S. The example in the gawk user guide orders the rows in the opposite direction, but I guess I can find a way to modify that.

David the H. 11-04-2011 08:11 AM

Quote:

Originally Posted by bldcerealkiller (Post 4515640)
P.s. the example in gawk user guide orders the rows in the opposite direction..but I guess I can find a way to modify that.

When I run that script on the example text you posted, I get exactly the output you asked for; a clockwise quarter-turn. What do you get?

If you need it to do something different than what you originally requested, then you need to clarify that.

Edit: Ah, maybe I see it now. You aren't just rotating the array, you need each top-to-bottom column to become a left-to-right row, is that it? It would've been clearer if you'd used different numbers for each row.

Modifying the second loop in the END section to count up instead of down appears to do that.
Change this...
Code:

for (y = max_nr; y >= 1; --y)
...to this:
Code:

for (y = 1; y <= max_nr; y++)
When I make the above change, this...
Code:

6 7 8 9
5 6 7 8
4 5 6 7

...becomes this:
Code:

6 5 4
7 6 5
8 7 6
9 8 7
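
For reference, here is a sketch of that approach with the count-up loop already applied. It is written in the style of the manual's example rather than copied from it: max_nr and the inner loop come from the lines above, while vector, max_nf and the wrapper name transpose_2d are assumptions made for this sketch. It also leaves out the trailing separator at the end of each row.

```shell
# transpose_2d: hypothetical wrapper name used for this sketch.
transpose_2d() {
    awk '{ if (NF > max_nf) max_nf = NF
           max_nr = NR
           # Store every field under its (column, row) position
           for (x = 1; x <= NF; x++) vector[x, NR] = $x }
     END { for (x = 1; x <= max_nf; x++) {
               # Count up so the first input row becomes the first output column
               for (y = 1; y <= max_nr; y++)
                   printf("%s%s", vector[x, y], (y < max_nr) ? " " : "\n")
           } }
        ' "$@"
}

# Demo on the sample shown above
printf '6 7 8 9\n5 6 7 8\n4 5 6 7\n' | transpose_2d
```

Unlike the one-array version, this keeps every value in its own array element, so the same data could be printed in either direction just by changing the inner loop.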


bldcerealkiller 11-04-2011 08:21 AM

Quote:

Originally Posted by David the H. (Post 4515710)
When I run that script on the example text you posted, I get exactly the output you asked for; a clockwise quarter-turn. What do you get?

If you need it to do something different than what you originally requested, then you need to clarify that.

Yes, you're right: with the same numbers in every row, that script would work.
The problem is that if the input file has this format
1 2 3 4
5 6 7 8
9 10 11 12
the result with that script is
9 5 1
10 6 2
11 7 3
12 8 4
while I'd like to have
1 5 9
2 6 10
3 7 11
4 8 12

In conclusion, I needed a script to convert columns to rows, not to make a clockwise 90° turn :)
Anyway, thanks for your support.

grail 11-04-2011 09:41 AM

Seems to work perfectly. May I ask if the file containing your data was created on Windows? If so, try running dos2unix over it first and then see what your results are.
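
If dos2unix is not at hand, one possible workaround is to strip a trailing carriage return inside awk before the fields are collected. With CRLF line endings the carriage return sticks to the last field of every line, which can make the last output row look truncated on a terminal. A minimal sketch built on the script posted earlier in the thread; the function name transpose_crlf is made up for this example:

```shell
# transpose_crlf: hypothetical wrapper name used for this sketch.
transpose_crlf() {
    awk '{ sub(/\r$/, "") ;   # drop a trailing CR; awk then re-splits the fields
           for (i = 1; i <= NF; i++) f[i] = f[i] " " $i ;
           if (NF > n) n = NF }
     END { for (i = 1; i <= n; i++) sub(/^  */, "", f[i]) ;
           for (i = 1; i <= n; i++) print f[i] }
        ' "$@"
}

# Demo: the sample data with Windows-style line endings
printf '6 5 7 8\r\n6 5 7 8\r\n6 5 7 8\r\n' | transpose_crlf
```

Files that already have plain linefeed endings pass through unchanged, since the sub() simply finds nothing to remove.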

bldcerealkiller 11-04-2011 09:47 AM

Quote:

Originally Posted by grail (Post 4515779)
Seems to work perfectly. May I ask if the file containing your data was created on Windows? If so, try running dos2unix over it first and then see what your results are.

Are you referring to the first script? Anyway, I'm using a file previously created with awk on Unix.

grail 11-04-2011 11:06 AM

I am referring to the file with numbers in it ... was it created in windows? As I said, the code provides the exact output you are requesting when I run it.

bldcerealkiller 11-04-2011 11:39 AM

I've started again from the beginning and now it's working!
Thanks everybody for your support!

Nominal Animal 11-04-2011 02:02 PM

Good. I could not reproduce any of your problems using my script at all. For me, it yields the correct output every time. I even tried different awk variants, and files missing a final newline.

If you happen to have data files created in non-Linux/UNIX systems, you might wish to use
Code:

env LANG=C LC_ALL=C awk '
BEGIN { RS="[\t\n\v\f\r ]*[\n\r][\t\n\v\f\r ]*" ; FS="[\t\v\f ]+" ; SP=" " ; NL="\n" }
      { for (i = 1; i <= NF; i++) f[i] = f[i] SP $i ;
        if (NF > n) n = NF }
  END { for (i = 1; i <= n; i++) sub(/^  */, "", f[i]) ;
        for (i = 1; i <= n; i++) printf("%s%s", f[i], NL) }
      ' infile >outfile

The env command runs the awk script in the C (or POSIX) locale. Most Linux distributions use a UTF-8 locale by default, and at least GNU awk stops processing if it sees a non-UTF-8 sequence in the input. Explicitly setting the locale avoids the issue entirely. Using env, rather than shell-specific syntax, means the above form works regardless of the shell you are using.

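For the input, the BEGIN rule sets a new record separator (RS) and a new field separator (FS). The record separator is any run of ASCII whitespace that contains at least one newline character (linefeed or carriage return), so any newline convention works. The field separator is any run of ASCII whitespace, not including newlines. Note that treating a multi-character RS as a regular expression is an extension (gawk and mawk support it); POSIX leaves that case unspecified.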

In the output, the SP (space, above) defines the separator between columns, and NL (newline, above) defines the separator between rows. These are also defined in the BEGIN rule.

Note that the script does not require the values to be numbers. It reads and writes each input token (word) as-is, without trying to parse them at all. Other than the env command setting the locale explicitly, and the BEGIN rule, the script is still the same as before.


All times are GMT -5.