LinuxQuestions.org


ilukacevic 07-15-2010 05:49 AM

awk multiple column into single column
 
Dear all,

I'm new to using awk or similar commands and I hope someone will be able to help me.

I have multicolumn data, like

a1 b1 ... f1
a2 b2 ... f2
. . ... .
. . ... .
. . ... .
an bn ... fn

I would like to make a file with all these data in one column, like

a1
a2
.
.
.
an
b1
b2
.
.
.
bn
.
.
.
f1
f2
.
.
.
fn


Can it be done with awk or some other command?
Also, is it then possible to add another column in front of this one with the line numbers (restarting for every original column), like

1 a1
2 a2
. .
. .
. .
n an
1 b1
2 b2
. .
. .
. .
n bn
. .
. .
. .
1 f1
2 f2
. .
. .
. .
n fn


Thank you all in advance!

Igor Lukacevic

stuart_cherrington 07-15-2010 06:35 AM

cat <filename> | awk '{print "1 "$1}' > <newfile>
cat <filename> | awk '{print "1 "$2}' >> <newfile>

repeat for the amount of columns.

You could automate this a little by putting it into a script which works out how many columns are in the file, then repeats the above line until the columns are finished.

Stuart.
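The automation suggested above could be sketched roughly as follows. This is only an illustration under some assumptions: the columns are whitespace-separated, every row has the same number of columns, and the file names and sample data are placeholders. It prints awk's NR (the current row number) instead of a literal 1, on the assumption that a per-column row number is what is wanted.

```shell
# Placeholder file names in a scratch directory; the sample data is made up.
tmpdir=$(mktemp -d)
src="$tmpdir/data.txt"
dst="$tmpdir/stacked.txt"
printf 'a1 b1 c1\na2 b2 c2\n' > "$src"

# Take the column count from the first row (assumes every row has as many).
ncols=$(awk 'NR == 1 { print NF; exit }' "$src")

: > "$dst"                            # start with an empty output file
col=1
while [ "$col" -le "$ncols" ]; do
    # One pass per column: NR is the row number, $c the chosen field.
    awk -v c="$col" '{ print NR, $c }' "$src" >> "$dst"
    col=$((col + 1))
done

stacked=$(cat "$dst")
printf '%s\n' "$stacked"
rm -rf "$tmpdir"
```

The file is read once per column, which is simple but may be slow for very wide files.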

H_TeXMeX_H 07-15-2010 07:03 AM

You can also do this within awk: use nested for loops over the NR and NF variables to go through and print all the columns. See:
http://www.grymoire.com/Unix/Awk.html
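One possible reading of the nested-loop idea, as a sketch only: store every field in an array keyed by row and column, then loop over columns and rows once the input has been read. The array name cell, the variable ncols, and the sample data are made up for the illustration, and the whole file is assumed to fit in memory.

```shell
stacked=$(printf 'a1 b1 c1\na2 b2 c2\n' |
awk '
    # Remember every field, keyed by (row, column).
    { for (c = 1; c <= NF; c++) cell[NR, c] = $c
      if (NF > ncols) ncols = NF }
    # After the last row, walk the columns first and the rows second.
    END { for (c = 1; c <= ncols; c++)
              for (r = 1; r <= NR; r++)
                  print r, cell[r, c] }
')
printf '%s\n' "$stacked"
```

Dropping the `r,` from the print statement should give the plain single column without row numbers.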

colucix 07-15-2010 07:04 AM

A slightly trickier solution without awk:
Code:

eval cat $(seq -f "<(cut -d' ' -f%.0f file)" 1 N)
where you have to substitute file and N with the actual values: the file name and the number of columns, respectively. It uses multiple process substitutions to feed the cat command. It assumes that the columns in the original file are separated by a single space.

ilukacevic 07-15-2010 07:49 AM

To stuart_cherrington:

Thanks for the tip! It nicely brings them all into one column. But it adds only the number 1 to each row, and not the line numbers 1,...,n (as I showed in my first post).
Can this be amended?


To colucix:
Thanks for the tip! But your command line just gives me a blank terminal window. Nothing else.


To H_TeXMeX_H:
Thanks for the tip! I have a little trouble applying NR and NF loops. I need more time to see if it works.


Thank you all again!

Igor

grail 07-15-2010 07:53 AM

Well I will take the awk challenge :)
Code:

awk '{for(i=1;i<=NF;i++)if(arr[i] ~ /./)arr[i]=arr[i]"\n"$i;else arr[i]=$i}END{for(x=1;x<=length(arr);x++)printf("%s\n",arr[x])}' in_file
Edit: Although there are probably issues if n is very large??

konsolebox 07-15-2010 08:20 AM

For me the solution really depends on the total number of lines or at least the average. Also the problem tastes like C though.

colucix 07-15-2010 09:17 AM

Quote:

Originally Posted by ilukacevic (Post 4033808)
To colucix:
Thanks for the tip! But your command line just gives me a blank terminal window. Nothing else.

Uh, I'm sorry. Maybe it depends on what the field separator in the original file actually is.

grail 07-15-2010 10:29 AM

Worked just fine for me ... thanx for the lesson as always colucix :)

ilukacevic 07-16-2010 04:03 AM

Quote:

Originally Posted by colucix (Post 4033919)
Uh, I'm sorry. Maybe it depends on what the field separator in the original file actually is.

The field separator is just 2 spaces. How does that influence the result?

Anyway, I used a mix of the solutions from stuart_cherrington and H_TeXMeX_H and it worked just as I wanted. The only thing I still lack is a way to automate the several commands with one loop. I tried a for loop, but failed. Here are the shell scripts I have.


This one works (there are only 6 columns in the original file, so it's not a problem):
#!/bin/sh

cat trf2_5_band2eps.freq | awk '{print NR,$1}' > trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{print NR,$2}' >> trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{print NR,$3}' >> trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{print NR,$4}' >> trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{print NR,$5}' >> trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{print NR,$6}' >> trf2_5_band2eps_edit3.freq

awk '{print 10,$1,$2}' trf2_5_band2eps_edit3.freq > trf2_5_band2eps_edit2.freq



This one doesn't work. I would appreciate any help with it:
#!/bin/sh

cat trf2_5_band2eps.freq | awk '{print NR,$1}' > trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{for(i=2; i<=6; i++) print NR,$i}' >> trf2_5_band2eps_edit3.freq

awk '{print 10,$1,$2}' trf2_5_band2eps_edit3.freq > trf2_5_band2eps_edit2.freq



Thank you all again!

Igor

grail 07-16-2010 05:58 AM

Quote:

Originally Posted by ilukacevic
The field separator is just 2 spaces. How does that influence the result?

Well of course this will affect the result, seeing as the delimiter being passed to cut is a single space!!
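To make the point concrete, here is a small sketch with a made-up row that uses two spaces between columns. cut treats every single space as a delimiter, so an empty field sits between each pair of spaces. Squeezing repeated spaces with tr -s first is one possible workaround, not necessarily what was intended in the original suggestion.

```shell
# Sample row with two spaces between the columns, as in the thread.
row='a1  b1  c1'

# With a single-space delimiter, cut sees an empty field between the two spaces.
second_raw=$(printf '%s\n' "$row" | cut -d' ' -f2)

# Squeezing runs of spaces to one space first makes the field numbers line up.
second_squeezed=$(printf '%s\n' "$row" | tr -s ' ' | cut -d' ' -f2)

printf 'raw: [%s]  squeezed: [%s]\n' "$second_raw" "$second_squeezed"
```

awk does not have this problem with its default field splitting, since it treats any run of blanks as one separator.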

So did you bother to try my solution? It is number of columns independent.

Quote:

Originally Posted by ilukacevic
This one doesn't work. I would appreciate any help with it:
#!/bin/sh

cat trf2_5_band2eps.freq | awk '{print NR,$1}' > trf2_5_band2eps_edit3.freq
cat trf2_5_band2eps.freq | awk '{for(i=2; i<=6; i++) print NR,$i}' >> trf2_5_band2eps_edit3.freq

awk '{print 10,$1,$2}' trf2_5_band2eps_edit3.freq > trf2_5_band2eps_edit2.freq

Hardly surprising. awk reads a file line by line, and patterns / actions are performed on each line.
So for each input line, your second command prints the line number (NR) 5 times, once with each of the fields $2 to $6 from that line.

Hence why I have stored the results into an array to be delivered at the END of the script.
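A tiny illustration of that behaviour, using a made-up two-row sample and a loop that runs up to NF rather than a fixed 6: the fields of row 1 are all printed before anything from row 2, so the output is ordered row by row instead of column by column.

```shell
# Made-up two-row sample; the loop mirrors the one in the script quoted above.
rowwise=$(printf 'a1 b1 c1\na2 b2 c2\n' |
          awk '{ for (i = 2; i <= NF; i++) print NR, $i }')
printf '%s\n' "$rowwise"
```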

I also find it curious why you use 'cat' to pass the contents of a file to a program that reads the contents of a file???

ilukacevic 07-16-2010 08:06 AM

Quote:

Originally Posted by grail (Post 4035104)
Well of course this will affect the result, seeing as the delimiter being passed to cut is a single space!!

So did you bother to try my solution? It is number of columns independent.

Yes, I did. It's good, but it doesn't print out the line numbers as the first column. I tried adding the NF in printf, but it doesn't work.


Quote:

Originally Posted by grail (Post 4035104)
Hardly surprising. awk reads a file line by line, and patterns / actions are performed on each line.
So your second line will print the line number (NR) 5 times along with the corresponding field ($i) from that line.

Hence why I have stored the results into an array to be delivered at the END of the script.

I understand the mistake... is there a way around it (using this script)? Maybe something similar to yours?

Quote:

Originally Posted by grail (Post 4035104)
I also find it curious why you use 'cat' to pass the contents of a file to a program that reads the contents of a file???

I just copied what stuart_cherrington gave me, not thinking about it. Are you saying that it is unnecessary?


Igor
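On the cat question: awk accepts file names as arguments, so the pipe from cat should not be needed. A quick check with a made-up temporary file, comparing both forms:

```shell
# Scratch file with made-up sample data.
tmp=$(mktemp)
printf 'a1 b1\na2 b2\n' > "$tmp"

via_cat=$(cat "$tmp" | awk '{ print NR, $1 }')   # file piped in through cat
direct=$(awk '{ print NR, $1 }' "$tmp")          # file named as an argument

[ "$via_cat" = "$direct" ] && echo "same output"
rm -f "$tmp"
```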

ilukacevic 07-16-2010 08:14 AM

Also, a new issue arose in the meantime. My data are in the format 2.33456D-05 and so on. In the end, I need them multiplied by a constant factor (the decimal number 219474.6306726). But when I do that, it seems that awk doesn't understand the format of my data, and gives back incorrect results.

I know that one can ask for a certain format with printf, but how can I tell awk that the input data are of that format, so that it can read them correctly?

thnx

Igor

MTK358 07-16-2010 08:29 AM

Quote:

Originally Posted by ilukacevic (Post 4035208)
Also, a new issue arose in the meantime. My data are in the format 2.33456D-05 and so on. In the end, I need them multiplied by a constant factor (the decimal number 219474.6306726).

I don't understand.

Also, why not Perl?

colucix 07-16-2010 08:34 AM

Awk works in double precision but it's not aware of the Fortran D notation. You should first convert D to E. Following my previous suggestion (but it can be applied to all the others) you can try something like:
Code:

eval cat $(seq -f "<(awk '{gsub(/D/,\"E\"); print $%.0f''*''219474.6306726}' file)" 1 6) | nl
Edit: a little more explicit:
Code:

for i in $(seq 1 6)
do
  awk '{gsub(/D/,"E"); print $'$i'*219474.6306726}' file
done | nl
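A single-pass variation on the same idea, as a sketch under assumptions: the sample values are made up, every D in the input is taken to be a Fortran exponent marker, and the row number is meant to restart for each column (nl at the end of a pipeline numbers all output lines consecutively instead). It uses gsub so that every field on the line is converted, and the output format %.6f is an arbitrary choice.

```shell
scaled=$(printf '2.33456D-05  1.0D+00\n4.0D-01  2.5D+01\n' |
awk -v factor=219474.6306726 '
    { gsub(/D/, "E")                       # Fortran 1.0D+00 -> 1.0E+00, all fields
      for (c = 1; c <= NF; c++) val[NR, c] = $c * factor
      if (NF > ncols) ncols = NF }
    # Column by column, with the row number restarting at 1 each time.
    END { for (c = 1; c <= ncols; c++)
              for (r = 1; r <= NR; r++)
                  printf "%d %.6f\n", r, val[r, c] }
')
printf '%s\n' "$scaled"
```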



All times are GMT -5.