Converting a file with Rows and Columns to just Columns
I have a file with entries that look like this:
Pos
148
A 0
C 0
G 0.081985
T 0.918015
207
A 0.021697
C 0.978303
G 0
T 0

I need to convert this to something that looks like:

Pos A C G T
148 0 0 0.081985 0.918015
207 0.021697 0.978303 0 0

So, my "Pos" entries are more or less already in a column. However, I need to convert the A, C, G, T rows to columns. Any help would be appreciated.
What have you tried, and where are you stuck?
I think awk would be useful here.
Question: Does this have something to do with Genomes?
I haven't tried anything specific yet. I found a few methods to convert rows to columns, most using sed, but I am at a loss for how to apply that to what I am working with. I basically would like to convert the A, C, G, and T rows to columns, then have their entries line up with the positions.
Also, yes, these are nucleotide frequencies at a given position.
Is this the fixed format of the data? Is it repeated EXACTLY like this over and over (allowing for different data, obviously)? Like this, in perpetuity:
Code:
Pos
That is almost exactly how it is, except the "Pos" and "freq" aren't repeated. Here is a copy of lines directly from the file:
Pos freq
148
A 0.000000
C 0.000000
G 0.081985
T 0.918015
207
A 0.021697
C 0.978303
G 0.000000
T 0.000000
208
A 0.979209
C 0.000000
G 0.020791
T 0.000000
Try searching this site for "columns to rows", as this has been done multiple times. I would include the keyword 'awk' as well, since columnized data is better suited to that command than to sed.
And, without diving into a code exercise, and sticking to bash alone:
Requirements:
Code:
while read line1; do #first line pos
That did the trick. Thank you!
hmmmm ... I am curious how that did the trick for you?
Based on the data and format in post #6 and using the code from post #8, the output I get is: Code:
Pos freq 0.000000 0.000000 0.081985
Assuming you altered the snippet provided, maybe you could show your solution that does provide the output you were looking for, so others may benefit :)
Yeah, the file had to be altered slightly for it to work. Based on the requirements szboardstretcher outlined, I removed the first line containing "Pos" and "freq". I can't attest to whether or not it's the most efficient way, but I basically did it in two steps as shown below:
Code:
more +2 oldfile > newfile
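For anyone following along, here is a minimal sketch of the two steps described above: drop the header line, then read five lines per record. It is not the snippet posted earlier in the thread. The function name `rows_to_columns` is made up for illustration, and the sketch assumes exactly one header line followed by groups of five lines (position, then A, C, G, T) with no blank lines in between.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not from the thread): turn each group of five lines
# (position, A, C, G, T) into a single output row.
# Assumption: one header line, then complete five-line groups, no blank lines.
rows_to_columns() {
    infile=$1
    echo "Pos A C G T"
    # tail -n +2 starts output at line 2, i.e. it skips the "Pos freq" header
    tail -n +2 "$infile" |
    while read -r pos &&
          read -r base a &&
          read -r base c &&
          read -r base g &&
          read -r base t
    do
        # "base" holds the letter (A/C/G/T) and is simply discarded
        echo "$pos $a $c $g $t"
    done
}
```

Calling `rows_to_columns oldfile > newfile` would do both steps in one go. Splitting each line into two variables with `read` avoids any substring arithmetic on the value.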
Cheers :)
So now, using your addition, the correct output is: Code:
148 0 0 0.081985 0.918015
Grail, not sure what you are doing.
Edit: You are using the original data. OP provided another snippet further down in the post. Code:
# The data Code:
# the script Code:
# the output
Just to finesse this a little for posterity's sake, grail has a point about the substring indexing (which actually starts at zero) in the parameter expansions used in the echo command. It works with :1 as the leading blank is skipped.
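To make the indexing point concrete, here is a small sketch of what bash substring expansion keeps from a line such as "G 0.081985". The function name `show_expansions` is invented for this illustration, and `${var:offset}` is a bash feature, so this assumes bash rather than a plain POSIX sh.

```bash
#!/usr/bin/env bash
# Hypothetical demo (not from the thread): compare expansions of a "base value" line.
show_expansions() {
    local line=$1
    printf '[%s]\n' "${line:0}"    # offset 0: the whole string
    printf '[%s]\n' "${line:1}"    # offset 1: drops the letter, keeps the leading blank
    printf '[%s]\n' "${line:2}"    # offset 2: drops the letter and the blank
    printf '[%s]\n' "${line#* }"   # strips everything up to and including the first blank
}
```

Running `show_expansions "G 0.081985"` prints `[G 0.081985]`, `[ 0.081985]`, `[0.081985]` and `[0.081985]`, one per line. With an unquoted expansion in `echo`, word splitting discards the leading blank left by `:1`, which is why that offset still gives clean output.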
Using the data in post#6 Code:
tail -n +2 data.txt | \ Code:
Pos A C G T
@szboardstretcher - you are right; I am not sure what was happening. I have now run the same code at home and all is fine ... go figure ... sorry for the confusion.
I should give a solution for all those shenanigans: Code:
awk '{ORS=/T/?"\n":OFS}!/^$/{print $NF}' file
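For readers who find the one-liner terse: it prints the last field of every non-empty line and ends the row with a newline whenever the line contains a T. As far as I can tell, on a file that still has the "Pos freq" header it would also print "freq" at the start of the first row. The longer sketch below spells out the same idea, skips the header, and prints column headings instead. The wrapper name `columns_awk` is made up, and it assumes the header is the first line of the file.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper (not from the thread) around an expanded awk program.
# Assumption: line 1 is the header; after that come position lines and
# "base value" lines, possibly separated by blank lines.
columns_awk() {
    awk '
        NR == 1 { print "Pos A C G T"; next }   # replace the original header
        /^$/    { next }                         # ignore blank lines
        # print the last field; end the row after the T line, else add a space
        { printf "%s%s", $NF, (/^T/ ? "\n" : " ") }
    ' "$1"
}
```

Usage would be `columns_awk data.txt`. Anchoring the test as `/^T/` rather than `/T/` is a small design choice so that only a line starting with T ends a row.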