Addition of characters to column in tab file
Guys, I have a tab delimited file. Multiple columns.
Need to add the characters chr to the 3rd column which already has a 2 digit number there. Is there a command line that I can use to add in the characters 'chr' in front of the digits in every row in the 3rd column? Thanks, Julian |
It would help to see a few before and after example rows, but it's definitely doable.
|
Here's what I want to do.
Before (first 5 lines of 70+ million lines):
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + 2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + 2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - 17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - 5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + 7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7
After:
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + chr2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + chr2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - chr17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - chr5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + chr7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7
The chr* column is actually a tab delimited column on its own with 1 or 2 digits (column #3) but somehow it does not show up here in the text. I want to add the characters 'chr' in front of each digit pair in that column. Hope this clears up any confusion. Thanks! |
Your sample input file has blank-delimited fields so I worked with that, rather than tabs.
Try this ... Code:
awk -F " " '{print $1,$2,$3,$4,"chr"$5,$6,$7,$8,$9}' $InFile > $OutFile |
This proposed solution relies on columnar consistency (the same character positions on every line) rather than field separator characters.
Try this ... Code:
sed -r 's/(.{62})/\1chr/' $InFile >$OutFile |
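A fixed character offset only lands on the right field while every row has the same width up to that point. If the read IDs get longer further down the file (say SRR036740.10 instead of SRR036740.6), the offset would shift. Here is a rough sketch that anchors on the first four blank-delimited fields instead. It assumes a sed that accepts -E (GNU and BSD sed do), and the second sample row is hypothetical, made up to show a longer read ID:

```shell
# Two sample rows: the first is from the thread, the second is a hypothetical
# row with a longer read ID, which shifts every later field right by one character.
rows='SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + 2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.10 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 - 17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII'

# Capture the first four fields (each followed by one or more blanks)
# and place chr immediately after them, directly in front of field 5.
sed_out=$(printf '%s\n' "$rows" | sed -E 's/^(([^ ]+ +){4})/\1chr/')
printf '%s\n' "$sed_out"
```

Both rows should come out with chr attached to the fifth field, regardless of how wide the first field is.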
To show layout accurately, use the CODE tags as described here https://www.linuxquestions.org/quest...do=bbcode#code
|
Quote:
Code:
awk -F " " '{print $1,$2,$3,$4,"chr"$5,$6,$7,$8,$9}' $InFile > $OutFile |
I am curious why the awk had to be so complicated?
Code:
awk '$5="chr"$5' file |
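One thing to watch with the awk one-liners in this thread: assigning to a field makes awk rebuild the line using its output separator, a single space by default, so real tabs in the input would come out as spaces. With default splitting, the spaces inside the first column also count as separators, which is why the chromosome shows up as $5 rather than $3. Below is a minimal sketch for the genuinely tab-delimited file the original poster describes, assuming the chromosome number really is tab column 3. The two sample rows are shortened, hypothetical stand-ins for the real data:

```shell
# Hypothetical, shortened rows: column 1 contains spaces, columns are tab-separated,
# and the chromosome number sits in column 3 (an assumption taken from the question).
tab_out=$(printf 'read1 length=26\t+\t2\t115333875\tTTAC\tIIII\nread2 length=26\t-\t17\t84738197\tTATT\tHI29\n' |
  awk 'BEGIN { FS = OFS = "\t" }   # split and rejoin on tabs only
       { $3 = "chr" $3; print }    # prefix column 3, then print the rebuilt row')
printf '%s\n' "$tab_out"
```

Setting OFS alongside FS is the design choice that matters here: without it the output would no longer be tab-delimited.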
Quote:
Code:
~$ awk '{if($5="chr"$5) print $0}' infile.txt |
Did you try it? Worked just fine for me. Also, how does your 'if' make sense? You are assigning a value, not testing it.
|
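For anyone puzzled by the assignment-as-pattern idiom: awk uses the value of the assignment as the pattern, so the line is printed whenever the assigned value is a non-empty string or a non-zero number. "chr"$5 is always a non-empty string, so every line prints. A small sketch with a made-up input line:

```shell
# Made-up six-field line, used only to illustrate the idiom.
line='a b c d 2 f'

# "chr"$5 evaluates to a non-empty string, which is true: the rebuilt line is printed.
kept=$(printf '%s\n' "$line" | awk '$5 = "chr" $5')

# An assignment whose value is numeric 0 is false: nothing is printed.
dropped=$(printf '%s\n' "$line" | awk '$5 = 0')

printf 'kept: [%s]\ndropped: [%s]\n' "$kept" "$dropped"
```

The bare-pattern form would therefore silently drop lines if the assigned value could ever be numeric 0 or an empty string. Writing the action out as { $5 = "chr" $5; print } avoids depending on that.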
Quote:
One minor quibble... OP said "The chr* column is actually a tab delimited column on its own ..." Not clear on this. Perhaps he wants chr to stand apart. If so, that is accomplished with a trivial change to your awk. Code:
awk '$5="chr "$5' $InFile >$OutFile |
Thank you for the replies guys. I will try the solutions later this afternoon and report back.
|
Hallelujah! Solved! All your answers helped solve it. Grail's solution was the shortest and simplest ;)
Thanks! |
When you say your system, is that Solaris? I understand it has a different variant of awk, whereas most Linux distros use gawk or similar.
|