LinuxQuestions.org

sawdusted 04-02-2013 05:25 PM

Addition of characters to column in tab file
 
Guys, I have a tab delimited file. Multiple columns.
Need to add the characters chr to the 3rd column which already has a 2 digit number there.

Is there a command line that I can use to add in the characters 'chr' in front of the digits in every row in the 3rd column?

Thanks,
Julian

chrism01 04-02-2013 06:18 PM

It would help to see a few before and after example rows, but it's definitely doable.

sawdusted 04-02-2013 07:53 PM

Here's what I want to do. Before: (First 5 lines of 70+ million lines)
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + 2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + 2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - 17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - 5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + 7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7

After:
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + chr2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + chr2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - chr17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - chr5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + chr7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7

The chr* column is actually a tab delimited column on its own with 1 or 2 digits (column #3) but somehow it does not show up here in the text. I want to add the characters 'chr' in front of each digit pair in that column.

Hope this clears up any confusion.

Thanks!

danielbmartin 04-02-2013 09:04 PM

Your sample input file has blank-delimited fields so I worked with that, rather than tabs.

Try this ...
Code:

awk -F " " '{print $1,$2,$3,$4,"chr",$5,$6,$7,$8,$9}' $InFile > $OutFile
Daniel B. Martin

danielbmartin 04-02-2013 09:16 PM

This proposed solution relies on columnar consistency rather than field separator characters.

Try this ...
Code:

sed -r 's/(.{62})/\1 chr /' $InFile >$OutFile
Daniel B. Martin
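
The fixed offset of 62 characters only lines up while every row has the same width before the fifth field. A minimal sketch of a sed variant that counts blank-delimited fields instead is shown below. It assumes single blanks between fields, as in the sample as posted, and uses -E (accepted by GNU and BSD sed) in place of -r. The second sample row is hypothetical and only there to show a longer read name.

```shell
# Row 1 is taken from the thread; row 2 is a made-up row with a longer
# read name, so its fifth field starts at a different character column.
sample='SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + 2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.12 WICMT-SOLEXA2_FC20A5VAAXX:4:1:1880:684 length=26 - 17 84738197 TATTACATTCCCTCTTACAGACAAAA IIIIIIIIIIIIIIIIIIIIIIIIII'

# Match four "non-blanks followed by one blank" groups from the start of
# the line, then put chr directly in front of whatever comes next.
sed_out=$(printf '%s\n' "$sample" | sed -E 's/^(([^ ]+ ){4})/\1chr/')
printf '%s\n' "$sed_out"
```

Because the match is anchored on fields rather than on a character count, both rows get chr in front of the fifth field even though their widths differ.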

chrism01 04-02-2013 10:16 PM

To show layout accurately, use the CODE tags as described here https://www.linuxquestions.org/quest...do=bbcode#code

shivaa 04-02-2013 11:29 PM

Quote:

Originally Posted by danielbmartin (Post 4923930)
Try this ...
Code:

awk -F " " '{print $1,$2,$3,$4,"chr",$5,$6,$7,$8,$9}' $InFile > $OutFile

It should be like:
Code:

awk -F " " '{print $1,$2,$3,$4,"chr"$5,$6,$7,$8,$9}' $InFile > $OutFile
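
The difference between the two commands is the comma. In an awk print statement a comma separates output items, and awk writes the output field separator (a single blank by default) between them. Two expressions written side by side with no comma are concatenated. A small sketch on a shortened row (the variable names are only for illustration):

```shell
row='length=26 + 2 115333875'

# Comma between "chr" and $3: the two are separate output items.
with_comma=$(printf '%s\n' "$row" | awk '{ print $2, "chr", $3 }')

# No comma: "chr" and $3 are concatenated into one item.
concatenated=$(printf '%s\n' "$row" | awk '{ print $2, "chr" $3 }')

printf '%s\n' "$with_comma"     # prints: + chr 2
printf '%s\n' "$concatenated"   # prints: + chr2
```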

grail 04-03-2013 01:54 AM

I am curious why the awk had to be so complicated?
Code:

awk '$5="chr"$5' file

shivaa 04-03-2013 02:02 AM

Quote:

Originally Posted by grail (Post 4924066)
I am curious why the awk had to be so complicated?
Code:

awk '$5="chr"$5' file

Are you sure it's working? I think it should be:
Code:

~$ awk '{if($5="chr"$5) print $0}' infile.txt

grail 04-03-2013 05:41 AM

Did you try it? Worked just fine for me. Also, how does your 'if' make sense? You are assigning a value, not testing it.
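
In awk the value of an assignment is the value assigned, so an assignment can stand where a condition is expected. Here the result is a non-empty string such as chr2, which counts as true, and a pattern with no action prints the record. The sketch below, on a made-up six-field row, compares the bare pattern, the if form, and an explicit action. It assumes a POSIX-style awk such as gawk or mawk.

```shell
row='a b c d 2 f'

# Assignment used as a pattern: true because "chr2" is a non-empty string.
pattern_form=$(printf '%s\n' "$row" | awk '$5 = "chr" $5')

# The same assignment inside if (): still an assignment, not a comparison.
if_form=$(printf '%s\n' "$row" | awk '{ if ($5 = "chr" $5) print $0 }')

# Explicit action: assign, then print unconditionally.
explicit_form=$(printf '%s\n' "$row" | awk '{ $5 = "chr" $5; print }')

# Contrast: assigning the number 0 is false, so nothing is printed.
zero_case=$(printf '%s\n' "$row" | awk '$5 = 0')

printf '%s\n' "$pattern_form" "$if_form" "$explicit_form"
```

All three forms give the same line here. The explicit action does not depend on the truth value of what was assigned, which makes it the easiest to reason about.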

danielbmartin 04-03-2013 07:24 AM

Quote:

Originally Posted by grail (Post 4924066)
I am curious why the awk had to be so complicated?

Only one reason: lack of experience. I am still on the awk learning curve. Your solution is simple and effective. Thank you for showing the way.

One minor quibble... OP said "The chr* column is actually a tab delimited column on its own ..."
Not clear on this. Perhaps he wants chr to stand apart.
If so, that is accomplished with a trivial change to your awk.
Code:

awk '$5="chr "$5' $InFile >$OutFile
Daniel B. Martin
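
On the tab question: with default separators, assigning to a field makes awk rebuild the record with single blanks between fields, so any tabs in the input are lost. If the file really is tab-delimited and chr belongs in tab column 3, as the OP describes, a sketch along these lines keeps the tabs. The shortened row and its layout (the first tab field containing blanks) are assumptions based on that description.

```shell
# Hypothetical shortened row: four tab-separated fields, the first of
# which contains blanks.
row=$(printf 'SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26\t+\t2\t115333875')

# Split on tabs and join on tabs, so only the third field changes.
tab_out=$(printf '%s\n' "$row" | awk 'BEGIN { FS = OFS = "\t" } { $3 = "chr" $3; print }')
printf '%s\n' "$tab_out"
```

With FS set to a tab, the field to change is $3 rather than $5, because the blanks inside the first column no longer split it.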

sawdusted 04-03-2013 07:40 AM

Thank you for the replies guys. I will try the solutions later this afternoon and report back.

sawdusted 04-03-2013 10:06 AM

Hallelujah! Solved! All your answers helped solve it. Grail's solution was the shortest and simplest ;)

Thanks!

shivaa 04-03-2013 10:10 AM

Quote:

Originally Posted by grail (Post 4924182)
Did you try it? Worked just fine for me. Also, how does your 'if' make sense? You are assigning a value, not testing it.

It actually didn't work on my system (it returned no output), but adding the if statement did the job. Anyway, thank you.

grail 04-03-2013 01:58 PM

When you say your system, is that Solaris? I understand it has a different variant of awk, whereas most Linux distros use gawk or similar.
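
If the awk variant is the cause, one way to sidestep it is to pick a POSIX awk where one is available and to use an explicit action rather than a bare assignment pattern. The sketch below is only that: the path /usr/xpg4/bin/awk and the nawk name are assumptions about a typical Solaris install, and the AWK variable and sample row are illustrative.

```shell
# Prefer a POSIX awk at the usual Solaris location (assumed path),
# then nawk, then whatever "awk" is first on PATH.
if [ -x /usr/xpg4/bin/awk ]; then
    AWK=/usr/xpg4/bin/awk
elif command -v nawk >/dev/null 2>&1; then
    AWK=nawk
else
    AWK=awk
fi

# Explicit action: assign to the field, then print the rebuilt record.
portable_out=$(printf 'a b c d 2 f\n' | "$AWK" '{ $5 = "chr" $5; print }')
printf '%s\n' "$portable_out"
```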


All times are GMT -5.