LinuxQuestions.org
Old 04-02-2013, 06:25 PM   #1
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Rep: Reputation: Disabled
Addition of characters to column in tab file


Guys, I have a tab-delimited file with multiple columns.
I need to add the characters chr to the 3rd column, which already holds a 1- or 2-digit number.

Is there a command line I can use to put the characters 'chr' in front of the digits in every row of the 3rd column?

Thanks,
Julian
 
Old 04-02-2013, 07:18 PM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,417

Rep: Reputation: 2397
It would help to see a few before and after example rows, but it's definitely doable.
 
Old 04-02-2013, 08:53 PM   #3
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Original Poster
Rep: Reputation: Disabled
Here's what I want to do. Before: (First 5 lines of 70+ million lines)
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + 2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + 2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - 17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - 5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + 7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7

After:
SRR036740.6 WICMT-SOLEXA2_FC20A5VAAXX:4:1:880:684 length=26 + chr2 115333875 TTACAATAAGGAGAAAGGTGCATCTG IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.1 WICMT-SOLEXA2_FC20A5VAAXX:4:1:875:740 length=26 + chr2 166030789 TATCGAGTCTCTTTTCAAAGCATTCA IIIIIIII.IIIIIII>BI$@0GIII
SRR036740.4 WICMT-SOLEXA2_FC20A5VAAXX:4:1:877:779 length=26 - chr17 84738197 TATTACATTCCCTCTTACAGACAAAA HI29IDI<IIEIIIIIIIII&IIIII
SRR036740.3 WICMT-SOLEXA2_FC20A5VAAXX:4:1:884:705 length=26 - chr5 129917611 TATATAATTCCAAATTTAGGCCTAAA IIIIIIIIIIIIIIIIIIIIIIIIII
SRR036740.9 WICMT-SOLEXA2_FC20A5VAAXX:4:1:876:896 length=26 + chr7 14454197 TGAGTTTGTTTATATCGTGAATTATG IIII%IIIIII)I'I%@?=;I4I@I7

The chr* column is actually a tab-delimited column of its own (column #3) holding 1 or 2 digits, but somehow the tabs do not show up in the text here. I want to add the characters 'chr' in front of the digits in that column.

Hope this clears up any confusion.

Thanks!

Last edited by sawdusted; 04-02-2013 at 09:04 PM.
 
Old 04-02-2013, 10:04 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,606

Rep: Reputation: 488
Your sample input file has blank-delimited fields, so I worked with that rather than tabs.

Try this ...
Code:
awk -F " " '{print $1,$2,$3,$4,"chr",$5,$6,$7,$8,$9}' $InFile > $OutFile
Daniel B. Martin
 
Old 04-02-2013, 10:16 PM   #5
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,606

Rep: Reputation: 488
This proposed solution relies on columnar consistency rather than field separator characters.

Try this ...
Code:
sed -r 's/(.{62})/\1 chr /' $InFile >$OutFile
Daniel B. Martin

Last edited by danielbmartin; 04-02-2013 at 10:38 PM. Reason: Cosmetic improvement
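The positional sed above depends on every line being exactly as wide as the sample up to character 62, which may not hold across 70+ million reads (a coordinate such as 1000:684 would shift everything). As a hedged alternative sketch, sed can count fields instead of characters. The sample rows below are made up, shortened stand-ins for the real data, and the pattern assumes blank-delimited fields as in the paste above.

```shell
# Two hypothetical, shortened rows in the same blank-delimited layout.
sample='r1 id1 length=26 + 2 115333875 ACGT IIII
r2 id2 length=26 - 17 84738197 TTGA IIII'

# Match four "field plus blanks" groups from the start of the line,
# then insert chr directly in front of the fifth field.
sed_out=$(printf '%s\n' "$sample" | sed -E 's/^(([^ ]+ +){4})/\1chr/')
printf '%s\n' "$sed_out"
```

The -E flag is used here because both GNU and BSD sed accept it; GNU sed treats -r as a synonym.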
 
Old 04-02-2013, 11:16 PM   #6
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.9, Centos 7.3
Posts: 17,417

Rep: Reputation: 2397
To show layout accurately, use the CODE tags as described here https://www.linuxquestions.org/quest...do=bbcode#code
 
Old 04-03-2013, 12:29 AM   #7
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Quote:
Originally Posted by danielbmartin
Try this ...
Code:
awk -F " " '{print $1,$2,$3,$4,"chr",$5,$6,$7,$8,$9}' $InFile > $OutFile
It should be like:
Code:
awk -F " " '{print $1,$2,$3,$4,"chr"$5,$6,$7,$8,$9}' $InFile > $OutFile
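The only difference between the two commands is the comma after "chr". A small sketch of why that matters, using a made-up one-line input: in awk's print, a comma inserts the output field separator (a space by default), while placing two values side by side concatenates them.

```shell
# Hypothetical one-line input; field 4 is the strand, field 5 the chromosome number.
line='a b c + 2 115'

# Comma: "chr" and $5 are separate print arguments, joined by OFS (a space).
with_comma=$(printf '%s\n' "$line" | awk '{ print $4, "chr", $5 }')

# No comma: "chr" $5 is a single concatenated string.
concatenated=$(printf '%s\n' "$line" | awk '{ print $4, "chr" $5 }')

printf '%s\n' "$with_comma" "$concatenated"
```

The first command prints "+ chr 2" and the second prints "+ chr2", which is the form the original poster asked for.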
 
Old 04-03-2013, 02:54 AM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,566

Rep: Reputation: 2901
I am curious why the awk had to be so complicated?
Code:
awk '$5="chr"$5' file
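For readers wondering why this one-liner prints anything at all: the assignment sits in awk's pattern position, its value is the new non-empty string, and a pattern with no action falls back to the default action of printing the record. A side effect worth knowing is that assigning to a field makes awk rebuild the line with the output field separator, so tabs and runs of blanks come out as single spaces. A small sketch on a made-up row:

```shell
# Hypothetical row mixing a double space and a tab between fields.
row=$(printf 'a  b\tc + 2 115')

# The assignment is the pattern; with no action given, awk prints the rebuilt record.
rebuilt=$(printf '%s\n' "$row" | awk '$5 = "chr" $5')
printf '%s\n' "$rebuilt"
```

The result is "a b c + chr2 115": the prefix is added, and the original separators are replaced by single spaces. If the tabs need to survive, the separators have to be set explicitly.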
 
Old 04-03-2013, 03:02 AM   #9
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Quote:
Originally Posted by grail
I am curious why the awk had to be so complicated?
Code:
awk '$5="chr"$5' file
Are you sure it's working? I think it should be like:
Code:
~$ awk '{if($5="chr"$5) print $0}' infile.txt
 
Old 04-03-2013, 06:41 AM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,566

Rep: Reputation: 2901
Did you try it? Worked just fine for me. Also, how does your 'if' make sense? You are assigning a value, not testing it.
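To make the assigning-versus-testing point concrete, here is a small sketch on a made-up input line. With a single = the if condition is the assigned string, which is never empty here, so the if is always true and behaves like the bare pattern form. With == the field is only compared, nothing is modified, and nothing is printed because "2" never equals "chr2".

```shell
# Hypothetical one-line input.
line='a b c + 2 115'

# Assignment inside if: always true for a non-empty result, so the line is printed.
via_if=$(printf '%s\n' "$line" | awk '{ if ($5 = "chr" $5) print $0 }')

# Bare pattern form from the earlier post.
via_pattern=$(printf '%s\n' "$line" | awk '$5 = "chr" $5')

# Comparison instead of assignment: never true, so there is no output.
via_compare=$(printf '%s\n' "$line" | awk '{ if ($5 == ("chr" $5)) print $0 }')

printf '[%s]\n' "$via_if" "$via_pattern" "$via_compare"
```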
 
Old 04-03-2013, 08:24 AM   #11
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,606

Rep: Reputation: 488
Quote:
Originally Posted by grail
I am curious why the awk had to be so complicated?
Only one reason: lack of experience. I am still on the awk learning curve. Your solution is simple and effective. Thank you for showing the way.

One minor quibble... OP said "The chr* column is actually a tab delimited column on its own ..."
Not clear on this. Perhaps he wants chr to stand apart.
If so, that is accomplished with a trivial change to your awk.
Code:
awk '$5="chr "$5' $InFile >$OutFile
Daniel B. Martin
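If the real file is tab-delimited as the original poster describes, another reading is that the chromosome number is the 3rd tab-separated column, with the read name, run ID and length sharing column 1. That layout is an assumption based on the description, not something visible in the paste. Under that assumption, a sketch that keeps the tabs intact sets both the input and output separators to a tab:

```shell
# Hypothetical row: column 1 contains spaces, the remaining columns are tab-separated.
tsv_row=$(printf 'read1 id1 length=26\t+\t2\t115333875\tACGT\tIIII')

# FS and OFS are both a tab, so column 3 is the chromosome and tabs are preserved.
tsv_out=$(printf '%s\n' "$tsv_row" |
  awk 'BEGIN { FS = OFS = "\t" } { $3 = "chr" $3; print }')
printf '%s\n' "$tsv_out"
```

On the full file, the same awk program would be given the input file name and redirected to an output file, as in the commands above.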
 
Old 04-03-2013, 08:40 AM   #12
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Original Poster
Rep: Reputation: Disabled
Thank you for the replies guys. I will try the solutions later this afternoon and report back.
 
Old 04-03-2013, 11:06 AM   #13
sawdusted
LQ Newbie
 
Registered: Dec 2012
Posts: 14

Original Poster
Rep: Reputation: Disabled
Hallelujah! Solved! All your answers helped solve it. Grail's solution was the shortest and simplest.

Thanks!

Last edited by sawdusted; 04-03-2013 at 11:47 AM.
 
Old 04-03-2013, 11:10 AM   #14
shivaa
Senior Member
 
Registered: Jul 2012
Location: Grenoble, Fr.
Distribution: Sun Solaris, RHEL, Ubuntu, Debian 6.0
Posts: 1,800
Blog Entries: 4

Rep: Reputation: 286
Quote:
Originally Posted by grail
Did you try it? Worked just fine for me. Also, how does your 'if' make sense? You are assigning a value, not testing it.
It actually didn't work on my system (it returned no output), but adding the if statement did the job. Anyway, thanks to you.
 
Old 04-03-2013, 02:58 PM   #15
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,566

Rep: Reputation: 2901
When you say your system, is that Solaris? I understand it has a different variant of awk, whereas most Linux distros are using gawk or similar.
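If the awk variant is the cause, one hedged workaround is to avoid the pattern-only shortcut and spell out the action, which does not rely on the awk accepting an assignment as a pattern. The sketch below also prefers /usr/xpg4/bin/awk when it exists; that path is where Solaris normally keeps its POSIX awk, but treat it as an assumption about the system in question. The input line is made up.

```shell
# Prefer the POSIX awk on Solaris if it is installed (assumed path), else use awk from PATH.
if [ -x /usr/xpg4/bin/awk ]; then
  AWK=/usr/xpg4/bin/awk
else
  AWK=awk
fi

# Explicit action block: assign the field, then print the rebuilt record.
portable_out=$(printf 'a b c + 2 115\n' | "$AWK" '{ $5 = "chr" $5; print }')
printf '%s\n' "$portable_out"
```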
 
  

