LinuxQuestions.org
Old 02-05-2014, 11:25 PM   #1
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Rep: Reputation: Disabled
Split string (e.g. two alleles) into one


Hi everyone. I am a newbie. I tried to split each two-allele genotype into single alleles, but it didn't work and I am not sure what is wrong. The alleles start from the 12th column.


CC CC CC

awk '{for(i=12; i<=NF;++i) -F""; print $i "\n" substr($i, 1, 1)}' no_header.txt > split_allele.txt

awk '{for(i=12; i<=NF;++i) tr"" "\n" ; print $i "\n" substr($i, 1, 1)}' no_header.txt > split_allele.txt

awk '{ split($i,a,""); for(i=12; i<=NF;++i) print a[i]}' no_header.txt > split_allele.txt

awk 'BEGIN{FS=""}{for (i=12;i<=NF;++i) print $i}' no_header.txt > split_allele.txt

Thank you so much
 
Old 02-06-2014, 12:18 AM   #2
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
Can you post an example of the input and what you'd like the output to look like, using CODE Tags https://www.linuxquestions.org/quest...do=bbcode#code
 
1 members found this post helpful.
Old 02-06-2014, 06:25 PM   #3
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled

Hi chrism01,

Thank you for your reply, but I am sorry, I don't know how to use the CODE tags. My input looks like this:

rs9629043 C/T chr1 554636 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ CC CC CC CC CC CC CC CC
rs12565286 C/G chr1 711153 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ GG GG GG GG GG CG GG GG
rs12082473 A/G chr1 730720 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ GG GG GG GG GG GG GG GG
rs3094315 A/G chr1 742429 + ncbi_36 gis Illumina1M_single commercial.assay QC+ AA AA AA AA AA AG AA
rs3131972 A/G chr1 742584 + ncbi_36 gis Illumina1M_single commercial.assay QC+ GG GG GG GG GG

I want to split each two-allele field, from the 12th column to the end of the line, into single alleles like this, using Linux:

C C C C C C C C C C C C C C C C
G G G G G G G G C G G G G G G
G G G G G G G G G G G G G G G
A A A A A A A A A A A G A A
G G G G G G G G G G

Thank you
 
Old 02-06-2014, 08:09 PM   #4
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,429

Rep: Reputation: 1348
While you are waiting for the 'awk' solution.
Code:
bash-4.2$ cat allele.txt
s9629043 C/T chr1 554636 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ CC CC CC CC CC CC CC CC
rs12565286 C/G chr1 711153 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ GG GG GG GG GG CG GG GG
rs12082473 A/G chr1 730720 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ GG GG GG GG GG GG GG GG
rs3094315 A/G chr1 742429 + ncbi_36 gis Illumina1M_single commercial.assay QC+ AA AA AA AA AA AG AA
rs3131972 A/G chr1 742584 + ncbi_36 gis Illumina1M_single commercial.assay QC+ GG GG GG GG GG

bash-4.2$ cut -d " " -f 11- allele.txt |  while IFS=" " read -a line; do for (( i=0; i<${#line[@]}; i++ )); do echo -n "${line[$i]:0:1} ${line[$i]:1:1} "; done ; echo "";done 
C C C C C C C C C C C C C C C C 
G G G G G G G G G G C G G G G G 
G G G G G G G G G G G G G G G G 
A A A A A A A A A A A G A A 
G G G G G G G G G G
If you want to retain the first fields then:
Code:
bash-4.2$ cut -d " " -f 1-10 allele.txt > allele2.txt
bash-4.2$ cut -d " " -f 11- allele.txt |  while IFS=" " read -a line; do for (( i=0; i<${#line[@]}; i++ )); do echo -n "${line[$i]:0:1} ${line[$i]:1:1} "; done ; echo "";done > allele3.txt
bash-4.2$ paste -d " " allele2.txt allele3.txt
s9629043 C/T chr1 554636 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ C C C C C C C C C C C C C C C C 
rs12565286 C/G chr1 711153 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ G G G G G G G G G G C G G G G G 
rs12082473 A/G chr1 730720 + ncbi_36 gis AffymetrixSNP6 commercial.assay QC+ G G G G G G G G G G G G G G G G 
rs3094315 A/G chr1 742429 + ncbi_36 gis Illumina1M_single commercial.assay QC+ A A A A A A A A A A A G A A 
rs3131972 A/G chr1 742584 + ncbi_36 gis Illumina1M_single commercial.assay QC+ G G G G G G G G G G
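As a rough single-pass alternative to the cut/paste steps above, the following awk sketch keeps the leading fields and splits each genotype in place. The function name `split_keep_prefix` and its first argument (the 1-based column where genotypes start) are illustrative assumptions, not something from this thread. It also assumes every genotype field is exactly two characters long.

```shell
# split_keep_prefix is a hypothetical helper name, not from the thread.
# $1 = 1-based column of the first genotype; remaining args = input files
# (reads standard input when no file is given).
split_keep_prefix() {
  first=$1; shift
  awk -v first="$first" '{
    out = ""
    for (i = 1; i <= NF; i++) {
      # Fields before "first" pass through unchanged;
      # genotype fields such as "CG" become "C G".
      field = (i < first) ? $i : substr($i, 1, 1) " " substr($i, 2, 1)
      out = out (i > 1 ? " " : "") field
    }
    print out
  }' "$@"
}

# Usage (file name taken from the example above):
#   split_keep_prefix 11 allele.txt
```

Because awk splits on any run of blanks by default, this should behave the same for space-separated and tab-separated input, and it needs no temporary files. The output is always joined with single spaces.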
 
1 members found this post helpful.
Old 02-06-2014, 08:48 PM   #5
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled

Hi allend,

Thank you for your reply. I have tried your suggestion:

cat no_header.txt > allele.txt
cut -d " " -f 12- allele.txt | while IFS=" " read -a line; do for (( i=0; i<${#line[@]}; i++ )); do echo -n "${line[$i]:0:1} ${line[$i]:1:1} "; done ; echo ""; done> allele2.txt

but it gives me this output:

r s
r s
r s
r s
r s
r s
r s
r s

Do I need to replace anything in the code you gave? I am sorry if this sounds stupid; I am really new to Linux and don't understand everything yet. Thank you for your help and time. I really appreciate it.

Many thanks,
Aida
 
Old 02-06-2014, 09:11 PM   #6
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,429

Rep: Reputation: 1348
The code I posted is for use in a bash shell. Are you using bash?

Also, I based my code on the example you supplied, but you have used '-f 12-' as an option to the cut command instead of the '-f 11-' that I used.

Please post the output of 'head -4 allele.txt' in BB code tags. The use of the code tags is linked in post#2.
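A possible quick check for the points above, offered as a sketch rather than a diagnosis: the thread does not confirm it, but if the file were tab-separated, `cut -d " "` would pass each line through unchanged (cut prints lines that contain no delimiter as they are), `read -a` with `IFS=" "` would put the whole line into one array element, and the loop would print only its first two characters, `r s`. The helper name `describe_columns` below is hypothetical; it reports how many tabs and fields the first line has and where the first genotype-like field sits.

```shell
# describe_columns is a hypothetical helper name, not from the thread.
# It inspects only the first line of the given file (or of standard input).
describe_columns() {
  awk '
    NR == 1 {
      # gsub returns the number of replacements, so this counts tabs
      # on a copy of the line without disturbing $0 or NF.
      line = $0
      tabs = gsub(/\t/, "", line)
      printf "tabs=%d fields=%d\n", tabs, NF
      # Report the first field made of exactly two allele letters (A, C, G, T or N).
      for (i = 1; i <= NF; i++)
        if ($i ~ /^[ACGTN][ACGTN]$/) { printf "first_genotype_field=%d\n", i; break }
      exit
    }' "$@"
}

# Usage (file name taken from the posts above):
#   describe_columns allele.txt
```

A non-zero tab count would suggest using `cut -f 12-` without `-d " "` (tab is cut's default delimiter), and the reported field number shows whether 11 or 12 is the right starting column for a given file.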
 
Old 02-06-2014, 10:22 PM   #7
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled

Hi again Allend,

I hope I used the code tags correctly:

Code:
bash-3.2$ head -4 allele.txt
rs9629043       C/T     chr1    554636  +       ncbi_36 gis     AffymetrixSNP6  commercial.assay        Malay-89-unrelated      QC+     CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC  CC       CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC  CC       CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      NN      CC      CC  CC       CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC      CC
rs12565286      C/G     chr1    711153  +       ncbi_36 gis     AffymetrixSNP6  commercial.assay        Malay-89-unrelated      QC+     GG      GG      GG      GG      GG      CG      GG      GG      GG      CG      GG      GG      GG  GG       GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG  GG       GG      CG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      CG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG  GG       GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      CG      GG      GG      GG
rs12082473      A/G     chr1    730720  +       ncbi_36 gis     AffymetrixSNP6  commercial.assay        Malay-89-unrelated      QC+     GG      GG      GG      GG      GG      GG      GG      GG      GG      AG      GG      GG      GG  GG       GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG  GG       GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      NN      GG      GG      GG      GG      GG      GG      GG      GG  GG       GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG      GG
rs3094315       A/G     chr1    742429  +       ncbi_36 gis     Illumina1M_single       commercial.assay        Malay-89-unrelated      QC+     AA      AA      AA      AA      AA      AG      AA      AA      AA      AA      AA      AA  AA       AA      AA      AA      AG      AG      AG      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AG      AA      AA      AA      AA      AA  AA       AA      AA      AG      AA      AA      AA      AA      AA      AA      AA      AA      AA      AA      AG      AA      AA      AA      AG      AA      AA      AA      AA      AA      AA      AG      AA      AG      AA      AA  AG       AA      AA      AA      AA      AA      AA      AG      AA      AA      AA      AA      AA      AG      AA      AA      AA
Thanks again.

Aida
 
Old 02-06-2014, 10:35 PM   #8
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled

Hi Allend,

Thank you so much for helping me. I finally found awk code that works.

Code:
-unilogin-3.2$ awk '{OFS=""}{ORS=""}{for(i=12; i<=NF-1; i++) print substr($i, 1, 1) "\t" substr($i, 2, 1) "\t"; print substr($NF, 1, 1) "\t" substr($NF, 2, 1) "\n"}' allele.txt > split_allele.txt
Thank you everyone

Cheers,
Aida
 
Old 02-06-2014, 10:58 PM   #9
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,243

Rep: Reputation: 2684
As many others have said (and chrism01 even gave you a link), please place [code][/code] tags around your code and data to make them more legible and retain formatting.

Here is an alternative solution in Ruby:
Code:
ruby -ane 'puts $F[11..-1].each{|x| x.gsub!(/(.)(.)/, "\\1 \\2")}.join(" ")' file
 
1 members found this post helpful.
Old 02-06-2014, 11:01 PM   #10
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,243

Rep: Reputation: 2684
You do not need to set ORS in your awk script. You can simply use 'print' without the "\n", and use 'printf' inside your for loop.
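A minimal sketch of that suggestion, assuming tab-separated output as in post #8: `printf` writes each allele pair without a newline, and a final `print ""` ends the row with the default ORS. Note that a bare `print` with no argument would output the whole input line, hence the empty string. The wrapper name `split_alleles` and its column argument are illustrative assumptions, not part of the thread.

```shell
# split_alleles is a hypothetical wrapper name, not from the thread.
# $1 = 1-based column of the first genotype; remaining args = input files
# (reads standard input when no file is given).
split_alleles() {
  start=$1; shift
  awk -v start="$start" '{
    sep = ""
    for (i = start; i <= NF; i++) {
      # printf adds no newline, so the whole row stays on one output line.
      printf "%s%s\t%s", sep, substr($i, 1, 1), substr($i, 2, 1)
      sep = "\t"
    }
    # print "" emits only ORS, which is a newline by default.
    print ""
  }' "$@"
}

# Usage (file names taken from post #8):
#   split_alleles 12 allele.txt > split_allele.txt
```

Using a separator variable instead of treating the last field specially avoids both the trailing tab and the need to change OFS or ORS.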
 
Old 02-06-2014, 11:13 PM   #11
Aida
LQ Newbie
 
Registered: Feb 2014
Posts: 6

Original Poster
Rep: Reputation: Disabled
Thank you Grail.
 
Old 02-06-2014, 11:46 PM   #12
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,429

Rep: Reputation: 1348
Like I said, while you wait! Nothing like posting a bash solution to bring the big boys out to play. Nice one Grail
 
  

