replacing characters only within a string of length 30 in multiple files
Hello,
I need help replacing 0 and 1 with a and c in strings of length 30 characters in multiple files. The content of one file is:
[Data]
[[Samples]]
#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010 000110001100000000000000000001
1_2 1 010000010000010000001000100000 010000000000010000001100100000
1_3 1 010000000000010000001000100000 010000010000010000001000100000
1_4 1 000110001101000000000000000000 010000000000010000001100100000
I want to replace the 0 and 1 only in the 30-character strings (for example, 000001000000000010000000100010) with a and c respectively. Here is what I have tried:
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/0/a/g' trial_1_1.arp
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/1/c/g' trial_1_1.arp
but this changes every 0 and 1 in the file trial_1_1.arp instead of restricting the change to the 30-character strings. I hope I can get help with this.
Not a full solution, but here's a hint:
Code:
[schneidz@hyper sd-bak-04.04.2013]$ cat kdo.txt | while read line
Somewhere in the loop there will need to be an if, something like:
Code:
if [ `echo $line | wc -c` -lt 30 ]
Building on schneidz's suggestion, you could read the file (in this code I used test.txt) line by line in a bash while loop and test the line length using a bash parameter expansion.
Code:
#!/bin/bash
Code:
[Data]
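The loop described above might look roughly like the following. This is only a sketch, not the script from the post: the function name `convert_lines` is made up for illustration, and it assumes a line qualifies only when, after trimming surrounding white space, it is exactly 30 characters long and contains nothing but 0 and 1. Lines that carry a sample name in front of the string would need extra handling.

```shell
#!/bin/bash
# Sketch only: convert_lines is a hypothetical helper, not the original script.
# Usage: convert_lines FILE   (the converted text is written to stdout)
convert_lines() {
    local line trimmed
    # The "|| [ -n ... ]" part also processes a final line that lacks a newline.
    while IFS= read -r line || [ -n "$line" ]; do
        # Trim leading and trailing white space with parameter expansions.
        trimmed="${line#"${line%%[![:space:]]*}"}"
        trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"
        # ${#trimmed} expands to the length of the trimmed line.
        if [ "${#trimmed}" -eq 30 ]; then
            case $trimmed in
                *[!01]*) ;;   # contains something other than 0/1: leave alone
                *) line=$(printf '%s\n' "$line" | tr 01 ac) ;;   # 0 -> a, 1 -> c
            esac
        fi
        printf '%s\n' "$line"
    done < "$1"
}
```

Called as `convert_lines test.txt > test.out`, this leaves the input file alone and writes the result to a new file, so the original can be compared against the output before anything is overwritten.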
Personally, I would use perl -
Code:
#!/usr/bin/perl
This has the benefit of not looking at the comments which may also have 30 characters on a line. I think it could even be reduced to a single line:
Code:
perl -ne 'if (/^[\d]+$/) { s/0/a/g; s/1/c/g;} print;' <inputdatafile >outputdatafile
@jpollard - the downside of the perl script is that it also needs to allow for white space; as it stands, your script returns the original file intact.
Dear all,
Thanks for the effort to help me solve my problem. Unfortunately, all the answers provided return my file intact, without changing the 0 and 1 within the strings of 30 characters. Below is the output I am looking for (in several files):
[Data]
[[Samples]]
#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 aaaaacaaaaaaaaaacaaaaaaacaaaca aaaccaaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 acaaaaacaaaaacaaaaaacaaacaaaaa acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 aaaaaaaaaaaaacaaaaaacaaacaaaaa acaaaaacaaaaaaaaaaaacaaacaaaaa
1_4 1 aaaccaaaccacaaaaaaaaaaaaaaaaaaaa acaaaaaaaaaaacaaaaaaccaacaaaaa
Thank you
Well, the following seems to work for me:
Code:
#!/usr/bin/perl
Now I am still assuming the 30 character number is at the end of a line... so that last line of your sample output has more than 30 digits...
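To make the end-of-line assumption concrete, here is a rough shell/awk sketch (it is not the perl script from the post). The function name `convert_last_field` is hypothetical. It assumes the string to convert is the last white-space-separated field on the line, is exactly 30 characters long, and contains only 0 and 1. Anything in front of it, such as a sample name, is passed through untouched, and so is any line whose last field is longer or shorter than 30 characters.

```shell
#!/bin/bash
# Sketch only: convert_last_field is a hypothetical helper.
# Usage: convert_last_field [FILE...]   (reads stdin when no file is given)
convert_last_field() {
    awk '
    {
        line = $0
        sub(/[ \t]+$/, "", line)      # drop trailing blanks so the field ends the line
        if (NF > 0 && length($NF) == 30 && $NF !~ /[^01]/) {
            head = substr(line, 1, length(line) - 30)   # everything before the last field
            tail = $NF
            gsub(/0/, "a", tail)      # 0 -> a, only inside the 30-character field
            gsub(/1/, "c", tail)      # 1 -> c, only inside the 30-character field
            print head tail
        } else {
            print $0                  # no match: print the line unchanged
        }
    }' "$@"
}
```

With this test, a field of 31 or 32 digits is deliberately left alone, which is the behaviour implied by the remark about the last line of the sample output.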
If the perl script doesn't do it, would you take a solution in C?
Hello Jpollard,
Your script worked for me. Thanks so much. The only thing remaining is that I want the changes saved to the file. I will play with your scripts to see how I can do this. Thanks once again.
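One way to save the changes back to the files is to write the converted text to a temporary file and then copy it over the original. The sketch below is an assumption about how that could be wired up, not part of the working script: `apply_in_place` is a hypothetical helper, and its first argument is any command that reads the data on stdin and writes the converted data to stdout.

```shell
#!/bin/bash
# Sketch only: apply_in_place is a hypothetical helper.
# Usage: apply_in_place FILTER FILE...
# FILTER is a command or shell function that reads stdin and writes stdout.
apply_in_place() {
    local filter f tmp
    filter=$1
    shift
    for f in "$@"; do
        tmp=$(mktemp) || return 1
        if "$filter" < "$f" > "$tmp"; then
            cat "$tmp" > "$f"     # overwrite the contents, keeping the file's permissions
            rm -f "$tmp"
        else
            rm -f "$tmp"          # filter failed: leave the original file untouched
            return 1
        fi
    done
}
```

For example, with the conversion script saved as an executable file (the name `./convert.pl` is made up here), `apply_in_place ./convert.pl *.arp` would update every .arp file in the current directory. It is worth trying this on copies first, since the originals are overwritten. Perl also has an `-i` switch for in-place editing, which may be simpler if the script reads the files named on its command line.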
Hello Metaschima,
The perl script worked. Thanks for offering to help.
You will also find that if you use [code][/code] tags around code or data it will preserve the formatting and help people understand the format of the data better :)
Just as a quick alternative:
Code:
sed -r '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g}' file
Drat. Didn't think of the {30} construct...
But that would still modify comments.