[SOLVED] replacing characters only within a string of length 30 in multiple files
I want to replace 0 and 1 with a and c respectively, but only within the 30-character strings (for example, 000001000000000010000000100010).
Here is what I have tried:
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/0/a/g' trial_1_1.arp
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/1/c/g' trial_1_1.arp
but this changes every 0 and 1 in the file trial_1_1.arp instead of restricting the change to the 30-character strings. I hope I can get help with this.
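A note on why the attempt above touches everything: when sed is given a file name it reads that file, not the pipe, so awk's length filter never reaches it. The following is only a sketch of one way to do the check and the replacement in a single awk program. It assumes the target is any whitespace-separated field that is exactly 30 characters long and made up only of 0s and 1s; the sample lines are stand-ins for trial_1_1.arp.

```shell
#!/bin/bash
# Hypothetical sample data standing in for trial_1_1.arp.
sample=$(mktemp)
printf '%s\n' \
    '#Total number of polymorphic sites: 30' \
    '1_1 1 000001000000000010000000100010' \
    '      000110001100000000000000000001' > "$sample"

# Translate 0->a and 1->c only in fields that are exactly 30 characters
# long and contain nothing but 0s and 1s; every line is printed.
# Caveat: assigning to a field makes awk rebuild that line with single
# spaces, so original indentation on changed lines is lost.
result=$(awk '{
    for (i = 1; i <= NF; i++)
        if (length($i) == 30 && $i ~ /^[01]+$/) {
            gsub(/0/, "a", $i)
            gsub(/1/, "c", $i)
        }
    print
}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

The comment line containing "30" is left alone because none of its fields is 30 characters long.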
[schneidz@hyper sd-bak-04.04.2013]$ cat kdo.txt | while read line
> do
> echo number of chars = `echo $line | wc -c`
> echo $line | tr '01' 'ac'
> done
number of chars = 7
[Data]
number of chars = 12
[[Samples]]
number of chars = 1
number of chars = 38
#Number of independent chromosomes: c
number of chars = 39
#Total number of polymorphic sites: 3a
number of chars = 43
#Reporting status of a maximum of 3a sites
number of chars = 43
# 3a polymorphic positions on chromosome c
number of chars = 111
#c, 2, 3, 4, 5, 6, 7, 8, 9, ca, cc, c2, c3, c4, c5, c6, c7, c8, c9, 2a, 2c, 22, 23, 24, 25, 26, 27, 28, 29, 3a
number of chars = 1
number of chars = 22
SampleName="Sample c"
number of chars = 14
SampleSize=27
number of chars = 14
SampleData= {
number of chars = 37
c_c c aaaaacaaaaaaaaaacaaaaaaacaaaca
number of chars = 31
aaaccaaaccaaaaaaaaaaaaaaaaaaac
number of chars = 37
c_2 c acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa
number of chars = 37
c_3 c acaaaaaaaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 37
c_4 c aaaccaaaccacaaaaaaaaaaaaaaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa
awk and sed can probably do it too, but this came to mind first.
Somewhere in the loop there will need to be an if, something like:
Code:
# wc -c also counts the newline that echo adds, so a 30-character line reports 31
if [ "$(echo "$line" | wc -c)" -eq 31 ]
then
    echo "$line" | tr '01' 'ac'
else
    echo "$line"
fi
Building on schneidz's suggestion, you could read the file (in this code I used test.txt) line by line in a bash while loop and test the line length using a bash parameter expansion.
Code:
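#!/bin/bash
# read strips leading/trailing whitespace; -r keeps backslashes literal
while read -r line; do
    # ${#line} is the number of characters in $line
    if [[ ${#line} -eq 30 ]]; then
        echo "$line" | tr '01' 'ac'
    else
        echo "$line"
    fi
done < test.txt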
The output is
Code:
[Data]
[[Samples]]
#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010
aaaccaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 010000010000010000001000100000
acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 010000000000010000001000100000
acaaaaacaaaaacaaaaaacaaacaaaaa
1_4 1 000110001101000000000000000000
acaaaaaaaaaaacaaaaaaccaacaaaaa
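A side note on the two length tests used in this thread: `echo $line | wc -c` reports 31 for a 30-character string because it counts the newline that echo appends, while `${#line}` counts only the characters in the variable. A quick check, using one of the strings from the sample data:

```shell
#!/bin/bash
line='000001000000000010000000100010'

with_wc=$(echo "$line" | wc -c)   # includes the trailing newline from echo
with_expansion=${#line}           # characters in $line only

echo "wc -c: $with_wc"
echo "\${#line}: $with_expansion"
```

This is why a test written around wc -c has to compare against 31, whereas the parameter-expansion test compares against 30.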
This looks for only lines composed of digits (assuming it could be 0-9 as well as just 0/1). If it IS only 0/1 then replace the \d with 01 (it would then look like /^[01]$/).
This has the benefit of not looking at the comments which may also have 30 characters on a line.
I think it could even be reduced to a single line:
Thanks for the effort to help me solve my problem. Unfortunately, all the answers provided return my file intact, without changing the 0 and 1 within the strings of 30 characters. Below is the output I am looking for (in several files):
[Data]
[[Samples]]
#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30
Hello Jpollard,
Your script worked for me. Thanks so much. The only
thing remaining is that I want the changes to save
to the file. I will play with your scripts to see
how I can do this. Thanks once again.
The simplest is to redirect input from the file, and output to a new file.
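As a rough sketch of that idea applied to several files, the loop below writes each converted file to a new name and then moves it over the original. It reuses the bash length test shown earlier in the thread rather than jpollard's script; the directory, the file names and the .new suffix are made up for the demonstration.

```shell
#!/bin/bash
# Demo directory with two hypothetical input files.
workdir=$(mktemp -d)
printf '%s\n' 'SampleSize=27' '010000010000010000001000100000' > "$workdir/trial_1_1.arp"
printf '%s\n' '#chromosome 1' '000001000000000010000000100010' > "$workdir/trial_1_2.arp"

for f in "$workdir"/*.arp; do
    # Read from the original, write the converted text to a new file,
    # and replace the original only if the loop finished without error.
    while read -r line; do
        if [[ ${#line} -eq 30 ]]; then
            printf '%s\n' "$line" | tr '01' 'ac'
        else
            printf '%s\n' "$line"
        fi
    done < "$f" > "$f.new" && mv -- "$f.new" "$f"
done

cat "$workdir"/*.arp
```

Writing to a separate file first matters: redirecting output straight onto the input file would truncate it before it is read.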
You will also find that if you use [code][/code] tags around code or data, it will preserve the formatting and help people understand the format of the data better.
Just as a quick alternative:
Code:
sed -r '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g}' file
Once you are happy with the output, simply add the -i option.
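For several files at once, a cautious variant is to give -i a suffix so that sed keeps a backup of each original. This is only a sketch: the demo files are invented, -E is used in place of -r (GNU sed accepts both), and a `;` is added before the closing brace, which some non-GNU seds require.

```shell
#!/bin/bash
# Hypothetical demo files; substitute your own .arp files.
workdir=$(mktemp -d)
printf '%s\n' '#Total number of polymorphic sites: 30' \
    '  000110001100000000000000000001' > "$workdir/trial_1_1.arp"
printf '%s\n' 'SampleSize=27' \
    '010000010000010000001000100000' > "$workdir/trial_1_2.arp"

# -E turns on extended regular expressions.
# -i.bak edits each file in place and keeps the original as FILE.bak.
sed -E -i.bak '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g;}' "$workdir"/*.arp

cat "$workdir"/*.arp
```

Only lines that consist of optional leading whitespace followed by exactly thirty 0/1 characters are changed, so the "30" in the comment line and the "27" in SampleSize stay as they are.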