LinuxQuestions.org - [SOLVED] replacing characters only within a string of length 30 in multiple files

kdo	02-28-2014 09:15 AM

Hello,
I need help replacing 0 and 1 with a and c in strings of length 30 characters in multiple files. The content of one file is:
[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010
000110001100000000000000000001
1_2 1 010000010000010000001000100000
010000000000010000001100100000
1_3 1 010000000000010000001000100000
010000010000010000001000100000
1_4 1 000110001101000000000000000000
010000000000010000001100100000

I want to replace 0 and 1 with a and c, respectively, only in the 30-character strings (for example, 000001000000000010000000100010).

Here is what I have tried:
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/0/a/g' trial_1_1.arp

awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/1/c/g' trial_1_1.arp
But this changes every 0 and 1 in the file trial_1_1.arp instead of only those in the 30-character strings. I hope I can get help with this.

schneidz

02-28-2014 09:26 AM

Not a full solution, but here's a hint:

Code:

[schneidz@hyper sd-bak-04.04.2013]$ cat kdo.txt | while read line
> do
>  echo number of chars = `echo $line | wc -c`
>  echo $line | tr '01' 'ac'
> done
number of chars = 7
[Data]
number of chars = 12
[[Samples]]
number of chars = 1

number of chars = 38
#Number of independent chromosomes: c
number of chars = 39
#Total number of polymorphic sites: 3a
number of chars = 43
#Reporting status of a maximum of 3a sites
number of chars = 43
# 3a polymorphic positions on chromosome c
number of chars = 111
#c, 2, 3, 4, 5, 6, 7, 8, 9, ca, cc, c2, c3, c4, c5, c6, c7, c8, c9, 2a, 2c, 22, 23, 24, 25, 26, 27, 28, 29, 3a
number of chars = 1

number of chars = 22
SampleName="Sample c"
number of chars = 14
SampleSize=27
number of chars = 14
SampleData= {
number of chars = 37
c_c c aaaaacaaaaaaaaaacaaaaaaacaaaca
number of chars = 31
aaaccaaaccaaaaaaaaaaaaaaaaaaac
number of chars = 37
c_2 c acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa
number of chars = 37
c_3 c acaaaaaaaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 37
c_4 c aaaccaaaccacaaaaaaaaaaaaaaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa

awk and sed can probably do it too, but this came to mind first.

Somewhere in the loop there will need to be an if, something like:

Code:

# wc -c also counts the newline added by echo, so a 30-character line reports 31
if [ "$(echo "$line" | wc -c)" -eq 31 ]
then
  : # do something
else
  : # do something else
fi
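
Filled in, that branch might look like the sketch below. One thing to watch: `echo $line | wc -c` also counts the newline that echo appends, which is why the 30-digit lines show up as 31 in the output above, so the test compares against 31. The function names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper names; they are not from the thread.

# Length as reported by "echo | wc -c": includes the trailing newline.
len_with_newline() {
  printf '%s\n' "$1" | wc -c | tr -d ' '
}

# Translate 0->a and 1->c only when the line is exactly 30 characters long,
# that is, when wc -c reports 31.
convert_if_30() {
  local line=$1
  if [ "$(len_with_newline "$line")" -eq 31 ]; then
    printf '%s\n' "$line" | tr '01' 'ac'
  else
    printf '%s\n' "$line"
  fi
}

convert_if_30 '000110001100000000000000000001'   # 30 characters: translated
convert_if_30 'SampleSize=27'                    # shorter: printed unchanged
```

Lines such as `1_1 1 0000...` are 36 characters long, so a whole-line length test like this one leaves them alone.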

allend

02-28-2014 10:42 AM

Building on schneidz's suggestion, you could read the file (in this code I used test.txt) line by line in a bash while loop and test the line length using a bash parameter expansion.

Code:

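#!/bin/bash

# Read test.txt line by line; -r keeps backslashes literal.
while read -r line; do
  # ${#line} expands to the length of the line in characters.
  if [[ ${#line} == 30 ]]; then
    echo "$line" | tr '01' 'ac'
  else
    echo "$line"
  fi
done < test.txt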

The output is

Code:

[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010
aaaccaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 010000010000010000001000100000
acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 010000000000010000001000100000
acaaaaacaaaaacaaaaaacaaacaaaaa
1_4 1 000110001101000000000000000000
acaaaaaaaaaaacaaaaaaccaacaaaaa

jpollard

02-28-2014 10:55 AM

Personally, I would use perl:

Code:

#!/usr/bin/perl

while(<>) {
    if (/^\d*$/) {
        s/0/a/g; s/1/c/g;
    }
    print;
}

This looks for only lines composed of digits (assuming it could be 0-9 as well as just 0/1). If it IS only 0/1 then replace the \d with [01] (it would then look like /^[01]*$/).

This has the benefit of not looking at the comments which may also have 30 characters on a line.

I think it could even be reduced to a single line:

Code:

perl -ne 'if (/^[\d]+$/) { s/0/a/g; s/1/c/g;} print;' <inputdatafile >outputdatafile

grail

02-28-2014 11:19 AM

@jpollard - the downside of the perl script is that it also needs to allow for whitespace; as it stands, your script returns the original file intact.
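
As a rough illustration of that point in bash, here is a digits-only test that tolerates surrounding whitespace. It assumes the data lines really do carry leading or trailing whitespace (spaces, tabs, or a stray carriage return), which the sample as pasted does not show. The name is_digit_line is invented for this sketch.

```shell
#!/bin/bash
# is_digit_line is a made-up helper. It succeeds when the line holds only
# 0/1 digits, optionally surrounded by whitespace.
is_digit_line() {
  local re='^[[:space:]]*[01]+[[:space:]]*$'
  [[ $1 =~ $re ]]
}

for sample in '000110' $'\t000110 ' '1_1 1 000110'; do
  if is_digit_line "$sample"; then
    echo "match:    [$sample]"
  else
    echo "no match: [$sample]"
  fi
done
```

A pattern anchored as ^[01]*$ with no allowance for whitespace would reject the second sample, which is one way a script can end up printing the file unchanged.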

kdo	02-28-2014 12:04 PM

Dear all,

Thanks for the effort to help me solve my problem. Unfortunately, all the answers provided
return my file intact, without changing the 0 and 1 within the strings of 30 characters. Below is the output
I am looking for (in several files):

[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 aaaaacaaaaaaaaaacaaaaaaacaaaca
aaaccaaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 acaaaaacaaaaacaaaaaacaaacaaaaa
acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 aaaaaaaaaaaaacaaaaaacaaacaaaaa
acaaaaacaaaaaaaaaaaacaaacaaaaa
1_4 1 aaaccaaaccacaaaaaaaaaaaaaaaaaaaa
acaaaaaaaaaaacaaaaaaccaacaaaaa

Thank you

jpollard

02-28-2014 03:42 PM

Well, the following seems to work for me:

Code:

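#!/usr/bin/perl

while(<>) {
    if (! /^#/) {                            # not a comment line
        @v = split;                          # split the line on whitespace
        if (30 == length($v[$#v])) {         # last field is 30 characters long
            if ($v[$#v] =~ /^[01]*$/) {      # and contains only 0s and 1s
                $v[$#v] =~ s/0/a/g;
                $v[$#v] =~ s/1/c/g;
            }
        }
        print join(' ',@v),"\n";             # rejoin the fields with single spaces
    } else {
        print;                               # comment lines pass through untouched
    }
}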

It is longer, but having to handle parts of a record is a bit trickier.

I am still assuming the 30-character number is at the end of the line. Note that the last line of your sample output has more than 30 digits.
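
For anyone who would rather stay with awk, as in the original attempt, the same last-field logic could be sketched roughly as below. It rests on the same assumptions as the perl script: comment lines start with #, and the 30-character 0/1 string is the last whitespace-separated field. The function name convert_last_field is made up.

```shell
#!/bin/bash
# convert_last_field is a hypothetical name, not something from the thread.
# It reads stdin and writes stdout.
convert_last_field() {
  awk '
    /^#/ { print; next }                 # leave comment lines alone
    NF && length($NF) == 30 && $NF ~ /^[01]+$/ {
      gsub(/0/, "a", $NF)                # 0 -> a in the last field only
      gsub(/1/, "c", $NF)                # 1 -> c in the last field only
    }
    { print }
  '
}

printf '%s\n' \
  '#Total number of polymorphic sites: 30' \
  '1_1 1 000001000000000010000000100010' \
  '000110001100000000000000000001' |
  convert_last_field
```

Assigning to $NF makes awk rebuild the line with single spaces between fields, so runs of whitespace are collapsed on the lines that get converted. Lines that are not converted are printed exactly as read.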

metaschima

02-28-2014 04:14 PM

If the perl script doesn't do it, would you take a solution in C?

kdo	02-28-2014 04:40 PM

Hello Jpollard,
Your script worked for me. Thanks so much. The only
thing remaining is that I want the changes to save
to the file. I will play with your scripts to see
how I can do this. Thanks once again.

kdo	02-28-2014 04:46 PM

Hello Metaschima,
The perl script worked. Thanks for offering to help.

jpollard

02-28-2014 08:05 PM

Quote:

Originally Posted by kdo (Post 5126735)

The simplest way is to redirect input from the file and send the output to a new file.
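
For several files, that idea could be wrapped in a loop along these lines. This is only a sketch: rewrite_in_place and convert_30 are invented names, and convert_30 is a stand-in filter that uses the whole-line length test from earlier in the thread. Any command that reads stdin and writes stdout could take its place.

```shell
#!/bin/bash
# Hypothetical helper: run a filter over each file and replace the file
# contents with the filtered output. The filter reads stdin, writes stdout.
rewrite_in_place() {
  local filter=$1; shift
  local f tmp
  for f in "$@"; do
    tmp=$(mktemp) || return 1
    if "$filter" < "$f" > "$tmp"; then
      cat "$tmp" > "$f"      # overwrite the contents only if the filter succeeded
    fi
    rm -f "$tmp"
  done
}

# Stand-in filter: translate 0/1 on lines that are exactly 30 characters long.
convert_30() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    if [ "${#line}" -eq 30 ]; then
      printf '%s\n' "$line" | tr '01' 'ac'
    else
      printf '%s\n' "$line"
    fi
  done
}

# Usage (file names assumed): rewrite_in_place convert_30 trial_*.arp
```

Writing to a temporary file first matters because redirecting output straight onto the input file would truncate it before it is read.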

grail

02-28-2014 11:12 PM

You will also find that if you use [code][/code] tags around code or data it will preserve the formatting and help people understand the format of the data better :)

Just as a quick alternative:

Code:

sed -r '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g}' file

Once you are happy with the output, simply add the -i option so that sed edits the file in place.
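
To see what the address selects before adding -i, the command can be wrapped in a function and fed a few lines from the sample. A sketch, assuming a sed that accepts -E (GNU sed treats it as a synonym for -r); the function name is invented.

```shell
#!/bin/bash
# convert_whole_lines is a made-up name. It applies the sed command above
# to stdin and writes the result to stdout.
convert_whole_lines() {
  sed -E '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g;}'
}

printf '%s\n' \
  '# 30 polymorphic positions on chromosome 1' \
  '   010000000000010000001100100000' \
  '1_1 1 000001000000000010000000100010' |
  convert_whole_lines
```

Only the second line matches. The address requires the whole line to be 30 digits after optional leading whitespace, so lines that carry a prefix such as `1_1 1` are left as they are.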

jpollard

03-01-2014 05:37 AM

drat. Didn't think of the {30} construct...

But that still would modify comments.

grail

03-01-2014 09:02 AM

Quote:

But that still would modify comments.

I fail to see how as the sed encompasses the entire line (^$)? Unless whitespace prior to the digits signifies a comment??

jpollard

03-01-2014 02:05 PM

Quote:

Originally Posted by grail (Post 5126990)

I fail to see how as the sed encompasses the entire line (^$)? Unless whitespace prior to the digits signifies a comment??

You are right. I'm an idiot.

All times are GMT -5.