LinuxQuestions.org
Old 02-28-2014, 10:15 AM   #1
kdo
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Rep: Reputation: Disabled
replacing characters only within a string of length 30 in multiple files


Hello,
I need help replacing 0 and 1 with a and c in strings of length 30 characters in multiple files. The content of one file is:
[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010
000110001100000000000000000001
1_2 1 010000010000010000001000100000
010000000000010000001100100000
1_3 1 010000000000010000001000100000
010000010000010000001000100000
1_4 1 000110001101000000000000000000
010000000000010000001100100000

I want to replace the 0 and 1 only in the 30-character strings (for example, 000001000000000010000000100010) with a and c respectively.

Here is what I have tried:
awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/0/a/g' trial_1_1.arp

awk 'length($1) == 30 { print $1 }' trial_1_1.arp | sed -i 's/1/c/g' trial_1_1.arp
But this changes every 0 and 1 in the file called trial_1_1.arp instead of only those in the 30-character strings. I hope I can get help on this.
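
A note on why the attempt above behaves this way: when sed is given a file name, it reads that file and never looks at what arrives on the pipe, so the awk filter in front of it has no effect. A minimal sketch of that behaviour, using a made-up two-line file and leaving out -i so nothing is overwritten:

```shell
#!/bin/sh
# Demonstration only: the temporary file and its contents are made up.
demo=$(mktemp)
printf '%s\n' 'SampleSize=10' '0101' > "$demo"

# awk prints nothing here (no first field is 30 characters long), yet sed
# still rewrites every line, because it reads "$demo" and ignores the pipe.
piped_result=$(awk 'length($1) == 30 { print $1 }' "$demo" | sed 's/0/a/g' "$demo")
printf '%s\n' "$piped_result"

rm -f "$demo"
```

The length test and the replacement therefore have to happen in the same program, which is what the replies below do.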
 
Old 02-28-2014, 10:26 AM   #2
schneidz
LQ Guru
 
Registered: May 2005
Location: boston, usa
Distribution: fc-15/ fc-20-live-usb/ aix
Posts: 5,167

Rep: Reputation: 890
Not a full solution, but here's a hint:
Code:
[schneidz@hyper sd-bak-04.04.2013]$ cat kdo.txt | while read line
> do
>  echo number of chars = `echo $line | wc -c`
>  echo $line | tr '01' 'ac'
> done
number of chars = 7
[Data]
number of chars = 12
[[Samples]]
number of chars = 1

number of chars = 38
#Number of independent chromosomes: c
number of chars = 39
#Total number of polymorphic sites: 3a
number of chars = 43
#Reporting status of a maximum of 3a sites
number of chars = 43
# 3a polymorphic positions on chromosome c
number of chars = 111
#c, 2, 3, 4, 5, 6, 7, 8, 9, ca, cc, c2, c3, c4, c5, c6, c7, c8, c9, 2a, 2c, 22, 23, 24, 25, 26, 27, 28, 29, 3a
number of chars = 1

number of chars = 22
SampleName="Sample c"
number of chars = 14
SampleSize=27
number of chars = 14
SampleData= {
number of chars = 37
c_c c aaaaacaaaaaaaaaacaaaaaaacaaaca
number of chars = 31
aaaccaaaccaaaaaaaaaaaaaaaaaaac
number of chars = 37
c_2 c acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa
number of chars = 37
c_3 c acaaaaaaaaaaacaaaaaacaaacaaaaa
number of chars = 31
acaaaaacaaaaacaaaaaacaaacaaaaa
number of chars = 37
c_4 c aaaccaaaccacaaaaaaaaaaaaaaaaaa
number of chars = 31
acaaaaaaaaaaacaaaaaaccaacaaaaa
awk and sed can probably do it too, but this came to mind first.

Somewhere in the loop there will need to be an if, something like:
Code:
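# 30 characters plus the newline that wc -c also counts
if [ "$(echo "$line" | wc -c)" -eq 31 ]
then
 echo "$line" | tr '01' 'ac'   # translate only the 30-character lines
else
 echo "$line"                  # print every other line unchanged
fi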
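
The counts in the transcript above are one higher than the visible length of each line, because echo appends a newline and wc -c counts it. A small sketch comparing the two ways of measuring, using one of the sample lines from the first post (the variable names are only for illustration):

```shell
#!/bin/sh
line='000110001100000000000000000001'

# wc -c counts bytes, including the newline added by printf/echo.
with_wc=$(printf '%s\n' "$line" | wc -c)

# The shell's own length expansion counts only the characters in the variable.
with_expansion=${#line}

echo "wc -c: $with_wc, \${#line}: $with_expansion"
```

Whichever is used, the comparison value has to match: 31 with wc -c, 30 with the length expansion.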

Last edited by schneidz; 02-28-2014 at 10:29 AM.
 
Old 02-28-2014, 11:42 AM   #3
allend
Senior Member
 
Registered: Oct 2003
Location: Melbourne
Distribution: Slackware-current
Posts: 4,690

Rep: Reputation: 1574
Building on schneidz's suggestion, you could read the file (in this code I used test.txt) line by line in a bash while loop and test the line length using a bash parameter expansion.
Code:
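#!/bin/bash

# read -r keeps backslashes literal; read still trims leading/trailing white space
while read -r line; do
  if [[ ${#line} -eq 30 ]]; then
    # the line is exactly 30 characters long: translate 0 -> a and 1 -> c
    printf '%s\n' "$line" | tr '01' 'ac'
  else
    # any other line is printed unchanged
    printf '%s\n' "$line"
  fi
done < test.txt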
The output is
Code:
[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 000001000000000010000000100010
aaaccaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 010000010000010000001000100000
acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 010000000000010000001000100000
acaaaaacaaaaacaaaaaacaaacaaaaa
1_4 1 000110001101000000000000000000
acaaaaaaaaaaacaaaaaaccaacaaaaa
 
Old 02-28-2014, 11:55 AM   #4
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Personally, I would use Perl:
Code:
#!/usr/bin/perl

while(<>) {
    if (/^\d*$/) {
        s/0/a/g; s/1/c/g;
    }
   print;
}
This looks only for lines composed entirely of digits (assuming they could be 0-9 rather than just 0/1). If it IS only 0/1, then replace the \d with [01] (the pattern would then look like /^[01]*$/).

This has the benefit of not looking at the comments which may also have 30 characters on a line.

I think it could even be reduced to a single line:
Code:
perl -ne 'if (/^[\d]+$/) { s/0/a/g; s/1/c/g;} print;' <inputdatafile >outputdatafile
 
Old 02-28-2014, 12:19 PM   #5
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,565

Rep: Reputation: 2901
@jpollard - the downside of the perl script is that the pattern also needs to allow for white space; as it stands, your script returns the original file intact.
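
To illustrate the point about white space: a pattern anchored with ^ and $ that allows only digits cannot match a line with leading blanks, so no substitution takes place on it. A small sketch with grep -E; the indented sample line is an assumption about what the real file may look like, since the indentation may have been lost when the data was posted without code tags:

```shell
#!/bin/sh
# Assumed sample: the same 30 digits, once flush left and once indented.
sample=$(printf '%s\n' '000110001100000000000000000001' \
                       '      000110001100000000000000000001')

# Digits only between ^ and $: the indented line is not matched.
strict_count=$(printf '%s\n' "$sample" | grep -c -E '^[01]+$')

# Optional leading white space allowed: both lines are matched.
lenient_count=$(printf '%s\n' "$sample" | grep -c -E '^[[:space:]]*[01]+$')

echo "strict: $strict_count, lenient: $lenient_count"
```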
 
Old 02-28-2014, 01:04 PM   #6
kdo
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Dear all,

Thanks for the effort to help me solve my problem. Unfortunately, all the answers provided
return my file intact without changing the 0 and 1 within the strings of 30 characters. Below is the output
I am looking for (in several files):


[Data]
[[Samples]]

#Number of independent chromosomes: 1
#Total number of polymorphic sites: 30
#Reporting status of a maximum of 30 sites
# 30 polymorphic positions on chromosome 1
#1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30

SampleName="Sample 1"
SampleSize=27
SampleData= {
1_1 1 aaaaacaaaaaaaaaacaaaaaaacaaaca
aaaccaaaaccaaaaaaaaaaaaaaaaaaac
1_2 1 acaaaaacaaaaacaaaaaacaaacaaaaa
acaaaaaaaaaaacaaaaaaccaacaaaaa
1_3 1 aaaaaaaaaaaaacaaaaaacaaacaaaaa
acaaaaacaaaaaaaaaaaacaaacaaaaa
1_4 1 aaaccaaaccacaaaaaaaaaaaaaaaaaaaa
acaaaaaaaaaaacaaaaaaccaacaaaaa

Thank you
 
Old 02-28-2014, 04:42 PM   #7
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Well, the following seems to work for me:
Code:
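#!/usr/bin/perl

while(<>) {
    if (! /^#/) {                           # leave comment lines alone
        @v = split;                         # split the record on white space
        if (@v && 30 == length($v[$#v])) {  # last field is 30 characters long
            if ($v[$#v] =~ /^[01]*$/) {     # and contains only 0s and 1s
                $v[$#v] =~ s/0/a/g;
                $v[$#v] =~ s/1/c/g;
            }
        }
        print join(' ',@v),"\n";            # fields are rejoined with single spaces
    } else {
        print;
    }
}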
It is longer, but having to handle parts of a record is a bit trickier.

Note that I am still assuming the 30-character number is at the end of a line.

Also note that the last line of your sample output has more than 30 characters...
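
For comparison, roughly the same logic can be sketched in awk: pass comment lines through, then recode the last field only when it is exactly 30 characters of 0 and 1. The function name recode_last_field is made up for this sketch. Unlike the Perl script, only the lines that are actually changed get their fields rejoined with single spaces.

```shell
#!/bin/sh
# recode_last_field: hypothetical helper name, not from the thread.
# Reads the named files (or stdin) and writes the recoded text to stdout.
recode_last_field() {
    awk '
        /^#/ { print; next }                 # comment lines pass through untouched
        NF && length($NF) == 30 && $NF ~ /^[01]+$/ {
            gsub(/0/, "a", $NF)              # changing a field rebuilds the line with single spaces
            gsub(/1/, "c", $NF)
        }
        { print }
    ' "$@"
}

printf '%s\n' '# 30 polymorphic positions on chromosome 1' \
              '1_1 1 000001000000000010000000100010' \
              '000110001100000000000000000001' | recode_last_field
```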

Last edited by jpollard; 02-28-2014 at 04:44 PM.
 
Old 02-28-2014, 05:14 PM   #8
metaschima
Senior Member
 
Registered: Dec 2013
Distribution: Slackware
Posts: 1,982

Rep: Reputation: 491
If the perl script doesn't do it, would you take a solution in C?
 
Old 02-28-2014, 05:40 PM   #9
kdo
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hello Jpollard,
Your script worked for me. Thanks so much. The only
thing remaining is that I want the changes to save
to the file. I will play with your scripts to see
how I can do this. Thanks once again.

Last edited by kdo; 02-28-2014 at 05:43 PM.
 
Old 02-28-2014, 05:46 PM   #10
kdo
LQ Newbie
 
Registered: Feb 2014
Posts: 5

Original Poster
Rep: Reputation: Disabled
Hello Metaschima,
The perl script worked. Thanks for offering to help.
 
Old 02-28-2014, 09:05 PM   #11
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Quote:
Originally Posted by kdo View Post
Hello Jpollard,
Your script worked for me. Thanks so much. The only
thing remaining is that I want the changes to save
to the file. I will play with your scripts to see
how I can do this. Thanks once again.
The simplest is to redirect input from the file, and output to a new file.
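
A sketch of how that could look for several files. One thing to watch: writing the output straight back to the file being read (script < file > file) empties the file before it is read, so the result goes to a temporary file first. The helper name rewrite_with and the script name recode.pl are assumptions made for this example.

```shell
#!/bin/sh
# rewrite_with: hypothetical helper. Usage: rewrite_with FILE COMMAND [ARGS...]
# Runs COMMAND with FILE on stdin, collects the output in a temporary file,
# and copies it back over FILE only if COMMAND succeeded.
rewrite_with() {
    target=$1
    shift
    tmp=$(mktemp) || return 1
    if "$@" < "$target" > "$tmp"; then
        cat "$tmp" > "$target"   # copy back, keeping the original file's permissions
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Example (not run here): apply the Perl script, saved under the assumed
# name recode.pl, to every .arp file in the current directory.
# for f in *.arp; do rewrite_with "$f" perl recode.pl; done
```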
 
Old 03-01-2014, 12:12 AM   #12
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,565

Rep: Reputation: 2901
You will also find that if you use [code][/code] tags around code or data, it will preserve the formatting and help people understand the format of the data better.

Just as a quick alternative:
Code:
sed -r '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g}' file
Once you are happy with the output, simply add the -i option.
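
A sketch of that last step across several files, assuming GNU sed and files named like the trial_1_1.arp from the first post. Here -E is used in place of -r (GNU sed accepts both), and -i.bak keeps each original as a copy with a .bak suffix. The wrapper name recode_files is made up. As written, the address only matches lines consisting of optional white space followed by exactly 30 digits, so records that carry a label before the digits are left alone.

```shell
#!/bin/sh
# recode_files: hypothetical wrapper around the sed command above.
# Edits each named file in place and keeps the original as FILE.bak.
recode_files() {
    sed -E -i.bak '/^[[:space:]]*[01]{30}$/{s/0/a/g;s/1/c/g}' "$@"
}

# Example (not run here):
# recode_files trial_*.arp
```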
 
Old 03-01-2014, 06:37 AM   #13
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Drat. Didn't think of the {30} construct...

But that still would modify comments.
 
Old 03-01-2014, 10:02 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,565

Rep: Reputation: 2901
Quote:
But that still would modify comments.
I fail to see how as the sed encompasses the entire line (^$)? Unless whitespace prior to the digits signifies a comment??
 
Old 03-01-2014, 03:05 PM   #15
jpollard
Senior Member
 
Registered: Dec 2012
Location: Washington DC area
Distribution: Fedora, CentOS, Slackware
Posts: 4,714

Rep: Reputation: 1280
Quote:
Originally Posted by grail View Post
I fail to see how as the sed encompasses the entire line (^$)? Unless whitespace prior to the digits signifies a comment??
You are right. I'm an idiot.
 
  

