LinuxQuestions.org
Old 11-15-2011, 08:27 PM   #1
Lokelo
LQ Newbie
 
Registered: Nov 2011
Location: Townsville
Posts: 15

Rep: Reputation: Disabled
sed command to replace special character /


Hi All,

I'm a biochemist/geneticist at heart, but I have been dabbling with our local high performance computer a bit to assemble some sequence data.

I'm using the following command to replace "2:N:0:ACAGT" with "/2"

sed -i 's/2:N:0:ACAGT//2/g' NQ001_R2t.fastq > NQ001_R2tr.fastq &

Naturally, the command doesn't like the additional "/". Any ideas on how to fix this, or an alternative solution?

The file I'm working on is quite large... there are about 30 million entries that need to be changed.
 
Old 11-15-2011, 09:14 PM   #2
evo2
Guru
 
Registered: Jan 2009
Location: Japan
Distribution: Mostly Debian and Scientific Linux
Posts: 5,520

Rep: Reputation: 1217
Hi,

An often overlooked feature of sed is that you don't have to use / as the "delimiter" (sorry, I don't remember the correct name). You can use almost any other character. So just use something else, and then / will be treated as an ordinary character in your search and replacement strings.

E.g., using "$" instead of "/":
Code:
sed 's$2:N:0:ACAGT$/2$g' NQ001_R2t.fastq  > NQ001_R2tr.fastq
Note also that since you are redirecting the output to a different file, it makes no sense to specify an in-place edit (-i).
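To try the alternate delimiter safely before running it on the real data, something like the following should do. It is only a sketch: the temporary directory, the file names in.fastq and out.fastq, and the two sample lines are made up for illustration.

```shell
# Hypothetical two-line sample standing in for the real FASTQ headers.
tmp=$(mktemp -d)
printf '%s\n' '@read1 2:N:0:ACAGT' '@read2 2:N:0:ACAGT' > "$tmp/in.fastq"

# "$" is the delimiter here, so "/" needs no escaping. The single quotes
# stop the shell from trying to expand anything after the "$" signs.
sed 's$2:N:0:ACAGT$/2$g' "$tmp/in.fastq" > "$tmp/out.fastq"

result=$(cat "$tmp/out.fastq")
echo "$result"
```

The input file is left untouched and the edited copy ends up in out.fastq, which is the behaviour you want when redirecting.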

HTH,

Evo2.
 
Old 11-15-2011, 09:21 PM   #3
Lokelo
LQ Newbie
 
Registered: Nov 2011
Location: Townsville
Posts: 15

Original Poster
Rep: Reputation: Disabled
That's great, thanks!!

As I said, I'm completely new to this. My main programming experience was 17 years ago in very beginner basic.
I'm mainly copying stuff and trying to figure out how I can get it to work for my own needs.
I was wondering why redirecting the output didn't work; it just produced an empty file. I will remove "-i".
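For anyone else puzzled by the empty file, here is a small sketch of what happens when -i and a redirect are combined. It assumes GNU sed (where -i needs no argument); the file names and the sample line are invented for the demonstration.

```shell
# Assumes GNU sed; in.fastq and out.fastq are hypothetical names.
tmp=$(mktemp -d)
printf '%s\n' '@read1 2:N:0:ACAGT' > "$tmp/in.fastq"

# With -i, sed rewrites in.fastq itself and prints nothing to stdout,
# so the shell creates out.fastq but nothing is ever written to it.
sed -i 's|2:N:0:ACAGT|/2|g' "$tmp/in.fastq" > "$tmp/out.fastq"

edited=$(cat "$tmp/in.fastq")   # the input file now holds the edited text
echo "in.fastq now contains: $edited"
if [ -s "$tmp/out.fastq" ]; then
    echo "out.fastq has content"
else
    echo "out.fastq is empty"
fi
```

So the substitution did happen, just in the original file rather than in the redirect target.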
 
Old 11-16-2011, 01:03 AM   #4
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.5, Centos 5.10
Posts: 16,311

Rep: Reputation: 2040
Here's a good sed HOWTO: http://www.grymoire.com/Unix/Sed.html#uh-0
More generally, this is good: http://rute.2038bug.com/index.html.gz
 
Old 11-16-2011, 01:39 AM   #5
Lokelo
LQ Newbie
 
Registered: Nov 2011
Location: Townsville
Posts: 15

Original Poster
Rep: Reputation: Disabled
Thanks for the links. So much to read up on!
 
Old 11-16-2011, 04:01 AM   #6
linuxwin2
Member
 
Registered: Oct 2011
Posts: 44

Rep: Reputation: Disabled
Or escape the slash with a backslash (without -i, since the output is redirected to a new file):

sed 's/2:N:0:ACAGT/\/2/g' NQ001_R2t.fastq > NQ001_R2tr.fastq &
 
Old 11-16-2011, 04:11 AM   #7
fortran
Member
 
Registered: Nov 2011
Location: Cairo, Egypt
Distribution: CentOS, RHEL, Fedora
Posts: 300
Blog Entries: 2

Rep: Reputation: 50
sed 's/2:N:0:ACAGT/\/2/g' "path of old file.txt" > "path of new file.txt"
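A quick way to check the escaped form together with quoted paths that contain spaces. This is a sketch only; the temporary directory and the sample line are made up, and the file names simply reuse the placeholders above.

```shell
tmp=$(mktemp -d)
old="$tmp/path of old file.txt"   # hypothetical names containing spaces
new="$tmp/path of new file.txt"
printf '%s\n' '@read1 2:N:0:ACAGT' > "$old"

# "\/" is a literal slash inside an s/.../.../ command; the double quotes
# around the variables keep each path with spaces as a single argument.
sed 's/2:N:0:ACAGT/\/2/g' "$old" > "$new"

result=$(cat "$new")
echo "$result"
```

Escaping works fine for a single slash; with several slashes in a pattern, switching the delimiter usually reads better.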
 
Old 11-16-2011, 11:36 AM   #8
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
By the way, "any delimiter" really means almost any ASCII character. I've done some tests, and the only ones I found you cannot use are null and newline. You can even use non-printing control characters with the help of various shell features that let you insert their literal values.

Code:
I=$'\a'	# set variable I to the "system bell" character using ANSI-C style quoting (bash/ksh)

sed "s${I}foo${I}bar${I}" file
Using the full ${var} form makes it more readable, IMO.
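If your shell lacks $'...' quoting, printf can probably stand in for it. A rough sketch, with a made-up input string and "baz" as an arbitrary replacement:

```shell
# printf '\a' emits the bell character (ASCII 7); command substitution
# keeps it, since only trailing newlines are stripped.
I=$(printf '\a')

# The bell is the delimiter, so the "/" in the input needs no special care.
result=$(printf '%s\n' 'foo/bar' | sed "s${I}foo${I}baz${I}")
echo "$result"
```

Double quotes are needed around the sed script here so that ${I} is expanded by the shell.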
 
Old 11-22-2011, 04:01 AM   #9
Lokelo
LQ Newbie
 
Registered: Nov 2011
Location: Townsville
Posts: 15

Original Poster
Rep: Reputation: Disabled
Hi All,

I have an additional problem. Turns out that there is some variation in the string I want to replace (I have 5x 30 million entries to go through).

I thought the string was 1:N:0:ACAGTG.
However, the N can also be a Y, and the 0 can also be a number with 2 to 4 digits.

I tried to use wildcards (i.e. 1:*:*:ACAGTG, 1:*:**:ACAGTG, 1:*:***:ACAGTG, 1:*:****:ACAGTG in four different sed commands), but it didn't work. Any ideas how I can replace them all? There could be about a hundred variations of the numbers and I don't want to replace them individually.
 
Old 11-23-2011, 06:13 AM   #10
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian sid + kde 3.5 & 4.4
Posts: 6,823

Rep: Reputation: 1950
It would likely help more if you could post some actual example lines, and show us where they need to be changed. Include examples of lines that should not be matched, if there's any risk of catching the wrong ones.

And please use [code][/code] tags around your code and data, to preserve formatting and to improve readability.

sed doesn't use "wildcards" (traditionally called globbing in computer-ese); it uses regular expressions, which are more complex and powerful, and have a different syntax.

In regex, * means "zero or more of the previous character" (or expression), so ":*" matches a string of colons of any length (including none). Also note that * is what we call "greedy", which means that it will always try to match the longest string possible. In a simple case like this it may not be a problem, but it can be hard to control with more complex patterns.
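To see the difference between a glob-style * and a regex * in practice, here is a small sketch. The sample header is invented, and [^:]* ("any run of characters other than a colon") is just one way to write what the wildcard attempt was aiming for.

```shell
line='@read1 1:Y:1234:ACAGTG'   # hypothetical header line

# Here ":*" means "zero or more colons", so the pattern only matches a "1"
# followed by nothing but colons and then ACAGTG. The line is left unchanged.
no_match=$(printf '%s\n' "$line" | sed 's#1:*:*:ACAGTG#/2#')

# "[^:]*" matches any run of non-colon characters, which is roughly what
# a shell wildcard "*" would mean inside one colon-separated field.
match=$(printf '%s\n' "$line" | sed 's#1:[^:]*:[^:]*:ACAGTG#/2#')

echo "$no_match"
echo "$match"
```

The second pattern is deliberately loose; it accepts anything between the colons, not just N/Y and digits.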

Many, many computer tools support and use regex, so I highly suggest you get online, find yourself a good regular expressions tutorial, and work your way through it. At the very least learn the basic level stuff. Here's a fairly decent guide to start you off: http://www.grymoire.com/Unix/Regular.html


Anyway, to do what you want you first need to apply sed's "extended" regex option (-r). Taking the description you gave, here are two possibilities, depending on how accurate you need it to be.

Code:
sed -r 's^1:[NY]:[0-9]{1,4}:ACAGTG^/2^'
I decided to use ^ as the delimiter.

[NY] means to match either a single N or Y. [..] is used to specify a list of possible characters that can exist at a position. Similarly...

[0-9]{1,4} means one to four digits. [0-9] means any digit, and {m,n} specifies the number of repeats of the previous match that are allowed (written this way, without backslashes, it is the part that requires extended regex).

So the above will match any number up to four digits long in the third field. However, your description appears to state that the field can be either a single 0, or 2-4 digits of any kind. If this means that we have to avoid matching a single digit other than 0, then we have to be more cautious.

Code:
sed -r 's^1:[NY]:(0|[0-9]{2,4}):ACAGTG^/2^'
For this, I've grouped two possible choices together using (|). That position can now be either 0, or it can be 2-4 digits. But it can't be a single non-zero digit.
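To compare the two patterns side by side, you could run them over a few test strings first. This is only a sketch: the four sample tags are invented to cover the cases described, and -E is used as the more portable spelling of GNU sed's -r.

```shell
# Hypothetical tag values: a single 0, two digits, four digits,
# and a single non-zero digit.
samples='1:N:0:ACAGTG
1:Y:18:ACAGTG
1:N:1234:ACAGTG
1:N:7:ACAGTG'

# Looser pattern: any one to four digits in the third field.
loose=$(printf '%s\n' "$samples" | sed -E 's^1:[NY]:[0-9]{1,4}:ACAGTG^/2^')

# Stricter pattern: a single 0, or two to four digits.
strict=$(printf '%s\n' "$samples" | sed -E 's^1:[NY]:(0|[0-9]{2,4}):ACAGTG^/2^')

echo "loose:";  echo "$loose"
echo "strict:"; echo "$strict"
```

The looser pattern rewrites all four lines, while the stricter one leaves 1:N:7:ACAGTG alone. Running a check like this on a small extract of the real file is a cheap way to confirm which behaviour you need before processing 30 million entries.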


Notice how effective use of regular expressions depends on you being able to clearly define the pattern to match. In particular you need to be able to state what exactly makes the section you want different from the sections you don't want. So if what I gave doesn't suit your purposes, you'll need to come back with a more detailed explanation of what needs to be matched.
 
Old 11-23-2011, 08:26 AM   #11
grail
Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,627

Rep: Reputation: 1947
It might also help if you obeyed LQ rules and only asked your question in one place (http://www.linuxquestions.org/questi...ommand-914971/)
 
Old 11-23-2011, 08:59 AM   #12
Lokelo
LQ Newbie
 
Registered: Nov 2011
Location: Townsville
Posts: 15

Original Poster
Rep: Reputation: Disabled
Yes, I'm sorry about that. I hope it is OK if I mark this thread as solved and answer in the other thread.

Thank you for your very extensive answer, David; it was very helpful. I will read up on the use of regular expressions.
 
  


All times are GMT -5.
