LinuxQuestions.org
11-22-2011, 05:14 PM   #1
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15

Using wildcards in a sed command


Hi All,

I have an additional problem with the sed command.

I would like to replace the string "1:N:0:ACAGTG" with /1. However, I recently found out that the N can also be a Y, and the 0 can also be a number with 1 to 4 digits.

The basic command I'm using is: sed -i 's! 1:N:0:ACAGTG!/1!g' NQ001/NQ001_R1r.fastq

I tried to use wildcards (i.e. 1:*:*:ACAGTG, 1:*:**:ACAGTG, 1:*:***:ACAGTG, 1:*:****:ACAGTG in four different sed commands) but it didn't work. Any ideas how I can replace them all? There could be about a hundred variations of the numbers in about 30 million entries per file, and I don't want to replace them individually.
 
11-22-2011, 06:23 PM   #2
jthill
Member
Registered: Mar 2010
Distribution: Arch
Posts: 209

I think you're confusing the shell's wildcard * with the regular-expression * operator, which means "zero or more of the preceding item". Your 1:*:*:ACAGTG (etc.) specifies a 1 followed by any number of colons, followed by any other number of colons, followed by a single colon, but nowhere in there are you searching for anything between the colons. The only text your expression can match is " 1" followed by at least one colon followed by ACAGTG.

What I think you want is sed -i 's, 1:[^:]*:[^:]*:ACAGTG,/1,g' NQ001/NQ001_R1r.fastq, where [^:]* means "any run of characters other than a colon". That will also match " 1:Flew:OvertheCuckoo'sNest:ACAGTG", so you may want to look up how to restrict the matches a bit better.

Last edited by jthill; 11-22-2011 at 06:25 PM.
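To make the glob-versus-regex point concrete, here is a small bash sketch that runs both patterns over one of the sample headers Lokelo posts later in the thread. It only prints to the terminal (no -i), so it is safe to try; the variable names are made up for the demo.

```shell
#!/bin/bash
# Sample FASTQ header copied from later in this thread.
header='@HWI-ST261:396:B0D48ABXX:8:1101:15632:3220 1:Y:2016:ACAGTG'

# Glob thinking: in a regex ":*" means "zero or more colons", so this only
# matches " 1", one or more colons, then ACAGTG. The header is left unchanged.
glob_style=$(printf '%s\n' "$header" | sed 's, 1:*:*:ACAGTG,/1,g')

# Regex thinking: "[^:]*" means "any run of non-colon characters".
regex_style=$(printf '%s\n' "$header" | sed 's, 1:[^:]*:[^:]*:ACAGTG,/1,g')

printf 'glob-style : %s\n' "$glob_style"
# glob-style : @HWI-ST261:396:B0D48ABXX:8:1101:15632:3220 1:Y:2016:ACAGTG
printf 'regex-style: %s\n' "$regex_style"
# regex-style: @HWI-ST261:396:B0D48ABXX:8:1101:15632:3220/1
```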
 
11-22-2011, 06:36 PM   #3
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
That's great, thanks!

I'm pretty sure that the start and end are quite unique and there is no need to restrict the matches better.
 
11-22-2011, 06:51 PM   #4
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

Well if you did want to:
Code:
sed -ri 's! 1:[NY]:[0-9]{1,4}:ACAGTG!/1!g' NQ001/NQ001_R1r.fastq
 
11-22-2011, 07:07 PM   #5
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
In both cases I get the following error message:

[jc167987@login NQ017]$ sed -i 's! 1:[^:]*:[^:]*:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.
[jc167987@login NQ017]$ sed -ri 's! 1:[NY]:[0-9]{1,4}:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.

Edit: actually, if I run it in a script it seems to work, but I have only tested the first command so far.

Last edited by Lokelo; 11-22-2011 at 07:21 PM. Reason: adding information
 
11-23-2011, 01:22 AM   #6
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

I am curious where you are running this, as 'Event not found' is not an error message I have ever seen from sed.
I have tested and the solutions seem to work ok for me.
 
11-23-2011, 01:51 AM   #7
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
I ran them on our high performance computer running Linux, using PuTTY.

I'm new to all of this, so I'm not sure what other information I could give you.
 
11-23-2011, 02:21 AM   #8
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

What version of sed are you running?
Are the lines you have shown, and the respective errors, typed in or copied and pasted from the terminal?

Last edited by grail; 11-23-2011 at 02:22 AM.
 
11-23-2011, 02:34 AM   #9
crts
Senior Member
Registered: Jan 2010
Posts: 1,604

Quote:
Originally Posted by Lokelo View Post
In both cases I get the following error message:

[jc167987@login NQ017]$ sed -i 's! 1:[^:]*:[^:]*:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.
[jc167987@login NQ017]$ sed -ri 's! 1:[NY]:[0-9]{1,4}:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.

Edit: actually, if I run it in a script it seems to work, but I have only tested the first command so far.
Hi,

the ! triggers history expansion: !text is replaced by the most recent command that starts with text. Example:
Code:
$ echo hello
hello
$ !ec
echo hello
hello
$
If you run the above 'sed' with double quotes instead of single quotes, bash will try to match a command that you issued earlier that starts with
/1!g

Since there is no such command it gives the 'event not found' error. Are you sure that you used single-quotes? Those should prevent this kind of error.

Last edited by crts; 11-23-2011 at 02:38 AM.
 
11-23-2011, 08:15 AM   #10
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
Thanks for all the answers and sorry about the double posting of my question.

I couldn't get the version of sed. I'm working remotely on a high performance computer, and sed -V (or -v) didn't come up with a version number.
I just started working with Linux a month ago, so I'm not always 100% sure what I'm doing.

The lines I posted above containing the error message were directly copy/pasted from the terminal.

Crts, you said if I run it with double quotes it will try to redo a previous command, but I ran it with single quotes.

The thing is, it runs perfectly fine if I use it within a script (see below), but I got the error message when using them directly in the terminal.

The data looks like this:

Code:
@HWI-ST261:396:B0D48ABXX:8:1101:15630:3112 1:N:0:ACAGTG
TCAGGGGTGAATGGATGCACTGTTCTGGATGGTGGTGCTGAACTAGCACCGGGTGCTTGTGGATGTGCCAGAGAAGCATACAAGGGGACGGTGGAAGGAT
+
BCCFFFF@FHGHHJJJJJJJJJJJJJIJEGIJCAH@GHGIHHIIEHE?FHIJJ5AHHHGBDD@DA;;AC;;5;(;?>,:@C:>:ABD555>0?<+49@<:
@HWI-ST261:396:B0D48ABXX:8:1101:15519:3117 1:N:0:ACAGTG
CCGCGATATGCCGTCTCGACGCCGACAACGAGCATCATCAAGATAATCGACCACTTCTATGATCTGAAGCTCGGTTGTTGCCTCTTCTCTCCTCCAGTCT
+
@CCFFFFFHHHHHJJJJJJJJJJJJJJIJJJJJHHHHHHFFFEFFEEECEDDDDDDDD>DCCCA@CDDDDDDDDB@B@BA>@ACDDDD@CDD@<C#####
@HWI-ST261:396:B0D48ABXX:8:1101:15632:3220 1:Y:2016:ACAGTG
CGGAGAGGGAGTAGACGAGCTGCGGCAGCACCTCGTTCGAGACGACCGCCTCAGCGAGCTCGTCGTTGTAGTTGGCGAGGCGCCCGAGGGCGAGCGCTGC
+
@CCFFFFFHFHFHIJIJJJJJJJGJJGHEDAHIJGGIGIGHGFAD8>BDBBDDDA'5057@-&8;;@C?:>(4:>CB5<B9>-5@B##############
@HWI-ST261:396:B0D48ABXX:8:1103:3693:192960 1:N:514:ACAGTG
CTCGCCAACATCGCCGCCCCTATTTTGATGGAGTAGTACGCCCCTCGCCTCCGAACACAACTCATCCGATGGCATCACGTCGTTGGGCACTTGAGACCGG
+
@@@BFFDDHHHHHJJIIJJIDFHGIIE=GIJ3BFHHGIJCGHHHHHFFDBDDD8;?BCB@BACCDCCB<?9-<3@?C@0+8>BBC###############
As I said, I managed to replace the bit I wanted using the following script:
Code:
#!/bin/bash

# Read 1: replace " 1:<flag>:<number>:<barcode>" with "/1".
# Each sed runs in the background (&), so the files are processed in parallel.
sed -i 's! 1:[^:]*:[^:]*:GCCAAT!/1!g' /home/11/jc167987/NGSdata/Data/NQ017/NQ017_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:CAGATC!/1!g' /home/11/jc167987/NGSdata/Data/NQ040/NQ040_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:ACTTGA!/1!g' /home/11/jc167987/NGSdata/Data/NQ136/NQ136_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:GATCAG!/1!g' /home/11/jc167987/NGSdata/Data/NQ283/NQ283_R1r.fastq &

# Read 2: same replacement, " 2:..." becomes "/2".
sed -i 's! 2:[^:]*:[^:]*:GCCAAT!/2!g' /home/11/jc167987/NGSdata/Data/NQ017/NQ017_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:CAGATC!/2!g' /home/11/jc167987/NGSdata/Data/NQ040/NQ040_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:ACTTGA!/2!g' /home/11/jc167987/NGSdata/Data/NQ136/NQ136_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:GATCAG!/2!g' /home/11/jc167987/NGSdata/Data/NQ283/NQ283_R2r.fastq &

# Wait for all background jobs, so the script only ends when every file is done.
wait
I only tried the second command once in a script, which didn't work. But since the command above worked I didn't follow it up further, although the other command is a bit more elegant.

Last edited by Lokelo; 11-23-2011 at 06:12 PM. Reason: slightly changed the dataoutput to remove accidental smileys
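The eight near-identical sed lines in Lokelo's script could also be written as a loop. This is a hypothetical bash rewrite, not something tested on the original HPC system: the sample and barcode lists are copied from the script, DATA_DIR is an invented variable defaulting to the path used there, and GNU sed is assumed (for -i without a backup suffix). It collects the background job IDs and waits for each one, so the function only returns once every sed has finished, and it returns non-zero if any of them failed.

```shell
#!/bin/bash
# Hypothetical: override DATA_DIR to point at a scratch copy for testing.
DATA_DIR=${DATA_DIR:-/home/11/jc167987/NGSdata/Data}

# Sample directory names and their index (barcode) sequences, in matching order.
samples=(NQ017 NQ040 NQ136 NQ283)
barcodes=(GCCAAT CAGATC ACTTGA GATCAG)

strip_index_all() {
  local i r pid status=0
  local -a pids=()
  for i in "${!samples[@]}"; do
    for r in 1 2; do
      # " <r>:<flag>:<number>:<barcode>" -> "/<r>", commas as delimiters (no "!").
      sed -i "s, ${r}:[^:]*:[^:]*:${barcodes[i]},/${r},g" \
        "$DATA_DIR/${samples[i]}/${samples[i]}_R${r}r.fastq" &
      pids+=("$!")   # remember the PID of the job just started
    done
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1   # record any failed sed
  done
  return "$status"
}
```

After putting the function in a bash script, call strip_index_all; for a dry run, set DATA_DIR to a directory holding copies of the files first.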
 
11-23-2011, 08:28 AM   #11
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,442

No short option for version so you would need --version.

I would ask: does the above actually work, i.e. have the changes been made in the file(s)?

Are you pushing all the commands into the background because the file(s) are so large?
 
11-23-2011, 09:14 AM   #12
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
The files are about 5 GB each. Am I right in understanding that without &, the commands would just run one after another, and with & they run in parallel?

The above works and the changes were made in the files. I checked by grepping for the adaptor sequences (i.e. the strings of A, C, T and G at the end of the ID), and the searches came up with nothing. Each file has about 30 million entries, each made up of the four lines shown above. That's why I didn't find the variation in the numbers until I learned about grep: the variants are comparatively rare.

I have limited time to assemble my transcriptomes from scratch, and as much as I would love to read up on everything in detail, I'm just focussing on what I need for the time being. Hence I do a lot of copying, with just enough understanding to make it work.
However, I'm highly fascinated by this experience (when I was 16 I had the choice to go into chemistry or IT, and chemistry won, even though I still like using computers on a higher level) and certainly will get as much Linux knowledge as I can over time.

I will get the version number tomorrow morning.
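The grep check Lokelo describes can be wrapped in a small function; count_leftover is a made-up name, and the sketch assumes headers shaped like the sample data (" 1:" or " 2:", two fields, then the barcode at the end of the line). Standard FASTQ sequence and quality lines contain no spaces, so the leading space keeps the pattern from matching anything but header lines.

```shell
#!/bin/bash
# Print how many header lines in FILE still end in an unreplaced index tag.
count_leftover() {
  local barcode=$1 file=$2
  grep -c -E " [12]:[^:]*:[^:]*:${barcode}\$" "$file"
}
```

For example, count_leftover GCCAAT NQ017_R1r.fastq should print 0 once the replacement has run.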
 
11-23-2011, 10:40 AM   #13
jthill
Member
Registered: Mar 2010
Distribution: Arch
Posts: 209

No need for the version number: crts nailed it. Lose the bangs ("!").

Instead of s!this!that!g use s,this,that,g or s`this`that`g or whatever.

I like commas or backticks myself; they make a visible break. Until you have time to get better acquainted with shell syntax and its interactive assists, get in the habit of single-quoting any argument that contains anything other than alphanumerics or +-_/,. You'll gradually find more safe characters, but bang ("!") is high-priority metasyntax for interactively constructing command lines, fast, from pieces of earlier ones.

gtg, sorry if this was too elliptical, happy Thanksgiving,
Jim
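jthill's delimiter advice can be checked in a script, where history expansion is off anyway: sed takes the character right after s as the delimiter, so the three commands below give identical output. Only the first contains "!", the character an interactive shell may try to expand. The sample header is from the thread; the variable names are illustrative.

```shell
#!/bin/bash
line='@HWI-ST261:396:B0D48ABXX:8:1101:15519:3117 1:N:0:ACAGTG'

# Same substitution, three different delimiters.
with_bang=$(printf '%s\n' "$line" | sed 's! 1:[^:]*:[^:]*:ACAGTG!/1!g')
with_comma=$(printf '%s\n' "$line" | sed 's, 1:[^:]*:[^:]*:ACAGTG,/1,g')
with_backtick=$(printf '%s\n' "$line" | sed 's` 1:[^:]*:[^:]*:ACAGTG`/1`g')

printf '%s\n' "$with_bang" "$with_comma" "$with_backtick"
```

All three lines print @HWI-ST261:396:B0D48ABXX:8:1101:15519:3117/1. The only restriction is that the chosen delimiter must not appear unescaped inside the pattern or the replacement.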
 
11-23-2011, 10:56 AM   #14
crts
Senior Member
Registered: Jan 2010
Posts: 1,604

Quote:
Originally Posted by Lokelo View Post
I couldn't get the version of sed. I'm working remotly on a high performance computer and sed -V (or v) didn't come up with a version number.
I just started working with linux a month ago, so I'm not 100% sure yet what I'm doing all the time.

The lines I posted above containing the error message were directly copy/pasted from the terminal.

Crts, you said if I run it with double quotes it will try to redo a previous command, but I ran it with single quotes.

The thing is, it runs perfectly fine if I use it within a script (see below), but I got the error message when using them directly in the terminal.
Hmm, this is strange. But since you are working remotely, I wonder which shell you are using on the remote system.
Another alternative would be to deactivate history expansion. In bash you can do it with
Code:
set +H
The reason it works inside a script is that history expansion is not enabled in non-interactive shells such as scripts.
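A quick way to see this in bash: the special parameter $- lists the shell's active single-letter options, and H appears in it only while history expansion is on. Run as a script, the sketch below should report it off and print the "!" characters untouched, even inside double quotes. In an interactive bash, H is normally present, and set +H removes it for that session.

```shell
#!/bin/bash
# $- holds the current single-letter shell options; H = history expansion.
case $- in
  *H*) echo "history expansion: on" ;;
  *)   echo "history expansion: off" ;;   # expected when run as a script
esac

# With expansion off, "!" is an ordinary character, even in double quotes.
echo "s! 1:x!/1!g"
```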

Can you post the name of the system and the shell you are logged in with? And how do you log in (ssh, telnet, ...)? This is definitely not a 'sed' issue.

Last edited by crts; 11-23-2011 at 10:57 AM.
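For anyone in the same situation, standard commands like the ones below report the details crts asks for; output differs from system to system. $SHELL is the login shell recorded for the account, while ps shows the shell actually running the commands. /etc/redhat-release only exists on Red Hat-style systems, so the sketch falls back to /etc/os-release, and GNU sed prints its version with --version. The function name is made up.

```shell
#!/bin/sh
# Print the details that help diagnose shell-quoting problems.
system_report() {
  echo "login shell:   ${SHELL:-unknown}"                              # from the account entry
  echo "current shell: $(ps -p $$ -o comm= 2>/dev/null || echo unknown)"  # shell running this
  uname -a                                                             # kernel and architecture
  cat /etc/redhat-release 2>/dev/null || cat /etc/os-release 2>/dev/null
  sed --version 2>/dev/null | head -n 1                                # GNU sed version line
}
system_report
```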
 
11-23-2011, 06:15 PM   #15
Lokelo
LQ Newbie
Registered: Nov 2011
Location: Townsville
Posts: 15
Original Poster
That seems to have done the trick. Thanks for all your help.
I agree, I like the look of commas. I will make sure that I acknowledge this forum in my thesis for your continued help!

Just to be complete:

Code:
/bin/tcsh
Linux login 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.1 (Santiago)


GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.

Last edited by Lokelo; 11-23-2011 at 06:16 PM.
 
Tags
regular expression, sed



Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
[SOLVED] Using sed with wildcards | elliot01 | Linux - Newbie | 6 | 11-26-2010 05:23 AM
Sed qestion - how to use wildcards | tensigh | Linux - Software | 5 | 03-04-2010 09:55 PM
wildcards with the "rename" command | mattn | Linux - General | 3 | 05-13-2004 07:43 PM
ls command line, color and wildcards | ioio85 | Linux - Newbie | 1 | 05-13-2004 05:01 AM
sed command | linuxdev | Linux - Newbie | 9 | 02-24-2004 04:50 PM


All times are GMT -5.