Linux - Newbie
This Linux forum is for members that are new to Linux.
I have an additional problem with the sed command.
I would like to replace the string "1:N:0:ACAGTG" with /1. However, I recently found out that the N can also be a Y, and the 0 can also be a number with 1 to 4 digits.
The basic command I'm using is: sed -i 's! 1:N:0:ACAGTG!/1!g' NQ001/NQ001_R1r.fastq
I tried to use wildcards (i.e. 1:*:*:ACAGTG, 1:*:**:ACAGTG, 1:*:***:ACAGTG, 1:*:****:ACAGTG in four different sed commands) but it didn't work. Any ideas how I can replace them all? There could be about a hundred variations of the numbers in about 30 million entries per file and I don't want to replace them individually.
I think you're confusing the shell's wildcard * with the regular-expression * operator, which repeats the preceding item zero or more times. Your 1:*:*:ACAGTG (etc.) specifies a 1 followed by any number of colons, followed by any number of further colons, followed by a single colon, but nowhere in there are you searching for anything between the colons. The only text your expression can match is " 1" followed by at least one colon followed by ACAGTG.
What I think you want is sed -i 's, 1:[^:]*:[^:]*:ACAGTG,/1,g'. That'll match " 1:Flew:OvertheCuckoo'sNest:ACAGTG" so you may want to hunt up how to restrict the matches a bit better.
I am curious where you are running this as 'Event not found' is not an error message I have ever seen before from sed??
I have tested and the solutions seem to work ok for me.
[jc167987@login NQ017]$ sed -i 's! 1:[^:]*:[^:]*:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.
[jc167987@login NQ017]$ sed -ri 's! 1:[NY]:[0-9]{1,4}:GCCAAT!/1!g' NQ017_R1r.fastq
/1!g: Event not found.
Edit: actually, if I run it in a script it seems to work, but I have only tested the first command so far.
Hi,
the ! runs the last command that matches the following letters (history expansion). Example:
Code:
$ echo hello
hello
$ !ec
echo hello
hello
$
If you run the above 'sed' with double quotes instead of single quotes, then bash will try to match a command that you issued earlier that starts with
/1!g
Since there is no such command, it gives the 'event not found' error. Are you sure that you used single quotes? Those should prevent this kind of error.
Thanks for all the answers and sorry about the double posting of my question.
I couldn't get the version of sed. I'm working remotely on a high performance computer and sed -V (or -v) didn't come up with a version number.
I just started working with Linux a month ago, so I'm not 100% sure yet what I'm doing all the time.
The lines I posted above containing the error message were directly copy/pasted from the terminal.
Crts, you said if I run it with double quotes it will try to redo a previous command, but I ran it with single quotes.
The thing is, it runs perfectly fine if I use it within a script (see below), but I got the error message when using them directly in the terminal.
As I said, I managed to replace the bit I wanted using the following script:
Code:
#!/bin/bash
sed -i 's! 1:[^:]*:[^:]*:GCCAAT!/1!g' /home/11/jc167987/NGSdata/Data/NQ017/NQ017_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:CAGATC!/1!g' /home/11/jc167987/NGSdata/Data/NQ040/NQ040_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:ACTTGA!/1!g' /home/11/jc167987/NGSdata/Data/NQ136/NQ136_R1r.fastq &
sed -i 's! 1:[^:]*:[^:]*:GATCAG!/1!g' /home/11/jc167987/NGSdata/Data/NQ283/NQ283_R1r.fastq &
sed -i 's! 2:[^:]*:[^:]*:GCCAAT!/2!g' /home/11/jc167987/NGSdata/Data/NQ017/NQ017_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:CAGATC!/2!g' /home/11/jc167987/NGSdata/Data/NQ040/NQ040_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:ACTTGA!/2!g' /home/11/jc167987/NGSdata/Data/NQ136/NQ136_R2r.fastq &
sed -i 's! 2:[^:]*:[^:]*:GATCAG!/2!g' /home/11/jc167987/NGSdata/Data/NQ283/NQ283_R2r.fastq &
I only tried the second command once in a script, which didn't work. But since the command above worked I didn't follow it up further, although the other command is a bit more elegant.
Last edited by Lokelo; 11-23-2011 at 06:12 PM.
Reason: slightly changed the dataoutput to remove accidental smileys
The files are about 5 GB each. Am I correct in understanding that without the &, the commands would just be carried out in sequence, and that with the & they are done in parallel?
The above works and the changes were made in the files. I checked by grepping the adaptor sequences (i.e. the strings of A, C, T and G at the end of the ID) and the searches came up with nothing. Each file has about 30 million entries containing the four lines shown above. That's why I didn't find the variation in the numbers until I learned about grep, since they are comparatively rare.
I have a limited time to assemble my transcriptomes from scratch, and as much as I would love to read up on everything in detail, I'm just focussing on what I need for the time being. Hence I use a lot of copying with just enough understanding to make it work.
However, I'm highly fascinated by this experience (when I was 16 I had the choice to go into chemistry or IT, and chemistry won, even though I still like using computers on a higher level) and certainly will get as much Linux knowledge as I can over time.
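On the question about & above: a trailing & starts the command in the background, so the shell moves straight on to the next line and the jobs run side by side; without it, each command has to finish before the next one starts. The sketch below illustrates this with a stand-in function instead of the real sed calls. The process function, the sample names and the temporary directory are assumptions made for the example only.

```shell
tmpdir=$(mktemp -d)

# Stand-in for one long-running sed job (hypothetical helper).
process() {
    sleep 1
    echo "finished $1" > "$tmpdir/$1.log"
}

# Both jobs are started immediately and run concurrently.
process NQ017 &
process NQ040 &

# wait blocks until every background job of this shell has exited.
wait

# Count the log lines written by the jobs.
finished=$(cat "$tmpdir"/*.log | wc -l | tr -d ' ')
echo "jobs finished: $finished"
rm -rf "$tmpdir"
```

Ending a script like the one above with wait is optional, but it means the script only returns once all the background jobs are done, which makes it easier to tell when the files are safe to use.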
No need for the version number: crts nailed it. Lose the bangs ("!").
Instead of s!this!that!g use s,this,that,g or s`this`that`g or whatever.
I like commas or backticks myself, they make a visible break. Until you have time to get better acquainted with shell syntax and its interactive assists, get in the habit of single-quoting any argument that has anything but alphanumerics or +-_/,. You'll gradually find more safe ones, but bang ("!") is high-priority metasyntax for interactively constructing command lines, fast, from pieces of earlier ones.
gtg, sorry if this was too elliptical, happy thanksgiving,
Jim
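Jim's suggestion works because sed accepts almost any character as the delimiter of the s command, so the ! can simply be avoided. That matters most in csh-family shells such as tcsh, where ! triggers history substitution even inside single quotes unless it is escaped with a backslash. A quick sketch, using a made-up header line, showing that different delimiters give the same result:

```shell
# Hypothetical header line, invented for this example.
line='@READ_0001 1:Y:123:GCCAAT'

# Same substitution written with three different delimiters.
with_comma=$(printf '%s\n' "$line" | sed 's, 1:[^:]*:[^:]*:GCCAAT,/1,g')
with_pipe=$(printf '%s\n' "$line" | sed 's| 1:[^:]*:[^:]*:GCCAAT|/1|g')
with_hash=$(printf '%s\n' "$line" | sed 's# 1:[^:]*:[^:]*:GCCAAT#/1#g')

printf '%s\n' "$with_comma" "$with_pipe" "$with_hash"
```

The only requirement is that the chosen delimiter does not appear unescaped in the pattern or the replacement, which is why / is a poor choice here: the replacement itself contains a slash.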
Quote:
I couldn't get the version of sed. I'm working remotely on a high performance computer and sed -V (or -v) didn't come up with a version number.
I just started working with Linux a month ago, so I'm not 100% sure yet what I'm doing all the time.
The lines I posted above containing the error message were directly copy/pasted from the terminal.
Crts, you said if I run it with double quotes it will try to redo a previous command, but I ran it with single quotes.
The thing is, it runs perfectly fine if I use it within a script (see below), but I got the error message when using them directly in the terminal.
Hmm, this is strange. But since you are working remotely I wonder which shell you are using on the remote system.
Another alternative would be to deactivate history expansion. In bash you can do it with
Code:
set +H
The reason it works inside a script is that bash only enables history expansion by default in interactive shells, so a script never performs it.
Can you post the name of the system and the shell you are logged in with? And how do you log in (ssh, telnet ...)? This is definitely not a 'sed' issue.
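One way to see this for yourself is to look at the option flags of the running shell. The sketch below assumes bash, which includes an H in $- when history expansion is switched on; other shells may report their options differently, so treat it as an illustration rather than a portable test.

```shell
# "$-" lists the option flags of the running shell; bash includes "H"
# when history expansion (histexpand) is switched on.
case $- in
  *H*) histexpand=on ;;
  *)   histexpand=off ;;
esac
echo "history expansion: $histexpand"
```

Run from a script file, this should report off, whereas typing the same lines at an interactive bash prompt normally reports on until set +H is used.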
That seems to have done the trick. Thanks for all your help.
I agree, I like the look of commas. I will make sure that I acknowledge this forum in my thesis for your continued help!
Just to be complete:
Code:
/bin/tcsh
Linux login 2.6.32-131.17.1.el6.x86_64 #1 SMP Thu Sep 29 10:24:25 EDT 2011 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 6.1 (Santiago)
GNU sed version 4.2.1
Copyright (C) 2009 Free Software Foundation, Inc.