LinuxQuestions.org
08-10-2009, 02:03 PM   #1
kmkocot
Need help with sed to modify only lines of text meeting certain criteria


Hi all,

I have a large text file with the format below:

>MCAL_43C14_r_00 872 1 872
GGCCCTTATGGCCTTTTTTTTTTTTTTTTCAAACTTTATAAAAGCTTTAA
TTGGTAGTTTGCTCCTTTAAATGGTAAAATCACAGATAAATTTATTGTGA
TAATTGTCTAGATGATTTTACAAGCAGTATAAATACATAATTGTAAACTC
AGTATATCTGCACAGAGAACAGAATAATTATACTTTTCGCAACTCGTTTC
GACGGTAAGAATGCACCAATTATATCGTCTATGCATGGTTCTCTTTCAAA
ATCTTAAAAATTGTGGGTAACTTTATTGTGTGCACGCCTGAAAGCTAGAA
TGAACTAATTTCTATTGTCCATAAATTTCCTTCAAAATATAGTAGTATAT
TGTAGTGACTTAAATTGGTCGTATTACATGACGTAATTGACGCCACTCCA
TTGGTTGGTAATCCATTTTCAGGATGATGTTGTCCAATCACACGTTTCGG
TCAGCACTTTTGGGAAATATTTCCCAGAATGCATCACATTCTTAAACGAT
TAATTGATATAGACAGATGTTCTTTTTGTTCTTGCTGCAAATAATGATTC
ATGAGACTATAATAATTATACATAGAACATCTTTAAATAAATGAAATTCA
TGAAAATCAAACAGCAGCAACCCGCGGAGTAAAGTGCATTCTCGTCATAT
TTCATACTTTGTCAGATTTATAAACTTTACTGGTATATTTGAGTTCAGTG
TAGATTTTCCATCTTAGCAGTAACGATTTGCTAAATAACATAAATGAGAC
ATATAAAAGCTTAATAAACGCCAACTACCAACAGATATATCTTTAAAAGC
GAAAGCCAACTCTTTTGCCATTTCATCAGTTGAAATCAGCATTTCAGAGG
CACTTATGTTCATGAAAAAATT
>MCAL_52K01_f_00 766 1 766
TAAAAAGAAAAATATCTTGAAGTCTAAAGGTAACTTGAAACACATTTGTT
GGAAAAAGTTCCTTCTTGGAGATCAGTTACCAACAGGTTTTCCAGGGATA
TATGACAAATATTTCATTTGTCAGCGCTTTGCACAGGATGATACGTATAT
AGGATCCACGTCTAGGAAGGCTTTCTATGGTACTGTTTATGATACAAAAC
AACGTGTTCCTCTGATGTCTTTTGGAAGACTAAGAAACCTCTCAGATACT
TCAAAACCACTAATGAAGTTTATGATTGAAAAGGGTTTGGTGTCAACTAA
AAAGCATAAAAGTGTGGTATCGACAGTATACAACTGGCTGAATGGTGCAG
AAGGAAAAGGAATGTTCTACGACAATGGTGAAATTTCAGTCTGTAATCTA
GGTCAATTTCAGGCTGTAAACACCGATTATGATACCTCAGAGTACAAAAT
GCAACAACTTTTACCACACAGTCTCACAGGAAATGACGTAAGAGAGAAAA
TAGCAACATACACACTGACTAATACGGCGCCAATTCACACATCACTACAT
GGAATGTGGGAAACTGCTTTGTCAACTGCGCGTACTTTCGCTGTCGAAAA
GTGTGGAATTCCAGTACTTTTAAATCCCGTGAGGAGACAACGAAACAGAG
TATCACGTGACCATCCAGAGATGTATGTAATATCAGGTGCGGTATCATTA
AACGATGCTGATAGCACAATAGGGAATGGGGTAGCTGTTCCATATCTATT
TTGGTTCGCAGGATGC

I am trying to remove the three trailing space-separated numbers (e.g., " 872 1 872") so that the first header becomes >MCAL_43C14_r_00 and the second becomes >MCAL_52K01_f_00. I know sed is at least part of the right tool for the job, but I'm stuck. I must not have the search pattern formatted correctly, but I can't figure out what is wrong with it. Also, how do I tell sed to leave the characters I want to keep untouched?

sed -n 's/>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9] [0-9][0-9][0-9] 1 [0-9][0-9][0-9]/>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9]/g' Mytilus_californianus.txt

Thanks!!!
Kevin
 
08-10-2009, 02:10 PM   #2
forrestt
Provided that all the lines are formatted as above, the code:

Code:
sed -e 's/ .\+//' Mytilus_californianus.txt
should do the trick.

HTH

Forrest
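Forrest's one-liner can be tried on a small sample before running it on the real file. The sketch below assumes GNU sed (the \+ operator is a GNU extension to basic regular expressions) and uses a hypothetical shell variable, sample, holding two records copied from the data above. The second command is an optional variant, not part of Forrest's answer: the /^>/ address limits the substitution to header lines, and the POSIX .* stands in for \+.

```shell
# Hypothetical two-record sample in the same format as the posted file.
sample='>MCAL_43C14_r_00 872 1 872
GGCCCTTATGGCCTTTTTTTTTTTTTTTTCAAACTTTATAAAAGCTTTAA
>MCAL_52K01_f_00 766 1 766
TAAAAAGAAAAATATCTTGAAGTCTAAAGGTAACTTGAAACACATTTGTT'

# Forrest's command: delete the first space and everything after it.
# Sequence lines contain no spaces, so they pass through unchanged.
printf '%s\n' "$sample" | sed -e 's/ .\+//'

# Variant (assumption, not from the thread): only touch lines starting
# with ">" via an address, and use the POSIX ".*" instead of GNU "\+".
printf '%s\n' "$sample" | sed -e '/^>/s/ .*//'
```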
 
08-10-2009, 02:11 PM   #3
pwc101
Does
Code:
awk '{print $1}' file > output
not work?

edit: or even
Code:
cut -d' ' -f1 < file > output
?
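A quick comparison of the two suggestions, assuming the header fields are separated by single spaces as in the posted data (sample is a hypothetical variable holding one record). cut uses the TAB character as its default delimiter, so it needs -d' ' for this file.

```shell
# Hypothetical one-record sample; header fields are separated by single spaces.
sample='>MCAL_43C14_r_00 872 1 872
GGCCCTTATGGCCTTTTTTTTTTTTTTTTCAAACTTTATAAAAGCTTTAA'

# awk splits each line on blanks by default, so $1 is the name alone.
printf '%s\n' "$sample" | awk '{print $1}'

# cut splits on TAB unless told otherwise, so plain -f1 leaves the header
# intact; -d' ' makes the space the delimiter. Lines without any delimiter
# (the sequence lines) are printed whole either way.
printf '%s\n' "$sample" | cut -f1
printf '%s\n' "$sample" | cut -d' ' -f1
```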
 
08-10-2009, 02:15 PM   #4
indienick
Several things: the sed command you have there won't work; you have to supply the "-e" parameter just before the regular expression. Also, -n suppresses sed's automatic printing, so your command prints nothing at all, and the replacement side of s/// is literal text, so bracket expressions such as [0-9] are inserted verbatim instead of reusing what was matched (a \( \) group plus \1 does that). Finally, sed writes its result to standard output and leaves the file untouched unless you supply the "-i" parameter ("edit in place"), definitely with a filename suffix so the original is kept as a backup. Your sed command should look something like this:
Code:
sed -i.orig -e 's/\(>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9]\) [0-9][0-9][0-9] 1 [0-9][0-9][0-9]/\1/' Mytilus_californianus.txt
You might also want to look into awk. I would strongly suggest running awk on your text file to see how it splits each line into fields. Also check out cut.
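A minimal sketch of an in-place edit with a backup, assuming GNU sed and a throwaway copy made with mktemp (the variable names f, edited, backup and bad_size are hypothetical). The substitution uses a \( \) group and \1 so that the matched name is copied back, since the replacement side of s/// is otherwise literal text. The last step shows why -n should not be combined with -i here.

```shell
# Throwaway file holding one record (hypothetical stand-in for the real file).
f=$(mktemp)
printf '%s\n' '>MCAL_43C14_r_00 872 1 872' 'GGCCCTTATGGCCTTTT' > "$f"

# Edit in place; GNU sed keeps the untouched original as "$f.orig".
sed -i.orig -e 's/\(>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9]\) [0-9][0-9][0-9] 1 [0-9][0-9][0-9]/\1/' "$f"
edited=$(cat "$f")        # edited file contents
backup=$(cat "$f.orig")   # original contents, saved by the .orig suffix

# Pitfall: -n suppresses automatic printing, so with -i and no "p" flag
# nothing is written back and the file ends up empty.
cp "$f.orig" "$f.bad"
sed -n -i -e 's/ .*//' "$f.bad"
bad_size=$(wc -c < "$f.bad" | tr -d ' ')

# Clean up the temporary files, then show the edited text.
rm -f "$f" "$f.orig" "$f.bad"
printf '%s\n' "$edited"
```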
 
08-10-2009, 02:25 PM   #5
pixellany
First, be sure that you identify what the general pattern is: either the one to keep or the one to discard.

Let's suppose that we use this for the pattern to KEEP:
">MCAL_", any two digits, an uppercase letter, any two digits, "_", "f" or "r", "_", any two digits

You can use a "backreference" to find any line containing the pattern and then discard everything on that line except the pattern.

The general form (assumes pattern at the beginning of the line):

Code:
sed 's/\(pattern\).*/\1/' filename > newfilename      ##matches pattern plus everything following, and replaces with pattern
With the pattern you already defined:

Code:
sed 's/\(>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9]\).*/\1/' filename > newfilename
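Applied to a small hypothetical sample, the backreference command changes only the header lines: the sequence lines do not contain the pattern, so sed prints them unchanged. The sketch reads from a shell variable (sample, a hypothetical name) instead of filename > newfilename so it runs on its own.

```shell
# Hypothetical sample: two headers plus sequence lines.
sample='>MCAL_43C14_r_00 872 1 872
GGCCCTTATGGCCTTTTTTTTTTTTTTTTCAAACTTTATAAAAGCTTTAA
>MCAL_52K01_f_00 766 1 766
TTGGTTCGCAGGATGC'

# \( ... \) captures the name to keep, .* consumes the rest of the line,
# and \1 writes back only the captured text. Lines that do not match the
# pattern (the sequence lines) are printed unchanged.
printf '%s\n' "$sample" |
  sed 's/\(>MCAL_[0-9][0-9][A-Z][0-9][0-9]_[fr]_[0-9][0-9]\).*/\1/'
```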
 
08-11-2009, 06:02 AM   #6
pixellany
There are a couple of comments in this thread about the -e flag in sed. Note that it is required only when using multiple commands within the same sed invocation.

From the Grymoire tutorial:
http://www.grymoire.com/Unix/Sed.html#uh-13
 
08-11-2009, 12:09 PM   #7
mrtiller
Quote:
Originally Posted by pixellany
There are a couple of comments in this thread about the -e flag in sed. Note that it is required only when using multiple commands within the same sed invocation.

From the Grymoire tutorial:
http://www.grymoire.com/Unix/Sed.html#uh-13
Yes, and another alternative for multiple commands is to use newlines to separate the commands. In this case, you also do not need the -e. Example:

Code:
sed '
s/xxx/yyy/
s/zzz/www/
...
'
 
08-11-2009, 12:10 PM   #8
pwc101
Quote:
Originally Posted by mrtiller
Yes, and another alternative for multiple commands is to use newlines to separate the commands. In this case, you also do not need the -e. Example:

Code:
sed '
s/xxx/yyy/
s/zzz/www/
...
'
Or semi-colons, I believe:
Code:
sed 's/xxx/yyy/;s/zzz/www/'
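The three ways of passing several commands that come up above (one -e per command, semicolons, and newlines inside the quoted script) should give the same result. In this sketch the xxx/yyy placeholders are replaced with two hypothetical substitutions on one header line: drop everything from the first space, then drop the leading ">".

```shell
line='>MCAL_43C14_r_00 872 1 872'

# 1) One -e option per command.
a=$(printf '%s\n' "$line" | sed -e 's/ .*//' -e 's/^>//')

# 2) Commands separated by a semicolon in a single script.
b=$(printf '%s\n' "$line" | sed 's/ .*//;s/^>//')

# 3) Commands separated by a newline in a single quoted script.
c=$(printf '%s\n' "$line" | sed '
s/ .*//
s/^>//
')

printf '%s\n' "$a" "$b" "$c"
```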
 
08-11-2009, 01:13 PM   #9
pixellany
Quote:
Originally Posted by pwc101
Or semi-colons, I believe:
Code:
sed 's/xxx/yyy/;s/zzz/www/'
How about that!! So why do we have the -e flag? Maybe for readability?
 
08-11-2009, 03:11 PM   #10
pwc101
Quote:
Originally Posted by pixellany
How about that!! So why do we have the -e flag? Maybe for readability?
No idea, I'm afraid, but I remember seeing it once and thought it made sense.
 
08-17-2009, 10:24 AM   #11
kmkocot

Thank you all for your help!

Forrestt, can you explain what the pattern in the search term of the sed command you gave means? I realize that it translates to "everything after the space" but I don't understand how.

The command was:
Code:
sed -e 's/ .\+//'
 
08-17-2009, 11:08 AM   #12
forrestt
The s/// means substitute. The "." matches any single character. The "+" means one or more times, but in sed's basic regular expressions it must be escaped (\+, a GNU extension) so that it isn't read as a literal "+" sign. You could also have used "*" instead of "+"; it means zero or more times.

So the whole command reads: substitute a space followed by one or more of any character with an empty string. Since the match starts at the first space on the line and .\+ runs to the end of the line, this removes the first space and everything after it.

HTH

Forrest
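Forrest's remark that * could replace \+ holds for the posted headers, but the two differ on one edge case: " .*" also matches a lone trailing space, while " .\+" needs at least one more character after the space. A small check, assuming GNU sed (for \+) and a hypothetical header line that ends in a stray space:

```shell
# A hypothetical header that ends with a stray space.
line='>MCAL_43C14_r_00 '

# " .\+" needs a space plus at least one more character: no match here,
# so the line is printed unchanged (trailing space included).
plus=$(printf '%s\n' "$line" | sed 's/ .\+//')

# " .*" accepts zero characters after the space, so the space is removed.
star=$(printf '%s\n' "$line" | sed 's/ .*//')

# Brackets make the trailing space visible.
printf '[%s]\n[%s]\n' "$plus" "$star"
```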
 
08-17-2009, 11:50 AM   #13
pixellany
If you are confused by the escape (\), you can also turn on extended regular expressions using the -r flag.

Thus, this works:
Code:
sed -r -e 's/ .+//'
With the -r flag, a literal "+" would then require the escape.

So, I guess there really is no way to escape the escape... NO, WAIT: there is.
If you escape an escape, then it's not an escape; e.g. "\\" means a literal "\".

Are you confused yet?
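The escaping rules above can be checked directly. This sketch assumes GNU sed, which accepts -E as a synonym for -r (-E is also the spelling BSD sed uses); the variable names and sample strings are hypothetical.

```shell
line='>MCAL_43C14_r_00 872 1 872'

# Basic regex (GNU): "\+" means one or more; a bare "+" is literal.
bre=$(printf '%s\n' "$line" | sed 's/ .\+//')

# Extended regex: "+" means one or more, no backslash needed.
ere=$(printf '%s\n' "$line" | sed -E 's/ .+//')

# In extended mode a literal plus sign must be escaped.
plus=$(printf '%s\n' 'a+b' | sed -E 's/\+/-/')

# "\\" matches one literal backslash (replaced here by an escaped "/").
slash=$(printf '%s\n' 'a\b' | sed 's/\\/\//')

printf '%s\n' "$bre" "$ere" "$plus" "$slash"
```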
