[SOLVED] use sed to find string pattern and delete subsequent characters

jigg_fly · 05-03-2010, 07:11 AM

I have a file with a number of strings like the ones below

string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

I'm trying to delete everything after #** so that

string1#maasdfeaveasdfawefas
string2#mfaaebvrserhserh

becomes

string1#ma
string2#mf

I tried sed 's/#..*//g', but as you will all know, it returns string1, string2, etc.

PMP · 05-03-2010, 07:19 AM

Code:

echo 'string1#m1asdfe23easdf23wefas'  | sed 's/\(.*\#..\).*/\1/g'

pixellany · 05-03-2010, 07:23 AM

So what is the rule?

For example, is it: 'find the first "#" and delete everything after "#" plus 2 characters'?

Your code finds the pattern: '"#", followed by any character, then any number of characters'

Try this:

Code:

sed 's/\(#..\).*/\1/' filename

This uses a backreference to capture "#" plus any 2 characters (as part of the total matched expression), and re-insert that pattern in place of the total match.
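As a minimal sketch of that substitution, run against two of the sample lines from the first post (the `sample` and `result` variable names are only for illustration):

```shell
# Two of the sample lines from the original question.
sample='string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh'

# \(#..\) captures "#" plus the next two characters.
# The trailing .* matches the rest of the line, so the whole match
# ("#" to end of line) is replaced by just the captured part, \1.
result=$(printf '%s\n' "$sample" | sed 's/\(#..\).*/\1/')

printf '%s\n' "$result"
# prints:
# string1#m1
# string2#mf
```

Text before the "#" is never part of the match, so it is left alone without needing to be captured.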

Go here for a really good SED tutorial:
http://www.grymoire.com/Unix/Sed.html

pixellany · 05-03-2010, 07:53 AM

Quote:

Originally Posted by PMP

Code:

echo 'string1#m1asdfe23easdf23wefas'  | sed 's/\(.*\#..\).*/\1/g'

This is not going to work....I can explain later (Have to be on a conference call.)
<<Edit: It was later established that I was wrong>>

PMP · 05-03-2010, 07:56 AM

Quote:

Originally Posted by pixellany

This is not going to work....I can explain later (Have to be on a conference call.)

Code:

-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

Code:

-bash-3.2$  sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1

It worked for me !! Happy to see your views !!

jigg_fly · 05-03-2010, 08:02 AM

Thanks to both pixellany and PMP.

I've tried both solutions and they both seem to work. I'm also curious as to why PMP's isn't ideal.

pixellany, I'm using yours as it seems to work faster on a big file.

druuna · 05-03-2010, 08:07 AM

Hi,

Both pixellany's and PMP's example work.

@PMP: The first .* and the g option are not needed (but don't do any harm for the task at hand).

@pixellany: I would have chosen your example, but the "missing" first part (everything up to the #) can be confusing if you are not familiar with sed. As long as the first .* in PMP's example is part of the back referencing all is ok.

grail · 05-03-2010, 08:07 AM

@PMP - I think what pixellany is referring to is that if the input has any more hashes (#) in it, yours will be a little greedy. Try this string:

Code:

string1#m1asdfe23easdf2#3wefas
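A quick way to see the difference, as a sketch only: the two patterns below are the ones posted earlier in the thread, with the unnecessary backslash before "#" dropped, and the variable names are made up for the example.

```shell
line='string1#m1asdfe23easdf2#3wefas'

# With a leading .* inside the group: .* is greedy, so it runs up to the
# LAST "#" that still has two characters after it.
greedy=$(printf '%s\n' "$line" | sed 's/\(.*#..\).*/\1/')

# Without the leading .*: the match starts at the FIRST "#" on the line.
first=$(printf '%s\n' "$line" | sed 's/\(#..\).*/\1/')

printf 'with leading .*: %s\n' "$greedy"
printf 'without it:      %s\n' "$first"
# prints:
# with leading .*: string1#m1asdfe23easdf2#3w
# without it:      string1#m1
```

On the sample data from the first post there is only one "#" per line, so both forms give the same result there.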

PMP · 05-03-2010, 08:16 AM

@grail,
I took the sample data provided by OP and where it is mentioned

Quote:

I'm trying to delete everything after #** so that

Even I am waiting for pixellany's view. Let him finish his con-call

pixellany · 05-03-2010, 08:50 AM

I was in error!! The confusion was in the fact that the backreference in PMP's solution replaces everything from the start of the line, whereas mine replaces only what starts with "#..". It was not obvious at a glance that they were doing the same thing.

In PMP's solution, why does the "#" have to be escaped?

PMP · 05-03-2010, 08:56 AM

It can be ignored. I thought I'd play it safe, and later did not remove it.

rkski · 05-03-2010, 01:15 PM

Isn't pixellany's solution more robust (not to mention more efficient) for the reason stated by grail in post#8?

druuna · 05-03-2010, 01:18 PM

Hi,

@rkski: pixellany's solution is indeed "better". The OP noticed that (see post #6).

Ayubstation · 10-09-2013, 03:21 PM

Quote:

Originally Posted by pixellany

This is not going to work....I can explain later (Have to be on a conference call.)

Code:

-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

Code:

-bash-3.2$ sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1

It worked for me !! Happy to see your views !!

How is it if I want to print only after that pattern?
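One possible reading of this question, sketched under the assumption that "that pattern" means the "#" plus the two characters after it, and that only the text following it should be kept. The `after` variable name is hypothetical.

```shell
# Delete everything from the start of the line through the first "#"
# and the two characters that follow it, keeping only what comes after.
# [^#]* cannot run past a "#", so the match stops at the first one.
after=$(printf '%s\n' 'string1#m1asdfe23easdf23wefas' | sed 's/^[^#]*#..//')

printf '%s\n' "$after"
# prints:
# asdfe23easdf23wefas
```

Lines without a "#" do not match and are printed unchanged.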

danielbmartin · 10-09-2013, 04:53 PM

OP asked for a sed solution and a good one has already been posted.
Therefore I will contribute an awk solution.

With this InFile ...

Code:

string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

... this awk ...

Code:

awk '{print substr($0,1,index($0,"#")+2)}' "$InFile" >"$OutFile"

... produced this OutFile ...

Code:

string1#m1
string2#mf
anotherstring#ji
anotherone#m1
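One caveat, with a possible workaround sketched below: index() returns 0 when a line has no "#", so the one-liner above would print just the first two characters of such a line. The guarded variant here passes those lines through unchanged instead; the second sample line and the variable names are invented for the illustration.

```shell
sample='string1#m1asdfe23easdf23wefas
no_hash_here'

# i is the position of the first "#", or 0 if there is none.
# Print up to two characters past the "#", or the whole line if no "#".
guarded=$(printf '%s\n' "$sample" |
  awk '{ i = index($0, "#"); print (i ? substr($0, 1, i + 2) : $0) }')

printf '%s\n' "$guarded"
# prints:
# string1#m1
# no_hash_here
```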

Daniel B. Martin