LinuxQuestions.org


jigg_fly 05-03-2010 07:11 AM

use sed to find string pattern and delete subsequent characters
 
I have a file with a number of strings like the ones below

string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

I'm trying to delete everything after #** (the # plus the two characters that follow it), so that

string1#maasdfeaveasdfawefas
string2#mfaaebvrserhserh

becomes

string1#ma
string2#mf

I tried sed 's/#..*//g', but as you will all know, it returns string1, string2, etc.

PMP 05-03-2010 07:19 AM

Code:

echo 'string1#m1asdfe23easdf23wefas'  | sed 's/\(.*\#..\).*/\1/g'

pixellany 05-03-2010 07:23 AM

So what is the rule?

For example, is it: 'find the first "#" and delete everything after "#" plus 2 characters'?

Your code finds the pattern: '"#", followed by any character, then any number of characters'


Try this:
Code:

sed 's/\(#..\).*/\1/' filename
This uses a backreference to capture "#" plus any 2 characters (as part of the total matched expression), and re-insert that pattern in place of the total match.
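
As a minimal sketch of the backreference approach described above, here it is run against the sample lines from the original question. The variable names `input` and `trimmed` are only illustrative and are not part of anyone's posted solution.

```shell
# Sample lines copied from the original question.
input='string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt'

# \(#..\) captures "#" plus the next two characters,
# .* matches the rest of the line, and \1 puts the captured part back.
trimmed=$(printf '%s\n' "$input" | sed 's/\(#..\).*/\1/')

printf '%s\n' "$trimmed"
```

Everything before the "#" is outside the match, so sed leaves it alone. Only the text from "#" onward is replaced.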

Go here for a really good SED tutorial:
http://www.grymoire.com/Unix/Sed.html

pixellany 05-03-2010 07:53 AM

Quote:

Originally Posted by PMP (Post 3955582)
Code:

echo 'string1#m1asdfe23easdf23wefas'  | sed 's/\(.*\#..\).*/\1/g'

This is not going to work....I can explain later (Have to be on a conference call.)
<<Edit: It was later established that I was wrong>>

PMP 05-03-2010 07:56 AM

Quote:

Originally Posted by pixellany (Post 3955603)
This is not going to work....I can explain later (Have to be on a conference call.)

Code:

-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

Code:

-bash-3.2$  sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1


It worked for me !! Happy to see your views !!

jigg_fly 05-03-2010 08:02 AM

Thanks to both pixellany and PMP.

I've tried both solutions and they both seem to work. I'm also curious as to why PMP's isn't ideal.

pixellany, I'm using yours as it seems to work faster on a big file.

druuna 05-03-2010 08:07 AM

Hi,

Both pixellany's and PMP's example work.

@PMP: The first .* and the g option are not needed (but don't do any harm for the task at hand).

@pixellany: I would have chosen your example, but the "missing" first part (everything up to the #) can be confusing if you are not familiar with sed. As long as the first .* in PMP's example is part of the back referencing all is ok.

grail 05-03-2010 08:07 AM

@PMP - I think what pixellany is referring to is that if you change the input to contain any more hashes (#), yours will be a little greedy. Try this string:

Code:

string1#m1asdfe23easdf2#3wefas
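
To make the greedy behaviour concrete, here is a small sketch that runs both posted expressions on that string. The variable names `line`, `greedy` and `first` are just illustrative, and the unescaped "#" is assumed to behave the same as PMP's "\#".

```shell
# Test string with two "#" characters.
line='string1#m1asdfe23easdf2#3wefas'

# PMP's form: the leading .* is greedy, so it runs up to the LAST "#".
greedy=$(printf '%s\n' "$line" | sed 's/\(.*#..\).*/\1/')

# pixellany's form: the match starts at the FIRST "#" on the line.
first=$(printf '%s\n' "$line" | sed 's/\(#..\).*/\1/')

printf '%s\n' "$greedy"   # string1#m1asdfe23easdf2#3w
printf '%s\n' "$first"    # string1#m1
```

With only one "#" per line, as in the original sample data, both forms give the same result.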

PMP 05-03-2010 08:16 AM

@grail,
I took the sample data provided by OP and where it is mentioned

Quote:

I'm trying to delete everything after #** so that
Even I am waiting for pixellany's view. Let him finish his con-call :)

pixellany 05-03-2010 08:50 AM

I was in error!! The confusion came from the fact that the backreference in PMP's solution replaces everything from the start of the line, whereas mine replaces only what starts with "#..". It was not obvious at a glance that they were doing the same thing.

In PMP's solution, why does the "#" have to be escaped?

PMP 05-03-2010 08:56 AM

It can be ignored. I thought of playing safe and later did not remove it :(

rkski 05-03-2010 01:15 PM

Isn't pixellany's solution more robust (not to mention more efficient) for the reason stated by grail in post#8?

druuna 05-03-2010 01:18 PM

Hi,

@rkski: pixellany's solution is indeed "better". The OP noticed that (see post #6).

Ayubstation 10-09-2013 03:21 PM

Quote:


Originally Posted by pixellany

This is not going to work....I can explain later (Have to be on a conference call.)



Code:
-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt


Code:
-bash-3.2$ sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1

It worked for me !! Happy to see your views !!


What if I want to print only what comes after that pattern?
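
One possible sketch for this, assuming "that pattern" means the first "#" plus the two characters after it: delete everything up to and including that part instead of keeping it. The variable names `input` and `tail_only` are illustrative only.

```shell
# Sample lines copied from the original question.
input='string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt'

# ^[^#]* matches everything before the first "#",
# #.. matches the "#" and the two characters after it;
# replacing all of that with nothing leaves only the remainder.
tail_only=$(printf '%s\n' "$input" | sed 's/^[^#]*#..//')

printf '%s\n' "$tail_only"
```

Lines that contain no "#" are printed unchanged by this expression, so it may need an extra filter if such lines should be dropped.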

danielbmartin 10-09-2013 04:53 PM

OP asked for a sed solution and a good one has already been posted.
Therefore I will contribute an awk solution.

With this InFile ...
Code:

string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt

... this awk ...
Code:

awk '{print substr($0,1,index($0,"#")+2)}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:

string1#m1
string2#mf
anotherstring#ji
anotherone#m1

Daniel B. Martin


All times are GMT -5.