use sed to find string pattern and delete subsequent characters
I have a file with a number of strings like the ones below:
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt
I'm trying to delete everything after #** (the "#" plus the two characters that follow it), so that
string1#maasdfeaveasdfawefas
string2#mfaaebvrserhserh
becomes
string1#ma
string2#mf
I tried sed 's/#..*//g' but, as you all will know, it returns string1, string2, etc. |
Code:
echo 'string1#m1asdfe23easdf23wefas' | sed 's/\(.*\#..\).*/\1/g' |
So what is the rule?
For example, is it: 'find the first "#" and delete everything after "#" plus 2 characters'?
Your code finds the pattern: '"#", followed by any character, then any number of characters'.
Try this:
Code:
sed 's/\(#..\).*/\1/' filename
Go here for a really good sed tutorial: http://www.grymoire.com/Unix/Sed.html |
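To make the difference concrete, here is a small sketch that runs the OP's attempt and the suggested fix on the first sample line. The function names op_attempt and keep_hash_plus_two are made up for this illustration and do not come from the thread.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# OP's attempt: the match starts at "#", so the "#" and everything
# after it is replaced with nothing.
op_attempt() {
    printf '%s\n' "$1" | sed 's/#..*//g'
}

# Suggested fix: capture "#" plus the next two characters in \( \),
# match the rest of the line with .*, and put back only the capture (\1).
keep_hash_plus_two() {
    printf '%s\n' "$1" | sed 's/\(#..\).*/\1/'
}

op_attempt 'string1#m1asdfe23easdf23wefas'          # prints: string1
keep_hash_plus_two 'string1#m1asdfe23easdf23wefas'  # prints: string1#m1
```

The only change is that the part to keep is wrapped in a group and written back with \1, so the substitution removes just the tail of the line.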
Quote:
<<Edit: It was later established that I was wrong>> |
Quote:
Code:
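-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt
Code:
-bash-3.2$ sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1
It worked for me!! Happy to see your views!! |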
Thanks to both pixellany and PMP.
I've tried both solutions and they both seem to work. I'm also curious as to why PMP's isn't ideal. pixellany, I'm using yours as it seems to work faster on a big file. |
Hi,
Both pixellany's and PMP's examples work.
@PMP: The first .* and the g option are not needed (but don't do any harm for the task at hand).
@pixellany: I would have chosen your example, but the "missing" first part (everything up to the #) can be confusing if you are not familiar with sed. As long as the first .* in PMP's example is part of the back reference, all is OK. |
@PMP - I think what pixellany is referring to is that if you change the string to have any more hashes (#) in it, then yours will be a little greedy. Try this string:
Code:
string1#m1asdfe23easdf2#3wefas |
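For anyone who wants to try it, a quick sketch of both commands on that string follows. The variable names are made up for this illustration, and PMP's optional backslash before "#" is left out.

```shell
#!/bin/sh
line='string1#m1asdfe23easdf2#3wefas'

# PMP's form: the leading .* is greedy, so it runs up to the LAST "#".
greedy=$(printf '%s\n' "$line" | sed 's/\(.*#..\).*/\1/')

# pixellany's form: the match starts at the FIRST "#".
first=$(printf '%s\n' "$line" | sed 's/\(#..\).*/\1/')

printf '%s\n' "$greedy"   # prints: string1#m1asdfe23easdf2#3w
printf '%s\n' "$first"    # prints: string1#m1
```

On the OP's sample data each line has only one "#", which is why both commands gave the same result there.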
@grail,
I took the sample data provided by OP and where it is mentioned Quote:
|
I was in error!! The confusion was that PMP's solution also matches the first part of the line and puts it back via the backreference, whereas mine matches only what starts with "#..". It was not obvious at a glance that they were doing the same thing.
In PMP's solution, why does the "#" have to be escaped? |
It can be ignored. I thought of playing safe and later did not remove it :(
|
Isn't pixellany's solution more robust (not to mention more efficient) for the reason stated by grail in post#8?
|
Hi,
@rkski: pixellany's solution is indeed "better". The OP noticed that (see post #6). |
Quote:
Originally Posted by pixellany
This is not going to work....I can explain later (Have to be on a conference call.)
Code:
-bash-3.2$ cat test
string1#m1asdfe23easdf23wefas
string2#mfaaeb2vr1rhserh
anotherstring#ji89ensrsegr
anotherone#m1ynmdt324nsdt
Code:
-bash-3.2$ sed 's/\(.*\#..\).*/\1/g' test
string1#m1
string2#mf
anotherstring#ji
anotherone#m1
It worked for me!! Happy to see your views!!
What if I want to print only what comes after that pattern? |
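One possible reading of that question is "print only the text that follows the first # and its two characters". Under that assumption, a minimal sketch could look like this. The function name after_hash_plus_two is hypothetical, and nobody in the thread posted this command.

```shell
#!/bin/sh
# Hypothetical helper: print only what follows the first "#" and the
# two characters after it. [^#]* cannot cross a "#", so the match is
# tied to the first one. -n plus the p flag skips lines with no match.
after_hash_plus_two() {
    sed -n 's/^[^#]*#..//p'
}

printf '%s\n' 'string1#m1asdfe23easdf23wefas' 'string2#mfaaeb2vr1rhserh' |
    after_hash_plus_two
# prints:
# asdfe23easdf23wefas
# aaeb2vr1rhserh
```

Drop -n and the p flag if lines without a "#" should be passed through unchanged instead of being skipped.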
OP asked for a sed solution and a good one has already been posted.
Therefore I will contribute an awk solution. With this InFile ...
Code:
string1#m1asdfe23easdf23wefas
Code:
awk '{print substr($0,1,index($0,"#")+2)}' "$InFile" >"$OutFile"
Code:
string1#m1 |
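One thing worth knowing about the awk one-liner: index() returns 0 when a line has no "#", so such a line is cut down to its first two characters, whereas the sed solutions leave it alone. The sketch below shows this and a guarded variant. The function names are made up, and the guarded behaviour is an assumption about what would be wanted, not something the OP asked for.

```shell
#!/bin/sh
# As posted: with no "#" in the line, index() is 0 and
# substr($0,1,2) keeps just the first two characters.
awk_as_posted() {
    awk '{print substr($0,1,index($0,"#")+2)}'
}

# Guarded variant (assumed behaviour): leave lines without "#"
# unchanged, as the sed solutions do.
awk_guarded() {
    awk '{i = index($0,"#"); print (i ? substr($0,1,i+2) : $0)}'
}

printf '%s\n' 'string1#m1asdfe23easdf23wefas' | awk_as_posted  # prints: string1#m1
printf '%s\n' 'nohashhere' | awk_as_posted                     # prints: no
printf '%s\n' 'nohashhere' | awk_guarded                       # prints: nohashhere
```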