LinuxQuestions.org


twantrd 09-13-2006 02:55 AM

Remove string in sed
 
Hi,

I have a log file and I want to remove a string for parsing purposes. My log file looks like this

Code:

2006-09-12 19:53:27 1GNKsR-0001yZ-00 <= root@blah.com U=root P=local S=344

I want to remove the username, U=root. However, the username can be anything else as well such as Sandy, Mike, Joe, etc. Therefore, I want to remove U=* but NOT the rest. The result should be

Code:

2006-09-12 19:53:27 1GNKsR-0001yZ-00 <= root@blah.com P=local S=344

My sed syntax is

Code:

cat /tmp/log | sed '/U=.*/,/^$/d'

So, I'm trying to match everything after U= and stop at a blank line but it's not working right. Can someone shed some light? Thanks!
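
For reference, the attempt above behaves differently from what was intended. In sed, '/U=.*/,/^$/d' is a range address followed by the d command, so it deletes whole lines, starting at the first line that contains U= and ending at the next empty line, rather than removing part of a line. A minimal sketch, using made-up sample lines (not taken from the real log):

```shell
# Four hypothetical input lines; the third one is empty.
output=$(printf '%s\n' \
  'first U=root P=local' \
  'second line, no username' \
  '' \
  'fourth line' |
  sed '/U=.*/,/^$/d')

# Lines 1-3 (from the U= match through the empty line) are deleted.
printf '%s\n' "$output"   # prints: fourth line
```

If the input had no empty line at all, the range would stay open and everything from the first U= line to the end of the file would be deleted.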

-twantrd

druuna 09-13-2006 03:02 AM

Hi,

This should work:

sed 's/ U=.* P/ P/' <infile>

It assumes that there's always a P= entry after U=<user>.
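
A quick way to check that assumption, as a sketch: the first line below is the sample from the original post, while the second is a hypothetical entry with no P= field, added only for illustration.

```shell
# Sample line from the original post
with_p='2006-09-12 19:53:27 1GNKsR-0001yZ-00 <= root@blah.com U=root P=local S=344'
# Hypothetical entry without a P= field (not from the original log)
without_p='2006-09-12 19:53:27 1GNKsR-0001yZ-00 <= root@blah.com U=root S=344'

result_with_p=$(printf '%s\n' "$with_p" | sed 's/ U=.* P/ P/')
result_without_p=$(printf '%s\n' "$without_p" | sed 's/ U=.* P/ P/')

printf '%s\n' "$result_with_p"     # U=root is removed
printf '%s\n' "$result_without_p"  # no " P" after U=, so the line is unchanged
```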

Hope this helps.

spirit receiver 09-13-2006 04:07 AM

... and if you want to stop at space characters instead, you can use
Code:

sed 's/ U=[^[:space:]]*//'
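
As a rough check that this does not depend on the username, here is a small sketch that runs the command over the names mentioned in the question. The log lines are built from the sample line in the first post; only the U= value changes.

```shell
# Usernames taken from the examples mentioned in the question
results=$(for user in root Sandy Mike Joe; do
  line="2006-09-12 19:53:27 1GNKsR-0001yZ-00 <= root@blah.com U=$user P=local S=344"
  printf '%s\n' "$line" | sed 's/ U=[^[:space:]]*//'
done)

# Each of the four output lines should have its U=<user> field removed
printf '%s\n' "$results"
```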

makyo 09-13-2006 09:26 AM

Hi, twantrd.
Quote:

Originally Posted by twantrd
Can someone shed some light?

You received 2 good specific answers for the problem.

One of the principles I try to get across to students in my classes is that regular expressions are by design greedy. They will match the longest possible string. So if you need to match fewer characters, you need to supply a constraint of some kind. As you saw, that took the form of additional text that followed the .* part of the pattern.
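
To make the greediness visible, here is a small sketch. The trailing PID=12 field is invented for this demo (it is not in the original log); it simply gives the line a second " P" for .* to run up to.

```shell
# Hypothetical line: the trailing PID=12 field is invented for this demo
line='root@blah.com U=root P=local S=344 PID=12'

# Greedy: .* runs to the LAST " P", so P=local and S=344 are lost too
greedy=$(printf '%s\n' "$line" | sed 's/ U=.* P/ P/')

# Constrained: [^ ]* cannot cross a space, so only U=root is removed
constrained=$(printf '%s\n' "$line" | sed 's/ U=[^ ]*//')

printf '%s\n' "$greedy"       # root@blah.com PID=12
printf '%s\n' "$constrained"  # root@blah.com P=local S=344 PID=12
```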

That being said, however, it is easy to forget details like that when we're busy with a million other things ... cheers, makyo (26)

twantrd 09-13-2006 01:15 PM

Thanks everyone!

Druuna, I understand your syntax. Makes sense. Thanks!

Spirit receiver, can you explain how yours works? I understand that [[:space:]] means an actual space but what does [^[:space:]]* mean? I thought [^ ]* will match everything EXCEPT a space. So, wouldn't the syntax be like this:

sed 's/ U=*[^[:space:]]//' - Match everything after U= but do not match space.

Obviously, this doesn't work as I've just tried it. Thanks.

-twantrd

druuna 09-13-2006 01:30 PM

Hi,

Strange it doesn't work (it does at my end).

About the [^[:space:]]* construct: it matches any run of characters that are not whitespace (the ^ negates the class).

And yes, you could also write [^ ]*. But if there's a tab between root and P= instead of a 'real' space, [^ ]* will keep matching across the tab, so the P= part will be removed as well (the match only stops at the first 'real' space).
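
The tab case can be reproduced with a short sketch. The input line is hypothetical: it is a shortened version of the sample with a tab placed between U=root and P=local.

```shell
# Hypothetical line where a tab, not a space, separates U=root from P=local
line=$(printf 'root@blah.com U=root\tP=local S=344')

# [^ ]* stops only at a literal space, so it swallows the tab and P=local
with_space=$(printf '%s\n' "$line" | sed 's/ U=[^ ]*//')

# [^[:space:]]* stops at any whitespace, including the tab
with_class=$(printf '%s\n' "$line" | sed 's/ U=[^[:space:]]*//')

printf '%s\n' "$with_space"  # root@blah.com S=344
printf '%s\n' "$with_class"  # P=local is kept, still preceded by the tab
```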

Hope this clears things up a bit.

soggycornflake 09-13-2006 01:34 PM

Quote:

Originally Posted by twantrd
Thanks everyone!

Druuna, I understand your syntax. Makes sense. Thanks!

Spirit receiver, can you explain how yours works? I understand that [[:space:]] means an actual space but what does [^[:space:]]* mean? I thought [^ ]* will match everything EXCEPT a space. So, wouldn't the syntax be like this:

sed 's/ U=*[^[:space:]]//' - Match everything after U= but do not match space.

Obviously, this doesn't work as I've just tried it. Thanks.

-twantrd

I think you're confusing the shell meta-character * with the regex repetition operator *.

spirit receiver's pattern '...[^[:space:]]*...' says

[          start a bracket expression (matches 1 character)
^          not (i.e. a character that doesn't match one of the following)
[:space:]  any whitespace character (space, tab, newline, etc.)
]          closing outer bracket
*          the preceding item, repeated zero or more times

i.e. match any sequence of characters (including nothing) that does not contain whitespace (space, tab, newline, CR, etc.).

Whereas your string '...U=*[^[:space:]]...' matches U followed by zero or more = characters, followed by exactly one character that is not whitespace.

Bear in mind that in regular expressions, * doesn't match anything on its own (nor do ?, +, etc.); these are repetition operators which apply to the preceding pattern.
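
The difference between the two placements of * shows up directly in the output. A small sketch, using a shortened version of the sample line:

```shell
# Shortened version of the sample line from the first post
line='root@blah.com U=root P=local S=344'

# * applies to "=": matches " U", zero or more "=", then ONE non-whitespace char
misplaced=$(printf '%s\n' "$line" | sed 's/ U=*[^[:space:]]//')

# * applies to the bracket expression: matches " U=" plus the whole username
correct=$(printf '%s\n' "$line" | sed 's/ U=[^[:space:]]*//')

printf '%s\n' "$misplaced"  # root@blah.comoot P=local S=344
printf '%s\n' "$correct"    # root@blah.com P=local S=344
```

In the first command only " U=r" is removed, which leaves "oot" glued to the preceding field.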

twantrd 09-13-2006 03:28 PM

Ahh ok that makes sense. Thanks for the clarification!

-twantrd


All times are GMT -5.