One sed command to cover 3 patterns?

thesunlover · 04-05-2018, 07:54 AM

Hi,

I need to write a script to delete the following pattern lines from some files.

johnsmith
johnsmith,
johnsmith:

"sed -i /^$username/d $file" doesn't work because users like johnsmith2 will be removed as well.

"sed -i /^$username[,:]/d $file" works for the "johnsmith," and "johnsmith:" cases.

"sed -i /^$username[$,:]/d $file" and "sed -i /^$username[\n,:]/d $file" don't work. The "johnsmith" line cannot be removed.

How can I use one sed command (or something else) to cover all these 3 cases?

Thank you in advance!

syg00 · 04-05-2018, 08:01 AM

sed will honour "|" (or; written "\|" in GNU sed's basic regex syntax) as well as "$" as an anchor.
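A minimal sketch of the alternation idea, assuming a sed that supports -E (GNU sed and BSD sed do). With extended regular expressions, "|" separates the alternatives and "$" anchors each one to the end of the line. The helper name del_user_alt and the sample names are made up for illustration, and the sketch filters stdin instead of editing a file with -i:

```sh
#!/bin/sh
# del_user_alt (hypothetical helper): delete lines that are exactly USER,
# or USER followed by a single "," or ":" and nothing else.
# Uses ERE alternation "a|b"; in GNU sed's basic syntax it would be "a\|b".
del_user_alt() {
    user=$1
    # \$ gives sed a literal $ (end-of-line anchor) inside double quotes
    sed -E "/^${user}\$|^${user}[,:]\$/d"
}

printf '%s\n' johnsmith 'johnsmith,' 'johnsmith:' johnsmith2 janedoe | del_user_alt johnsmith
# prints:
# johnsmith2
# janedoe
```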

hazel · 04-05-2018, 08:05 AM

Use the question mark. It's a quantifier that means "once or not at all". There is also a character class for punctuation marks. Look it up. You already seem to know enough about regular expressions to combine these two to get what you want.
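Putting hazel's two hints together might look like this, assuming a sed with -E: [[:punct:]] is the POSIX bracket class for punctuation, and "?" makes it optional. Note that the class covers all punctuation, so lines such as "johnsmith." are removed too. The helper name is hypothetical:

```sh
#!/bin/sh
# del_user_punct (hypothetical helper): delete lines that are USER,
# optionally followed by exactly one punctuation character.
del_user_punct() {
    user=$1
    # [[:punct:]]? = zero or one punctuation character (ERE "?")
    sed -E "/^${user}[[:punct:]]?\$/d"
}

printf '%s\n' johnsmith 'johnsmith,' 'johnsmith:' 'johnsmith.' johnsmith2 | del_user_punct johnsmith
# prints: johnsmith2
```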

TB0ne · 04-05-2018, 08:10 AM

Quote:

Originally Posted by thesunlover

Hi,
I need to write a script to delete the following pattern lines from some files.

johnsmith
johnsmith,
johnsmith:

"sed -i /^$username/d $file" doesn't work because users like johnsmith2 will be removed as well.
"sed -i /^$username[,:]/d $file" works for the "johnsmith," and "johnsmith:" cases.
"sed -i /^$username[$,:]/d $file" and "sed -i /^$username[\n,:]/d $file" don't work. The "johnsmith" line cannot be removed.

How can I use one sed command (or something else) to cover all these 3 cases?

I'd suggest you pick up a book on regexes. They are complex, but handy, and in your case can be used to accommodate punctuation. Try:

Code:

sed -i -e '/^johnsmith[[:punct:]]$/d' -e '/^johnsmith$/d' "$file"

You don't say whether these are separate lines or part of longer lines (like "user johnsmith has logged in at noon"), but sed accepts multiple -e expressions. The first deletes lines that are johnsmith followed by exactly one punctuation character; the second deletes lines that are just johnsmith. That leaves johnsmith2, johnsmithallen, etc. untouched.

syg00 · 04-05-2018, 08:17 AM

And in case you haven't guessed, there just might be a few ways of achieving your desired outcome.

How like Linux...

MadeInGermany · 04-05-2018, 08:28 PM

\| and \? are GNU extensions.
The latter is short for the more portable \{0,1\}.
Here it must be followed by $ so that the match ends at the end of the line.
The [ ] should be inside quotes so the shell does not try to expand it as a glob. Best to put all sed code in quotes.

Code:

sed -i "/^$username[,:]\{0,1\}$/d" "$file"
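To try this on a throw-away file, something like the following should work, assuming GNU sed (BSD sed needs -i '' instead of a bare -i) and mktemp; the user names and file contents are made up for the demo:

```sh
#!/bin/sh
username=johnsmith
file=$(mktemp)                  # throw-away test file (remove it when done)
printf '%s\n' johnsmith 'johnsmith,' 'johnsmith:' johnsmith2 janedoe > "$file"

# [,:]\{0,1\} = optional "," or ":"; the final $ anchors it to end of line
sed -i "/^$username[,:]\{0,1\}$/d" "$file"

cat "$file"                     # johnsmith2 and janedoe remain
```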

thesunlover · 04-06-2018, 08:44 AM

Hi, thank you all very much for your prompt and very helpful replies!

This compact one works very well:

sed -i "/^$username[[:punct:]]\{0,1\}$/d" $file

The only case it can't handle is "johnsmith ,". Any ideas?

thesunlover · 04-06-2018, 09:01 AM

This one is really good "sed -i "/^$username[,:]\{0,1\}$/d" $file". Thanks MadeInGermany !

MadeInGermany · 04-06-2018, 09:12 AM

Ooh, smileys. Please wrap your code in code tags (=> the # button at the top of the Wiki editor).

Quote:

Originally Posted by thesunlover

Hi, thank you all very much for your prompt and very helpful replies!

This compact one works very well:

sed -i "/^$username[[:punct:]]\{0,1\}$/d" $file

The only case it can't handle is "johnsmith ,". Any ideas?

You mean: allow an additional space before the punctuation character?

Code:

sed -i "/^$username[[:space:]]*[[:punct:]]\{0,1\}$/d" "$file"

Like [[:punct:]], which is the class of punctuation characters,
[[:space:]] is the class of whitespace characters (space, TAB, newline, vertical TAB, form feed, CR).
There is also [[:blank:]], which matches only a space or a TAB.
A following * means the preceding item can occur zero or more times.
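A small check of the optional-blanks version, assuming GNU or BSD sed; the helper name is hypothetical, and it uses [[:blank:]] (space or TAB only). Note that a name followed only by blanks is deleted as well, because the punctuation character itself is optional:

```sh
#!/bin/sh
# del_user_ws (hypothetical helper): allow any number of spaces/TABs
# between the user name and the optional punctuation character.
del_user_ws() {
    user=$1
    sed "/^$user[[:blank:]]*[[:punct:]]\{0,1\}\$/d"
}

# printf turns \t into a TAB and \n into a newline
printf 'johnsmith ,\njohnsmith\t:\njohnsmith \njohnsmith2\n' | del_user_ws johnsmith
# prints: johnsmith2
```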

thesunlover · 04-06-2018, 09:17 AM

Hi,

A similar question: how can I make the following line shorter, i.e. combine the three greps into one?

if grep -q ^$username$ $file || grep -q ^$username, $file || grep -q ^$username: $file ; then

Thanks.

thesunlover · 04-06-2018, 09:18 AM

Thank you very much again, MadeInGermany!

MadeInGermany · 04-06-2018, 10:09 AM

The same RE works with grep (just without the / / that sed uses to delimit the RE from the rest of the command):

Code:

if grep -q "^$username[:,]\{0,1\}$" "$file"; then

or

Code:

if grep -q "^$username[[:punct:]]\{0,1\}$" "$file"; then

or

Code:

if grep -q "^$username[[:blank:]]*[:,]\{0,1\}$" "$file"; then

or ...
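For completeness, here is how one of these tests might sit in a complete if statement. grep -q prints nothing and only sets the exit status, which is what if checks. The function name user_listed and the sample file are hypothetical:

```sh
#!/bin/sh
# user_listed (hypothetical helper): succeed if FILE ($2) has a line that is
# exactly USER ($1), optionally followed by one "," or ":".
user_listed() {
    grep -q "^$1[:,]\{0,1\}\$" "$2"
}

file=$(mktemp)
printf '%s\n' 'johnsmith:' janedoe > "$file"

if user_listed johnsmith "$file"; then
    echo "johnsmith found"
fi
if ! user_listed johnsmith2 "$file"; then
    echo "johnsmith2 not found"
fi
rm -f "$file"
```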

thesunlover · 04-06-2018, 12:06 PM

Thanks, MadeInGermany! Glad to know this format works for grep as well.

Sorry, my mistake: I didn't make it clear enough. The lines to remove actually look like this:

johnsmith
johnsmith,John Smith...
johnsmith:John Smith...

So the following code doesn't quite do what I need:

Code:

sed -i "/^$username[[:punct:]]\{0,1\}$/d" $file

TB0ne · 04-06-2018, 12:21 PM

Quote:

Originally Posted by thesunlover

Thanks, MadeInGermany! Glad to know this format works for grep as well.

Sorry, my mistake: I didn't make it clear enough. The lines to remove actually look like this:

johnsmith
johnsmith,John Smith...
johnsmith:John Smith...

So the following code doesn't quite do what I need:

Code:

sed -i "/^$username[[:punct:]]\{0,1\}$/d" $file

I had asked you in post #4 for details about the input strings, and you failed to provide them. You've been given a lot of advice here, along with the solution to this particular problem, but you will have to think about it.

The "$" means "end of line". Therefore "johnsmith,John Smith" won't match "johnsmith$", will it? The solution I gave you in post #4 can easily be modified to handle blank spaces as well. You should experiment and find your solution, since all the pieces have already been given to you.

MadeInGermany · 04-06-2018, 12:54 PM

This can be handled with a group. A quantifier that follows the group applies to the whole group.

Code:

sed -i "/^$username\([,:].*\)\{0,1\}$/d" "$file"

It gets easier if we switch from BRE (basic regular expressions) to ERE (extended regular expressions, as also used by egrep / grep -E, awk, perl, ...):

Code:

sed -r -i "/^$username([,:].*){0,1}$/d" "$file"

or even shorter

Code:

sed -r -i "/^$username([,:].*)?$/d" "$file"
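A quick check, assuming GNU sed, that the BRE group form and the ERE form delete the same lines on the refined input. The optional group means the name must be followed either by the end of the line or by a comma/colon plus anything after it. The sample records are made up, and the sketch filters stdin instead of using -i:

```sh
#!/bin/sh
username=johnsmith
sample() {
    printf '%s\n' johnsmith 'johnsmith,John Smith' 'johnsmith:John Smith' \
        'johnsmith2:Johnny' 'janedoe,Jane Doe'
}

# BRE: \( \) makes a group, \{0,1\} makes the whole group optional
bre=$(sample | sed "/^$username\([,:].*\)\{0,1\}\$/d")
# ERE: plain ( ) and ?; -r is GNU sed's switch (-E also works in GNU and BSD sed)
ere=$(sample | sed -r "/^$username([,:].*)?\$/d")

printf '%s\n' "$bre"    # johnsmith2:Johnny and janedoe,Jane Doe remain
[ "$bre" = "$ere" ] && echo "same result"
```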

Last but not least, there is nothing wrong with two simple commands, as TB0ne already posted:

Code:

sed -i "/^$username$/d; /^$username[,:].*$/d" "$file"
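Since the original question also allowed "something else" and awk came up above, here is a hedged alternative sketch that compares the name as a plain string instead of a regex: split each line at "," or ":" and keep it unless the first field equals the user name exactly. POSIX awk has no in-place option, so it writes a temporary file and moves it back. The helper name remove_user is hypothetical:

```sh
#!/bin/sh
# remove_user (hypothetical helper): delete lines of FILE ($2) whose text
# before the first "," or ":" (or the whole line, if neither occurs) equals USER ($1).
# Inside the single-quoted awk program, $1 is awk's first field, not the shell's $1.
# Note: mv replaces the file, so its original permissions are not preserved.
remove_user() {
    awk -F'[,:]' -v u="$1" '$1 != u' "$2" > "$2.tmp" && mv "$2.tmp" "$2"
}

file=$(mktemp)
printf '%s\n' johnsmith 'johnsmith,John Smith' 'johnsmith:John Smith' \
    'johnsmith2:Johnny' > "$file"
remove_user johnsmith "$file"
cat "$file"    # only johnsmith2:Johnny remains
```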