[SOLVED] Using sed to substitute repetition -> single occurrence

rm_-rf_windows · 03-06-2012, 05:00 PM

Hi,

I want to use sed to substitute an occurrence of "a a a a a" (a,space,a,space,a,space,a,space,a ... any number of a's like this) with "a".

I just can't figure it out. I've googled it...

Thanks in advance.

PS. - I think it'd be cool to have a special forum section for command line stuff... I wasn't sure where to post this.

sycamorex · 03-06-2012, 05:19 PM

Not sure if that'll suit your needs. You'd have to be more specific in your example. Ideally provide some sample input.

Code:

sed 's/\(\<a\>\).*/\1/' infile

EDIT: My example is not good for you. I can think of a few situations where it'll fail.

crts · 03-06-2012, 05:25 PM

Hi,

how about this:

Code:

$ echo 'a a a a a hab' |sed -r 's/(a *)*/a/'
ahab
$ echo 'a a a a ahab' |sed -r 's/(a *)*/a/'
ahab

Is this what you mean? I have to agree with sycamorex that your example is a bit vague.
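
One thing to watch with `(a *)*`: the pattern can also match the empty string, so on a line that does not start with a run of a's, sed still inserts an `a` at the front. Below is a minimal sketch of a stricter variant, assuming single spaces between the a's and a sed that accepts `-E` (on GNU sed it is the same as the `-r` used in this thread). The function name `squeeze_a` is made up for illustration.

```shell
# squeeze_a: collapse each run of "a a a ..." (single spaces) to one "a".
# The pattern needs at least two a's, so it can never match an empty string.
squeeze_a() {
    sed -E 's/a( a)+/a/g'
}

echo 'a a a a ahab' | squeeze_a          # ahab
echo 'hello' | squeeze_a                 # hello (unchanged)
echo 'hello' | sed -E 's/(a *)*/a/'      # ahello: the empty match at the start gets replaced
```

Unlike the original pattern, this variant leaves a space that follows the run alone, so `a a a hab` would come out as `a hab`.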

PS: The command line stuff is usually fine in Linux General or in Programming if the problem is a bit more complex.

Dark_Helmet · 03-06-2012, 05:31 PM

There are lots of things sed can do, but I don't know that this problem is best suited for it. Though, some sed guru may come along with a quick one liner.

That said, my immediate thought for this problem would use tr and uniq. If the line of text is in a shell variable for instance, substitute newlines for the spaces, then send the result through uniq, and then substitute spaces for the newlines.

The result should be a line where any repeated block of non-space characters is reduced to a single occurrence.

Some input "sanitizing" may be needed first (e.g. converting multiple spaces to a single space). Also, this approach would not properly handle a repeated sequence that spans multiple lines. Then again, neither would sed without some additional complication.

EDIT:
Oh... yeah, I'm looking at this from the more general perspective that you do not know the exact text that will be repeated beforehand.

EDIT2:
Since my response feels naked without an example:

Code:

user@localhost$ echo "they practically practically sell themselves themselves themselves" | \
tr ' ' '\n' | \
uniq | \
tr '\n' ' ' | \
sed 's@ $@\n@'
they practically sell themselves
user@localhost$
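
The sanitizing step mentioned above can be folded into the same pipeline. Here is a rough sketch, assuming standard `tr`, `uniq` and `paste`, with `dedup_words` as a made-up name. `tr -s` squeezes runs of spaces while turning them into newlines, and `paste -s` joins the words back together, so the trailing-space fixup with sed is not needed.

```shell
# dedup_words: reduce consecutive repeated words on each input line to one.
dedup_words() {
    while IFS= read -r line; do
        printf '%s\n' "$line" |
            tr -s ' ' '\n' |   # one word per line; runs of spaces squeezed
            uniq |             # drop adjacent duplicate words
            paste -sd ' ' -    # join the words back with single spaces
    done
}

echo "they practically   practically sell themselves themselves" | dedup_words
```

Since `uniq` compares whole lines, a word followed by punctuation (`themselves.`) does not count as a repeat of the bare word.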

romagnolo · 03-06-2012, 10:40 PM

I think you are looking exactly for this:

Code:

sed -r 's/(a )+(a| )?/a/g' your_file >tmp && mv tmp your_file

If your_file contains this:

Quote:

I'm practica a a a ally selling myself.

the sed command turns it into:

Code:

$ sed -r 's/(a )+(a| )?/a/g' your_file
I'm practically selling myself.

For reference, the only true manual of sed is the one written by Lee E. McMahon in 1978.

rm_-rf_windows · 03-07-2012, 05:15 AM

Excellent, Romagnolo, thanks a lot!

Your last post is exactly what I wanted. Is it possible to make this more generic, so that any repetition of a character followed by a space is replaced by that single character?

God, I spent 4 hours trying to figure it out.

Are you Italian?

Bye.

romagnolo · 03-07-2012, 06:26 AM

This works like a charm:

Code:

sed -r 's/(([[:graph:]]) )+(\2){1}/\2/g' your_file >tmp; mv tmp your_file
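
A possible variant, in case the pattern above misbehaves: as far as I can tell, `\2` refers to whatever the inner group captured on its last repetition, so the earlier repetitions are not forced to hold the same character. The sketch below captures the character once and requires every repeat to match it. It assumes single spaces and a sed that accepts `-E` (same as `-r` on GNU sed); the name `squeeze_any` is hypothetical.

```shell
# squeeze_any: collapse "x x x" (one non-space character repeated with single
# spaces in between) to a single "x". \1 pins every repeat to the first capture.
squeeze_any() {
    sed -E 's/([[:graph:]])( \1)+/\1/g'
}

echo "I'm practica a a a ally selling myself." | squeeze_any   # I'm practically selling myself.
echo 'b b b c c d' | squeeze_any                               # b c d
```

Note that this does not respect word boundaries: `red dog` would become `redog`, because the final `d` of one word and the first `d` of the next look like a repetition.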

Quote:

Originally Posted by rm_-rf_windows

Are you Italian?

From Romagna!

rm_-rf_windows · 03-07-2012, 07:09 AM

Hi Romagnolo,

Thanks for the reply, but your last code snippet doesn't work on my end!

From Romagna, eh? I'd like to be in Italy right now.

See you later.

rm_-rf_windows · 03-07-2012, 07:11 AM

Incidentally, thanks to everybody who replied to this post. I'll try your suggestions too.

rm_-rf_windows · 03-18-2012, 10:12 AM

A solution to this simple problem can be found at http://www.linuxquestions.org/questi...rrence-933087/ in message #3.

The solution uses the command line program "uniq".

Many thanks to all contributors to this and the linked thread.