LinuxQuestions.org

rm_-rf_windows 03-06-2012 05:00 PM

Using sed to substitute repetition -> single occurrence
 
Hi,

I want to use sed to substitute an occurrence of "a a a a a" (a,space,a,space,a,space,a,space,a ... any number of a's like this) with "a".

I just can't figure it out. I've googled it...

Thanks in advance.

PS. - I think it'd be cool to have a special forum section for command line stuff... I wasn't sure where to post this.

sycamorex 03-06-2012 05:19 PM

Not sure if that'll suit your needs. You'd have to be more specific in your example. Ideally provide some sample input.

Code:

sed 's/\(\<a\>\).*/\1/' infile
EDIT: My example is not good for you. I can think of a few situations where it'll fail.
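One way to see a failure case: the `.*` in that pattern runs to the end of the line, so everything after the first standalone "a" is dropped, not just the repeated a's. A minimal sketch, assuming GNU sed (`\<` and `\>` are GNU extensions); the function name `first_try` is made up for this illustration:

```shell
# Hypothetical wrapper around the pattern from the post above.
# Assumes GNU sed for the \< and \> word-boundary operators.
first_try() {
  sed 's/\(\<a\>\).*/\1/'
}

# The trailing words "y z" are lost along with the repeated a's.
echo 'x a a a y z' | first_try    # prints: x a
```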

crts 03-06-2012 05:25 PM

Hi,

how about this:
Code:

$ echo 'a a a a a hab' |sed -r 's/(a *)*/a/'
ahab
$ echo 'a a a a ahab' |sed -r 's/(a *)*/a/'
ahab

Is this what you mean? I have to agree with sycamorex that your example is a bit vague.
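One caveat worth checking before using this: `(a *)*` can match the empty string, so on a line that does not begin with "a" the substitution inserts an "a" at the start instead of collapsing anything. The sketch below compares it with a variant that uses `+` so at least one "a" must be present. It assumes GNU sed, as in the post; both function names are hypothetical.

```shell
# Assumes GNU sed (-r enables extended regular expressions).
# Both function names are made up for this comparison.
star_version() { sed -r 's/(a *)*/a/'; }   # pattern from the post
plus_version() { sed -r 's/(a *)+/a/'; }   # requires at least one "a"

echo 'x a a a' | star_version         # prints: ax a a a
echo 'x a a a' | plus_version         # prints: x a
echo 'a a a a a hab' | plus_version   # prints: ahab
```

Like the original, the `+` variant still swallows the spaces that follow the last "a" and only replaces the first run on each line.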

PS: The command line stuff is usually fine in Linux General or in Programming if the problem is a bit more complex.

Dark_Helmet 03-06-2012 05:31 PM

There are lots of things sed can do, but I'm not sure this problem is best suited to it, though some sed guru may come along with a quick one-liner.

That said, my immediate thought for this problem would use tr and uniq. If the line of text is in a shell variable for instance, substitute newlines for the spaces, then send the result through uniq, and then substitute spaces for the newlines.

The result should be a line where any repeated block of non-space characters is reduced to a single occurrence.

Some input "sanitizing" may be needed first (e.g. converting multiple spaces to a single space). Also, this approach would not properly handle a repeated sequence that spans multiple lines. Then again, neither would sed without some additional complication.

EDIT:
Oh... yeah, I'm looking at this from the more general perspective that you do not know the exact text that will be repeated beforehand.

EDIT2:
Since my response feels naked without an example:
Code:

user@localhost$ echo "they practically practically sell themselves themselves themselves" | \
tr ' ' '\n' | \
uniq | \
tr '\n' ' ' | \
sed 's@ $@\n@'
they practically sell themselves
user@localhost$
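The "sanitizing" step mentioned above can be folded into the same pipeline. This is a sketch rather than a drop-in replacement: `tr -s` squeezes runs of spaces while translating them, and `paste -s` rejoins the lines so no trailing-space fix-up is needed. The function name `dedupe_words` is an assumption for this example.

```shell
# Hypothetical helper: collapse adjacent repeated words on one line of input.
dedupe_words() {
  # 1. turn each run of spaces into a single newline (one word per line)
  # 2. drop adjacent duplicate lines
  # 3. join the remaining lines back together with single spaces
  tr -s ' ' '\n' | uniq | paste -s -d ' ' -
}

echo "they practically  practically sell themselves themselves   themselves" | dedupe_words
# prints: they practically sell themselves
```

As with the original pipeline, this treats its whole input as one stream of words, so multi-line input comes out joined into a single line.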


romagnolo 03-06-2012 10:40 PM

I think this is exactly what you are looking for:
Code:

sed -r 's/(a )+(a| )?/a/g' your_file >tmp; mv tmp your_file
If your_file contains this:
Quote:

I'm practica a a a ally selling myself.
the command will produce:
Code:

$ sed -r 's/(a )+(a| )?/a/g' your_file >tmp; mv tmp your_file
$ cat your_file
I'm practically selling myself.
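A caveat to keep in mind: because `(a )+` is satisfied by a single "a" followed by a space, this pattern also rewrites ordinary words that end in "a", removing the space after them. The sketch below compares it with a variant that only fires when at least two a's are separated by spaces. It assumes GNU sed, and both function names are hypothetical.

```shell
# Assumes GNU sed (-r enables extended regular expressions).
# Both function names are made up for this comparison.
posted_version() { sed -r 's/(a )+(a| )?/a/g'; }   # pattern from the post
at_least_two()   { sed -r 's/a( a)+/a/g'; }        # needs "a a" or longer

echo 'banana split' | posted_version   # prints: bananasplit
echo 'banana split' | at_least_two     # prints: banana split
echo "I'm practica a a a ally selling myself." | at_least_two
# prints: I'm practically selling myself.
```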

For reference, the only True manual of sed is the one written by Lee E. McMahon in 1978.

rm_-rf_windows 03-07-2012 05:15 AM

Excellent, Romagnolo, thanks a lot!

It's exactly what I wanted (your last post, the one before this). Is it possible to make this more generic, so that a repetition of any character followed by a space is replaced by that single character?

God, I spent 4 hours trying to figure it out.

Are you Italian?

Bye.

romagnolo 03-07-2012 06:26 AM

This works like a charm:
Code:

sed -r 's/(([[:graph:]]) )+(\2){1}/\2/g' your_file >tmp; mv tmp your_file
Quote:

Originally Posted by rm_-rf_windows (Post 4620670)
Are you Italian?

Romagnolo!
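A possible weak spot in the generic pattern above is that the character is captured inside the repeated group, so `\2` refers to whatever the last repetition captured and the earlier repetitions are free to match different characters. A variant that captures the character once and repeats only the " same character" part avoids that. This is a sketch assuming GNU sed (back-references in extended regexes are an extension), and the function name `collapse_repeats` is made up here.

```shell
# Hypothetical helper: replace "c c c" (any graphic character c repeated
# with single spaces in between) by a single c.
# Assumes GNU sed; \1 inside an extended regex is a GNU/BSD extension.
collapse_repeats() {
  sed -r 's/([[:graph:]])( \1)+/\1/g'
}

echo 'a a a b b c' | collapse_repeats
# prints: a b c
echo "I'm practica a a a ally selling myself." | collapse_repeats
# prints: I'm practically selling myself.
```

Note that this also merges neighbouring words when one ends and the next begins with the same character, which follows from the "any character + space" rule as stated in the question.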

rm_-rf_windows 03-07-2012 07:09 AM

Hi Romagnolo,

Thanks for the reply, but your last code snippet doesn't work on my end!

Romagnolo, eh? I'd like to be in Italy now.

See you later.

rm_-rf_windows 03-07-2012 07:11 AM

Incidentally, thanks to everybody who replied to this post. I'll try your suggestions too.

rm_-rf_windows 03-18-2012 10:12 AM

A solution to this simple problem has been found at http://www.linuxquestions.org/questi...rrence-933087/ at message #3.

The solution uses the command line program "uniq".

Many thanks to all contributors to this and the linked thread.


All times are GMT -5.