Using sed to substitute repetition -> single occurrence
Hi,
I want to use sed to substitute an occurrence of "a a a a a" (a, space, a, space, a, space, a, space, a ... any number of a's like this) with "a". I just can't figure it out. I've googled it... Thanks in advance.
PS: I think it'd be cool to have a special forum section for command line stuff... I wasn't sure where to post this.
Not sure if that'll suit your needs. You'd have to be more specific in your example. Ideally provide some sample input.
Code:
sed 's/\(\<a\>\).*/\1/' infile
How about this:
Code:
$ echo 'a a a a a hab' | sed -r 's/(a *)*/a/'
PS: The command line stuff is usually fine in Linux General, or in Programming if the problem is a bit more complex.
There are lots of things sed can do, but I don't know that this problem is best suited for it. Though, some sed guru may come along with a quick one liner.
That said, my immediate thought for this problem would be to use tr and uniq. If the line of text is in a shell variable, for instance, substitute newlines for the spaces, send the result through uniq, and then substitute spaces for the newlines. The result should be a line where any run of identical consecutive words (blocks of non-space characters) is reduced to a single occurrence.
Some input "sanitizing" may be needed first (e.g. converting multiple spaces to a single space). Also, this approach would not properly handle a repeated sequence that spans multiple lines. Then again, neither would sed without some additional complication.
EDIT: Oh... yeah, I'm looking at this from the more general perspective, where you do not know beforehand the exact text that will be repeated.
EDIT2: Since my response feels naked without an example:
Code:
user@localhost$ echo "they practically practically sell themselves themselves themselves" | \
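The example above is cut off after the first pipe. A minimal sketch of the tr/uniq approach described in this post might look like the following. The function name collapse_repeats is made up for illustration, and paste is swapped in for the final tr step so that the trailing newline does not turn into a trailing space:

```shell
# Hypothetical helper: collapse runs of identical adjacent words read from stdin.
collapse_repeats() {
    # 1. tr -s turns each space into a newline and squeezes repeated newlines,
    #    so every word ends up on its own line.
    # 2. uniq drops adjacent duplicate lines.
    # 3. paste -s joins the remaining lines back together with single spaces.
    tr -s ' ' '\n' | uniq | paste -sd ' ' -
}

echo "they practically practically sell themselves themselves themselves" | collapse_repeats
```

The function treats its whole input as one stream of words and prints the result on a single line, so it is best fed one line at a time.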
I think you are looking exactly for this:
Code:
sed -r 's/(a )+(a| )?/a/g' your_file >tmp && mv tmp your_file
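One thing to watch with the pattern above: it also consumes the space after the last "a", so the earlier test input 'a a a a a hab' comes out as 'ahab'. If that matters, a sketch using the GNU word-boundary anchors \< and \> (the same ones used in the first reply) could look like this. GNU sed is assumed, and collapse_a is a made-up name:

```shell
# Hypothetical helper: collapse "a a a ..." into a single "a", matching only
# standalone a's. Assumes GNU sed (-E, \< and \> are used here).
collapse_a() {
    # \<a      a standalone "a" at the start of a word
    # ( a)+    one or more further " a" repetitions
    # \>       the last "a" must end a word, so "a ab" is left alone
    sed -E 's/\<a( a)+\>/a/g'
}

echo 'a a a a a hab' | collapse_a
echo 'banana a a' | collapse_a
```

The space that follows the run is kept, because only the repetitions themselves are matched.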
Excellent, Romagnolo, thank you very much!
It's exactly what I wanted (last post, the one before this). Is it possible to make this more generic, so that any repetition of a character followed by a space is replaced by that single character? God, I spent 4 hours trying to figure it out. Are you Italian? Bye.
This works like a charm:
Code:
sed -r 's/(([[:graph:]]) )+(\2){1}/\2/g' your_file >tmp && mv tmp your_file
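For the generic case asked about above, a back-reference can tie each repetition to the first character matched. This is only a minimal sketch, assuming GNU sed and input made of single characters separated by single spaces. The name collapse_chars is made up for illustration:

```shell
# Hypothetical helper: collapse "x x x ..." into "x" for any printable,
# non-space character x. Assumes GNU sed.
collapse_chars() {
    # ([[:graph:]])   capture one printable, non-space character as \1
    # ( \1)+          one or more repetitions of " " followed by that same character
    sed -E 's/([[:graph:]])( \1)+/\1/g'
}

echo 'a a a b b c' | collapse_chars
```

Because the pattern has no word-boundary check, the last letter of a longer word followed by the same letter on its own (for example 'ab b') would be collapsed as well.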
Hi Romagnolo,
Thanks for the reply, but your last code snippet doesn't work on my end! Romagnolo, eh? I'd like to be in Italy right now. See you later.
Incidentally, thanks to everybody who replied to this post. I'll try your suggestions too.
A solution to this simple problem has been found at http://www.linuxquestions.org/questi...rrence-933087/ at message #3.
The solution uses the command line program "uniq". Many thanks to all contributors to this and the linked thread.