Squeeze out repeated characters
This post pertains to a learning exercise. Just for "funsies."
Have: a file with one word per line. Example:
Code:
success
Example:
Code:
suces
Code:
tr -s "[a-z]" <$InFile >$OutFile
Code:
sed 's/\(.\)\1/\1/g' <$InFile >$OutFile
Daniel B. Martin |
gsub does not allow back referencing, so you can either try gensub (which does) or set FS to the empty string and loop over the characters of each word, removing repetition.
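The loop idea can be sketched roughly as follows. This is a minimal illustration, not code from the thread: the function name squeeze_loop is hypothetical, and it walks each line with substr() instead of relying on an empty FS, on the assumption that this is less dependent on the particular awk implementation.

```shell
# Hypothetical helper: squeeze runs of a repeated character, one word per line.
# Reads the files given as arguments, or stdin if none are given.
squeeze_loop() {
  awk '{
    out = ""; prev = ""
    for (i = 1; i <= length($0); i++) {
      ch = substr($0, i, 1)
      # keep a character only if it differs from the one before it
      if (i == 1 || ch != prev) out = out ch
      prev = ch
    }
    print out
  }' "$@"
}

printf 'success\nbookkeeper\n' | squeeze_loop
# prints:
# suces
# bokeper
```

With the variables from the first post it could be called as squeeze_loop "$InFile" >"$OutFile". Unlike the tr command, this sketch squeezes any repeated character, not only a-z.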
|
Quote:
Code:
sed 's/\(.\)\1/\1/g' $InFile >$OutFile
Code:
gawk '{$0=gensub(/\(.\)\1/,"\\1","g"); print $0}' $InFile >$OutFile
I thought this variation ...
Code:
gawk '{$0=gensub(/\(.\)\1/,"","g"); print $0}' $InFile >$OutFile
Daniel B. Martin |
gawk doesn't use backslashes before grouping parentheses. Also note that although gensub supports references to captured groups in the replacement, it still doesn't support backreferences in the pattern, so you can't really solve this nicely. For example, the following squeezes doubled c and s, but no other letters:
Code:
gawk '{ print(gensub(/(c)c|(s)s/, "\\1\\2", "g")) }' |
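The per-letter idea above might be extended with a loop that builds one regex per letter, so that no backreference is needed in the pattern. This is only a sketch under stated assumptions: the function name squeeze_letters is hypothetical, and it covers lowercase ASCII letters only, mirroring the [a-z] range of the tr command.

```shell
# Hypothetical helper: squeeze repeated lowercase letters using plain gsub.
# Reads the files given as arguments, or stdin if none are given.
squeeze_letters() {
  awk '{
    letters = "abcdefghijklmnopqrstuvwxyz"
    for (i = 1; i <= length(letters); i++) {
      c = substr(letters, i, 1)
      # dynamic regex such as "ss+": the letter followed by one or more copies
      gsub(c c "+", c)
    }
    print
  }' "$@"
}

printf 'success\naaabbb\n' | squeeze_letters
# prints:
# suces
# ab
```

Each line gets 26 gsub passes, which is wasteful but keeps the pattern free of backreferences. Characters outside a-z are left untouched.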
My bad there. I was thinking only of what supports back referencing, not of where the reference was being applied. ntubski is on the money :)
You will need to stick with my second option :) You could of course try Perl or Ruby as alternatives :) |
The Original Post asked for a way to perform the "squeeze" with awk and gsub. The best minds on this forum say it's not possible. That makes the question resolved. Not truly solved, but resolved. Thanks to all.
Daniel B. Martin |