LinuxQuestions.org


danielbmartin 08-27-2015 10:01 AM

Squeeze out repeated characters
 
This post pertains to a learning exercise. Just for "funsies."

Have: a file with one word per line.
Example:
Code:

success
failure

Want: the same file with repeats of same character "squeezed out."
Example:
Code:

suces
failure

This may be done with tr ...
Code:

tr -s "[a-z]" <"$InFile" >"$OutFile"
... or with sed ...
Code:

sed 's/\(.\)\1/\1/g' <"$InFile" >"$OutFile"
I tried to perform the same "squeeze" with awk and gsub but could not get the syntax right. Please advise.

Daniel B. Martin
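
A side note on the sed command above: as written, it collapses each adjacent pair into one character, so a run of three or more identical characters is shortened but not fully squeezed. The sketch below is one possible variant, not the original poster's code. It adds a star after the back-reference so that the whole run is consumed. The function name squeeze_sed and the sample words are made up for illustration.

```shell
# Squeeze runs of any length, not just pairs.
# \(.\) captures one character; \1* then consumes every immediate repeat of it.
# squeeze_sed is a hypothetical helper name; it filters stdin to stdout.
squeeze_sed() {
    sed 's/\(.\)\1*/\1/g'
}

# Sample input: two words from the thread plus two invented test words.
printf 'success\nfailure\nbookkeeper\naaab\n' | squeeze_sed
```

Unlike the tr command, which only squeezes the characters named in its set, this sed variant squeezes any repeated character, including digits and punctuation.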

grail 08-27-2015 11:09 AM

gsub does not allow back-references, so you can either try gensub (which does) or set FS to the null string and loop over the characters of each word, removing repetition.

danielbmartin 08-27-2015 01:57 PM

Quote:

Originally Posted by grail (Post 5411942)
... try gensub ...

This sed works ...
Code:

sed 's/\(.\)\1/\1/g' "$InFile" >"$OutFile"
... so I "borrowed" the RegEx for use with gensub ...
Code:

gawk '{$0=gensub(/\(.\)\1/,"\\1","g"); print $0}' "$InFile" >"$OutFile"
... but this doesn't change the InFile at all. It behaves as if the RegEx never matches.

I thought this variation ...
Code:

gawk '{$0=gensub(/\(.\)\1/,"","g"); print $0}' "$InFile" >"$OutFile"
... would remove both letter pairs, changing success to sue but it doesn't.

Daniel B. Martin

ntubski 08-27-2015 03:44 PM

gawk uses extended regular expressions, so grouping parens are written without backslashes; a backslashed paren matches a literal parenthesis character. Also note that gensub supports references to captures in the replacement, but it still doesn't support backreferences in the pattern, so you can't really solve this nicely. For example, the following squeezes doubled c and s, but not other letters:

Code:

gawk '{ print(gensub(/(c)c|(s)s/, "\\1\\2", "g")) }'
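
The same idea can be stretched to every letter by building the pattern in a loop rather than spelling out each alternative. What follows is only a sketch under a loud assumption: the words contain nothing but lowercase ASCII letters, as in the examples in this thread. It uses plain gsub with a dynamic regex, one letter at a time, so no back-reference is needed. The function name squeeze_gsub is hypothetical.

```shell
# Squeeze repeated lowercase letters using gsub only (no back-references).
# ASSUMPTION: input words consist of lowercase ASCII letters; any other
# repeated characters are left untouched.
# squeeze_gsub is a hypothetical helper name; it reads files or stdin.
squeeze_gsub() {
    awk 'BEGIN { letters = "abcdefghijklmnopqrstuvwxyz" }
    {
        for (i = 1; i <= 26; i++) {
            ch = substr(letters, i, 1)
            # Pattern is e.g. "ss+": the letter followed by one or more repeats.
            gsub(ch ch "+", ch)
        }
        print
    }' "$@"
}

printf 'success\nfailure\n' | squeeze_gsub
```

On the two sample words this prints suces and failure. It makes 26 passes over each line, so it is a workaround rather than an elegant answer.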

grail 08-28-2015 04:30 AM

My bad there. I was just thinking about which function supports back-references and not about where they can be applied. ntubski is on the money :)

You will need to stick with my second option :)

You could of course try Perl or Ruby as alternatives :)
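
For reference, the looping approach might look something like the sketch below. It is not code posted by anyone in this thread. Instead of setting FS to the null string (which splits a line into characters in gawk but is not guaranteed by POSIX), it walks each line with substr, which should behave the same in any awk. The function name squeeze_awk is hypothetical.

```shell
# Squeeze repeated characters by comparing each character with the previous one.
# squeeze_awk is a hypothetical helper name; it reads files or stdin.
squeeze_awk() {
    awk '{
        out = ""
        prev = ""
        n = length($0)
        for (i = 1; i <= n; i++) {
            c = substr($0, i, 1)
            # Keep the character only when it differs from the one before it.
            if (i == 1 || c != prev)
                out = out c
            prev = c
        }
        print out
    }' "$@"
}

printf 'success\nfailure\n' | squeeze_awk
```

This squeezes any repeated character, not only letters, and handles runs of any length.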

danielbmartin 08-30-2015 11:09 AM

The Original Post asked for a way to perform the "squeeze" with awk and gsub. The best minds on this forum say it's not possible. That makes the question resolved. Not truly solved, but resolved. Thanks to all.

Daniel B. Martin


All times are GMT -5.