LinuxQuestions.org
08-27-2015, 10:01 AM #1
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Reputation: 660
Squeeze out repeated characters


This post pertains to a learning exercise. Just for "funsies."

Have: a file with one word per line.
Example:
Code:
success
failure
Want: the same file with repeats of the same character "squeezed out."
Example:
Code:
suces
failure
This may be done with tr ...
Code:
tr -s "[a-z]" <$InFile >$OutFile
... or with sed ...
Code:
sed 's/\(.\)\1/\1/g' <$InFile >$OutFile
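The two commands above are not quite equivalent, which is worth checking before relying on either. tr -s squeezes runs of any length, but only for characters in the set it is given; the sed pattern \(.\)\1 consumes exactly two characters, so a run of three shrinks to two rather than one. A sketch comparing them, plus a sed variant (an addition, not from the original post) that appends \1* so the whole run collapses. LC_ALL=C is an assumption added so that a-z means plain ASCII lowercase.

```shell
#!/bin/sh
# tr -s: runs of any length collapse, but only for characters in the set.
printf 'aaa\nBOOKKEEPER\n' | LC_ALL=C tr -s 'a-z'
# a
# BOOKKEEPER   (uppercase is not in the set, so it is left alone)

# Pair pattern: each doubled character becomes one, so "aaa" only becomes "aa".
printf 'aaa\nBOOKKEEPER\n' | sed 's/\(.\)\1/\1/g'
# aa
# BOKEPER

# Variant: a character, one repeat, then any further repeats -> one copy.
# POSIX BRE allows * after a back-reference.
printf 'aaa\nBOOKKEEPER\n' | sed 's/\(.\)\1\1*/\1/g'
# a
# BOKEPER
```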
I tried to perform the same "squeeze" with awk and gsub but could not get the syntax right. Please advise.

Daniel B. Martin
 
08-27-2015, 11:09 AM #2
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005
Reputation: 3191
gsub does not allow back-referencing, so you can either try gensub (which does) or set FS to null (so each character becomes its own field) and loop over each word, removing repetition.
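A minimal sketch of the second suggestion, for anyone who wants to see it spelled out. Assumptions: setting FS to the empty string so that each character becomes its own field is a gawk extension (mawk also accepts it, but POSIX awk leaves it unspecified), and squeeze is just an illustrative function name.

```shell
#!/bin/sh
# Hypothetical helper: squeeze repeated characters using awk fields, no back-references.
squeeze() {
  awk 'BEGIN { FS = "" }   # empty FS: one field per character (gawk extension)
  {
    out = ""
    for (i = 1; i <= NF; i++) {
      # keep a character only if it differs from the one before it
      if (i == 1 || $i != $(i - 1))
        out = out $i
    }
    print out
  }' "$@"
}

printf 'success\nfailure\n' | squeeze
# suces
# failure
```

Comparing each field with the previous one is what stands in for the back-reference: a character is kept only when it starts a new run, so runs of any length collapse.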
 
1 member found this post helpful.
08-27-2015, 01:57 PM #3
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Original Poster
Reputation: 660
Quote:
Originally Posted by grail
... try gensub ...
This sed works ...
Code:
sed 's/\(.\)\1/\1/g' $InFile >$OutFile
... so I "borrowed" the RegEx for use with gensub ...
Code:
gawk '{$0=gensub(/\(.\)\1/,"\\1","g"); print $0}' $InFile >$OutFile
... but OutFile comes out identical to InFile. It behaves as if the RegEx never matches.

I thought this variation ...
Code:
gawk '{$0=gensub(/\(.\)\1/,"","g"); print $0}' $InFile >$OutFile
... would remove both letter pairs, changing success to sue but it doesn't.
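For comparison, sed does accept a back-reference inside the pattern, so the "delete both letters of a pair" idea behaves as expected there; a quick check:

```shell
#!/bin/sh
# Replace each doubled character with nothing: "cc" and "ss" vanish from "success".
printf 'success\n' | sed 's/\(.\)\1//g'
# sue
```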

Daniel B. Martin
 
08-27-2015, 03:44 PM #4
ntubski
Senior Member
Registered: Nov 2005
Distribution: Debian, Arch
Posts: 3,780
Reputation: 2081
gawk doesn't use backslashes before grouping parens (in gawk, \( matches a literal parenthesis). Note, though, that while gensub supports referencing captures in the replacement, it still doesn't support back-references in the pattern, so you can't really solve this nicely. For example, the following collapses doubled c and s, but not other letters:

Code:
gawk '{ print(gensub(/(c)c|(s)s/, "\\1\\2", "g")) }'
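The alternation trick above can be pushed further if the set of characters is known in advance: plain gsub then works with no back-reference at all, one letter at a time. A sketch, assuming the words contain only lowercase ASCII letters; alphabet and squeeze_letters are illustrative names, and a character that is a regex metacharacter (such as . or *) would need escaping before being used this way.

```shell
#!/bin/sh
# Hypothetical helper: squeeze runs of each letter with plain gsub, no back-references.
squeeze_letters() {
  awk 'BEGIN { alphabet = "abcdefghijklmnopqrstuvwxyz" }   # assumed input alphabet
  {
    for (i = 1; i <= length(alphabet); i++) {
      c = substr(alphabet, i, 1)
      gsub(c c "+", c)   # dynamic regex such as "ss+": a run of c becomes one c
    }
    print
  }' "$@"
}

printf 'success\nfailure\nbookkeeper\n' | squeeze_letters
# suces
# failure
# bokeper
```

Because the regex c c "+" matches a run of two or more, runs longer than a pair collapse too, unlike the (c)c pattern.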
 
1 member found this post helpful.
08-28-2015, 04:30 AM #5
grail
LQ Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,005
Reputation: 3191
My bad there. I was only thinking of which function does referencing, not where it is applied. ntubski is on the money.

You will need to stick with my second option (setting FS to null and looping over the characters).
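The character-by-character loop can also be written without relying on an empty FS, using only length and substr, which POSIX awk specifies. A sketch under that portability assumption; squeeze_portable is an illustrative name.

```shell
#!/bin/sh
# Hypothetical helper: portable character loop using substr instead of FS="".
squeeze_portable() {
  awk '{
    out = ""; prev = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (c != prev) {   # start of a new run: keep this character
        out = out c
      }
      prev = c
    }
    print out
  }' "$@"
}

printf 'success\naaab\n' | squeeze_portable
# suces
# ab
```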

You could of course try Perl or Ruby as alternatives.

Last edited by grail; 08-28-2015 at 04:33 AM.
 
1 member found this post helpful.
08-30-2015, 11:09 AM #6
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881
Original Poster
Reputation: 660
The Original Post asked for a way to perform the "squeeze" with awk and gsub. The best minds on this forum say it's not possible. That makes the question resolved. Not truly solved, but resolved. Thanks to all.

Daniel B. Martin
 
  



Tags
awk, gsub, text processing


All times are GMT -5.