Have: a file of character strings, one string per line.
Want: a way to eliminate repeated letter pairs.
Example: AARDVARK has a repeated letter pair (AR).
AARDVARK should be transformed to ADVK.
This grep ...
Code:
grep -e '\(..\).*\1' < "$InFile" > "$Work1"
... identifies strings which have repeated letter pairs. They are candidates for the second step, elimination of repeated letter pairs. How may this be done?
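As a quick check of what that grep selects, here is a small sketch run against a few sample words. The helper name find_repeated_pairs and the sample words are made up for illustration. Note that `..` matches any two characters, not only letters.

```shell
# Hypothetical helper: print only the lines that contain a repeated
# two-character sequence (same pattern as the grep above)
find_repeated_pairs() {
    grep -e '\(..\).*\1'
}

# Made-up sample input, one string per line
printf '%s\n' AARDVARK LINUX BANANA | find_repeated_pairs
```

With this input, AARDVARK (AR ... AR) and BANANA (AN ... AN) are printed, and LINUX is dropped.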
#!/bin/bash
# word to examine (assumed to contain letters only)
var="AARDVARK"
# length of the word
var_len=${#var}
# results, one entry per repeated substring
a=()
# try every substring of length 2 .. var_len-1 and count its occurrences in the word
for ((subt=1; subt<=var_len-2; subt++)); do
    subt1=$((subt+1))
    for ((counter=0; counter<var_len-subt; counter++)); do
        var1=${var:counter:subt1}
        num=$(grep -o "$var1" <<< "$var" | wc -l)
        if (( num > 1 )); then
            # keep the word with every copy of the repeated substring removed
            a+=("$(sed "s/$var1//g" <<< "$var")")
        fi
    done
done
# print the unique results
printf '%s\n' "${a[@]}" | sort -u
@grail: you are right. It depends on the rules chosen for removing those pairs. For example, in the word "PAARKEAAR", the following repeated combinations occur (not restricted to pairs only):
Quote:
AA
AAR
AR
Which combination is removed first depends on the rules chosen. One rule could be to remove the longest combination first; another could be to remove the first combination encountered. It is up to the author to decide on the rules.
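To illustrate the combinations quoted above, here is a minimal sketch that lists every substring of two or more letters that occurs again later in the word. The function name repeated_substrings is made up for this example, and the word is assumed to contain letters only.

```shell
# Hypothetical helper: list each substring (length >= 2) of a word that
# occurs again, without overlap, further to the right in the same word
repeated_substrings() {
    local word=$1 len start sub
    for ((len = 2; len < ${#word}; len++)); do
        for ((start = 0; start + len <= ${#word}; start++)); do
            sub=${word:start:len}
            # look for another copy in the text after this substring
            [[ ${word:start+len} == *"$sub"* ]] && printf '%s\n' "$sub"
        done
    done | sort -u
}

repeated_substrings PAARKEAAR
```

For PAARKEAAR this prints AA, AAR and AR, one per line. Deciding which of them to remove, and in what order, is the rule that still has to be chosen.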
The code that I submitted earlier generates a separate word for each combination, removing one particular combination at a time. Hence, for the word "PAARKEAAR", the code prints three results:
Quote:
1) PAKEA (after substituting AR with "")
2) PKE (after substituting AAR with "")
and
3) PRKER (after substituting AA with "")
That being said, the code can easily be modified to handle pairs only; rules can then be integrated so that it substitutes all the pairs and produces a single word as the final answer.
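One possible way to do that is sketched below, under an assumed rule: pairs only, scanned left to right in the original word, and every pair that occurs again later is removed from the running result. The function name remove_repeated_pairs is hypothetical, and the word is assumed to contain letters only.

```shell
# Hypothetical helper: remove all repeated letter pairs from one word and
# print a single result (rule assumed: first pair encountered is handled first)
remove_repeated_pairs() {
    local word=$1 result=$1 i pair rest
    for ((i = 0; i < ${#word} - 1; i++)); do
        pair=${word:i:2}
        rest=${word:i+2}
        # the pair counts as repeated if it occurs again later in the original word
        if [[ $rest == *"$pair"* ]]; then
            # delete every remaining copy of the pair from the result
            result=${result//"$pair"/}
        fi
    done
    printf '%s\n' "$result"
}

remove_repeated_pairs AARDVARK
remove_repeated_pairs PAARKEAAR
```

This prints ADVK and PRKER. In PAARKEAAR the pair AA is met before AR, so AA is removed first and no copy of AR is left afterwards; a different rule would give a different word.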
If one prefers 'ABAAABBB' -> '', then the % symbol (and the last substitution command) should be removed.
The p(rint) command is only there to illustrate how the script works.
#1 has a loop which identifies the first instance of a repeated letter pair, prefixes it with a colon, and replaces the second instance with a percent sign. An intermediate result is printed, the operation is tested, and based on the test, another iteration of the loop labeled "a" is performed.
#2 eliminates all colons and the following two characters (i.e. the first instance of the repeated letter pair).
#3 eliminates all percent signs which are "ghosts" of the second instance of the letter pair.
Step #1 clearly does all the "heavy lifting". Steps #2 and #3 are cosmetic clean-ups.
I can see what step #1 does but am not clear on how it works.
Please review, correct, and expand this narrative.
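The sed script being discussed is not reproduced in this thread, so what follows is not that script. It is a simplified sketch of the same loop idea for GNU sed, with one deliberate change: instead of marking the first copy with a colon and deleting it in a separate step, it replaces both copies with "%" placeholders, so steps #1 and #2 collapse into one substitution. The function name dedupe_pairs and the regular expression are assumptions. The loop itself works like this: `:a` defines a label, `s` attempts one substitution, and `t a` jumps back to the label only if that substitution changed the line, so the loop ends as soon as no repeated pair is left.

```shell
# Hypothetical helper: read words on stdin, print them with repeated
# letter pairs removed (simplified variant, assumes GNU sed)
dedupe_pairs() {
    sed -E '
        :a
        # Find the leftmost letter pair that occurs again later on the line
        # and replace both copies with "%". The placeholders keep the letters
        # on either side of a removed pair from joining into a new pair.
        s/([[:alpha:]]{2})(.*)\1/%\2%/
        # Branch back to label a if the substitution above changed the line
        ta
        # Clean-up: delete the placeholders
        s/%//g
    '
}

printf '%s\n' AARDVARK | dedupe_pairs
```

This prints ADVK. Each pass removes two copies of a pair, so a pair that occurs three times keeps one copy; whether that is wanted is again a matter of the rules chosen.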