[SOLVED] Select words with alphabetical-order character strings
$ echo airstrip | sed -rn 's/$/;abcdefghijklmnopqrstuvwxyz/; /(.{3}).*;.*\1/{s/;.*//;p}'
airstrip
Lengthy, but also works.
Algorithm is as follows:
First, we append a semicolon and the alphabet to the string.
If the parts to the left and right of the semicolon share a common 3-character substring, we remove the semicolon and the alphabet and print the word.
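The two steps above can be tried on a pair of words. This is only a sketch: the function name has_run is made up for illustration, and it assumes a sed that accepts -r with back-references (GNU sed).

```shell
# Hypothetical helper: print the word only if it shares a 3-character
# substring with the a-to-z alphabet. Assumes GNU sed (-r, back-references).
has_run() {
    printf '%s\n' "$1" |
        sed -rn 's/$/;abcdefghijklmnopqrstuvwxyz/; /(.{3}).*;.*\1/{s/;.*//;p}'
}

has_run airstrip   # "rst" occurs on both sides of the semicolon: printed
has_run banana     # no shared 3-character substring: nothing printed
```

The captured group has to lie entirely to the left of the semicolon, because the pattern still requires a semicolon after it. This also means the input words must not contain a semicolon themselves.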
Thank you, lucmove and firstfire, for your suggestions.
Lucmove, your code has the advantage of running slightly faster.
Firstfire, your code has the advantage of convenience if the user wishes to specify the alphabet as a parameter, as shown.
Code:
# Find words containing three consecutive alphabetical-order letters.
# Method of LQ member firstfire.
# In this version the alphabet is a parameter.
AL='abcdefghijklmnopqrstuvwxyz'
< "$WrdLst" \
sed -rn 's/$/;'"$AL"'/; /(.{3}).*;.*\1/{s/;.*//;p}' \
> "$Work15"
This could be significant in an application where the "alphabet" is some character string other than the standard a-to-z alphabet, and that "alphabet" is created and manipulated under program control.
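As a rough illustration of a non-standard "alphabet", here is a sketch that uses the top row of a QWERTY keyboard. The function name find_runs and the sample words are made up for this example; GNU sed is assumed. The alphabet is pasted into the sed script unescaped, so it must not contain characters that are special in a sed replacement (/, & or \) or a semicolon.

```shell
# Hypothetical helper: find_runs ALPHABET
# Reads words on stdin and prints those that share a 3-character
# substring with ALPHABET. Assumes GNU sed; ALPHABET is not escaped.
find_runs() {
    sed -rn 's/$/;'"$1"'/; /(.{3}).*;.*\1/{s/;.*//;p}'
}

# "power" contains "wer" and "property" contains "ert"; "banana" has no match.
printf '%s\n' power banana property | find_runs 'qwertyuiop'
```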
Essentially I used the same approach as lucmove: search for any of the consecutive 3-letter substrings of the alphabet. I used grep instead of sed because it's a bit faster.
The grep command would be
Code:
grep -F 'abc
bcd
cde
...
xyz'
But instead of writing out the sequences by hand, I used awk to generate them.
The -F means the pattern is a list of fixed strings (instead of regular expressions), which I thought would be faster, but I just checked now and it turns out using a regular expression is faster still!
Code:
# Here is grep with awk that does the equivalent of
# grep -E 'abc|bcd|cde|...|xyz'
< "$WrdLst" \
grep -E "$(awk -v AL="$AL" 'BEGIN{for(i=0;;){printf("%s",substr(AL, i+1,3)); if(++i<=length(AL)-3)printf("|");else break}}')" \
> "$Work15"
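For anyone who wants to time the -F variant against the -E one, the fixed-string list can be generated the same way. This is a sketch only: the function name trigrams and the sample words are made up, and it relies on grep treating each line of the pattern argument as a separate pattern.

```shell
AL='abcdefghijklmnopqrstuvwxyz'

# Hypothetical helper: print every 3-character substring of the given
# alphabet, one per line (abc, bcd, ..., xyz for the a-to-z alphabet).
trigrams() {
    awk -v AL="$1" 'BEGIN { for (i = 1; i <= length(AL) - 2; i++) print substr(AL, i, 3) }'
}

# With -F, each line of the pattern argument is a separate fixed string.
printf '%s\n' airstrip banana define | grep -F "$(trigrams "$AL")"
```

Here "airstrip" matches on "rst" and "define" on "def". Timing the two variants with time on the same word list is left to the reader, since the result may depend on the grep version and the size of the list.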