LinuxQuestions.org
Old 09-22-2015, 04:09 PM   #16
astrogeek
Moderator
 
Registered: Oct 2008
Distribution: Slackware [64]-X.{0|1|2|37|-current} ::12<=X<=15, FreeBSD_12{.0|.1}
Posts: 6,263
Blog Entries: 24

Rep: Reputation: 4194

Quote:
Originally Posted by millgates View Post
OK, let's try sed:

Code:
key=deront
sed -rn "h; s/.*/&#$key/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" <$InFile
Now that is just elegant simplicity!

I use sed every day and think I am proficient, but I had to pull the book off the shelf and dust off the brain cell to follow this - and it isn't even obscure!

Code:
sed -rn 
h;
s/.*/&#$key/;
:a 
   s/(.)(.*#.*)\1/\2/;
   ta;
/[^#]/!{g;p}
My compliments sir! Simple sed well applied!

Last edited by astrogeek; 09-22-2015 at 04:14 PM. Reason: Added indented code block
 
Old 09-22-2015, 06:11 PM   #17
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
I concocted this problem as a learning exercise, and used a dictionary file as the InFile. Now, having a closer look at it, I realize it identifies all anagrams of the Key Word which are English words.

I integrated the superb solution posted by millgates and am pleased with the brevity and speed. For those who might like to play with it, here is my program in its entirety.
Code:
#!/bin/bash
# Daniel B. Martin   Sep15

# To execute this program, launch a terminal session and enter:
# bash /home/daniel/Desktop/LQfiles/dbm1502.bin
#
# Find all anagrams of a user-specified Key Word which are English words.
 
# Keywords: anagram anagrams

# File identification
    Path=${0%%.*}
 OutFile=$Path"out.txt"

# This European Scrabble word list was downloaded from:
#   http://www.freescrabbledictionary.com/sowpods/download/sowpods.txt
WordList="/home/daniel/Desktop/LQfiles/sowpods.txt"

# Prompt for user input.
echo; echo -n "Enter a Key Word ==> "; read KW 
# For debugging convenience: the default value of KW is "lotipac".
if [ "$KW" == "" ]; then KW='lotipac'; fi

# Method of LQ member millgates.
grep "^$(tr "a-z" "." <<<$KW)$" $WordList  \
|sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" >$OutFile
echo "Anagrams of" $KW "are:"; cat $OutFile; echo "End Of File ("$(wc -l <$OutFile)" lines)"

echo; echo "Normal end of job."; echo; exit
Suggested improvements are welcome.

This is a sample execution ...
Code:
daniel@daniel-desktop:~$ bash /home/daniel/Desktop/LQfiles/dbm1502.bin

Enter a Key Word ==> lotipac
Anagrams of lotipac are:
capitol
coalpit
optical
topical
End Of File (4 lines)

Normal end of job.

daniel@daniel-desktop:~$
The original problem statement specified 6-character words. This implementation is flexible in that respect. Here is a sample execution with a 5-character Key Word.
Code:
daniel@daniel-desktop:~$ bash /home/daniel/Desktop/LQfiles/dbm1502.bin

Enter a Key Word ==> redoc
Anagrams of redoc are:
coder
cored
credo
decor
End Of File (4 lines)

Normal end of job.

daniel@daniel-desktop:~$
Thanks to all who contributed ideas and code.

Daniel B. Martin
 
Old 09-23-2015, 02:10 AM   #18
HMW
Member
 
Registered: Aug 2013
Location: Sweden
Distribution: Debian, Arch, Red Hat, CentOS
Posts: 773
Blog Entries: 3

Rep: Reputation: 369
Quote:
Originally Posted by millgates View Post
OK, let's try sed:

Code:
key=deront
sed -rn "h; s/.*/&#$key/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" <$InFile
I hereby join the choir of praise. My hat is off.

Although I use sed (almost) daily, this is completely incomprehensible in its brilliance, and perhaps also the reason why so many geeks have long and unruly beards.

Good one sir!
HMW
 
Old 09-23-2015, 04:08 AM   #19
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
brevity ... yes
speed ... not actually the fastest
Code:
Perl:

real	0m0.136s
user	0m0.133s
sys	0m0.000s

Sed:

real	0m1.276s
user	0m1.273s
sys	0m0.000s
Perl and others are still quicker

I will agree though, great solution
 
Old 09-24-2015, 10:12 AM   #20
millgates
Member
 
Registered: Feb 2009
Location: 192.168.x.x
Distribution: Slackware
Posts: 852

Rep: Reputation: 389
Quote:
Originally Posted by danielbmartin View Post
Please give us a step-by-step. Thanks!
You've probably figured it out by now, but anyway...
astrogeek nicely split the code into lines, so let's just add a few comments:

Code:
sed -rn 
h;                # store the original pattern in hold space; we will need it later
s/.*/&#$key/;     # append a # and the key to the pattern
:a                # start loop
   s/(.)(.*#.*)\1/\2/;  # find pairs of the same character that have a # between them,
                        # i.e. one is in the pattern and the other one is in the key
   ta;           # end loop when no match is found
/[^#]/!{g;p}     # at this point, if the string still contains anything other than a #,
                 # the characters in the two parts (the key and the pattern) did not
                 # all pair up. If only the # is left, copy the original pattern
                 # back from the hold space and print it.
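
To watch the loop at work, one can print the pattern space on every pass. What follows is only a sketch, not part of the original one-liner: `trace` is a hypothetical helper name, the `p` inside the loop is added purely for display, and GNU sed is assumed (as elsewhere in this thread). It also assumes the word and the key are plain letters.

```shell
#!/bin/bash
# Hypothetical helper: show the pattern space before each substitution attempt.
# Assumes GNU sed, and that neither argument contains '#' or regex-special characters.
trace() {
    local word=$1 key=$2
    printf '%s\n' "$word" | sed -rn "
        s/.*/&#$key/
        :a
        p
        s/(.)(.*#.*)\1/\2/
        ta
    "
}

trace rodent deront
# rodent#deront
# odent#deont
# dent#dent
# ent#ent
# nt#nt
# t#t
# #

trace rodeo deront   # not an anagram: the last line printed still has letters in it
```

Each pass removes one letter from the word together with its partner in the key. An anagram is a word for which only the lone # survives.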

Quote:
Originally Posted by danielbmartin
Code:
grep "^$(tr "a-z" "." <<<$KW)$" $WordList \
|sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" >$OutFile
I wonder whether the grep line is actually necessary.
 
Old 09-24-2015, 12:53 PM   #21
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by millgates View Post
I wonder whether the grep line is actually necessary.
It isn't necessary but having it makes execution time shorter. Much shorter.

Daniel B. Martin
 
Old 09-24-2015, 08:36 PM   #22
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
This post describes an exploration of performance enhancers.

Program dbm1503A is the excellent one-liner posted by millgates.

Program dbm1503B is the same sed preceded by a grep which eliminates all InFile lines which are not of the same length as the Key Word.
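
For readers puzzling over that grep, here is roughly what its command substitution expands to. This is a sketch only: the variable name `pattern` and the four sample words are mine, and it relies on GNU tr padding the second set with its last character.

```shell
#!/bin/bash
KW='lotipac'

# tr turns every lowercase letter of the Key Word into a dot ...
pattern="^$(tr 'a-z' '.' <<<"$KW")\$"
echo "$pattern"        # prints ^.......$  (seven dots, one per letter)

# ... so grep keeps only the lines that are exactly as long as the Key Word.
printf '%s\n' cap capitol topicals optical | grep "$pattern"
# capitol
# optical
```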

Program dbm1503C is the same as dbm1503B with code added to weed out InFile lines which contain letters not present in the Key Word. There ought to be a way to combine the tr and grep into a single command but I wasn't able to figure out the syntax. Suggestions are invited.
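
One possible way to fold the tr and the second grep into the first grep is a bracket expression built from the Key Word itself, repeated once per letter. This is a sketch, not a timed replacement for dbm1503C: it assumes KW contains only lowercase letters (nothing that is special inside a regex bracket expression), `words` and `prefilter` are hypothetical names, and the short inline word list stands in for sowpods.txt.

```shell
#!/bin/bash
KW='lotipac'

# Stand-in for the real word list (hypothetical sample data).
words() { printf '%s\n' capitol capital coalpit optimal optical tactical topical pilot; }

# Keep lines made up of exactly ${#KW} characters, all drawn from the letters of KW.
# For KW=lotipac the pattern is ^[lotipac]{7}$
prefilter() { grep -E "^[$KW]{${#KW}}\$"; }

# "capital" survives the prefilter (right length, right letters) but has two a's,
# so it is still up to the sed from this thread to reject it.
words | prefilter \
| sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}"
# capitol
# coalpit
# optical
# topical
```

Whether a single grep like this beats the three-command pipeline on the full word list would need to be timed; no timing is claimed here.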

The time for a single execution is not perfectly repeatable so I tried to even things out by using a "do it 5 times" loop in each program.

The programs are ...
Code:
#!/bin/bash
# Daniel B. Martin   Sep15   dbm1503A
    Path=${0%%.*}
 OutFile=$Path"out.txt"
WordList="/home/daniel/Desktop/LQfiles/sowpods.txt"
KW='lotipac'
echo "Program dbm1503A... Method of LQ member millgates as originally posted."
COUNTER=0
until [  $COUNTER -eq 5 ]; do
sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" $WordList >$OutFile 
let COUNTER++
done
echo "Normal end of job."; echo; exit


#!/bin/bash
# Daniel B. Martin   Sep15   dbm1503B
    Path=${0%%.*}
 OutFile=$Path"out.txt"
WordList="/home/daniel/Desktop/LQfiles/sowpods.txt"
KW='lotipac'
echo "Program dbm1503B... Method of LQ member millgates with one improvement."
COUNTER=0
until [  $COUNTER -eq 5 ]; do
grep "^$(tr "a-z" "." <<<$KW)$" $WordList  \
|sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" >$OutFile
let COUNTER++
done
echo "Normal end of job."; echo; exit

#!/bin/bash
# Daniel B. Martin   Sep15   dbm1503C
    Path=${0%%.*}
 OutFile=$Path"out.txt"
WordList="/home/daniel/Desktop/LQfiles/sowpods.txt"
KW='lotipac'
echo "Program dbm1503C... Method of LQ member millgates with two improvements."
COUNTER=0
until [  $COUNTER -eq 5 ]; do
grep "^$(tr "a-z" "." <<<$KW)$" $WordList  \
|tr "$(tr -d "$KW" <<<"abcdefghijklmnopqrstuvwxyz")" "~"  \
|grep -v "~"                                              \
|sed -rn "h; s/.*/&#$KW/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" >$OutFile
let COUNTER++
done
echo "Normal end of job."; echo; exit
These are the timings ...
Code:
Program dbm1503A... Method of LQ member millgates as originally posted.
Normal end of job.


real	1m22.372s
user	1m22.333s
sys	0m0.028s
daniel@daniel-desktop:~$ time bash /home/daniel/Desktop/LQfiles/dbm1503B.bin
Program dbm1503B... Method of LQ member millgates with one improvement.
Normal end of job.


real	0m7.382s
user	0m10.229s
sys	0m0.048s
daniel@daniel-desktop:~$ time bash /home/daniel/Desktop/LQfiles/dbm1503C.bin
Program dbm1503C... Method of LQ member millgates with two improvements.
Normal end of job.


real	0m1.358s
user	0m1.404s
sys	0m0.072s
Daniel B. Martin
 
Old 09-26-2015, 01:17 PM   #23
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 10,006

Rep: Reputation: 3191
My personal feeling is that once you start cobbling together multiple commands, you are better off just using the Perl example (IMHO).
 
Old 09-26-2015, 02:17 PM   #24
Rinndalir
Member
 
Registered: Sep 2015
Posts: 733

Rep: Reputation: Disabled
FYI this is an anagram finder. Which solution did you choose?
 
Old 09-26-2015, 03:02 PM   #25
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by Rinndalir View Post
FYI this is an anagram finder.
Yes, this was noted in post #17.

Quote:
Which solution did you choose?
DBM1503C, as given in post #22, because it generates correct results and is the fastest variation (so far).

Daniel B. Martin
 
Old 09-26-2015, 03:26 PM   #26
Rinndalir
Member
 
Registered: Sep 2015
Posts: 733

Rep: Reputation: Disabled
Quote:
Originally Posted by danielbmartin View Post
DBM1503C, as given in post #22, because it generates correct results and is the fastest variation (so far).
In your original post you said you don't like loops but that solution has loops.

Also you say "(so far)" but this thread is marked solved.

I didn't see the code but the perl version is listed as the fastest solution.

I do not see the wordlist??? Did I miss it?
 
Old 09-26-2015, 03:29 PM   #27
Rinndalir
Member
 
Registered: Sep 2015
Posts: 733

Rep: Reputation: Disabled
Quote:
Originally Posted by millgates View Post
Code:
key=deront
sed -rn "h; s/.*/&#$key/;:a s/(.)(.*#.*)\1/\2/;ta;/[^#]/!{g;p}" <$InFile
I know it works and some people like it but most programmers would consider this to be inscrutable.
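
For comparison, a more conventional (if much slower) way to express the same test is to compare sorted-letter signatures. This is a different technique from the sed quoted above, not a rewrite of it. `sig` is a hypothetical helper, the four sample words are made up for illustration, and spawning a pipeline per word makes this impractical for a full word list.

```shell
#!/bin/bash
# Hypothetical helper: the letters of a word in sorted order, e.g. rodent -> denort.
sig() { printf '%s\n' "$1" | fold -w1 | sort | tr -d '\n'; }

key=deront
key_sig=$(sig "$key")

# Two words are anagrams of each other when their signatures are equal.
for word in rodent rodeo deront tendon; do
    if [ "$(sig "$word")" = "$key_sig" ]; then
        echo "$word"
    fi
done
# rodent
# deront
```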
 
Old 09-26-2015, 04:26 PM   #28
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,881

Original Poster
Rep: Reputation: 660
Quote:
Originally Posted by Rinndalir View Post
In your original post you said you don't like loops but that solution has loops.
Post #1 said, "As a matter of personal coding style I strive to avoid explicit loops." That's true. Strive = Try, and I tried. I was unable to create a no-loop solution.
Quote:
Also you say "(so far)" but this thread is marked solved.
True. I (reluctantly) accepted the idea that there is no no-loop solution, and marked the thread SOLVED. However, I will be delighted if a no-loop solution is posted.
Quote:
I didn't see the code but the perl version is listed as the fastest solution.
I don't know perl. I tried to run (and time) the posted perl solution but failed with a syntax error. Maybe I'll learn perl and python some day. At present I am still learning awk and the many powerful Linux commands.
Quote:
I do not see the wordlist??? Did I miss it?
This was given in post #17, in the code. To repeat it here ...
Code:
# This European Scrabble word list was downloaded from:
#   http://www.freescrabbledictionary.com/sowpods/download/sowpods.txt
WordList="/home/daniel/Desktop/LQfiles/sowpods.txt"
Three variations based on the excellent sed solution posted by millgates, together with timings, are shown in post #22. Note that my timings used a "do it 5 times" loop. Keep this in mind if you make timings on your machine.

If you come up with something even better please post it here. We learn from each other!

Daniel B. Martin
 
  



Tags
awk, grep, sed, text processing, tr


