LinuxQuestions.org
LinuxQuestions.org > Forums > Non-*NIX Forums > Programming
01-19-2013, 07:30 PM #1
Stanley_212
LQ Newbie
Registered: Feb 2011
Posts: 6
Generate SPECIAL alphanumeric WORDLIST - no characters appearing more than X times


Hi,

Firstly, I'd like to thank those who helped solve my first question (almost 2 years ago):

http://www.linuxquestions.org/questi...y-side-862473/

It had to do with generating a SPECIAL wordlist containing all possible combinations of alphanumeric characters without having the same characters appearing side-by-side.

Code:
#!/usr/bin/clisp

(defparameter *character-set* "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
;(defparameter *character-set* "ABC")     ; < --- this line is for testing

(defparameter *word-length* 10)
;(defparameter *word-length* 4)           ; < --- this line is for testing

(defparameter *character-list*
   (coerce *character-set* 'list))

;; Return the last character of IN-STRING, or NIL if IN-STRING is empty.
(defun final-char (in-string)
   (cond
      ((> (length in-string) 0)
         (elt in-string (1- (length in-string))))
      (t
         nil)))

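;; Return, in order, the characters of *character-list* that differ from the
;; last character of IN-STRING, i.e. the candidates for the next position.
(defun new-char-list (in-string)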
   (let ((result))
      (mapcar
         (lambda (candidate)
            (cond
               ((not (eql candidate (final-char in-string)))
                  (push candidate result))))
         *character-list*)
      (nreverse result)))

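;; Append each allowed next character to IN-STRING; print the new string once
;; it reaches DESIRED-LENGTH, otherwise recurse to extend it further.
(defun extend-string (in-string desired-length)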
   (mapcar
      (lambda (new-char)
         (let ((new-string (concatenate 'string in-string (string new-char))))
            (cond
               ((>  (length new-string) desired-length))
               ((>= (length new-string) desired-length)
                  (format t "~a~%" new-string))
               (t
                  (extend-string new-string desired-length)))))
      (new-char-list in-string)))

(extend-string "" *word-length*)
It works great.

Since then I've noticed that crunch (a popular alphanumeric wordlist generator) has added this feature (about time!): for example, ./crunch -d 1 eliminates side-by-side duplicates.

However, there is a feature that I have noticed missing from every wordlist generator out there that I think would be very useful.

I'm looking for the ability to specify the maximum number of times the same character will appear in a sequence.

For example:

char set ABCD
max dupe appear 2
word length 5

ABBCD - acceptable - Bx2
BACBD - acceptable - Bx2
BAACB - acceptable - Bx2 Ax2
ACBBB - not acceptable - Bx3
DDDBB - not acceptable - Bx2 Dx3
BBDBB - not acceptable - Bx4 <= appears in crunch output for ./crunch -d 2
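The last example shows that crunch's -d option limits runs of identical side-by-side characters, not how often a character occurs in the whole word. A minimal sketch of the difference, using a hypothetical helper named word_stats written in plain POSIX awk (the same tool used in the replies below): for each word it prints the highest total count of any single character and the longest run of identical side-by-side characters.

```shell
# word_stats (hypothetical helper): for each word on stdin, print the word,
# the largest number of times any one character occurs in it ("total"), and
# the longest run of identical side-by-side characters ("run").
word_stats() {
  awk '{
    split("", cnt)                      # portable way to empty the counter array
    maxcnt = 0; maxrun = 0; run = 0; prev = ""
    for (i = 1; i <= length($0); i++) {
      c = substr($0, i, 1)
      if (++cnt[c] > maxcnt) maxcnt = cnt[c]
      run = (c == prev) ? run + 1 : 1   # extend the current run or start a new one
      if (run > maxrun) maxrun = run
      prev = c
    }
    print $0, "total=" maxcnt, "run=" maxrun
  }'
}

printf '%s\n' ABBCD BACBD BAACB ACBBB DDDBB BBDBB | word_stats
```

For BBDBB it reports total=4 run=2: a limit of 2 on runs lets the word through, while the requested limit of 2 on totals rejects it.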


I'm looking for the ability to generate an alphanumeric wordlist containing all possible combinations, but allowing each character to appear at most X times in a sequence (word). Identical characters may still appear side-by-side.

Basically, there has to be a counter on each unique character, and once a counter hits the maximum allowed value, a different character must be used.
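That counter rule can be applied while the words are being built instead of afterwards. The sketch below is a hypothetical shell function, gen_limited, wrapping a recursive awk program (only POSIX awk features are assumed). It places one character at a time, skips any character whose counter is already at MAX, and decrements the counter again when it backtracks, so no rejected word is ever produced. It is the same kind of backtracking search as the clisp script above, with the side-by-side test replaced by a per-character count.

```shell
# gen_limited CHARS LENGTH MAX  (hypothetical helper)
# Print every word of LENGTH characters drawn from CHARS in which no character
# occurs more than MAX times in total; side-by-side repeats are allowed.
gen_limited() {
  awk -v chars="$1" -v len="$2" -v max="$3" '
    function extend(prefix, depth,    i, c) {   # i and c are local variables
      if (depth == len) { print prefix; return }
      for (i = 1; i <= n; i++) {
        c = ch[i]
        if (cnt[c] < max) {                     # counter not yet at the limit
          cnt[c]++
          extend(prefix c, depth + 1)
          cnt[c]--                              # undo before trying the next character
        }
      }
    }
    BEGIN {
      n = length(chars)
      for (i = 1; i <= n; i++) ch[i] = substr(chars, i, 1)
      extend("", 0)
    }'
}

gen_limited AB 3 2   # prints AAB ABA ABB BAA BAB BBA, one per line
```

Words come out in the order of CHARS, so for the ABCD example (gen_limited ABCD 5 2) the list starts with AABBC, AABBD, AABCB.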

The only alternative to writing a program is to filter an existing wordlist, but that would mean generating words such as AADCAABA and CCCDABBD just to remove them again.

I'd prefer to do things in one step - rather than 10 forward and 9 back.
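How much the one-step approach saves can be estimated without generating anything. The sketch below (a hypothetical count_limited function, again in awk) counts the words of a given length over K characters in which no character appears more than MAX times. It uses a dynamic program over the characters: adding a character that occurs j times to a word of length n gives binom(n+j, j) ways to choose its positions. Comparing that count with the unrestricted total K^LENGTH shows how much of a full list a filter would throw away.

```shell
# count_limited K LENGTH MAX  (hypothetical helper)
# Count, without generating them, the words of LENGTH characters over an
# alphabet of K characters in which no character occurs more than MAX times,
# and print that next to the unrestricted total K^LENGTH.
count_limited() {
  awk -v k="$1" -v len="$2" -v max="$3" 'BEGIN {
    # binom[a, b] = a choose b, built row by row (Pascal triangle)
    for (a = 0; a <= len; a++) {
      binom[a, 0] = 1
      for (b = 1; b <= a; b++) binom[a, b] = binom[a - 1, b - 1] + binom[a - 1, b]
    }
    # dp[n] = number of valid words of length n using the characters added so far
    for (n = 0; n <= len; n++) dp[n] = (n == 0)
    for (c = 1; c <= k; c++) {                 # add one more character to the alphabet
      for (n = 0; n <= len; n++) nxt[n] = 0
      # j copies of the new character: pick their j slots among the n + j positions
      for (n = 0; n <= len; n++)
        for (j = 0; j <= max && n + j <= len; j++)
          nxt[n + j] += dp[n] * binom[n + j, j]
      for (n = 0; n <= len; n++) dp[n] = nxt[n]
    }
    printf "valid: %.0f of %.0f\n", dp[len], k ^ len
  }'
}

count_limited 4 5 2     # the ABCD / length 5 / max 2 example above
count_limited 36 10 2   # character set size and word length of the clisp script
```

For the ABCD example it prints valid: 600 of 1024. Common awks store numbers as double-precision floats, so the counts stay exact only while K^LENGTH is below 2^53 (about 9.0e15); the 36-character, length-10 case (about 3.7e15) still fits.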

Any suggestions would be greatly appreciated.

And, if anybody can find an existing solution to this problem I'm eager to find out.

I'll continue my search... but so far I haven't found a solution.

Thanks so much.

Last edited by Stanley_212; 01-19-2013 at 07:35 PM. Reason: Typing Mistake
 
01-20-2013, 12:46 PM #2
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,819
Quote:
Originally Posted by Stanley_212
I'm looking for the ability to specify the maximum number of times the same character will appear in a sequence.
InFile ...
Code:
aardvark
abilities
accountant
additional
artistic
aristocratic
affordable
agreeable
allocation
amount
anoint
appreciate
aquatic
aromatic
assistant
atrophied
autumn
aviator
awaiting
This code ...
Code:
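awk -F "" '{                  # empty FS: one field per character (not POSIX; gawk allows it)
  delete a                     # reset the per-character counters for each word
  ok = 1
  for (i = 1; i <= NF; i++)
    if (++a[$i] > 2) {print $0 " has >2 occurrences of \"" $i "\""; ok = 0; break}
  if (ok) print $0 " is acceptable"}' < "$InFile"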
Generates this result ...
Code:
aardvark has >2 occurrences of "a"
abilities has >2 occurrences of "i"
accountant is acceptable
additional is acceptable
artistic is acceptable
aristocratic is acceptable
affordable is acceptable
agreeable has >2 occurrences of "e"
allocation is acceptable
amount is acceptable
anoint is acceptable
appreciate is acceptable
aquatic is acceptable
aromatic is acceptable
assistant has >2 occurrences of "s"
atrophied is acceptable
autumn is acceptable
aviator is acceptable
awaiting is acceptable
Daniel B. Martin
 
01-20-2013, 03:49 PM #3
David the H.
Bash Guru
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Debian + kde 4 / 5
Posts: 6,849
I'm no expert in anything except the shell, but I'd imagine that you wouldn't want to do something like this in a shell script. Any fully interpreted language would probably be really slow, as you'd have to process each word character by character.

I have no idea if this is a practical solution, but my idea would be to work backwards instead. Set up a "pool" for each character (an array, or even just a counter) starting at the max. Each time a letter is used, remove one from its pool; once a pool reaches zero, that letter is ignored from then on. The pools are reset for each line.
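The pool idea translates fairly directly into a generator. In this minimal sketch, the gen_pool function and its argument format are assumptions, not an existing tool. Each character gets its own pool, given as a whitespace-separated list of limits in the same order as CHARS. Using a character takes one from its pool, characters with an empty pool are skipped, and the pool is refilled as the recursion backs out, which takes the place of resetting the pools for each line. Separate pools also allow different limits for different characters.

```shell
# gen_pool CHARS "POOLS" LENGTH  (hypothetical helper)
# The N-th number in POOLS is the pool of the N-th character in CHARS;
# a missing number counts as an empty pool.
gen_pool() {
  awk -v chars="$1" -v pools="$2" -v len="$3" '
    function extend(prefix, depth,    i) {
      if (depth == len) { print prefix; return }
      for (i = 1; i <= n; i++) {
        if (pool[i] == 0) continue             # pool empty: character unavailable
        pool[i]--                              # take one from the pool
        extend(prefix ch[i], depth + 1)
        pool[i]++                              # refill it before trying the next character
      }
    }
    BEGIN {
      n = length(chars)
      split(pools, pool)                       # whitespace-separated limits
      for (i = 1; i <= n; i++) ch[i] = substr(chars, i, 1)
      extend("", 0)
    }'
}

gen_pool AB "2 1" 3   # prints AAB, ABA and BAA: A at most twice, B at most once
```

With equal pools, e.g. gen_pool ABCD "2 2 2 2" 5, it produces the same 600 words as a uniform limit of 2.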
 
01-20-2013, 06:15 PM #4
Stanley_212
LQ Newbie
Registered: Feb 2011
Posts: 6
Original Poster
I have found a solution to this problem.

After more searching on the internet I have found bruteforge at:

http://masterzorag.blogspot.com/

It's like crunch but with more features - including the feature I was looking for...

Check it out if you're interested in generating alphanumeric wordlists with restrictions on the number of times each character is used.

The configuration file in bruteforge lets you specify how many times each individual character in the character set may be used (say "a" 3 times and "b" 5 times) and also allow identical characters side-by-side at most twice, so "aa" is allowed but "aaa" is not.
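The two restrictions described here (a per-character usage limit plus a cap on identical side-by-side characters) can be combined in one recursive generator. The sketch below is not bruteforge's code or configuration format, neither of which is shown in this thread; gen_runs and its arguments are hypothetical. It extends a pool-style generator by also tracking the length of the current run and skipping any character that would make the run longer than MAXRUN.

```shell
# gen_runs CHARS "POOLS" LENGTH MAXRUN  (hypothetical helper)
# POOLS: whitespace-separated total-use limits, one per character of CHARS.
# MAXRUN: longest allowed run of identical side-by-side characters.
gen_runs() {
  awk -v chars="$1" -v pools="$2" -v len="$3" -v maxrun="$4" '
    function extend(prefix, depth, last, run,    i, r) {
      if (depth == len) { print prefix; return }
      for (i = 1; i <= n; i++) {
        if (pool[i] == 0) continue             # total-use limit for this character reached
        r = (i == last) ? run + 1 : 1          # run length this choice would create
        if (r > maxrun) continue               # with MAXRUN=2, "aa" is fine but "aaa" is not
        pool[i]--
        extend(prefix ch[i], depth + 1, i, r)
        pool[i]++                              # restore the pool before the next choice
      }
    }
    BEGIN {
      n = length(chars)
      split(pools, pool)
      for (i = 1; i <= n; i++) ch[i] = substr(chars, i, 1)
      extend("", 0, 0, 0)
    }'
}

gen_runs ab "3 5" 8 2   # "a" at most 3 times, "b" at most 5 times, no run longer than 2
```

With MAXRUN set to 1 and pools at least as large as LENGTH, this reduces to the no-side-by-side rule from the original thread.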

Looks like a great program.

Thanks for the feedback from all of you.
 
01-20-2013, 07:46 PM #5
danielbmartin
Senior Member
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,819
Quote:
Originally Posted by Stanley_212
I have found a solution to this problem.
Okay, it is a solved problem. I offer a homegrown solution as evidence that the problem is not as hard as it first appears.
Code:
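# seq counts in base 10; grep keeps only numbers built from the digits 0-5,
# so tr maps every digit onto the character set A-F.
seq -w 999999               \
|grep -v '[6-9]'            \
|tr '012345' 'ABCDEF'       \
|awk -F "" '{for(i=1; i<=NF; i++) {
  if (i==1) delete a
  if (++a[$i]>2) break}
  if (a[$i]<=2) print $0}'  \
|shuf                       \
> "$Good"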
Daniel B. Martin

Last edited by danielbmartin; 01-21-2013 at 07:54 AM. Reason: Correction
 
  

