LinuxQuestions.org


Stanley_212 02-13-2011 05:55 PM

Generate SPECIAL alphanumeric WORDLIST - no repeating characters side-by-side
 
Hi,

I'm running Ubuntu 10.10 and Windows XP - getting more comfortable with Linux every day.

I'm looking for a way to generate an alphanumeric sequence in linux containing letter-number combinations up to 10 characters long without having the same letters or numbers appearing side-by-side - and save it to a text file.

I found a very useful open-source linux program for generating alphanumeric sequences called crunch.

(http://sourceforge.net/projects/crun...unch-wordlist/)

It generates alphanumeric sequences containing all possible combinations of letters and numbers from a given character set - great if you want all possible combinations - bad if you don't want a lot of CCCCDDDDD or TTTTEEEEE.

I only want to produce an alphanumeric sequence containing different letters and numbers side-by-side.

For example,

BADELF26 - Acceptable
3H8E5E81 - Acceptable
CTFFF29E - Not Acceptable
CLE3C77N - Not Acceptable

I've looked at the crunch code and modifying it is way out of my league - and not worth the time and effort.

I think implementing nested loops would make a simple solution - just don't know how in linux - my programming skills end with arduino microcontrollers.

Any help with this matter would be greatly appreciated.

Stanley - Linux Newbie

P.S. - If there's an obvious solution to this problem out there I apologize - I have spent a lot of time searching without success.

Snark1994 02-13-2011 06:25 PM

I think this bash script should work:

Code:

#!/bin/bash
ascii=
index=0
noNames=10                                              #No of names to generate
nameLength=10                                          #Length to generate (you said 10)
for(( i=65; i<=90; i++ ))                              #Add upper-case letters to 'ascii'
do
        ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
        index=$(( $index + 1 ))
done

for(( i=48; i<=57; i++ )) # Add numbers to 'ascii'
do
        ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
        index=$(( $index + 1))
done

for(( i=0; i<$noNames; i++))
do
        name=                                          #We'll store the name in here
        last=-1                                        #Index of the last character generated;
                                                        #  -1 = none yet (an empty value compares as 0 and would block 'A')
        for(( j=0; j<$nameLength; j++))
        do 
                num=$(( $RANDOM % $index ))            # Pick a random character index
                while [[ $num -eq $last ]]              #If it's the same as the last
                                                        #  one...
                do
                        num=$(( $RANDOM % $index ))    #... pick a new one!
                done
                last=$num                              #Update "last" to current value
                name=${name}${ascii[$num]}              #Add the correct letter to our name
        done
        echo "${name}"                                  #Print name...
done > output                                          #...to our output file


grail 02-13-2011 07:26 PM

Here is an all awk one:
Code:

#!/usr/bin/awk -f

BEGIN{
    for(i = 48;i <=90;i++){
        if(i == 58)i+=7
        arr[++c] = sprintf("%c", i)
    }

    printf "Enter length of name to generate: "
    getline namelen < "-"

    srand()
    for(i = 1;i <= namelen;i++){
        do
            char = arr[1 + int(rand() * c)]
        while( char == substr(name, length(name)))

        name = name char
    }

    print name
}

To be more robust you would need to put error checking around the number being entered by the user.

kurumi 02-13-2011 08:09 PM

Ruby(1.9+)

Code:

while true
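  s=rand(36**10).to_s(36).rjust(10, "0").upcase  # pad to 10 chars (to_s drops leading zeros), upper-case to match the examples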
  if s !~ /(.)\1/
    puts s
    break
  end
end


Kenhelm 02-13-2011 08:21 PM

'grep' can remove the unwanted strings from the output of 'crunch'
Code:

crunch [CRUNCH_OPTIONS] | grep -Ev '(.)\1'

Stanley_212 02-13-2011 08:48 PM

Thanks Snark,

I ran it a few times and it works correctly...

output test results:

JKXQ2S2V6H
57UB6TB817
048UI0GHJ1
5FLP02HMF3
LAU3SDK8KE
KUWJVXTKNK
HZ09RLDFXW
SUD9XVDZPW
BPEZ3SZMW9
K6D4JNH18G

This works great if I want to produce a few RANDOM bunches of 10 character long sequences.

However, I realized I wasn't clear enough in my post about what I was trying to generate.

I'm not just trying to generate a few sequences... oops, my mistake.

I want to generate a "list of alphanumeric sequences" containing ALL possible combinations of letters and/or numbers without having the same characters appearing side-by-side.

It would be a large list, but nowhere near as long as it would be if I included sequences with the same characters appearing side-by-side.

As I stated, I found a program that generates a so-called wordlist (not sure why they call it a word list if it doesn't contain a lot of actual "words") that generates a list of sequences including all possible combinations - which is not what I want.

(http://sourceforge.net/projects/crun...unch-wordlist/)


Unfortunately, it's not as simple a task as I had described.

I have looked for a solution for a while with no success - you would think this was done before?

I think a series of nested loops would be necessary to generate all possibilities.

Thanks.

P.S. I'd have no hesitation making a paypal donation to somebody's blog or charity.

Stanley

thanks grail too...

Stanley_212 02-13-2011 09:37 PM

Thanks Kenhelm for the idea.

You suggested piping crunch results into grep:

crunch [CRUNCH_OPTIONS] | grep -Ev '(.)\1'

Can you explain the grep option above - I don't quite understand it.

My grep reference:

http://linux.die.net/man/1/grep

Thanks.

Stanley

grail 02-13-2011 10:09 PM

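The brackets capture whatever the '.' (any single character) matched, and '\1' is a backreference to that captured character, so '(.)\1' matches any character followed by itself. The -E option turns on extended regular expressions and the -v option inverts the match, so grep only prints the lines that don't contain a doubled character.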
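
For readers who want to try the pattern outside of grep, here is a minimal Python sketch of the same filter. The function name keep_line is made up for this illustration; the pattern is the one from Kenhelm's post, and the sample strings are the four examples from the first post.

```python
import re

# Same pattern that was passed to grep: any character immediately followed by itself.
ADJACENT_REPEAT = re.compile(r"(.)\1")

def keep_line(line: str) -> bool:
    """Mimic grep -v with this pattern: keep only lines with no doubled character."""
    return ADJACENT_REPEAT.search(line) is None

# The four examples from the first post.
samples = ["BADELF26", "3H8E5E81", "CTFFF29E", "CLE3C77N"]
kept = [s for s in samples if keep_line(s)]
print(kept)  # ['BADELF26', '3H8E5E81']
```

The two rejected strings contain "FF" and "77" respectively, which is exactly what the backreference catches.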

wje_lq 02-14-2011 08:50 AM

First of all, I hope you realize that to send to a file all 10-character combinations of the following characters
Code:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
such that no two adjacent characters are of equal value, and placing each combination on its own line, will generate a file that's 31 petabytes long. I hope you have room for this.
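
The 31 petabyte figure can be checked with a little arithmetic: the first character can be any of the 36, and each of the following 9 characters can be anything except its predecessor (35 choices). A small Python sketch, assuming one byte per character plus a one-byte newline per line (the function name is invented for this example):

```python
def count_no_adjacent_repeats(alphabet_size: int, length: int) -> int:
    """Number of strings of the given length with no two equal neighbours."""
    if length == 0:
        return 1
    # First character: any symbol. Every later character: anything but its predecessor.
    return alphabet_size * (alphabet_size - 1) ** (length - 1)

lines = count_no_adjacent_repeats(36, 10)
bytes_per_line = 10 + 1                 # 10 characters plus a newline (assumed 1 byte each)
total_bytes = lines * bytes_per_line

print(lines)                            # number of lines in the file
print(round(total_bytes / 10**15, 1))   # size in petabytes, roughly 31.2
```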

The program listed below will generate that file. Just redirect standard output to a file in the normal manner.

No, I haven't tested it to completion. But you can test a crippled version of it by doing these four things:
  1. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
  2. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
  3. Comment out the first definition of *word-length*.
  4. Uncomment the second definition, which uses a word length of four.
If you do that and run the program, you'll get this output. That's the kind of output you're looking for, right?
Code:

ABAB
ABAC
ABCA
ABCB
ACAB
ACAC
ACBA
ACBC
BABA
BABC
BACA
BACB
BCAB
BCAC
BCBA
BCBC
CABA
CABC
CACA
CACB
CBAB
CBAC
CBCA
CBCB

So here's the code, all 42 lines of it.
Code:

#!/usr/bin/clisp

(defparameter *character-set* "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
;(defparameter *character-set* "ABC")    ; < --- this line is for testing

(defparameter *word-length* 10)
;(defparameter *word-length* 4)          ; < --- this line is for testing

(defparameter *character-list*
  (coerce *character-set* 'list))

(defun final-char (in-string)
  (cond
      ((> (length in-string) 0)
        (elt in-string (1- (length in-string))))
      (t
        nil)))

(defun new-char-list (in-string)
  (let ((result))
      (mapcar
        (lambda (candidate)
            (cond
              ((not (eql candidate (final-char in-string)))
                  (push candidate result))))
        *character-list*)
      (nreverse result))
      )

(defun extend-string (in-string desired-length)
  (mapcar
      (lambda (new-char)
        (let ((new-string (concatenate 'string in-string (string new-char))))
            (cond
              ((>  (length new-string) desired-length))
              ((>= (length new-string) desired-length)
                  (format t "~a~%" new-string))
              (t
                  (extend-string new-string desired-length)))))
      (new-char-list in-string)))

(extend-string "" *word-length*)

Hope this helps.
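
For anyone who doesn't have clisp installed, the same idea (extend a prefix one character at a time, skipping the character that would repeat the last one) can be sketched in Python. This is not a translation of the Lisp program, just a minimal illustration of the technique; the function name extend is made up here, and the run below uses the crippled "ABC" / length 4 settings rather than the full set.

```python
from typing import Iterator

def extend(prefix: str, alphabet: str, length: int) -> Iterator[str]:
    """Yield every string of `length` that starts with `prefix` and has no equal neighbours."""
    if len(prefix) == length:
        yield prefix
        return
    for ch in alphabet:
        # Skip the character that would duplicate the end of the prefix.
        if not prefix or ch != prefix[-1]:
            yield from extend(prefix + ch, alphabet, length)

words = list(extend("", "ABC", 4))
print(len(words))   # 24
print(words[:4])    # ['ABAB', 'ABAC', 'ABCA', 'ABCB']
```

Because it is a generator, nothing is held in memory beyond the current prefix, so the output can be written line by line instead of being collected in a list first.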

Stanley_212 02-14-2011 10:41 PM

Thanks very much "wje_lq" for the program.

It works perfectly.

With your help the output lists will be without those irritating CCCBBBB or EEEYYYY combinations that just don't "appear" very random.

And, no wasting time with crunch generating sequences that will ultimately get discarded.

I never would have come up with something so simple and elegant myself.

Quick test results:

Using the character set "ABCD"

word length    crunch output    nodupes output

4              256 lines        108 lines
5              1,024 lines      324 lines
8              65,536 lines     8,748 lines

Wow, what a reduction.
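
These line counts can be reproduced by brute force, much like the crunch-plus-grep approach: generate every combination, then count the ones that survive the filter. A rough Python sketch, assuming the same "ABCD" character set (brute_force_counts is a name invented for this example, and this is only practical for small sets and short lengths):

```python
from itertools import product
from typing import Tuple

def brute_force_counts(alphabet: str, length: int) -> Tuple[int, int]:
    """Return (all combinations, combinations with no equal neighbours)."""
    total = 0
    kept = 0
    for combo in product(alphabet, repeat=length):
        total += 1
        # Keep the combination only if every character differs from the next one.
        if all(a != b for a, b in zip(combo, combo[1:])):
            kept += 1
    return total, kept

for length in (4, 5, 8):
    print(length, *brute_force_counts("ABCD", length))
```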

Thanks again.

Stanley

I'll mark this thread as [SOLVED] - as soon as I figure out how...

grail 02-15-2011 03:38 AM

Well it has been a hell of an age since I have done (c)lisp, very nice wje_lq.
Thought I would take the challenge too :)
Code:

#!/usr/bin/awk -f

BEGIN{
#    set = "0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
    set = "A B C D"

    n = split(set, chars)

    printf "Enter length of name to generate: "
    getline namelen < "-"

    if(namelen ~ /^[0-9]+$/)
        n = namelen

    print_string("", chars, n)

}

function print_string( string, array, len,      local_array, i, c)
{
    for(i in array)
        if(array[i] != substr(string,length(string)))
            local_array[++c] = array[i]

    for(i in local_array)
        if(length(string) + 1 == len)
            print string local_array[i]
        else
            print_string(string local_array[i], array, len)
}

Just change the sets at the beginning once you have finished testing :)

Snark1994 02-16-2011 05:17 PM

And SOLVED is under the "Thread tools" menu at the top of the page :)

EDIT: D'oh...

jumpingj 04-04-2011 10:36 AM

Hi Guys

Thanks for the very nice scripts (especially wje_lq's). I am keen to run it on Mac OS 10.6; do you know the best way? I can see it has been written in clisp, but I just can't find a program for that.
Second thing: I would like to generate:
10-character combinations of the following characters (lowercase) 23456789abcdef, with no character appearing 3 or more times in one line (sequence), no matter whether side by side or not. So let's say:

abcdef1234 accept
fabcde1234 accept
ffabcd1234 accept
which is probably a permutation with repeatable characters (where abc is not equal to cba etc., so to speak position does matter)

fffabc1234 not acceptable -----(3 same characters)
ffabcf1234 not acceptable -----(3 same characters, even though not side by side)

So generally we don't want 3 of the same character to appear in the same line.
Would anybody kindly take up that challenge (either Linux or Mac OS scripts)?
Thanks for any help and sorry for my bad English, I hope you can tell what I'm looking for.
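
One possible reading of this request is "no character may occur more than twice anywhere in the line". Under that assumption, a minimal Python sketch could track how often each character has been used so far. The names limited_repeats and max_count are invented for this illustration, and the alphabet is a parameter because the character set as posted has no '1' although the examples use one. The demo runs on a tiny alphabet; the full 14-character, length-10 list would be enormous.

```python
from collections import Counter
from typing import Iterator

def limited_repeats(alphabet: str, length: int, max_count: int = 2) -> Iterator[str]:
    """Yield every string of `length` in which no character occurs more than max_count times."""
    def extend(prefix: str, counts: Counter) -> Iterator[str]:
        if len(prefix) == length:
            yield prefix
            return
        for ch in alphabet:
            if counts[ch] < max_count:      # character still has uses left
                counts[ch] += 1
                yield from extend(prefix + ch, counts)
                counts[ch] -= 1             # undo before trying the next character
    yield from extend("", Counter())

words = list(limited_repeats("ab", 3))
print(words)  # ['aab', 'aba', 'abb', 'baa', 'bab', 'bba']
```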

grail 04-04-2011 11:00 AM

@jumpingj - please don't hijack / resurrect a 2 month old question. Raise your own and if you feel the above is valid then reference it. You should also show what you are getting stuck on with either / both scripts. Remember, the idea is people are here to help not just do the work for you.

jumpingj 04-04-2011 05:33 PM

Sorry guys if I have asked something wrong, I didn't mean to ask anybody to do the work for me. I spotted that script (something similar to what I am looking for) and I just don't know how to use it. I am not a programmer myself and there is probably no need to learn that just to create one script! I only need your help to modify wje_lq's script and run it.
Things like:
"Just redirect standard output to a file in the normal manner" ???

and all this

I. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
II. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
III. Comment out the first definition of *word-length*.
IV. Uncomment the second definition, which uses a word length of four.

Thanks


All times are GMT -5.