LinuxQuestions.org
Old 02-13-2011, 04:55 PM   #1
Stanley_212
LQ Newbie
 
Registered: Feb 2011
Posts: 6

Rep: Reputation: 0
Generate SPECIAL alphanumeric WORDLIST - no repeating characters side-by-side


Hi,

I'm running Ubuntu 10.10 and Windows XP - getting more comfortable with Linux every day.

I'm looking for a way to generate an alphanumeric sequence in linux containing letter-number combinations up to 10 characters long without having the same letters or numbers appearing side-by-side - and save it to a text file.

I found a very useful open-source linux program for generating alphanumeric sequences called crunch.

(http://sourceforge.net/projects/crun...unch-wordlist/)

It generates alphanumeric sequences containing all possible combinations of letters and numbers from a given character set - great if you want all possible combinations - bad if you don't want a lot of CCCCDDDDD or TTTTEEEEE.

I only want to produce an alphanumeric sequence containing different letters and numbers side-by-side.

For example,

BADELF26 - Acceptable
3H8E5E81 - Acceptable
CTFFF29E - Not Acceptable
CLE3C77N - Not Acceptable

I've looked at the crunch code and modifying it is way out of my league - and not worth the time and effort.

I think implementing nested loops would make a simple solution - just don't know how in linux - my programming skills end with arduino microcontrollers.

Any help with this matter would be greatly appreciated.

Stanley - Linux Newbie

P.S. - If there's an obvious solution to this problem out there I apologize - I have spent a lot of time searching without success.

Last edited by Stanley_212; 02-14-2011 at 09:43 PM. Reason: Clearer Description
 
Old 02-13-2011, 05:25 PM   #2
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
I think this bash script should work:

Code:
#!/bin/bash
ascii=
index=0
noNames=10                                              #No of names to generate
nameLength=10                                           #Length to generate (you said 10)
for(( i=65; i<=90; i++ ))                               #Add upper-case letters to 'ascii'
do
        ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
        index=$(( $index + 1 ))
done

for(( i=48; i<=57; i++ )) # Add numbers to 'ascii'
do
        ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
        index=$(( $index + 1))
done

for(( i=0; i<$noNames; i++))
do
	name=                                           #We'll store the name in here
	last=-1                                         #We'll store the index of the last 
                                                        #   character generated here; -1 means
                                                        #   "none yet" (empty would compare as 0)
	for(( j=0; j<$nameLength; j++))
	do  
		num=$(( $RANDOM % $index ))             # Pick a random character index
		while [[ $num -eq $last ]]              #If it's the same as the last 
                                                        #  one...
		do
			num=$(( $RANDOM % $index ))     #... pick a new one!
		done
		last=$num                               #Update "last" to current value
	        name=${name}${ascii[$num]}              #Add the correct letter to our name
	done
	echo "${name}"                                  #Print name...
done > output                                           #...to our output file

Last edited by Snark1994; 02-13-2011 at 05:30 PM. Reason: Added more code comments, and formatted them nicely :)
 
Old 02-13-2011, 06:26 PM   #3
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Here is an all awk one:
Code:
#!/usr/bin/awk -f

BEGIN{
    for(i = 48;i <=90;i++){
        if(i == 58)i+=7
        arr[++c] = sprintf("%c", i)
    }

    printf "Enter length of name to generate: "
    getline namelen < "-"

    srand()
    for(i = 1;i <= namelen;i++){
        do
            char = arr[1 + int(rand() * c)]
        while( char == substr(name, length(name)))

        name = name char
    }

    print name
}
To be more robust you would need to put error checking around the number being entered by the user.
 
Old 02-13-2011, 07:09 PM   #4
kurumi
Member
 
Registered: Apr 2010
Posts: 228

Rep: Reputation: 53
Ruby(1.9+)

Code:
while true
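  # Pad to 10 characters (small numbers give shorter strings) and upper-case to match the examples
  s=rand(36**10).to_s(36).rjust(10, '0').upcase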
  if s !~ /(.)\1/
    puts s
    break
  end
end
 
Old 02-13-2011, 07:21 PM   #5
Kenhelm
Member
 
Registered: Mar 2008
Location: N. W. England
Distribution: Mandriva
Posts: 360

Rep: Reputation: 170
'grep' can remove the unwanted strings from the output of 'crunch'
Code:
crunch [CRUNCH_OPTIONS] | grep -Ev '(.)\1'
 
1 members found this post helpful.
Old 02-13-2011, 07:48 PM   #6
Stanley_212
LQ Newbie
 
Registered: Feb 2011
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks Snark,

I ran it a few times and it works correctly...

output test results:

JKXQ2S2V6H
57UB6TB817
048UI0GHJ1
5FLP02HMF3
LAU3SDK8KE
KUWJVXTKNK
HZ09RLDFXW
SUD9XVDZPW
BPEZ3SZMW9
K6D4JNH18G

This works great if I want to produce a few RANDOM bunches of 10 character long sequences.

However, I realized I wasn't clear enough in my post about what I was trying to generate.

I'm not just trying to generate a few sequences... oops, my mistake.

I want to generate a "list of alphanumeric sequences" containing ALL possible combinations of letters and/or numbers without having the same characters appearing side-by-side.

It would be a large list, but nowhere near as long as it would be if I included sequences with the same characters appearing side-by-side.

As I stated, I found a program that generates a so-called wordlist (not sure why they call it a word list if it doesn't contain a lot of actual "words") that generates a list of sequences including all possible combinations - which is not what I want.

(http://sourceforge.net/projects/crun...unch-wordlist/)


Unfortunately, it's not as simple a task as I had described.

I have looked for a solution for a while with no success - you would think this was done before?

I think a series of nested loops would be necessary to generate all possibilities.

Thanks.

P.S. I'd have no hesitation making a paypal donation to somebody's blog or charity.

Stanley

thanks grail too...
 
Old 02-13-2011, 08:37 PM   #7
Stanley_212
LQ Newbie
 
Registered: Feb 2011
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks Kenhelm for the idea.

You suggested piping crunch results into grep:

crunch [CRUNCH_OPTIONS] | grep -Ev '(.)\1'

Can you explain the grep option above - I don't quite understand it.

My grep reference:

http://linux.die.net/man/1/grep

Thanks.

Stanley
 
Old 02-13-2011, 09:09 PM   #8
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
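The parentheses capture whatever single character '.' matches, and '\1' is a backreference to that captured character, so '(.)\1' matches any character immediately followed by itself. The -E option enables extended regular expressions, and the -v option inverts the match, so grep prints only the lines that do not contain such a pair.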
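
For anyone who wants to try the pattern outside of grep, here is a minimal Python sketch of the same filter. The function names are made up for illustration, and the sample strings are the four examples from the first post; it is only meant to show what `(.)\1` matches, not to replace the crunch pipeline.

```python
import re

# Same pattern as in the grep command: one character, then that same character again.
ADJACENT_REPEAT = re.compile(r"(.)\1")

def has_adjacent_repeat(word: str) -> bool:
    """Return True if any character in `word` is directly followed by itself."""
    return ADJACENT_REPEAT.search(word) is not None

def keep_acceptable(words):
    """Mimic `grep -v`: keep only the words that do NOT match the pattern."""
    return [w for w in words if not has_adjacent_repeat(w)]

if __name__ == "__main__":
    samples = ["BADELF26", "3H8E5E81", "CTFFF29E", "CLE3C77N"]
    print(keep_acceptable(samples))
```

Run on the four samples, only BADELF26 and 3H8E5E81 survive, because CTFFF29E contains "FF" and CLE3C77N contains "77".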
 
Old 02-14-2011, 07:50 AM   #9
wje_lq
Member
 
Registered: Sep 2007
Location: Mariposa
Distribution: FreeBSD,Debian wheezy
Posts: 811

Rep: Reputation: 179
First of all, I hope you realize that to send to a file all 10-character combinations of the following characters
Code:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
such that no two adjacent characters are of equal value, and placing each combination on its own line, will generate a file that's 31 petabytes long. I hope you have room for this.
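
The size estimate can be checked with a little arithmetic: the first character can be any of the 36, and each later character can be any of the other 35. Below is a rough Python sketch of that calculation. The function names are only illustrative, and it assumes one byte per character plus one newline byte per line.

```python
def count_words(alphabet_size: int, length: int) -> int:
    """Number of strings of `length` in which no two adjacent characters are equal."""
    if length == 0:
        return 1
    # Free choice for the first character, then (alphabet_size - 1) choices for each one after.
    return alphabet_size * (alphabet_size - 1) ** (length - 1)

def file_size_bytes(alphabet_size: int, length: int) -> int:
    """Size of the output file, assuming single-byte characters and one newline per line."""
    return count_words(alphabet_size, length) * (length + 1)

if __name__ == "__main__":
    lines = count_words(36, 10)
    size = file_size_bytes(36, 10)
    print(f"{lines:,} lines, {size:,} bytes (about {size / 1e15:.1f} PB)")
```

For 36 characters and a length of 10 this gives 2,837,362,992,187,500 lines and 31,210,992,914,062,500 bytes, which agrees with the figure of roughly 31 petabytes.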

The program listed below will generate that file. Just redirect standard output to a file in the normal manner.

No, I haven't tested it to completion. But you can test a crippled version of it by doing these four things:
  1. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
  2. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
  3. Comment out the first definition of *word-length*.
  4. Uncomment the second definition, which uses a word length of four.
If you do that and run the program, you'll get this output. That's the kind of output you're looking for, right?
Code:
ABAB
ABAC
ABCA
ABCB
ACAB
ACAC
ACBA
ACBC
BABA
BABC
BACA
BACB
BCAB
BCAC
BCBA
BCBC
CABA
CABC
CACA
CACB
CBAB
CBAC
CBCA
CBCB
So here's the code, all 42 lines of it.
Code:
#!/usr/bin/clisp

(defparameter *character-set* "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
;(defparameter *character-set* "ABC")     ; < --- this line is for testing

(defparameter *word-length* 10)
;(defparameter *word-length* 4)           ; < --- this line is for testing

(defparameter *character-list*
   (coerce *character-set* 'list))

(defun final-char (in-string)
   (cond
      ((> (length in-string) 0)
         (elt in-string (1- (length in-string))))
      (t
         nil)))

(defun new-char-list (in-string)
   (let ((result))
      (mapcar
         (lambda (candidate)
            (cond
               ((not (eql candidate (final-char in-string)))
                  (push candidate result))))
         *character-list*)
      (nreverse result))
      )

(defun extend-string (in-string desired-length)
   (mapcar
      (lambda (new-char)
         (let ((new-string (concatenate 'string in-string (string new-char))))
            (cond
               ((>  (length new-string) desired-length))
               ((>= (length new-string) desired-length)
                  (format t "~a~%" new-string))
               (t
                  (extend-string new-string desired-length)))))
      (new-char-list in-string)))

(extend-string "" *word-length*)
Hope this helps.
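
For readers who don't have clisp installed, the same depth-first idea can be sketched in Python. This is not a translation of the Lisp program line for line, just an illustration of the approach under the same rules; the function name `extend` and its parameters are made up for this example.

```python
def extend(prefix: str, alphabet: str, length: int):
    """Yield every string of `length` that starts with `prefix` and has no equal neighbours."""
    if len(prefix) == length:
        yield prefix
        return
    for ch in alphabet:
        # Skip any candidate equal to the last character already in the prefix.
        if not prefix or ch != prefix[-1]:
            yield from extend(prefix + ch, alphabet, length)

if __name__ == "__main__":
    # Small test case from the post: character set "ABC", word length 4.
    for word in extend("", "ABC", 4):
        print(word)
```

With "ABC" and a length of 4 it prints the same 24 lines as the test listing, from ABAB to CBCB. Because it is a generator, nothing is held in memory beyond the current prefix, so the output can be redirected to a file in the same way.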

Last edited by wje_lq; 02-14-2011 at 08:06 AM. Reason: removed from the code a function I wasn't using any more
 
1 members found this post helpful.
Old 02-14-2011, 09:41 PM   #10
Stanley_212
LQ Newbie
 
Registered: Feb 2011
Posts: 6

Original Poster
Rep: Reputation: 0
Thanks very much "wje_lq" for the program.

It works perfectly.

With your help the output lists will be without those irritating CCCBBBB or EEEYYYY combinations that just don't "appear" very random.

And, no wasting time with crunch generating sequences that will ultimately get discarded.

I never would have come up with something so simple and elegant myself.

Quick test results:

Using the character set "ABCD"

word length    crunch output    nodupes output

4              256 lines        108 lines
5              1024 lines       324 lines
8              65,536 lines     8,748 lines

Wow, what a reduction.
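
The line counts quoted above can be double-checked by brute force for small cases. Here is a short Python sketch that builds every combination with `itertools.product` and counts how many have no equal neighbours; `brute_force_counts` is just an illustrative name, and this approach is only practical for small character sets and lengths.

```python
from itertools import product

def brute_force_counts(alphabet: str, length: int):
    """Return (all combinations, combinations with no equal adjacent characters)."""
    total = 0
    kept = 0
    for combo in product(alphabet, repeat=length):
        total += 1
        # Compare each character with its right-hand neighbour.
        if all(a != b for a, b in zip(combo, combo[1:])):
            kept += 1
    return total, kept

if __name__ == "__main__":
    for length in (4, 5, 8):
        total, kept = brute_force_counts("ABCD", length)
        print(length, total, kept)
```

For the character set "ABCD" this gives 256/108, 1024/324 and 65536/8748 for lengths 4, 5 and 8, matching the test results.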

Thanks again.

Stanley

I'll mark this thread as [SOLVED] - as soon as I figure out how...
 
Old 02-15-2011, 02:38 AM   #11
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
Well it has been a hell of an age since I have done (c)lisp, very nice wje_lq.
Thought I would take the challenge too
Code:
#!/usr/bin/awk -f

BEGIN{
#    set = "0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
    set = "A B C D"

    n = split(set, chars)

    printf "Enter length of name to generate: "
    getline namelen < "-"

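    # Use the entered length only if it is a positive integer;
    # otherwise n keeps the size of the character set from split()
    if(namelen ~ /^[1-9][0-9]*$/)
        n = namelen

    print_string("", chars, n)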

}

function print_string( string, array, len,      local_array, i, c)
{
    for(i in array)
        if(array[i] != substr(string,length(string)))
            local_array[++c] = array[i]

    for(i in local_array)
        if(length(string) + 1 == len)
            print string local_array[i]
        else
            print_string(string local_array[i], array, len)
}
Just change the sets at the beginning once you have finished testing
 
Old 02-16-2011, 04:17 PM   #12
Snark1994
Senior Member
 
Registered: Sep 2010
Distribution: Debian
Posts: 1,632
Blog Entries: 3

Rep: Reputation: 346
And SOLVED is under the "Thread tools" menu at the top of the page

EDIT: D'oh...
 
Old 04-04-2011, 09:36 AM   #13
jumpingj
LQ Newbie
 
Registered: Apr 2011
Posts: 6

Rep: Reputation: 0
Hi guys,

Thanks for the very nice scripts (especially wje_lq's). I am keen to run it on Mac OS 10.6; do you know the best way? I can see it has been written in clisp, but I just can't find a program for that.
Second thing: I would like to generate
10-character combinations of the following characters (lowercase) 23456789abcdef, with no character appearing 3 or more times in one line (sequence), whether side by side or not. So, let's say:

abcdef1234 accept
fabcde1234 accept
ffabcd1234 accept
which is probably a permutation with repeatable characters (where abc is not equal to cba, etc.; so to speak, position does matter)

fffabc1234 not acceptable ----- (3 same characters)
ffabcf1234 not acceptable ----- (3 same characters, even though not side by side)

So generally we don't want the same character to appear 3 times in the same line.
Would anybody kindly take up that challenge (either Linux or Mac OS scripts)?
Thanks for any help, and sorry for my bad English; I hope you can understand what I'm looking for.
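
One way to read this request is "no character may be used more than twice in a line", which is what the examples suggest. Under that assumption, here is a rough Python sketch; the names `is_acceptable` and `limited_repeats` are made up for illustration, and with 14 characters and a length of 10 the full list would still be enormous, so it is best tried on a small set first.

```python
from collections import Counter

def is_acceptable(word: str, max_count: int = 2) -> bool:
    """True if no character occurs more than `max_count` times anywhere in `word`."""
    return all(n <= max_count for n in Counter(word).values())

def limited_repeats(alphabet: str, length: int, max_count: int = 2):
    """Yield every string of `length` in which each character is used at most `max_count` times."""
    counts = Counter()

    def walk(prefix):
        if len(prefix) == length:
            yield prefix
            return
        for ch in alphabet:
            if counts[ch] < max_count:
                counts[ch] += 1          # use this character...
                yield from walk(prefix + ch)
                counts[ch] -= 1          # ...and release it when backtracking

    yield from walk("")

if __name__ == "__main__":
    # Tiny demonstration: two characters, length 3, each used at most twice.
    print(list(limited_repeats("ab", 3)))
```

The tiny demonstration prints aab, aba, abb, baa, bab and bba, leaving out aaa and bbb.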
 
Old 04-04-2011, 10:00 AM   #14
grail
LQ Guru
 
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 9,999

Rep: Reputation: 3190
@jumpingj - please don't hijack / resurrect a 2 month old question. Raise your own and if you feel the above is valid then reference it. You should also show what you are getting stuck on with either / both scripts. Remember, the idea is people are here to help, not just do the work for you.
 
Old 04-04-2011, 04:33 PM   #15
jumpingj
LQ Newbie
 
Registered: Apr 2011
Posts: 6

Rep: Reputation: 0
Sorry guys if I have asked something wrong; I didn't mean to ask anybody to do the work for me. I spotted that script (something similar to what I am looking for) and I just don't know how to use it. I am not a programmer myself, and there is probably no need to learn programming just to create one script! I only need your help to modify the "wje_lq" script and run it.
Things like:
"Just redirect standard output to a file in the normal manner" ???

and all this

I. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
II. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
III. Comment out the first definition of *word-length*.
IV. Uncomment the second definition, which uses a word length of four.

Thanks
 
  


All times are GMT -5. The time now is 05:51 AM.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Open Source Consulting | Domain Registration