[SOLVED] Generate SPECIAL alphanumeric WORDLIST - no repeating characters side-by-side
Hi,
I'm running Ubuntu 10.10 and Windows XP - getting more comfortable with Linux every day.
I'm looking for a way to generate an alphanumeric sequence in linux containing letter-number combinations up to 10 characters long without having the same letters or numbers appearing side-by-side - and save it to a text file.
I found a very useful open-source linux program for generating alphanumeric sequences called crunch.
It generates alphanumeric sequences containing all possible combinations of letters and numbers from a given character set - great if you want all possible combinations - bad if you don't want a lot of CCCCDDDDD or TTTTEEEEE.
I only want to produce an alphanumeric sequence containing different letters and numbers side-by-side.
For example,
BADELF26 - Acceptable
3H8E5E81 - Acceptable
CTFFF29E - Not Acceptable
CLE3C77N - Not Acceptable
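The rule behind these four examples can be sketched as a small bash check. This is only an illustration: the function name has_adjacent_repeat is made up for this sketch and is not part of crunch or any other tool mentioned here.

```bash
#!/bin/bash
# has_adjacent_repeat (hypothetical name): succeeds when any character
# in the word is immediately followed by the same character.
has_adjacent_repeat() {
    # \(.\)\1 is a basic-regex back-reference: "a character, then itself"
    printf '%s\n' "$1" | grep -q '\(.\)\1'
}

for word in BADELF26 3H8E5E81 CTFFF29E CLE3C77N; do
    if has_adjacent_repeat "$word"; then
        echo "$word - Not Acceptable"
    else
        echo "$word - Acceptable"
    fi
done
```

A check like this filters an existing list; it does not by itself avoid generating the unwanted sequences in the first place.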
I've looked at the crunch code and modifying it is way out of my league - and not worth the time and effort.
I think implementing nested loops would make a simple solution - just don't know how in linux - my programming skills end with arduino microcontrollers.
Any help with this matter would be greatly appreciated.
Stanley - Linux Newbie
P.S. - If there's an obvious solution to this problem out there I apologize - I have spent a lot of time searching without success.
Last edited by Stanley_212; 02-14-2011 at 10:43 PM.
Reason: Clearer Description
#!/bin/bash
ascii=()          #Array of characters we are allowed to use
index=0           #Number of characters stored in 'ascii' so far
noNames=10        #No of names to generate
nameLength=10     #Length to generate (you said 10)
for (( i=65; i<=90; i++ ))    #Add upper-case letters to 'ascii'
do
    ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
    index=$(( index + 1 ))
done
for (( i=48; i<=57; i++ ))    #Add numbers to 'ascii'
do
    ascii[$index]=$(echo $i | awk '{printf("%c",$1)}')
    index=$(( index + 1 ))
done
for (( i=0; i<noNames; i++ ))
do
    name=       #We'll store the name in here
    last=-1     #We'll store the index of the last character generated
                #here. Start at -1 ("none yet"): an empty value would
                #compare equal to 0 and stop names starting with 'A'
    for (( j=0; j<nameLength; j++ ))
    do
        num=$(( RANDOM % index ))      #Pick a random character index
        while [[ $num -eq $last ]]     #If it's the same as the last one...
        do
            num=$(( RANDOM % index ))  #... pick a new one!
        done
        last=$num                      #Update "last" to current value
        name=${name}${ascii[$num]}     #Add the correct letter to our name
    done
    echo "${name}"                     #Print name...
done > output                          #...to our output file
Last edited by Snark1994; 02-13-2011 at 06:30 PM.
Reason: Added more code comments, and formatted them nicely :)
This works great if I want to produce a few RANDOM bunches of 10 character long sequences.
However, I realized I wasn't clear enough in my post about what I was trying to generate.
I'm not just trying to generate a few sequences... oops, my mistake.
I want to generate a "list of alphanumeric sequences" containing ALL possible combinations of letters and/or numbers without having the same characters appearing side-by-side.
It would be a large list, but nowhere near as long as it would be if I included sequences with the same characters appearing side-by-side.
As I stated, I found a program that generates a so-called wordlist (not sure why they call it a word list if it doesn't contain a lot of actual "words") that generates a list of sequences including all possible combinations - which is not what I want.
First of all, I hope you realize that to send to a file all 10-character combinations of the following characters
Code:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
such that no two adjacent characters are of equal value, and placing each combination on its own line, will generate a file that's 31 petabytes long. I hope you have room for this.
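The 31 petabyte figure can be checked with a few lines of bash arithmetic. This is a rough sketch, assuming one byte per character plus one newline per line; the function name count_words is made up for the illustration. The first position can hold any of the 36 characters, and each later position can hold any character except the one before it, so there are 36 * 35^9 lines.

```bash
#!/bin/bash
# count_words SET_SIZE LENGTH (hypothetical helper): number of words of
# the given length in which no two adjacent characters are equal.
count_words() {
    local set_size=$1 length=$2 total=$1 i
    for (( i = 1; i < length; i++ )); do
        total=$(( total * (set_size - 1) ))   # each later position has one fewer choice
    done
    echo "$total"
}

words=$(count_words 36 10)
bytes=$(( words * 11 ))    # 10 characters plus a newline per line
echo "$words words, $bytes bytes"
# prints: 2837362992187500 words, 31210992914062500 bytes
```

That is about 3.1 x 10^16 bytes, or roughly 31 petabytes.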
The program listed below will generate that file. Just redirect standard output to a file in the normal manner.
No, I haven't tested it to completion. But you can test a crippled version of it by doing these four things:
Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
Comment out the first definition of *word-length*.
Uncomment the second definition, which uses a word length of four.
If you do that and run the program, you'll get this output. That's the kind of output you're looking for, right?
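For anyone without a Lisp interpreter to hand, the same reduced test (character set "ABC", word length four) can be approximated in bash. This is a separate sketch, not the program referred to above; the function name generate and its argument order are assumptions made for the illustration.

```bash
#!/bin/bash
# generate PREFIX REMAINING CHARSET (hypothetical helper): print every
# word that extends PREFIX by REMAINING characters from CHARSET without
# ever placing a character next to an identical one.
generate() {
    local prefix=$1 remaining=$2 charset=$3 i ch
    if (( remaining == 0 )); then
        printf '%s\n' "$prefix"
        return 0
    fi
    for (( i = 0; i < ${#charset}; i++ )); do
        ch=${charset:i:1}
        # Skip a character equal to the last one already placed
        if [[ -n $prefix && ${prefix: -1} == "$ch" ]]; then
            continue
        fi
        generate "$prefix$ch" $(( remaining - 1 )) "$charset"
    done
    return 0
}

generate "" 4 ABC
```

With three characters and a length of four this prints 3 * 2 * 2 * 2 = 24 lines, starting with ABAB. Recursion in bash is slow, so this is only suitable for small test sets like this one.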
Well it has been a hell of an age since I have done (c)lisp, very nice wje_lq.
Thought I would take the challenge too
Code:
#!/usr/bin/awk -f
BEGIN{
    # set = "0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z"
    set = "A B C D"
    n = split(set, chars)
    printf "Enter length of name to generate: "
    getline namelen < "-"
    if(namelen ~ /^[0-9]+$/)
        n = namelen
    print_string("", chars, n)
}
function print_string( string, array, len,    local_array, i, c)
{
    for(i in array)
        if(array[i] != substr(string,length(string)))
            local_array[++c] = array[i]
    for(i in local_array)
        if(length(string) + 1 == len)
            print string local_array[i]
        else
            print_string(string local_array[i], array, len)
}
Just change the sets at the beginning once you have finished testing
Thanks for the very nice scripts (especially wje_lq's). I am keen to run it on Mac OS X 10.6; do you know the best way? I can see it has been written in clisp, but I just can't find a program for that.
Second thing: I would like to generate:
10-character combinations of the following characters (lowercase) 23456789abcdef in which no character appears 3 or more times, whether side by side or anywhere within one line (sequence). So, let's say:
abcdef1234 accept
fabcde1234 accept
ffabcd1234 accept
which is probably a permutation with repetition (where abc is not equal to cba etc., so to speak, position does matter)
fffabc1234 not acceptable -----(3 same characters)
ffabcf1234 not acceptable -----(3 same characters, even though not side by side)
So generally we don't want 3 of the same character to appear in the same line.
Would anybody kindly take up that challenge (either Linux or Mac OS scripts)?
Thanks for any help and sorry for my bad English; I hope you can see what I'm looking for.
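The rule described in this post (no character used three or more times anywhere in a line) can be sketched as a bash check. This is a minimal illustration under that reading of the examples; the function names max_char_count and is_acceptable are invented for the sketch.

```bash
#!/bin/bash
# max_char_count WORD (hypothetical helper): how many times the most
# frequent character occurs in WORD.
max_char_count() {
    local word=$1 max=0 i ch rest n
    for (( i = 0; i < ${#word}; i++ )); do
        ch=${word:i:1}
        rest=${word//"$ch"/}              # WORD with every copy of ch removed
        n=$(( ${#word} - ${#rest} ))      # so n is the number of copies of ch
        if (( n > max )); then
            max=$n
        fi
    done
    echo "$max"
}

# is_acceptable WORD (hypothetical helper): succeeds when no character
# occurs three or more times.
is_acceptable() {
    [ "$(max_char_count "$1")" -lt 3 ]
}

for word in abcdef1234 ffabcd1234 fffabc1234 ffabcf1234; do
    if is_acceptable "$word"; then
        echo "$word accept"
    else
        echo "$word not acceptable"
    fi
done
```

As with any after-the-fact filter, this only tests words that have already been generated; it does not reduce the work of generating them.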
@jumingj - please don't hijack / resurrect a 2 month old question. Raise your own and if you feel the above is valid then reference it. You should also show what you are getting stuck on with either / both scripts. Remember, the idea is people are here to help not just do the work for you.
Sorry guys if I have asked something wrong; I didn't mean to ask anybody to do the work for me. I spotted that script (something similar to what I am looking for) and I just don't know how to use it. I am not a programmer myself, and there is probably no need to learn programming just to create one script! I only need your help to modify wje_lq's script and run it.
Things like:
"Just redirect standard output to a file in the normal manner" ???
and all this
I. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
II. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
III. Comment out the first definition of *word-length*.
IV. Uncomment the second definition, which uses a word length of four.
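On the first point, "redirect standard output to a file" just means putting > and a file name after the command, so that whatever the program would print on the screen ends up in the file instead. A small sketch in bash, where make_sample_list is a made-up stand-in for the real generator and a temporary file stands in for whatever file name you would choose:

```bash
#!/bin/bash
# make_sample_list is a hypothetical stand-in for the real generator;
# it simply prints three words, one per line.
make_sample_list() {
    printf '%s\n' ABAB ABAC ABCA
}

outfile=$(mktemp)                 # temporary file; any file name would do
make_sample_list > "$outfile"     # ">" sends standard output to the file
wc -l < "$outfile"                # count the lines that were written
```

The same > works for any program run from a terminal, on Linux and on Mac OS alike.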