Generate SPECIAL alphanumeric WORDLIST - no repeating characters side-by-side
I'm running Ubuntu 10.10 and Windows XP - getting more comfortable with Linux every day.
I'm looking for a way to generate an alphanumeric sequence in linux containing letter-number combinations up to 10 characters long without having the same letters or numbers appearing side-by-side - and save it to a text file.
I found a very useful open-source linux program for generating alphanumeric sequences called crunch.
It generates alphanumeric sequences containing all possible combinations of letters and numbers from a given character set - great if you want all possible combinations - bad if you don't want a lot of CCCCDDDDD or TTTTEEEEE.
I only want to produce an alphanumeric sequence containing different letters and numbers side-by-side.
BADELF26 - Acceptable
3H8E5E81 - Acceptable
CTFFF29E - Not Acceptable
CLE3C77N - Not Acceptable
I've looked at the crunch code and modifying it is way out of my league - and not worth the time and effort.
I think implementing nested loops would make a simple solution - just don't know how in linux - my programming skills end with arduino microcontrollers.
Any help with this matter would be greatly appreciated.
Stanley - Linux Newbie
P.S. - If there's an obvious solution to this problem out there I apologize - I have spent a lot of time searching without success.
I think this bash script should work:
Here is an all awk one:
'grep' can remove the unwanted strings from the output of 'crunch'
I ran it a few times and it works correctly...
output test results:
This works great if I want to produce a few RANDOM bunches of 10 character long sequences.
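For reference, random generation of this kind can be sketched in a few lines of Python. This is only an illustration, not the script posted earlier in the thread; the character set (uppercase letters plus digits) and the name `random_sequence` are assumptions.

```python
import random
import string

# Assumed character set: uppercase letters plus digits (not stated in the thread).
CHARSET = string.ascii_uppercase + string.digits


def random_sequence(length, charset=CHARSET, rng=random):
    """Return a random string in which no two neighbouring characters match."""
    if length > 1 and len(set(charset)) < 2:
        raise ValueError("need at least two distinct characters")
    result = []
    for _ in range(length):
        # Exclude only the previous character; everything else is allowed.
        choices = [c for c in charset if not result or c != result[-1]]
        result.append(rng.choice(choices))
    return "".join(result)


if __name__ == "__main__":
    # Print a handful of 10-character samples.
    for _ in range(5):
        print(random_sequence(10))
```

Each run prints different samples, which is why this approach cannot produce a complete list.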
However, I realized I wasn't clear enough in my post about what I was trying to generate.
I'm not just trying to generate a few sequences... oops, my mistake.
I want to generate a "list of alphanumeric sequences" containing ALL possible combinations of letters and/or numbers without having the same characters appearing side-by-side.
It would be a large list, but nowhere near as long as it would be if I included sequences with the same characters appearing side-by-side.
As I stated, I found a program that generates a so-called wordlist (not sure why they call it a word list if it doesn't contain a lot of actual "words") that generates a list of sequences including all possible combinations - which is not what I want.
Unfortunately, it's not as simple a task as I had described.
I have looked for a solution for a while with no success - you would think this was done before?
I think a series of nested loops would be necessary to generate all possibilities.
P.S. I'd have no hesitation making a paypal donation to somebody's blog or charity.
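The nested-loops idea can be sketched with recursion, which behaves like one loop per character position. The following Python is a minimal sketch under assumptions, not any poster's program: the function name `no_adjacent_repeats` and the small default character set "ABCD" with word length 4 are chosen here only to keep a trial run short.

```python
import sys


def no_adjacent_repeats(charset, length):
    """Yield every string of `length` over `charset` with no equal neighbours."""

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        for ch in charset:
            # Skip the branch as soon as it would repeat the previous character,
            # so rejected sequences are never built in the first place.
            if prefix and prefix[-1] == ch:
                continue
            yield from extend(prefix + ch)

    yield from extend("")


if __name__ == "__main__":
    # Hypothetical defaults: a tiny character set and short words for a quick test.
    charset = sys.argv[1] if len(sys.argv) > 1 else "ABCD"
    length = int(sys.argv[2]) if len(sys.argv) > 2 else 4
    for word in no_adjacent_repeats(charset, length):
        print(word)
```

Saved under a name of your choosing, the output can be sent to a text file with ordinary shell redirection (`>`). To cover every length up to 10, the function would be called once per length.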
thanks grail too...
Thanks Kenhelm for the idea.
You suggested piping crunch results into grep:
crunch [CRUNCH_OPTIONS] | grep -Ev '(.)\1'
Can you explain the grep option above - I don't quite understand it.
My grep reference:
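The parentheses capture a group that can be used as a backreference, and the -v option inverts the match, so grep prints only the lines that don't contain any character '.' immediately followed by itself '\1'. The -E option enables extended regular expressions.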
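The same pattern can be tried outside grep. Below is a minimal Python sketch that applies `(.)\1` to the four example strings from the first post; the helper name `keep` is an assumption made for illustration.

```python
import re

# Same pattern as in the grep command: capture any one character, then
# require that very same character again via the backreference \1.
ADJACENT_REPEAT = re.compile(r"(.)\1")


def keep(line):
    """True when no character in the line is immediately followed by itself."""
    return ADJACENT_REPEAT.search(line) is None


if __name__ == "__main__":
    for sample in ("BADELF26", "3H8E5E81", "CTFFF29E", "CLE3C77N"):
        print(sample, "kept" if keep(sample) else "dropped")
```

The first two samples are kept and the last two are dropped (because of FFF and 77), which is what `grep -Ev` does line by line to the crunch output.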
First of all, I hope you realize that to send to a file all 10-character combinations of the following characters
The program listed below will generate that file. Just redirect standard output to a file in the normal manner.
No, I haven't tested it to completion. But you can test a crippled version of it by doing these four things:
Thanks very much "wje_lq" for the program.
It works perfectly.
With your help the output lists will be without those irritating CCCBBBB or EEEYYYY combinations that just don't "appear" very random.
And, no wasting time with crunch generating sequences that will ultimately get discarded.
I never would have come up with something so simple and elegant myself.
Quick test results:
Using the character set "ABCD"
word length    crunch output    nodupes output
4              256 lines        108 lines
5              1,024 lines      324 lines
8              65,536 lines     8,748 lines
Wow, what a reduction.
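These counts agree with a simple formula: with k characters and word length n, the first position can be any of k characters and each later position any of the k - 1 characters that differ from its left neighbour, giving k * (k - 1)^(n - 1) sequences against k^n for the full list. A small Python sketch to check this, with function names that are assumptions for illustration:

```python
from itertools import product


def count_all(k, n):
    """Number of sequences of length n over k characters (the full list)."""
    return k ** n


def count_no_adjacent(k, n):
    """Number of sequences of length n with no two equal neighbours."""
    return k * (k - 1) ** (n - 1) if n > 0 else 1


def brute_force(charset, n):
    """Count the same thing the slow way, by filtering every combination."""
    return sum(
        1
        for word in product(charset, repeat=n)
        if all(a != b for a, b in zip(word, word[1:]))
    )


if __name__ == "__main__":
    for n in (4, 5, 8):
        print(n, count_all(4, n), count_no_adjacent(4, n))
```

The ratio between the two counts is (k / (k - 1))^(n - 1), so the saving grows with word length and shrinks as the character set gets larger.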
I'll mark this thread as [SOLVED] - as soon as I figure out how...
Well it has been a hell of an age since I have done (c)lisp, very nice wje_lq.
Thought I would take the challenge too :)
And SOLVED is under the "Thread tools" menu at the top of the page :)
Thanks for the very nice scripts (especially wje_lq's). I am keen to run it on Mac OS 10.6; do you know the best way? I can see it has been written in clisp, but I just can't find a program for that.
Second thing: I would like to generate:
10-character combinations of the following characters (lowercase) 23456789abcdef in which no character occurs 3 or more times, whether side by side or anywhere else within one line (sequence).
This is probably a permutation with repetition (abc is not equal to cba, etc., so to speak; position does matter). So let's say:
fffabc1234 not acceptable ----- (3 same characters)
ffabcf1234 not acceptable ----- (3 same characters, even though not side by side)
So generally we don't want the same character to appear 3 times in one line.
Would anybody kindly take up that challenge (either Linux or Mac OS scripts)?
Thanks for any help and sorry for my bad English; I hope you can understand what I'm looking for.
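One possible reading of this request is "no character may occur three or more times anywhere in a line", which is what the two rejected examples suggest. A minimal Python sketch under that assumption follows; the names `limited_repeats` and `is_acceptable` and the `max_count` parameter are hypothetical, and the full list for 14 characters and length 10 would be extremely large, so the demo prints only the first few lines.

```python
from collections import Counter
from itertools import islice


def is_acceptable(word, max_count=2):
    """True when no character occurs more than `max_count` times in `word`."""
    return all(n <= max_count for n in Counter(word).values())


def limited_repeats(charset, length, max_count=2):
    """Yield every string of `length` in which each character is used at most
    `max_count` times, regardless of position."""
    counts = {ch: 0 for ch in charset}  # uses so far in the current prefix

    def extend(prefix):
        if len(prefix) == length:
            yield prefix
            return
        for ch in counts:
            if counts[ch] >= max_count:
                continue  # this character is already used up
            counts[ch] += 1
            yield from extend(prefix + ch)
            counts[ch] -= 1  # undo before trying the next character

    yield from extend("")


if __name__ == "__main__":
    # Character set and length taken from the request above.
    for word in islice(limited_repeats("23456789abcdef", 10), 5):
        print(word)
```

Because position matters, abc and cba are produced as separate lines. The limit is enforced while each line is being built, so no rejected lines are generated and then thrown away.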
@jumingj - please don't hijack / resurrect a 2 month old question. Raise your own and if you feel the above is valid then reference it. You should also show what you are getting stuck on with either / both scripts. Remember, the idea is people are here to help not just do the work for you.
Sorry guys if I have asked something wrong; I didn't mean to ask anybody to do the work for me. I spotted that script (something similar to what I am looking for) and I just don't know how to use it. I am not a programmer myself and probably don't need to learn programming just to create one script! I only need your help to modify the "wje_lq" script and run it.
Things like:
"Just redirect standard output to a file in the normal manner" ???
and all this
I. Comment out the first definition of *character-set*, by adding a semicolon at the beginning of the line.
II. Uncomment the second definition, which just uses "ABC", by removing the semicolon from the beginning of the line.
III. Comment out the first definition of *word-length*.
IV. Uncomment the second definition, which uses a word length of four.