Sorry, it's still not clear to me. You'll have to break it down in more detail. First, as I asked, please give us a real-life example of the text, including both lines that you want, and lines that you don't want. Put it in code tags, to keep the formatting (this would also allow us to test possible solutions).
Then, please detail exactly what criteria constitute a desired line, and what constitutes excluded lines. Break it down into simple steps or sections, if possible (e.g. each line must first have ..., then ...., but not ....), with examples. And please separate your points with more whitespace. The solid blocks of text you're using are hard to read.
Also, let's make sure your terms are correct.
Subsequent means "following" or "coming after". If you have "AB CD", then "CD" is subsequent to "AB".
Consecutive means "in a continuous, unbroken string". "AAACCC" is three consecutive "A"s followed by three consecutive "C"s.
I do hope you realize that a chain of greps like you posted causes each one to filter the output of the previous command. It doesn't directly analyze the sequence inside each line.
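To see the difference on a small scale, here is a rough sketch using two made-up input lines (not taken from your data); the function names are only for illustration. The chained greps accept both lines, because each grep only asks "does this line contain X anywhere?", while a single regex also enforces the order:

```shell
# Made-up sample: letters before the digit, and the digit before the letters.
sample() { printf 'abc1\n1abc\n'; }

# Chained greps: each one filters whole lines from the previous output.
chained() { sample | grep -E '[a-z]{3}' | grep -E '[0-9]'; }

# Single regex: three letters must be immediately followed by a digit.
single() { sample | grep -E '[a-z]{3}[0-9]'; }

echo "chained:"; chained   # prints both abc1 and 1abc
echo "single:";  single    # prints only abc1
```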
For example, this appears to be what your grep commands do now (and actually, you should be using egrep/grep -E):
Input file (file.txt):
Code:
bcddddeffghklmnn0aezxwsw
dvq7umsylnrfzdd2qgmgofmt
wgammjawtnedivjxpgzcynx9
qicqulsmcrbmuampwatk7hih
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
1) Your first grep matches and prints out lines containing 16 or more consecutive lowercase consonants (with nothing anchoring the pattern, the upper limit of 24 has no effect):
Code:
$ egrep '[bcdfghjklmnpqrstvzxyw]{16,24}' file.txt
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
Notice that only the last three lines that I added match, because only they have 16+ consecutive consonants. None of the strings you gave above match this rule.
2) From the output of the last grep, match 0-6 consecutive vowels:
Code:
$ ...| egrep '[aeiou]{0,6}'
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
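All three previous lines match, but probably not in the way you want them to. Because the minimum count is zero, the pattern is satisfied by any line at all, vowels or not. The last one actually has two separate runs of vowels ("aei" and "a").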
3) Finally, find strings of 0-6 digits from the previous output:
Code:
$ ...| egrep '[0123456789]{0,6}'
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
Again, all three match, but the first one matches because it has zero digits in it, and the last one again has more than one run of digits ("12" and "01").
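If the zero-minimum behaviour seems surprising, a quick check along these lines may help. It is only a sketch with made-up input (an empty line and a line with no vowels), and the function names are hypothetical:

```shell
# Made-up input: one empty line, one line containing no vowels at all.
sample() { printf '\nxyz\n'; }

# Minimum of 0: an empty match is allowed, so every line counts as a match.
zero_min_count() { sample | grep -cE '[aeiou]{0,6}'; }

# Minimum of 1: at least one vowel is required, so neither line matches.
one_min_count() { sample | grep -cE '[aeiou]{1,6}'; }

echo "lines matching {0,6}: $(zero_min_count)"   # 2
echo "lines matching {1,6}: $(one_min_count)"    # 0
```

So if a section is really required, give it a minimum of at least 1.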
It seems to me that what you really want is a context-sensitive match, with each section depending on what comes before it in the string. Now, if you can explain exactly what the pattern for a single line should be, then perhaps you can build it into a single regex. That is, if you wanted something like the following:
[16-24 consonants] followed by
[0-6 vowels] followed by
[0-6 digits]
Then a single grep like this would match the above text like so:
Code:
$ egrep '[bcdfghjklmnpqrstvzxyw]{16,24}[aeiou]{0,6}[0123456789]{0,6}' file.txt
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
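To check which part of each line the combined regex actually matched, the -o option can help. It is an extension (available in GNU and BSD grep, as far as I know) rather than something every grep has. A small sketch, feeding in the three matching lines directly; the function names are just for illustration:

```shell
# The three lines from file.txt that match the combined pattern.
sample_lines() {
    printf '%s\n' \
        'bcdfghjklmnpqrstvwxaeiou' \
        'bcdfghjklmnpqrst234aeiou' \
        'aei12bcdfghjklmnpqrst01a'
}

# -o prints only the matched portion of each line instead of the whole line.
show_matches() {
    grep -oE '[bcdfghjklmnpqrstvzxyw]{16,24}[aeiou]{0,6}[0123456789]{0,6}'
}

sample_lines | show_matches
# bcdfghjklmnpqrstvwxaeiou
# bcdfghjklmnpqrst234
# bcdfghjklmnpqrst01
```

Notice that in the second and third lines the vowel section matched nothing at all, which is allowed because its minimum is zero.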
But this still wouldn't fulfill your final requirement of no more than 15 different consonants. That's not something grep/regex can do on its own. You'd need some kind of function to go through the string and count the number of different characters in it, then test that number for compliance.
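As a rough idea of what that counting step could look like, here is a sketch in plain shell. It assumes the limit applies to the distinct consonants in the whole line (your rule may differ), it relies on grep -o (a GNU/BSD extension), and the function names and the first test string are made up for the example:

```shell
consonants='bcdfghjklmnpqrstvwxyz'

# Print how many different consonants occur anywhere in the given string.
count_distinct_consonants() {
    printf '%s\n' "$1" | grep -o "[$consonants]" | sort -u | wc -l | tr -d ' '
}

# Read lines on stdin, keep those matching the combined regex, then keep
# only those with at most $1 (default 15) different consonants.
filter_lines() {
    max=${1:-15}
    grep -E "[$consonants]{16,24}[aeiou]{0,6}[0-9]{0,6}" |
    while IFS= read -r line; do
        if [ "$(count_distinct_consonants "$line")" -le "$max" ]; then
            printf '%s\n' "$line"
        fi
    done
}

# First line: 16 consecutive consonants but only 4 different ones (kept).
# Second line: 19 different consonants (dropped).
printf '%s\n' 'bbbbccccddddffffae12' 'bcdfghjklmnpqrstvwxaeiou' | filter_lines 15
# bbbbccccddddffffae12
```

This starts several processes per line, so it is only reasonable for small files.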
So I think that you really need to do as jhwilliams suggested and use a real lexical parser, something that can analyze the whole string in context according to your desired rules. Or at least to use a full-featured text-processing language like perl. What you want is probably too complex for a few simple grep commands.