LinuxQuestions.org
Old 10-26-2011, 03:41 PM   #1
ut0ugh1
Member
 
Registered: Oct 2011
Posts: 59

Rep: Reputation: Disabled
conditional greping


i would obtain a filter with multiple conditional greping some like:
echo -n "whatever24characterlong" | grep '[bcdfghjklmnpqrstvzxyw]\{16,24\}'| grep '[aeiou]\{0,6\}'| grep '[0123456789]\{0,6\}'|
but i would like to exclude more than 4 subsequent vowels, 4 subsequent consonants, 3 subsequent numbers and exclude words with more than 15 different consonants. can u help me, plz.thx.
 
Old 10-26-2011, 03:49 PM   #2
jhwilliams
Senior Member
 
Registered: Apr 2007
Location: Portland, OR
Distribution: Debian, Android, LFS
Posts: 1,168

Rep: Reputation: 211
You do not at any point ask a question, and your writing is poorly legible, making it difficult to respond adequately.

Look at lex if you're doing regexes more seriously than a string of greps can provide.

http://en.wikipedia.org/wiki/Lex_(so..._of_a_lex_file
 
Old 10-26-2011, 04:36 PM   #3
ut0ugh1
Member
 
Registered: Oct 2011
Posts: 59

Original Poster
Rep: Reputation: Disabled
i would drop out all 24 long words from a file with multiple grep as above:
grep '[bcdfghjklmnpqrstvzxyw]\{16,24\}'| grep '[aeiou]\{0,6\}'| grep '[0123456789]\{0,6\}'|
but i would like to exclude more than 4 subsequent vowels, 4 subsequent consonants, 3 subsequent numbers and exclude words with more than 15 different consonants.
so acceptable words would have from 16 to 24 consonants, 0 to 6 vowels, 0 to 6 numbers, not more than 4 subsequent vowels, 4 subsequent consonants, 3 subsequent numbers and exclude words with more than 15 different consonants.
accepted ex.:
bcddddeffghklmnn0aezxwsw
i am an almost completely newbie. thx 1more time.

Last edited by ut0ugh1; 10-26-2011 at 06:17 PM.
 
Old 10-26-2011, 05:46 PM   #4
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
First of all, please use [code][/code] tags around your code, to preserve formatting and to improve readability.

Second, please give us a real-life example of the input text, and what kind of output you want from it. Also, could you explain your purpose for wanting to do this, so we can understand the context better?

And what exactly is the problem you're having with the code you have already?

Your first grep pattern in particular seems off to me. Does your input really have lines with strings of 16-24 consecutive consonants in them?

Last edited by David the H.; 10-26-2011 at 05:50 PM. Reason: minor mod
 
Old 10-26-2011, 06:34 PM   #5
ut0ugh1
Member
 
Registered: Oct 2011
Posts: 59

Original Poster
Rep: Reputation: Disabled
Code:
| grep '[bcdfghjklmnpqrstvzxyw]\{16,24\}'| grep '[aeiou]\{0,6\}'| grep '[0123456789]\{0,6\}'|
sorry by "not more than 4 subsequent vowels, 4 subsequent consonants, 3 subsequent numbers and exclude words with more than 15 different consonants." i mean "not more than 4 same subsequent vowels (eg.: not xxxxaaaaaxxxxxxxxxxxxxxx, xxxxxxxxxxeeeeexxxxxxxxx), 4 same subsequent consonants (eg: not xxbbbbbxxxxxxxxxxxxxxxxx, xxxxxxxxxxxxxxxzzzzzxxxx), 3 same subsequent numbers (eg: xxxxxxxxxx1111xxxxxxxxxx, xxxxxxxxxxxxxxxxxxxx0000) and exclude words with more than 15 different consonants (eg: not 1bcdfufghlmnkpkqarrstv).
accepted eg.:
bcddddeffghklmnn0aezxwsw
dvq7umsylnrfzdd2qgmgofmt
wgammjawtnedivjxpgzcynx9
qicqulsmcrbmuampwatk7hih
 
Old 10-27-2011, 12:05 PM   #6
David the H.
Bash Guru
 
Registered: Jun 2004
Location: Osaka, Japan
Distribution: Arch + Xfce
Posts: 6,852

Rep: Reputation: 2037
Sorry, it's still not clear to me. You'll have to break it down in more detail. First, as I asked, please give us a real-life example of the text, including both lines that you want, and lines that you don't want. Put it in code tags, to keep the formatting (this would also allow us to test possible solutions).

Then, please detail exactly what criteria constitute a desired line, and what constitutes excluded lines. Break it down into simple steps or sections, if possible (e.g. each line must first have ..., then ...., but not ....), with examples. And please separate your points with more whitespace. The solid blocks of text you're using are hard to read.

Also, let's make sure your terms are correct. Subsequent means "following", or "coming after". If you have "AB CD", then "CD" is subsequent to "AB". Consecutive means "in a continuous, unbroken string". "AAACCC" is three consecutive "A"s followed by three consecutive "C"s.


I do hope you realize that a chain of greps like you posted causes each one to filter the output of the previous command. It doesn't directly analyze the sequence inside each line.

For example, this appears to be what your grep commands do now (and actually, you should be using egrep/grep -E):

Input file (file.txt):
Code:
bcddddeffghklmnn0aezxwsw
dvq7umsylnrfzdd2qgmgofmt
wgammjawtnedivjxpgzcynx9
qicqulsmcrbmuampwatk7hih
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
1) Your first grep matches and prints out strings of 16-24 consecutive lowercase consonants:

Code:
$ egrep '[bcdfghjklmnpqrstvzxyw]{16,24}' file.txt
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
Notice that only the last three lines that I added match, because only they have 16+ consecutive consonants. None of the strings you gave above match this rule.

2) From the output of the last grep, match 0-6 consecutive vowels:

Code:
$ ...| egrep '[aeiou]{0,6}'
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
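All three previous lines match, but probably not in the way you want them to. Because the lower bound is 0, the pattern is also satisfied by an empty match, so any line at all would pass. The last line also contains two separate vowel runs ("aei" and "a").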

3) Finally, find strings of 0-6 digits from the previous output:
Code:
$ ...| egrep '[0123456789]{0,6}'
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
Again, all three match, but the first one matches even though it has zero digits in it, and the last one again has multiple matches ("12" and "01").
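
To see why a lower bound of 0 filters nothing, here is a small sketch. The two sample strings are made up for illustration. A quantifier such as {0,6} can match the empty string, so grep treats every line as a match. One possible way to actually cap a run is to invert the test and reject lines that contain a run that is too long.

```shell
# A lower bound of 0 lets the pattern match the empty string,
# so even a line with no digits at all is printed.
printf '%s\n' 'nodigitshere' 'abc1234567' | grep -E '[0-9]{0,6}'

# One way to cap a run instead: use -v to drop every line that
# contains 7 or more consecutive digits.
printf '%s\n' 'nodigitshere' 'abc1234567' | grep -Ev '[0-9]{7}'
```

The first command prints both lines. The second prints only the line without a seven-digit run.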


It seems to me that what you really want is a context-sensitive match, with each section depending on what comes before it in the string. Now if you could explain exactly what a single line pattern should be then perhaps you can build it into a single regex. That is, if you wanted something like the following:

[16-24 consonants] followed by [0-6 vowels] followed by [0-6 digits]

Then a single grep like this would match the above text like so:

Code:
$ egrep '[bcdfghjklmnpqrstvzxyw]{16,24}[aeiou]{0,6}[0123456789]{0,6}' file.txt
bcdfghjklmnpqrstvwxaeiou
bcdfghjklmnpqrst234aeiou
aei12bcdfghjklmnpqrst01a
But this still wouldn't fulfill your final requirement of no more than 15 different consonants. That's not something grep/regex can do on its own. You'd need some kind of function to go through the string and count the number of different characters in it, then test that number for compliance.
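
A rough sketch of such a counting step, staying in the shell: split the string into one consonant per line, drop the repeats, and count what is left. The function name count_distinct_consonants is made up for this example, and grep -o is a GNU/BSD extension rather than POSIX, so treat this as one possible approach only.

```shell
# Hypothetical helper: print how many different lowercase consonants
# appear in the string given as the first argument.
count_distinct_consonants() {
    printf '%s\n' "$1" |
        grep -o '[bcdfghjklmnpqrstvwxyz]' |  # one consonant per line
        sort -u |                            # drop repeats
        wc -l |
        tr -d ' '                            # some wc versions pad with spaces
}

count_distinct_consonants 'bcddddeffghklmnn0aezxwsw'
count_distinct_consonants '1bcdfufghlmnkpkqarrstv'
```

The printed number can then be compared against the limit with an ordinary test such as [ "$n" -le 15 ].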

So I think that you really need to do as jhwilliams suggested and use a real lexical parser, something that can analyze the whole string in context according to your desired rules. Or at least to use a full-featured text-processing language like perl. What you want is probably too complex for a few simple grep commands.
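
For what it's worth, if post #5 is read as "totals per character class, plus a cap on runs of one repeated character" (an assumption, since the rules were never fully pinned down), some of the pieces can still be sketched with standard tools. Below, tr -cd counts how many characters of a class a string has in total, adjacent or not, and a basic-regex backreference detects five of the same letter in a row. Both function names are invented for illustration.

```shell
# Hypothetical helper: total number of characters in "$1" that belong
# to the tr set "$2", whether or not they are adjacent.
count_class() {
    printf '%s' "$1" | tr -cd "$2" | wc -c | tr -d ' '
}

# Hypothetical helper: succeed only if no lowercase letter occurs
# 5 or more times in a row. \(...\) captures one letter and
# \1\{4\} requires four more copies of that same letter.
no_long_letter_run() {
    ! printf '%s\n' "$1" | grep -q '\([a-z]\)\1\{4\}'
}

count_class 'dvq7umsylnrfzdd2qgmgofmt' 'aeiou'   # vowels in total
count_class 'dvq7umsylnrfzdd2qgmgofmt' '0-9'     # digits in total
no_long_letter_run 'abbbbbcd' || echo 'rejected: repeated letter run'
```

This only covers two of the rules. Combining every condition into one readable filter is where a scripting language would likely be more comfortable than a pipeline.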
 
Old 10-27-2011, 01:31 PM   #7
ut0ugh1
Member
 
Registered: Oct 2011
Posts: 59

Original Poster
Rep: Reputation: Disabled
i am a newbie so tell me how to obtain from alphanumeric 24 character long words, words such as
dvq7umsylnrfzdd2qgmgofmt
wgammjawtnedivjxpgzcynx9
qicqulsmcrbmuampwatk7hih
with grep or sed. thx
 
Old 10-27-2011, 02:12 PM   #8
crabboy
Senior Member
 
Registered: Feb 2001
Location: Atlanta, GA
Distribution: Slackware
Posts: 1,821

Rep: Reputation: 121
Seems to me like you are still up to no good, but this time omitting your intent.

http://www.linuxquestions.org/questi...number-910006/

Closing thread.
 
  

