LinuxQuestions.org
Linux - Newbie: This Linux forum is for members that are new to Linux.
Just starting out and have a question? If it is not in the man pages or the how-to's this is the place!

01-29-2011, 09:31 AM   #1
gvanto
Member
Registered: Oct 2009
Posts: 40
Reputation: 0

regex problem to find min/max length words in file

Hi, I am trying to find all 3- and 4-character words in my file (which is huge and has a lot of entries in it, a big fat wordlist!).

My attempt uses this regular expression, which I thought should work (I found the {n,m} length syntax here: http://www.gammon.com.au/mushclient/regexp.htm):

cat sorted_noapostrophe.txt| grep '.{3,4}'

but it returns no results. Why?

Also, how can I find any words starting with 'f' that are between 3 and 5 characters long (inclusive)?

Any help would be greatly appreciated!

gvanto
regex noob :-)

01-29-2011, 09:45 AM   #2
grail
Guru
Registered: Sep 2009
Location: Perth
Distribution: Manjaro
Posts: 7,517
Reputation: 1896
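You are sort of on the right track. The unescaped {3,4} is actually extended regular expression syntax, which means you need to use either egrep or grep -E. (In grep's default basic syntax the braces are taken literally unless you escape them as \{3,4\}, which is why your command finds nothing.)
The problem you will then find is that, since dot (.) matches any character, your pattern will also match lines like the following: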
Code:
there
th e
on3
it)
Each of these contains a run of 3 to 4 arbitrary characters, which is all the pattern asks for.
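A small sketch of the difference described above, assuming GNU grep and a hypothetical throwaway sample file (the words and the mktemp file are illustrative only, not from the thread). It shows why the original command printed nothing, and that an unanchored interval also matches inside longer lines.

```shell
#!/bin/sh
# Hypothetical sample word list, one entry per line.
sample=$(mktemp)
printf '%s\n' at the then there th3 > "$sample"

# Basic regex (BRE): unescaped braces are literal, so this looks for
# any character followed by the text "{3,4}" and finds nothing.
echo "BRE, unescaped braces:"
grep '.{3,4}' "$sample" || echo "(no match)"

# Extended regex (ERE): {3,4} is an interval. Without anchors the
# pattern can match part of a line, so "there" and "th3" show up too.
echo "ERE via -E:"
grep -E '.{3,4}' "$sample"

# The same interval in BRE needs escaped braces.
echo "BRE, escaped braces:"
grep '.\{3,4\}' "$sample"

rm -f "$sample"
```

The first command prints only "(no match)"; the other two both print the, then, there and th3, and only "at" is too short.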

01-29-2011, 09:52 AM   #3
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Reputation: 2374
Hi,

Code:
grep -owE "[[:alpha:]]{3,4}" sorted_noapostrophe.txt
Hope this helps.
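To see what each flag in the command above contributes, here is a sketch on another hypothetical sample (assumed data, GNU grep, and LC_ALL=C so the result does not depend on the locale). -w makes the match a whole word and -o prints only the matched part; -x, which the thread does not use, instead requires the pattern to match the entire line. The last command is one possible answer to the 'f' question from the first post, assuming "3 to 5 characters" means an 'f' followed by 2 to 4 more letters.

```shell
#!/bin/sh
# Force the C locale so [[:alpha:]] means plain ASCII letters here.
export LC_ALL=C

# Hypothetical sample list, one entry per line.
sample=$(mktemp)
printf '%s\n' at cat fish there well-known fog fluffy > "$sample"

# -w: whole words only; -o: print just the matched text.
# "well" is found inside "well-known" because "-" is not a word character.
echo "== -owE =="
grep -owE '[[:alpha:]]{3,4}' "$sample"

# -x: the whole line must match, which suits a one-word-per-line file.
echo "== -xE =="
grep -xE '[[:alpha:]]{3,4}' "$sample"

# Words starting with "f" that are 3 to 5 letters long:
# an "f" followed by 2 to 4 more letters.
echo "== f words =="
grep -xE 'f[[:alpha:]]{2,4}' "$sample"

rm -f "$sample"
```

With -owE the output is cat, fish, well and fog; with -xE "well" drops out, and the 'f' query returns fish and fog ("fluffy" has 6 letters).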

01-30-2011, 04:48 PM   #4
gvanto
Member
Registered: Oct 2009
Posts: 40
Original Poster
Reputation: 0
Thanks much, guys! This ended up working fine (druuna, a slight mod to yours: I need entries like 'état' too, which [[:alpha:]] doesn't return):

Code:
grep -owE ".{3,4}" sorted_noapostrophe.txt
I had actually tried grep -e before (thinking it was an extended-grep issue), but it should have been -E (plus -o and -w).

Oh, hang on, I just noticed: the above command also returns 2-letter words?! (The file contains only single words, each terminated by a newline... is the newline then also considered a character?)

This seems to work the same as druuna's: grep -owE "\w{3,4}" sorted_noapostrophe.txt, but now I don't get 'état'... it's OK.
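The thread never settles where the 2-letter words come from. grep does not count the terminating newline as part of the line, so that is not the cause. One possible explanation, assumed here rather than confirmed, is that the list has DOS-style (CRLF) line endings or trailing spaces, so an invisible extra character pads a 2-letter word up to 3. A sketch with a hypothetical two-word file, assuming GNU grep:

```shell
#!/bin/sh
# Hypothetical list saved with DOS (CRLF) line endings.
crlf=$(mktemp)
printf 'of\r\nfish\r\n' > "$crlf"

# The poster's command: "of" plus the carriage return is 3 characters,
# so it is printed; od -c makes the hidden \r visible.
grep -owE '.{3,4}' "$crlf" | od -c

# Stripping the carriage returns first leaves only real 3-4 character
# words; -x anchors the pattern to the whole line.
tr -d '\r' < "$crlf" | grep -xE '.{3,4}'

rm -f "$crlf"
```

The second command prints only "fish". If the real list shows \r in the od output, converting it once (for example with tr -d '\r') should make the original command behave as expected.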

Thanks a million!
gvanto

01-30-2011, 05:34 PM   #5
druuna
LQ Veteran
Registered: Sep 2003
Posts: 10,532
Blog Entries: 7
Reputation: 2374
Hi,

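\w is GNU grep's synonym for [_[:alnum:]], and in the C locale [[:alnum:]] is just [0-9A-Za-z], so \w matches letters, digits and the underscore. That explains why the plain words are included but état is not.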
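A quick way to check what \w covers, as a sketch assuming GNU grep (where \w is an extension documented as a synonym for [_[:alnum:]]) and forcing LC_ALL=C so the classes are plain ASCII; the sample words are hypothetical:

```shell
#!/bin/sh
# Plain ASCII character classes for a predictable result.
export LC_ALL=C

# Hypothetical 3-character entries, one per line.
sample=$(mktemp)
printf '%s\n' abc a_b ab1 a-b > "$sample"

# \w also accepts the underscore, so "a_b" matches here...
printf '%s\n' '== \w{3} =='
grep -xE '\w{3}' "$sample"

# ...but not with [[:alnum:]]. Neither pattern matches "a-b".
printf '%s\n' '== [[:alnum:]]{3} =='
grep -xE '[[:alnum:]]{3}' "$sample"

rm -f "$sample"
```

The first query prints abc, a_b and ab1; the second prints only abc and ab1.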

You might want to look into equivalence classes, but I believe each one covers only a single letter at a time, so you end up with something like this:
Code:
grep -owE "([[:alnum:]]|[[=e=]]|[[=a=]]){3,4}" sorted_noapostrophe.txt
It needs to be expanded for o, u, c, etc., but you get the idea, I hope.

BTW, the "pipes" in (..|..|..) OR the alternatives together, so any one of them can match each character.
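As an alternative to the alternation, a POSIX bracket expression may hold several classes at once, so the same set fits in one bracket: [[:alnum:][=e=][=a=]]. Whether [=e=] actually matches é depends on the locale and the regex library, so that part is left unchecked here; this sketch (hypothetical sample, GNU grep assumed) only shows that the combined form behaves like the alternation on ASCII input.

```shell
#!/bin/sh
# Hypothetical sample list, one entry per line.
sample=$(mktemp)
printf '%s\n' cafe tea at words > "$sample"

# The alternation from the post: any one of the listed classes, 3 to 4 times.
grep -owE '([[:alnum:]]|[[=e=]]|[[=a=]]){3,4}' "$sample"

# The same character set written as a single bracket expression.
grep -owE '[[:alnum:][=e=][=a=]]{3,4}' "$sample"

rm -f "$sample"
```

Both commands print cafe and tea on plain ASCII data; "at" is too short and "words" is one letter too long to be a whole 3-4 character word.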

All times are GMT -5.