LinuxQuestions.org
10-09-2013, 01:43 PM   #1
thegloaming
LQ Newbie
 
Registered: Jul 2013
Posts: 5

Rep: Reputation: Disabled
Match specific number of character sets using regexp in find


Hi,

I am trying to find all of the .php files in a specific directory whose names are 3-6 letters long (the 3-6 letters exclude the .php).

I've tried the following along with many variations of this:

find . -maxdepth 1 -regextype posix-egrep -regex ".*[a-zA-Z]\{9-12\}.php"

(9-12 should match the entire filename including the .php and the ./ at the start of the path, i.e. 3-6 letters + 4 for .php + 2 for ./, which I believe is correct :S)

Even without the .php on the end, I can't get my command to return any files at all, whatever number I set in the curly brackets. I was using the default regex type, but after extensive googling I thought the regex type might need to be changed.

Please help
 
10-09-2013, 02:53 PM   #2
Keith Hedger
Senior Member
 
Registered: Jun 2010
Location: Wiltshire, UK
Distribution: Linux From Scratch, Slackware64, Partedmagic
Posts: 2,252

Rep: Reputation: 559
Try
Code:
find /path/to/folder -iname "*.php"
 
10-09-2013, 02:53 PM   #3
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,483

Rep: Reputation: 411
With this InFile ...
Code:
1.php
12.php
123.php
1234.php
12345.php
123456.php
1234567.php
12345678.php
... this awk ...
Code:
awk '{if (length($0)>6 && length($0)<11) print}' "$InFile" >"$OutFile"
... produced this OutFile ...
Code:
123.php
1234.php
12345.php
123456.php
Daniel B. Martin
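Daniel's length test can also be run on real filenames instead of a list file. The sketch below makes a few assumptions of its own: the helper name short_php_names is hypothetical, a shell glob supplies bare names from the current directory (so there is no leading ./ to count), and a letters-only regex is added for the no-numbers requirement the original poster gives later in the thread. A kept name is 3 to 6 letters plus the 4 characters of .php, so 7 to 10 characters in all.

```shell
#!/bin/sh
# short_php_names: hypothetical helper name, not a standard command.
# Prints the .php files in the current directory whose names are ASCII
# letters only and 7 to 10 characters long including ".php",
# i.e. 3 to 6 letters + the 4 characters of ".php".
short_php_names() {
    # The glob gives bare names (no leading "./"), printed one per line.
    # If no .php file exists, the literal "*.php" fails the letters-only test.
    printf '%s\n' *.php |
        awk '/^[A-Za-z]+\.php$/ && length($0) >= 7 && length($0) <= 10'
}
```

Run it from the directory to be searched, e.g. cd /path/to/folder && short_php_names. Names containing newlines would be split by the one-name-per-line listing, which this sketch ignores.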
 
10-09-2013, 02:55 PM   #4
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,483

Rep: Reputation: 411
With this InFile ...
Code:
1.php
12.php
123.php
1234.php
12345.php
123456.php
1234567.php
12345678.php
... these seds ...
Code:
sed -n '/^.\{7\}/p' "$InFile" | sed -n '/^.\{11\}/!p' >"$OutFile"
... produced this OutFile ...
Code:
123.php
1234.php
12345.php
123456.php
Daniel B. Martin
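The two sed passes (keep lines of at least 7 characters, then drop lines of at least 11) amount to a single bounded interval. A minimal sketch, assuming a POSIX sed and using the hypothetical helper name keep_7_to_10, fed with the sample InFile lines from the post:

```shell
#!/bin/sh
# keep_7_to_10: hypothetical helper name. Reads lines on stdin and prints
# only those 7 to 10 characters long (a 3 to 6 character name + ".php").
# Uses POSIX basic regex intervals, the same \{ \} style as in the post.
keep_7_to_10() {
    sed -n '/^.\{7,10\}$/p'
}

# The sample InFile lines from the post.
printf '%s\n' 1.php 12.php 123.php 1234.php 12345.php 123456.php \
    1234567.php 12345678.php | keep_7_to_10
```

It should print the same four lines as the OutFile above, in one pass instead of two.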
 
10-09-2013, 03:05 PM   #5
thegloaming
LQ Newbie
 
Registered: Jul 2013
Posts: 5

Original Poster
Rep: Reputation: Disabled
Appreciate the responses.

Is anyone aware of a way for me to do it with regular expressions in a find command? The filenames must not contain any numbers.

Just for my own education:-)
 
10-09-2013, 04:02 PM   #6
Keith Hedger
Senior Member
 
Registered: Jun 2010
Location: Wiltshire, UK
Distribution: Linux From Scratch, Slackware64, Partedmagic
Posts: 2,252

Rep: Reputation: 559
find's support for regexes is not good; try piping a basic find to grep or sed, like so:
Code:
find -iname "*.php"|grep ...
This gives grep only the files ending in .php, and you can then use grep's or sed's regex handling, which is MUCH better than find's.
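A concrete version of the find | grep pipeline, as a sketch: it assumes a find that supports -maxdepth (GNU and BSD find do) and a grep with -E, and list_short_php is a hypothetical helper name. It uses -name rather than -iname so that find and the case-sensitive .php in the grep pattern agree.

```shell
#!/bin/sh
# list_short_php: hypothetical helper name, not a standard command.
# Assumes find supports -maxdepth (GNU and BSD find do) and grep supports -E.
list_short_php() {
    # Subshell, so the caller's working directory is left alone.
    # find prints paths as "./name"; grep keeps "./" + 3 to 6 ASCII letters
    # + ".php", anchored so nothing may come before or after.
    (cd "${1:-.}" &&
        find . -maxdepth 1 -type f -name '*.php' |
            grep -E '^\./[A-Za-z]{3,6}\.php$')
}
```

list_short_php /path/to/folder prints matches such as ./abc.php; with no argument it searches the current directory.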
 
10-09-2013, 04:05 PM   #7
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Arch
Posts: 3,013

Rep: Reputation: 1225
Code:
find . -maxdepth 1 -regextype posix-egrep -regex ".*[a-zA-Z]\{9-12\}.php"
The posix-egrep regextype doesn't use backslashes for intervals (and the bounds are separated by a comma, not a dash):

Code:
find . -maxdepth 1 -regextype posix-egrep -regex ".*[a-zA-Z]{9,12}.php"
Quote:
(9-12 should match the entire filename including the .php and the ./ at the start of the path
The {9,12} only applies to the [a-zA-Z] part.
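The backslashed braces in the first post's \{9-12\} are POSIX basic regex syntax. Assuming GNU find (findutils), whose -regextype accepts both posix-basic and posix-egrep, the same 3-to-6-letter filter can be written in either dialect; the helper names below are hypothetical. Since -regex must match the whole path, the ./ prefix is spelled out, and the dot before php is escaped so it only matches a literal dot.

```shell
#!/bin/sh
# Assumes GNU find (findutils), which provides -regextype.
# Both hypothetical helpers search the current directory for regular files
# named "./" + 3 to 6 ASCII letters + ".php". find's -regex must match the
# whole path, so no ^ or $ anchors are needed.

# POSIX basic syntax: interval braces are written with backslashes.
short_php_basic() {
    find . -maxdepth 1 -regextype posix-basic -type f \
        -regex '\./[a-zA-Z]\{3,6\}\.php'
}

# POSIX extended (egrep) syntax: plain braces, comma between the bounds.
short_php_egrep() {
    find . -maxdepth 1 -regextype posix-egrep -type f \
        -regex '\./[a-zA-Z]{3,6}\.php'
}
```

Both should list the same files; which one to use is only a matter of which escaping style you find easier to read.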
 
10-09-2013, 05:54 PM   #8
thegloaming
LQ Newbie
 
Registered: Jul 2013
Posts: 5

Original Poster
Rep: Reputation: Disabled

Hey ntubski

Thanks for the tips, but the command still didn't pick up the character length parameter. Should I just pipe to sed or awk?
 
10-09-2013, 05:59 PM   #9
chrism01
LQ Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.8, Centos 5.10
Posts: 17,240

Rep: Reputation: 2324
The 1st post said 3-6 letters + '.php', so I'd expect something like
Code:
[a-zA-Z]{3,6}\.php

# more limiting, anchored at both ends (extended regex notation, as in sed -E or perl)
/^[a-zA-Z]{3,6}\.php$/
Untested
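To see why the anchored form is "more limiting", here is a small comparison with grep -E on some hypothetical sample names (grep -E uses the same extended notation as sed -E). Without ^ and $, a match anywhere in the line is enough.

```shell
#!/bin/sh
# Hypothetical sample names, one per line.
sample_names() {
    printf '%s\n' abc.php abcdefghij.php abc.php.bak ab.php
}

# Unanchored: a match anywhere in the line is enough, so abcdefghij.php
# (through its tail "efghij.php") and abc.php.bak also get through.
unanchored() { sample_names | grep -E '[a-zA-Z]{3,6}\.php'; }

# Anchored at both ends: the whole line must be 3 to 6 letters + ".php".
anchored() { sample_names | grep -E '^[a-zA-Z]{3,6}\.php$'; }

echo 'unanchored:'; unanchored
echo 'anchored:'; anchored
```

The unanchored pattern should print abc.php, abcdefghij.php and abc.php.bak; the anchored one prints only abc.php. ab.php fails both, since it has only 2 letters before .php.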
 
10-09-2013, 06:10 PM   #10
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Arch
Posts: 3,013

Rep: Reputation: 1225
Quote:
command still didn't pickup character length parameter.
Works here, but as I mentioned, the {9,12} only applies to the [a-zA-Z] part. Furthermore, the leading .* basically means only the lower bound is enforced.

Code:
$ ls
abcdefgh.php   abcdefghij.php   abcdefghijkl.php
abcdefghi.php  abcdefghijk.php  abcdefghijklm.php
$ printf '%s\n' * | awk -F. '{printf("|%s| = %d\n", $1, length($1))}'
|abcdefgh| = 8
|abcdefghi| = 9
|abcdefghij| = 10
|abcdefghijk| = 11
|abcdefghijkl| = 12
|abcdefghijklm| = 13
$ find . -maxdepth 1 -regextype posix-egrep -regex ".*[a-zA-Z]{9,12}.php"
./abcdefghi.php
./abcdefghij.php
./abcdefghijk.php
./abcdefghijkl.php
./abcdefghijklm.php
$ find . -maxdepth 1 -regextype posix-egrep -regex "\./[a-zA-Z]{9,12}.php"
./abcdefghi.php
./abcdefghij.php
./abcdefghijk.php
./abcdefghijkl.php
chrism01's suggestion to use ^$ works with find too:
Code:
$ find . -maxdepth 1 -regextype posix-egrep -regex "^\./[a-zA-Z]{9,12}\.php$"
./abcdefghi.php
./abcdefghij.php
./abcdefghijk.php
./abcdefghijkl.php
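Makes no difference with the set of files I have. In fact, because find's -regex has to match the whole path rather than just part of it, the ^ and $ are redundant with find; they do matter with grep or sed, where they rule out stuff like abcdefghijk.php.gotcha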
 
10-09-2013, 09:43 PM   #11
danielbmartin
Senior Member
 
Registered: Apr 2010
Location: Apex, NC, USA
Distribution: Mint 17.3
Posts: 1,483

Rep: Reputation: 411
Quote:
Originally Posted by thegloaming View Post
The filenames must not contain any numbers.
My solutions used a sample input file with numbers as a convenient way to demonstrate that the selection criteria had been satisfied. I might just as well have used
Code:
a.php
ab.php
abc.php
abcd.php
abcde.php
abcdef.php
abcdefg.php
abcdefgh.php
Daniel B. Martin
 
10-10-2013, 05:07 PM   #12
thegloaming
LQ Newbie
 
Registered: Jul 2013
Posts: 5

Original Poster
Rep: Reputation: Disabled
Appreciate all the responses, guys.

I got the command to work with your recommendation, ntubski:
find . -maxdepth 1 -regextype posix-egrep -regex "\./[a-zA-Z]{3,6}.php"

You say in your ticket:
Quote:
the leading .* basically means only the lower bound is enforced.
Why is this the case?
 
10-10-2013, 05:54 PM   #13
ntubski
Senior Member
 
Registered: Nov 2005
Distribution: Arch
Posts: 3,013

Rep: Reputation: 1225
Quote:
Originally Posted by thegloaming View Post
You say in your ticket:

Quote:
the leading .* basically means only the lower bound is enforced.
Why is this the case?
. means match any character, * means 0 or more. The .* will "eat" as many characters as it can while still letting the next part succeed, so as long as there are at least as many letters before .php as the lower bound, there will be a match, eg:
Code:
matching ./abcdefghijklm.php to .*[a-zA-Z]{9,12}.php
.* "eats" the ./ and 4 of the letters, leaving 9 for the next part
.* can't "eat" fewer than 0 characters, so if there are too few letters (fewer than the lower bound) the match fails. That's why only the lower bound has an effect.
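The "eating" can be made visible with sed -E and two capture groups. This is a sketch assuming a sed that accepts -E (GNU and BSD sed do); show_split is a hypothetical helper. The pattern is anchored at both ends to mimic find's whole-path matching, and the dot before php is escaped, which does not change the result for these inputs.

```shell
#!/bin/sh
# show_split: hypothetical helper. Applies the .*[a-zA-Z]{9,12}.php idea
# with two capture groups and reports what each part consumed.
# Assumes a sed that accepts -E. Non-matching lines are printed unchanged.
show_split() {
    printf '%s\n' "$1" |
        sed -E 's/^(.*)([a-zA-Z]{9,12})\.php$/prefix=[\1] letters=[\2]/'
}

show_split ./abcdefghijklm.php   # 13 letters: .* takes "./" + 4 letters
show_split ./abcdefgh.php        # 8 letters: below the lower bound
```

For ./abcdefghijklm.php it should print prefix=[./abcd] letters=[efghijklm]; ./abcdefgh.php has only 8 letters, so the substitution fails and the line comes back unchanged.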
 
10-10-2013, 05:58 PM   #14
thegloaming
LQ Newbie
 
Registered: Jul 2013
Posts: 5

Original Poster
Rep: Reputation: Disabled
Thank you very much:-)
 
  


All times are GMT -5.

Main Menu
Advertisement
My LQ
Write for LQ
LinuxQuestions.org is looking for people interested in writing Editorials, Articles, Reviews, and more. If you'd like to contribute content, let us know.
Main Menu
Syndicate
RSS1  Latest Threads
RSS1  LQ News
Twitter: @linuxquestions
Facebook: linuxquestions Google+: linuxquestions
Open Source Consulting | Domain Registration