LinuxQuestions.org
05-24-2006, 02:30 PM   #1
Isotonik
LQ Newbie
 
Registered: Nov 2005
Posts: 9

Rep: Reputation: 0
Regex nightmare


I've been investigating the filtering capabilities of Squid using very simple regex patterns: a text file with one rule per line.

I've read many tutorials, but they seem rather confusing. Here are a few imaginary examples.

Q: How do you block any website that has the exact string "/badword.php" in the URL?

Some tutorials claim that the regex should be "/badword\.php" ("/" is not a reserved character). Others claim that it should be "\/badword\.php" ("/" IS a reserved character). I've tried the first pattern and it appears to work.

Q: How do you block any address that has the exact string "http://badwords." in the URL?

Again, either "http:\/\/badwords\.", "http:\/\/badwords.*" or simply "http://badwords\." (the third pattern appears to work).

I'm confused. I've tested and tested and everything works....but what am I doing wrong?
 
05-25-2006, 12:18 AM   #2
vls
Member
 
Registered: Jan 2005
Location: The grassy knoll
Distribution: Slackware,Debian
Posts: 192

Rep: Reputation: 31
The '/' character isn't special in regexes. The '\' is the escape character, which lets you match a special character (such as '.') literally. Putting \/ in a regex neither hurts nor helps. Well, it makes the regex ugly, as you have noticed.
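
To illustrate the point above, here is a small sketch using Python's re module. Squid uses the system's regex library rather than Python's engine, so this only demonstrates the general escaping rules; the sample URLs and the helper name matching() are made up for the example.

```python
import re

# Hypothetical URLs, made up for illustration.
urls = [
    "http://example.com/badword.php",
    "http://example.com/badwordXphp",
    "http://badwords.example.com/",
    "http://badwordsmith.example.com/",
]

def matching(pattern):
    """Return the URLs in which the pattern is found anywhere."""
    return [u for u in urls if re.search(pattern, u)]

# Escaping '/' makes no difference: both patterns match the same URLs.
same = matching(r"/badword\.php") == matching(r"\/badword\.php")

# An unescaped '.' matches any character, so 'badwordXphp' is caught too.
loose = matching(r"/badword.php")

# 'badwords\.' requires a literal dot after 'badwords'; 'badwords.*' does not.
strict_host = matching(r"http://badwords\.")
loose_host = matching(r"http:\/\/badwords.*")
```

The escaped and unescaped slash behave identically here, while the escaped and unescaped dot do not. That is why the dot is the character worth escaping in these rules.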

But some programs (like sed) use the construct '/regex/command' to match and manipulate the string. There the '/' is a delimiter, so you need to escape the slashes in 'http://' (as 'http:\/\/'). Still confused? So am I.
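
A rough way to see why the delimiter matters is to split a sed-style substitute command by hand. The function below is a simplified sketch written for this thread, not sed's real parser: the name split_sed_substitute() is hypothetical, and it ignores corner cases such as a doubled backslash before the delimiter.

```python
import re

def split_sed_substitute(command):
    """Split a sed-style 's<d>pattern<d>replacement<d>flags' command.

    The character right after 's' is taken as the delimiter. A delimiter
    preceded by a backslash does not end a field.
    """
    delim = command[1]
    # Split on the delimiter unless a backslash comes right before it.
    return re.split(r"(?<!\\)" + re.escape(delim), command[2:])

# Unescaped slashes in the pattern are read as delimiters: too many fields.
broken = split_sed_substitute("s/http:///X/")

# Escaping the slashes keeps the pattern in one field.
escaped = split_sed_substitute(r"s/http:\/\//X/")

# Picking another delimiter avoids the escapes altogether.
piped = split_sed_substitute("s|http://|X|")
```

Here broken ends up with five fields instead of three, while escaped and piped each give a pattern, a replacement and an empty flags field. sed accepts other delimiter characters after the s, which is usually the cleaner choice for URLs.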

Hopefully, a real regexp guru will stop by.
 
05-25-2006, 03:10 AM   #3
chrism01
Guru
 
Registered: Aug 2004
Location: Sydney
Distribution: Centos 6.6, Centos 5.10
Posts: 16,324

Rep: Reputation: 2041
Many tools and languages use a start and end character, e.g. '/', to mark where the pattern begins and ends.
For substitutions, the delimiter appears three times, e.g.
match
/patternhere/
substitute
s/matchstring/replacestring/
Note, however, that each tool/language has a slightly different regex engine, so consult the manual for the one you are using. Some languages have their own regex engine plus the option of a Perl-compatible one (e.g. PHP offers this through its PCRE-based preg_*() functions, such as preg_match()).
The definitive guide (apart from language/tool-specific books) is:
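
As one example of a language that works differently, Python passes the pattern as an ordinary string, so there is no delimiter and '/' needs no escaping. The sketch below shows rough counterparts of the match and substitute forms above; the sample text and the replacement host are made up.

```python
import re

line = "visit http://badwords.example.com/ today"  # made-up sample text

# Match: rough counterpart of /patternhere/
found = re.search(r"http://badwords\.", line) is not None

# Substitute: rough counterpart of s/matchstring/replacestring/
cleaned = re.sub(r"http://badwords\.", "http://goodwords.", line)
```

Only the dot is escaped in the pattern, because it is the only character there with a special meaning.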
Mastering Regular Expressions
http://www.oreilly.com/catalog/regex/
or
http://www.amazon.com/gp/product/156...lance&n=283155
aka the 'Owl' book
Highly recommended.
 
  

