Regular expression to match a valid URL string
I want a regular expression to validate input in URL format (http only).
What should I check for? I know that a valid HTTP URL should begin with http://, but what other conditions should I check? I can build the expression myself, but I need some pointers on the rules for a valid URL that points to a web page. Allowed characters? Format? And so on. Regards. |
Is this a sufficient regular expression?
Code:
http://[\d\w][-._\d\w]*[\d\w]
NOTE: There is another question. The eregi function in PHP looks for a match within the string; that is, even if only a substring matches the regexp, it reports a match. How do I make sure that the whole string matches the regular expression? I tried comparing the return value of the function with the length of the whole string, but that didn't work. |
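On the whole-string question: the usual approach is to anchor the pattern, with ^ at the start and $ at the end, so a substring match no longer counts. Below is a minimal sketch of the difference, written in Python rather than PHP purely for illustration. The function names are made up for this example, and the pattern is the one from the post, unchanged.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = r"http://[\d\w][-._\d\w]*[\d\w]"


def matches_anywhere(text: str) -> bool:
    # re.search succeeds if ANY substring matches, which is the
    # behaviour described for eregi in the post.
    return re.search(PATTERN, text) is not None


def matches_whole(text: str) -> bool:
    # re.fullmatch succeeds only if the ENTIRE string matches,
    # roughly what anchoring the pattern with ^ and $ achieves.
    return re.fullmatch(PATTERN, text) is not None


if __name__ == "__main__":
    sample = "see http://example.com now"
    print(matches_anywhere(sample))  # substring match is enough here
    print(matches_whole(sample))     # surrounding text makes this fail
```

With anchoring in place, a string such as http://example.com/index.html is also rejected by this pattern, because / is not in its character classes.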
Valid URL Suggestions
That looks good for a first stab, and I'm not able to come up with a fool-proof way to match a valid URL by myself, but I noticed a few things about your expression:
(1) URLs can contain a trailing slash.
(2) URLs can contain escaped characters, like %20 for a space.
(3) \w includes \d (\w matches [a-zA-Z_0-9]).
(4) You may need to escape / and :
Also, do you want to consider GET data from CGI scripts? If so, you need to allow at least ? = & as well. This definitely isn't a complete list of everything you need, but hopefully it will help somewhat. |
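The tips above can be folded into one pattern roughly like this. It is a sketch in Python, not a complete URL validator: the name is_plausible_http_url and the exact character sets are assumptions made for illustration. It allows a trailing slash, %XX escapes, and a simple query string, but it still does not constrain sequences such as consecutive dots in the host.

```python
import re

# Illustrative pattern only; the character sets are an assumption,
# not a full implementation of the URL grammar.
URL_RE = re.compile(
    r"""
    http://                                       # literal scheme
    [A-Za-z0-9] (?: [-.A-Za-z0-9]* [A-Za-z0-9] )? # host: starts and ends alphanumeric
    (?: / (?: [-\w.~] | %[0-9A-Fa-f]{2} )* )*     # path segments, trailing slash allowed
    (?: \? (?: [-\w.~=&] | %[0-9A-Fa-f]{2} )* )?  # optional query with ? = &
    """,
    re.VERBOSE | re.ASCII,  # ASCII keeps \w to [a-zA-Z0-9_]
)


def is_plausible_http_url(text: str) -> bool:
    # fullmatch so that the whole input must fit the pattern.
    return URL_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in (
        "http://example.com/",
        "http://example.com/my%20file.html",
        "http://example.com/search?q=regex&page=2",
        "http://example.com/bad path",
    ):
        print(candidate, is_plausible_http_url(candidate))
```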
Thanks for the tips. I am not a guru at regular expressions.
I used the visual KDE regular expression editor to create the regexp string. ;) Are you sure that the syntax has a problem there? Because an earlier regexp created using it worked fine with the eregi function. What are the potential pitfalls of the regexp you see there? |
You could also avoid regexp
PHP Code:
|
The real question here is not which characters are allowed but the sequences in which they are allowed.
For example, my current expression even validates:
http://www............asdasd..asd///////////as.dsd.a????as
You get the idea... It's really tough building a regexp when you have only a vague idea of what you're trying to validate. If anybody can help me with this, I'd be really grateful. :D
EDIT: I guess the LQ URL validator has the same regexp problem in recognizing valid URLs... ;) The string I typed was parsed as a URL. |
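One way to get at the sequence problem is to split the URL into its parts first and then check each part separately, instead of using one big expression. Here is a rough Python sketch using the standard urllib.parse module. The name has_sane_structure is made up for this example, and rejecting empty path segments (as in a//b) is a stricter policy chosen here for illustration, not something the URL syntax itself forbids.

```python
import re
from urllib.parse import urlsplit

# One host label: alphanumeric at both ends, hyphens allowed inside.
LABEL_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?")


def has_sane_structure(url: str) -> bool:
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        # urlsplit can raise on malformed input (e.g. bad brackets).
        return False
    if parts.scheme != "http" or not host:
        return False
    # Every dot-separated label must be non-empty and well formed,
    # which rules out runs of dots like "www......asd".
    if not all(LABEL_RE.fullmatch(label) for label in host.split(".")):
        return False
    # Policy choice (assumption): no empty segments inside the path,
    # although a single trailing slash is still accepted.
    segments = parts.path.split("/")[1:]
    return all(segment != "" for segment in segments[:-1])


if __name__ == "__main__":
    print(has_sane_structure(
        "http://www............asdasd..asd///////////as.dsd.a????as"))
    print(has_sane_structure("http://www.example.com/docs/index.html?x=1"))
```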
Quote:
I wanted a more "compatible" solution, but I guess your idea is quite good. I will use it as a last resort, thanks! |
If your web hosting provider does not allow fopen URLs,
then it is not a good provider anyway; avoid it too ;) |
An Almost-Working Perl Idea
Here's a little idea I've been cooking up in Perl - the script parsing has an issue that I can't put my finger on, though.
Code:
if($url =~ m/http:\/\/ |
Quote:
|
For my part, I'd put a note that my program requires
allow_url_fopen = On in php.ini. Some PHP packages require PEAR, some require PHP compiled and installed as CGI, some require the GD library, etc. |
Ok. In any case I need to check if the http:// is part of the URL because otherwise the <a href=""> tag in HTML treats it as a relative URL.
|
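For the prefix check on its own, no regexp is needed. A tiny Python sketch follows; has_http_scheme is a made-up name for this example. The comparison is case-insensitive because URL schemes are.

```python
def has_http_scheme(url: str) -> bool:
    # Compare only the first seven characters, ignoring case,
    # so "HTTP://..." is accepted as well.
    return url[:7].lower() == "http://"


if __name__ == "__main__":
    for candidate in ("http://example.com", "www.example.com"):
        print(candidate, has_http_scheme(candidate))
```

Anything that fails this check would be treated by the browser as a relative link when placed in an href.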
Here's a lovely little regexp already submitted 5 years ago. It takes a URL and changes it into a link.
Check it out: http://aspn.activestate.com/ASPN/Coo...x/Recipe/59864 |
That is a monster of a regular expression. :eek: I really don't need that level of strictness in any case.
Moreover, my aim is not to convert URLs within a text into links, just to validate the format of a URL that was entered. Thanks for the link, though. I'll see if I can use that. |