Old 06-15-2009, 08:13 PM   #1
yuye811
LQ Newbie
 
Registered: Dec 2007
Posts: 17

Rep: Reputation: 0
Regular Expression Question


Could any regular expression expert lend a hand with the following problem?



PHP code for converting plain-text URLs into HTML links


Version #1:

$str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">\\2</a>', $str);


This one works nicely when the URL is of reasonable length, so:
http://www.site.com ---> <a href="http://www.site.com">http://www.site.com</a>

But when the URL is extremely long, for example a download link with a long verification key, the output link breaks the layout and looks ugly.




Version #2:
$str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">Link »</a>', $str);

So, in an attempt to fix the previous problem, we can instead replace the actual link text with "Link »":
http://www.site.com ---> <a href="http://www.site.com">Link »</a>

But this one isn't as intuitive for users.



Version #3:

$str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2" class=fix-width-diaplay-block-overflow-hidden>\\2</a>', $str);

In this version I use a CSS class to constrain the rendered HTML link so that it cannot break the layout:
class=fix-width-diaplay-block-overflow-hidden

But this doesn't feel natural either. I am a bit picky about such things.







Is there a simple regular expression that truncates the link text in the first version to its first 30 characters, something like this:
$str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">\\2 {0,30} </a>', $str);


I know the {0,30} is probably the wrong syntax. Could any expert here point out the right way to achieve the desired output?




---------------

Here is the function I am trying to improve:

PHP Code:
public static function hyperlink( $str ){

    // match protocol://address/path/file.extension?some=variable&another=asf%
    $str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">Link »</a>', $str);

    // match www.something.domain/path/file.extension?some=variable&another=asf%
    $str = preg_replace('#(^|\s)((www|ftp)\.([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="http://\\2">Link »</a>', $str);

    // match name@address
    $str = preg_replace('#(^|\s)(([a-z0-9._%+-]+)@(([.-]?[a-z0-9])*))#is', '\\1<a href="mailto:\\2">\\2</a>', $str);

    return $str;
}

 
Old 06-15-2009, 09:59 PM   #2
grizly
Member
 
Registered: Nov 2006
Location: Melbourne Australia
Distribution: Centos, RHEL, Debian, Ubuntu, Mint
Posts: 128

Rep: Reputation: 16
This might help:

PHP Code:
/**
 * Allows you to output a large quantity of text, with a character limit.. useful for variable quantities.
 * The text is cut at the last space before the limit and '...' is appended.
 * @param string text
 * @param integer limit
 */
public static function makeSmaller($text, $limit)
{
    if (strlen($text) > $limit)
    {
        return substr($text, 0, strrpos(substr($text, 0, $limit), ' ')) . '...';
    } else
    {
        return $text;
    }
}

Then.. something like..

PHP Code:
public static function hyperlink( $str ){

    // Make output text presentably short
    $display_str = self::makeSmaller($str, 25);

    // match protocol://address/path/file.extension?some=variable&another=asf%
    $str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">'.$display_str.' »</a>', $str);

    // match www.something.domain/path/file.extension?some=variable&another=asf%
    $str = preg_replace('#(^|\s)((www|ftp)\.([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="http://\\2">'.$display_str.' »</a>', $str);

    // match name@address
    $str = preg_replace('#(^|\s)(([a-z0-9._%+-]+)@(([.-]?[a-z0-9])*))#is', '\\1<a href="mailto:\\2">\\2</a>', $str);

    return $str;
}


Last edited by grizly; 06-15-2009 at 10:10 PM. Reason: Tested, works
 
Old 06-15-2009, 10:19 PM   #3
grizly
Member
 
Registered: Nov 2006
Location: Melbourne Australia
Distribution: Centos, RHEL, Debian, Ubuntu, Mint
Posts: 128

Rep: Reputation: 16
lmfao, actually, that code converts the entire text-string into one big link.. although, the link does go to the first URL found..

Hmm.. I actually understand the problem now.. don't have an answer, but found this RFC with some interesting details:

Quote:
Originally Posted by RFC 3986
Appendix B. Parsing a URI Reference with a Regular Expression

As the "first-match-wins" algorithm is identical to the "greedy"
disambiguation method used by POSIX regular expressions, it is
natural and commonplace to use a regular expression for parsing the
potential five components of a URI reference.

The following line is the regular expression for breaking-down a
well-formed URI reference into its components.


^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?
 12            3  4          5       6  7        8 9

The numbers in the second line above are only to assist readability;
they indicate the reference points for each subexpression (i.e., each
paired parenthesis). We refer to the value matched for subexpression
<n> as $<n>. For example, matching the above expression to

http://www.ics.uci.edu/pub/ietf/uri/#Related

results in the following subexpression matches:

$1 = http:
$2 = http
$3 = //www.ics.uci.edu
$4 = www.ics.uci.edu
$5 = /pub/ietf/uri/
$6 = <undefined>
$7 = <undefined>
$8 = #Related
$9 = Related

where <undefined> indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the five components as

scheme = $2
authority = $4
path = $5
query = $7
fragment = $9
http://www.ietf.org/rfc/rfc3986.txt
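
The Appendix B expression from RFC 3986 can be tried directly with preg_match(). This is only a sketch: the `~` delimiter and the `$parts` array are my own choices (the pattern itself contains `#`, `/` and `?`, so those cannot serve as delimiters), and the input is the RFC's own example URI.

```php
<?php
// RFC 3986, Appendix B regular expression, wrapped in "~" delimiters (my choice).
$pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~';

$m = [];
preg_match($pattern, 'http://www.ics.uci.edu/pub/ietf/uri/#Related', $m);

// Map the numbered groups to the five components named in the RFC.
// A component that is absent shows up as an empty string (or is missing
// from the end of $m), hence the "?? ''" fallback.
$parts = [
    'scheme'    => $m[2] ?? '',
    'authority' => $m[4] ?? '',
    'path'      => $m[5] ?? '',
    'query'     => $m[7] ?? '',
    'fragment'  => $m[9] ?? '',
];

print_r($parts);
```

Note that this expression splits a string that is already known to be a URI; it does not find URIs inside running text, so it does not replace the matching patterns discussed in this thread.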

Still needs that push between "grabbing value" and "using shorter value".. which I don't know if you can do without using two operations.

1. find matching URL, place URL in tmpVar
2. create shorter version, place in tmpVar2
3. construct URL and return <a href="tmpVar">tmpVar2</a> etc..
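
The three steps above map fairly directly onto preg_replace_callback(), which passes each match to a function that builds the replacement. A minimal sketch, assuming a 30-character limit and a hypothetical helper name hyperlinkShort() (neither comes from the thread); the pattern is the original poster's first one, unchanged.

```php
<?php
// Hypothetical helper: linkify URLs, shortening only the visible link text.
function hyperlinkShort(string $str, int $limit = 30): string
{
    // The original poster's pattern: group 1 is the leading whitespace (or start),
    // group 2 is the whole URL.
    $pattern = '#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is';

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($limit): string {
            $url = $m[2];                                   // step 1: the matched URL
            $text = strlen($url) > $limit                   // step 2: shorter display text
                ? substr($url, 0, $limit) . '...'
                : $url;
            // step 3: full URL in href, shortened text between the tags
            return $m[1]
                . '<a target="_blank" href="' . htmlspecialchars($url, ENT_QUOTES) . '">'
                . htmlspecialchars($text, ENT_QUOTES)
                . '</a>';
        },
        $str
    );
}

echo hyperlinkShort('go http://www.site.com/' . str_repeat('x', 40) . ' now'), "\n";
```

This is still a single pass over the text; the only difference from plain preg_replace() is that the replacement is computed per match instead of being a fixed template.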
 
Old 06-15-2009, 10:20 PM   #4
grizly
Member
 
Registered: Nov 2006
Location: Melbourne Australia
Distribution: Centos, RHEL, Debian, Ubuntu, Mint
Posts: 128

Rep: Reputation: 16
wow.. just noticed the board does that automatically,

1. find out what sort of board this is
2. inspect code
3. ..?
4. Profit!
 
Old 06-16-2009, 07:39 AM   #5
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 510

Rep: Reputation: 53
Isn't there a PHP module or class that splits URLs into their parts, e.g. host, TLD, domain, query string etc.?

You could then just grab the host.domain.tld part and use it as the link text:

Code:
<a href="http://veryverylongurlwithbellsandwhistles">http://some.host.domain.tld</a>
 
Old 06-16-2009, 07:45 AM   #6
Su-Shee
Member
 
Registered: Sep 2007
Location: Berlin
Distribution: Slackware
Posts: 510

Rep: Reputation: 53
Isn't there a PHP module or class that splits URIs into their parts, e.g. host, TLD, domain, query string etc.?

You could then just grab the host.domain.tld part and use it as the link text:

Code:
<a href="http://veryverylongurlwithbellsandwhistles">http://some.host.domain.tld</a>
(I only know the Perl modules that achieve this without a regex, so I just assume that there is something similar in PHP...)
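
PHP's built-in parse_url() does this kind of splitting. A rough sketch, assuming a hypothetical helper hostLink() (not from the thread) and assuming that scheme plus host is acceptable as link text; it handles one URL that has already been found, so locating URLs in the text is still a separate step.

```php
<?php
// Hypothetical helper: build a link whose text is only scheme://host.
function hostLink(string $url): string
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    $host   = parse_url($url, PHP_URL_HOST);

    // Fall back to the full URL when parse_url() cannot find a host.
    if (!is_string($host) || $host === '') {
        $label = $url;
    } else {
        $label = (is_string($scheme) ? $scheme . '://' : '') . $host;
    }

    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '">'
        . htmlspecialchars($label, ENT_QUOTES)
        . '</a>';
}

echo hostLink('http://some.host.domain.tld/very/long/path?key=abc123'), "\n";
```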
 
Old 06-17-2009, 12:28 AM   #7
yuye811
LQ Newbie
 
Registered: Dec 2007
Posts: 17

Original Poster
Rep: Reputation: 0
I am looking for something "simplistic" -- a single regular expression, run only once.

Otherwise, there are quite a few ways to achieve the desired result, such as, but not limited to:

1. Filter the same text twice: the first run creates the hyperlinks, the second run shortens the link text and appends "..." to it.

2. Use preg_replace_callback() instead of preg_replace().

Thank you for your interest and for taking the time to reply.

$str = preg_replace('#(^|\s)([a-z]+://([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">\\2[<--I only want the first 50 letters from this \\2--] ...</a>', $str);
 
Old 06-19-2009, 01:55 AM   #8
angrybanana
Member
 
Registered: Oct 2003
Distribution: Archlinux
Posts: 147

Rep: Reputation: 21
If you're set on limiting it to only regex, you could do something like this:
Code:
$str = preg_replace('#(^|\s)(([a-z]+://([^\s\w/]?[\w/+=,]){0,50})([^\s\w/]?[\w/+=,])*)#is', '\\1<a target=_blank href="\\2">\\3</a>', $str);
Basically, it adds one more level of grouping: group \\3 holds the {0,50} cutoff version and is used as the link text, while group \\2 encloses both the {0,50} part and the * remainder and is used for the href.
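
A quick way to see the effect is to run the pattern on a deliberately long URL. The sample text and variable names below are made up for illustration; the pattern and replacement are the ones from this post.

```php
<?php
// Pattern and replacement exactly as posted above.
$pattern     = '#(^|\s)(([a-z]+://([^\s\w/]?[\w/+=,]){0,50})([^\s\w/]?[\w/+=,])*)#is';
$replacement = '\\1<a target=_blank href="\\2">\\3</a>';

// Made-up sample input: a URL followed by 100 extra path characters.
$url = 'http://example.com/' . str_repeat('a', 100);
$out = preg_replace($pattern, $replacement, "see $url now");

// The href keeps the full URL; the link text stops after 50 repetitions
// of the inner sub-pattern.
echo $out, "\n";
```

One caveat: {0,50} counts repetitions of the sub-pattern, not characters. Each repetition consumes one or two characters (for example ".c" in ".com" is a single repetition), so the visible text is the scheme prefix plus somewhere between 50 and 100 characters, and no "..." marker is added when the text is cut.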
 
  


All times are GMT -5.