06-02-2011, 06:53 AM   #1
guest
Member
 
Registered: May 2003
Distribution: CentOS 5 64 bit
Posts: 255

Rep: Reputation: 30
Parse final domain from redirect link.


I have a collection of redirect links and need to extract the final domain each one resolves to (shown after "->"):

http://www.shareasale.com/u.cfm?d=60...28411&u=503118 -> priceangels.com
http://scripts.affiliatefuture.com/A...tracking=&url= -> letsgostrolling.com

I'm thinking PHP/cURL is the way to do it. I've searched the net but failed to find a solution that works for all redirection links. Any help is greatly appreciated!

Last edited by guest; 06-06-2011 at 02:03 AM.
 
06-03-2011, 12:21 AM   #2
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
I recommend using wget for this. Supply the URL or URLs as command-line parameters for this scriptlet:
Code:
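#!/bin/bash
# For each URL given on the command line, follow its redirects and print
# the last host wget connected to.
for url in "$@"; do
    # wget logs each hop to stderr as a line containing "Connecting to HOST|...".
    # sed keeps HOST (delimiter is / so the | inside [^|] stays literal); tail keeps the final hop.
    wget -vO /dev/null "$url" 2>&1 | sed -ne 's/Connecting to \([^|]*\).*$/\1/p' | tail -n 1
done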
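
To see what the sed/tail stage of that pipeline does without touching the network, here is a minimal sketch that runs the same kind of filter over a made-up log. The function name last_connected_host, the .example hosts and the log text are all assumptions for illustration; real wget output may be formatted differently depending on the version.

```shell
#!/bin/bash
# last_connected_host (hypothetical helper): read wget-style log text on
# stdin and print the host from the last "Connecting to HOST|..." line.
last_connected_host() {
    sed -ne 's/Connecting to \([^|]*\).*$/\1/p' | tail -n 1
}

# Made-up two-hop log, for illustration only (not captured wget output).
sample_log='Connecting to tracker.example|192.0.2.1|:80... connected.
HTTP request sent, awaiting response... 302 Found
Connecting to shop.example|192.0.2.2|:80... connected.
HTTP request sent, awaiting response... 200 OK'

# Prints the host of the last hop in the sample log.
printf '%s\n' "$sample_log" | last_connected_host
```

Each redirect adds another "Connecting to" line, so taking the last one gives the host that finally served the page.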
 
06-06-2011, 12:07 AM   #3
guest
Member
 
Registered: May 2003
Distribution: CentOS 5 64 bit
Posts: 255

Original Poster
Rep: Reputation: 30
I found a solution, but the code prints a lot of other output along with the final URL. Is there a way to extract just the final destination URL from all of this?
Code:
<?php
function getWebPage($url, $redirectcallback = null){
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_HEADER, true);
    curl_setopt($ch, CURLOPT_NOBODY, false);
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
    curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv:1.8.1) Gecko/20061024 BonEcho/2.0");

    $html = curl_exec($ch);
    $http_code = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($http_code == 301 || $http_code == 302) {
        list($httpheader) = explode("\r\n\r\n", $html, 2);
        $matches = array();
        preg_match('/(Location:|URI:)(.*?)\n/', $httpheader, $matches);
        $nurl = trim(array_pop($matches));
        $url_parsed = parse_url($nurl);
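        // parse_url() returns false on a malformed URL, which isset() would still accept
        if ($nurl !== '' && $url_parsed !== false) {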
            if($redirectcallback){ // callback
                 $redirectcallback($nurl, $url);
            }
            $html = getWebPage($nurl, $redirectcallback);
        }
    }
    return $html;
}

function trackAllLocations($newUrl, $currentUrl){
    echo $currentUrl.' ---> '.$newUrl."\r\n";
}

print htmlentities(getWebPage('http://www.shareasale.com/u.cfm?d=60472&m=28411&u=503118'));

//$myUrlInfo = parse_url( $thisurl ):
//echo $myUrlInfo["url",track];
print $myUrlInfo[$url];
?>
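
One possible way to get only the final destination, sketched in shell rather than PHP: when curl follows redirects with -L, its -w '%{url_effective}' option prints the last URL it fetched, and the host can then be cut out with plain parameter expansion. The names host_of_url and final_url are hypothetical helpers, not part of any tool. This assumes the links redirect over HTTP; a hop done through a meta refresh or JavaScript would not be followed.

```shell
#!/bin/bash
# host_of_url (hypothetical helper): reduce a URL to its bare host using
# only shell parameter expansion. Does not handle IPv6 literals.
host_of_url() {
    local rest=${1#*://}     # drop "http://" or "https://"
    rest=${rest%%[/?#]*}     # drop path, query string and fragment
    rest=${rest##*@}         # drop "user:pass@" if present
    local host=${rest%%:*}   # drop ":port"
    printf '%s\n' "${host#www.}"   # drop a leading "www."
}

# final_url (hypothetical helper): follow redirects (-L), discard the body,
# and print only the last URL curl fetched.
final_url() {
    curl -sL -o /dev/null -w '%{url_effective}' "$1"
}

# Usage (needs network access):
#   host_of_url "$(final_url 'http://www.shareasale.com/u.cfm?d=60472&m=28411&u=503118')"
```

Keeping the host extraction separate from the fetch means it can be checked on its own with any URL string before pointing it at the real links.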

Last edited by guest; 06-06-2011 at 04:26 AM.
 
  

