Old 03-04-2010, 12:30 PM   #1
bahbahthelamb
Using MySQL to manage rewrites for Apache


I am currently switching my dynamic virtual hosting from static text map files and a pile of minor rewrite rules to PHP scripts, starting with the script that manages all of my 301s on the server. Inside httpd.conf:
Code:
RewriteEngine On
RewriteMap redirects prg:/path/to/redirects.php
RewriteCond ${redirects:%{HTTP_HOST}%{REQUEST_URI}} !^NO_REWRITE$
RewriteRule ^(.*)$ %1 [R=301,L]
Then inside redirects.php:
Code:
#!/usr/bin/php
<?php
set_time_limit(0);
$input = fopen('php://stdin','r');
while (1) {
	# -- Get Input Thread -->
	$original = strtolower(trim(fgets($input)));
	# -- Split Domain and Request and filter out WWW, if exists -->
	$request = preg_split("/\//", preg_replace("/^www\./", "", $original), 2);
	# -- Connect to database -->
	mysql_connect("localhost", "dbuser", "dbpass");
	# -- Select Database -->
	mysql_select_db("dbschema");
	# -- Run query to find if domain should get forwarded to another domain -->
	$r = mysql_query("SELECT `primary` FROM `forwards` WHERE `alias`='".$request[0]."' LIMIT 1;");
	# -- If forward is found, update domain name -->
	if ($domain = mysql_fetch_array($r)) $request[0] = $domain[0];
	# -- If request is being made -->
	if (isset($request[1])) {
		# -- Run query to find if request has moved -->
		$r = mysql_query("SELECT `newuri` FROM `redirects` WHERE `host`='".$request[0]."' AND `olduri`='".$request[1]."' LIMIT 1;");
		# -- If redirect is found, update request made -->
		if ($uri = mysql_fetch_array($r)) $request[1] = $uri[0];
		# -- Filter out direct request to directory indexes -->
		$request[1] = preg_replace("/index\.[a-z]*$/", "", $request[1]);
	}
	# -- Close Database Connection -->
	mysql_close();
	# -- Add www back in -->
	$request[0] = "www.".$request[0];
	# -- Merge domain and request -->
	$altered = implode ("/", $request);
	# -- Print NO_REWRITE if no change has been made -->
	if ($altered == $original) print "NO_REWRITE\n";
	# -- Otherwise, post updated URL -->
	else print "http://".$altered."\n";
}
?>
If I run this PHP script from the shell, it works flawlessly: I type in a resource and it prints the proper request for that resource; I type in a proper request and it prints NO_REWRITE. When it is wired into live server requests, the domain forwards work, i.e. www.aliasdomain.tld updates to www.primarydomain.tld (in accordance with the forwards table in MySQL). However, nothing else works; the addition of www, the redirected resources, et cetera all bring up a default Apache 301 page saying the resource has permanently moved to http:///
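For reference, the working shell session looks like this (hostnames and paths are examples):
Code:
$ /path/to/redirects.php
aliasdomain.tld/some/page.html
http://www.primarydomain.tld/some/page.html
www.primarydomain.tld/some/page.html
NO_REWRITE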

I'm rather new to this, and I have searched a lot to get this far without posting to a forum for help. I feel confident that I know what I am doing here; I am probably just overlooking some minor detail.
 
Old 03-04-2010, 01:18 PM   #2
bahbahthelamb
To check exactly what input the map program receives, I created a test.php:
Code:
#!/usr/bin/php
<?php
set_time_limit(0);
# -- Log every lookup Apache sends on stdin -->
$input = fopen('php://stdin','r');
$file = fopen('data.txt','w');
while(1) {
	# -- Record the raw input line as it arrives -->
	fwrite($file, fgets($input));
	# -- Always answer so Apache's RewriteCond gets a response -->
	print "done\n";
}
?>
and I set up a dummy rewrite in httpd.conf:
Code:
RewriteMap test prg:/path/to/test.php
RewriteCond ${test:%{HTTP_HOST}%{REQUEST_URI}} ^DUMMY_CONDITION$
RewriteRule ^(.*)$ $1
This was just to directly see how the input is coming in. It was exactly as I expected: www.domain.tld/path/to/resource.html. So now I am even more baffled, because, as far as I know, RewriteRule ^(.*)$ http://www.domain.com/path/to/resource.html [R=301,L] is the proper way to do a 301 redirect.

Last edited by bahbahthelamb; 03-04-2010 at 01:24 PM.
 
Old 03-08-2010, 01:19 PM   #3
bahbahthelamb
Esoteric issue, I suppose. I ended up figuring out the solution on my own, and I will post it here in case someone else needs it. First, in MySQL:

Code:
CREATE DATABASE `site_registry`;
USE `site_registry`;

CREATE TABLE  `site_registry`.`forwards` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `alias` varchar(100) NOT NULL,
  `primary` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=8 DEFAULT CHARSET=latin1;

CREATE TABLE  `site_registry`.`redirects` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `host` varchar(100) NOT NULL,
  `olduri` varchar(200) NOT NULL,
  `newuri` varchar(200) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=5 DEFAULT CHARSET=latin1;

CREATE TABLE  `site_registry`.`robots` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `domain` varchar(50) NOT NULL,
  `spiders` tinyint(1) NOT NULL DEFAULT '1',
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=2 DEFAULT CHARSET=latin1;

CREATE TABLE  `site_registry`.`vhosts` (
  `id` int(10) unsigned zerofill NOT NULL AUTO_INCREMENT,
  `host` varchar(100) NOT NULL,
  `path` varchar(100) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=81 DEFAULT CHARSET=latin1;
These are the basic tables (eventually I will probably reorganize this data into a broader table with site-related schemas, but for now I will KISS).
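To make the schema concrete, here are sample rows (the values are hypothetical):
Code:
INSERT INTO `forwards` (`alias`, `primary`) VALUES ('aliasdomain.tld', 'primarydomain.tld');
INSERT INTO `redirects` (`host`, `olduri`, `newuri`) VALUES ('primarydomain.tld', 'old/page.html', 'new/page.html');
INSERT INTO `robots` (`domain`, `spiders`) VALUES ('primarydomain.tld', 0);
INSERT INTO `vhosts` (`host`, `path`) VALUES ('primarydomain.tld', 'clients/primarydomain/live');
Note that hosts are stored without the www. prefix and the URIs carry no leading slash, because the scripts strip www. and split host from path on the first / of the lookup string.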

Now the rewrites in httpd.conf:
Code:
# -- Request Rewrites -->
  <IfModule mod_rewrite.c>
    RewriteEngine On
    # -- Rewrites for Domain Uniformity, Domain Forwards, and Permanently Moved Resources -->
    RewriteMap redirects prg:/path/to/redirects
    RewriteCond ${redirects:%{HTTP_HOST}%{REQUEST_URI}} !^GOOD$
    RewriteRule ^(.*)$ ${redirects:%{HTTP_HOST}%{REQUEST_URI}} [R=301]
    # -- Rewrite rule for newsletter requests -->
    RewriteRule ^/newsletters/(.*) /newsletters/index.php?id=$1
    # -- Rewrite for Non-spidered sites -->
    RewriteMap robots prg:/path/to/robots
    RewriteCond %{REQUEST_URI} ^/robots\.txt$
    RewriteCond ${robots:%{HTTP_HOST}} !^GOOD$
    RewriteRule ^/(.*)$ ${robots:%{HTTP_HOST}} [L]
    # -- Rewrites to find Document Root -->
    RewriteMap docroots prg:/path/to/docroots
    RewriteCond %{REQUEST_URI} !^/error/
    RewriteRule ^/(.*)$ ${docroots:%{HTTP_HOST}}$1
  </IfModule>
The first map, redirects, takes the lookup string www.domain.tld/resource.html and checks whether the request needs an HTTP 301 redirect. Here's the script:
Code:
#!/usr/bin/php
<?php
  # -- Turn off script execution limit -->
  set_time_limit (0);
  # -- Create input/output threads -->
  $input = fopen ('php://stdin', 'r');
  $output = fopen ('php://stdout', 'w');
  # -- Begin infinite loop -->
  while (1) {
    # -- Get data from input thread -->
    $original = strtolower (trim (fgets ($input)));
    # -- Strip www. if it exists and explode path -->
    $request = preg_split ("/\//", preg_replace ("/^www\./", "", $original), 2);
    # -- Connect to database -->
    mysql_connect ("hostname", "dbuser", "dbpass");
    mysql_select_db ("site_registry");
    # -- Check domain forward table to see if a rewrite is needed -->
    $r = mysql_query ("SELECT `primary` FROM `forwards` WHERE `alias`='".$request[0]."' LIMIT 1;");
    if ($domain = mysql_fetch_array ($r)) $request[0] = $domain[0];
    # -- Check redirect table for static rewrites -->
    if (isset ($request[1])) {
      $r = mysql_query ("SELECT `newuri` FROM `redirects` WHERE `host`='".$request[0]."' AND `olduri`='".$request[1]."' LIMIT 1;");
      if ($uri = mysql_fetch_array ($r)) $request[1] = $uri[0];
      # -- Filter request direct to directory index files -->
      $request[1] = preg_replace ("/index\.(html|php|htm|asp?)$/", "", $request[1]);
    }
    # -- Disconnect from database -->
    mysql_close ();
    # -- Add the www. (back) in -->
    $request[0] = "www.".$request[0];
    # -- Merge Request -->
    $altered = implode ("/", $request);
    # -- Check if host has been altered and output properly -->
    if ($altered == $original) fwrite ($output, "GOOD\n");
    else fwrite ($output, "http://".$altered."\n");
  # -- Despite the paradox, terminate the infinite loop -->
  }
?>
The second rewrite lets newsletter requests use SEO-friendly URLs (e.g. a request for /newsletters/2010-03 is served by /newsletters/index.php?id=2010-03). The third rewrite checks whether the site should block spider bots from indexing:
Code:
#!/usr/bin/php
<?php
  # -- Turn off script execution limit -->
  set_time_limit (0);
  # -- Create input/output threads -->
  $input = fopen ('php://stdin', 'r');
  $output = fopen ('php://stdout', 'w');
  # -- Begin infinite loop -->
  while (1) {
    # -- Get data from input thread -->
    $original = strtolower (trim (fgets ($input)));
    # -- Strip WWW and invert domain name into array -->
    $request = array_reverse(explode('.', preg_replace ("/^www\./", "", $original)));
    # -- Switch between highest domain name below TLD -->
    switch ($request[1]) {
      case "mycompany":
        # -- If mycompany base site, no robots -->
        if (!isset ($request[2])) {
          fwrite ($output, "GOOD\n");
          break;
        }
        # -- If mycompany sub-domain (internal management scripts), continue -->
      case "previewsite":
        # -- If on site preview domain or mycompany subdomain, redirect request to centralized robots.txt file -->
        fwrite ($output, "/path/to/robots.txt\n");
        break;
      default:
        # -- If it is a live site, look up in database if site should be spidered -->
        mysql_connect ("hostname", "dbuser", "dbpass");
        mysql_select_db ("site_registry");
        $r = mysql_query ("SELECT `spiders` FROM `robots` WHERE `domain`='".implode (".", array_reverse ($request))."' LIMIT 1;");
        if ($robot = mysql_fetch_array ($r)) {
          if (!$robot[0]) fwrite ($output, "/path/to/robots.txt\n");
          else fwrite ($output, "GOOD\n");
        } else fwrite ($output, "GOOD\n");
        mysql_close ();
    }
  # -- Despite the paradox, terminate the infinite loop -->
  }
?>
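For completeness, the centralized robots.txt that blocked sites get pointed to is just the standard deny-all file (assuming the intent is to block all crawlers):
Code:
User-agent: *
Disallow: /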
The final rewrite is the map that finds the document root:
Code:
#!/usr/bin/php
<?php
  # -- Turn off script execution limit -->
  set_time_limit (0);
  # -- Create input/output threads -->
  $input = fopen ('php://stdin', 'r');
  $output = fopen ('php://stdout', 'w');
  # -- Begin infinite loop -->
  while (1) {
    # -- Get data from input thread -->
    $original = strtolower (trim (fgets ($input)));
    # -- Strip www. and invert domain name into array -->
    $request = array_reverse(explode('.', preg_replace ("/^www\./", "", $original)));
    # -- Switch between highest domain name below TLD -->
    switch ($request[1]) {
      case "previewsite":
        # -- If it is a development/preview site, print out dynamic root -->
        fwrite ($output, "/var/www/clients/".$request[2]."/dev/\n");
        break;
      case "mycompany":
        # -- If it is a internal site, print out dynamic root -->
        if (!isset($request[2])) $request[2] = "default"; // For primary/root site
        fwrite ($output, "/var/www/internal/".$request[2]."/\n");
        break;
      default:
        # -- If it is a live site, look up in database for dynamic root -->
        mysql_connect ("hostname", "dbuser", "dbpass");
        mysql_select_db ("site_registry");
        $r = mysql_query ("SELECT `path` FROM `vhosts` WHERE `host`='".implode (".", array_reverse($request))."' LIMIT 1;");
        if ($root = mysql_fetch_array ($r)) fwrite ($output, "/var/www/".$root[0]."/\n");
        else fwrite ($output, "/var/www/internal/error/\n");
        mysql_close ();
    }
  # -- Despite the paradox, terminate the infinite loop -->
  }
?>
So now, from MySQL, I can manage all of the 301-redirected resources, control which sites allow spider bots, and manage document roots. I can also enforce uniform requests to avoid duplicate-content issues (www versus non-www, and ./index.html versus ./). Best of all, changes in the database take effect immediately, without an Apache restart. I am sure I will end up rewriting these scripts to streamline the policy definitions, perhaps even storing the policies themselves in the database. Essentially, when this is done, I want to be able to manage the hundreds of sites that this company hosts through a centralized site registry.
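As a first step toward that, the three map scripts share enough boilerplate that the database lookup could be factored into a single include. Here is a rough sketch of what I have in mind (the registry_lookup helper and its :name placeholder substitution are just an illustration, not final code; table names as above):
Code:
#!/usr/bin/php
<?php
  # -- Shared single-row, single-column lookup for all three map scripts -->
  function registry_lookup ($sql, $params) {
    mysql_connect ("hostname", "dbuser", "dbpass");
    mysql_select_db ("site_registry");
    # -- Escape each value before it reaches the query; the lookup string
    #    comes from HTTP_HOST/REQUEST_URI, which the client controls -->
    foreach ($params as $name => $value)
      $sql = str_replace (":".$name, "'".mysql_real_escape_string ($value)."'", $sql);
    $r = mysql_query ($sql);
    $row = $r ? mysql_fetch_array ($r) : FALSE;
    mysql_close ();
    return $row ? $row[0] : FALSE;
  }

  # -- Example: the forwards lookup from the redirects map -->
  $primary = registry_lookup (
    "SELECT `primary` FROM `forwards` WHERE `alias`=:alias LIMIT 1;",
    array ("alias" => "aliasdomain.tld")
  );
  if ($primary !== FALSE) print $primary."\n";
?>
Besides cutting the duplication, this gives one place to escape the map input, which currently goes into the queries unescaped.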
 
  

