LinuxQuestions.org
Old 11-07-2011, 07:57 AM   #31
catkin
LQ 5k Club
 
Registered: Dec 2008
Location: Tamil Nadu, India
Distribution: Debian
Posts: 8,578

Original Poster
Blog Entries: 31

Rep: Reputation: 1208

Hello Nominal Animal

While studying Rewrite Rules as an alternative way to set OMEGA_CONFIG_FILE, I stumbled on RewriteCond HTTP_REFERER. Subject to testing, it could be used to rewrite URLs generated by the user clicking on a link in the search result hit list, rewriting /dir/file to /srv/docoll/dir/file when the referrer matches "^/omega/cgi-bin/omega\?" (if the query string is stripped before the rewrite engine sees the URL, the regex would be "^/omega/cgi-bin/omega$").

If it proves feasible, there would be no need for /var/www/docoll-results and no need for the user to see the /docoll-results prefix on links in the search result hit list. Perhaps the same technique could be extended to support multiple collations of files instead of only /srv/docoll by building the instance name into the reported omega CGI executable path. Both of those are desirable objectives.

An example of the current referrer strings:
Code:
http://192.168.168.51/omega/cgi-bin/omega?P=meeting&DEFAULTOP=or&START=&END=&COLLAPSE=&B=Edoc&B=Edocx&B=Eodp&B=Eods&B=Eodt&B=Epdf&B=Epps&B=Eppsx&B=Eppt&B=Ertf&B=Etxt&B=Exls&B=Exlsx&DB=docoll&FMT=docoll&xDB=docoll&xFILTERS=--O
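A note on anchoring: the Referer header carries the whole URL, scheme and host included, so a pattern anchored with ^ directly before /omega would not be expected to match it. The Python sketch below uses the re module as a stand-in for Apache's regex engine and a shortened copy of the referrer string above; both are illustrative assumptions, not part of the Apache setup.

```python
import re

# Shortened copy of the referrer above (most query parameters dropped).
referer = "http://192.168.168.51/omega/cgi-bin/omega?P=meeting&DB=docoll&FMT=docoll"

# Pattern anchored at the start of the string, as first proposed.
anchored = re.compile(r"^/omega/cgi-bin/omega\?")
# Same pattern without the anchor.
unanchored = re.compile(r"/omega/cgi-bin/omega\?")

anchored_hit = anchored.search(referer) is not None
unanchored_hit = unanchored.search(referer) is not None
print(anchored_hit, unanchored_hit)
```

If the anchored form is wanted, the pattern would need to allow for the scheme and host in front of the path.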
EDIT: now successfully setting OMEGA_CONFIG_FILE by ...
Code:
RewriteRule ^ - [env=OMEGA_CONFIG_FILE:/etc/opt/docoll/search-0.2/omega.conf]
... so exploring the RewriteCond HTTP_REFERER idea.

EDIT 2: without success so far. Here's conf.d/docoll
Code:
RewriteEngine On
RewriteLog /var/log/apache2/rewrite.log
RewriteLogLevel 7

LogLevel debug
LogFormat "%h %l %u %t \"%r\" %>s %O \"%{Referer}i\" \"%{User-Agent}i\" \"%e\"" combined

ScriptAlias /omega/cgi-bin/ /usr/lib/cgi-bin/omega/

<Directory /var/www/docoll>
    AllowOverride   FileInfo
    Options         FollowSymlinks ExecCGI
    Order           Deny,Allow
    Deny            From all
    # Quick and dirty fix
    Allow           From all
    RewriteRule     ^ - [env=OMEGA_CONFIG_FILE:/etc/opt/docoll/search-0.2/omega.conf]
    #RewriteCond     HTTP_REFERER "^/omega/cgi-bin/omega\?"
    #RewriteCond     HTTP_REFERER "/omega/cgi-bin/omega"
    #RewriteCond     %{HTTP_REFERER} "/omega/cgi-bin/omega"
    RewriteCond     %{HTTP_REFERER} ".*/omega/cgi-bin/omega.*"
    RewriteRule     ^/*(.*)$ "/srv/docoll/$1"
    Redirect        "/docoll/" "/omega/cgi-bin/omega?DB=docoll&FMT=docoll"
</Directory>
Apache's error.log, after clicking on http://192.168.168.51/Meetings/All%20Member%20Meeting%20Minutes.xls (displayed as /Meetings/All%20Member%20Meeting%20Minutes.xls) in the search hits listing, has (part line)
Code:
File does not exist: /var/www/Meetings, referer: http://192.168.168.51/omega/cgi-bin/omega?P=meeting&DEFAULTOP=or&START=&END=&COLLAPSE=&B=Edoc&B=Edocx&B=Eodp&B=Eods&B=Eodt&B=Epdf&B=Epps&B=Eppsx&B=Eppt&B=Ertf&B=Etxt&B=Exls&B=Exlsx&DB=docoll&FMT=docoll&xP=Zmeet&xDB=docoll&xFILTERS=Edoc-Edocx-Eodp-Eods-Eodt-Epdf-Epps-Eppsx-Eppt-Ertf-Etxt-Exls-Exlsx---O
It might help diagnose the problem if there were anything in /var/log/apache2/rewrite.log but it remains empty although at least the RewriteRule ^ - is working.
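One possible explanation, offered as a hedged reading rather than a confirmed diagnosis: rules inside a <Directory> block are only consulted for requests that map into that directory, and the error.log line shows the hit link mapping to /var/www/Meetings, which is outside /var/www/docoll. The Python sketch below models that check. It assumes a DocumentRoot of /var/www (inferred from the log line), and the helper name maps_into is invented for illustration.

```python
import posixpath

DOCUMENT_ROOT = "/var/www"            # assumption, inferred from the error.log path
RULES_DIRECTORY = "/var/www/docoll"   # the <Directory> block that holds the rules

def maps_into(url_path, directory):
    """Return True if the URL path maps to a filesystem path inside directory."""
    fs_path = posixpath.normpath(
        posixpath.join(DOCUMENT_ROOT, url_path.lstrip("/"))
    )
    return fs_path == directory or fs_path.startswith(directory + "/")

hit_link = "/Meetings/All Member Meeting Minutes.xls"  # the link that was clicked
search_page = "/docoll/"

print(maps_into(hit_link, RULES_DIRECTORY))
print(maps_into(search_page, RULES_DIRECTORY))
```

Under these assumptions the hit link never reaches the RewriteCond and RewriteRule pair in the block, whatever the referrer is.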

Last edited by catkin; 11-07-2011 at 11:45 AM. Reason: corrected search_results to docoll-results
 
Old 11-07-2011, 01:38 PM   #32
Nominal Animal
Senior Member
 
Registered: Dec 2010
Location: Finland
Distribution: Xubuntu, CentOS, LFS
Posts: 1,723
Blog Entries: 3

Rep: Reputation: 948
Apache renames environment variables with a REDIRECT_ prefix when it redirects or rewrites URLs internally, so you need a SetEnvIf directive to copy the value back when executing the CGI binary:
Code:
<Directory /usr/lib/cgi-bin/omega>
    SetEnvIf REDIRECT_OMEGA_CONFIG_FILE ^(.+)$ OMEGA_CONFIG_FILE=$1
</Directory>
Note that this has to be done after the redirections have been done; doing it in the actual cgi-bin directory should work well.
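As a toy model only (this is not how Apache is implemented), the renaming and the copy-back can be pictured with plain dictionaries in Python. The function names internal_redirect and copy_back are invented for this sketch.

```python
def internal_redirect(env):
    """Model the renaming: every variable gains a REDIRECT_ prefix."""
    return {"REDIRECT_" + name: value for name, value in env.items()}

def copy_back(env, name):
    """Model the SetEnvIf line: copy REDIRECT_<name> to <name> if it is non-empty."""
    value = env.get("REDIRECT_" + name)
    if value:  # the pattern ^(.+)$ needs at least one character
        env = dict(env, **{name: value})
    return env

before = {"OMEGA_CONFIG_FILE": "/etc/opt/docoll/search-0.2/omega.conf"}
after_redirect = internal_redirect(before)   # only the prefixed name remains
cgi_env = copy_back(after_redirect, "OMEGA_CONFIG_FILE")
print(cgi_env["OMEGA_CONFIG_FILE"])
```

The point of the model is the middle step: after the redirect the CGI program would only see the prefixed name unless something copies it back.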
__________________

Here is yet another example configuration, even further simplified, and slightly more adaptable (to your needs -- any paths you wish can be used with this one).

I've only tested this with a dummy CGI program on top of default Apache (with only alias authz_host cgi dir env rewrite setenvif modules enabled), but it should work if you modify the query templates to match the URLs.

There is only one Apache configuration file, /etc/apache2/conf.d/docoll, shared by all search engine versions:
Code:
DirectorySlash      On
RewriteEngine       On

Alias /docoll/results /srv/docoll
Alias /docoll         /var/www/docoll

<Directory /srv/docoll>
    AllowOverride   None
    Options         FollowSymlinks
    Order           Allow,Deny
    Allow           From all
</Directory>

<Directory /var/www/docoll>
    AllowOverride   None
    Options         FollowSymlinks ExecCGI
    Order           Allow,Deny
    Allow           From all

    <Files omega>
        SetHandler  cgi-script
        SetEnvIf    REDIRECT_OMEGA_CONFIG_FILE ^(.+)$ OMEGA_CONFIG_FILE=$1
    </Files>

    RewriteEngine   On
    RewriteBase     /docoll

    RewriteRule     ^/*$               omega?DB=docoll&FMT=docoll [L,QSA,E=OMEGA_CONFIG_FILE:/etc/opt/docoll/search/omega.conf]
    RewriteRule     ^/*search$         omega?DB=docoll&FMT=docoll [L,QSA,E=OMEGA_CONFIG_FILE:/etc/opt/docoll/search/omega.conf]
    RewriteRule     ^/*search-([^/]+)$ omega?DB=docoll&FMT=docoll [L,QSA,E=OMEGA_CONFIG_FILE:/etc/opt/docoll/search-$1/omega.conf]
</Directory>
The /var/www/docoll directory contains only a symlink omega to /usr/lib/cgi-bin/omega/omega . You can use any directory you wish instead of /var/www/docoll, as long as you edit every occurrence of the directory name above.

Using a symlink for the omega CGI binary, rather than aliasing the entire omega cgi-bin directory, has two important benefits in my opinion. First, your package then refers to only the files in the xapian-omega package it uses, rather than blindly publishing the entire xapian-omega interface. There may be additional xapian-omega -derived packages installed by your end users that reside in the same /usr/lib/cgi-bin/omega/ directory, and it is better to not enable those by default. Second, it is easier for debugging (by changing the symlink to a CGI binary that just dumps the environment variables), and porting to custom-compiled versions of xapian-omega. For example, an end user might wish to use a specific version of xapian-omega for docoll. Changing the one symlink will achieve that.

/etc/opt/docoll/search should be a symlink to the current search version subdirectory, /etc/opt/docoll/search-0.2 . If you prefer to not have the symlink, just replace both occurrences of /etc/opt/docoll/search/omega.conf above with the desired default configuration file path.

The default search page is reachable at /docoll, /docoll/, and /docoll/search. If you drop the DirectorySlash On directive, you do not need the dir module, but the /docoll URL will not work.

If you replace the first RewriteRule with DirectoryIndex index.html then /docoll and /docoll/ URLs will load page /var/www/docoll/index.html instead of the search page. Note that it can contain a search form too. For example, you can use the HTML code from the actual search form (wget -O - http://localhost/docoll/search) and use it as a starting point for the page.

To use any specific version of the search engine, you can use any /docoll/search-version URL, where version does not contain a slash. It will run the same (symlinked) omega CGI binary, but with OMEGA_CONFIG_FILE set to /etc/opt/docoll/search-version/omega.conf . It does not verify that the configuration file actually exists. (However, you can catch those by changing /etc/omega.conf to use a template page that just redirects back to the proper default search page.)
If you want to enable only specific versions, replace the last RewriteRule with the ones that match the desired versions.
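To make the mapping concrete, here is a small Python sketch of which configuration file each URL would select. It mirrors the three RewriteRule patterns above using Python's re module as a stand-in for Apache's regex engine; the function name config_for is invented, and the patterns are applied to the path relative to /docoll/, which is how per-directory rules see it.

```python
import re

CONF_ROOT = "/etc/opt/docoll"

# (pattern, config-file template) pairs mirroring the three RewriteRules.
RULES = [
    (re.compile(r"^/*$"), "{root}/search/omega.conf"),
    (re.compile(r"^/*search$"), "{root}/search/omega.conf"),
    (re.compile(r"^/*search-([^/]+)$"), "{root}/search-{version}/omega.conf"),
]

def config_for(relative_path):
    """Return the OMEGA_CONFIG_FILE value for a path, or None if no rule matches."""
    for pattern, template in RULES:
        match = pattern.search(relative_path)
        if match:
            version = match.group(1) if match.groups() else ""
            return template.format(root=CONF_ROOT, version=version)
    return None

for path in ("", "search", "search-0.2", "results/foo"):
    print(repr(path), config_for(path))
```

As in the real rules, nothing here checks that the selected file exists.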

The search results are under URL /docoll/results/, published from the /srv/docoll tree. If you have file /srv/docoll/foo then it is published as /docoll/results/foo. You can use any tree you wish, simply by modifying every occurrence of /srv/docoll in the above configuration to match.
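The order of the two Alias lines matters, because Apache uses the first Alias that matches. The Python sketch below is a simplified model of that behaviour (plain prefix matching, no other modules involved); the function name resolve is invented for illustration.

```python
# Alias table in the same order as the configuration above.
ALIASES = [
    ("/docoll/results", "/srv/docoll"),
    ("/docoll", "/var/www/docoll"),
]

def resolve(url_path, aliases=ALIASES):
    """Map a URL path to a filesystem path using the first matching alias."""
    for prefix, target in aliases:
        if url_path == prefix or url_path.startswith(prefix + "/"):
            return target + url_path[len(prefix):]
    return None

print(resolve("/docoll/results/foo"))
# With the order reversed, the broader /docoll alias would win.
print(resolve("/docoll/results/foo", list(reversed(ALIASES))))
```

So if you change the paths, keep the more specific alias first.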
__________________

Now that I'm a bit more familiar with xapian-omega, I would like to modify my suggestion for the installation tree. Assuming the base package name is docoll, and it relies on the xapian-omega package, I would use
  • /var/www/docoll/ for the omega CGI binary symlink and the search index page (index.html).
    I would use the index page modification to the above config.
    Static media such as images and stylesheets and so on I would put in one or more subdirectories there. If themed, I'd put the media for each theme under their own subdirectory here.
  • /etc/docoll/ for the configuration files, including templates.
  • /var/lib/docoll/files/ for the collected files (instead of /srv/docoll)
  • /var/lib/docoll/index/ (or perhaps omindex or something) for the indexing of the collected files
    It is usually best to replace a complex search index by replacing the entire directory. Having the index in a single subdirectory makes it easier.
  • /usr/share/docoll/ for the search engine scripts
  • /var/log/docoll for search engine log files
    Note that this does not include any Apache log files; Apache takes care of its own logging.

I think I might also just split the docoll package into semi-independent ones:
  1. docoll-base would provide the base tree structure, the omega CGI binary symlink, documentation files, and so on.
  2. docoll-search would provide the search engine. It would allow multiple parallel installations, so it would use versioned directory names. The contents it would provide would include /etc/docoll/search-version (default configuration files) and /usr/share/docoll/search-version (search engine scripts).
    The post-installation hook would create symlinks /etc/docoll/search and /usr/share/docoll/search pointing to the subdirectory owned by the latest stable installed version, and /etc/docoll/search-version/style pointing to ../style-default. The search engine configuration would use /etc/docoll/search-version/style as the template directory path.
  3. docoll-style-default would contain the default templates, under /etc/docoll/style-default/. Other templates (style packages) can be installed in parallel, with the symlink determining the variant.
    The base package installs /var/www/docoll/default.html, /var/www/docoll/index.html being a symlink to it. Then any style package can provide another search engine front page (style.html), and the user can pick the preferred one by changing the symlink. Any images, stylesheets and media the templates use would go under /var/www/docoll/style/ for example.
The dependency tracking is a bit complicated (the base package requires apache2-common, xapian-omega, any one docoll-search package version, and at least one docoll-style package).
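The symlink part of the post-installation hook could look roughly like the following Python sketch. It is only an illustration under assumptions: a real Debian hook would normally be a shell postinst script, the function name link_latest and the version string are hypothetical, and root stands in for the filesystem root so the sketch can run in a scratch directory.

```python
import os
import tempfile

def link_latest(root, version):
    """Create the 'search' symlinks and the style symlink for one version."""
    for base in ("etc/docoll", "usr/share/docoll"):
        # Versioned directory owned by the docoll-search package.
        os.makedirs(os.path.join(root, base, "search-" + version), exist_ok=True)
        link = os.path.join(root, base, "search")
        if os.path.lexists(link):
            os.remove(link)  # replace a symlink left by an older version
        os.symlink("search-" + version, link)  # relative target
    style = os.path.join(root, "etc/docoll", "search-" + version, "style")
    if not os.path.lexists(style):
        os.symlink("../style-default", style)

scratch = tempfile.mkdtemp()
link_latest(scratch, "0.2")
print(os.readlink(os.path.join(scratch, "etc/docoll/search")))
```

Relative symlink targets keep the tree relocatable, which is handy when testing in a scratch directory like this.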

The above would assume the objective is to eventually get docoll included in upstream Debian (or variant like Ubuntu) or Red Hat (or Red Hat variants like CentOS or Scientific Linux) distributions. As I see it, the same config is typical for other similar packages in all these distributions, and is FHS compliant. If you are unsure, you can always contact the Debian developers on the debian-devel mailing list, and ask if Debian Maintainers consider the scheme acceptable, or have better suggestions.

A point I missed earlier is well stated in the Debian New Maintainers' Guide, installation chapter:
Quote:
Originally Posted by Debian New Maintainers' Guide
On Debian [/usr/local] is reserved for private use by the system administrator, so packages must not use directories such as /usr/local/bin but should instead use system directories such as /usr/bin, obeying the Filesystem Hierarchy Standard (FHS).
This does mean that all Debian packages should use the system directories, as opposed to say /usr/local or /opt. The key point is system directories .. obeying FHS. Debian package guidelines therefore do not adopt FHS as-is, but point to a specific part of FHS.

As to the search scripts, /usr/bin is really intended for general-use scripts, those that users are expected to be able to run directly. Scripts and extensions that are part of a specific package belong under /usr/share/package , at least for architecture-independent files (like shell, Perl, and Python scripts). In your case, users should not be able to run most of the scripts docoll provides -- only some specific commands like database refresh et cetera --, so most of your scripts clearly belong under /usr/share/docoll.
__________________

Still, there is no single solution or convention that is "correct" or even "the best" for docoll. There are numerous choices acceptable to FHS, Debian, and users in general. You do have a lot of freedom here -- and really, very few solid rules. My "objections" to using opt are quite weak, more to the tune of "but it would be easier or more familiar for me as an end user if you do this instead" rather than actual objections.

My own approach to development is modular and adaptive rather than framework-based. This seems to be evident in the way I value guidelines and established practices higher than e.g. FHS, but will break either given even a halfway decent reason. As you can see in this thread alone, my "solutions" tend to evolve, sometimes through ridiculous, weird, unwieldy, or even stupid options, slowly converging to something I can be happy with. Testing and experimentation are very important. Everything I have written or stated here is my own opinion, and is based on limited experience with xapian-omega, and none with your docoll variant/configuration. With increased experience with it, my suggestions would likely evolve. Radical changes are quite possible. Based on your willingness to entertain other notions, the amount of background research you have done for this, and in general the amount of thought and effort you put into this, I'm absolutely convinced you will make the most suitable choices regarding docoll, even if they were totally different from anything I have suggested here.

One thing you might consider is to add a short chapter in the documentation, describing the reasons behind the choices you've made. In the future, you may wish to revisit your choices -- it is not terribly rare for a package to radically change its configuration scheme; as long as there is a good reason for that, users tend to not complain too much --, and having the background information is very useful then.
 
Old 12-02-2011, 12:12 AM   #33
catkin
Here's the final solution. Many thanks to Nominal Animal for the invaluable help.

The solution, using only a conf.d file (on Debian Squeeze), allows multiple instances of directory trees searchable by Xapian's Omega. Each instance:

* has its own directory tree.
* has its own omega.conf file and so its own Xapian database and Omega templates.
* can run a different version of Omega.

The file and directory structure used:
  • /etc/opt/docoll/<instance>/search/ to configure docoll's Omega usage. Has omega.conf and the templates sub-directory.
  • /srv/docoll/<instance>/ for the tree, indexed by omindex.
  • /usr/lib/cgi-bin/omega/ for the omega CGI executable. If more than one version of Omega is installed, each non-default version goes in /usr/lib/cgi-bin/omega/<version>/
  • /var/opt/docoll/omega/<instance>/ for the Xapian index of /srv/docoll/<instance> files.
  • /var/www/docoll/<instance>/, an empty directory for Apache to rewrite to an omega call.
  • /var/www/docoll/<instance>/hits for omega to generate links to. Symlinked to /srv/docoll/<instance>. Allows the tree to be outside Apache's DocumentRoot.
The conf.d file:
Code:
<Directory /var/www/docoll>
    # Ensure required settings in case defaults have been changed
    [snip]

    RewriteEngine  On
    RewriteBase    /
    RewriteRule    ^/*$ /cgi-bin/omega/omega?DB=default&FMT=docoll [L]
    RewriteRule    ^omega-(1.0.23)$ /cgi-bin/omega/$1/omega?DB=default&FMT=docoll [L]
    RewriteRule    ^([^/]*)/*$ /cgi-bin/omega/omega?DB=$1&FMT=docoll [L]
</Directory>

<Directory /usr/lib/cgi-bin/omega>
    # Ensure required settings in case defaults have been changed
    [snip]

    RewriteEngine  On
    RewriteBase    /
    RewriteCond    %{QUERY_STRING} (&|^)DB=([^&]*)&
    RewriteRule    .* - [env=OMEGA_CONFIG_FILE:/etc/opt/docoll/%2/search/omega.conf,L]
</Directory>
Explanation

First the default case ...

For a clean user interface, the user can access the default instance by browsing http://<server ID>/docoll. This matches the first RewriteRule so calls /cgi-bin/omega/omega with DB=default&FMT=docoll (docoll is a lightly modified version of the Omega query template). The RewriteCond extracts the DB value (in this case "default") and uses it to set environment variable OMEGA_CONFIG_FILE to the instance-specific omega.conf.
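The DB extraction can be tried out with the following Python sketch, which applies the same regex as the RewriteCond using Python's re module as a stand-in for Apache's regex engine. The function name omega_config_file is invented for illustration.

```python
import re

# Same regex as the RewriteCond; group 2 corresponds to %2.
DB_PATTERN = re.compile(r"(&|^)DB=([^&]*)&")

def omega_config_file(query_string):
    """Return the OMEGA_CONFIG_FILE path for a query string, or None."""
    match = DB_PATTERN.search(query_string)
    if match is None:
        return None
    return "/etc/opt/docoll/{}/search/omega.conf".format(match.group(2))

print(omega_config_file("DB=default&FMT=docoll"))
print(omega_config_file("P=meeting&DB=docoll&FMT=docoll"))
print(omega_config_file("FMT=docoll&DB=default"))
```

Note that the pattern requires an & after the DB value, so a DB parameter at the very end of the query string would not match. That is fine for the URLs generated here, where FMT always follows DB; ending the pattern with (&|$) would cover the other case if it ever arises.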

omega.conf sets the default Omega database and templates directories.

The docoll template is generic but includes an instance-specific config file with:
Code:
$set{docoll_hits_dir,/docoll/default/hits}
The docoll_hits_dir variable is used to prefix hit URLs with /docoll/<instance>/hits so Apache follows the symlink and serves from /srv/docoll/default/ (the tree that omindex indexed).
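Spelled out as a Python sketch, the prefixing and the symlink resolution amount to two path joins. The function names and the example file name are invented for illustration, and the sketch assumes /var/www is Apache's DocumentRoot.

```python
import posixpath

def hit_url(instance, relative_path):
    """URL emitted for a hit, given the docoll_hits_dir prefix."""
    return posixpath.join("/docoll", instance, "hits", relative_path)

def served_file(instance, relative_path):
    """File Apache ends up serving once the hits symlink is followed."""
    return posixpath.join("/srv/docoll", instance, relative_path)

# Hypothetical file inside the default instance's tree.
print(hit_url("default", "Meetings/minutes.xls"))
print(served_file("default", "Meetings/minutes.xls"))
```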

Now the non-default case ...

For a non-default instance the user browses http://<server ID>/docoll/<instance>. The third RewriteRule extracts the instance name and uses it for the DB name. The remaining process is very similar to the default case.

For a different version of Omega and the default tree ...

The user browses for example http://<server ID>/docoll/omega-1.0.23. The second RewriteRule calls /cgi-bin/omega/1.0.23/omega. The scheme could be extended to other trees.
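The three cases can be checked with a Python sketch that walks the rules of the first <Directory> block in order. It uses Python's re module as a stand-in for Apache's regex engine, applies the patterns to the path relative to /docoll/, and the function name rewrite and the instance name "projects" are invented for illustration.

```python
import re

# (pattern, substitution) pairs in the same order as the RewriteRules.
RULES = [
    (re.compile(r"^/*$"), "/cgi-bin/omega/omega?DB=default&FMT=docoll"),
    (re.compile(r"^omega-(1.0.23)$"), r"/cgi-bin/omega/\1/omega?DB=default&FMT=docoll"),
    (re.compile(r"^([^/]*)/*$"), r"/cgi-bin/omega/omega?DB=\1&FMT=docoll"),
]

def rewrite(relative_path):
    """Return the rewritten URL for the first matching rule, or None."""
    for pattern, substitution in RULES:
        match = pattern.search(relative_path)
        if match:
            return match.expand(substitution)
    return None

for path in ("", "omega-1.0.23", "projects", "default/hits/file.txt"):
    print(repr(path), rewrite(path))
```

The order matters: the version rule has to come before the instance rule, otherwise omega-1.0.23 would be treated as an instance name. Paths with a further component, such as the hits links, match none of the rules and are served as ordinary files.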

Best

Charles

Last edited by catkin; 12-02-2011 at 12:14 AM. Reason: prettification
 
Old 12-08-2011, 11:19 AM   #34
Nominal Animal
This looks very nice and clean. As a user, and as a sysadmin, I'd be quite happy with it. Congratulations!
 
  

