It may be possible to add these restrictive directives to the main ('root') server config and let them filter 'down' to the rest of the sites under it.
On CentOS, I'd try them in /etc/httpd/conf/httpd.conf; on Ubuntu flavors, in /etc/apache2/apache2.conf.
If you find you can't get them to filter 'down' to the other vhosts, then your options are to put them in /var/www/.htaccess, so that all your sites pick them up since .htaccess is applied "top down"/hierarchically, or to use the possible "Include" described below. But again, .htaccess is resource intensive, since it has to be read for every file/page requested.
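A rough sketch of the 'root' approach, assuming Apache 2.2-style access control (Order/Allow/Deny, as used elsewhere in this thread) and that every site's DocumentRoot sits under /var/www (an assumption; adjust the path to your layout):
Code:
# Main server config (e.g. /etc/httpd/conf/httpd.conf).
# SetEnvIf at this level should be inherited by the virtual hosts.
SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "^HTTrack" UnwantedRobot

# Assumes all DocumentRoots live somewhere under /var/www
<Directory "/var/www">
    Order Allow,Deny
    Allow from all
    # Refuse any request whose User-Agent set UnwantedRobot above
    Deny from env=UnwantedRobot
</Directory>
Keep in mind that a more specific <Directory> block with its own Order/Allow/Deny lines may replace these rather than add to them, which could be one reason the rules don't seem to filter 'down'.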
Also, how are these bots identified? How do you know they are bots?
I ask because it may be possible to build a fail2ban solution that scans the Apache logs and bans them that way.
Also, I believe your
Code:
SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "^HTTrack" UnwantedRobot
needs to go under the
Code:
<Directory "/var/www/vhosts/*">
stanza, along these lines:
Code:
<VirtualHost ipa.ddr.ess:80>
...
DocumentRoot /var/www/html
...
CustomLog logs/dorkblog_access.log combined
...
DirectoryIndex index.php
ServerName domain.com
</VirtualHost>
<Directory "/var/www/html">
...
SetEnvIfNoCase User-Agent "^BaiDuSpider" UnwantedRobot
SetEnvIfNoCase User-Agent "^HTTrack" UnwantedRobot
...
</Directory>
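Filled out for your vhosts stanza, a sketch might look like this (assuming Apache 2.2-style Order/Allow/Deny). Note that SetEnvIfNoCase only sets a variable; nothing is blocked until a Deny from env= line refers to it. I've also dropped the ^ anchors as an untested assumption: as far as I know, HTTrack's default User-Agent and Baidu's crawler both send strings beginning with "Mozilla/...", so an anchored pattern may never match them. Check your access logs to see what these bots actually send.
Code:
<Directory "/var/www/vhosts/*">
    # Unanchored, case-insensitive match anywhere in the User-Agent
    # (assumed; verify against your own logs)
    SetEnvIfNoCase User-Agent "BaiDuSpider" UnwantedRobot
    SetEnvIfNoCase User-Agent "HTTrack" UnwantedRobot

    # The variable alone blocks nothing; this is what refuses the request
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</Directory>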
Here's how mine is constructed:
Code:
<Directory "/var/www/html">
Options Indexes FollowSymLinks
AllowOverride All
Order allow,deny
allow from all
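### partial domain names: match clients whose reverse DNS name ends in
### .ru / .ch (Apache does a double reverse DNS lookup for these)
deny from ru
deny from ch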
### Datashack - seems to be a proxy
### Nov. 3rd, 2014
deny from 192.151.144.0/20
### bots and spiders
BrowserMatchNoCase "bot" bots
BrowserMatchNoCase "spider" bots
BrowserMatchNoCase "heritrix" bots
BrowserMatchNoCase "Archive" bots
BrowserMatchNoCase "Baidu" bots
BrowserMatchNoCase "sniffer" bots
BrowserMatchNoCase "ltx" bots
BrowserMatchNoCase "seo" bots
BrowserMatchNoCase "crawl" bots
BrowserMatchNoCase "mechanize" bots
BrowserMatchNoCase "MetaIntelligence" bots
BrowserMatchNoCase "netcraft" bots
BrowserMatchNoCase "Quantfiy/2.0n" bots
...
Order Allow,Deny
Allow from ALL
### refuse any request tagged "bots" by the BrowserMatchNoCase lines above
Deny from env=bots
...
</Directory>
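If your server runs Apache 2.4 (CentOS 7 and Ubuntu 14.04 and later ship it), Order/Allow/Deny only work through mod_access_compat; the native form uses Require. A rough, trimmed equivalent of the bot and Datashack parts of the block above, assuming mod_setenvif, mod_authz_core and mod_authz_host are loaded (the ru/ch hostname rules are left out):
Code:
<Directory "/var/www/html">
    Options Indexes FollowSymLinks
    AllowOverride All

    # Tag known bots by User-Agent (case-insensitive substring match)
    BrowserMatchNoCase "bot" bots
    BrowserMatchNoCase "spider" bots
    BrowserMatchNoCase "crawl" bots

    <RequireAll>
        # Allow everyone...
        Require all granted
        # ...except tagged bots and the Datashack range
        Require not env bots
        Require not ip 192.151.144.0/20
    </RequireAll>
</Directory>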
If you are asking if the statement
<Directory "/var/www/vhosts/*">
is correct, I've never seen it done with an asterisk.
The Apache <Directory> documentation says regular expressions can be used, but I'm not an RE expert, so I don't know whether the asterisk, used the way you showed us, is correct. If it's working, it's likely fine that way.
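For what it's worth, the <Directory> docs describe shell-style wildcards (?, * and [] ranges) that never match a '/', so "/var/www/vhosts/*" should cover each directory directly under /var/www/vhosts and, as with any <Directory>, everything beneath it. If you'd rather use an actual regular expression, <DirectoryMatch> is the regex form; the pattern below is only an illustration of a roughly equivalent match, again assuming 2.2-style access lines:
Code:
# Regex form: matches any path component directly under /var/www/vhosts
# (no trailing $, so deeper paths match as well)
<DirectoryMatch "^/var/www/vhosts/[^/]+">
    Order Allow,Deny
    Allow from all
    Deny from env=UnwantedRobot
</DirectoryMatch>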
It may also be possible to use an "Include" statement in the site .conf files to pull in these restrictions, but again, I don't know how to achieve that other than by experimentation or input from other LQ members.
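A minimal sketch of the Include idea, using a made-up file name: keep the shared rules in one file and Include it inside each site's <Directory> stanza (Include is allowed in directory context). I'd avoid a .conf suffix under conf.d on CentOS, since the stock httpd.conf already includes conf.d/*.conf at server level, where Order/Deny aren't allowed. Paths are CentOS-style; adjust them for Ubuntu.
Code:
# /etc/httpd/conf.d/badbots.inc (hypothetical name): shared bot rules
BrowserMatchNoCase "bot" bots
BrowserMatchNoCase "spider" bots
BrowserMatchNoCase "HTTrack" bots
Order Allow,Deny
Allow from all
Deny from env=bots

# In each site's config: pull the shared rules into that site's stanza
<Directory "/var/www/html">
    Include /etc/httpd/conf.d/badbots.inc
</Directory>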
Hope that helps.