complete Apache newb: advice?
hey guys...
I'm completely new to Apache, and my machine will soon be hosting my website. I'm running ONLY apache2, no other web servers, and my firewall is as restrictive as possible. Oh, and I've put my web page in /var/www/localhost/htdocs, and I'm running the very latest version of Apache in the Gentoo portage tree. What further steps should I follow to secure the system? I'm still using the default Gentoo Apache config files. Apart from keeping my server up to date, reading the logs, and running Tripwire on a cron job, what else do I need to do? |
If you keep your system updated with the latest patches, there is little more you can do with Apache itself. As a first step, I would suggest turning off "Indexes" and "ServerSignature" in the config file (httpd.conf), so that your server does not list the contents of directories without a default page and does not give its version and DocumentRoot when it produces an error page. Also, if you use CGI or other scripts, check their permissions etc.
You can take a look at Apache Week (http://www.apacheweek.com), which has many articles about Apache configuration and security. Regards. |
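To make that advice concrete, here is a minimal httpd.conf sketch of the two directives mentioned above, using the /var/www/localhost/htdocs DocumentRoot from the original post (the ServerTokens line is an extra suggestion, not something the poster asked about):

```apache
# Disable directory listings for the document root
<Directory /var/www/localhost/htdocs>
    Options -Indexes
    AllowOverride None
</Directory>

# Don't append the server version and DocumentRoot to error pages
ServerSignature Off

# (Optional) trim the Server: response header down to just "Apache"
ServerTokens Prod
```

After editing, reload Apache (e.g. apachectl graceful) and request a directory with no index page to confirm you get a 403 instead of a listing.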
There will always be something to do... for example SYN flood protection, using Snort (especially its HTTP modules), and finally a robots.txt file in the web root to protect yourself from web crawlers... By the way, you should also try a server stress tool, depending on the number of users you expect per day...
|
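On the SYN flood point above: one common Linux-side mitigation is enabling TCP SYN cookies. This is a sketch, not a complete hardening recipe; the sysctl key is the standard Linux one, and whether it is appropriate depends on your kernel and traffic:

```
# Enable TCP SYN cookies (run as root; takes effect immediately)
sysctl -w net.ipv4.tcp_syncookies=1

# To make it persistent, add the line below to /etc/sysctl.conf:
# net.ipv4.tcp_syncookies = 1
```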
robots.txt ?
Could you tell us more about that? |
Well, if you check your web stats or logs you will see that a file called robots.txt seems to be requested all the time. This is actually the search engines looking in your root domain for that particular file (http://www.mydomain.com/robots.txt). The file tells the web crawlers which pages they may spider (i.e. download or leech).
An example-based guide is http://www.searchengineworld.com/robots/ - it's not hard to set one up for each domain, because you just put it where you want it (in the root folder) after you configure it the way you like. |
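For reference, a minimal robots.txt looks like this (the paths are made-up examples, not from the original post; adapt them to your own site layout):

```
# robots.txt - place in the web root, e.g. /var/www/localhost/htdocs/robots.txt
# "*" applies the rules to all crawlers
User-agent: *
Disallow: /cgi-bin/
Disallow: /private/
```

Note that Disallow matches path prefixes, and an empty Disallow line would allow everything.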
So robots.txt tells which pages should be indexed by the search engine?
|
You're right about robots.txt, but there are spiders that don't respect the exclusions you list in that file, so they also index the pages you want to hide.
|
Bathory: you're right, but there aren't too many of those spiders... and that's where the advantage of iptables filtering on INPUT comes in ;)
Dominant: yes, instead of having all your website's pages, pictures and other stuff harvested, you decide what a search site is allowed to show from you. For example, let's assume you have a page with 3 frames: main / menu / banner - what would be the point of showing the menu frame's link on ANY search engine? And this is not the only example... why make an archive you created directly reachable if the user doesn't visit your page first? There are also many robots.txt tricks available - try to understand them; you can find them on Google... |
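The iptables filtering mentioned above could look like the sketch below. The address is a documentation placeholder, not a real spider; you would substitute whatever IP your logs show ignoring robots.txt:

```
# Drop HTTP traffic from a misbehaving spider (run as root).
# 203.0.113.10 is a placeholder address - replace it with the offending IP.
iptables -A INPUT -s 203.0.113.10 -p tcp --dport 80 -j DROP

# List the rule to verify it was added
iptables -L INPUT -n
```

Rules added this way are not persistent across reboots; save them with your distribution's init script if you want to keep them.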
Oh, I was forgetting something!
Try to extend your meta content in the ALLOWED pages/folders to cover your whole area of presentation, because restricting some pages will indeed lower the search results for your website. To be more specific, let's assume one page (p1) contains the terms "a, b, c, d" and another (p2) contains "e, f, g". If p2 is excluded from crawling, then only searches for "a, b, c, d" will find you on the search engine! So try to bring all the terms into the accepted page(s) - in this case p1. |
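As a sketch of that idea, the crawlable page p1 could carry meta tags covering the terms from both pages (the a-g terms are just the made-up example above):

```html
<head>
  <!-- Cover terms from the excluded page p2 ("e, f, g") here on p1,
       so searches for them still find the site -->
  <meta name="keywords" content="a, b, c, d, e, f, g">
  <meta name="description" content="Main page covering topics a through g">
</head>
```

Keep in mind that search engines weight keyword meta tags far less than visible page content, so this helps only so much.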
You may also want to have a look at this site: http://www.robotstxt.org/wc/robots.html which has loads of info on robots.
|