LinuxQuestions.org - complete apache newb: advice ?

qwijibow

09-14-2004 10:53 PM

complete apache newb: advice ?

hey guys...

I am completely new to Apache, and my machine will soon be hosting my website.
I'm running ONLY apache2, no other web servers, and my firewall is as restrictive as possible.

Oh, and I've put my web page in /var/www/localhost/htdocs
and I'm running the very latest version of Apache in the Gentoo Portage tree.

What further steps should I follow to secure the system?
I'm still using the default Gentoo Apache config files.

Apart from keeping my server up to date, reading the logs, and running tripwire on a cron job, what else do I need to do?

bathory

09-15-2004 03:07 AM

If you keep your system updated with the latest patches, there is little more you can do with Apache itself. As a first step, I would suggest turning off "Indexes" and "ServerSignature" in the config file (httpd.conf), so that your server does not list the contents of directories that lack a default page and does not reveal its version and server name in the footer of the error pages it generates. Also, if you use CGI or other scripts, check their permissions etc.
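To make those two settings concrete, here is a rough Python sketch that scans config text for them. The sample excerpt is an assumption built around the DocumentRoot mentioned earlier in the thread, not the actual Gentoo default file, and `audit` is a made-up helper name. It only looks at the text you hand it and does not follow any included files.

```python
# Hypothetical httpd.conf excerpt showing both settings switched off.
# The <Directory> path is taken from the thread; the rest is illustrative.
SAMPLE_CONF = """
ServerSignature Off
<Directory "/var/www/localhost/htdocs">
    Options -Indexes
</Directory>
"""


def audit(conf_text):
    """Return warnings for ServerSignature and directory listings."""
    warnings = []
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        tokens = line.split()
        directive = tokens[0].lower()
        args = [t.lower() for t in tokens[1:]]
        if directive == "serversignature" and args and args[0] != "off":
            warnings.append("ServerSignature is " + tokens[1] + ", expected Off")
        elif directive == "options" and any(
            a in ("indexes", "+indexes", "all") for a in args
        ):
            # "Options All" also switches directory listings on.
            warnings.append("directory listings enabled by: " + line)
    return warnings


print(audit(SAMPLE_CONF))  # []
```

You could read your real config into a string and pass it to `audit` in the same way; an empty list only means that neither setting was found enabled in that particular text.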
You can take a look at Apache Week (http://www.apacheweek.com), which has many articles about Apache configuration and security.
Regards.

3spre10

09-15-2004 06:37 AM

There will always be something to do... for example SYN flood protection, using Snort (especially for the HTTP modules), and finally a robots.txt file in the root to protect yourself from the web crawlers... By the way, you should try a server stress tool, sized to the number of users you expect per day...

dominant

09-15-2004 07:35 AM

robots.txt ?
Could you tell us more about that?

3spre10

09-15-2004 07:47 AM

Well, if you check your web stats or logs you will see that a file called robots.txt seems to appear all the time. This is actually the search engines looking in the root of your domain for that particular file (http://www.mydomain.com/robots.txt). The file tells the web crawlers which pages they may spider (that is, download or leech).

One example-based guide is http://www.searchengineworld.com/robots/ and it's not hard to set up for each domain: once you have configured the file the way you like, you just put it in the root folder...

dominant

09-15-2004 08:23 AM

Well, so robots.txt tells which pages should be stored by the search engine?

bathory

09-15-2004 08:23 AM

You're right about robots.txt, but there are spiders that don't respect the exclusions you list in that file, so they also index the pages you want to hide.
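One way to spot such spiders is to look in the access log for clients that request paths your robots.txt asks crawlers to skip. Below is a minimal Python sketch of that idea. Everything in it is made up for illustration: the disallowed paths, the log lines (in Apache's Common Log Format) and the addresses, which come from the reserved documentation ranges.

```python
import re
from collections import Counter

# Paths assumed to be disallowed in robots.txt (hypothetical).
DISALLOWED_PREFIXES = ("/menu.html", "/archive/")

# Invented sample log lines, not real traffic.
SAMPLE_LOG = """\
192.0.2.10 - - [15/Sep/2004:08:20:01 -0500] "GET /robots.txt HTTP/1.0" 200 58
192.0.2.10 - - [15/Sep/2004:08:20:02 -0500] "GET /index.html HTTP/1.0" 200 1043
198.51.100.7 - - [15/Sep/2004:08:21:15 -0500] "GET /archive/site.tar.gz HTTP/1.0" 200 20480
198.51.100.7 - - [15/Sep/2004:08:21:16 -0500] "GET /menu.html HTTP/1.0" 200 512
"""

# Capture the client address and the requested path.
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)')


def offenders(log_text):
    """Count, per client address, requests for the disallowed paths."""
    hits = Counter()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(DISALLOWED_PREFIXES):
            hits[match.group(1)] += 1
    return hits


print(dict(offenders(SAMPLE_LOG)))  # {'198.51.100.7': 2}
```

Keep in mind that an ordinary visitor's browser requests these pages too, so a hit here is only a hint that a client may be a crawler ignoring the file, not proof.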

3spre10

09-15-2004 08:36 AM

Bathory: you are right, but there are not too many spiders of that kind... so here comes the advantage of iptables filtering on INPUT ;)

Dominant: yes, instead of having all your website's pages, pictures and other stuff harvested, you decide what a search website is allowed to show from your site. For example, let's presume you have a page with 3 frames: main / menu / banner. What would be the sense in showing the menu frame link on ANY search engine? And this is not the only example... why leave an archive you made directly available if the user doesn't visit your page?
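As a rough illustration of the frame and archive example, here is a small Python sketch that uses the standard library's `urllib.robotparser` to read a robots.txt and report what a well-behaved crawler would be allowed to fetch. The file names (`/menu.html`, `/banner.html`, `/archive/`) and the crawler name `ExampleBot` are assumptions, and `allowed` is just a helper name chosen for this sketch.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: hide the menu and banner frames and the
# archive folder from all crawlers, leave everything else open.
ROBOTS_TXT = """\
User-agent: *
Disallow: /menu.html
Disallow: /banner.html
Disallow: /archive/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if a crawler obeying robots.txt may fetch the path."""
    return parser.can_fetch(agent, "http://www.mydomain.com" + path)


print(allowed("/index.html"))           # True
print(allowed("/menu.html"))            # False
print(allowed("/archive/site.tar.gz"))  # False
```

This only shows what a crawler that honours the file would do; nothing here stops a client from requesting the disallowed paths anyway.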

There are also many robots.txt tricks available, so try to understand them. You can find them on Google...

3spre10

09-15-2004 08:49 AM

Oh, I was forgetting something!
Try to extend the meta content of the ALLOWED pages/folders to cover your area of presentation, because restricting some pages will indeed lower the results for your website...
To be more specific, let's assume that one page (p1) contains the terms "a, b, c, d" and another (p2) contains "e, f, g". If p2 is excluded from crawling, then a search engine will only show your site for the terms "a, b, c, d"! So try to bring them all into the accepted page(s), in this case p1.

flaxius

09-16-2004 02:07 AM

You may also want to have a look at this site: http://www.robotstxt.org/wc/robots.html which has loads of info on robots.

All times are GMT -5.