Old 03-22-2015, 06:31 PM   #1
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Rep: Reputation: 16
Question: Easy way to find bots/spiders in Nginx access.log and block them


Hi

I am looking for an easy way to find bots or weird user agents in my huge access.log so I can block them, but the file is very big and it's not easy to search it line by line.

Is there any other way, or a useful grep command?

At the moment I have this:

Code:
grep 'spider\|bot' access.log | sort -u -f >> bots.txt
I am still trying to work out how to print out just the spider/bot name and remove the duplicates...
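Something like this might get me closer, assuming the default nginx "combined" log format where the user agent is the last quoted field:

Code:
# take the user-agent (6th field when splitting on double quotes),
# pull out anything that looks like a bot/spider token, then count
awk -F'"' '{print $6}' access.log | grep -oiE '[a-z0-9._-]*(bot|spider)[a-z0-9._-]*' | sort -f | uniq -ci | sort -rn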

I am also looking for ideas on what else to search for besides spider/bot, as I don't know what else is bad or can cause huge load on my server...

Thanks
 
Old 03-23-2015, 01:49 PM   #2
joe_2000
Member
 
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 823

Rep: Reputation: 237
Quote:
Originally Posted by ASTRAPI
I am looking for an easy way to find bots or weird user agents in my huge access.log so I can block them...
There is a technique called a "honeypot". Create a link that is invisible to human users and points to a subdirectory of the server. You can then consider all IPs from which requests to that subdirectory are made as spiders.
Additionally, you can create a second honeypot and tell bots not to go there via robots.txt. Bots that go there anyway can be considered "malicious" and blocked, e.g. through htaccess.
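A minimal sketch of the first honeypot, with /trap/ as a made-up placeholder path (put this somewhere in a page template):

Code:
<!-- invisible to human visitors; crawlers following raw links will still fetch it -->
<a href="/trap/" style="display:none">&nbsp;</a>
Every IP that requests /trap/ followed a link no human can see, so you can collect them like this (assuming the default log format, where the client IP is the first field):

Code:
grep '/trap/' access.log | awk '{print $1}' | sort -u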
 
Old 03-23-2015, 02:57 PM   #3
Habitual
LQ Addict
 
Registered: Jan 2011
Posts: 8,496
Blog Entries: 13

Rep: Reputation: 2387
To show you "how" to do what joe_2000 said:

I'd create a custom entry in robots.txt, and then you only have to grep the logs for the 'bait'.
Code:
User-agent: *
Disallow: /Private_Idaho
NOTE: "Private_Idaho" is the bait and does NOT exist on the server.

Wait a while, then:
Code:
grep Private_Idaho access.log
This can also be leveraged into a fail2ban solution to work them into the firewall.
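Roughly what that could look like (the filter name and log path are just examples, adjust to your setup):

Code:
# /etc/fail2ban/filter.d/nginx-honeypot.conf
[Definition]
failregex = ^<HOST> .* "(GET|POST|HEAD) /Private_Idaho
Code:
# /etc/fail2ban/jail.local
[nginx-honeypot]
enabled  = true
port     = http,https
filter   = nginx-honeypot
logpath  = /var/log/nginx/access.log
maxretry = 1
bantime  = 86400
With maxretry = 1, a single request for the bait is enough to get the IP banned.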

Last edited by Habitual; 03-23-2015 at 02:58 PM.
 
Old 03-23-2015, 03:23 PM   #4
ASTRAPI
Member
 
Registered: Feb 2007
Posts: 210

Original Poster
Rep: Reputation: 16
OK, great, thanks to both of you.

How can I save the results from grep Private_Idaho access.log to another .txt file?
 
Old 03-23-2015, 05:39 PM   #5
joe_2000
Member
 
Registered: Jul 2012
Location: Aachen, Germany
Distribution: Void, Debian
Posts: 823

Rep: Reputation: 237
Quote:
Originally Posted by ASTRAPI
How can I save the results from grep Private_Idaho access.log to another .txt file?
Use output redirection:

Code:
grep Private_Idaho access.log > /tmp/bait_hits.txt
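And if you only want the offending IPs rather than the full log lines (assuming the default log format, where the client IP is the first field):

Code:
grep Private_Idaho access.log | awk '{print $1}' | sort -u > /tmp/bait_ips.txt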
 
Old 03-24-2015, 10:19 AM   #6
Habitual
LQ Addict
 
Registered: Jan 2011
Posts: 8,496
Blog Entries: 13

Rep: Reputation: 2387
Quote:
Originally Posted by ASTRAPI
OK, great, thanks to both of you.
You are welcome.
Always glad to be part of the Team.

Last edited by Habitual; 03-24-2015 at 10:23 AM.
 
  


