LinuxQuestions.org
Old 10-11-2012, 04:24 PM   #1
techatlast
LQ Newbie
 
Registered: Oct 2012
Posts: 5

I'm having a problem with my site's robots.txt file


Hi everyone. Please help me create a robots.txt file for my website. The site has not been crawled for the last two months.
 
Old 10-14-2012, 11:57 PM   #2
Rupadhya
Member
 
Registered: Sep 2012
Location: Hoffman Estates, IL
Distribution: Fedora 20
Posts: 167

I usually worry about the opposite: I usually try to block robots from scanning. You can block well-behaved robots with a robots.txt file at your web server's root.
Code:
User-agent: *
Disallow: /
That says it doesn't matter what robot you are, I am disallowing you from scanning anything from the root '/' down.

I think you can code something like this in your robots.txt. Make sure you save robots.txt as all lowercase.
Code:
User-agent: *
Disallow:
That says it doesn't matter what robot you are, I am not disallowing you at all. I think if you do not have a robots.txt, it will behave the same.
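
A quick way to see the difference between those two files is to feed them to a parser. This is only a minimal sketch, assuming Python 3 and its standard urllib.robotparser module; the helper name can_fetch and the agent name "ExampleBot" are made up for illustration, and real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser


def can_fetch(robots_txt: str, path: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The two files from the post above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"

print(can_fetch(BLOCK_ALL, "/index.html"))  # False
print(can_fetch(ALLOW_ALL, "/index.html"))  # True
# A robots.txt with no rules at all is treated by this parser as "allow everything".
print(can_fetch("", "/index.html"))  # True
```

An empty Disallow: line matches nothing, which is why the second file lets every robot in.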

Could you post your robots.txt to see if we see any obvious errors in it?

- Raj
 
Old 10-22-2012, 05:30 AM   #3
techatlast
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
This is my site's robots.txt page (http://techatlast.com/robots.txt).

I tried to paste it inside the box so as not to break the forum rules; sorry if I got that wrong. Below is the content that I pasted into the robots.txt file.

Sitemap: http://techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-

I'd be glad to receive your response @Raj
 
Old 10-22-2012, 07:21 AM   #4
Rupadhya
Member
 
Registered: Sep 2012
Location: Hoffman Estates, IL
Distribution: Fedora 20
Posts: 167

What this is saying is that no agent should crawl the /wp-admin/ directory tree, the /wp-includes/ directory tree, or any other path that starts with /wp-. If you enclose the text in your post with the [ code ] and [ /code ] tags, the forum will not reformat it much, and it makes code much more readable.

For example your robots.txt is
Code:
#Code to not allow any search engines!
Sitemap: http://www.techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /uses/
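
To double-check which URLs those rules actually block, here is a rough sketch, assuming Python 3 and its standard urllib.robotparser module. The helper allowed_paths and the sample paths (other than the article URL mentioned in this thread) are invented for illustration, and Google's own matching may differ in corner cases.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt shown above, copied as-is.
ROBOTS_TXT = """\
#Code to not allow any search engines!
Sitemap: http://www.techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /uses/
"""


def allowed_paths(robots_txt, paths, agent="Googlebot"):
    """Map each path to True (crawlable) or False (blocked) for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


results = allowed_paths(
    ROBOTS_TXT,
    [
        "/wp-admin/options.php",         # hypothetical path
        "/wp-content/uploads/logo.png",  # hypothetical path, caught by the /wp- prefix
        "/uses/",
        "/lenovo-thinkpad-tablet-2",
    ],
)
for path, ok in results.items():
    print(f"{path}: {'allowed' if ok else 'blocked'}")
```

Disallow values are matched as path prefixes, so the /wp- line also covers /wp-admin/ and /wp-includes/, and it blocks /wp-content/ as well. Ordinary article URLs stay crawlable.
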
I think your site is being indexed. I looked at Google and did a search. You can go to Google's Webmaster Tools and verify this.

Check your webserver logs for Googlebot. That is the user agent for Google's web crawler. If you are being crawled you should see hits by that.
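
As a sketch of that log check, assuming Python 3 and an access log in the common "combined" format (where the request line is the first quoted field and the user agent is on the same line). The function name googlebot_hits and the sample log lines are made up for illustration; your log format and location depend on your web server.

```python
from collections import Counter


def googlebot_hits(lines):
    """Count requested paths on log lines that mention Googlebot."""
    hits = Counter()
    for line in lines:
        if "googlebot" not in line.lower():
            continue
        # In the combined format the first quoted field is the request,
        # e.g. "GET /robots.txt HTTP/1.1".
        parts = line.split('"')
        if len(parts) < 2:
            continue
        request = parts[1].split()
        if len(request) >= 2:
            hits[request[1]] += 1
    return hits


# Invented sample lines, only to show the shape of the input.
SAMPLE_LOG = [
    '66.249.66.1 - - [22/Oct/2012:07:00:01 -0500] "GET /robots.txt HTTP/1.1" 200 154 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.9 - - [22/Oct/2012:07:00:05 -0500] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
    '66.249.66.1 - - [22/Oct/2012:07:00:09 -0500] "GET /lenovo-thinkpad-tablet-2 HTTP/1.1" 200 20480 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

hits = googlebot_hits(SAMPLE_LOG)
for path, count in hits.most_common():
    print(count, path)
```

To run it on a real log, open the file and pass the file object to googlebot_hits. Keep in mind that anyone can put "Googlebot" in a user agent string, so this only tells you that something claiming to be Googlebot visited.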

I guess Google now has a bunch of different crawlers. It has been about 10 years since I worked with this stuff, so my info is a bit dated.

Check https://developers.google.com/webmas.../docs/crawlers for more info on Google's crawlers.

I found this page via Google. http://www.techatlast.com/lenovo-thinkpad-tablet-2 It is more recent than two months ago, so I think you are ok.

Regards,

Raj Upadhyaya
 
Old 10-22-2012, 07:40 AM   #5
afasoas
LQ Newbie
 
Registered: Oct 2012
Location: UK
Distribution: Ubuntu, Lubuntu
Posts: 16

Quote:
Originally Posted by techatlast
Hi everyone. Please help me create a robots.txt file for my website. The site has not been crawled for the last two months.
Does your host support the "If-Modified-Since" HTTP header?
http://www.feedthebot.com/ifmodified.html
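
If you would rather test this yourself, here is a minimal sketch, assuming Python 3 and only the standard library. It is not how the tool linked above works; it simply sends a second request carrying an If-Modified-Since header and looks for a 304 Not Modified reply. The function name supports_if_modified_since and the opener parameter are my own inventions (opener is only there so the function can be exercised without a network).

```python
import urllib.error
import urllib.request


def supports_if_modified_since(url, timeout=10.0, opener=urllib.request.urlopen):
    """Return True if the server answers a conditional GET with 304 Not Modified."""
    # First request: learn when the server says the page last changed.
    with opener(url, timeout=timeout) as first:
        last_modified = first.headers.get("Last-Modified")
    if not last_modified:
        # No Last-Modified header, so there is nothing to send back.
        return False

    # Second request: ask for the page only if it changed since that date.
    request = urllib.request.Request(
        url, headers={"If-Modified-Since": last_modified}
    )
    try:
        with opener(request, timeout=timeout) as second:
            # A full 200 response means the header was ignored.
            return second.status == 304
    except urllib.error.HTTPError as err:
        # urllib reports non-2xx statuses, including 304, as HTTPError.
        return err.code == 304
```

Calling supports_if_modified_since("http://techatlast.com/") would then return True or False for the home page. A False result only means this particular page did not answer with 304, not that the whole host lacks support.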
 
Old 10-22-2012, 10:33 AM   #6
techatlast
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Quote:
Originally Posted by afasoas
Does your host support the "If-Modified-Since" HTTP header?
http://www.feedthebot.com/ifmodified.html

I have just run the "If-Modified-Since" HTTP header check tool, and my site supports it. Thanks for the link.
 
Old 10-22-2012, 11:10 AM   #7
techatlast
LQ Newbie
 
Registered: Oct 2012
Posts: 5

Original Poster
Thanks Raj for your help.
 
  

