I'm having a problem with my site's robots.txt file

techatlast · 10-11-2012, 04:24 PM

Hi everyone. Please help me create a robots.txt file for my website. The site has not been crawled for the last 2 months.

Rupadhya · 10-14-2012, 11:57 PM

I usually worry about the opposite: I usually try to block robots from scanning. You can block well-behaved robots with a robots.txt file at your web server's root.

Code:

User-agent: *
Disallow: /

That says it doesn't matter what robot you are, I am disallowing you from scanning anything from the root '/' down.

I think you can code something like this in your robots.txt. Make sure you save robots.txt as all lowercase.

Code:

User-agent: *
Disallow:

That says it doesn't matter what robot you are, I am not disallowing you at all. I think if you do not have a robots.txt, it will behave the same.
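If you want to double-check how a parser reads those two files, here is a minimal sketch using Python's standard `urllib.robotparser` module. The `allowed` helper and the example.com URL are just made up for illustration, and real crawlers may differ from this parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The two robots.txt variants discussed above.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
ALLOW_ALL = "User-agent: *\nDisallow:\n"


def allowed(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# example.com is a placeholder URL, not the poster's site.
print(allowed(BLOCK_ALL, "http://example.com/page.html"))  # False
print(allowed(ALLOW_ALL, "http://example.com/page.html"))  # True
```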

Could you post your robots.txt so we can check it for any obvious errors?

- Raj

techatlast · 10-22-2012, 05:30 AM

This is my site's robots.txt page (http://techatlast.com/robots.txt)

I tried to paste it inside the box so as not to break the forum rules; sorry about that. Below is the content that I put in the robots.txt file.

Sitemap: http://techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-

I'd be glad to receive your response, @Raj.

Rupadhya · 10-22-2012, 07:21 AM

What this is saying is that no agent should crawl the /wp-admin/ directory tree, the /wp-includes/ directory tree, or any path that starts with /wp-. If you enclose your post here in the [ code ] and [ /code ] tags, the forum will not reformat it much, and it makes code much more readable.

For example, your robots.txt is

Code:

#Code to not allow any search engines!
Sitemap: http://www.techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /uses/
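To see which paths these rules actually block, you could run them through Python's standard `urllib.robotparser`. This is only a sketch: the `blocked` helper and the /wp-content/ path are my own illustrations, and Google's matching may differ from this parser in edge cases.

```python
from urllib.robotparser import RobotFileParser

# The rules from the robots.txt above, pasted as a string.
RULES = """\
Sitemap: http://www.techatlast.com/sitemap.xml

User-Agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-
Disallow: /uses/
"""


def blocked(rules: str, paths: list[str], agent: str = "Googlebot") -> list[str]:
    """Return the subset of `paths` that `agent` may NOT fetch."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


paths = [
    "/",
    "/lenovo-thinkpad-tablet-2",      # an ordinary post
    "/wp-admin/",
    "/wp-content/uploads/logo.png",   # hypothetical path, caught by "/wp-"
    "/uses/",
]
print(blocked(RULES, paths))
# ['/wp-admin/', '/wp-content/uploads/logo.png', '/uses/']
```

Note that the bare "Disallow: /wp-" line is a prefix match, so it also covers anything under /wp-content/. Ordinary posts are not affected.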

I think your site is being indexed. I looked at Google and did a search. You can go to Google's Webmaster Tools and verify this.

Check your web server logs for Googlebot. That is the user agent for Google's web crawler. If you are being crawled, you should see hits from it.
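A rough sketch of that log check in Python, assuming your server writes the common "combined" access log format, where the user agent is the last quoted field. The sample lines and IP addresses below are invented; you would read your real log file instead. Keep in mind that the user agent string can be faked, so this is only a first look.

```python
import re
from collections import Counter

# Matches the date part of a timestamp such as [21/Oct/2012:06:25:13 +0000].
DAY = re.compile(r"\[(\d{2}/[A-Za-z]{3}/\d{4}):")


def googlebot_hits_per_day(lines):
    """Count log lines per day whose user agent mentions Googlebot."""
    hits = Counter()
    for line in lines:
        # In the combined format the line ends with "user agent",
        # so splitting from the right on '"' isolates that field.
        parts = line.rstrip().rsplit('"', 2)
        day = DAY.search(line)
        if len(parts) == 3 and day and "googlebot" in parts[1].lower():
            hits[day.group(1)] += 1
    return hits


# Invented sample lines, for illustration only.
SAMPLE = [
    '66.249.66.1 - - [21/Oct/2012:06:25:13 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [21/Oct/2012:06:25:20 +0000] "GET /robots.txt HTTP/1.1" 200 180 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [21/Oct/2012:07:01:02 +0000] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows NT 6.1) Firefox/16.0"',
    '66.249.66.1 - - [22/Oct/2012:03:14:15 +0000] "GET /sitemap.xml HTTP/1.1" 200 900 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

print(dict(googlebot_hits_per_day(SAMPLE)))
# {'21/Oct/2012': 2, '22/Oct/2012': 1}
```

To run it on a real log, pass an open file object instead of SAMPLE, since iterating a file yields its lines.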

I guess Google now has a bunch of different crawlers. It has been about 10 years since I worked with this stuff, so my info is a bit dated.

Check https://developers.google.com/webmas.../docs/crawlers for more info on Google's crawlers.

I found this page via Google: http://www.techatlast.com/lenovo-thinkpad-tablet-2 It is more recent than two months ago, so I think you are OK.

Regards,

Raj Upadhyaya

afasoas · 10-22-2012, 07:40 AM

Quote:

Originally Posted by techatlast

Hi everyone. Please help me create a robots.txt file for my website. The site has not been crawled for the last 2 months.

Does your host support the "If-Modified-Since" HTTP header?
http://www.feedthebot.com/ifmodified.html
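For anyone who wants to test this by hand: a crawler sends the date of its last visit in an If-Modified-Since request header, and a server that supports it answers 304 Not Modified when the page has not changed since then. Below is a minimal sketch with Python's standard library. The helper names, the example.com URL and the date are placeholders of mine, and the network call is defined but not run here.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def conditional_request(url: str, last_fetched: datetime) -> Request:
    """Build a GET request that carries an If-Modified-Since header."""
    # HTTP dates are always expressed in GMT.
    stamp = format_datetime(last_fetched.astimezone(timezone.utc), usegmt=True)
    return Request(url, headers={"If-Modified-Since": stamp})


def is_unchanged(req: Request) -> bool:
    """Send the request; True means the server answered 304 Not Modified."""
    try:
        with urlopen(req, timeout=10):
            return False  # 200 OK: the full page was sent again
    except HTTPError as err:
        if err.code == 304:
            return True
        raise


# Placeholder URL and date, for illustration only.
req = conditional_request(
    "http://example.com/", datetime(2012, 10, 1, 12, 0, tzinfo=timezone.utc)
)
print(req.get_header("If-modified-since"))  # Mon, 01 Oct 2012 12:00:00 GMT
```

Calling is_unchanged(req) against your own URL shows whether the server honors the header. A server that ignores it will simply return the full page every time.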

techatlast · 10-22-2012, 10:33 AM

Quote:

Originally Posted by afasoas

Does your host support the "If-Modified-Since" HTTP header?
http://www.feedthebot.com/ifmodified.html

I have just run the "If-Modified-Since" HTTP header check tool and my site supports it. Thanks for the link.

techatlast · 10-22-2012, 11:10 AM

Thanks Raj for your help.