Simulate a robot request with wget
Hi,
I have read that most robots, like googlebot, don't accept cookies, and I would like to make a request to my website that is similar to a robot's request, using wget. So, does this command simulate the real situation: wget --cookies=off -U 'googlebot' http://www.site.com Thank you |
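For reference, a minimal sketch of such a cookie-less request with current GNU wget: recent builds document the cookie switch as --no-cookies (or -e cookies=off); the --cookies=off spelling may not be recognized. The quoted string is Googlebot's published desktop user-agent:
Code:
# refuse to store or send cookies, and identify with Googlebot's
# published desktop user-agent string
wget --no-cookies -U "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" http://www.site.com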
I don't know what googlebot accepts or doesn't accept, but for testing purposes/learning, try this:
Code:
wget --random-wait -r -p -e robots=off -U googlebot http://www.site.com
|
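If the crawl should also refuse cookies, as the robots reportedly do, the same command can take the cookie switch; a sketch, again assuming a recent GNU wget:
Code:
# recursive crawl (-r) with page requisites (-p), random delays between
# requests, ignoring robots.txt and refusing cookies
wget --random-wait -r -p -e robots=off --no-cookies -U googlebot http://www.site.com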
Thank you, I am doing this test on my own website on localhost and everything is legit. I want to do this test because I have a multilanguage website (English and German), and I want to see whether a robot that crawls the German version, which is http://site.com/de, will get the content in German, and whether a crawl of http://site.com/en will get the content in English.
The issue is that my website sends the browser a cookie whose content is en-GB or de-DE, depending on the version of the website. So I am afraid that a robot crawling http://site.com/de will get the English version instead of the German one; that is why I need to do this test. |
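A direct way to check this is to fetch each language URL without cookies and inspect the response headers. A sketch, assuming the server sets the en-GB/de-DE cookie as described (--server-response prints the headers on stderr, hence the 2>&1):
Code:
# fetch the German page cookie-less and show which language the server chose
wget --no-cookies --server-response -O de-test.html http://site.com/de 2>&1 | grep -i -E 'set-cookie|content-language'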
You're very welcome.
|
Hi again, I have tested your command on the German version http://site.com/de. It crawls those pages correctly, in German. In the folder that contains the crawled pages, some pages are recognized as HTML, but some are not recognized as HTML because their URLs contain non-ASCII characters (see the attached picture).
Will googlebot and other search engines understand that those files are HTML files and index them normally? I can't open them (the 3 files in the picture) in my Ubuntu; I can only open the others (the 9 in the picture with the web icon on them). Thank you |
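Worth noting: those filenames are purely local. wget names the saved copies after the URLs, while a crawler works from the URLs and the Content-Type headers your server sends, not from the files on your disk. To make the local copies openable, wget can percent-escape non-ASCII bytes in the saved filenames; a sketch, assuming a wget build recent enough to support the ascii mode of --restrict-file-names:
Code:
# save crawled pages with non-ASCII URL bytes percent-escaped in filenames
wget --random-wait -r -p -e robots=off --restrict-file-names=ascii -U googlebot http://site.com/de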