robots.txt


James
23rd Feb 2005, 10:29 pm
Matrix Stats reports lots of failed attempts to access a file in my site root called robots.txt, which I do not have. I think this file would allow me to give instructions to search spiders. The big guns (Google, Yahoo, MSN etc) seem to spider my site regardless but I've got a hunch some smaller bots are turning away when they can't find a robots.txt file.

Anyone know if it's worth having robots.txt - does it make a difference?

Cheers

francis
23rd Feb 2005, 11:56 pm
robots.txt is a file that tells search engines what to index and what not to index. In theory it's a good way of banning search engines from pages you don't want them to find (eg db admin pages) - but in practice, because the file has to sit in the public folder of your site where anyone can read it, using it to hide pages from spiders is very insecure: it lists the very paths you wanted hidden. For example, this is the White House's robots file (http://www.whitehouse.gov/robots.txt) and this is Google's (http://www.google.com/robots.txt).
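
To show what I mean about it being insecure, here's a rough sketch using Python's standard urllib.robotparser module. The robots.txt content, the /db-admin/ path and the "ExampleBot" name are all made up for illustration, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the /db-admin/ path is invented for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /db-admin/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved spider checks the rules before fetching a page.
admin_allowed = parser.can_fetch("ExampleBot", "/db-admin/login.php")
home_allowed = parser.can_fetch("ExampleBot", "/index.html")
print(admin_allowed, home_allowed)

# The file is plain text, so anyone (or any badly behaved bot) can
# simply read off the paths you were trying to keep quiet.
hidden_paths = [
    line.split(":", 1)[1].strip()
    for line in ROBOTS_TXT.splitlines()
    if line.lower().startswith("disallow:")
]
print(hidden_paths)
```

So a polite spider stays out of the admin folder, but the rule itself advertises where that folder is. Anything that really needs protecting wants a password on it, not a Disallow line.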

I don't think it really matters whether you have one or not, but if you want cleaner log files, maybe bung one in there - you don't have to disallow anything, just bung in a "search my site" rule, although, as you say, you're doing okay without one. More info at robotstxt.org (http://www.robotstxt.org/)
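
For what it's worth, a "search my site" file is usually just two lines: a wildcard User-agent and an empty Disallow. Here's a small sketch that checks that reading with Python's standard urllib.robotparser; the helper name is_allowed, the "SmallBot" name and the page path are only placeholders I've made up.

```python
from urllib.robotparser import RobotFileParser

# A permissive robots.txt: every spider, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""


def is_allowed(robots_txt, agent, url):
    """Return True if the given robots.txt text lets this agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# "SmallBot" and the path are placeholders, not real names.
print(is_allowed(ALLOW_ALL, "SmallBot", "/any/page.html"))
```

Save those two lines as robots.txt in the site root and the failed requests for the file should stop cluttering the logs, without blocking anything.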