Warmest Greetings,
:: Tuesday, August 20, 2002 ::
Robots. Spiders. And Google Bears. Oh My!
(The following is reprinted by permission of the author. Originally posted to the Cre8asite Forum.)
Okey dokey. The robots.txt file is a text file that contains rules to exclude spiders from getting certain named files and the contents of certain named directories. There are no rules to tell spiders what they CAN do, just what they CANNOT do. We probably all know about the file and what it's for, but if you are not familiar with it, read http://www.searchengineworld.com/robots/robots_tutorial.htm. Ignore the second sentence on that page. It's wrong: the file tells spiders what they cannot index, not what they "can download".
The robots.txt protocol is voluntary, and it only tells spiders what they cannot index. It doesn't mean that they cannot download the files and look at them if they want to. In fact, robots.txt can act as a signpost to files and directories that spiders would otherwise have no way of knowing about. I mention this because it is possible for a spam-hunting spider to examine the files that we didn't want it to see. But that's another story.
The robots.txt file must be placed in the site's root directory, as that is the only place that spiders look for it. It's up to the webmaster to create the file. Servers don't do it, and it isn't something that is automatically included with hosting. The spiders of the major engines will always request the file in case it contains any rules for them.
Many, if not most, sites don't make use of the file at all, so it doesn't exist. Many servers will return the default 403 page (Forbidden) when a requested page doesn't exist so, since the spiders are requesting a non-existent page, they, like everyone else, will get the 403 page. Until recently, Google's spider would not continue to crawl any website that returned a 403 instead of the robots.txt file. Now it does crawl those sites. There was never any reason for Google not to crawl sites when the robots.txt file didn't exist, but for some reason they made the decision to do it that way.
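To make the exclusion rules concrete, here is a minimal sketch of how a well-behaved spider might read them, using Python's standard urllib.robotparser module. The robots.txt content, the paths, the "ExampleBot" agent name and the example.com host are all made up for illustration; they do not come from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/secret.html
"""

parser = RobotFileParser()
# parse() takes the file's lines directly, so no network request is made here.
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="ExampleBot"):
    """Return True if the rules do not exclude `agent` from `path`."""
    # www.example.com is a placeholder host, not a real target.
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("/index.html"))           # not listed, so not excluded
print(allowed("/private/report.html"))  # inside an excluded directory
print(allowed("/drafts/secret.html"))   # an excluded named file
```

Note that the file only lists exclusions, and that anyone (or any spider) can read it, which is exactly why it can end up acting as a signpost to the paths it names.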
When Kim asked why her client's sites were not being crawled, I said that some of mine had been waiting from 6 to 12 months. Google had sniffed at them by getting the index page and the robots.txt file but never crawled any further. I honestly thought that there was some sort of semi-ban on the IP address. Now it all makes sense. My sites were fully crawled a couple of months ago and Kim reported that her client's sites had also been crawled in the same cycle. A friend of mine (Grace) told me that exactly the same thing happened with some of her sites. Why on earth they would choose not to crawl sites where there was no robots.txt file is beyond reason. But, thankfully, they do crawl them now. It would have been nice to know about it in the past though.
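The difference between the old and new behaviour described above can be sketched as a small decision function. This is purely an illustration and not Google's actual logic: the function name, the forbidden_blocks_crawl flag, and the handling of server errors are all assumptions.

```python
def crawl_policy(robots_status, forbidden_blocks_crawl=True):
    """Decide what a hypothetical spider does, given the HTTP status
    it received when requesting /robots.txt.

    Returns one of "parse", "allow_all" or "disallow_all".
    """
    if 200 <= robots_status < 300:
        # The file exists: read its rules and obey them.
        return "parse"
    if robots_status in (401, 403):
        # Old behaviour (flag True): treat "Forbidden" as "keep out".
        # New behaviour (flag False): treat it like a missing file.
        return "disallow_all" if forbidden_blocks_crawl else "allow_all"
    if 400 <= robots_status < 500:
        # No robots.txt (for example 404): nothing is excluded.
        return "allow_all"
    # Anything else, such as a server error: assumed here to be
    # handled conservatively by not crawling.
    return "disallow_all"


print(crawl_policy(403, forbidden_blocks_crawl=True))
print(crawl_policy(403, forbidden_blocks_crawl=False))
print(crawl_policy(404))
```

With the flag set to True, a site whose server answers 403 for a missing robots.txt is never crawled past that request, which matches what happened to the sites mentioned above.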
Posted by Phil Craven of WebWorkshop.net
:: posted by Kim Krause Berg on 8/20/2002 08:57:12 PM