ROBOTS.TXT Primer
by Alan Webb
There is often confusion as to the role and usage of the robots.txt file. I
thought it would be a good idea to dispel some myths and highlight what
robots.txt files are all about.
Firstly, the purpose of a robots.txt file is NOT to tell search engine robots and
other crawlers which pages they are allowed to spider (enter); it is primarily to
tell them which pages (and directories) they may NOT spider.
The majority of websites do not have a robots.txt file, and do not suffer from
not having one. The robots.txt file does not influence ranking in any way. Its
goal is to stop certain spiders from visiting pages you do not want them to see
and taking those pages back with them.
Below are a few reasons why one would use the robots.txt file.
1. Not all robots which visit your website have good intentions! There are
many, many robots out there whose sole purpose is to scan your website and
extract your email address for spamming purposes! A list of the "evil" ones
comes later.
2. You may not be finished building your website (under construction) or
sections may be date-sensitive. I, for example, excluded all robots from every
page of my website whilst I was designing it. I did not want a half-complete,
un-optimized page with an incomplete link structure to be indexed because, if
found, it would reflect badly on me and ABAKUS. I only let the robots in when
the site was ready. This is useful not only for new websites being built but
also for old ones being re-launched.
3. You may well have a membership area that you do not wish to be visible in
Google's cache. Not letting the robot in is one way to stop this.
4. There are certain things you may wish to keep private. If you have a look
at the ABAKUS robots.txt file (http://www.abakus-internet-marketing.de/robots.txt)
you will notice I use it to stop indexation of unnecessary forum files/profiles
for privacy reasons. Some webmasters also block robots from their cgi-bin or
image directories.
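To make reasons 2 to 4 more concrete, here is a minimal sketch in Python that uses the standard library's urllib.robotparser to check what two sets of rules would allow. The rule sets, the directory names (/members/, /cgi-bin/, /images/) and the helper parse_rules are made up for illustration; they are not taken from the ABAKUS file.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Hypothetical helper: load robots.txt rules from a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Assumed rules for a site still under construction: keep every robot out.
UNDER_CONSTRUCTION = """\
User-agent: *
Disallow: /
"""

# Assumed rules for a finished site: only some directories are off limits.
LIVE_SITE = """\
User-agent: *
Disallow: /members/
Disallow: /cgi-bin/
Disallow: /images/
"""

building = parse_rules(UNDER_CONSTRUCTION)
live = parse_rules(LIVE_SITE)

# While building, nothing may be spidered.
print(building.can_fetch("Googlebot", "/index.html"))        # False
# Once live, the home page is open but the members area is not.
print(live.can_fetch("Googlebot", "/index.html"))            # True
print(live.can_fetch("Googlebot", "/members/profile.html"))  # False
```

Switching from the first rule set to the second is all it takes to "let the robots in" once the site is ready.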
So let's analyse a very simple robots.txt file.
User-agent: EmailCollector
Disallow: /
If you copy and paste the above into Notepad, save the file as robots.txt and
then upload it to the root directory of your server (where you will find your
home page), you have told a nasty email collector to keep out of your website.
The User-agent line names the robot the rule applies to, and "Disallow: /" bars
it from every page on the site. That is good news, as it may mean less spam!
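If you would like to check how a well-behaved robot reads those two lines, the following is a small sketch using Python's standard urllib.robotparser module. The helper name allowed and the robot name "ExampleBot" are my own assumptions for the example.

```python
from urllib.robotparser import RobotFileParser

# The two-line robots.txt from above.
RULES = """\
User-agent: EmailCollector
Disallow: /
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())


def allowed(agent, path):
    """Hypothetical helper: may this robot fetch this path?"""
    return parser.can_fetch(agent, path)


# The named robot is shut out of the whole site.
print(allowed("EmailCollector", "/"))              # False
print(allowed("EmailCollector", "/contact.html"))  # False
# A robot that is not named in the file is unaffected.
print(allowed("ExampleBot", "/contact.html"))      # True
```

Note that the rule only applies to the robot named in the User-agent line; every other robot is still free to enter.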
I do not have the space here for a fully fledged robots.txt tutorial; however,
there is a good one at
http://www.robotstxt.org/wc/exclusion-admin.html
Or use the robotsbeispiel.txt example file I have uploaded for you. Simply copy
and paste it into Notepad, save it as robots.txt and upload it to your server
root directory.
http://www.abakus-internet-marketing.de/robotsbeispiel.txt
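The reason the file must sit in the server root is that robots look for it at one fixed address per host. As a rough sketch, the Python function below (robots_txt_url is a name I made up, and the forum page path is only an example) works out that address from any page on a site.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where robots expect to find robots.txt
    for the host that serves page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# The page path here is an assumed example.
print(robots_txt_url("http://www.abakus-internet-marketing.de/forum/index.html"))
# http://www.abakus-internet-marketing.de/robots.txt
```

A robots.txt saved in a sub-directory would simply never be requested.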
Alan Webb is CEO of ABAKUS
Internet Marketing, a professional search engine marketing company.