The Oft-Overlooked Robots.txt File



By Aaron Turpen   [ 27/07/2005 ]

When a search engine spider accesses your website, it will usually look first for a file called "robots.txt" in the root directory of your site (where your website begins). The robots.txt file tells the spider which parts of the site it may crawl (index/parse). The standard for all of this is called "The Robots Exclusion Standard."

The format for this standard is very simple. It consists of records in a text file, each record consisting of two fields: a user-agent line and one or more disallow lines. These fields are formatted in a specific way so that the spider program can read them. You'll see examples of this formatting later in this article.

The first field is the "User-agent" field, which is used to specify which robot the "Disallow" lines that follow apply to. Usually, this contains the wildcard character "*" to specify all robots. In some cases, however, you may wish to exclude only specific robots, such as Googlebot.

The second field is the "Disallow" field, which can actually contain several lines. You can specify that robots are to ignore specific files, whole directories, or combinations of these. Password-protected directories (such as those on a Unix system using .htaccess files) are usually inaccessible to robots anyway, but it's a good idea to include them in the "Disallow" lines as well.
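As a rough illustration of how the two fields work together, the sketch below feeds a small robots.txt to Python's standard-library urllib.robotparser module. The /drafts/ directory, the robot name "OtherBot," and the www.example.com host are made up for this example and are not part of any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record for Googlebot, one for every other robot.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /drafts/

User-agent: *
Disallow: /cgi-bin/
"""


def build_parser(robots_txt):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
url = "http://www.example.com/drafts/notes.html"  # assumed example URL

# The Googlebot record disallows /drafts/; the "*" record does not.
googlebot_allowed = parser.can_fetch("Googlebot", url)
otherbot_allowed = parser.can_fetch("OtherBot", url)
print(googlebot_allowed, otherbot_allowed)
```

Here the named record applies only to Googlebot, while every other robot falls through to the "*" record.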

To create or edit your robots.txt file, you'll need a text editor such as Notepad. Whatever you use, just make sure it saves in pure text and in no other format. Your HTML editor usually has this function.

Comments are introduced with the "#" character, which tells the spider to ignore the rest of the line. Since the file's contents are pretty self-explanatory, comments are rarely used. The first line of your robots.txt file is the User-agent line, so the first line will probably look like this:

User-agent: *

You can replace the "*" with any robot's name, if you wish. For a complete and up-to-date list of spider names, visit http://www.searchenginedictionary.com/spider-names.shtml.

The next line or lines list the files and directories you wish to keep off-limits to the spider or spiders you've specified in the User-agent line. Each path starts with a "/" and is given relative to the root of your site:

Disallow: /dontindexthis.html

This would block spiders from indexing the file "dontindexthis.html" in your root directory. To disallow a whole directory, just use the same format:

Disallow: /cgi-bin/

To disallow specific files in sub-directories, you would use a combination of these:

Disallow: /cgi-bin/dontindexthis.html
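One way to sanity-check Disallow lines like these before uploading them is to run them through Python's urllib.robotparser. This is only a sketch: the helper name is_allowed, the robot name "ExampleBot," the file form.cgi, and the www.example.com host are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # The host is a placeholder; only the path part is matched against the rules.
    return parser.can_fetch(agent, "http://www.example.com" + path)


RULES = """\
User-agent: *
Disallow: /dontindexthis.html
Disallow: /cgi-bin/
"""

checks = {
    path: is_allowed(RULES, path)
    for path in ["/dontindexthis.html", "/cgi-bin/form.cgi", "/index.html"]
}
print(checks)

# A Disallow value written without the leading "/" does not match in this parser.
NO_SLASH = "User-agent: *\nDisallow: dontindexthis.html\n"
no_slash_allowed = is_allowed(NO_SLASH, "/dontindexthis.html")
print(no_slash_allowed)
```

The last check is the reason to always write paths with a leading slash: without it, Python's parser treats the file as still allowed.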

The original standard has no true wildcards in the Disallow field, but because each value is matched as the beginning of a path, one line can cover a file AND a directory of the same name:

Disallow: /notthisone

This blocks both the directory /notthisone/ and any files whose names begin with "notthisone" (such as "notthisone.html" or "notthisone.cgi"). You can also exclude all files on the site by just putting a "/" in the Disallow line:

Disallow: /
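The matching behavior described above can be checked with a short sketch using Python's urllib.robotparser. The helper name blocked, the robot name "ExampleBot," the sample file names, and the www.example.com host are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def blocked(robots_txt, path, agent="ExampleBot"):
    """Return True if `agent` is NOT allowed to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, "http://www.example.com" + path)


# One Disallow value, matched against the start of each requested path.
PREFIX_RULE = "User-agent: *\nDisallow: /notthisone\n"
prefix_results = {
    path: blocked(PREFIX_RULE, path)
    for path in [
        "/notthisone/page.html",  # inside the directory
        "/notthisone.html",       # file sharing the same prefix
        "/notthisone.cgi",
        "/another.html",          # different prefix, not affected
    ]
}
print(prefix_results)

# "Disallow: /" matches every path on the site.
BLOCK_ALL = "User-agent: *\nDisallow: /\n"
everything_blocked = all(
    blocked(BLOCK_ALL, path) for path in ["/", "/index.html", "/a/b/c.html"]
)
print(everything_blocked)
```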

A completed robots.txt file will look something like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
Disallow: /emailme.cgi
Disallow: /secretfile.doc
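Before publishing a file like this, it can help to test it against a handful of paths. The sketch below does so with Python's urllib.robotparser, writing each Disallow path with a leading "/". The sample paths, the robot name "ExampleBot," and the www.example.com host are invented for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /members/
Disallow: /emailme.cgi
Disallow: /secretfile.doc
"""

# Hypothetical paths a spider might request on the site.
SAMPLE_PATHS = [
    "/index.html",
    "/cgi-bin/search.cgi",
    "/members/profile.html",
    "/emailme.cgi",
    "/secretfile.doc",
    "/products/list.html",
]

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Sort each sample path into allowed or blocked for a generic robot.
allowed, blocked = [], []
for path in SAMPLE_PATHS:
    url = "http://www.example.com" + path  # placeholder host
    (allowed if parser.can_fetch("ExampleBot", url) else blocked).append(path)

print("allowed:", allowed)
print("blocked:", blocked)
```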

If you want to get really complicated with your robots.txt file, I'd suggest you look at some of the robots.txt files of the big boys of the Internet like Amazon.com or eBay. You can find these by simply typing in the URL followed by "/robots.txt" (as in: http://www.amazon.com/robots.txt). These files are universally accessible via the Web as a rule.

The absence of a robots.txt file and a blank robots.txt file have the same effect: the spider is free to index everything on your site, whether you want it to or not. So implementing a robots.txt file is important to your site's success.
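The blank-file case is easy to confirm with Python's urllib.robotparser. This is a minimal sketch that parses an empty string in place of a real file; the paths, the robot name "ExampleBot," and the www.example.com host are assumptions.

```python
from urllib.robotparser import RobotFileParser

# An empty robots.txt contains no records at all.
empty_parser = RobotFileParser()
empty_parser.parse("".splitlines())

# For contrast, a file with a single Disallow line.
restricted_parser = RobotFileParser()
restricted_parser.parse("User-agent: *\nDisallow: /members/\n".splitlines())

url = "http://www.example.com/members/profile.html"  # placeholder URL
empty_allows = empty_parser.can_fetch("ExampleBot", url)
restricted_allows = restricted_parser.can_fetch("ExampleBot", url)
print(empty_allows, restricted_allows)
```

With no records to consult, the parser allows the request; with the Disallow line in place, the same URL is refused.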

About the author:
Aaron Turpen is the proprietor of Aaronz WebWorkz, a web services company providing consultation, development, and more to small businesses online. Aaron publishes several newsletters regularly and is the author of many ebooks, including "The Layman's Guide to Doing Business Online" and "The eBay PowerSeller's Book of Knowledge." Visit him online at http://www.AaronzWebWorkz.com


Article Source: http://www.Free-Articles-Zone.com

