Robots.txt

From BlueFur.com Support Wiki


A robots.txt file is a plain-text file that tells Internet crawlers (commonly called robots, bots, or spiders) which parts of your website they may or may not crawl. Compliance is voluntary: not all crawlers honor the robots.txt file, so ill-behaved crawlers are better blocked using .htaccess.

The file must be named robots.txt and placed in the root directory of your website. It may be left blank if you do not want to restrict any crawlers.
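Because the file lives in the root directory, a crawler can work out its address from any page URL on the site. The following is a minimal Python sketch of that step; the function name robots_url is hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))
```

For example, robots_url("http://www.example.com/directory/file.html") returns "http://www.example.com/robots.txt".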

Examples

This example allows all robots to visit all files: the wildcard "*" matches every robot, and the empty Disallow line blocks nothing:

User-agent: *
Disallow:

This example keeps all robots out:

User-agent: *
Disallow: /

This example tells all crawlers to stay out of four directories of a website:

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
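One way to check how a well-behaved crawler might read these rules is Python's standard urllib.robotparser module. This is only a sketch: the helper name is_allowed, the user-agent name AnyBot, and the www.example.com URLs are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser

RULES = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file contents as an iterable of lines.
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)
```

With these rules, is_allowed(RULES, "AnyBot", "http://www.example.com/private/data.html") is False, while the same call for http://www.example.com/index.html is True.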

This example tells one specific crawler (here named BadBot) to stay out of one specific directory:

User-agent: BadBot
Disallow: /private/
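A rule group that names one crawler leaves every other crawler unrestricted. The sketch below uses Python's standard urllib.robotparser to illustrate this; the helper name allowed_for, the second crawler name GoodBot, and the www.example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

RULES = ["User-agent: BadBot", "Disallow: /private/"]


def allowed_for(user_agent, url, rules=RULES):
    """Return True if the rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(rules)
    return parser.can_fetch(user_agent, url)


# Only BadBot is kept out of /private/; other crawlers match no group.
blocked = allowed_for("BadBot", "http://www.example.com/private/page.html")
others = allowed_for("GoodBot", "http://www.example.com/private/page.html")
```

Here blocked is False and others is True. BadBot can still fetch pages outside /private/.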

This example tells all crawlers not to crawl one specific file:

User-agent: *
Disallow: /directory/file.html

This example tells all crawlers not to crawl files ending in .php. You can replace .php with any other extension to block other file types. Note that the "*" and "$" pattern characters are not supported by every crawler:

User-agent: *
Disallow: /*.php$
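Python's standard urllib.robotparser matches rules as plain path prefixes and does not interpret "*" or "$", so the following is a hand-written sketch of how a crawler that supports these characters might evaluate a pattern such as /*.php$. The function names pattern_to_regex and is_blocked are hypothetical, and real crawlers may differ in the details.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the
    pattern to the end of the URL path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with '.*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if a Disallow pattern covers the given URL path."""
    # re.match anchors at the start, so rules still act as prefixes.
    return pattern_to_regex(pattern).match(path) is not None
```

Under this reading, is_blocked("/*.php$", "/directory/page.php") is True, while /index.html and /index.php?id=1 are not blocked, because the "$" requires the path to end in .php.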

You can also use a robots.txt file to tell bots where your website's sitemap is located. The following example tells bots that your sitemap is located at http://www.example.com/sitemap.xml.

Sitemap: http://www.example.com/sitemap.xml
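As an illustration of how a bot might read this directive, the sketch below uses the site_maps() method of Python's standard urllib.robotparser (available from Python 3.8). The helper name find_sitemaps is an assumption made for this example.

```python
from urllib.robotparser import RobotFileParser


def find_sitemaps(rules_text):
    """Return the list of Sitemap URLs declared in robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    return parser.site_maps() or []


sitemaps = find_sitemaps("Sitemap: http://www.example.com/sitemap.xml")
```

Here sitemaps is a one-element list containing http://www.example.com/sitemap.xml. A file with no Sitemap line yields an empty list.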
