Robots.txt
From BlueFur.com Support Wiki
A robots.txt file is a plain-text file that tells Internet crawlers (commonly referred to as robots, bots, or spiders) which parts of your website they may and may not crawl. Not all crawlers obey the robots.txt file, so ill-intentioned crawlers are better blocked using .htaccess.
A robots.txt file must be named robots.txt and placed in the root directory of your website. The file may be left empty if you do not want to restrict any crawlers.
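Because the file always sits at the root of the site, its address can be derived from any page URL. The following is a minimal Python sketch of that idea; the helper name robots_txt_url and the example URL are hypothetical and chosen only for illustration.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL for the site that serves page_url."""
    parts = urlsplit(page_url)
    # Keep only the scheme and host; robots.txt lives at the root path.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Placeholder page URL, used only for illustration.
print(robots_txt_url("http://www.example.com/blog/post.html?id=7"))
# http://www.example.com/robots.txt
```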
Examples
This example allows all robots to visit all files. The wildcard "*" matches every robot, and the empty Disallow value excludes nothing:
User-agent: *
Disallow:
This example keeps all robots out of the entire site:
User-agent: *
Disallow: /
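One way to see how an allow-all and a block-all rule set behave is to feed them to Python's standard urllib.robotparser module. This is only a sketch: the helper can_crawl, the agent name AnyBot, and the URL are assumptions made for the example, and real crawlers may interpret rules somewhat differently.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


ALLOW_ALL = "User-agent: *\nDisallow:\n"    # empty value: nothing is off limits
BLOCK_ALL = "User-agent: *\nDisallow: /\n"  # "/" is a prefix of every path

url = "http://www.example.com/index.html"   # placeholder URL
print(can_crawl(ALLOW_ALL, "AnyBot", url))  # True
print(can_crawl(BLOCK_ALL, "AnyBot", url))  # False
```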
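This example tells all crawlers to stay out of four directories of a website (the directory names shown are placeholders; substitute your own):
User-agent: *
# Placeholder directory names
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/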
This example tells one specific crawler to stay out of one specific directory (the directory name is a placeholder):
User-agent: BadBot
# Placeholder directory name
Disallow: /private/
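A rule group that names a single crawler applies only to that crawler; every other robot remains unrestricted. The sketch below checks this with Python's standard urllib.robotparser. The /private/ directory, the GoodBot agent name, and the allowed helper are hypothetical and exist only for this illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: only BadBot is kept out of /private/.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(agent: str, path: str) -> bool:
    """Check a path on a placeholder host against the parsed rules."""
    return parser.can_fetch(agent, "http://www.example.com" + path)


print(allowed("BadBot", "/private/notes.html"))   # False
print(allowed("BadBot", "/index.html"))           # True
print(allowed("GoodBot", "/private/notes.html"))  # True
```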
This example tells all crawlers to stay away from one specific file (the file path is a placeholder):
User-agent: *
# Placeholder file path
Disallow: /directory/file.html
This example tells all crawlers not to crawl files ending in .php. Replace .php with another extension to block other file types. Note that the "*" and "$" pattern characters are an extension that not every crawler supports:
User-agent: *
Disallow: /*.php$
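To make the pattern syntax concrete, here is a small Python sketch of how a crawler that supports these characters might evaluate such a rule: "*" stands for any run of characters and a trailing "$" pins the match to the end of the path. The functions rule_to_regex and is_blocked are hypothetical helpers written for this example, not part of any library, and actual crawlers may differ in the details.

```python
import re


def rule_to_regex(rule_path: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a compiled regex."""
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    # Escape the literal pieces, then let each "*" match any characters.
    body = ".*".join(re.escape(piece) for piece in rule_path.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_blocked(rule_path: str, url_path: str) -> bool:
    # re.match anchors at the start, mirroring robots.txt prefix matching.
    return rule_to_regex(rule_path).match(url_path) is not None


print(is_blocked("/*.php$", "/index.php"))       # True
print(is_blocked("/*.php$", "/index.php?id=1"))  # False: path does not end in .php
print(is_blocked("/*.php$", "/index.html"))      # False
```

Because of the "$", a URL such as /index.php?id=1 is not covered by this rule; dropping the "$" would block it as well.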
You can also use a robots.txt file to tell bots where your website's sitemap is located. The following example tells bots that your sitemap is located at www.example.com/sitemap.xml:
Sitemap: http://www.example.com/sitemap.xml
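As an illustration of how a bot might pick up the Sitemap line, the sketch below reads it back with Python's standard urllib.robotparser. It assumes Python 3.8 or later, where RobotFileParser.site_maps() is available; the allow-all rule group is added only to make the sample file complete.

```python
from urllib.robotparser import RobotFileParser

# Sample file: an allow-all group plus the Sitemap line from the example.
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# site_maps() returns a list of sitemap URLs, or None if the file has none.
sitemaps = parser.site_maps()
print(sitemaps)  # ['http://www.example.com/sitemap.xml']
```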
