Robots.txt – An Introduction Part V

How and what can you control using Robots.txt?

Ans :: You can control how search engines "crawl" your website by using the right commands in Robots.txt. To learn more about the commands, let's first take a look at the general syntax.

Robots.txt syntax and commands ::

1. User-agent :: This command in the Robots.txt file names the search engine robot that the rules following it apply to.

For example ::

User-agent: Googlebot   (Means robots/crawlers from Google)
User-agent: Ask (Means robots/crawlers from Ask.com)

In the Robots.txt file, you can name each user agent individually, or address all of them at once by using the asterisk (*) wildcard.

User-agent: *  (Means all search engine robots/crawlers)
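
To see how a User-agent line selects which rules a crawler follows, here is a minimal sketch using Python's standard urllib.robotparser module. The rules, the "/private/" path and the crawler name "SomeOtherBot" are made up for illustration, and real search engines may match user-agent names a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical Robots.txt content: one group for Googlebot,
# one catch-all group for every other crawler.
rules = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /temp/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Googlebot is matched by its own group, so only /private/ is blocked for it.
googlebot_private = parser.can_fetch("Googlebot", "/private/page.html")
googlebot_temp = parser.can_fetch("Googlebot", "/temp/page.html")

# Any other crawler falls back to the "*" group, which blocks /temp/.
other_private = parser.can_fetch("SomeOtherBot", "/private/page.html")
other_temp = parser.can_fetch("SomeOtherBot", "/temp/page.html")

print("Googlebot  /private/ :", googlebot_private)
print("Googlebot  /temp/    :", googlebot_temp)
print("Other bot  /private/ :", other_private)
print("Other bot  /temp/    :", other_temp)
```

In this sketch a crawler that has its own named group follows only that group, and the asterisk group acts as the fallback for everyone else.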

2. Allow/Disallow :: This command instructs a user agent to crawl (Allow) or not crawl (Disallow) all or part of the website. The path given with the command specifies which directories within the website are to be crawled or not crawled.

For example ::

User-agent: *
Disallow: /     (Means no robot is allowed to crawl anything under the root folder, which is the entire website)

User-agent: *
Disallow: /temp/   (Means no robot is allowed to crawl the folder named "temp", while all other parts may be crawled)
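
The two Disallow examples above can be checked with a short sketch, again using Python's standard urllib.robotparser. The helper name "allowed", the crawler name "AnyBot", the file paths and the third rule set with an Allow line are assumptions added for illustration, not rules from any real site.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt, agent, path):
    """Return True if `agent` may crawl `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# Blocks the whole site for every crawler.
block_everything = "User-agent: *\nDisallow: /\n"

# Blocks only the "temp" folder for every crawler.
block_temp_only = "User-agent: *\nDisallow: /temp/\n"

# Hypothetical: re-opens one subfolder inside the blocked "temp" folder.
# The Allow line is listed first because this parser applies the first
# matching rule.
allow_subfolder = "User-agent: *\nAllow: /temp/public/\nDisallow: /temp/\n"

print(allowed(block_everything, "AnyBot", "/index.html"))
print(allowed(block_temp_only, "AnyBot", "/temp/file.txt"))
print(allowed(block_temp_only, "AnyBot", "/index.html"))
print(allowed(allow_subfolder, "AnyBot", "/temp/public/a.html"))
print(allowed(allow_subfolder, "AnyBot", "/temp/secret.html"))
```

Note that Disallow works on path prefixes, so "/temp/" covers everything inside that folder but not a page such as "/temporary.html".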
