xisto Community
iGuest

Test Your Robots.txt File With Google


As we all know, robots are programs that traverse the Web automatically. Some people call them crawlers or spiders.

 

Quite often, you need to restrict a robot (like Googlebot) from crawling specific portions of your website.

You can do it in two different ways.

First, you can place a specially formatted file named robots.txt in the root directory of your site, so that it is reachable at http://www.yourdomain.com/robots.txt.

 

Second, the robots META tag (a special HTML META tag) can be used to indicate whether a page may be indexed, or analysed for links, by a crawler.
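
A crawler that honours this tag reads directives such as noindex and nofollow from the content attribute of a tag like <meta name="robots" content="noindex, nofollow"> in the page's head. The Python sketch below shows roughly how such directives could be extracted with the standard library's html.parser module; the class and function names are hypothetical, and it only looks at the generic name="robots" tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lowercases tag and attribute names.
        if tag != "meta":
            return
        # Attributes without a value come through as None; treat them as "".
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        # content is a comma-separated list such as "noindex, nofollow".
        for token in attr_map.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_meta_directives(html: str) -> set[str]:
    """Return the set of robots META directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = '<html><head><meta name="robots" content="noindex, nofollow"></head><body></body></html>'
directives = robots_meta_directives(page)
print("may index:", "noindex" not in directives)
print("may follow links:", "nofollow" not in directives)
```

For the sample page both checks come out False, so a compliant robot would neither index that page nor follow its links.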

 

Usually, a combination of the robots META tag and a robots.txt file is used to get the best results.

 

In a nutshell, when a robot (like Googlebot or msnbot) visits a Web site, say http://www.yourdomain.com/, it first checks for http://www.yourdomain.com/robots.txt. If it can find this document, it will analyse its contents for records.
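
Because the file always sits at the root of the host, a robot can work out where to look from any page URL by keeping only the scheme and host. A minimal Python sketch of that step (robots_txt_url is a hypothetical helper; actually fetching the file, following redirects and handling a missing file are left out):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a robot would check for the site hosting page_url."""
    parts = urlsplit(page_url)
    # Keep scheme and host (netloc), replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yourdomain.com/"))
print(robots_txt_url("http://www.yourdomain.com/forum/topic.php?id=7"))
```

Both calls print http://www.yourdomain.com/robots.txt; a robots.txt placed in a subdirectory such as /forum/ would not be looked up.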

An example of a simple robots.txt file is shown below:

 

User-agent: *
Disallow: /login.php

The above record tells all robots (Google, MSN, Yahoo) not to crawl the login.php file located in the root directory of your website.
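
To see how a compliant robot reads this record, Python's standard urllib.robotparser module can parse the rules directly and answer per-URL questions. This is a sketch using Python's parser, not any search engine's own implementation; the user-agent string and the second URL are only illustrative:

```python
from urllib.robotparser import RobotFileParser

# The example record: applies to every robot ("*") and blocks /login.php.
rules = """\
User-agent: *
Disallow: /login.php
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

for url in ("http://www.yourdomain.com/login.php",
            "http://www.yourdomain.com/index.php"):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "->", "allowed" if allowed else "disallowed")
```

The first URL comes out disallowed and the second allowed. Note that a blank line ends a record in robots.txt, so a User-agent line and its Disallow lines should sit together without a blank line between them.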

 

A detailed discussion of the robots.txt file is available at Robotstxt.org.

 

Google has added a new feature that checks which URLs are excluded by your robots.txt file. That is, in a matter of a few seconds you can check whether Googlebot will comply with the instructions in your robots.txt file. You need a Google Account to check the robots.txt file of your website.
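
Google's tester runs on Google's side, but you can do a rough local check of the same question, namely which URLs a given robot may fetch, with Python's urllib.robotparser. The robots.txt content, the URLs and the check_urls helper below are hypothetical, and Python's parser uses simple prefix matching and does not implement every extension Google supports, so its answers may not always match Google's:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: Googlebot gets its own record, every other robot the "*" record.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /private/

User-agent: *
Disallow: /
"""


def check_urls(robots_txt: str, user_agent: str, urls: list[str]) -> dict[str, bool]:
    """Map each URL to True if robots_txt allows user_agent to fetch it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {url: parser.can_fetch(user_agent, url) for url in urls}


urls = [
    "http://www.yourdomain.com/index.php",
    "http://www.yourdomain.com/private/notes.html",
]
for agent in ("Googlebot", "msnbot"):
    for url, allowed in check_urls(ROBOTS_TXT, agent, urls).items():
        print(f"{agent:10} {url} -> {'allowed' if allowed else 'blocked'}")
```

Here Googlebot may fetch index.php but nothing under /private/, while msnbot matches no named record, falls back to the * record, and is blocked from both URLs.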

 

The only catch is that it is still in beta (like Google Sitemaps), but it may turn out to be pretty effective in the near future.


I wonder why it's a .txt file and not XML. XML would be a far better choice.


Awesome! I've never really thought about it, but I was always curious how those bots worked and whether you could influence them directly in any way. I'll have to read up a bit on that site you posted when I have some more free time. Thanks for the link B)


It is not an XML file because plain text is much simpler to use. Besides, robots.txt files have been around for a really long time, and XML is not as old. Not everyone knows how to use XML, and those people might not use it, while this is one of the simplest things to do. I agree that an addition could be made, but most robots would not support it, only the ones that are updated.


I agree. Besides, a lot of people don't know XML (like me), and it would be hard to write; we might screw it up (like I did with Google Sitemaps, lol). You can also create a .htaccess file, which will force bots to comply with your commands, but most free hosts do not allow it.


Hmm... sounds interesting. Maybe I'll try that; it would be fun to mess around with the famous Google bots, haha. But I'll have to look it over again some other time.
