xisto Community
Mich

Question About Robots.txt File: What exactly does it do?


I ran a search about this file and found a link that took me to a site that generates a robots.txt file. It also told me where to place it in my file manager. However, I just don't understand exactly what this file does. I do get that it keeps some search engines from poking around in all my files. Is this all it does? What are the consequences of not preventing this? My folders are almost all graphic sets that I offer for free viewing. I do not use my cgi-bin or my mail. I am only using my public_html folder.

 

I do see in my errors log that someone or something has been looking for a robots.txt file. :(


robots.txt is the file name used by the Robots Exclusion Protocol. Web robots download this file from the server's document root and parse it for instructions on what they may and may not crawl. The file name must be all lowercase (robots.txt), and the file must sit in the document root.

for example:

    User-agent: Googlebot
    Disallow: /

    User-agent: *
    Disallow: /forum/

Here User-agent names the robot or spider. The first rule tells Googlebot (Google's robot) not to crawl any part of the site.

In the second rule, the "*" wildcard stands for every robot or spider that has no rule of its own, and it blocks them only from the "/forum/" directory.

It is a good idea to set up a robots.txt file if you want to keep search engines from crawling your gallery. If you would rather share the content, simply create a plain .txt file with this inside (an empty Disallow blocks nothing):

    User-agent: *
    Disallow:
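
If you want to check how a robot would read rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. The bot name "SomeOtherBot", the sample paths and the helper can_fetch are made up for illustration, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The two example files from this post, written one directive per line.
BLOCK_GOOGLE_AND_FORUM = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /forum/
"""

ALLOW_EVERYTHING = """\
User-agent: *
Disallow:
"""


def can_fetch(rules, user_agent, path):
    """Return True if user_agent may fetch path under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    # "SomeOtherBot" is a hypothetical crawler name.
    print(can_fetch(BLOCK_GOOGLE_AND_FORUM, "Googlebot", "/index.html"))
    print(can_fetch(BLOCK_GOOGLE_AND_FORUM, "SomeOtherBot", "/index.html"))
    print(can_fetch(BLOCK_GOOGLE_AND_FORUM, "SomeOtherBot", "/forum/topic1"))
    print(can_fetch(ALLOW_EVERYTHING, "Googlebot", "/gallery/set1.html"))
```

Run as a script, this should print False, True, False, True: Googlebot is shut out of everything, other robots are shut out of /forum/ only, and the empty Disallow file lets everyone in.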

Edited by keysmaker (see edit history)


robots.txt is the file name used by the Robots Exclusion Protocol. Web robots download this file from the server's document root and parse it for instructions on what they may and may not crawl. The file name must be all lowercase (robots.txt), and the file must sit in the document root.

OK, so define "Parse". What is considered the "document root"? My public_html folder? Or the one up from there that appears in my ftp window when I connect to upload?

 

So if I put this in my robots.txt

    User-agent: Mediapartners-Google*
    Disallow: /cgi-bin/
    Disallow: /_*/

    User-agent: *
    Disallow: /

 

Google cannot get into my cgi-bin or any folder that starts with an underscore, and any other robot cannot get into any folder or file?

Edited by Mich (see edit history)


Mich,
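The most likely reason robots.txt shows up in your error logs is that you do not have one. Each time a robot asks for the missing file, the server logs a "404" (not found) error. To avoid that error, place at least an empty robots.txt file in your document root. The document root is the folder your site is served from, so the file must be reachable at your domain followed by /robots.txt. On most shared hosts that is the public_html folder, not the one above it that appears in your FTP window when you connect.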
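
To see the 404 come and go, here is a rough Python sketch. It does not touch your real host: it starts a throwaway web server on your own machine, with a temporary folder standing in for the document root, and asks for /robots.txt before and after an empty file is created. The names QuietHandler, robots_status and demo are just illustrative helpers.

```python
import functools
import http.server
import tempfile
import threading
import urllib.error
import urllib.request
from pathlib import Path


class QuietHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that does not print a log line per request."""

    def log_message(self, format, *args):
        pass


def robots_status(base_url):
    """Return the HTTP status code a robot gets when asking for /robots.txt."""
    # Empty ProxyHandler: talk to the local server directly, ignoring proxies.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + "/robots.txt") as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def demo():
    """Serve a temporary folder and check /robots.txt before and after creating it."""
    with tempfile.TemporaryDirectory() as docroot:
        handler = functools.partial(QuietHandler, directory=docroot)
        # Port 0 lets the operating system pick any free port.
        server = http.server.ThreadingHTTPServer(("127.0.0.1", 0), handler)
        threading.Thread(target=server.serve_forever, daemon=True).start()
        base_url = f"http://127.0.0.1:{server.server_address[1]}"
        try:
            before = robots_status(base_url)
            Path(docroot, "robots.txt").write_text("")  # an empty file is enough
            after = robots_status(base_url)
        finally:
            server.shutdown()
            server.server_close()
        return before, after


if __name__ == "__main__":
    print(demo())
```

This should print (404, 200): not found while the file is missing, OK once even an empty robots.txt exists.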

As to the correctness of the robots.txt file you list, I will let others answer because I am not up-to-date on them.

Parse = read and interpret the instructions contained in the file.
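
To make "parse" a bit more concrete, here is a simplified Python sketch of the reading half: it walks through the file line by line and groups the Disallow paths under the User-agent they belong to. parse_robots is a made-up helper, not what any real search engine runs, and it only records the rules. It does not decide how wildcards such as "/_*/" are matched, which varies between robots.

```python
def parse_robots(text):
    """Group Disallow paths by the User-agent lines that precede them."""
    groups = {}
    current_agents = []
    last_was_agent = False
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = line.split(":", 1)
        field = field.strip().lower()
        value = value.strip()
        if field == "user-agent":
            # Consecutive User-agent lines share one group of rules.
            if not last_was_agent:
                current_agents = []
            current_agents.append(value)
            groups.setdefault(value, [])
            last_was_agent = True
        elif field == "disallow":
            for agent in current_agents:
                groups[agent].append(value)
            last_was_agent = False
        else:
            last_was_agent = False  # other directives are ignored in this sketch
    return groups


# The rules Mich posted above, one directive per line.
MICH_RULES = """\
User-agent: Mediapartners-Google*
Disallow: /cgi-bin/
Disallow: /_*/

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    for agent, paths in parse_robots(MICH_RULES).items():
        print(agent, "->", paths)
```

The result is a plain dictionary: one entry per robot name, each holding the list of paths that robot has been asked to stay out of.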

http://www.cricketwalker.com/

