longtimeago, posted April 4, 2009:
I am sure that everyone who uses Google's services is aware of Google's caching system, and anyone who hosts a website has probably studied it in detail. Google's so-called "Google Spider" crawls the web, takes a kind of snapshot of each page, stores it in its cache, and also makes that copy available for the public to view. You have probably noticed this: whenever you search for something on Google, next to each result link there is a small "Cached" link, and clicking it shows the cached copy of that page. Google also refreshes these cached pages at regular intervals.

My question is: how can this be legal? The snapshots also capture plenty of copyrighted material, don't they? If someone publishes sensitive data, Google caches that too, stores it, and shows it to the public. So how can caching copyrighted data be legal?

Also, is there any way to stop the Google spider from entering a website, so that its contents won't be cached? What I mean is: if someone hosts sensitive data such as personal information and doesn't want Google to cache that particular page, what should they do?
miladinoski, posted April 4, 2009:
Well, you said it yourself: Google is a great resource for finding cached copies of content that was later removed because of copyright infringement or for whatever other reason the webmaster took it down. But a webmaster who anticipates this can tell robots not to cache the page, and it takes very little effort. Just add this meta tag to the <head> section of each page you do not want cached:

<meta name="robots" content="noarchive">

That's just about it.
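To check that a page actually carries this directive, you can parse its HTML and look at the robots meta tag. The sketch below uses Python's standard `html.parser` module; the class and function names (`RobotsMetaParser`, `blocks_caching`) are hypothetical, and it assumes the `content` attribute is a comma-separated list of directives (e.g. `noindex, noarchive`) matched case-insensitively.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)  # attribute names arrive lowercased; values may be None
        name = (attrs.get("name") or "").strip().lower()
        content = attrs.get("content") or ""
        if name == "robots":
            # Assumed format: comma-separated list, e.g. "noindex, noarchive".
            for directive in content.split(","):
                directive = directive.strip().lower()
                if directive:
                    self.directives.add(directive)


def blocks_caching(html: str) -> bool:
    """Return True if the page carries a robots "noarchive" directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noarchive" in parser.directives


page = """<html><head>
<meta name="robots" content="noarchive">
<title>Private page</title>
</head><body>...</body></html>"""

print(blocks_caching(page))  # True
```

This only inspects the markup you serve; whether a given search engine honours the tag is up to that engine.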
Phoenix.Illusion, posted April 4, 2009:
@miladinoski: Thanks. Are you sure that's the right one? I will try it.
- Dark
longtimeago, posted April 4, 2009:
So, miladinoski, that's it? Is it really that simple? If so, and everyone does it, then I guess the Google spider will have nowhere left to crawl, right?
miladinoski, posted April 4, 2009:
Quoting longtimeago: "If so, and everyone does it, then I guess the Google spider will have nowhere left to crawl, right?"

No. Googlebot and other crawlers such as Yahoo! Slurp or MSN's crawler will still crawl and index your page; they just won't cache its content. Your page will still show up in search results, but the "Cached" link won't.
zakaluka, posted April 10, 2009:
If you only want to stop Google's bots from caching your site, you can replace the robots meta tag above with:

<meta name="googlebot" content="noarchive">

Regards,
z.
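The difference between the generic `robots` tag and a crawler-specific tag like `googlebot` can be sketched as follows. This assumes a crawler obeys both the generic `robots` meta tag and any meta tag named after itself (Google documents this combination for Googlebot); the helper names and the second crawler token `otherbot` are hypothetical.

```python
from html.parser import HTMLParser


class MetaDirectiveParser(HTMLParser):
    """Collects robots-style directives, keyed by the meta tag's name attribute."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        name = (attrs.get("name") or "").strip().lower()
        if not name:
            return
        content = attrs.get("content") or ""
        bucket = self.by_name.setdefault(name, set())
        for directive in content.split(","):
            directive = directive.strip().lower()
            if directive:
                bucket.add(directive)


def cache_blocked_for(html: str, crawler: str) -> bool:
    """True if "noarchive" applies to the named crawler.

    Assumption: the crawler honours both the generic "robots" tag and a
    tag named after itself (e.g. "googlebot").
    """
    parser = MetaDirectiveParser()
    parser.feed(html)
    applicable = parser.by_name.get("robots", set()) | parser.by_name.get(
        crawler.lower(), set()
    )
    return "noarchive" in applicable


page = '<head><meta name="googlebot" content="noarchive"></head>'
print(cache_blocked_for(page, "googlebot"))  # True
print(cache_blocked_for(page, "otherbot"))   # False (hypothetical other crawler)
```

With the `googlebot` tag only Google's crawler is asked not to cache the page, while a `robots` tag would address every crawler that honours it.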
longtimeago, posted May 2, 2009:
Does anyone know when Googlebot crawls a website? To be clear: does it crawl at random, at regular intervals, or when traffic to the site is low? I just want to know when the bot crawls. Also, how much bandwidth do these bots use when they crawl a site? In particular, how much bandwidth does Googlebot consume when it crawls a site and takes that snapshot?