On my site I have built a basic counter to track how many visits I get. Unfortunately, I'm not sure how to distinguish bots from legitimate visitors, so I end up counting every visit, bots included, which does not give an accurate count. The approach is very basic: I set a session flag once a user visits the site. If that flag is not set, I insert the visitor's IP into a database table, and the rows in that table are counted to get the total visits. I probably should have looked at existing scripts, since this problem has most likely been solved already, but I wanted to ask whether anyone has information on this at hand.
You have two choices. The first is to look at the user-agent string sent by the client. Googlebot's contains "Googlebot", and the other search engines use something similar. Check the help pages on each search engine's site to find the user-agent used by its bot, then simply don't count those hits (or mark them in the database as bots). Bear in mind, however, that people can change their user-agent to whatever they want, so this method will not be 100% reliable (although I can't see why someone would set their user-agent to a search bot...).
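A minimal sketch of the user-agent check, in Python for illustration (the same idea carries over to whatever language the site uses). The token list and the function name `looks_like_bot` are my own assumptions; verify each token against the search engine's help pages before relying on it.

```python
# Substrings assumed to appear in the user-agent of common search bots.
# This list is illustrative, not exhaustive; check each engine's documentation.
KNOWN_BOT_TOKENS = ("googlebot", "bingbot", "slurp", "duckduckbot", "baiduspider")


def looks_like_bot(user_agent):
    """Return True if the user-agent string contains a known bot token."""
    # Compare case-insensitively; a missing header becomes an empty string.
    ua = (user_agent or "").lower()
    return any(token in ua for token in KNOWN_BOT_TOKENS)
```

Since the header can be spoofed, this only filters bots that identify themselves honestly.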
The second choice is to use DNS lookups. Take the visitor's IP address and do a reverse DNS lookup to get the host name. If the host name belongs to a search engine's domain, the visitor is likely a search bot. Google has more instructions here: https://support.google.com/webmasters/
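Here is a rough Python sketch of the DNS approach. It adds one step beyond the reverse lookup: resolving the host name forward again and checking that it maps back to the same IP, since reverse DNS records alone can be set by whoever controls the address block. The domain list and the name `is_verified_crawler` are assumptions for illustration, and `socket.gethostbyname_ex` only handles IPv4, so treat this as a starting point rather than a complete solution.

```python
import socket

# Domains assumed to host search engine crawlers; confirm against each
# engine's own documentation before using.
SEARCH_ENGINE_DOMAINS = (".googlebot.com", ".google.com")


def is_verified_crawler(ip,
                        reverse_lookup=socket.gethostbyaddr,
                        forward_lookup=socket.gethostbyname_ex,
                        domains=SEARCH_ENGINE_DOMAINS):
    """Reverse-resolve the IP, check the domain, then confirm with a forward lookup."""
    try:
        # gethostbyaddr returns (hostname, aliases, addresses).
        hostname = reverse_lookup(ip)[0]
    except OSError:
        return False  # no reverse DNS record, so it cannot be verified

    # str.endswith accepts a tuple, so any listed domain matches.
    if not hostname.lower().rstrip(".").endswith(domains):
        return False

    try:
        # gethostbyname_ex returns (hostname, aliases, IPv4 addresses).
        addresses = forward_lookup(hostname)[2]
    except OSError:
        return False

    # The host name must resolve back to the IP we started with.
    return ip in addresses
```

The lookup functions are passed in as parameters so the logic can be exercised without network access. DNS lookups are slow, so it is probably worth caching the result per IP instead of resolving on every request.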
I really don't think it's worth spending time worrying about that. In fact, most websites count bots as hits. It would be hard to remove ALL hits from bots, so you could just leave it.
Oh, that's gonna be hard. You can check the client's user-agent, and let a background script scan their IP for common proxy ports, such as 8080. Maybe you could check the referrer? Or, if you want the client to fill in a form, use a Java form; that would be much harder for a bot to respond to. Anyway, I don't think you can determine whether a client is 100% human.
- optiplex (tell me if you find a solution, okay?)
Thanks for the information. I've been reading some articles on this, and there seems to be no clear-cut way of doing it, just a number of different methods. I went with a few checks. It's not 100%, but it's good enough for now:
1. Check if the user agent is Googlebot or any other known bot.
2. Check if the IP is already flagged to be ignored.
3. Insert into the database.
The hardest part was thinking it through; the code itself was not very long.
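For anyone finding this later, the three checks above could look roughly like this in Python with SQLite. This is not the code from the site, just a sketch: the table names `visits` and `ignored_ips`, the token list, and the function names are all made up for illustration, and the session flag from the original post is assumed to be checked by the caller before `record_visit` is called.

```python
import sqlite3

# Illustrative bot tokens; extend from each search engine's documentation.
BOT_TOKENS = ("googlebot", "bingbot", "slurp")


def init_db(conn):
    """Create the hypothetical tables used by this sketch."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits (ip TEXT NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS ignored_ips (ip TEXT PRIMARY KEY)")


def record_visit(conn, ip, user_agent):
    """Run the three checks; return True if the visit was counted."""
    # 1. Skip known bots based on the user-agent string.
    ua = (user_agent or "").lower()
    if any(token in ua for token in BOT_TOKENS):
        return False

    # 2. Skip IPs that were flagged to be ignored.
    flagged = conn.execute(
        "SELECT 1 FROM ignored_ips WHERE ip = ?", (ip,)
    ).fetchone()
    if flagged:
        return False

    # 3. Insert the visit; the row count is the visit total.
    conn.execute("INSERT INTO visits (ip) VALUES (?)", (ip,))
    conn.commit()
    return True


def total_visits(conn):
    """Return the number of counted visits."""
    return conn.execute("SELECT COUNT(*) FROM visits").fetchone()[0]
```

Usage would be something like `conn = sqlite3.connect("counter.db")`, then `init_db(conn)` once and `record_visit(conn, ip, user_agent)` for each new session. The queries use parameter placeholders, so the IP is never concatenated into the SQL string.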