xisto Community
vujsa

Reading Your Server Logs And Using The Statistical Data! Find Out What Is Working And What Isn't


Many website owners have no idea why people visit their site, and even more have no idea why people don't. Using your server's web logs as a development tool can help you figure out what your visitors like and dislike, where they came from, and where they didn't. Here I'll briefly explain how to read your web logs and, more importantly, what to do with the information.

 

Let's start with some raw weblog data:

70.116.XX.XX - - [20/Nov/2006:06:28:21 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/91791-topic/?findpost=1064359825" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
213.227.XX.XX - - [20/Nov/2006:07:16:46 +0000] "GET /req_form.php?sitename=Xisto&formname=new HTTP/1.0" 200 6216 "http://ww4.forum500.com/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"
213.227.XX.XX - - [20/Nov/2006:07:23:06 +0000] "POST /req_form.php HTTP/1.0" 200 4101 "http://ww4.forum500.com/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"
213.227.XX.XX - - [20/Nov/2006:07:25:46 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/91739-topic/?findpost=1064359477" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"
213.227.XX.XX - - [20/Nov/2006:07:26:19 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/91264-topic/?findpost=1064355980" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"
74.6.XX.XX - - [20/Nov/2006:08:03:11 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.XX.XX - - [20/Nov/2006:08:03:11 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3616 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
134.173.XX.XX - - [20/Nov/2006:08:45:14 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/91792-topic/?findpost=1064359827" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0"
68.58.XX.XX - - [20/Nov/2006:09:07:42 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)"
68.58.XX.XX - - [20/Nov/2006:09:09:12 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.2) Gecko/20040804 Netscape/7.2 (ax)"
64.233.XX.XX - - [20/Nov/2006:09:16:24 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/91791-topic/?findpost=1064359825" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)"
38.113.XX.XX - - [20/Nov/2006:09:58:53 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "voyager/1.0"
38.113.XX.XX - - [20/Nov/2006:09:59:08 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3695 "-" "voyager/1.0"
211.28.XX.XX - - [20/Nov/2006:10:28:29 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)"
82.171.XX.XX - - [20/Nov/2006:10:30:32 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.02 (Windows NT 5.0; U; en)"
202.7.XX.XX - - [20/Nov/2006:13:16:29 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.00 (Windows NT 5.1; U; en)"
202.7.XX.XX - - [20/Nov/2006:13:17:42 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/90939-topic/?findpost=1064353509" "Opera/9.00 (Windows NT 5.1; U; en)"
74.6.XX.XX - - [20/Nov/2006:14:06:27 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
74.6.XX.XX - - [20/Nov/2006:14:06:28 +0000] "GET / HTTP/1.0" 200 1243 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"
129.184.XX.XX - - [20/Nov/2006:15:27:17 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/91791-topic/?findpost=1064359825" "Mozilla/5.0 (Windows; U; Windows NT 5.1; fr-FR; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)"
202.7.XX.XX - - [20/Nov/2006:15:34:36 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 151807 "http://forums.xisto.com/topic/91664-topic/?findpost=1064358850" "Opera/9.00 (Windows NT 5.1; U; en)"
65.254.XX.XX - - [20/Nov/2006:15:44:54 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 27404 "http://forums.xisto.com/topic/90705-topic/?findpost=1064351612" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
65.254.XX.XX - - [20/Nov/2006:15:46:39 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/86369-topic/?findpost=1064320830" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
That is a lot of data without much explanation, so let's take a closer look at a single line, the second entry in the sample.

213.227.XX.XX - - [20/Nov/2006:07:16:46 +0000] "GET /req_form.php?sitename=Xisto&formname=new HTTP/1.0" 200 6216 "http://ww4.forum500.com/" "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"

213.227.XX.XX - This is the visitor's IP address. We all have one: it is the address our service provider assigns to our connection. If the server is set up to look up host names, the visitor's host name appears here instead, but that lookup is often turned off to reduce server strain.

- - - These two dashes are placeholders for the visitor's remote identity (identd) and the user name used to log in to a password-protected area. A dash means the value is not available, which is the usual case.

[20/Nov/2006:07:16:46 +0000] - The date and time of the request, followed by the offset from GMT (+0000 here, so the time shown is GMT).

"GET - The method used for the request on the server (usually GET or POST).

/req_form.php?sitename=Xisto&formname=new - The path from the root for the file being requested, including any query string.

HTTP/1.0" - The protocol being used.

200 - The server status code. 200 means the request succeeded; 404 means the file was not found.

6216 - The number of bytes sent.

"http://ww4.forum500.com/" - The referring URL, that is, the page the visitor came from. A "-" here means no referrer was sent.

"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text" - The user's client (web browser or robot).

So now I know which user accessed which file at what time with what browser and most importantly where the user came from.
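
If you would rather let a script pull those fields apart, here is a minimal Python sketch. It assumes your log uses the same "combined" layout as the sample; the names LOG_PATTERN and parse_line are made up for this illustration, and the sample entry is written with its referrer properly quoted.

```python
import re

# One named group per field of the "combined" log layout shown above.
# Assumption: every field is present and the quoted fields contain no
# embedded double quotes.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


sample = (
    '213.227.XX.XX - - [20/Nov/2006:07:16:46 +0000] '
    '"GET /req_form.php?sitename=Xisto&formname=new HTTP/1.0" 200 6216 '
    '"http://ww4.forum500.com/" '
    '"Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.0.6) Gecko/20060808 '
    'Fedora/1.5.0.6-2.fc5 Firefox/1.5.0.6 pango-text"'
)

entry = parse_line(sample)
print(entry["ip"], entry["time"], entry["path"], entry["referrer"])
```

Lines that don't fit the pattern come back as None, so you can skip them instead of crashing halfway through a month of logs.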


From our sample, we can see which file is requested most often on this domain.

/sig/sig_gen/mysig2ba4a515fff0a55d.png

 

This happens to be my signature image, which is located on about 1000 posts around the web! We can also see that the majority of referral URLs are from Xisto.com. Now that I know what my most popular file is and what my most common referrer is, I can begin to plan a course of action for my website. This doesn't really apply to this file since it is just a signature image, but let's imagine that it is a content article for the sake of this discussion. If I know that most of the viewers of this file come from Xisto, then maybe I should adjust the content to reflect the general preferences of Xisto users. In this case, if I were to add more information about free web hosting or being a webmaster, I would get a better retention rate. That is, if users view that file's content and like it, they are more likely to view other content on my website.
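
Counting by eye works for 23 lines but not for a month's worth. As a rough sketch, assuming the same combined log layout, the Python below tallies the requested paths and the referring hosts. The names REQUEST_AND_REFERRER and tally are invented for this example, and sample_lines holds three entries from the sample, written with their referrers properly quoted.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Grabs the requested path and the referrer from one combined-format line.
REQUEST_AND_REFERRER = re.compile(
    r'"\S+ (?P<path>\S+) [^"]+" \d{3} \S+ "(?P<referrer>[^"]*)"'
)


def tally(lines):
    """Count requested paths and referring hosts across the given lines."""
    paths, referrers = Counter(), Counter()
    for line in lines:
        match = REQUEST_AND_REFERRER.search(line)
        if not match:
            continue  # skip anything that is not a well-formed entry
        paths[match.group("path")] += 1
        host = urlparse(match.group("referrer")).netloc
        if host:  # a "-" referrer has no host, so it is not counted
            referrers[host] += 1
    return paths, referrers


sample_lines = [
    '82.171.XX.XX - - [20/Nov/2006:10:30:32 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.02 (Windows NT 5.0; U; en)"',
    '202.7.XX.XX - - [20/Nov/2006:13:16:29 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.00 (Windows NT 5.1; U; en)"',
    '38.113.XX.XX - - [20/Nov/2006:09:59:08 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3695 "-" "voyager/1.0"',
]

paths, referrers = tally(sample_lines)
print(paths.most_common(1))
print(referrers.most_common(1))
```

Grouping referrers by host rather than by full URL is a choice made here so that many different forum topics on the same site add up to one total.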


Now that we have looked at something that is going right, let us take a look at what might go wrong.

How many search engine bots visited me during this period?

Two bots visited in the time period shown, and one of them visited on two separate occasions.

Here are the two bots that visited:

"Mozilla/5.0 (compatible; Yahoo! Slurp; htt...)"

"voyager/1.0"

Now we all should know what Yahoo is, and having Yahoo Slurp stop by is a good thing. Having unknown or lesser-known bots stop by doesn't usually hurt, so we welcome the "voyager" bot. But in 9 hours, the bots made only 6 requests, and half of those were for a robots.txt file that was not found (status 404). :) We also don't see the ever-important GoogleBot here, nor do we see any referral links from the search engines.

 

While this file is relatively popular, the rest of the website is pretty much dead and the only people that know about it are the members of Xisto! Obviously there aren't going to be any visitors from the search engines, since the site isn't getting indexed and we haven't seen any referral links from the search engines.

 

We can also tell from the number of different IP addresses that we only received 13 users in the 9-hour period shown, and those users only made 23 requests. That includes the 6 requests made by the 2 bots.
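
The same counts can be scripted. This is only a sketch: the BOT_MARKERS list is a guess at substrings that show up in robot user agents, not any official list, and the names summarize and IP_AND_AGENT are invented for the example. With masked addresses like the ones in the sample, the unique IP count is approximate at best.

```python
import re

# First field is the IP; the last quoted field on the line is the user agent.
IP_AND_AGENT = re.compile(r'^(?P<ip>\S+) .* "(?P<agent>[^"]*)"\s*$')

# Hypothetical heuristic: substrings that often appear in robot user agents.
BOT_MARKERS = ("slurp", "voyager", "googlebot", "bot", "crawler", "spider")


def summarize(lines):
    """Count requests, distinct IP addresses, and requests per bot agent."""
    visitors, bot_hits, total = set(), {}, 0
    for line in lines:
        match = IP_AND_AGENT.match(line)
        if not match:
            continue
        total += 1
        visitors.add(match.group("ip"))
        agent = match.group("agent")
        if any(marker in agent.lower() for marker in BOT_MARKERS):
            bot_hits[agent] = bot_hits.get(agent, 0) + 1
    return {"requests": total, "unique_ips": len(visitors), "bot_hits": bot_hits}


sample_lines = [
    '38.113.XX.XX - - [20/Nov/2006:09:58:53 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "voyager/1.0"',
    '38.113.XX.XX - - [20/Nov/2006:09:59:08 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3695 "-" "voyager/1.0"',
    '82.171.XX.XX - - [20/Nov/2006:10:30:32 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.02 (Windows NT 5.0; U; en)"',
    '202.7.XX.XX - - [20/Nov/2006:13:16:29 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.0" 200 167999 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.00 (Windows NT 5.1; U; en)"',
]

summary = summarize(sample_lines)
print(summary)
```

A bot can put anything it likes in its user agent, so treat the bot tally as a lower bound rather than an exact figure.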

 

So the most popular file isn't even popular, nobody knows that the site exists, and the search engines don't like me! :D I'm so depressed now. :D


Now is the time to really analyze the data. A 9-hour period isn't going to provide a large enough sample to really determine what is going on. Maybe this is just the slow day of the week, or maybe the other 15 hours of the day were really busy. We really should look at the stats for the whole month. Unless you are an upper-level genius and can sort the data for an entire month in your head, you'll need some kind of server statistics software like AWStats. This software sorts all of the data out and gives you charts and graphs along with other very useful information. No matter what method we use to sort the data, here is what we need to look at:

How many page requests per day?

How many visitors per day?

What is the most common subject among the referral pages for your site?

- For example, most of my visitors might come to my site after reading articles at Xisto. What do most of the articles have in common? Are most of them about Online Gaming or PHP scripting?

What is the most common file request?

How many search bots are visiting and from where?
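
If you don't have statistics software handy, the first two questions on that list are easy to answer with a few lines of Python. This sketch assumes the combined log layout from the sample and treats each distinct IP address as one visitor, which is only an approximation; the names IP_AND_DAY and daily_totals are invented for the example.

```python
import re
from collections import defaultdict

# Captures the IP and the date part of the timestamp (everything before
# the first colon inside the square brackets).
IP_AND_DAY = re.compile(r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:\]]+):')


def daily_totals(lines):
    """Map each day to a (requests, distinct IP addresses) pair."""
    requests = defaultdict(int)
    visitors = defaultdict(set)
    for line in lines:
        match = IP_AND_DAY.match(line)
        if not match:
            continue
        day = match.group("day")
        requests[day] += 1
        visitors[day].add(match.group("ip"))
    return {day: (requests[day], len(visitors[day])) for day in requests}


sample_lines = [
    '38.113.XX.XX - - [20/Nov/2006:09:58:53 +0000] "GET /robots.txt HTTP/1.0" 404 - "-" "voyager/1.0"',
    '38.113.XX.XX - - [20/Nov/2006:09:59:08 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3695 "-" "voyager/1.0"',
    '82.171.XX.XX - - [20/Nov/2006:10:30:32 +0000] "GET /sig/sig_gen/mysig2ba4a515fff0a55d.png HTTP/1.1" 200 168311 "http://forums.xisto.com/topic/86545-topic/?findpost=1064322153" "Opera/9.02 (Windows NT 5.0; U; en)"',
]

for day, (hits, ips) in daily_totals(sample_lines).items():
    print(day, hits, "requests from", ips, "addresses")
```

Run over a whole month of logs, the result gives one line per day, which makes slow days and busy days easy to spot.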

Analyzing these statistics will show you a few things, but most importantly, what our visitors think our website is about. If my Online Gaming website is getting most of its hits from the PHP forums and most of the page requests are for a tutorial I wrote about keeping scores for online games with PHP, then it is likely that my visitors are looking for a PHP site, so they are not interested in the majority of my content. This is a case of promoting my website in the wrong setting. I need to be promoting in Online Gaming related settings.

 

You can submit your website a million times to each of the major search engines and never get crawled! The best way to get your site crawled by the spiders is to have your link on other websites that are being crawled. If you are not getting crawled frequently enough, or if there are some bots that have never visited, then you NEED to fix that. It is great that I get a lot of traffic from Xisto, but there is a huge difference between 1% of the few thousand members there and 1% of the tens of millions of users of Google.

It is still a good idea to submit your link to each of the search engines at least once to get the ball rolling.


Okay, you are finally getting crawled and you even are getting a few hits from the search engines. You can tell when you see something like this:

65.254.XX.XX - - [20/Nov/2006:19:46:39 +0000] "GET /content/view/1/1/ HTTP/1.1" 200 32768 "https://www.google.com/search?hl=en&lr=&q=php+leading+zeros&btnG=Search" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)"
That means that this page was requested after the user performed a search for "php leading zeros" on Google. This is important because we now know that we actually have something that people are looking for! And since we know what they were searching for when they found it, we can make the next search for the same thing even easier by making sure that our content is related to "php leading zeros". If the content is directly related to the search term, then we have a winning page. If it is loosely related to the search term, then we have a lead! We know that people have come to the site looking for "php leading zeros" but that isn't what we offer, YET! It might be a good idea to add content about the subject and link it to the page commonly mistaken for the content requested. Additionally, it might be a good idea to submit or discuss the page on a related website if permissible. Is there a link to this content item on the most common referral site? Maybe there should be.
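
Pulling the search phrase out of a referrer like that one can be automated. The sketch below uses Python's standard urllib.parse module; the function name search_terms is made up, and the check for "google." in the host name is a simplifying assumption (other search engines use other parameter names).

```python
from urllib.parse import urlparse, parse_qs


def search_terms(referrer):
    """Return the q= search phrase from a Google referrer, or None."""
    parsed = urlparse(referrer)
    if "google." not in parsed.netloc:
        return None  # not a Google search page (simplifying assumption)
    values = parse_qs(parsed.query).get("q")
    return values[0] if values else None


referrer = "https://www.google.com/search?hl=en&lr=&q=php+leading+zeros&btnG=Search"
print(search_terms(referrer))  # prints: php leading zeros
```

Feeding every referrer in the log through a function like this and counting the results gives a list of the phrases people actually used to find the site.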


Finally, if the overwhelming majority of visitors are coming from a certain type of site, like a blog or forum, then it might be a good idea to advertise your site on these sites. These are usually cheaper than the major advertising programs and will be more focused. This is because if you offer $10.00 to a webmaster for so many impressions, they know for sure they will earn the money, which isn't true of their affiliate programs. Good old-fashioned link or banner exchanging may also work with smaller websites, so don't disregard the free options. Your statistical data will tell you which advertisers are working and which are not, and will point you toward which ones to try first.


I hope that this will prove to be useful information for all that read it. Here are a few related topics:

Search Engine Optimization.

Website Advertising.

Keyword Analysis.

 

vujsa


Congrats, excellent and well-explained information, but I have a question: in some cases you don't have access to any statistics software, or your hosting provider hasn't installed any. Is there an easy way to get this data from the server and process it locally? Best regards,


Yeah, the free link swap works with Google; they love sites that have lots of links in.

Not 100% sure that I have any clue as to a suggestion of what you might be talking about. :P

But if you don't have access to statistical analysis software on your server but you do have access to the raw server logs, then you can use a common database program like MS Works Database to sort the data. You'll also need a text editor that can do a pretty good job of find / replace!

Using your text editor's find / replace, you have to replace the space between each element of the entry (row) with a tab. Only replace the spaces that separate elements, not the spaces inside a quoted or bracketed element such as the browser name. By adding a tab between each element, you are basically placing them in separate columns of the database.

In the database program of your choice, create a new database with enough fields for each element in the log entry. This will usually be 9 for entries like the ones shown above: IP address, the two dashes, date, request, status code, bytes, referrer and client. If you choose to drop some elements or leave some combined, you won't need as many. Then copy all of the rows and paste them into the first column / first row of your database table view. You should now have a nice table of server log data. You'll need to adjust the width of each column to be able to read everything.

Now that your server log data is in a database, you can select multiple views of the data, create various reports and sort by whatever you want.
To be more efficient, you might want to create a script that prepares your data for you. Using PHP and preg_replace() with regular expressions, you could insert your tabs in just a second if done correctly. The best part about that is that you can reuse the script each month to prepare your data.
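
To give an idea of what such a script could look like, here is a minimal sketch of the same idea in Python rather than PHP. It assumes the combined log layout from the first post and writes one tab-separated row of 9 columns per entry; the names LOG_PATTERN and to_tsv are made up for the example.

```python
import csv
import io
import re

# Nine capture groups: IP, the two dash fields, date, request, status,
# bytes, referrer and client. Spaces inside quotes or brackets are kept.
LOG_PATTERN = re.compile(
    r'(\S+) (\S+) (\S+) \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+) "([^"]*)" "([^"]*)"'
)


def to_tsv(lines):
    """Return the matching log lines as tab-separated text, one row each."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:  # lines that do not fit the layout are skipped
            writer.writerow(match.groups())
    return out.getvalue()


sample_lines = [
    '38.113.XX.XX - - [20/Nov/2006:09:59:08 +0000] "GET /req_form.php?sitename=Xisto&formname=intro&introname=new HTTP/1.0" 200 3695 "-" "voyager/1.0"',
]

print(to_tsv(sample_lines))
```

To use it on a real log, read the file's lines, pass them to to_tsv and save the result as a text file that your database program can import or that you can paste in directly.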

If you are unsure how to prepare your data to be entered into the database or would like more information on writing a script that would prepare your data, please feel free to reply here.

This may not be as fancy as the results from a statistical analysis program, but the information will be just as useful. If you prepare your reports properly and really decide what information you need, you may find this approach even more beneficial than the results from a statistical analysis program.

Hope this helps.

vujsa

