Track Search Engine Bots and Spiders with Google Analytics

The other day I wrote a post about tracking search engine bots and spiders on your site. One of the things I mentioned was a weakness of Google Analytics (and of the other JavaScript tags used by web analytics tools): bots do not execute JavaScript, so their visits are never recorded. In essence, the only true way to count the spiders and bots visiting your site is to record the visit on your server, when the file is loaded. We covered solutions such as CrawlTrack, but realistically it would be a lot easier to have all your analytics in one place.

This issue seems to have been solved by some creative gentlemen in France a couple of months ago. I do not purport to know how effective their solution is, but I still wanted to present it to our readers. Since many of our readers might not speak French, I took the liberty of translating (paraphrasing) the basics of the method. I have also applied all the changes made to the files since they were first released, and translated the notes in the script. So enjoy the solution below, which gives you a quick way to add search engine bot and spider tracking to Google Analytics. And now for the fun…

The goal behind this solution was twofold:

– Find a way to record statistics in Google Analytics even though the Google tracking tag is JavaScript (which search engine robots do not execute).
– Distinguish between human and bot traffic.

This is how it was solved. The trick for the major bots (not 100% accurate, but close enough) was to use the following three steps:

– Check the referrer. If the client arrives with a known referrer, it is assumed not to be a bot, so it is not counted.
– Check the operating system in the user agent. If the user connects with a known OS (Windows XP, Vista, Linux, etc.), it is not a bot, and it is not counted.
– Check for a search engine. If neither condition above applies, the user agent is compared against BBclone's robot patterns, and a match is counted.
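The order of these three checks can be sketched in Python as follows. This is a minimal illustration of the decision logic described above, not the script's actual code. The OS markers and robot patterns are a small hypothetical sample standing in for the much longer lists the real script takes from BBclone, and "known referrer" is simplified here to "any non-empty referrer".

```python
import re
from typing import Optional

# Hypothetical sample of desktop OS markers found in browser user agents.
KNOWN_OS_MARKERS = ("Windows NT 5.1", "Windows NT 6.0", "Linux", "Mac OS X")

# Hypothetical sample of robot patterns, standing in for BBclone's list.
ROBOT_PATTERNS = {
    "Googlebot": re.compile(r"Googlebot", re.IGNORECASE),
    "Yahoo! Slurp": re.compile(r"Slurp", re.IGNORECASE),
    "msnbot": re.compile(r"msnbot", re.IGNORECASE),
}


def classify_bot(user_agent: str, referrer: str) -> Optional[str]:
    """Return the robot's name if the hit should be counted as a bot, else None."""
    # Step 1: a visitor arriving with a referrer is assumed to be human.
    if referrer:
        return None
    # Step 2: a user agent naming a known desktop OS is assumed to be human.
    if any(marker in user_agent for marker in KNOWN_OS_MARKERS):
        return None
    # Step 3: only now compare the user agent with the robot patterns.
    for name, pattern in ROBOT_PATTERNS.items():
        if pattern.search(user_agent):
            return name
    # Unknown agents that match no pattern are not counted at all.
    return None
```

For example, classify_bot("Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)", "") returns "Googlebot", while a Firefox user agent containing "Windows NT 5.1" returns None. Running the cheap human checks first means the pattern list is only consulted for the small share of traffic that looks suspicious, which is also why the method is "close enough" rather than exact.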

For the JavaScript problem with Google Analytics, we took inspiration from Peter Van der Graff. Peter was looking for a way to track his RSS feeds and PDF documents. In brief, we built a script that uses cURL to send the tracking request from your server instead of from the visitor's browser. Please note: your host must support cURL (PHP's cURL extension) for this solution to work.
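To make the server-side idea concrete, here is a hedged Python sketch of the kind of request such a script sends. Instead of waiting for a browser to run the JavaScript tag, the server requests Google's tracking image itself, with the page details in the query string. The endpoint and parameter names (`__utm.gif`, `utmwv`, `utmn`, `utmhn`, `utmp`, `utmr`, `utmac`, `utmcc`) are assumed to follow the legacy urchin/ga.js beacon format of that era, which current versions of Google Analytics no longer accept. The function name and the simplified cookie value are hypothetical, and the function only builds the URL.

```python
import random
import time
from urllib.parse import urlencode

# Assumed endpoint of the legacy (ga.js era) Google Analytics tracking GIF.
BEACON = "http://www.google-analytics.com/__utm.gif"


def build_beacon_url(account_id: str, hostname: str, page: str,
                     domain_hash: str, referrer: str = "-") -> str:
    """Build a tracking-GIF URL for one page view, as a server-side script might."""
    now = int(time.time())
    visitor_id = random.randint(1000000000, 2147483647)
    # Simplified __utma value:
    # domain hash.visitor id.first visit.previous visit.current visit.session count
    utma = f"{domain_hash}.{visitor_id}.{now}.{now}.{now}.1"
    params = {
        "utmwv": "1",  # beacon version (assumed value)
        "utmn": random.randint(1000000000, 9999999999),  # random cache buster
        "utmhn": hostname,  # host name of the tracked site
        "utmp": page,  # requested page path
        "utmr": referrer,  # referrer, "-" when there is none
        "utmac": account_id,  # the robots profile ID, e.g. UA-XXXXXX-2
        "utmcc": f"__utma={utma};",  # cookie data the JavaScript tag would send
    }
    # urlencode percent-encodes every value, including "/", "=" and ";".
    return BEACON + "?" + urlencode(params)
```

Calling build_beacon_url("UA-XXXXXX-2", "www.yoursite.com", "/index.php", "58715258") yields a URL starting with the beacon address. Fetching that URL (with urllib.request.urlopen in Python, or curl_init() and curl_exec() in PHP) is the step that actually registers the hit. This is why the host needs cURL.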

So let’s move onto the steps for installation:

You need to create a new profile within your Google Analytics account. Select “Add a Profile for a new domain”. You can give the profile any name, although it makes sense to choose something memorable, like robots.yourdomain.com. Doing it this way also has Google create a variant of your Google Analytics identifier (UA-XXXXXX-2), and it makes it easier to keep the stats generated by bots separate from those generated by humans.

You then need to download this folder (I have translated the notes as well), which contains three PHP files. You only need to change three small settings, all of them in the config.php file:

* Add the Google Analytics ID of the profile you have just created. All you need is the ID itself (UA-XXXXXX-2 in the example above), nothing more; you can ignore the GA/Urchin tracking code Google creates for you.
* Add your domain name (e.g. www.yoursite.com).
* Add the Google identifier of your domain, which you can find in the value of the __utma cookie on your website.
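The script depends on these three values being correct. Checking them before uploading can save you hours of waiting for stats that never arrive. The Python sketch below assumes the formats described in this post: a full profile ID of the UA-XXXXXX-2 kind (account number plus profile number), a bare host name, and the all-digit first field of the __utma cookie. The function name and its checks are hypothetical and are not part of the downloaded files.

```python
import re


def check_config(account_id: str, domain: str, domain_hash: str) -> list[str]:
    """Return a list of problems with the three config.php values (empty if none)."""
    problems = []
    # Legacy Analytics IDs look like UA-<account number>-<profile number>.
    if not re.fullmatch(r"UA-\d+-\d+", account_id):
        problems.append("account ID should look like UA-XXXXXX-2")
    # The domain should be a bare host name, without http:// or a path.
    if not domain or "://" in domain or "/" in domain:
        problems.append("domain should be a host name such as www.yoursite.com")
    # The domain identifier is the all-digit first field of the __utma cookie.
    if not domain_hash.isdigit():
        problems.append("domain identifier should be digits only, e.g. 58715258")
    return problems
```

The XXXXXX in this post is only a placeholder, so real digits are needed. check_config("UA-1234567-2", "www.yoursite.com", "58715258") returns an empty list. Pasting the whole cookie value, or a domain with http:// in front, is reported instead.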

The last one seems tricky, but it isn’t. Just visit your site (assuming you already have the normal Google Analytics installed) and look for the cookie Google created. In the cookie you will see something like 58715258.281663908.1207124725…; just take the portion before the first period, “58715258”. The easiest way to find it is with Mozilla Firefox. Just go to Tools > Options > Show Cookies, scroll to your site, click the plus button, and look for the cookie named “__utma”. It should look like the one below:

[Screenshot: the __utma Google Analytics cookie]
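If you prefer to pull the value out programmatically, the Python sketch below splits a __utma value into its fields. It assumes the commonly documented layout of the legacy cookie: domain hash, visitor ID, first visit, previous visit, current visit and session count, separated by periods. Only the first field is needed for config.php. The class and function names are hypothetical.

```python
from typing import NamedTuple


class Utma(NamedTuple):
    domain_hash: str  # the value config.php asks for
    visitor_id: str
    first_visit: int  # Unix timestamps
    previous_visit: int
    current_visit: int
    session_count: int


def parse_utma(value: str) -> Utma:
    """Split a __utma cookie value such as '58715258.281663908.1207124725...' into fields."""
    parts = value.split(".")
    if len(parts) != 6:
        raise ValueError(f"expected 6 period-separated fields, got {len(parts)}")
    domain_hash, visitor_id, *numbers = parts
    first, previous, current, sessions = (int(n) for n in numbers)
    return Utma(domain_hash, visitor_id, first, previous, current, sessions)
```

As an illustration (the trailing fields here are invented), parse_utma("58715258.281663908.1207124725.1207124725.1207124725.1").domain_hash gives "58715258". If you only want that first field, value.split(".", 1)[0] does the same job without validating the rest.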

Once you have modified all this information, the next step is to add an include statement for the analytics.php file (which you downloaded earlier) to the source code of your web pages. The header works, but wherever you choose to place it is fine. In PHP this is a single statement, for example <?php include('analytics.php'); ?>, with the path adjusted to wherever you uploaded the files.

So that’s it: sit back, wait a couple of hours, and you should start seeing some bot activity in your Google Analytics. Enjoy!

***Please note: we do not know whether this is against Google’s TOS, but we wanted to share a cool hack with you. Use it at your own risk.***

Adrian Speyer

About The Author: Adrian has over 12 years of experience in Digital Marketing and Analytics. He currently works as a Marketing Manager at Vanilla Forums, a modern forum software platform that allows clients to connect and engage their communities and customers. He lives and works in Montreal.