Lately I have been looking for ways to find out when Googlebot last crawled my site. I am trying to work out how frequently I should publish blog posts on my website, so that every time Googlebot comes to my site it discovers some new content and indexes it. I also want to test a popular SEO theory about keeping site updates at a fixed frequency: if I go above or below that level, are my pages still indexed at the same rate, or not?
Popular methods to find out when Googlebot/spiders crawled your site, and the crawl rate [the number of times the spider crawled your site per day], are as follows:
1. Google Webmaster Tools
This tool gives us many important stats that help us optimize our site, such as sitemaps, crawl rate, crawl errors, internal links, etc.
The screenshot below shows the pages crawled per day over the last 90 days.
Google Webmaster Tools not only gives information about the crawl rate but also allows us to adjust the crawl speed. I am yet to test this functionality; I will keep you posted in future posts ☺.
Index Status under the Health section tells you how many pages are indexed and how many are blocked by robots.txt.
2. Raw Access logs
If your site is running on cPanel, you’ll find “Raw Access Logs” in the Logs section.
After clicking it, the log is downloaded in .gz format. Extract it and open it as a text file. At first the data will look like a mess, but start by searching for Googlebot. Have a look at the image below, which shows which page Googlebot crawled and at what time. You can also see the IP addresses of other visitors to your site and which browser they were using.
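If you would rather not hunt through the file by hand, here is a minimal sketch of how the same search could be scripted. It assumes the log is in the combined log format that cPanel raw access logs typically use, and the file name is just a placeholder for whatever you downloaded:

```python
# Sketch: list Googlebot requests from a raw access log and count crawls per day.
# Assumes the combined log format; "access.log.gz" is a hypothetical file name.
import gzip
import re
from collections import Counter
from datetime import datetime

LOG_FILE = "access.log.gz"  # placeholder path to the downloaded raw access log

# combined format: IP - - [timestamp] "METHOD /path HTTP/1.x" status size "referer" "user-agent"
line_re = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:\S+) (\S+) [^"]*" \d+ \S+ "[^"]*" "([^"]*)"'
)

crawls_per_day = Counter()

with gzip.open(LOG_FILE, "rt", errors="replace") as log:
    for line in log:
        match = line_re.match(line)
        if not match:
            continue
        ip, timestamp, path, user_agent = match.groups()
        if "Googlebot" in user_agent:
            # timestamp looks like 10/Oct/2013:13:55:36 +0530 -> keep only the date
            day = datetime.strptime(timestamp.split(":", 1)[0], "%d/%b/%Y").date()
            crawls_per_day[day] += 1
            print(f"{timestamp}  {path}  ({ip})")

print("\nGooglebot requests per day:")
for day, hits in sorted(crawls_per_day.items()):
    print(f"{day}: {hits}")
```

The per-day count is the same “crawl rate” figure mentioned above, just computed from your own logs instead of a dashboard.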
3. Awstats
The stats above don’t actually show cumulative numbers, so I tried analyzing Awstats to learn more about Googlebot.
Select Robots/Spiders visitors from the left-side navigation.
This shows all the robots/spiders that have crawled your site, along with the timestamp of their last visit and the bandwidth they consumed.
In the Hits column, a figure like 7894+567 means that the spider has crawled 7894 pages of your site and has read your robots.txt file 567 times.
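To make that figure concrete, here is a tiny illustration of the arithmetic (the numbers are only the example from above, not real Awstats output):

```python
# Example: splitting an Awstats Hits figure into page crawls and robots.txt reads.
hits_figure = "7894+567"  # example value as shown in the Hits column
page_hits, robots_txt_hits = (int(part) for part in hits_figure.split("+"))
print(f"Pages crawled: {page_hits}")          # 7894 pages of the site
print(f"robots.txt reads: {robots_txt_hits}")  # 567 requests for robots.txt
```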
In my experience, the raw access log data is the most useful, as it tells me exactly which page was crawled and at what time. Awstats is helpful when we want cumulative data. On the other hand, the data provided by Google Webmaster Tools was not very useful in determining the crawl frequency. Of course, I am yet to test the crawl rate setting in Webmaster Tools and see the results.
Share your thoughts on this.