Sunday, November 28, 2010

Google Web Preview - A Bad Bot

Google's new search feature, which lets users preview your site before visiting it, may mean that you have seen many instances of this Google user agent in your log files:

USERAGENT: Mozilla/5.0 (en-us) AppleWebKit/525.13 (KHTML, like Gecko; Google Web Preview) Version/3.1 Safari/525.13

Google says that previews are generated during normal crawls. In addition, the Web Preview user agent fetches your page content live, to show to people browsing the search results, if no cached preview is saved.

Google Web Preview and robots.txt

Google Web Preview is a Google user agent which does not respect, or even read, a robots.txt file. It does as it wishes without reference to your robots.txt file because Google Web Preview is something which people browsing the search results utilise: a user-initiated function.

Google previews are also generated by normal Google crawls. So, when a searcher looks for your preview, they will be served either a cached one from a past crawl or a fresh one generated by Google Web Preview, which is then also cached for future results. This means that there are two methods of generating previews for your site, which adds to the confusion when you try to diagnose any problems, as you will not know which method generated the preview you are seeing.

Google Web Preview and Cloaking
Cloaking refers to the practice of presenting different content or URLs to users and search engines.
Google Web Preview is not a search engine. It is a browser-based utility, and so you can modify content as you wish.

Blocking Google Web Preview by htaccess

I have tried to block Web Preview by htaccess and can confirm that the following works:

# ban spam bots
RewriteEngine on
# match any user agent containing "Preview"
RewriteCond %{HTTP_USER_AGENT} ^(.*)Preview(.*)$
# send every request from that user agent to Google's home page
RewriteRule ^(.*)$ http://www.google.co.uk [R=301,L]
##

You may know of more graceful methods; that is my suggested method. Because the rewrite rule points to Google's home page, Google's home page will end up as your preview snapshot, so you might want to change the target to a more inviting page for people to see in the search results.

Update - After further testing of the htaccess block for Google Web Preview, I have seen that Google does not always follow the redirect. Either it works fine and the preview shown is Google's home page (as directed above), or the result is the 'Preview not available' message. (Should it be that Web Preview does not follow all redirects, then there will be other ways to present different content depending on the user agent which do not involve a redirect.)

Note: This will only work for the previews generated by the Google Web Preview bad bot and will have no effect on the ones generated by a normal crawl. In current testing, though, I see that if you request previews of newly indexed pages which Google does not crawl often, you may have more chance of triggering a live preview request, which will then be cached. Whether this holds true on a wider scale, time will tell.

Goodbye to Googlebot-Image blocking image indexing

Update Feb 2011. This page has been updated. Originally Google suggested that a directive in place for googlebot-image would affect previews. This was proven not to be the case. Do not worry if you have an instruction for googlebot-image in your robots.txt file. I have removed the previous commentary here to avoid confusing the issue.
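Returning to the user-agent check behind the htaccess rule above, here is a minimal Python sketch of the same idea. The function names and the two page bodies are hypothetical, and the sketch matches the fuller token "Google Web Preview" rather than any user agent containing "Preview", so it is narrower than the rule in this post. It illustrates one possible way to serve different content without a redirect, which may be useful given that Web Preview does not always follow the 301.

```python
# Token taken from the user agent string quoted in this post.
PREVIEW_TOKEN = "google web preview"


def is_web_preview(user_agent):
    """Return True if the User-Agent header looks like Google Web Preview."""
    # Treat a missing header as an empty string and compare case-insensitively.
    return PREVIEW_TOKEN in (user_agent or "").lower()


def choose_body(user_agent, normal_body, preview_body):
    """Pick the page body to serve, with no redirect involved (hypothetical helper)."""
    return preview_body if is_web_preview(user_agent) else normal_body


def count_preview_hits(log_lines):
    """Count log lines that mention the Web Preview user agent."""
    return sum(1 for line in log_lines if is_web_preview(line))
```

The same check could be run over an access log with `count_preview_hits(open("access.log"))`, where the log path is a placeholder.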

Friday, July 30, 2010

Godaddy's 302 redirect deception

Having heard of previous issues regarding Godaddy and redirect errors, I hosted this site with Godaddy to discover what the problem was and to document it if needed. It did not take long for the issue to arise: Google failed to get my sitemap due to a rogue 302 redirect.

[Image: Godaddy 302 redirect error in Webmaster Tools for sitemap file]

Pages which should have been returned with a 200 status were being redirected via 302 without my knowledge. When I tried Fetch as Googlebot, the same 302 redirect occurred, and in a site: search I now have URLs indexed which I did not create.
[Image: Crawl errors caused by Godaddy Hosting]

These redirect errors, as shown in Webmaster Tools reporting, are 302 redirects following a chain, and goodness knows what else. Godaddy also redirects by meta refresh.
The site's indexing had been affected by Godaddy hosting.
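If you want to see such a redirect chain for yourself, without relying on Webmaster Tools, something along these lines may help. This is a minimal sketch using only the Python standard library; the function names, the default user agent and the hop limit are my own assumptions, not anything Godaddy or Google provides. It requests a URL without following redirects automatically and records each status code along the way.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def fetch_once(url, user_agent="Mozilla/5.0"):
    """Make one GET request without following redirects; return (status, location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=10)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def status_chain(url, fetch=fetch_once, max_hops=5):
    """Follow redirects by hand and return a list of (url, status) pairs."""
    chain = []
    for _ in range(max_hops + 1):
        status, location = fetch(url)
        chain.append((url, status))
        # Stop at the first response that is not a redirect.
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        url = urljoin(url, location)
    return chain
```

Calling `status_chain("http://www.example.com/sitemap.xml")` (a placeholder URL) makes real requests. A healthy page should give a single entry with status 200, whereas the problem described here would show one or more 302 entries first.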

Response from Godaddy support

I enquired with Godaddy regarding this and their response was:
We have reviewed this issue with our Advanced Hosting Team. The issue does not appear to be due to any setting in our system. It may be related to a script or setting in your website or it could be caused by an error on Google's end. You will need to review your site configuration, scripts and contact Google for further assistance with this issue.
My reply stated:
There is nothing on my site that would suddenly begin returning random 302 status to googlebot. Feel free to take a look should you wish. And I very much doubt it to be an error on the part of googlebot, especially as I can recreate this myself without a google user agent.
And Godaddy's response was:
After reviewing your account once again, we are unable to locate any errors with your Hosting service. Please note that we are unable to support third party application issues and we do not support the inner functionality of this program. If you are having difficulty using a third party product, we can help troubleshoot that process to the point that we find that the issue lies entirely within the configuration of that product. You may wish to consult with a community forum online or do a search on your favorite search engine as other users may have encountered a similar problem in the past and may offer helpful solutions. We recommend the following URLs when searching for useful help forums:
The URLs they recommended for helpful forums were actually Google's Webmaster forums. I searched those forums and responded with a list of URLs of threads about the same issue:
In your latest response, you suggested that I looked at useful help forums for assistance with the server responding randomly with 302 status codes for 200 ok pages. I have found these threads from users having the same problem as myself with your hosting ....-
Godaddy's response to that was:
Thank you for contacting Online Support and bringing this to our attention. The issue you have been experiencing with the redirects is being worked on by our technicians. Service will return to normal as soon as possible. Unfortunately, we are unable to give a specific time frame for this resolution. We appreciate your patience and understanding in this matter and we apologize for any inconvenience.
I requested that my site be transferred to a server which did not have these critical hosting issues. The response to my request to move was:
Thank you for contacting Online Support. Unfortunately we cannot migrate a customer's website between servers at their request. As we have stated, this issue is being worked on and will be resolved as soon as possible. We do appreciate your patience and understanding in this matter. Please let us know if you have any further questions, comments, or concerns by replying to this email. Our service departments and telephone lines are open 24 hours a day, 365 days a year to accommodate your needs anytime.
Summary of Godaddy's response to the 302 redirect problem.
  • Issue raised. Godaddy - Issue denied, with the suggestion that it was a site configuration or Google error. This suggestion was refuted.
  • Godaddy - Issue denied again and links sent to 'helpful forums'. Threads from that very forum, made by other users with the same problem, were sent back to them.
  • Godaddy - There is an issue, but there is no timeframe for resolution. Request made to move to hosting without these issues.
  • Godaddy - No.

It would be a huge coincidence if everyone who has ever experienced the rogue 302 redirects from Godaddy were on the same server as this site, and so to imagine that this is an isolated incident which will be resolved by their technicians is fanciful.
I would submit that these 302 redirects are part of their general overall server setup, an aspect that Godaddy does not want to admit readily and indeed denies in an outrageous fashion. (Aside from the general indexing issues, some users may act on the misinformation given and hire third parties to review their sites and scripts for security breaches that do not exist.) When the option to deny this is no longer there, the response is essentially 'live with it, and you can't move server'. Moving server would of course be pointless if all their servers are affected by this. I knew that, and I suspect they did too.

Comments moved over from canig.com

Taylor says: April 21, 2011 at 9:55 pm
I was having the same issue with my site. I sent a blunt issue to support saying that this is a known GoDaddy issue and I was not interested in being told my problem was the code. They forwarded my email to the Advanced Technical Support Team and I received this response: Recently the hosting account for yardgopher.com was under Network Protection and was encountering intermittent FTP and HTTP issues. For security reasons there is no additional information that we can provide for why the hosting account was under Network Protection. We resolved the issue as quickly as possible and apologize for any inconvenience. The 302 redirect issue you were experiencing should be resolved as well. This looks like it fixed the problem; everything is working great so far. If anyone has this problem, they may want to recommend this as a possible solution.
Brandon says: April 6, 2011 at 1:41 am
On my third call to GoDaddy I think I finally got this exact problem fixed. No actual details from tier one, but the report conveyed from tier two was that it was “security setting” on the server.
Matthew Blackford says: April 2, 2011 at 1:07 pm
What was your final resolution to this problem? Did you end up getting them to fix your hosting, or did you move to a different company? I’m also in this very frustrating boat…
squibble says: April 2, 2011 at 9:19 pm
You cannot get them to fix it as it is a long-standing feature of theirs. I believe that some users have amended their hosting package to resolve this but I do not know the exact details. I think it relates to getting a dedicated IP address. Personally, I would move hosting as it is important to have a host that you can trust and I don’t feel Godaddy are a trustworthy host as they are happy to mislead users.
Allan Ng says: March 24, 2011 at 9:42 pm
We are currently experiencing the exact same issue where Google is returning crawl errors and redirect errors on our sitemaps. Now our site got de-indexed by Google. Anyone have this problem resolve beside switching to another hosting company? How do I get my site back on Google? (Just wait?) Any help will be greatly appreciated!
squibble says: March 24, 2011 at 10:15 pm
I would say that you need to move hosting, but much of that also comes from the fact that Godaddy does not admit anything and tries to make site owners think it is their scripts at fault. I think for hosting you need a more trustworthy partner. With that in mind, some have said that switching to a different IP or getting a dedicated IP has helped a lot. Maybe that is something which you could look into?
Irfanullah Jan says: December 10, 2010 at 11:53 am
Thanks for the post. I have exactly the same problem. My sitemap is not accepted at google webmaster. My HTTP headers are 302 instead of 200 & 301. I have just contacted go-daddy
Chris says: November 27, 2010 at 4:28 pm
Woot. I got them to refund my remaining months on my account (11). I can post what I wrote them last if anyone wants to use it themselves. My advice is to move hosts, and don’t give up on getting a refund from Godaddy. Apparently they are a great domain service, but hopeless hosting. Chris
squibble says: November 27, 2010 at 11:27 pm
@Chris – nice to hear that you got a refund from them, and I agree with you – moving sites and just forgetting about it does not seem enough as there will be so many more out there who probably don’t even realise that their indexing is being shafted by Godaddy.
Cracker says: April 4, 2011 at 12:08 pm
@Chris: I am facing the same problem and am planning to switch to another host. But would like to get a refund from Godaddy. Can you please send me the matter you sent them for getting the refund at crackercracks@yahoo.com?
Chris says: November 27, 2010 at 10:39 am
Have moved hosts. Using stablehost and so far everything seems good. But I’m still really wound up about this, even though in reality it is only over a few quid. I’m just very reluctant to let this giant of webhosting triumph over the little people by blatantly sidestepping an important issue. No time frame, no explanation. Nothing. 24 hours after moving to Stablehost, google had crawled my site properly, found my robots files and uncached a load of pages I didn’t want showing up – more importantly, indexed pages that weren’t showing up. It’s just lucky I wasn’t in a hurry to take my site live. I gather it’s a USA based company… I think the Federal Trade commission deal with consumer rights similarly to the UK’s Consumer Direct. I might try both and see what happens. Will come back and post either way.
Chris says: November 19, 2010 at 12:49 pm
This is a f* outrage. I’m having the same problem, and the same outright denial despite sending screen prints of 302>302>200 occurring time and time again. Any ideas what we can do about this? We must have some rights as customers… it’s a faulty product after all.
squibble says: November 20, 2010 at 4:43 pm
There is no solution; once you prove it to Godaddy beyond doubt they essentially say ‘Tough’. I would move hosts.
Manjeet says: November 12, 2010 at 3:19 pm
Facing the same problem. I have opted for AP Grid Linux hosting. Is there any way of resolving the same?
squibble says: November 13, 2010 at 1:52 pm
I don’t think there is a way of solving this without moving from Godaddy, unless you have moved servers internally and got reassurance from them that the one you are now on does not have this 302 feature.