Accidental 404 Errors… Ugh.
You wouldn't think it would be possible to accidentally issue a 404 error code for every page request to a site. But I'm a particularly talented individual and that's just what I did. You see, apparently I had seen it all so why not go ahead and create brand new problems!?
I make it a point to log into my master analytics account and check out traffic levels for the sites I regularly work on. I typically just look at traffic changes over the last few weeks to identify problems. It takes deeper analysis to actually affect the performance of a site, but traffic is a great diagnostic metric. At the end of January, after a week-long hiatus, I saw this:
Figure 1: Traffic Drops Precipitously
Some more digging revealed this wonderful graph showing a spike in 404 errors.
Figure 2: 404 Errors Spike
To confirm the issue I logged into Google's Webmaster Tools. Sure enough, about 75% of the URLs in my XML sitemap were flagged as having generated a 404. The remaining 25% probably weren't crawled during the week before I found the problem.
What's an SEO to do when a site gets de-indexed because of accidentally issued 404 errors? Obviously the first order of business is to fix the problem. Turns out there's a funky bug in WordPress that took me about 4 hours to uncover. The much harder task still lay ahead of me: how do I get Google to re-index my site? I had 5 ideas immediately pop into my head:
- Submit an updated XML sitemap with every date stamp set to the current date (see the sketch after this list). Done.
- Create some new content with links to the de-indexed content knowing that the FeedBurner pinging service will bring Google in quickly. Done.
- Take advantage of the $1 SEOmoz Pro subscription I was evaluating and ask the pros there for their opinion. Hey, my head is not so big that I won't seek ideas from other SEOs. The response I got was that I need to be patient and let things play out. Probably the right answer, but I was hoping there was something I could actually do.
- Crawl my site with a browser from a few locations and hope the AdSense ads trigger a visit from Google. A few servers and the iMacros plugin took care of this a few times a day for a few days. Note: I wasn't clicking ads, just crawling my own site. Seemed like an OK thing to do.
- Redirect some indexed URLs to the de-indexed URLs and hope the content is re-indexed. Seemed a little shady so I passed on this one.
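For the curious, the sitemap refresh in the first idea amounts to bumping every lastmod value before resubmitting. Here's a rough Python sketch of how you might script it; the file names are placeholders, not what my site actually uses:

```python
# Rough sketch of idea #1: set every <lastmod> in an existing sitemap
# to today's date before resubmitting it. File names are placeholders.
import datetime
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
ET.register_namespace("", SITEMAP_NS)  # keep the default namespace on output

tree = ET.parse("sitemap.xml")
today = datetime.date.today().isoformat()

for url in tree.getroot().findall(f"{{{SITEMAP_NS}}}url"):
    lastmod = url.find(f"{{{SITEMAP_NS}}}lastmod")
    if lastmod is None:
        # Appends at the end of the <url> entry; fine for a quick refresh
        lastmod = ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod")
    lastmod.text = today

tree.write("sitemap-updated.xml", xml_declaration=True, encoding="UTF-8")
```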
Google was crazy slow with re-indexing my site. Every week I saw single-digit percentage increases in organic traffic. It wasn't until 4 weeks later that I knew my site was back in business: traffic more than doubled from one day to the next, and inclusion in Google's index returned to nearly 100%.
Figure 3: Traffic Climbs
So while I'm pleased to have recovered, this little excursion has cost me about 5 weeks of revenue. Ugh. If only Google Analytics had alert functionality I likely would've caught this much earlier.
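In the meantime, a poor man's alert isn't hard to rig up: a small script run on a schedule that checks a handful of important URLs and complains when any of them stop returning 200. A minimal sketch, with a placeholder URL list:

```python
# Bare-bones "404 alert": check a few key URLs on a schedule
# (cron, Task Scheduler, etc.) and complain if any stop returning 200.
# The URL list is a placeholder for your own pages.
import urllib.request
from urllib.error import HTTPError, URLError

URLS_TO_WATCH = [
    "https://www.example.com/",
    "https://www.example.com/an-important-post/",
]

def check(url):
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except HTTPError as err:        # 4xx / 5xx responses land here
        return err.code
    except URLError as err:
        print(f"ALERT: {url} unreachable ({err.reason})")
        return None

for url in URLS_TO_WATCH:
    status = check(url)
    if status is not None and status != 200:
        print(f"ALERT: {url} returned {status}")
```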
Marios, can you give some insight into the WordPress bug you found? I'm having a similar problem with 404 crawl errors showing up for pages that exist and load fine in my browser. My site is a combination of WordPress pages and static PHP pages. The 404 errors are occurring with most of my static pages.
Mike,
It doesn't sound like your problem is the same as mine. I wasn't able to access my pages with a browser. My problem had to do with the permalink configuration being dropped by WordPress whenever I published a post, so resetting the permalinks fixed it.
Thanks for the reply Marios. I looked a bit deeper into my 404 errors and I think I may have found the source of the problem. I did an HTTP header check on the problem pages and found that the Bad Behavior WordPress plugin was returning a 403 error for these pages. I deactivated Bad Behavior and voilà, no more 403 error in the headers. Now I'll resubmit my sitemap and wait for Google to recrawl my site. Fingers crossed.
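For anyone who wants to run the same kind of check themselves, a status and header lookup boils down to something like this (just a sketch with a placeholder URL; an online header checker works too):

```python
# Rough sketch of a status/header check for a page a crawler flags as 404/403.
# The URL below is a placeholder.
import urllib.request
from urllib.error import HTTPError

url = "https://www.example.com/a-flagged-page/"
request = urllib.request.Request(url, method="HEAD")

try:
    with urllib.request.urlopen(request, timeout=10) as response:
        status = response.status
        headers = dict(response.getheaders())
except HTTPError as err:            # a 403 or 404 ends up here
    status = err.code
    headers = dict(err.headers)

print("Status:", status)
for name in ("Server", "X-Powered-By", "Location"):
    if name in headers:
        print(f"{name}: {headers[name]}")
```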
I am experiencing this problem right now. I started a new blog around March and had a little over 1,000 posts by the beginning of September. The server where the blog was hosted was hacked and a little over 1,800 blogs were lost. My backup was also on that server. When I check Google Webmaster Tools I have 515 broken links. Since there is no way to get back the exact content, I am slowly doing 301 redirects. Right now Google still crawls my site but it is not indexing it.
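If you have a mapping of old URLs to new ones, you don't have to write the 301s by hand one at a time. A sketch like this would spit out Apache rules you can paste into .htaccess (it assumes an Apache host, and redirects.csv is a hypothetical old-path,new-url mapping file):

```python
# Sketch: turn a CSV of old-path,new-url pairs into Apache mod_alias
# "Redirect 301" rules for .htaccess. Assumes an Apache host;
# "redirects.csv" is a hypothetical mapping file.
import csv

with open("redirects.csv", newline="") as mapping:
    for old_path, new_url in csv.reader(mapping):
        print(f"Redirect 301 {old_path} {new_url}")
```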
This was a really interesting read: I could almost feel your pain. ;)
It's fortunate that you were able to sort this problem out for yourself; I know that many of my clients would have struggled with this and would have been jamming the phone line for an instant solution.
It was interesting to see SEOmoz's answer, 'be patient and let things play out,' because it supports what I've been saying for a long time now: some people expect results instantly, and SEO isn't always like that.
Thanks for posting.
Karl
Don't get me wrong, I wanted an instant solution too! :-) Still, I'm glad Google's system is robust enough to re-include pages and to include them in such a way that their link history seems to remain intact. I think this strongly supports the idea of identifying bad inbound links and redirecting them somewhere useful.
I usually recommend that clients create a custom 404 error page in case something like this happens. On the 404 page I usually include a link to the website's home page and to any other important pages. I also include an option to contact the webmaster and report the problem, which helps with the alerting issue so you can react quickly. I was recently forwarded a link to the Web Analytics World blog, which talks about adding 404s to your SEO strategy and also has a list of funny custom 404 pages: http://www.webanalyticsworld.net/category/custom-404-error-pages.
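None of this is WordPress-specific. Just to make the idea concrete, here's what such a handler might look like in a small Python/Flask app; on a WordPress site the same content would go into the theme's 404 template. The links are placeholders, and the handler keeps returning a 404 status so search engines aren't told the missing page exists:

```python
# Illustrative custom 404 handler in Flask. Links and paths are placeholders;
# a WordPress site would put this markup in its 404 template instead.
from flask import Flask

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    html = """
    <h1>Sorry, that page seems to be missing.</h1>
    <p>Try the <a href="/">home page</a> or one of these:</p>
    <ul>
      <li><a href="/popular-post/">A popular post</a></li>
      <li><a href="/contact/">Contact the webmaster</a></li>
    </ul>
    """
    return html, 404  # still report 404 so crawlers don't index a soft error

if __name__ == "__main__":
    app.run()
```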