I look through my Apache logs frequently, both to see how things are working and to see who and what is accessing our web sites, bots in particular. I encourage bots to read through the public areas, since that is how search engines find the content. You also want to ensure that they are looking where your content is, not where it used to be or never was. Unfortunately, it is also how thieves, hackers, and spammers look for ways to steal, corrupt, or bloat your content. You want to smack their hands firmly and lock any open doors. For the most part, Apache, WordPress, and WordPress plugins, alone and in combination, take care of these kinds of issues. For some, though, you must add a few additional tricks.
I have been lax in cleaning up the remnants of former directory strategies for sections of my web site. I decide to move, divide, or remove a blog or section, and then forget to set up new paths to the same or similar content, or at least to give the reader or bot a way to find where it should be going. If you do not take care of this, you find line after line of 404 errors (document not found) in your logs. Each one means an error page or a broken image was delivered instead of the intended content. In either case, you run the risk that the reader or bot will give up, leave, and never come back, which is something you do not want to happen.
The easiest solution is to put in a redirect to an error document. This is simply a single line in the .htaccess file in the root web server directory:
ErrorDocument 404 /your-404-error-page.html
The page you specify can be located anywhere on your web server. It can be a web page or a script. Personally, I use a web page with server-side includes so that I can invoke a notification routine. That routine tells me where the reader was coming from and where they were trying to go. This gives me data to investigate whether there is a problem in my design, and lets me tell the owner of an off-site page that is linking incorrectly what the correct link should be. Here is my 404 page. As you can see, it gives feedback to the visitor and offers some options as to where to go next. Note: WordPress has a built-in facility to handle 404s, and most themes include a 404 page template that you can modify to your liking.
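As a rough illustration of the kind of notification routine described above, here is a minimal PHP sketch. It is not the script I actually run: the function name describe_missing_page is made up for this example, and it assumes the error page is served by PHP and that Apache leaves the originally requested address in REQUEST_URI when it hands the request to the ErrorDocument.

```php
<?php
// Hypothetical helper: builds one log line describing a 404 hit.
// $server is expected to look like PHP's $_SERVER array.
function describe_missing_page(array $server): string
{
    // Assumption: under ErrorDocument, REQUEST_URI still holds the address the visitor asked for.
    $requested = $server['REQUEST_URI'] ?? '(unknown)';
    // The Referer header is optional and can be forged, so treat it as a hint only.
    $referrer = $server['HTTP_REFERER'] ?? '(none)';
    // Strip control characters so a crafted request cannot inject extra log lines.
    $clean = fn(string $s): string => preg_replace('/[\x00-\x1F\x7F]/', '', $s);
    return sprintf('404 requested=%s referrer=%s', $clean($requested), $clean($referrer));
}

// Record the hit in the PHP error log (where that ends up depends on the server's configuration).
error_log(describe_missing_page($_SERVER));
```

Writing to the error log keeps the sketch free of side effects such as e-mail; a real routine might send a message or store the hit in a database instead.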
There is a particular class of 404 that should be handled differently: missing RSS feeds from old or moved blogs. For these errors, you should either update the feed service if you have registered your RSS feed with one (example: Google Feedburner) or tell the visitor or bot where the feed is now located.
For example, let’s say that you had a WordPress blog at your.domain/blog/. By default, its feed would be located at your.domain/blog/feed/. Today, you physically moved that blog to your.domain/newblog/, with a corresponding feed at your.domain/newblog/feed/. Anyone who still refers to the old addresses will get 404 errors. This can be remedied by keeping the directories /blog/ and /blog/feed/ and populating each of them with a single PHP file, called index.php, containing the following:
/blog/index.php
<?php
// Send a permanent (301) redirect to the blog's new home.
header( "Location: http://your.domain/newblog/", true, 301 );
exit; // Stop here so nothing else is sent after the redirect.
?>
/blog/feed/index.php
<?php
// Send a permanent (301) redirect to the feed's new address.
header( "Location: http://your.domain/newblog/feed/", true, 301 );
exit; // Stop here so nothing else is sent after the redirect.
?>
When a visitor or bot requests either of the old addresses, the matching file performs the necessary action: it directs them to the new location.
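Once the two index.php files are in place, it is worth confirming that the old addresses really answer with a 301 and the right Location. The following is only a sketch of one way to check: is_permanent_redirect_to is a hypothetical helper name, and it assumes you feed it the raw header lines in the form PHP's get_headers() returns them.

```php
<?php
// Hypothetical helper: checks that a list of raw response header lines
// starts with a 301 status and points at the expected new address.
function is_permanent_redirect_to(array $headerLines, string $expectedLocation): bool
{
    // The first line is the status line, e.g. "HTTP/1.1 301 Moved Permanently".
    if (!preg_match('#^HTTP/\S+\s+301\b#', $headerLines[0] ?? '')) {
        return false;
    }
    foreach ($headerLines as $line) {
        // Header names are case-insensitive, so match "Location:" loosely.
        if (preg_match('/^Location:\s*(.+)$/i', $line, $m)) {
            // Only the first Location header (from the first response) is compared.
            return trim($m[1]) === $expectedLocation;
        }
    }
    return false;
}

// Usage sketch (needs network access and the placeholder domain filled in):
// $lines = get_headers('http://your.domain/blog/feed/');
// var_dump($lines !== false
//     && is_permanent_redirect_to($lines, 'http://your.domain/newblog/feed/'));
```

Keeping the check separate from the network call means it can be tried against a hand-written list of header lines before pointing it at a live site.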
I will address the issue of spammers and image thieves in a future article, as well as give some alternatives to this method of redirection.