The perfect 404 page

A 404 error page is displayed when someone requests a URL on your site that doesn’t exist, for example because another site links to a page that isn’t available.

There can be different reasons why a URL isn’t valid: maybe you restructured your site with a new URL structure, maybe somebody made a typo in a link or in the browser’s address bar, or maybe some of your content has been removed.

There are different philosophies about 404 pages. Some people just redirect users to the site’s front page. However, if the user expects to see a very specific page on your site, being redirected to the front page isn’t a very good user experience.

In this post we’ll build a 404 page in PHP that tries to guide the user as close as possible to the most relevant content on your site.

Functions of the 404 page

The 404 page that we’re building should have the following features:

  • Make it easy for the user to understand that we couldn’t find the requested page
  • Briefly introduce what the site is about
  • Have a clear link to the front page
  • Have a search function that searches your site
  • Automatically search for related content on your site if the user came from a search engine

The script will be a “raw page” without any particular styling. This makes the code cleaner, and makes it easier for you to adapt the script to your normal layout on your site.

Enabling a 404 page

In this post we’ll assume that you’re hosting your PHP site on an Apache web server. If you don’t have a 404 page already, the first thing to do is open your “.htaccess” file and add the following line:


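ErrorDocument 404 /custom_page.php

Keep it all on one line, with a space on each side of the number 404. Use a local path that starts with “/” rather than a full “http://” URL. If you give Apache a full URL, it sends the visitor a redirect to the error page, so the browser never receives the 404 status code and the script can no longer see which URL was originally requested. The directive name ErrorDocument isn’t case-sensitive, but on most servers the file path is, so make sure it matches the name of your file exactly.

This line tells your web server to display “/custom_page.php” instead of the standard 404 page whenever a requested page can’t be found.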

[Image: the standard 404 error page from the web server]

As you can see from the image above, the standard 404 page isn’t very informative, and it can easily scare the user away.

The basic text

There are many funny 404 pages on the web. The example below is from www.southparkstudios.com, and even though it’s funny, it doesn’t tell the user much about what went wrong, what kind of site “www.southparkstudios.com” is, or how to proceed.

[Image: funny 404 error page from www.southparkstudios.com]

So the first text on our example page is a paragraph that tells the user what went wrong:


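<h1>Sorry, we could not find the page</h1>
The requested page "<?php echo htmlspecialchars($pageURL); ?>" can't be found on our site.<br/><br/>

$pageURL is a PHP variable that holds the URL of the requested page. It is defined at the top of the complete script later in this post. Because the URL comes straight from the visitor’s request, it is passed through htmlspecialchars() before being printed, so it can’t inject markup into the page. If you’d like to log which requests return a 404 page, you could save $pageURL to a text file or database. However, that isn’t part of this script.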
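If you do want a simple log, a minimal sketch could append one line per 404 to a text file. log_404() and the log path are hypothetical names. The type declarations and the ?? operator assume PHP 7 or later. It’s best to keep the log file outside your public web root.

```php
<?php
// Hypothetical helper: append one line per 404 request to a plain-text log file.
function log_404(string $logFile, string $pageURL, string $referrer): bool
{
    // Tab-separated: timestamp, requested URL, referring URL (may be empty)
    $line = date('c') . "\t" . $pageURL . "\t" . $referrer . PHP_EOL;
    // FILE_APPEND adds to the end of the file; LOCK_EX keeps concurrent requests from interleaving
    return file_put_contents($logFile, $line, FILE_APPEND | LOCK_EX) !== false;
}

// Usage inside the 404 page, assuming $pageURL has already been set
// (the log path is an assumption; pick one your web server can write to):
// log_404('/var/log/mysite/404.log', $pageURL, $_SERVER['HTTP_REFERER'] ?? '');
```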

Next, display a short text about your site and an easy-to-find link to your front page:

Tips4php is a blog with tips for webmasters and website owners about php, seo, wordpress and many other interesting topics.<br/>
<a href="http://tips4php.net">Click here to go to our front page</a> and see our latest posts.<br/><br/>

That’s more or less the basic part, now the fun begins :-)

Google RESTful Search API

If the user isn’t interested in going to your front page but wants to find specific content on your site, a search form is very important on your 404 page. If all your content is in a database, you can build your own search engine. In this post, however, we’ll use the Google Search API. Its advantage is that it works well out of the box, as long as your site is properly indexed by Google.

Normally the Google Search API is implemented as an AJAX component, which is easy to get up and running but can be difficult to customize. There is also a version for environments without JavaScript (a RESTful interface). It returns search results as a JSON object that is easy to customize, so we’ll base our search on that version of the Google Search API. Using the RESTful interface requires the following:

  • The application needs to follow the normal Search API guidelines
  • An application key is not required, but appreciated
  • The application MUST always include a valid and accurate HTTP Referer header in its requests

The search function should only search your content, not the entire web. The RESTful Search API can’t limit results to specific sites on its own, but it supports Custom Search Engines, which can be restricted to specific sites. Therefore, the first thing to do is create a Google Custom Search Engine, which you can do for free.

The important part of the setup is where you restrict the search engine to your own site. Do this by selecting “Only sites I select” under “What do you want to search?”, and then enter the URL of your site in the next input field.

[Image: how to set up a Google Custom Search Engine for your site]

When you have completed the setup, you’ll get an ID for your custom search engine. Save the ID, since you’ll need it in the RESTful API setup.

Setting up the search

Requesting the Search API requires an HTTP request made with PHP’s cURL functions. In the URL that cURL calls, you need to insert the following variables:

  • v (API version). Always “1.0”
  • lr (language). In this example “lang_en” (English)
  • rsz (result size). In this example “small”, which means up to 4 results
  • filter (whether to filter duplicate results). In this example “0”, no filtering
  • q (keyword). This has to be URL-encoded
  • cx (Google Custom Search Engine ID). Use the ID of the search engine you created earlier in this post.

Finally, you need to send the URL of the current page as the Referer header of the cURL request.

// URL-encode the keyword and build the request URL
$keyword = urlencode($keyword);
$url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=small&lr=lang_en&filter=0&q=$keyword&cx=$google_custom_search_key";
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the response instead of printing it
curl_setopt($ch, CURLOPT_REFERER, $pageURL); // the API requires an accurate Referer header
$body = curl_exec($ch); // false if the request failed
curl_close($ch);
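Instead of concatenating the request URL by hand, you can pass the parameters listed above to PHP’s built-in http_build_query(), which URL-encodes each value for you. This is a sketch: build_search_url() is a hypothetical helper, and the Custom Search Engine ID in the usage line is a made-up placeholder.

```php
<?php
// Hypothetical helper: assemble the AJAX Search API request URL from the parameters above.
// http_build_query() URL-encodes every value, so the keyword needs no separate urlencode().
function build_search_url(string $keyword, string $cseId): string
{
    $params = [
        'v'      => '1.0',
        'rsz'    => 'small',   // up to 4 results
        'lr'     => 'lang_en', // English results only
        'filter' => '0',       // no duplicate filtering
        'q'      => $keyword,
        'cx'     => $cseId,    // your Custom Search Engine ID
    ];
    return 'http://ajax.googleapis.com/ajax/services/search/web?' . http_build_query($params);
}

echo build_search_url('wordpress seo', '012345:abcde'), PHP_EOL;
// prints http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=small&lr=lang_en&filter=0&q=wordpress+seo&cx=012345%3Aabcde
```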

Handling the response

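The RESTful Search API returns the result as a JSON object. To decode it, we’ll use the PHP function json_decode() and then loop through the results:

$json = json_decode($body);
// responseData is null when the API returns an error, so check it before looping
if ($json && isset($json->responseData->results)) {
  foreach ($json->responseData->results as $node) {
    // The API returns UTF-8, so serve the page as UTF-8 instead of calling utf8_decode() (deprecated in PHP 8.2)
    $name = htmlspecialchars($node->titleNoFormatting, ENT_QUOTES, 'UTF-8', false);
    $url = htmlspecialchars($node->url, ENT_QUOTES, 'UTF-8', false);
    $desc = $node->content; // snippet HTML from Google, may contain <b> keyword highlighting
    echo "<b><a href=\"$url\">$name</a></b><br/>$desc<br/><a href=\"$url\">($url)</a><br/><br/>";
  }
}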
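Here is a sketch that wraps the rendering in a reusable function, so you can test it without calling Google. render_results() is a hypothetical name. It reads the same fields as the loop above (titleNoFormatting, url and content), and the sample response is made up. As a design choice, it strips the tags from the snippet and escapes everything it prints. That drops Google’s bold keyword highlighting, but the output can’t inject markup.

```php
<?php
// Sketch: turn an AJAX Search API response body into HTML, escaping every printed value.
function render_results(string $body): string
{
    $json = json_decode($body, true);
    // responseData is null when the API reports an error, so bail out instead of looping over null
    if (!isset($json['responseData']['results']) || !is_array($json['responseData']['results'])) {
        return "No results found.<br/>";
    }
    $html = '';
    foreach ($json['responseData']['results'] as $node) {
        // Escape for HTML, but don't double-encode entities that are already in the text
        $name = htmlspecialchars($node['titleNoFormatting'] ?? '', ENT_QUOTES, 'UTF-8', false);
        $url  = htmlspecialchars($node['url'] ?? '', ENT_QUOTES, 'UTF-8', false);
        // The snippet may contain <b> highlighting; strip the tags so only plain text is output
        $desc = htmlspecialchars(strip_tags($node['content'] ?? ''), ENT_QUOTES, 'UTF-8', false);
        $html .= "<b><a href=\"$url\">$name</a></b><br/>$desc<br/><a href=\"$url\">($url)</a><br/><br/>";
    }
    return $html;
}

// Made-up sample response in the same shape as the API's JSON
$sample = '{"responseData":{"results":[{"titleNoFormatting":"SEO & WordPress","url":"http://example.com/seo","content":"Tips for <b>wordpress seo</b>"}]},"responseStatus":200}';
echo render_results($sample), PHP_EOL;
```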

Now you have the basic components of a Google-powered search function for your 404 page.
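Google has since shut down the AJAX Search API used in this post, so its URL no longer returns search results. A comparable request today would go to Google’s Custom Search JSON API (https://www.googleapis.com/customsearch/v1). It takes an API key (key), your search engine ID (cx) and the query (q), and returns matches in an items array with title, link and snippet fields. The sketch below builds that URL and parses a response. cse_url(), cse_results() and the sample data are hypothetical. Google’s access terms for this API have changed over time, so check its current documentation before relying on it.

```php
<?php
// Sketch against Google's Custom Search JSON API (not the AJAX Search API used in this post).
// $apiKey and $cx are placeholders for your own API key and search engine ID.
function cse_url(string $apiKey, string $cx, string $keyword): string
{
    return 'https://www.googleapis.com/customsearch/v1?' . http_build_query([
        'key' => $apiKey,
        'cx'  => $cx,
        'q'   => $keyword,
        'num' => 4,         // comparable to rsz=small
        'lr'  => 'lang_en', // English results only
    ]);
}

// Returns a list of [title, link, snippet] arrays, or an empty list if the body has no items
function cse_results(string $body): array
{
    $json = json_decode($body, true);
    $results = [];
    foreach ($json['items'] ?? [] as $item) {
        $results[] = [$item['title'] ?? '', $item['link'] ?? '', $item['snippet'] ?? ''];
    }
    return $results;
}

// Fetching works as before, for example with cURL:
//   $ch = curl_init(cse_url($apiKey, $cx, $keyword));
//   curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
//   $body = curl_exec($ch);
//   curl_close($ch);
$sample = '{"items":[{"title":"WordPress SEO","link":"http://example.com/seo","snippet":"Tips..."}]}';
print_r(cse_results($sample));
```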

Automating the search

To make the 404 page as intelligent as possible, let it suggest relevant content on your site whenever it can. You can build this feature by analyzing the URL of the previous page (the referrer).

If the last page was a Google search result for “wordpress seo”, the URL of the referring page would be:

http://www.google.com/search?q=wordpress+seo&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:da:official&client=firefox-a

By looking at this URL, you can see that the keywords are included in the “q” parameter. With this knowledge, you can easily extract the keywords from the URL:

$referrer = isset($_SERVER['HTTP_REFERER']) ? $_SERVER['HTTP_REFERER'] : ''; // not every request has a referrer
$url_array = parse_url($referrer);
$domain = isset($url_array['host']) ? $url_array['host'] : '';
$param = isset($url_array['query']) ? $url_array['query'] : '';
parse_str($param, $params); // store the parameters in $params (the one-argument form was removed in PHP 8.0)
// eregi() was removed in PHP 7.0; preg_match() with the "i" flag is the case-insensitive replacement
if (preg_match('/google\./i', $domain) && isset($params['q'])) {
  $keyword = $params['q'];
}

First, the script takes the referring URL and splits it into the variables $domain and $param using the PHP function parse_url().

Next, parse_str() splits $param into the individual parameters of the referring URL.

Last, we set up a rule for extracting keywords from Google. The script looks for “google.” in the domain. If it’s found, the keywords are taken from the “q” parameter (remember the “…q=wordpress+seo” part of the URL).

That’s it. We now have a piece of code that can extract keywords from Google searches. These keywords become the default value of the 404 search, so the 404 page can automatically run a search when the referring page is Google.

You can easily add automatic keyword extraction for other search engines. A result list URL for “wordpress seo” on Yahoo has the following structure:

http://search.yahoo.com/search;_ylt=AjdQpjqumy.B0uv1DrluXaubvZx4?p=wordpress+seo&toggle=1&cop=mss&ei=UTF-8&fr=yfp-t-701

As you can see from the URL, the keywords are in the parameter “p”. To extend the referrer keyword extraction to cover Yahoo, add the following lines:

if (preg_match('/yahoo\./i', $domain) && isset($params['p'])) {
  $keyword = $params['p'];
}

So adding keyword extraction for more search engines is very easy.
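Once you support more than a few search engines, you can replace the if/else chain with a lookup table of host patterns and parameter names. This is a sketch, not the script’s own code. keyword_from_referrer() is a hypothetical helper, and the parameter names come from the examples in this post (Google and Bing use q, Yahoo uses p). Its patterns only match “google.”, “yahoo.” or “bing.” at the start of the host name or right after a dot, so a host such as notgoogle.com is not treated as Google.

```php
<?php
// Sketch: referrer keyword extraction driven by a table of host patterns => query parameter names.
function keyword_from_referrer(string $referrer): string
{
    $rules = [
        '/(^|\.)google\./i' => 'q',
        '/(^|\.)yahoo\./i'  => 'p',
        '/(^|\.)bing\./i'   => 'q',
    ];
    $host  = parse_url($referrer, PHP_URL_HOST);
    $query = parse_url($referrer, PHP_URL_QUERY);
    if (!is_string($host) || !is_string($query)) {
        return ''; // no host or no query string, so there is nothing to extract
    }
    parse_str($query, $params); // decodes "wordpress+seo" to "wordpress seo"
    foreach ($rules as $pattern => $name) {
        if (preg_match($pattern, $host) && isset($params[$name]) && is_string($params[$name])) {
            return trim($params[$name]);
        }
    }
    return '';
}

echo keyword_from_referrer('http://www.google.com/search?q=wordpress+seo&ie=utf-8'), PHP_EOL; // wordpress seo
echo keyword_from_referrer('http://search.yahoo.com/search;_ylt=AjdQpjqumy.B0uv1DrluXaubvZx4?p=wordpress+seo&toggle=1'), PHP_EOL; // wordpress seo
```

Adding another search engine then means adding one more line to $rules.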

The entire script looks like this:

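<?php
$ip = getenv("REMOTE_ADDR");
$requri = getenv("REQUEST_URI");
$site_url = 'insert your site url';
$site_name = 'insert your site name';
$google_custom_search_key = 'insert your Google Custom Search Engine ID';
$scheme = (!empty($_SERVER['HTTPS']) && $_SERVER['HTTPS'] !== 'off') ? 'https://' : 'http://';
$pageURL = $scheme . $_SERVER["SERVER_NAME"] . $_SERVER["REQUEST_URI"];

// Send a real 404 status code so browsers and search engines know the page doesn't exist (PHP 5.4+)
http_response_code(404);

// A keyword typed into our own search form takes priority
$keyword = (isset($_GET['keyword']) && is_string($_GET['keyword'])) ? trim($_GET['keyword']) : '';

// Otherwise, see if keywords from a search engine are available in the referrer
if ($keyword === '' && !empty($_SERVER['HTTP_REFERER'])) {
  $url_array = parse_url($_SERVER['HTTP_REFERER']);
  $domain = isset($url_array['host']) ? $url_array['host'] : '';
  $params = array();
  if (isset($url_array['query'])) { parse_str($url_array['query'], $params); }
  if (preg_match('/google\./i', $domain) && isset($params['q']))        { $keyword = $params['q']; }
  else if (preg_match('/yahoo\./i', $domain) && isset($params['p']))    { $keyword = $params['p']; }
  else if (preg_match('/bing\./i', $domain) && isset($params['q']))     { $keyword = $params['q']; }
  else if (preg_match('/tips4php\./i', $domain) && isset($params['q'])) { $keyword = $params['q']; }
}
?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
  <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  <title>Error 404 - the page doesn't exist</title>
</head>
<body>
  <h1>Sorry, we could not find the page</h1>
  The requested page "<?php echo htmlspecialchars($pageURL); ?>" can't be found on our site.<br/><br/>
  Tips4php is a blog with tips for webmasters and website owners about php, seo, wordpress and many other interesting topics.<br/>
  <a href="http://tips4php.net">Click here to go to our front page</a> and see our latest posts.<br/><br/>
  <hr>
  <form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="get">
    <b>Try searching our site:</b>
    <input name="keyword" type="text" value="<?php echo htmlspecialchars($keyword); ?>"/><input type="submit" value="Search"/><br/><br/>
    <?php
    if ($keyword !== '') {
      echo "<b>Search results for \"" . htmlspecialchars($keyword) . "\"</b>:<br/>";
      $url = "http://ajax.googleapis.com/ajax/services/search/web?v=1.0&rsz=small&lr=lang_en&filter=0&q="
           . urlencode($keyword) . "&cx=" . urlencode($google_custom_search_key);
      $ch = curl_init();
      curl_setopt($ch, CURLOPT_URL, $url);
      curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
      curl_setopt($ch, CURLOPT_REFERER, $pageURL);
      $body = curl_exec($ch);
      curl_close($ch);

      // responseData is null when the API returns an error, so check it before looping
      $json = ($body !== false) ? json_decode($body) : null;
      if ($json && isset($json->responseData->results)) {
        foreach ($json->responseData->results as $node) {
          $name = htmlspecialchars($node->titleNoFormatting, ENT_QUOTES, 'UTF-8', false);
          $link = htmlspecialchars($node->url, ENT_QUOTES, 'UTF-8', false);
          $desc = $node->content; // snippet HTML from Google, may contain <b> keyword highlighting
          echo "<b><a href=\"$link\">$name</a></b><br/>$desc<br/><a href=\"$link\">($link)</a><br/><br/>";
        }
      } else {
        echo "Sorry, the search is not available right now.<br/>";
      }
    }
    ?>
  </form>
</body>
</html>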

Testing the script

Testing the automatic keyword insertion can be a little tricky, since the “magic” is based on parameters in the referring url.

My recommendation is to make a test page (e.g. 404test.php) that links to your new 404 page, or to any URL on your site that doesn’t exist. Then open 404test.php with the same parameters as the search engine you want to test.

If you want to test automatic keyword insertion from Google, use the following address:

404test.php?q=wordpress+seo

Last but not least, add your own site to the referrer rules. In the script above, “tips4php.” is included so that clicks from the test page are handled like clicks from a Google result.
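A minimal sketch of such a test page follows. The file name 404test.php comes from the text above. test_link_html() and the missing path are hypothetical, and any URL on your site that doesn’t exist will do. Because the link stays on your own site, browsers normally send the full URL of the test page (including ?q=…) as the Referer header, unless a Referrer-Policy on your site shortens it.

```php
<?php
// 404test.php (sketch): open it as 404test.php?q=wordpress+seo and click the link.
// The 404 page then reads ?q=... from the Referer header via its rule for your own domain.
function test_link_html(string $missingPath): string
{
    $q = (isset($_GET['q']) && is_string($_GET['q'])) ? $_GET['q'] : '(no q parameter)';
    return '<p>Simulated search: ' . htmlspecialchars($q) . '</p>'
         . '<a href="' . htmlspecialchars($missingPath) . '">Go to a missing page</a>';
}

echo test_link_html('/this-page-does-not-exist'), PHP_EOL;
```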

When you click the link on 404test.php, you should land on your new 404 page, and an automatic search is run for the keywords “wordpress seo”.

Final comments

As mentioned at the start of this post, the 404 script intentionally displays a very “raw” page without any styling, so you need to adapt it to the look and feel of your site before using it.

You might also add more search engines, so that automatic keyword insertion is supported for more of them.

About the author
Jørgen Nicolaisen has been passionately interested in everything online since 1995. His experience is based on working with small hobby projects as well as high-volume websites. Jørgen is currently focused on the PHP-based programming framework CodeIgniter and, naturally, WordPress.

5 Replies to The perfect 404 page

  1. Pretty nice post. I just stumbled upon your blog and wanted to say that I have really enjoyed browsing your blog posts. In any case I’ll be subscribing to your feed and I hope you write again soon!

  2. js says:

    Pretty cool
    It helped me alot thanks!

  3. Joseph says:

    Great script! easy to use! I do have one question though. After I input an initial search keyword and the results are displayed. Let’s say I didn’t find what i was looking for. If I try to input another keyword the search results stays the same. I noticed the URL changes to reflect the new keyword though. If I press submit again it will display the new results whether I input them again or not. Would be great to perform more than one search.

  4. Great job. It’s fun and helpfull. This script can hold visitors to the site.
