Web Site Link Checker

This PHP transaction spiders the local web site and checks every external reference found in the HTML markup, where the relevant robots.txt file permits it to do so. Files protected by robots.txt are never accessed.
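
The robots.txt handling might look something like the following sketch. The function name and the parsing rules here are illustrative only (it handles just `User-agent: *` groups with prefix-matched `Disallow` rules); the transaction's actual parser may be more complete:

```php
<?php
// Illustrative sketch: decide whether a path may be fetched, given the raw
// text of a site's robots.txt file. Only "User-agent: *" groups and simple
// prefix "Disallow:" rules are handled here.
function robots_allows(string $robotsTxt, string $path): bool
{
    $applies = false;  // are we inside a "User-agent: *" group?
    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*/', '', $line));  // strip comments
        if ($line === '') {
            continue;
        }
        if (preg_match('/^User-agent:\s*(.*)$/i', $line, $m)) {
            $applies = (trim($m[1]) === '*');
        } elseif ($applies && preg_match('/^Disallow:\s*(.*)$/i', $line, $m)) {
            $rule = trim($m[1]);
            // An empty Disallow permits everything; otherwise a matching
            // prefix blocks the path.
            if ($rule !== '' && strpos($path, $rule) === 0) {
                return false;
            }
        }
    }
    return true;  // no matching rule: fetching is allowed
}
```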

It displays the following four lists:

  1. A list of those links which result in an HTTP error being returned (normally 404 Not Found)
  2. A list of those links which result in HTTP 301 (Moved Permanently)
  3. A list of URLs not checked due to robots.txt rules
  4. A list of all URLs checked, together with the HTTP status code returned

The transaction uses the PHP cURL extension and processes each URL synchronously, so it can take a couple of minutes to run on larger sites.
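
A single check might be done along these lines (the function name is hypothetical, but the cURL options are standard). A HEAD request is enough to obtain the status code, and redirects are deliberately not followed so that a 301 can be reported rather than silently resolved:

```php
<?php
// Illustrative sketch of checking one URL with the cURL extension.
function check_url_status(string $url): int
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,   // HEAD request: headers only
        CURLOPT_RETURNTRANSFER => true,   // don't echo the response body
        CURLOPT_FOLLOWLOCATION => false,  // surface 301s instead of chasing them
        CURLOPT_TIMEOUT        => 10,     // don't hang on a dead server
    ]);
    curl_exec($ch);
    $code = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);  // 0 if no response
    curl_close($ch);
    return $code;
}
```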

Two constants in the header need to be changed before running the transaction on your site. The first determines which file extensions should be spidered when building up the list of links; all other files (e.g. PDF files) will simply be checked for existence:

  define ("EXTENSIONS", "shtml,html,htm,php");
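
The spider/check decision based on this constant might be made roughly as follows (the helper name is illustrative; the assumption that an extensionless directory URL serves HTML is also mine):

```php
<?php
define("EXTENSIONS", "shtml,html,htm,php");

// Illustrative sketch: decide whether a discovered local URL should be
// spidered for further links, or merely checked for existence.
function should_spider(string $url): bool
{
    $path = parse_url($url, PHP_URL_PATH) ?? '/';
    $ext  = strtolower((string) pathinfo($path, PATHINFO_EXTENSION));
    // A bare directory URL such as "/about/" is assumed to serve HTML.
    return $ext === '' || in_array($ext, explode(',', EXTENSIONS), true);
}
```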

The second specifies the directory and filename of a temporary file used by the software. The transaction must be able to write to this file.

  define ("COOKIEJAR", "/temp/cookiejar.tmp");
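
Since the run fails if this file cannot be written, a quick writability check of the kind below may be useful before starting (the helper is illustrative, not part of the transaction). The constant is then typically passed to cURL via `CURLOPT_COOKIEJAR` (written on `curl_close()`) and `CURLOPT_COOKIEFILE` (read before each request) so that cookies persist between requests:

```php
<?php
// Illustrative sketch: verify the cookie jar location is usable before
// spidering starts.
function cookiejar_writable(string $file): bool
{
    // If the file already exists we need write access to the file itself;
    // otherwise we need write access to its directory in order to create it.
    return is_writable(file_exists($file) ? $file : dirname($file));
}
```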

Once these constants have been updated, upload the link checker to your main web directory and run it from your browser.

Download compressed PHP file (6,212 bytes)