The major search engines can now base their indexing on XML site maps. For further information see sitemaps.org.
In a site map, each page of the web site requires its own <url> entry inside the <urlset> element, in the format:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.braemoor.co.uk/software/sitemapper.shtml</loc>
    <lastmod>2009-04-01</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
The <lastmod>, <changefreq>, and <priority> elements are all optional.
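As an illustration of the format above, the following is a minimal sketch of how such a file could be generated with PHP's XMLWriter extension. The function name buildSitemap and the array keys are assumptions made for this example and are not taken from the software itself. Only <loc> is always written; each optional element appears only when a value is supplied.

```php
<?php
// Hypothetical helper (not part of the download): builds a sitemap string.
// Each page is an array with a mandatory 'loc' key and optional
// 'lastmod', 'changefreq', and 'priority' keys.
function buildSitemap(array $pages): string
{
    $xml = new XMLWriter();
    $xml->openMemory();
    $xml->setIndent(true);
    $xml->startDocument('1.0', 'UTF-8');
    $xml->startElement('urlset');
    $xml->writeAttribute('xmlns', 'http://www.sitemaps.org/schemas/sitemap/0.9');

    foreach ($pages as $page) {
        $xml->startElement('url');
        $xml->writeElement('loc', $page['loc']);
        // Write an optional element only when the caller provided a value.
        foreach (['lastmod', 'changefreq', 'priority'] as $optional) {
            if (isset($page[$optional])) {
                $xml->writeElement($optional, (string) $page[$optional]);
            }
        }
        $xml->endElement(); // closes <url>
    }

    $xml->endElement(); // closes <urlset>
    $xml->endDocument();
    return $xml->outputMemory();
}

echo buildSitemap([
    [
        'loc'        => 'http://www.braemoor.co.uk/software/sitemapper.shtml',
        'lastmod'    => '2009-04-01',
        'changefreq' => 'monthly',
        'priority'   => '0.8',
    ],
]);
```

XMLWriter escapes special characters such as & in URLs automatically, which is the main reason for preferring it over string concatenation here.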
There are a number of online facilities which create site maps for you, but they leave all the optional elements at their default values, so any other values have to be added manually. When you recreate the site map, any additions you have made are lost.
This facility also creates site maps, but uses a PHP transaction held on your own web site. It also understands three metatags, held in the head section of each web page, which may be used to specify the optional elements <changefreq>, <priority>, and <lastmod>:
<meta name="sitemap-priority" content="1.0" />
<meta name="sitemap-changefreq" content="monthly" />
<meta name="sitemap-lastmod" content="2008-12-31" />
These metatags take the same values as the corresponding elements in the XML format. This means that the optional values associated with a URL entry are kept within the source of the page itself, and the site map can be readily recreated.
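To make the idea concrete, here is a rough sketch of how the sitemap metatags could be read from a page using PHP's DOM extension. The function name readSitemapMeta is hypothetical, and this is not necessarily how the software itself parses pages; it simply collects every metatag whose name starts with "sitemap-".

```php
<?php
// Hypothetical helper (not part of the download): returns the sitemap-*
// metatags found in an HTML string, keyed by the part after "sitemap-".
function readSitemapMeta(string $html): array
{
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely perfectly formed, so collect parser
    // warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $values = [];
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        $name = strtolower($meta->getAttribute('name'));
        if (strpos($name, 'sitemap-') === 0) {
            // Strip the 8-character "sitemap-" prefix from the key.
            $values[substr($name, 8)] = trim($meta->getAttribute('content'));
        }
    }
    return $values;
}

$page = '<html><head><title>Example</title>'
      . '<meta name="sitemap-priority" content="1.0" />'
      . '<meta name="sitemap-changefreq" content="monthly" />'
      . '</head><body></body></html>';

print_r(readSitemapMeta($page));
```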
If the "sitemap-lastmod" metatag is missing, the date on which the associated file was last modified is used. This is normally accurate, but if the page is dynamic this date will not reflect when the data was last updated. This can be overcome by giving the "sitemap-lastmod" metatag an explicit date value such as "2009-01-31", or by giving it the value "default", in which case the <lastmod> element is omitted altogether from this URL's entry in the site map.
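The three cases described above can be sketched as a small function. The name resolveLastmod is an assumption for illustration, as is the use of UTC when formatting the file's modification time; the software's actual implementation may differ. Dates are written in the YYYY-MM-DD form that sitemaps.org accepts.

```php
<?php
// Hypothetical helper (not part of the download): decides what, if
// anything, goes into <lastmod> for one page.
//   $metaValue        content of the sitemap-lastmod metatag, or null if absent
//   $fileModifiedTime Unix timestamp, e.g. from filemtime()
// Returns a date string, or null when <lastmod> should be omitted.
function resolveLastmod(?string $metaValue, int $fileModifiedTime): ?string
{
    if ($metaValue === null || $metaValue === '') {
        // No metatag: fall back to the file's last-modified date (UTC assumed).
        return gmdate('Y-m-d', $fileModifiedTime);
    }
    if ($metaValue === 'default') {
        // Explicit request to leave <lastmod> out of the entry.
        return null;
    }
    // Otherwise trust the explicit date supplied by the page author.
    return $metaValue;
}

var_dump(resolveLastmod(null, filemtime(__FILE__)));   // file's own date
var_dump(resolveLastmod('default', filemtime(__FILE__))); // NULL: omit element
var_dump(resolveLastmod('2009-01-31', filemtime(__FILE__)));
```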
The PHP transaction spiders its way through the web site, taking into account the robots.txt file and any robots metatags, sorts the URLs into directory order, and generates the standard XML site map file. A diagnostics section is also output, listing the URLs which have not been spidered because of the robots.txt or metatag rules, and highlighting in red any inaccessible links.
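As a rough illustration of the robots.txt step, the sketch below collects the Disallow rules that apply to all user agents and tests a path against them by simple prefix matching. Both function names are hypothetical, and the sketch deliberately ignores Allow lines and wildcard patterns, so it should be read as a simplification rather than as the software's own logic.

```php
<?php
// Hypothetical helper (not part of the download): returns the Disallow
// path prefixes listed under "User-agent: *" in a robots.txt string.
function disallowedPrefixes(string $robotsTxt): array
{
    $prefixes = [];
    $applies = false;      // true while inside a group that covers "*"
    $lastWasAgent = false; // consecutive User-agent lines share one group

    foreach (preg_split('/\r\n|\r|\n/', $robotsTxt) as $line) {
        $line = trim(preg_replace('/#.*$/', '', $line)); // drop comments
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        [$field, $value] = array_map('trim', explode(':', $line, 2));
        $field = strtolower($field);

        if ($field === 'user-agent') {
            $matches = ($value === '*');
            $applies = $lastWasAgent ? ($applies || $matches) : $matches;
            $lastWasAgent = true;
            continue;
        }
        $lastWasAgent = false;
        if ($field === 'disallow' && $applies && $value !== '') {
            $prefixes[] = $value;
        }
    }
    return $prefixes;
}

// Hypothetical helper: a path is allowed unless it starts with a prefix.
function isAllowed(string $path, array $prefixes): bool
{
    foreach ($prefixes as $prefix) {
        if (strpos($path, $prefix) === 0) {
            return false;
        }
    }
    return true;
}

$rules = disallowedPrefixes("User-agent: *\nDisallow: /private/\n");
var_dump(isAllowed('/software/sitemapper.shtml', $rules));
var_dump(isAllowed('/private/notes.html', $rules));
```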
It also creates a second site map, mysitemap.xml, which is in the format:
<urlset>
  <url>
    <loc>http://braemoor.co.uk/</loc>
    <title>Braemoor: Home Page</title>
    <description>Braemoor Home Page</description>
  </url>
</urlset>
This is constructed from the <title> element and the description metatag in each page's header, and may be used to build your own user-friendly site map.
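One possible way to use this second file is to turn it into an HTML list of links. The sketch below uses PHP's SimpleXML extension; the function name renderSitemapList and the choice of markup are assumptions for illustration only.

```php
<?php
// Hypothetical helper (not part of the download): converts the contents
// of mysitemap.xml into an HTML unordered list of links.
function renderSitemapList(string $xml): string
{
    $urlset = simplexml_load_string($xml);
    if ($urlset === false) {
        return ''; // the XML could not be parsed
    }

    $html = "<ul>\n";
    foreach ($urlset->url as $url) {
        // Escape every value before placing it in the markup.
        $html .= sprintf(
            "  <li><a href=\"%s\" title=\"%s\">%s</a></li>\n",
            htmlspecialchars((string) $url->loc),
            htmlspecialchars((string) $url->description),
            htmlspecialchars((string) $url->title)
        );
    }
    return $html . "</ul>\n";
}

$xml = '<urlset><url>'
     . '<loc>http://braemoor.co.uk/</loc>'
     . '<title>Braemoor: Home Page</title>'
     . '<description>Braemoor Home Page</description>'
     . '</url></urlset>';

echo renderSitemapList($xml);
```

In practice the string would come from the generated file, for example via file_get_contents('mysitemap.xml').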
Once the download file has been unzipped and loaded into the root directory of the web site, it is almost ready to run. However, there are two constants that first need to be modified to reflect your requirements. The first specifies a comma-separated list of the file extensions of the pages to be processed:
define ("EXTENSIONS", "shtml,html,htm,php");
The second specifies the directory and filename of a temporary file used by the software. The transaction must have write access to this file.
define ("TEMPORARY_FILE", "temp/site_mapper.tmp");
Once these constants have been updated, simply run the transaction sitemapper.php.
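Before the first run it may be worth checking that the two settings behave as intended. The following sketch shows one way to do so; both function names are hypothetical, and the checks are an assumption about how a comma-separated extension list and a writable temporary file might be verified, not a description of the software's own code.

```php
<?php
// Hypothetical helper (not part of the download): tests whether a path
// ends in one of the extensions in a comma-separated list.
function hasWantedExtension(string $path, string $extensionList): bool
{
    $wanted = array_map('trim', explode(',', strtolower($extensionList)));
    $extension = strtolower(pathinfo($path, PATHINFO_EXTENSION));
    return in_array($extension, $wanted, true);
}

// Hypothetical helper: the temporary file is usable if it already exists
// and is writable, or if its directory allows a new file to be created.
function temporaryFileIsWritable(string $file): bool
{
    return file_exists($file)
        ? is_writable($file)
        : is_writable(dirname($file));
}

var_dump(hasWantedExtension('software/sitemapper.shtml', 'shtml,html,htm,php'));
var_dump(hasWantedExtension('images/logo.png', 'shtml,html,htm,php'));
var_dump(temporaryFileIsWritable('temp/site_mapper.tmp'));
```

If the last check reports false, the usual remedy is to create the temp directory and grant the web server's user write permission on it.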
Download compressed PHP file (6,965 bytes)