I realized I had one site in my list with nothing but static HTML pages, and decided that it would be the perfect place to try out Google’s Python-based Sitemap Generator.
The first thing I noticed is that this script ain’t for sissies. Unlike many scripts I’ve run across that only require the user to change one or two variables at the top, Sitemap Generator requires you to follow along till the end of the script, modifying or deleting sections as required.
It also requires shell access, either through telnet or SSH, since you’ll have to run the program through the console.
If neither of those things causes you to break out in a cold sweat, the Sitemap Generator is really a pretty neat tool. On the first run it went through all the files and directories in my selected web directory in seconds. A warning to those who are thinking of using the access_log portion, though… if you’ve recently deleted any directories or files from your server, they will still show up in your sitemap if they were accessed before deletion.
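One way around the stale-URL problem is to check each URL against the disk before trusting it. Here's a rough sketch of that idea in Python 3. It is not part of the Sitemap Generator itself; the function names, `base_url` and `doc_root` are all made up for illustration, and it assumes every URL maps straight onto a file under your web directory.

```python
import os
from urllib.parse import urlsplit, unquote


def still_exists(url, base_url, doc_root):
    """Return True if the URL maps to something that is still on disk.

    Assumes a plain static site where base_url corresponds directly to
    doc_root (hypothetical names, not Sitemap Generator settings).
    """
    if not url.startswith(base_url):
        return False
    # Strip the query string and decode %xx escapes from the path.
    path = unquote(urlsplit(url).path)
    base_path = urlsplit(base_url).path
    relative = path[len(base_path):].lstrip("/")
    local = os.path.normpath(os.path.join(doc_root, relative))
    return os.path.exists(local)


def prune_deleted(urls, base_url, doc_root):
    """Keep only the URLs whose files or directories still exist."""
    return [u for u in urls if still_exists(u, base_url, doc_root)]
```

You would feed it the URLs pulled from your access log, something like `prune_deleted(urls, "http://example.com/", "/path/to/your/webroot")`, and only pass along whatever survives.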
The generator also picks up on scripts that submit their forms with the GET method, so if your autoresponder uses GET, you may find that your new sitemap has a whole lot of pages that look like “autoresponderscriptname.php?name=Someonefirstname.lastname@example.org” unless you add that form page to the excluded files.
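To show what excluding a form page amounts to, here's a small Python 3 sketch that drops every URL whose path matches an excluded pattern, no matter what query string is tacked on. Again, this is my own illustration and not the generator's code; `drop_excluded` and the pattern list are hypothetical names.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def drop_excluded(urls, excluded_patterns):
    """Return the URLs whose path matches none of the wildcard patterns.

    Only the path is compared, so every ?name=... variant of an
    excluded script is dropped along with the bare script URL.
    """
    kept = []
    for url in urls:
        path = urlsplit(url).path
        if any(fnmatchcase(path, pattern) for pattern in excluded_patterns):
            continue
        kept.append(url)
    return kept
```

Calling it with a pattern such as `"/autoresponderscriptname.php"` would keep your ordinary pages and throw away all the submitted-form variants.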
The sitemap generator also includes images. I don’t know what use that could possibly be, though Google doesn’t seem to mind them being there.
Once you’ve got the config file perfected, you could set the script to run as a cron job nightly or weekly, depending on how often you update your site, and never have to worry about adding or changing your sitemap again.
And since the Sitemap Generator is open source, I’m sure there’ll be a lot of wonderful modifications, and maybe even a web based interface and setup script, created by its users.