Anti-spam Measures for Guestbooks
I have developed Guestbooks (Guest Books) in PHP, and in common with many other people who have done so, I have found them inundated with spam. Some spam one can do little about - there are always small-minded people who will vandalise what they can, because they can. Most spam entries, however, have a purpose - to propagate web links as widely as possible so that the associated web sites attain a high search engine ranking. Such entries are usually inserted by software agents.
My initial reaction to the flood of spam was that of annoyance, but I then took it as a personal challenge to beat the spammers. This article shares those measures which I have found most useful.
Note: attempts to exclude the spammers using htaccess are unlikely to be effective. It is probable that both the IP address and the user agent identity will have been spoofed.
You obviously don't want your Guestbook to be flooded with garbage without your knowledge. My Guestbook sends me a detailed email every time an entry is inserted, as well as every time an entry is rejected. The former is useful to ensure that your filters don't need upgrading, and the latter is useful to make sure that your filters are not stopping valid entries. For example:
Reason for rejection: Antispam field not set up
Date: 28 April 2007 07:15:18
Referer: http://www.magnac.com/x.php
IP address: 201.42.187.98
Domain: 201-42-187-98.dsl.telesp.net.br
User agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; Deepnet Explorer)
Contributor: sogetosden
Email address: soget@sodavta.com
Add to mailing list: No
Title: Hydrocodone viagra phentermine
Message: http://bluefin.cs.ucla.edu/bugzilla/attachment.cgi?id=317
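A report like the one above could be assembled roughly as follows. This is only a sketch and not the code behind the example: the function name buildSpamReport, the array keys, and the placeholder address are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: build a plain-text report, one "Label: value" per line.
// $server is expected to look like $_SERVER, $fields like the submitted form.
function buildSpamReport(string $reason, array $server, array $fields): string
{
    $lines = [
        'Reason for rejection: ' . $reason,
        'Date: ' . date('j F Y H:i:s'),
        'Referer: ' . ($server['HTTP_REFERER'] ?? ''),
        'IP address: ' . ($server['REMOTE_ADDR'] ?? 'unknown'),
        'User agent: ' . ($server['HTTP_USER_AGENT'] ?? ''),
    ];
    foreach ($fields as $label => $value) {
        // Replace CR/LF so a submitted value cannot fake extra report lines.
        $lines[] = $label . ': ' . str_replace(["\r", "\n"], ' ', (string) $value);
    }
    return implode("\n", $lines);
}

// Possible use inside the form handler (the address is a placeholder):
// mail('owner@example.com', 'Guestbook entry rejected',
//      buildSpamReport('Antispam field not set up', $_SERVER, $_POST));
```

The sketch leaves out the Domain line. PHP's gethostbyaddr() could supply it, at the cost of a DNS lookup on every report.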
Many spammers find guestbooks through search engines, normally using specially written software. Make sure, therefore, that your guestbook pages are not indexed by including the following in your HTML header:
<meta name="robots" content="noindex, nofollow" />
However, do NOT exclude robots from accessing your guest book pages using the robots.txt file. Whereas Good Bots will dutifully obey, such a request will be treated as an invitation to enter by the Bad Bots.
Checking that the Add Entry form has been reached from a valid page is the most effective defence against your guestbook being spammed by software agents, but its use depends on how your guestbook is structured. The idea is to make sure that the Add Entry form is accessed only from a valid page, and is not invoked directly by a software agent. In theory one could use the HTTP Referer header (available in PHP as $_SERVER['HTTP_REFERER']) to check this, but in practice it is not to be trusted, because the client can set it to anything.
In the case of my guestbook the Add Entry page should only be accessed from the View Entry page. To ensure this is so, I start a PHP session in the View Entry page, and load a session string variable with a random number. The Add Entry page incorporates the contents of the variable into a hidden field on the form, and when the completed form is returned for processing the value of the hidden field is compared with that of the session variable. If the session variable doesn't exist or the values differ then the entry is rejected. If the entry is valid, the current session is destroyed, and a return made to the View Entry page where a fresh session will be created.
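The token check described above might be sketched as follows. The function names, the session key guestbook_token and the hidden field name token are hypothetical. The sketch also generates the value with random_bytes() (PHP 7 or later) rather than a plain random number, and compares with hash_equals(). Both are substitutions of mine, not details of the guestbook described here.

```php
<?php
// View Entry page: store a fresh random token in the session data.
// $session is passed by reference so real code can hand in $_SESSION
// after calling session_start().
function issueFormToken(array &$session): string
{
    $session['guestbook_token'] = bin2hex(random_bytes(16));
    return $session['guestbook_token'];
}

// Form handler: accept the entry only if the hidden field matches the
// token stored in the session.
function isFormTokenValid(array $session, array $post): bool
{
    $expected = $session['guestbook_token'] ?? null;
    $supplied = $post['token'] ?? null;
    if (!is_string($expected) || !is_string($supplied)) {
        return false; // no session token, or hidden field missing
    }
    return hash_equals($expected, $supplied);
}
```

In use, the View Entry page would call session_start() and issueFormToken($_SESSION), the Add Entry page would write the token into a hidden input, and the handler would call isFormTokenValid($_SESSION, $_POST), followed by session_destroy() once an entry has been accepted.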
This is the first anti-spam check that is performed in my guestbooks, and 95% of spam attempts fall at this stage - which effectively means all the ones initiated from software agents.
The purpose of the vast majority of spam is to propagate web addresses. Tell your users that entries containing web addresses will be treated as spam, and filter out entries that contain them. Checks should be made for such things as "http://", "www.", "href=", and "[url=" in ALL fields - not just the message field. I incorporate such pieces of text in an array which can be easily extended as and when required.
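A filter of this kind can be very small. The following is a minimal sketch, assuming every submitted field is a string. The names $bannedStrings and findBannedText are made up for illustration, and "https://" is my addition to the list given above.

```php
<?php
// Hypothetical filter list; extend it as new patterns turn up.
// "https://" is an addition not mentioned in the text above.
$bannedStrings = ['http://', 'https://', 'www.', 'href=', '[url='];

// Return the first banned string found in ANY field, or null if all are clean.
function findBannedText(array $fields, array $bannedStrings): ?string
{
    foreach ($fields as $value) {
        foreach ($bannedStrings as $banned) {
            // stripos() is case-insensitive, so "HTTP://" is caught too.
            if (stripos((string) $value, $banned) !== false) {
                return $banned;
            }
        }
    }
    return null;
}
```

Passing the whole of $_POST as $fields checks the name, email and title fields as well as the message, and the returned string can double as the reason for rejection in the notification email.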
Incidentally, I also filter out all HTML tags from the entry, as I like to retain control over how the guestbook looks.
Once you have an array of strings of text as above, it is easy to add words like "porn", "viagra", and "cialis" to allow you to exclude entries containing words that you feel are inappropriate for your guestbook. This method is quite good for keeping out human spammers who often use standard message headings such as "Swell site, dude".
Because most spam is generated by software, you will sometimes find URL-encoded characters in unexpected places, such as "%20" (an encoded space) in a name field or an email address field. These would never be typed in by a human user, so when you find such sequences, add appropriate strings to your filter array.
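Instead of listing each sequence such as "%20" separately, one could match them all with a single pattern. This is an alternative to the string array rather than the method described above, and the function name hasUrlEncoding is hypothetical.

```php
<?php
// True if the text contains a percent sign followed by two hex digits,
// e.g. "%20" or "%0A".
function hasUrlEncoding(string $text): bool
{
    return preg_match('/%[0-9A-Fa-f]{2}/', $text) === 1;
}
```

A check like this suits the name and email fields best. In a free-text message a legitimate phrase could occasionally contain a percent sign followed by two hexadecimal characters.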
Auto-spamming software works with a list of target URLs. If you find that repeated attempts are being made to spam your guestbook, change the URL of the page and the attempts will stop. You'll probably find that you need to do this every three months or so.
To make this easier, I ensure that my Add Entry URL starts with a unique couple of characters such as "xy". The View Entry transaction then searches through the directory for a PHP file starting with "xy". When it is found, it dynamically creates its link to the Add Entry transaction. This has the benefit of being able to rename the Add Entry URL without having to remember to update any associated links.
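The directory search could be done with glob(). The following is a sketch under the assumptions that the Add Entry file sits in the directory being searched and that the prefix consists of plain letters; the function name findAddEntryPage is hypothetical.

```php
<?php
// Return the name of the first PHP file in $dir whose name starts with
// $prefix, or null if there is none.
function findAddEntryPage(string $dir, string $prefix): ?string
{
    $matches = glob(rtrim($dir, '/') . '/' . $prefix . '*.php');
    if ($matches === false || $matches === []) {
        return null;
    }
    return basename($matches[0]);
}
```

The View Entry page might then call findAddEntryPage(__DIR__, 'xy') and use the result as the href of its "add an entry" link, so that renaming the file to anything else beginning with "xy" needs no further changes.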
Once you have detected the spam and sent details of it to your email address, the most important thing is to get the spammer away from your site. You can do this by simply ending the processing with the die() function, or by sending them to a suitably obnoxious web site elsewhere (there are plenty of them!). Either way, you may be able to help other guestbook owners by first suspending execution for a few seconds using the sleep() function, which slows the spamming software down.
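The delay-then-exit step might look something like this. It is a sketch only: both function names, the five-second default and the optional redirect address are assumptions of mine.

```php
<?php
// Hypothetical helper: pause for $delaySeconds, then return the redirect
// header line to send, or null if the request should simply be ended.
function stallAndChooseExit(int $delaySeconds, ?string $redirectUrl): ?string
{
    if ($delaySeconds > 0) {
        sleep($delaySeconds); // waste a little of the spamming software's time
    }
    return $redirectUrl === null ? null : 'Location: ' . $redirectUrl;
}

// Hypothetical wrapper for the form handler: stall, optionally redirect,
// then stop processing.
function rejectSpammer(int $delaySeconds = 5, ?string $redirectUrl = null): void
{
    $headerLine = stallAndChooseExit($delaySeconds, $redirectUrl);
    if ($headerLine !== null) {
        header($headerLine);
    }
    die();
}
```

Bear in mind that each sleeping request keeps a server process busy, so a long delay during a heavy burst of spam could hurt your own site more than the spammer.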
I have no wish to advertise the programs that are involved in spamming, but it is useful to be aware of what you are up against. XRumer is a bot that is specifically designed to spam guestbooks, and details may be found at http://www.botmasternet.com/. Trackback Submit is designed to spam blogs, and details may be found in the Search Engine Journal. If you haven't come across such software before, you will probably be shocked by just how sophisticated it is.
Do these measures work? The short answer is "yes". My guestbooks haven't been spammed for many months - in fact, the number of failed attempts has also diminished considerably. I am, however, regularly reviewing the spam that doesn't get through, to identify patterns which will help to improve the filtering. It is an arms race! It is worth remembering, however, that these guys are not out to get at you personally - if they cannot achieve what they want easily, they will move on.