I have never enabled comments on this site, but after reading the ongoing discussion about comment spam (Comment Spam Alert, Comment Spam Problem Continued), I tested the URL diveintomark.org/mt/mt-comments.cgi. Much to my surprise, it brought up a comment posting form, and allowed me to post a comment to a very old entry from last December which does not have comment posting enabled. I would classify this as a bug. I have removed the mt-comments.cgi script entirely, and, to avoid even the bandwidth of transmitting my custom 404 page, I added this to my .htaccess file to return a 401 - Unauthorized error whenever someone tries to access my comments URL:
Redirect 401 /mt/mt-comments.cgi
This whole discussion on comment spamming seems like it borders on other domains as well. Not necessarily domains where the problems have been solved, but domains where smart people have done lots of good thinking. Jeremy Bowers nails it when he says that (a) this problem is not new, (b) this problem is not specific to this domain, and (c) this problem has not been solved by technological solutions in other domains.
I dug around a little and found an article from last September where Bruce Schneier talks about how to secure a room.
Option one: convert the room into an impregnable vault. Option two: put locks on the door, bars on the windows, and alarm everything. Option three: don’t bother securing the room; instead, post a guard in the room who records the ID of everyone entering and makes sure they should be allowed in.
Option one is the best, but is unrealistic. Impregnable vaults just don’t exist, getting close is prohibitively expensive, and turning a room into a vault greatly lessens its usefulness as a room. Option two is the realistic best; combine the strengths of prevention, detection, and response to achieve resilient security. Option three is the worst. It’s far more expensive than option two, and the most invasive and easiest to defeat of all three options. It’s also a sure sign of bad planning; designers built the room, and only then realized that they needed security. Rather than spend the effort installing door locks and alarms, they took the easy way out and invaded people’s privacy.
I have, in Bruce’s analogy, chosen option 1, by turning off comments (or rather, never turning them on). This obviously solves the spam-in-comments problem, at the expense of significantly reduced functionality. For those who are not willing to make this trade-off (both Phil and Shelley have said as much), there are disturbingly few options.
The analogy to option 3 would be a blacklist, whether based on IP address, domain name, referrer, or user agent. Phil Ringnalda did this for his comment spammer. Phillip Pearson has proposed such a blacklist for referrer spam. I do this to weed out public portal pages from my further reading lists (not exactly the same problem, but similar). This is virtually guaranteed to fail in the long run, as evidenced by the overwhelming failure of email blacklists. I say this as someone who subscribes to SpamCop, uses all the available blacklists, maintains my own personal whitelist of false positives like Joe, and still gets dozens of spam emails a day. Blacklists can help slow the tide, but they’re expensive and difficult to maintain, and my personal experience tells me they are becoming less and less effective with each passing day.
Instead of a blacklist, you could use a whitelist, by requiring registration to post comments. Manila already requires this, and future versions of Movable Type may offer it as an option. But as Mena points out, this raises the barrier to entry for open discussion. Besides, when done poorly, registration can be automated, and when done securely, it raises ugly privacy issues (see Bruce’s option 3).
That leaves option 2, which is where Phil, Shelley, Ben, Mena, and others are spending their time at the moment. Shelley wants to restrict posting by referer, which would make it impossible for me to post comments on her site (I use a web proxy to filter ads which also blanks out referers), and which would require one line of code in the spammer’s scripts to circumvent.
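A minimal sketch of the kind of referer check described above, in Python. This is not Shelley's actual code: the host name example.org and the function name referer_allowed are assumptions made up for illustration. It shows both failure modes at once: a reader whose proxy blanks the header is rejected, while a script that simply claims the right referer gets through.

```python
from urllib.parse import urlparse

# Hypothetical host name, used only for illustration.
SITE_HOST = "example.org"


def referer_allowed(headers):
    """Accept a comment POST only if the Referer header names this site."""
    referer = headers.get("Referer", "")
    return urlparse(referer).hostname == SITE_HOST


# A legitimate reader behind a proxy that blanks referers is rejected.
proxy_reader = {}
# A script only has to claim the right referer: one extra line on its side.
scripted_post = {"Referer": "http://example.org/archives/some-entry.html"}

print(referer_allowed(proxy_reader))   # False
print(referer_allowed(scripted_post))  # True
```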
Phil considers, and dismisses, visual validations (“please enter the code contained in this blurry image with no ALT text”, like PayPal and some free email systems use), both on annoyance and accessibility grounds.
Shelley is now using a hidden field in the comments form and checking for it upon posting. As she admits, this is only a quick fix. Certainly, it is defeatable by a more generic script that simply downloads the comments form itself and parses it. This is not difficult; HttpUnit does this and more for automating web-based unit testing.
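To make the weakness concrete, here is a rough Python sketch of a hidden-field check and of the generic form-parsing step that gets around it. The field name site_check, its value, and the helper names are all hypothetical; the real form and check on Shelley's site may look quite different.

```python
from html.parser import HTMLParser

# Hypothetical field name and value; a real site would pick its own.
SECRET_FIELD = "site_check"
SECRET_VALUE = "a1b2c3"

COMMENT_FORM = f"""
<form method="post" action="/mt/mt-comments.cgi">
  <input type="hidden" name="{SECRET_FIELD}" value="{SECRET_VALUE}">
  <textarea name="text"></textarea>
</form>
"""


def accept_comment(fields):
    """Server side: reject posts that lack the expected hidden field."""
    return fields.get(SECRET_FIELD) == SECRET_VALUE


class HiddenFieldScraper(HTMLParser):
    """Client side: collect every hidden input found in a page."""

    def __init__(self):
        super().__init__()
        self.hidden = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("type") == "hidden":
            self.hidden[attrs.get("name")] = attrs.get("value")


# A script that posts blindly is stopped...
print(accept_comment({"text": "hello"}))  # False

# ...but one that downloads and parses the form first is not.
scraper = HiddenFieldScraper()
scraper.feed(COMMENT_FORM)
print(accept_comment({**scraper.hidden, "text": "hello"}))  # True
```

Nothing in the scraper is specific to one site, which is why the check holds only while few sites use it.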
You could throttle comment posting per IP address to once every so often, but then spammers would just compile a database of possible victims and attack each of them sequentially, like email spam works today.
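For reference, per-IP throttling can be sketched in a few lines of Python. The 60-second interval and the names allow_post and MIN_INTERVAL are assumptions chosen for the example, not anything Movable Type provides.

```python
import time

# Hypothetical limit: one comment per IP address every 60 seconds.
MIN_INTERVAL = 60.0

_last_post = {}  # IP address -> time of the last accepted comment


def allow_post(ip, now=None):
    """Return True if this IP has not posted within MIN_INTERVAL seconds."""
    now = time.monotonic() if now is None else now
    last = _last_post.get(ip)
    if last is not None and now - last < MIN_INTERVAL:
        return False
    _last_post[ip] = now
    return True


print(allow_post("192.0.2.1", now=0.0))   # True
print(allow_post("192.0.2.1", now=10.0))  # False, too soon
print(allow_post("192.0.2.1", now=61.0))  # True again
```

The limit applies to one site only. A spammer who visits each weblog on a list in turn never posts twice to the same site inside the interval, so the throttle never trips.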
The more interesting thing about these option 2 approaches is that they each only work as long as they are not widespread. Consider the analogy of protecting email addresses from spam harvesters. Enterprising young webmasters who think they’re cool will obfuscate their email address with a combination of numeric entities, hexadecimal ASCII characters, and other junk. And spammers will simply use scripts that cut through such obfuscation like butter (deobfuscation methods explained). Even the vaunted Hivelogic Email Address Encoder is not safe anymore. Why? Because once enough people started using it, it was worth somebody’s time to write a simple regular expression to reduce it to numeric entities, which can be deobfuscated into plaintext.
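How little work the final step takes can be shown with Python's standard library. The obfuscate function below is a made-up stand-in for entity-based schemes in general (it is not the Hivelogic encoder), and the address is a placeholder.

```python
import html


def obfuscate(address):
    """Encode each character as a decimal or hexadecimal numeric entity."""
    parts = []
    for i, ch in enumerate(address):
        # Alternate between decimal (&#64;) and hexadecimal (&#x40;) forms.
        parts.append(f"&#{ord(ch)};" if i % 2 == 0 else f"&#x{ord(ch):x};")
    return "".join(parts)


hidden = obfuscate("me@example.org")
print(hidden)

# One standard-library call undoes the whole thing.
print(html.unescape(hidden))  # me@example.org
```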
The really interesting thing about these approaches, from a game theory perspective, is that they are all Club solutions, not Lojack solutions. There are two basic approaches to protecting your car from theft: The Club (or The Shield, or a car alarm, or something similar), and Lojack. The Club isn’t much protection against a thief who is determined to steal your car (it’s easy enough to drill the lock, or just cut the steering wheel and slide The Club off). But it is effective protection against a thief who wants to steal a car (not necessarily your car), because thieves are generally in a hurry and will go for the easiest target, the low-hanging fruit. The Club works only as long as not everyone has it: if everyone had it, thieves would have an equally difficult time stealing any car, their choice would be based on other factors, and your car would be back to being as vulnerable as anyone else’s. The Club doesn’t deter theft; it only deflects it.
Similarly, installing a secret form field on your comment form will stop spammers from spamming your comments, until enough people do the same that it’s worth the spammers’ time to upgrade their scripts. Ditto referer hacks (just set the referer); ditto registration schemes (just auto-register); ditto time limits (just hit each weblog sequentially). Ditto ditto ditto.
Lojack, on the other hand, does nothing to stop the theft in progress, but makes it much easier for police to retrieve a stolen car later. (It outputs a homing beacon from a randomized hidden location inside the car.) I don’t know what’s happened in the past 5 years (like if anybody has found a way to block the homing beacon’s transmission). But when that Slate article was written, independent studies alleged that Lojack actually deterred crime, presumably by making it much more likely that criminals (and entire criminal organizations) would be discovered after the fact. (Individual criminals may be disposable, but chop shops dislike homing beacons that give away their location.) Although it does nothing to stop individual crimes, by making it easier to catch criminals after the fact, Lojack may make auto theft less attractive overall.
Those with the time, money, and talent to fund such endeavors tend to favor Club solutions, out of rational self-interest. Club solutions directly help those who implement them, possibly at the expense of others who do not. Lojack solutions may or may not help those who implement them (and certainly not as directly or as well as a comparably-priced Club solution), even though they do disproportionately help the community at large. It’s one big Prisoner’s Dilemma, where rational self-interest always leads to a suboptimal solution. In the case of auto theft, crime never goes down, it just disproportionately affects those without Clubs, until everyone has a Club, theft still hasn’t gone down, everybody’s out $20, and we’re all back where we started. In the case of email obfuscation, harvesters never go away, they just disproportionately affect those who don’t obfuscate, until enough people obfuscate that the harvesters get smarter, everybody’s wasted a lot of time, everybody’s email is still getting harvested, and we’re all back where we started. In the case of comment spam…
(As a side note, much has been made of the music companies’ attempts to shut down file sharing systems. Initial attempts focused on shutting down Napster by suing it into oblivion. This is a Club solution, with predictable consequences: Napster’s demise spawned a whole new generation of file sharing systems that were more resistant to legal attack, and file sharers followed. The music companies have finally gotten smart; now they’re flooding the file sharing networks with fakes. This is a Lojack solution: it doesn’t stop individual file sharers, but it makes file sharing less attractive overall.)
So where does that leave us with comment spam? Nowhere good, I’m afraid. In the short term, I see a bunch of different Club solutions arising to meet market demand. Shelley’s hidden form field is just the first. Others will try referer hacks, or throttling hacks, or registration. But what we really need is a Lojack solution, one that makes it less attractive overall to spam comments.
Again, searching for analogies in other domains, I learned how to use Apache to stop bad robots, how to save my site from spambots, and how to trap and automatically ban spambots. That last one is wicked cool, but as far as I can tell, all of them are just fancy Club solutions. However, I did dig up one article, back from when Code Red and Nimda were all the rage: A tar pit for worms. LaBrea uses TCP hacks to trick worms into spending most of their time waiting for your site to respond (and therefore not hurting anyone else). I don’t know if such a method could be used to slow down comment spammers (since they could asynchronously POST and not wait around for the response), but I think this is the right direction to be looking.