Revealed: the grubby world of comment spam

By Greg Stevens on August 30th, 2012

From time to time you may see a comment on a blog or a news article that looks something like this:

Definitely believe that which you stated. Your favorite justification seemed to be on the web the simplest thing to be aware of. You managed to hit the nail upon the top and defined out the whole thing without having side effect, people can take a signal. Will likely be back to get more. Thanks

At first glance, it could be an earnest attempt by a reader who is not a native English speaker to give the author some kind of compliment. Detracting slightly from this impression is the fact that the name of the commenter shows up as “buy cheap loui vuitton bags” with a link to an online store.

If you run your own blog or news site, you may see dozens of these comments a day. They come in many varieties. There is the Vague Compliment (“Excellent post! Thanks for the useful information!”), the Vague Criticism (“of course like your web-site but you need to take a look at the spelling on quite a few of your posts”), and of course the very charming categories of Endless Rambling Nonsense and Endlessly Repeated Links. Often exactly the same comments will appear, word-for-word, across dozens or hundreds of different web pages.

In technical circles, these comments are called “Search Engine Optimization (SEO) Spam” or “Search Engine Spam,” although the reason for this might not be obvious to those who are not technologically savvy.

When Google, or any other search engine, decides which websites to place at the top of a list of search results, one of the factors it considers is the number of links pointing to the site. A page that has many links from other places on the web (these are called “inbound links”) will tend to rank more highly in the search results than a page that has only a few. Google treats pages with many inbound links as more popular, and therefore concludes that those pages are more likely to have the information a user is looking for.
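To make the idea concrete, here is a deliberately crude sketch that ranks pages purely by counting distinct inbound links. It is an illustration only: real search engines, Google included, combine link analysis such as PageRank with many other signals, and the function name `rank_by_inbound_links` and the `.example` sites are hypothetical. The second half shows why comment spam works under this model: a handful of links planted on unrelated pages is enough to push a target to the top.

```python
from collections import Counter


def rank_by_inbound_links(links):
    """Rank pages by how many distinct other pages link to them.

    `links` is an iterable of (source_page, target_page) pairs.
    This is a simplified popularity signal, not a real ranking algorithm.
    """
    inbound = Counter()
    seen = set()
    for source, target in links:
        # Ignore self-links and count each source->target link only once.
        if source == target or (source, target) in seen:
            continue
        seen.add((source, target))
        inbound[target] += 1
    return inbound.most_common()


web = [
    ("news.example", "shop.example"),
    ("blog.example", "shop.example"),
    ("forum.example", "bags.example"),
]
print(rank_by_inbound_links(web))
# [('shop.example', 2), ('bags.example', 1)]

# Three spam comments left on unrelated pages add three more inbound links.
web += [
    ("knitting.example", "bags.example"),
    ("kittens.example", "bags.example"),
    ("recipes.example", "bags.example"),
]
print(rank_by_inbound_links(web))
# [('bags.example', 4), ('shop.example', 2)]
```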

Spam comments are a way to “game the system” by randomly blasting comments into the web in order to get as many links to your site as possible. Some of these will be deleted by attentive (and irritated) editors and administrators, and some of them will be filtered out automatically by spam filter programs. But some will get through, and the more that do, the more inbound links your website will have, and the higher the search engines will place your site in search results.

The mass-production of generic comments is one of many techniques that are described in the industry as “black hat SEO”: techniques for increasing a website’s search engine status that are viewed as underhanded, shady, or in some other way inappropriate. The term contrasts with “white hat SEO”, which includes techniques for improving search engine placement that conform to the proper ideals of how the web should be used, and are generally honourable, honest, and non-annoying. At least, that is the standard pitch.

Google is constantly working to identify “black hat SEO” techniques and punish websites that use them by ranking those sites lower in search results lists, or even preventing them from showing up entirely. Website administrators are constantly on the lookout to clean up their pages and remove irrelevant, distracting, and sometimes offensive comments that are created by spammers.

It can get emotional. Spam is frustrating, annoying, and time-consuming to deal with. It can make it very difficult for the casual user who has to sift through hundreds of comment “advertisements” to find comments of actual substance. The problem has become so prevalent that some have thrown up their hands and declared that all SEO is nothing but a scam. Derek Powazek, a developer and web consultant, famously declared in a blog post several years ago that all SEO experts are cockroaches, bastards, and scum. No “white hat” versus “black hat” nuance there.

Despite the vitriol rained on the world of black hat SEO, the distinction between “good” and “evil” in the world of search engine optimisation may not be as clear-cut as media and industry experts make it out to be. People who do “black hat SEO” don’t normally see themselves as doing terribly anti-social things to manipulate the system. Most see themselves as engaging in just another form of marketing. Some people even point out that the “system” itself is murky, underhanded, and suspicious. And, after all, who is Google to decide which online marketing techniques are “good” and which are “bad”?

Once you get past clear-cut abuses, such as people posting ads for Viagra on a blog post about knitting, the issue is quite complicated, and worth taking a deeper look at.

GScraper: a case study

GScraper proudly advertises at the top of its website that it is “the new black hat SEO software, most powerful for link building!” For $38, you can download a tool that will help you to increase your search engine ranking by blasting out comments to the world.

If you are not in the SEO business, the product’s website probably seems very cryptic, but the basic idea is simple. You type in some keywords, and the tool will find blogs and news articles that contain the keywords you are looking for. Once you have a list, the tool provides information about each web page that it finds: it tells you the title, the page rank of the website, how many outbound links the page has, and whether comments can be left on the page. You can refine your search to only look at certain types of web pages, or to view only pages updated within a particular period of time.
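The search-and-refine step described above amounts to filtering a list of page records. The sketch below assumes a hypothetical `PageInfo` record holding the details the tool is said to report (title, page rank, outbound links, whether comments are open) plus a last-updated date. It is not GScraper's actual data model or code, which the article does not describe.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PageInfo:
    # Hypothetical record mirroring the details the tool reports for each page.
    url: str
    title: str
    page_rank: int
    outbound_links: int
    comments_open: bool
    last_updated: date


def refine(pages, keyword, min_page_rank=0, max_outbound_links=100,
           updated_since: Optional[date] = None):
    """Keep pages that match the keyword and pass the quality filters."""
    keyword = keyword.lower()
    kept = []
    for page in pages:
        if not page.comments_open:
            continue  # nowhere to leave a comment
        if keyword not in page.title.lower():
            continue  # not relevant to the search
        if page.page_rank < min_page_rank or page.outbound_links > max_outbound_links:
            continue  # fails the page rank / outbound link thresholds
        if updated_since is not None and page.last_updated < updated_since:
            continue  # not updated within the chosen period
        kept.append(page)
    return kept


pages = [
    PageInfo("http://a.example/knitting-tips", "Ten knitting tips", 3, 12, True, date(2012, 8, 1)),
    PageInfo("http://b.example/knitting-news", "Knitting news roundup", 1, 240, True, date(2012, 7, 15)),
    PageInfo("http://c.example/yarn", "Choosing yarn", 4, 8, True, date(2012, 6, 2)),
]
for page in refine(pages, "knitting", min_page_rank=2, updated_since=date(2012, 7, 1)):
    print(page.url)  # prints http://a.example/knitting-tips
```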

[Screenshot: GScraper spam comment generator]

Once you have identified the list of target pages, you can set up the comments. There is a box where you type in multiple lines of text: each line of text is the template for a comment. You can make a “comment template” as long or as short as you want, and you can even use special characters to add some variation: you can have it choose words randomly from a list, or insert random text. When you click on “Start Comment”, the program will go through every web page on your list and try to post a comment to that page, picking different templates at random from your list.
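The “choose words randomly from a list” feature is usually written in a brace-and-pipe notation often called “spintax”, where `{Great|Excellent}` means “pick one of these”. The article does not say which special characters GScraper uses, so both the syntax and the `expand` helper below are assumptions. The sketch repeatedly resolves the innermost `{...}` group until none remain, which also handles nested groups.

```python
import random
import re

# Matches an innermost {option|option|...} group (no braces inside it).
_GROUP = re.compile(r"\{([^{}]*)\}")


def expand(template, rng=random):
    """Replace each {a|b|c} group with one randomly chosen option."""
    while True:
        expanded = _GROUP.sub(lambda m: rng.choice(m.group(1).split("|")), template)
        if expanded == template:
            return expanded  # no groups left to resolve
        template = expanded


templates = [
    "{Great|Excellent} post! Thanks for the {useful|helpful} information.",
    "I {really|truly} like your {site|blog}.",
]
rng = random.Random(42)
for _ in range(3):
    # Pick a template at random, as the article describes, then vary its wording.
    print(expand(rng.choice(templates), rng))
```

Because each run picks different options, a small set of templates produces many superficially different comments, which is why the same spam can look slightly reworded from site to site.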

Naturally, not every attempt to post a comment will work. Some of the pages might not have comments enabled, or in other cases the comments might be moderated and will not show up unless they are approved.  But when you are working from a list of thousands of potential web pages, this hardly matters. Most people who mass-produce comments this way are working the odds: even if only (for example) 1 per cent of automatically-generated comments actually get published to the web, that can produce dozens of links to the target site if the spammer starts with a large enough list.
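To put rough numbers on the odds: suppose (purely as an illustration, since the article gives no list size) a spammer starts from $N = 5{,}000$ target pages, and only a fraction $p = 1\%$ of the comments survive filters and moderation. The expected number of published links is then

$$
E[\text{links}] = N \times p = 5000 \times 0.01 = 50
$$

Because the total grows linearly with $N$, a spammer can make up for a lower success rate simply by starting from a longer list.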

[Screenshot: sample spam comment]

This comment was generated by GScraper using the settings shown in the earlier screenshot. As you can see, it was able to create a fake, randomly generated Gmail address, pick one of the comments from my list of templates at random, and link back to my target URL: in this case, example.com.

GScraper is produced by a Chinese company called Jitesi. It was built in April and released for sale on May 1. Within the first month, Jitesi had over 100 customers. In addition to the basic comment-creating features, the program allows you to post comments through proxies, which make your comments appear to come from anywhere in the world. This is especially helpful if you post a large number of comments to the same website, because websites often ban IP addresses that are known to be sources of spam.

Because I am a website administrator myself, testing this tool has been very enlightening for me. For example, I’d often wondered about spam comments that linked back to seemingly random web addresses that did not actually point to any web pages. What could possibly be the purpose of that?

After using GScraper, the answer seems fairly straightforward: whoever was using the commenting tool simply didn’t know what he was doing. He accidentally used one of the “insert random text” special characters in the place where the website address should go, and ended up creating spam comments with links to nowhere. Not surprisingly, many of the worst and most perplexing spam comments that I have seen over the years can probably be explained by people trying to use software like this, and simply not being clever or skilled enough to figure it out.
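The mistake is easy to reproduce. In the sketch below, `random_text` stands in for an “insert random text” placeholder (the real tool's placeholder syntax is not described in the article, so this helper is hypothetical). Dropping its output into the website field yields a string with no scheme or host, so the resulting link goes nowhere. The check uses Python's standard `urllib.parse.urlparse`.

```python
import random
import string
from urllib.parse import urlparse


def random_text(length=8, rng=random):
    """Hypothetical stand-in for an 'insert random text' placeholder."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def looks_like_link(value):
    """True only if the value has an http(s) scheme and a host name."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


intended = "http://example.com"  # what should go in the website field
mistaken = random_text()         # what ends up there by accident

print(looks_like_link(intended))  # True
print(looks_like_link(mistaken))  # False: a "link to nowhere"
```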

But apart from people not knowing how to use this kind of software, there are also those who simply don’t care. It is easy to see how this kind of tool could be abused. You can target a list of websites that have nothing to do with the product you are selling. You can post comments selling Viagra to blogs about knitting and kittens. You can create comment templates with vague or annoying text, ranging from gibberish to “I love your site!” You can even configure it to leave 100 comments on the same website within the span of a few minutes. These are common and obvious abuses.

On the other hand, this tool could also be used properly. It is sophisticated enough that you can use it to find blog and news articles that truly are relevant to the website or product that you are trying to promote. When I asked the head of Jitesi about his thoughts on “black hat SEO,” he replied that GScraper is just a tool:  “I think GScraper is just software that helps us to do work. You could call it ‘Black Hat SEO’, because it helps you to post comments. But if you are a professionals SEO, you can refine the search, check things like page rank, outbound links, and the search index, so that you can narrow your URL list to the best, most relevant results.”

White hats, and off-white hats

Ron Gallagher, of Website Chemistry, describes himself as a purely white hat SEO professional, and says that he would never use a tool like this.  He does admit, however, that leaving comments on blogs is a useful way to help a website’s SEO.

“Spam blog comments are an easy way to get backlinks. I’ve done them myself,” he admits. “I know there are bots and software out there that will spam blogs with comments, but I never use them because I believe that Google can pick up on things like this and could blacklist your site.  So I do all my blog commenting manually.”

When the process is totally manual, each website can be carefully selected and each comment can be individually written. This is the “white hat rationale” behind spam comments: they are just another form of highly targeted marketing.

Introduce a tool like GScraper, however, and some of that individuality and selectivity is lost. Nonetheless, the tool could be used to send well-written comments to a highly selective and relevant list of target web pages. One could think of this as “off-white hat SEO”: it is still mass-producing comments, but not in the offensive and annoying way that most people associate with spam.

What begins to veer into “grey hat” territory, however, is when the comments become more generic and the targeting of blogs or news sites becomes sloppy.  But the sloppiness isn’t really the fault of the tool being used. Like a knife, which can be used to kill a person or create a gourmet meal, tools like GScraper can be seen as “good” or “evil” depending on how they are used.

This presents us with an interesting question: what is the real source of the anger, distrust and grief associated with “spam comments”? Is it really the mere fact that people are mass-producing comments for the purposes of advertising?  Or is it the fact that they are doing it sloppily?

Pay no attention to the search engine behind the curtain

Do you know who else mass-produces advertising links and blasts them all over the web? Google. Aaron Wall of SEO Book has published a striking infographic called “What is search spam?” It shows a list of practices that Google has declared to be “black hat SEO”: practices that Google will punish when it discovers a website using them. Side-by-side with this list, however, the graphic shows the things that Google does for itself that arguably fall into all of the same categories. The obvious conclusion: Google says “it’s ok when we do it, just not when you do it.”

“The graphic is intended to be a critique of the derogatory and polarising way in which SEO is frequently portrayed,” says Wall. So-called “black hat SEO” methods have acquired a terrible reputation in part because there are loud and powerful spokespeople who constantly beat the drums to anathematise them. Wall points out that we need to take a closer look at the motivations of some of these speakers. Usually, they are players with skin in the game.

Over time, the list of practices that Google itself has branded with the dreaded “B” of black hat SEO has evolved, and unsurprisingly these changes have almost always served to benefit the corporate interests of Google. Although blog comment spam doesn’t specifically make it onto Wall’s infographic, it clearly fits the same pattern.

Google AdSense has become so common in blogs and news sites that it seems almost omnipresent. So what is the message, when Big Brother Google tells you not to post ad links on people’s blogs? The message is: You’d better not post ad links on people’s blogs, unless you are us.

None of this is meant to exonerate ads for Viagra and payday loans that flood bloggers’ inboxes on a daily basis. But it’s worth placing these nuisances in their correct context. They stand out because they are extreme and they are stupid: they are the result of people who don’t know what they are doing or honestly don’t care. But the vast majority of people who delve into the world of “comment spam”, no matter what shade of grey you consider it to fall into, are simply trying to reach an audience. They are not doing anything substantively different from what Google does when it puts inches of advertisements above the fold in its search results lists. The only difference is that Google is powerful enough to convince us to put up with it.

Wall sums it up nicely. “The problem is not in differentiating between the hard black and lily white things, but that big gray area in the middle, and how it changes based on who is playing the game.”