A look inside the world of search from the people of Yahoo!

January 18, 2005

A Defense Against Comment Spam

I'm pleased to announce that Yahoo! Search is one of several organizations supporting a technique that should help combat weblog comment spam. The others involved are Google/Blogger, MSN Search, Six Apart (TypePad, Movable Type, LiveJournal), and WordPress.

By adding a rel="nofollow" attribute to hyperlinks, webmasters and weblog owners can tell search engines that the links are effectively untrusted. For example, this:

<a href="http://spammer.example.com/">buy now</a>

Becomes this:

<a href="http://spammer.example.com/" rel="nofollow">buy now</a>

We think this is a good first step toward significantly reducing the spam burden on bloggers and weblog hosting companies. It's great to see so many players on board. In the coming weeks you can expect to see the changes reflected in our web index.
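
For weblog software authors, the rewrite shown in the example above could be sketched roughly as follows. This is only an illustration using Python's standard `html.parser` module; the names `NofollowRewriter` and `add_nofollow` are hypothetical and do not come from any of the participating products.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emit visitor-submitted HTML, forcing rel="nofollow" on every <a> tag."""

    def __init__(self):
        super().__init__()
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Drop any rel value the commenter supplied, then force nofollow.
            attrs = [(k, v) for k, v in attrs if k != "rel"]
            attrs.append(("rel", "nofollow"))
        rendered = "".join(
            f" {k}" if v is None else f' {k}="{escape(v, quote=True)}"'
            for k, v in attrs
        )
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # The parser hands us decoded text, so escape it again on the way out.
        self.out.append(escape(data, quote=False))


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" on all of its links."""
    parser = NofollowRewriter()
    parser.feed(comment_html)
    parser.close()
    return "".join(parser.out)


print(add_nofollow('<a href="http://spammer.example.com/">buy now</a>'))
```

The sketch drops HTML comments and does not preserve self-closing syntax, so a real plugin would need more care than this.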

Related Announcements:

Jeremy Zawodny
Yahoo! Search

Posted by Yahoo! Search at January 18, 2005 03:45 PM
Comments

Jeremy, I'm impressed at the unilateral agreement between engines and 6A but I'm positive that this will do very little to cull comment spam.

MT will be shipping this as a plugin - spammers will just redouble their efforts to hit blogs without the plugin - even if in future releases they build it right into the system proper, I'm sure you can imagine the enormous number of blogs that will not get updated - will not get the plugin - will not read any of these announcements, newsletters etc...

This is impressive. But as far as solving this issue goes, it will be ineffective.

Nick

Posted by: Nick W at January 18, 2005 04:30 PM

I am glad to see that Yahoo, MSN and Google are all supporting a nofollow link attribute.

Even if it doesn’t fix blog comment SPAM I appreciate the added flexibility it provides to website owners.

Posted by: Nathan Enns at January 18, 2005 04:36 PM

It'll be a great feature for cheating recip links out of PR.

Posted by: rcjordan at January 18, 2005 04:40 PM

Not to mention dodgy directory owners RC lol....

Posted by: Nick W at January 18, 2005 04:49 PM

Great news Jeremy! Yeah, it is great to see the cross-company cooperation that happened here.

Posted by: Robert Scoble at January 18, 2005 05:20 PM

a real fix needs to be chalked up

Posted by: mac at January 18, 2005 06:32 PM

Thanks for working together with the competition for a common good. We appreciate it!

Posted by: Yakov Shafranovich at January 18, 2005 06:52 PM

Why won't this work? Sure, it will have no immediate effect, but think about how it'll be in a year or two when every blog uses this. Spammers will see no noticeable boost in PageRank from their comment spam. You take away the reward, you kill the motivation.

Hey... it can't hurt.

Posted by: Mark J at January 18, 2005 08:10 PM

On the surface, it seems like a great idea and not before time. A link isn't necessarily an endorsement, and it's a good way to distinguish the two types of link, as well as rooting out comment spam.

It's all about education, however. There are already ways to ensure that links aren't counted for PR purposes, and the comment spammers rely on webmasters not knowing about them or not implementing them. The rel="nofollow" attribute is a simple enough method, it's just a matter of getting the word out.

Posted by: Blanche at January 18, 2005 08:31 PM

Blogs are the fastest growing segment of the web. Doing this doesn't solve all the problems now but helps prevent a larger problem in the future.

Posted by: jim l at January 18, 2005 08:32 PM

I think it will hurt smaller bloggers. I get a fair amount of link karma from links to my site I've put in on-topic comments on relevant posts on other blogs.

Like, say, what I've written about this:

http://www.aquick.org/blog/2005/01/18/google-adds-nofollow-attribute-for-links/

This mechanism isn't differentiating between "comment spam" and "legitimate links in comments".

Posted by: Adam Fields at January 18, 2005 08:33 PM

Adam, I'd imagine there will be plugins very soon to allow whitelisting of known commenters... I know there are already whitelist plugins to, say, highlight certain posters, so it'll be peanuts to adopt that.

Posted by: ceejayoz at January 18, 2005 08:57 PM

This isn't a solution, it's a finger in a leaky dyke. It takes away one avenue for spammers to benefit, but forces me to manage my legitimate users whom I wish to support.

It's better than nothing, but it's reactionary. What happened to these companies being on the leading edge of things? I guess they are getting too fat on text ad and subscription revenues and the need to stake new ground just isn't there any more.

I am far more thankful to the open source coding community, who have written effective spam-control plugins, which basically negate the need for this 'nofollow' nonsense anyway.

I hope nobody got hurt by patting themselves on the back.

Posted by: craig at January 18, 2005 09:18 PM

"By adding a rel="nofollow" attribute to hyperlinks, webmasters and weblog owners can tell search engines that the links are effectively untrusted."

And also an easier way to obfuscate voluntarily shared links that doesn't require javascript or other methods on the webmaster's side - and not as a way to "combat blog spam" - that will be 'supported' by the 3 major search engines in it being used in this way.

A step forward for the blog, wiki, and guestbook owners ... two steps backwards for some of us that weren't interested in spamming or such methods but tried to accrue honest, true links shared our way based on content versus rigging up linking strategies to "impress" search engines with.

Posted by: C. w. at January 18, 2005 11:30 PM

. . And how about the <div rel="nofollow">? Will this work too? The internal links of this div won't be indexed? Or this new implementation works only on links (only on "a" tags)?

(grrrr... again =/ please delete the two other messages)

Posted by: diego nunes at January 19, 2005 02:42 AM

I too am glad to see so much cooperation. Although I will not begin to use "nofollow", at least not at this time. I want legit readers of my site to get their share of page rank for commenting. I see it as incentive for participation.

Possibly creating a whitelist of commenters would be necessary so that only readers not on the whitelist receive "nofollow".

Thank you again though for such a great level of cooperation. :-)

Posted by: Earle at January 19, 2005 05:24 AM

Can someone at Yahoo! clarify whether this "nofollow" attribute will tell the search bot to NOT FOLLOW such a link? By reading the prose at both Google and Yahoo! I couldn't tell.

Posted by: padawan at January 19, 2005 05:35 AM

Thanks!

Next you might wish to tackle referer spam. Leeches, the lot of them, and easily ditched: just omit apache (and similar) log files, and ditch all log evaluation package output as well.

Thanks!

Posted by: Henriette at January 19, 2005 06:05 AM

I suggested this idea on my blog about a month and a half ago. It's awesome that Google, Yahoo, and MSN were able to work together to turn it into reality.

The potential goes far beyond curbing blog spam. Perhaps more importantly, it provides a means for publishers to link to information without lending credence to it.

One of the most prominent examples of why publishers would want to do this is educational sites that link to sites such as www.martinlutherking.org in order to teach children about misinformation on the Internet. In doing so, they inadvertently raise its search engine rankings, propagating the misinformation itself.

Click on my name, it'll take you to my blog post if you want to read more.

Posted by: Adam Herscher at January 19, 2005 07:40 AM

Ok, first off, think like a bad guy.

Adding some sort of wrapper for "Don't follow links in this section" won't work because the very first thing a spammer will do is add "</div>" (or whatever the closing tag is) to the top of their comment. Yes, you could strip out those tags, but you're already modifying content on the page. It's also FAR easier for spiders to make the choice on a per-link basis than it is to keep some sort of state engine that determines "Ok, for this link am I in a properly structured tag or not?"

As a bad guy, I know that there are two ways that I can boost my page rank using comment spamming. One is to hand target Big Blogs with lots of page rank (like Scoble or Scripting). Or I can write bots that attack lots and lots of little blogs that have been abandoned or ignored. Giving folks the plug-in to turn off comment pagerank is good because it makes mindless, script-kiddie bots worth less to spammers.

Yes, there are lots and lots of ways to avoid this. Yes, there are also lots of folks who are screaming that ALL links in blogs should be ignored by crawlers. This is a good compromise. Now stop whining, write a few plug-ins for the various engines and help get this out there.

Posted by: jr at January 19, 2005 09:40 AM
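
jr's point above about per-link decisions can be illustrated with a small sketch of what a spider might do. This is an assumption about one possible approach, not a description of how Yahoo!, Google, or MSN actually process pages; `LinkCollector` and `classify_links` are hypothetical names. Each `<a>` tag is judged on its own attributes, so no state about enclosing tags is needed and a wrapper such as `<div rel="nofollow">` has no effect.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Sort each link on a page by whether its own rel attribute contains nofollow."""

    def __init__(self):
        super().__init__()
        self.trusted = []
        self.untrusted = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return  # only the link itself is inspected, never its ancestors
        attr_map = dict(attrs)
        href = attr_map.get("href")
        if not href:
            return
        # rel holds a space-separated list of tokens, e.g. "external nofollow".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.untrusted.append(href)
        else:
            self.trusted.append(href)


def classify_links(page_html):
    """Return (trusted, untrusted) lists of hrefs found in page_html."""
    collector = LinkCollector()
    collector.feed(page_html)
    collector.close()
    return collector.trusted, collector.untrusted


page = (
    '<div rel="nofollow"><a href="http://a.example/">a</a></div>'
    '<a href="http://b.example/" rel="external nofollow">b</a>'
)
print(classify_links(page))
```

Because the decision is made per link, a spammer who injects a stray closing tag into a comment gains nothing.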

http://google.com/

Posted by: bobik at January 20, 2005 02:02 AM

Jeremy, I don't understand: How will this "reduce the spam burden on bloggers and weblog hosting companies"? As you probably know from your own blacklist-logs, spammers don't spam less when they are blocked (or their links made useless by "nofollow") - the volume stays the same. I see the value of taking the pagerank away (while I don't like the idea that my commenters also don't get pagerank, but that's a different issue) but I don't understand why you think that any "burden" will be reduced by this measure.

Posted by: Martin Röll at January 20, 2005 06:02 AM

Sorry, but I don't understand. Does that mean that if, let's say, I post a pertinent comment on a blog about web design and templates and include a link to http://www.web-source.net/ or to http://www.turnkeywebpros.com because I think there is pertinent and useful info there about the topic discussed, the two sites will be denied the importance of their link? Is this fair? It doesn't sound fair to me at all. It goes against the spirit and the nature of linking and of the Internet itself. I didn't invent pagerank; why should I have it denied, if acting in good faith, just because there are spammers around? Then start ranking sites the old way and don't give pagerank any importance! And couldn't I go spamming the blogs and the forums, putting in links to competitors with a nofollow attribute? If I have misunderstood, please, someone explain. Thanks.

Posted by: at January 20, 2005 08:35 AM

I posted the above message, but it didn't take the name. Should I add a URL in the weblog URL?

Posted by: Lore Jannsens at January 20, 2005 08:36 AM

Great move by both Google and Yahoo. I highly appreciate this effort. It prevents a lot of spam in some of our blogs. Those sucking spammers,

they will realize this well.

Posted by: blogger at January 20, 2005 12:31 PM

Lore and others are spot on -- this is a problem of indexing, not of HTML mark-up. I'd also add that it actually runs counter to the whole meaning of the attributes used:

"The problem with this whole proposal is that it changes the meaning of the @rel attribute -- or if we were to slip into OO terminology for a moment, it *overloads* the attribute. The whole point of @rel is to define a relationship between two documents. The whole point of 'nofollow' is to say that there is *no* relationship between two documents. It's like using a special value of the title element to say that there is no title, or a special value of the @style attribute to say that there is no style -- it's just plain wrong. You cannot say that the relationship between two documents is that a search engine should ignore the relationship between those two documents!"

-- Meta Muddle at Google, http://internet-apps.blogspot.com/2005/01/meta-muddle-at-google.html

Posted by: Mark Birbeck at January 20, 2005 05:23 PM

I want to have NO SPAM in my blog, not SPAM THAT DOES NOT COUNT FOR SEO... this initiative goes in the wrong direction... because spamming is (almost) free for spammers, they won't stop.
Regards, Sandro

Posted by: feuman at January 21, 2005 01:39 AM

Well, it won't work - most links with rel=nofollow will be good, only a minority will be spam. So ignoring all of them can make the search results worse, not better.

Also, ignoring the link is too harsh. A slightly better solution could be a "rel=usersubmitted" (or visitorsubmitted) attribute, which should mean that the link wasn't created by the webmaster, but by a visitor to the site. Nothing more. The search engine is not obliged to do something specific; that attribute is only information about the nature of this link. The search engine can treat it specially and try several variations of solving the problem.

Posted by: Michal Illich at January 21, 2005 07:50 AM

I was kinda wondering, have Yahoo, Google, or MSN, or any other involved party contacted the World Wide Web Consortium [http://w3c.org] for support?

Or at least to officially tell them about 'rel="nofollow"' to at least get the argument started?

I'm all for innovation and not letting standards bodies hold back innovation, but please remember all past lessons about the detrimental effect of runaway nonstandards. And besides, even if the W3C decides to reject 'the proposal' (in quotes because it has already been implemented), sooner or later they will also have to support it because of its ubiquity (assuming it becomes ubiquitous).

Posted by: fERDI:) at January 21, 2005 08:00 AM

General blogger response does not look too favourable:

http://www.platinax.co.uk/news/archives/2005/01/new_nofollow_ta.html

It's also a genuine shame that the nofollow tag is touted as a solution for bloggers - when in reality it primarily serves search engines - yet this is plainly not admitted to.

Blogs will still be commented with crap - simply that the search engines will not always have to look at it.

A little more openness and less spin would have been much better received - if the search engine standpoint were more openly explained, and webmaster help invited on that basis, then I'm sure bloggers would have been happy to help.

At the moment, they have simply been made to feel as if the high-profile of the blogosphere is being directly threatened by the devaluation of the bona-fide links between blogs.

Posted by: Brian at January 25, 2005 02:07 AM

Today I was talking to my friends about this silly measure. Silly because, at a table of 5 friends, 4 immediately thought of adding a nofollow tag to all external links in order to boost their ranks.

Yahoo still has time to leave the group and have its own measures to avoid spam.

Today I saw the collateral damage of this measure: a paid web directory changing all links, more than 50,000 contacts, to nofollow.

What do you have to say about this collateral damage?

Posted by: Hugo Romano at January 26, 2005 06:51 PM

Ok, since there are a lot of folks here who are against the "NOFOLLOW" tag, what do you suggest?

Mind you whatever solution you come up with must:
* Be able to differentiate a given link as being spam even though the link itself may not indicate its nature. (e.g. someone may actually be discussing various online card gaming sites or herbal organ supplements)
* Be able to be implemented with minimal effort by non-technical individuals. Many over-spammed blogs and forums are abandoned or out of date yet get just as much traffic now (from spammers) as they did previously.
* Require minimal processing effort from search engines. With billions of pages requiring indexing and constant demands for "freshness" by folks, spiders really can't afford to waste a lot of CPU making fuzzy logic determinations or doing very complex linguistic associations. Heck, having them even look for an attribute tag can be a burden. They simply don't have the time to do it.

Seriously, does anyone who's opposed to the NOFOLLOW tag have a better idea that's more implementable?

(For what it's worth, I keep a whitelist of URLs commenters and posters use. Whitelisted URLs don't get the NOFOLLOW treatment, but I also don't get a ton of links so keeping up with them is fairly simple.)

Posted by: jr at January 28, 2005 09:35 AM
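
The whitelist approach that jr, Earle, and ceejayoz mention in this thread might look something like the sketch below. It is a guess at the general shape, not anyone's actual plugin; `TRUSTED_HOSTS`, `render_comment_link`, and the host `friend.example.org` are hypothetical. Links to whitelisted hosts are emitted normally, and everything else gets rel="nofollow".

```python
from html import escape
from urllib.parse import urlsplit

# Hypothetical whitelist of hosts the blog owner has chosen to trust.
TRUSTED_HOSTS = {"friend.example.org"}


def render_comment_link(url, text, trusted_hosts=TRUSTED_HOSTS):
    """Render a commenter's link, adding rel="nofollow" unless the host is whitelisted."""
    host = (urlsplit(url).hostname or "").lower()
    rel = "" if host in trusted_hosts else ' rel="nofollow"'
    return f'<a href="{escape(url, quote=True)}"{rel}>{escape(text)}</a>'


print(render_comment_link("http://friend.example.org/blog/", "my post"))
print(render_comment_link("http://spammer.example.com/", "buy now"))
```

Matching on the host rather than the full URL keeps the list short, at the cost of trusting every page on a whitelisted site.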

Seems to me that the SEs need to come up with a better way of computing PR that is not so dependent on text links, don't you think?

Posted by: M. Anthony at January 31, 2005 02:12 PM

If Yahoo is such a big believer, why is it not implemented on this Yahoo Blog???????

IT'S NOT IMPLEMENTED ON THIS BLOG

Posted by: Guest at February 1, 2005 06:41 AM
Disclaimer and Reminder. The opinions expressed here are not necessarily the opinions of Yahoo! and we assume no responsibility for such content. Yahoo! may, in our sole discretion, remove comments that are off topic, inappropriate or otherwise violate our Terms of Service. Please do not post any private information unless you want it to be available publicly and never assume that you are completely anonymous and cannot be identified by your comments.

Copyright © 2004 Yahoo! Inc. All rights reserved.