

Researchers Track Down a Plague of Fake Web Pages

Stuart Isett for The New York Times

Yi-Min Wang of Microsoft helped trace the source of the Web pages.

Published: March 19, 2007

Correction Appended

Tens of thousands of junk Web pages, created only to lure search-engine users to advertisements, are proliferating like billboards strung along freeways. Now Microsoft researchers say they have traced the companies and techniques behind them.

A technical paper published by the researchers says the links promoting such pages are generated by a small group of shadowy operators, apparently with the acquiescence of some major advertisers, Web page hosts and advertising syndicators. The report is available at www.cs.ucdavis.edu/~hchen/paper/www07.pdf.

The finding is striking because it hints at the possibility of curbing the practice.

The researchers uncovered a complex scheme in which a small group creates false doorway pages and works with operators of Web-based computers, who profit by redirecting traffic arriving from search engines in one direction and passing advertisements acquired from syndicators back in the other.

“A small number of rogue actors who know what they are doing can create an enormous amount of disruption,” said David L. Sifry, chief executive of Technorati, a blog-indexing company that works to keep junk pages of this sort out of its indexes. “It’s sort of like putting a blindfold on you and spinning you around three times and then taking off the blindfold and showing you an ad.”

Using questionable or illegal techniques to improve the ranking of a Web site in query results is known as search-engine spamming. The practice has proved to be a vexing problem for the major search companies, which struggle to prevent both spammers and companies specializing in improving legitimate clients’ Web traffic — a field known as search-engine optimization — from undermining their page-ranking systems.

Surprisingly, the researchers noted that the vast bulk of the junk listings came from just two Web-hosting companies and that as many as 68 percent of the advertisements sampled were placed by just three advertising syndicators.

Search-engine spam is a small but growing component of the overall spam problem, which is predominantly junk e-mail sent from millions of Internet-connected home PCs that have been infected with malicious software. The overall amount of e-mail spam has more than doubled in the last year, according to Postini, a communications security firm.

Mr. Sifry said search-engine spam might be more controllable because of the improved accountability of the Web. “I am actually optimistic about squashing all of this, or at least making sure that it is manageable,” he said.

The Microsoft paper was distributed by Yi-Min Wang and Ming Ma, cybersecurity investigators in the company’s research division, in collaboration with Yuan Niu and Hao Chen, computer scientists at the University of California, Davis.

The researchers found that for some keywords like “drugs” and “ring tone,” more than 30 percent of the results from major search engines were fake pages created by spammers.

They discovered that the average spam density — a measure of the percentage of Web pages that contain only advertisements — was 11 percent for 1,000 keywords they used in their research.
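To make the arithmetic behind such a figure concrete, here is a minimal sketch in Python. It assumes spam density is computed per keyword as the share of search results flagged as spam, then averaged across keywords. The function names and the sample data are hypothetical illustrations, not the researchers' code or measurements.

```python
from statistics import mean


def spam_density(flags):
    """Percentage of results flagged as spam for one keyword.

    `flags` is a list of booleans, one per search result
    (True means the result was judged to be a spam page).
    """
    if not flags:
        return 0.0
    return 100.0 * sum(1 for is_spam in flags if is_spam) / len(flags)


def average_spam_density(flags_by_keyword):
    """Unweighted mean of the per-keyword spam densities."""
    return mean(spam_density(flags) for flags in flags_by_keyword.values())


# Hypothetical sample: three keywords, ten results each (made-up data).
sample = {
    "drugs": [True, True, True] + [False] * 7,
    "ring tone": [True, True] + [False] * 8,
    "weather": [False] * 10,
}

for keyword, flags in sample.items():
    print(f"{keyword}: {spam_density(flags):.1f}% spam")
print(f"average: {average_spam_density(sample):.1f}%")
```

Averaging per-keyword percentages weights every keyword equally, however many results each returned; pooling all results before dividing would be a different, equally plausible definition.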

The researchers said large advertisers were to blame for a significant share of the spam problem.

“Ultimately, it is advertisers’ money that is funding the search-spam industry, which is increasingly cluttering the Web with low-quality content and reducing Web users’ productivity,” they write in the paper, which will be presented in May at the International World Wide Web Conference in Banff, Alberta.

Mr. Wang, group manager and senior researcher for cybersecurity and systems management at Microsoft, said, “The good guys are part of the problem.”

The researchers’ specific findings included evidence that some blog-hosting services have permitted an explosion of phony doorway pages. For example, the researchers noted that such pages were far more prevalent in Google’s blogspot.com service than in other hosting domains. The Microsoft Research team has worked extensively with the managers of Microsoft’s Spaces blog-hosting service to detect and identify search-engine spam, Mr. Wang said. Google would not comment for the record on its own efforts to combat such practices.

The Microsoft research findings, based on a survey in October, also determined that much of the spam ad traffic was being funneled through the Internet addresses of just two Web-hosting companies.

Phillip Rosenthal, chief technology officer of one of the companies, ISPrime, an Internet services company based in New York, said the activity had been traced to a single customer and violated the company’s acceptable-use policy. He said the company’s relationship with the customer, whom he would not identify, had been severed after the company was notified about the Microsoft paper by a reporter.

But he was also pessimistic about permanently stopping operators who subvert search engines to gain advertising revenue in this way.

Correction: March 20, 2007

An article in Business Day yesterday about efforts by Microsoft researchers to trace the origins of fake Web pages that lure search-engine users to advertisements misspelled the surname of a cybersecurity investigator for the company. He is Ming Ma, not Ming Wa.
