January 15, 1999

Some congressmen are trying to put an end to junk E-mail. A bunch of computer scientists in Redmond may beat them to it.

Spam killers

By William Baldwin

JUNK E-MAIL courses through the Internet, clogging our computers and diverting attention from mail we really want. America Online estimates it is the unwilling bearer of at least a million pieces of spam per day.

The counterattack has been mounted. At least seven bills pending in Congress, including one that has passed the Senate, seek to outlaw spam. AOL has filed a flurry of lawsuits against spam senders, charging trespass and other offenses. Some Internet service providers have compiled a blacklist of known spammers and block all mail coming from their addresses.

Will any of these defenses solve the problem? Doubtful. One difficulty is that junk mailers can go out of business and reappear a day later with different corporate names and different E-mail addresses. Spam may also prove to be too elusive for the federal statute writers. Just what constitutes "unsolicited commercial E-mail," as the stuff is officially known? To get this story, a FORBES reporter sent out several unsolicited E-mails to experts. Was that spam? After all, magazine publishing is a commercial venture. What about the banner ads on our Web site? Spam?

The taming of the spam monster may come from entrepreneurs, not government. Researchers working at Microsoft's Redmond, Wash., headquarters have come up with intriguing ways to turn the computational power of a PC against the crap that people seek to put into it. They have built several versions of a filter that zaps spam while allowing legitimate mail to come through. In essence, the filter corners an offending piece of text in a multidimensional vector space where there are no hiding places.

Can such a filter do the job it is supposed to do? Early results look promising. David Heckerman, one of the scientists working on the project, has the filter running on his own mail, where it misclassifies only 0.1% of the legitimate mail while managing to extinguish 91% of what Heckerman considers junk. Says Eric Horvitz, another Microsoft scientist, "I am fairly convinced that the methods we are proposing will effectively kill spam." He expects the technology to turn up in a coming version of Microsoft's Outlook Express, the E-mail software component of Internet Explorer.
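To see what those two rates mean in practice, here is the arithmetic for a hypothetical mailbox. The daily volumes (1,000 legitimate messages, 200 spams) are made up for illustration; only the 0.1% and 91% figures come from Heckerman's experience.

```python
def filter_outcome(legit_count, spam_count, false_positive_rate=0.001, catch_rate=0.91):
    """Expected results for one batch of mail, given the two rates quoted above."""
    lost_legit = legit_count * false_positive_rate   # good mail wrongly zapped
    caught_spam = spam_count * catch_rate            # junk correctly zapped
    missed_spam = spam_count - caught_spam           # junk that still gets through
    return lost_legit, caught_spam, missed_spam

# Hypothetical daily volumes, not figures from the article.
lost, caught, missed = filter_outcome(1000, 200)
print(f"{lost:.0f} legitimate lost, {caught:.0f} spams caught, {missed:.0f} spams missed")
```

At those volumes, about one good message a day would land in the junk pile, and roughly one spam in eleven would still get through.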

It is a simple matter to write an E-mail filter that looks for signs of spam, such as messages containing telltale words like "FREE!" or including suspicious patterns in the sending address. Indeed, a few such filters are on the market, including Spam Buster from Contact Plus and the current version of Outlook. But the simple filters run into two problems. One is that it's difficult to zap an advertisement containing "FREE!" without running the risk of deleting a legitimate E-mail containing "free for lunch?" The other is that spammers mutate like pathogenic bugs, changing come-ons slightly to get past the filters.
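Both weaknesses of a simple keyword filter are easy to demonstrate. The sketch below uses a hypothetical word list, not the rules of any product named here, and flags any message containing a telltale word. It catches the obvious ad, wrongly zaps the lunch invitation and misses a lightly disguised come-on.

```python
# Hypothetical telltale words; commercial filters used their own rules.
SPAM_WORDS = ("free", "guaranteed", "click here")

def naive_is_spam(message: str) -> bool:
    """Flag a message if any telltale word appears anywhere in it, ignoring case."""
    text = message.lower()
    return any(word in text for word in SPAM_WORDS)

print(naive_is_spam("FREE! Click here for your guaranteed prize"))  # True: the real spam
print(naive_is_spam("Are you free for lunch?"))                     # True: a false alarm
print(naive_is_spam("FR-EE offer, cl1ck h3re"))                     # False: mutated spam slips by
```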


Scientist Heckerman uses the spam filter himself: "I feel like a great weight has been lifted from my shoulders."


The experimental Microsoft program deals with the first problem by looking at a lot of variables at once. It takes a constellation of symptoms to trigger the diagnosis of spam—some having to do with the words in a message and some having to do with its appearance (for example, a high percentage of special characters like ! and $).
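Here is a rough sketch of how one message might be turned into a handful of numbers a program can weigh together. The three features (share of special characters, count of promotional phrases, a flag for mail sent in the wee hours) echo symptoms mentioned in this story; the phrase list and the midnight-to-6 a.m. window are assumptions, and the real filter reportedly tracks a couple hundred features.

```python
# Hypothetical phrase list and "wee hours" window, chosen for illustration.
PROMO_PHRASES = ("free!", "must be 21", "act now")
WEE_HOURS = range(0, 6)  # midnight through 5:59 a.m.

def extract_features(body: str, sent_hour: int) -> list[float]:
    """Map a message to [special-character share, promo-phrase count, wee-hours flag]."""
    visible = [ch for ch in body if not ch.isspace()]
    special = sum(1 for ch in visible if not ch.isalnum())  # e.g. ! and $
    special_share = special / len(visible) if visible else 0.0
    lowered = body.lower()
    promo_count = float(sum(phrase in lowered for phrase in PROMO_PHRASES))
    wee_hours = 1.0 if sent_hour in WEE_HOURS else 0.0
    return [special_share, promo_count, wee_hours]

print(extract_features("FREE!!! $$$ Must be 21 or over", sent_hour=3))  # [0.25, 2.0, 1.0]
print(extract_features("Free for lunch?", sent_hour=12))
```

No single number decides anything here; the word "free" in the lunch note, for instance, scores zero on the promotional-phrase feature because it lacks the exclamation point.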

The program will deal with the evolution of spam over time by mutating itself. It is an example of "machine learning," in which software learns from experience and puts that learning to work in detecting patterns. In this case the software would come to your PC pretrained on a set of several thousand real-world spams and legitimate messages. It would keep learning, observing the pile of legitimate E-mail you may have sitting in a folder and also a folder of trashed spam, if you happen to have one of those. In this fashion the spam-blocker would become sensitive to your needs; if you work on Madison Avenue, it would learn to be more tolerant of exclamation points. And it might stay one step ahead of the spammers. Says Horvitz, "It's going to be hard for a guy sitting in a spam center to keep up with the gazillions of filters out there."
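The story does not say exactly how the Microsoft filter updates itself. As a stand-in, the sketch below uses the classic perceptron rule, a far simpler learning method than the plane-fitting approach described later: whenever the filter misjudges a message from your saved-mail folder or your spam trash, it nudges its weights toward the right answer. The starting weights and feature values (special-character share, promo-phrase count, wee-hours flag) are invented to mimic the Madison Avenue case, where a legitimate, exclamation-heavy memo is at first mistaken for spam.

```python
def score(weights, bias, features):
    """Positive score means the filter calls the message spam."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def perceptron_update(weights, bias, features, label, rate=0.1):
    """label is +1 for spam, -1 for legitimate; weights change only on a mistake."""
    if label * score(weights, bias, features) <= 0:
        weights = [w + rate * label * x for w, x in zip(weights, features)]
        bias += rate * label
    return weights, bias

def keep_learning(weights, bias, saved_mail, trashed_spam, passes=10):
    """Re-check the user's two folders and correct any message the filter misjudges."""
    for _ in range(passes):
        for features in saved_mail:
            weights, bias = perceptron_update(weights, bias, features, -1)
        for features in trashed_spam:
            weights, bias = perceptron_update(weights, bias, features, +1)
    return weights, bias

# Hypothetical factory settings: special-character share weighs heavily.
pretrained_w, pretrained_b = [2.0, 1.0, 1.0], -0.5
ad_exec_memo = [0.3, 0.0, 0.0]   # lots of exclamation points, but legitimate
real_spam = [0.3, 2.0, 1.0]

print(score(pretrained_w, pretrained_b, ad_exec_memo) > 0)   # True: misfiled as spam
w, b = keep_learning(pretrained_w, pretrained_b, [ad_exec_memo], [real_spam])
print(score(w, b, ad_exec_memo) > 0, score(w, b, real_spam) > 0)  # False True
```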

The bad guys are arrayed against an interesting collection of talents at Microsoft. Horvitz, 40, has an M.D. from Stanford. Why would a software firm want to hire a doctor? Horvitz, who also has a Ph.D. in decision science, has thought long and hard about diagnostic logic. You must detect a disease in a constellation of symptoms, no single one of which provides the answer. The problem is closely analogous to the spam problem. Among Horvitz's colleagues are Susan Dumais, an experimental psychologist hired away from Bellcore, and John Platt, who concocts mathematical tricks.

The fine line that separates good from evil

What distinguishes trash from desirable E-mail? A subtle interplay of many features, such as the frequency of characters that are neither numbers nor letters, the tendency to be sent in the wee hours and the occurrence of promotional language like "Must be 21 or over." Software in development at Microsoft looks at telltale signs like these, plus a few hundred others, in a test bed of several thousand real E-mails. Then, in the recesses of 200-dimensional space, it aims to locate a hyperplane that does the best job of segregating good from bad. Throw a new E-mail at the program and it tells you whether the message lies on the good or bad side of the plane.
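Once the plane has been found, judging a new message is cheap: take a weighted sum of its features, add an offset and look at the sign. The sketch below does that in three dimensions instead of 200 and also reports how far the message sits from the plane. The weights, offset and feature values are hypothetical, not taken from Microsoft's filter.

```python
import math

def side_of_plane(weights, bias, features):
    """Classify by the sign of w.x + b and return the signed distance to the plane."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    distance = score / math.sqrt(sum(w * w for w in weights))
    return ("spam" if score > 0 else "good"), distance

# Hypothetical plane over [special-character share, promo-phrase count, wee-hours flag].
weights, bias = [4.0, 2.0, 4.0], -3.0
print(side_of_plane(weights, bias, [0.25, 2.0, 1.0]))  # ('spam', 1.0)
print(side_of_plane(weights, bias, [0.08, 0.0, 0.0]))  # ('good', roughly -0.447)
```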

These are abstruse mathematical tricks indeed, but the diagram (see chart) gives you some idea of how they work. In this illustration of what might set spam apart from good mail, only three features are shown. In the real spam filter, the list of features runs to 200 or so, meaning that the formulas dance around in a 200-dimensional vector space.

Forget, for the moment, about the other 197 features, and look at just the three. If possible you want to cut the space in two so that all the spam is on one side of the plane. And you want a margin of safety around the plane, to minimize the chance that messages will fall on the wrong side. So, where do you put the plane?
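The "margin of safety" can be measured directly: for a candidate plane, find the labeled message that sits closest to it, counting the distance as negative if that message is on the wrong side. The sketch below scores three hypothetical planes on four made-up messages and keeps the one with the widest margin. Real training searches over all possible planes rather than a short list; the data and candidates here are assumptions for illustration.

```python
import math

def margin(weights, bias, examples):
    """Smallest signed distance y*(w.x + b)/|w| over labeled examples.

    Labels are +1 for spam and -1 for good mail; a negative result means at
    least one message falls on the wrong side of the plane.
    """
    norm = math.sqrt(sum(w * w for w in weights))
    return min(
        label * (sum(w * x for w, x in zip(weights, feats)) + bias) / norm
        for feats, label in examples
    )

# Made-up messages: [special-character share, promo-phrase count, wee-hours flag].
examples = [
    ([0.30, 2.0, 1.0], +1),
    ([0.25, 1.0, 1.0], +1),
    ([0.05, 0.0, 0.0], -1),
    ([0.10, 0.0, 1.0], -1),  # a legitimate note written late at night
]
candidates = {
    "A": ([1.0, 1.0, 0.0], -0.5),
    "B": ([0.0, 0.0, 1.0], -0.5),  # leans entirely on the hour sent
    "C": ([0.0, 1.0, 0.0], -0.5),
}
for name, (w, b) in candidates.items():
    print(name, round(margin(w, b, examples), 3))  # A 0.283, B -0.5, C 0.5
best = max(candidates, key=lambda name: margin(*candidates[name], examples))
print("widest margin:", best)  # C
```

Plane B illustrates why the margin matters: it puts the late-night legitimate note on the spam side, so its margin is negative.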

One way of getting the answer has to do with the classic "Lagrange method of undetermined multipliers," named after the 18th-century French mathematical genius Joseph-Louis Lagrange. Clever though he was, Lagrange failed to anticipate how slowly his elegant equations would compute when late-20th-century PCs tried to tackle E-mail. Earlier this year John Platt came to the rescue with an ingenious shortcut that speeds up the work by a factor of 1,000, according to an academic paper he published. He and his colleagues have since come up with an even better scheme, he says, but this one will stay a trade secret for now.
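Platt's published shortcut appears to be Sequential Minimal Optimization (SMO), from his 1998 work on training support vector machines. Finding the widest-margin plane is posed as a problem with one Lagrange multiplier per training message, and SMO solves it by adjusting just two multipliers at a time, a step small enough to work out with simple algebra instead of a general-purpose optimizer. The sketch below is the simplified teaching version of that idea, not Platt's or Microsoft's code: it picks the second multiplier at random rather than with Platt's heuristics, uses plain dot products (a linear kernel) and trains on a toy one-feature data set.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def train_smo(X, y, C=10.0, tol=1e-4, max_passes=20, max_iter=10_000, seed=0):
    """Simplified SMO for a linear SVM. Labels y must be +1 (spam) or -1 (good)."""
    rng = random.Random(seed)
    n = len(X)
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]  # pairwise dot products
    alpha = [0.0] * n  # one Lagrange multiplier per training message
    b = 0.0

    def f(i):
        """Current decision value w.x_i + b, written in terms of the multipliers."""
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b

    passes = iters = 0
    while passes < max_passes and iters < max_iter:
        iters += 1
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Only touch multipliers that violate the optimality (KKT) conditions.
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.randrange(n - 1)
                if j >= i:
                    j += 1  # any index other than i
                E_j = f(j) - y[j]
                ai_old, aj_old = alpha[i], alpha[j]
                # Bounds that keep sum(alpha * y) = 0 and 0 <= alpha <= C.
                if y[i] != y[j]:
                    L, H = max(0.0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                else:
                    L, H = max(0.0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                eta = 2 * K[i][j] - K[i][i] - K[j][j]
                if L == H or eta >= 0:
                    continue
                # Closed-form optimum for alpha_j along the constraint line, then clip.
                aj = min(H, max(L, aj_old - y[j] * (E_i - E_j) / eta))
                if abs(aj - aj_old) < 1e-5:
                    continue
                ai = ai_old + y[i] * y[j] * (aj_old - aj)
                b1 = b - E_i - y[i] * (ai - ai_old) * K[i][i] - y[j] * (aj - aj_old) * K[i][j]
                b2 = b - E_j - y[i] * (ai - ai_old) * K[i][j] - y[j] * (aj - aj_old) * K[j][j]
                alpha[i], alpha[j] = ai, aj
                b = b1 if 0 < ai < C else b2 if 0 < aj < C else (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0

    # Recover the plane's weights from the multipliers.
    w = [sum(alpha[k] * y[k] * X[k][d] for k in range(n)) for d in range(len(X[0]))]
    return w, b

# Toy one-feature data: positive values are spam (+1), negative are good mail (-1).
X = [[1.0], [2.0], [-1.0], [-2.0]]
y = [1, 1, -1, -1]
w, b = train_smo(X, y)
print(w, b)  # expected to be close to [1.0] and 0.0: the plane sits midway
```

Only the messages nearest the plane (here the ones at +1 and -1) should end up with nonzero multipliers; these are the "support vectors" that give the method its name.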

There is immense commercial value in these algorithms. If you can find the attributes that define spam, you can move on to solve other "cluster" problems in artificial intelligence. Divine from the features on a mortgage application whether this borrower belongs in the deadbeat cluster. Or: Slice the population of 1040s so that you can pluck out the 1% that ought to be audited. Or: Decide whether a document is more about weather or about commodities.

Horvitz and his colleagues are already looking at a text classification problem very close to the spam one. They think they can separate pornography from innocent text. As with spam, so with obscenity: You can't define it, but you know it when you see it. If scientists can train computers to recognize pornography, that's one more thing that congressmen shouldn't have to worry about.

© 1998 Forbes Inc.

Forbes, September 21, 1998

On The Cover:
The Markets: Armageddon
-How now, Dow?

-Whispering the "D" word

-When the music stops

-A splash of cold water

-Playing the losers

-Riding out the storm

-Beating the IRS

-Rush in to Russia

The Top Entertainers:
-Idea moguls

-The Top 40 list

-Profit disaster

The Shipping News
-Titanic payoff

-Wheeling and dealing

Pixel Envy
-Digital dreamin'

-The Rap Pack

Bad Boy Biz
-Huff behind the Puff

-Biggest show on earth

Soundtrack Maina
-The Sound of Money

-Profits by the gross

The Springer Empire
-Fights, Camera, Action!-and Cash

-Rock the Net

Booty Hunter
-Lara's big numbers

-"Yo, Tokyo, wanna buy a ham?"

Casting Call
-Bits for bit parts

-"It was a dark and stormy night"

Hollywood Rip-Off
-Stripped of cash

Investment Columnists:
Point of View
-Advice to my daughters

Stock Trends
-Stay bullish

Portfolio Strategy
-Stay cool

The Contrarian
-The tide has turned

Stock Trends
-Hard times ahead for index funds

Market Trends
-I'll stick with growth stocks

Management, Strategies, Trends:
-Captain Ahab returns

-A gazelle, not a Godzilla

-The un-PC

-Capitalist chain gangs

-It'S A Gusher!

-Rx: Wild oats

-"It's like, 'Duh'"

-The book on Borders

-Please pass the ants

-California designin'

-(Re)birth in Bethlehem

-Small frog, huge pond

-A sweeter deal

-Benign growth

-Gearheads wanted

-Magna's story

-Fighting terrorists

-The flying sage

-Is Informix back?

-Thiokol's booster

Entrepreneurs:
Up & Comers
-A pinch of hype

Follow the Money/Ventuure Capital
-Greed kills

-Local heroes

International:
-Beer boys

-That's Italian

-An unlikely cop

Charticle:
-Underground nomad

Law & Issues:
-Where there's smoke...

On the Docket
-Supreme illiteracy

Technology:
-Spam killers

Digital Tools
-The family in-box

-War of the Web

-Blackout 2000

-Alcohol-free

Departments:
-Side Lines

-Follow-Through

-Flashbacks

On My Mind
-The roads less traveled

-Readers Say

-Fact and Comment

-Other Comments

-Commentary

Digital Rules
-Keep your eyes on the prize

-Transparent Eyeball

-Forbes Index

-Thoughts

Money & Investments:
The Funds
-The Vanguard of Texas

-Good for whom?

-Safe harbor

Taxing Matters
-Roth, optimized

-Streetwalker

-The Forbes/Barra Wall Street Review

Columnists:
Observations
-Bundled up

Looking Forward
-The spirit of play

Backseat Driver
-The wrong medicine

Management Strategies
-How to foul up a merger

Insights
-Sex and government

The Forbes Life:
The Technologist
-Wired Victorians

Collections
-Beacons of bucks

Re-Viewing
-Saving the Sullivans

Journeys
-Haute hotels

Small Business is Big Business at Forbes