Anti-spam solutions rely on a variety of increasingly sophisticated techniques to block spam
Anti-spam solutions use a variety of techniques to check the contents of e-mail, gathering information from all parts of the message including the header, body, and any attachments. A basic technique of spam filtering involves checking the header of the message for the IP address of the original sender and comparing it to a whitelist or blacklist. Blacklists are lists of addresses of known spammers, and whitelists are lists of senders whose e-mail should be allowed through even if it appears to be spam. The filter may also look for signs that the message header has been forged to hide the original sender.
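The header check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `WHITELIST` and `BLACKLIST` networks are hypothetical documentation-range addresses, the helper names are invented for this example, and the sketch trusts the bottom-most `Received` header, which a real filter would not do without also checking for forgery.

```python
import ipaddress
import re
from email import message_from_string

# Hypothetical lists for illustration; real products use maintained sources.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]
BLACKLIST = [ipaddress.ip_network("203.0.113.0/24")]

# Matches an IPv4 address in square brackets, as commonly seen in Received headers.
IP_PATTERN = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def originating_ip(raw_message):
    """Return the IP from the earliest (bottom-most) Received header, or None."""
    msg = message_from_string(raw_message)
    for header in reversed(msg.get_all("Received") or []):
        match = IP_PATTERN.search(header)
        if match:
            try:
                return ipaddress.ip_address(match.group(1))
            except ValueError:
                continue  # looked like an IP but is not a valid one
    return None


def check_sender(raw_message):
    """Classify the sender as 'allow', 'block', or 'unknown'."""
    ip = originating_ip(raw_message)
    if ip is None:
        return "unknown"
    if any(ip in net for net in WHITELIST):
        return "allow"  # whitelist wins, even if the message looks like spam
    if any(ip in net for net in BLACKLIST):
        return "block"
    return "unknown"


SAMPLE = (
    "Received: from mx.example.net (mx.example.net [198.51.100.7])\n"
    "\tby mail.example.org; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "Received: from sender.example (unknown [203.0.113.45])\n"
    "\tby mx.example.net; Mon, 1 Jan 2024 00:00:00 +0000\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)

print(check_sender(SAMPLE))
```

The whitelist is consulted first so that an approved sender is never blocked by a later rule, which matches the role whitelists play in the products discussed here.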
Content checking is the basis for most anti-spam technology, and includes simple filtering based on certain words, attachment types (such as MP3 files), signatures, heuristics, and statistical analysis. The problem with simple filtering -- filtering all messages containing the words “Viagra” or “spam,” for example -- is twofold: legitimate messages may contain those words, and spammers can easily disguise them, either by deliberate misspelling (“V!agra”) or by using an HTML message and inserting invisible characters between the visible ones. The same is true of signatures, where the filter looks for content similar to known spam. The tools spammers use to conceal or obfuscate the content that filters look for continue to become more sophisticated.
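To make the weakness of simple word filtering concrete, here is a minimal sketch of such a filter. The function name and the blocked-word list are assumptions made for this example, not taken from any product.

```python
import re

# Hypothetical blocked-word list, using the two words from the text.
BLOCKED_WORDS = {"viagra", "spam"}


def naive_keyword_filter(text):
    """Return True if any whole word in the message is on the blocked list."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)


samples = [
    "Buy Viagra today",            # plain spam wording
    "Buy V!agra today",            # deliberate misspelling
    "Buy Vi<!-- -->agra today",    # HTML comment hidden inside the word
    "Our spam filter is working",  # legitimate message that mentions spam
]

for sample in samples:
    print(naive_keyword_filter(sample), "-", sample)
```

The first sample is caught, the two obfuscated variants slip through, and the legitimate message is flagged, which are the two failure modes described above.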
As spammers become more adept at bypassing filters, anti-spam vendors must find more sophisticated methods of detecting spam. Heuristic filtering uses a series of rules to score an e-mail: a message might get one point for containing the word “Viagra,” one point for a “click here” link, one point for a price ($19.99), one point for a “click here to unsubscribe” link, and one point for a URL that points to a known spam site. A score of three or more might get the message quarantined.
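The five example rules and the three-point threshold translate directly into a small scoring function. This is a sketch only: the regular expressions, the `KNOWN_SPAM_HOSTS` set, and the decision not to count a “click here to unsubscribe” link twice are assumptions made for this example.

```python
import re

# Hypothetical list of known spam hosts.
KNOWN_SPAM_HOSTS = {"spam-site.example"}
QUARANTINE_THRESHOLD = 3


def links_to_known_spam_site(text):
    """True if any URL in the text points at a host on the known-spam list."""
    hosts = re.findall(r"https?://([^/\s]+)", text, re.IGNORECASE)
    return any(host.lower() in KNOWN_SPAM_HOSTS for host in hosts)


# Each rule is worth one point, as in the example above.
RULES = [
    ("mentions Viagra", lambda t: re.search(r"viagra", t, re.I) is not None),
    # The negative lookahead keeps an unsubscribe link from also scoring here.
    ("click here link",
     lambda t: re.search(r"click here(?! to unsubscribe)", t, re.I) is not None),
    ("contains a price", lambda t: re.search(r"\$\d+(?:\.\d{2})?", t) is not None),
    ("unsubscribe link",
     lambda t: re.search(r"click here to unsubscribe", t, re.I) is not None),
    ("known spam URL", links_to_known_spam_site),
]


def score_message(text):
    """Return the total score and the names of the rules that fired."""
    fired = [name for name, rule in RULES if rule(text)]
    return len(fired), fired


def should_quarantine(text):
    score, _ = score_message(text)
    return score >= QUARANTINE_THRESHOLD


spam = "Viagra for only $19.99! Click here: http://spam-site.example/buy"
newsletter = "Monthly newsletter. Click here to unsubscribe."

print(score_message(spam), should_quarantine(spam))
print(score_message(newsletter), should_quarantine(newsletter))
```

Keeping the rules in a list makes it easy to add rules or adjust the threshold, which is how scoring filters are typically tuned.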
Bayesian filtering uses statistical analysis to detect spam. It looks at a message's content and assigns a probability that the message is spam, based on how often similar content has appeared in messages already classified as spam or as not spam. Thus, a message containing “Viagra” would have a certain probability of being spam, but a message containing “V!agra” would have a much higher probability of being spam, since legitimate e-mail would be unlikely to include the misspelling. Likewise, a message containing “RFP” (request for proposal) or “process,” for example, would have a very high probability of being legitimate mail.
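A toy naive Bayes classifier shows the idea. Everything here is an assumption made for illustration: the class name, the tokenizer, the use of add-one smoothing, and above all the six-message training set, which is far too small to say anything about real mail.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace, trimming surrounding punctuation."""
    tokens = (tok.strip(".,;:?\"'()") for tok in text.lower().split())
    return [tok for tok in tokens if tok]


class NaiveBayesFilter:
    """Minimal multinomial naive Bayes with add-one (Laplace) smoothing."""

    LABELS = ("spam", "ham")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            # Smoothed prior, so an untrained class never yields log(0).
            log_score[label] = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                if token not in vocab:
                    continue  # words never seen in training carry no evidence
                count = self.word_counts[label][token]
                log_score[label] += math.log((count + 1) / (total_words + len(vocab)))
        # Convert the two log scores into a probability of spam.
        top = max(log_score.values())
        weights = {label: math.exp(score - top) for label, score in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


# Hypothetical training messages.
bayes = NaiveBayesFilter()
for message in ("cheap v!agra click here",
                "v!agra best price click here",
                "viagra discount click here"):
    bayes.train(message, "spam")
for message in ("please review the rfp process",
                "doctor prescribed viagra for the patient",
                "rfp deadline and process update"):
    bayes.train(message, "ham")

for probe in ("viagra", "v!agra", "rfp process"):
    print(probe, round(bayes.spam_probability(probe), 3))
```

With this training set, “v!agra” scores higher than “viagra” because the misspelling appears only in the spam examples, while “rfp process” scores low because those words appear only in legitimate mail.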
All the products I tested use a combination of these techniques. Some combine heuristics with blacklists, or Bayesian analysis with source IP checking. As my tests show, today’s commercial products are extremely good at identifying spam. But even the best of them ultimately block some legitimate mail, if only because marketing e-mails and newsletters that people want have the same characteristics as those they do not want. All enterprise anti-spam solutions include whitelists, because no system is perfect.