Anti-spam solutions rely on a variety of increasingly sophisticated techniques to block spam
Anti-spam solutions use a variety of techniques to check the contents of e-mail, gathering information from all parts of the message including the header, body, and any attachments. A basic technique of spam filtering involves checking the header of the message for the IP address of the original sender and comparing it to a whitelist or blacklist. Blacklists are lists of addresses of known spammers, and whitelists are lists of senders whose e-mail should be allowed through even if it appears to be spam. The filter may also look for signs that the message header has been forged to hide the original sender.
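The whitelist and blacklist lookup described above can be sketched in a few lines of Python. This is only an illustration: the list contents are hypothetical (the addresses come from ranges reserved for documentation), the function name `classify_sender` is invented here, and a real filter would first have to extract the sender's IP address from the message headers.

```python
import ipaddress

# Hypothetical lists for illustration; real products maintain far larger ones.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]
BLACKLIST = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.7/32"),
]


def classify_sender(ip: str) -> str:
    """Return 'allow', 'block', or 'scan' for the sender's IP address."""
    addr = ipaddress.ip_address(ip)
    # The whitelist is checked first so that trusted senders get through
    # even if they also appear on a blacklist.
    if any(addr in net for net in WHITELIST):
        return "allow"
    if any(addr in net for net in BLACKLIST):
        return "block"
    # Unknown senders are handed on to content checking.
    return "scan"
```

Checking the whitelist first is a design choice in this sketch, reflecting the idea that whitelisted mail should pass even when it otherwise looks like spam.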
Content checking is the basis for most anti-spam technology, and includes simple filtering based on certain words, attachment types (such as MP3 files), signatures, heuristics, and statistical analysis. Simple filtering (blocking all messages containing the words “Viagra” or “spam,” for example) has two problems: legitimate messages may contain those words, and spammers can easily alter them, either by deliberate misspelling (“V!agra”) or by sending an HTML message with invisible characters inserted between the visible ones. The same is true of signatures, where the filter looks for content similar to known spam. The tools spammers use to conceal or obfuscate the content that filters look for continue to become more sophisticated.
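A minimal sketch of simple word filtering shows both weaknesses at once. The word list uses the article's own examples; the function name `naive_filter` and the sample messages are hypothetical.

```python
import re

# The article's example words; a real word list would be much longer.
BLOCKED_WORDS = {"viagra", "spam"}


def naive_filter(text: str) -> bool:
    """Return True if any blocked word appears as a whole word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)


# Caught: the blocked word appears verbatim.
caught = naive_filter("Cheap Viagra here")
# Missed: a deliberate misspelling splits the word into "v" and "agra".
missed_misspelling = naive_filter("Cheap V!agra here")
# Missed: an invisible zero-width space between visible characters.
missed_invisible = naive_filter("Cheap Vi\u200bagra here")
# False positive: a legitimate message that happens to contain "spam".
false_positive = naive_filter("Please check your spam folder")
```

In this sketch the misspelled and invisibly padded variants slip through, while the harmless message about a spam folder is blocked.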
As spammers become more adept at bypassing filters, anti-spam vendors must find more sophisticated methods of detecting spam. Heuristic filtering uses a series of rules to score an e-mail: a message might get one point for containing the word “Viagra,” one point for a “click here” link, one point for a price ($19.99), one point for a “click here to unsubscribe” link, and one point for a URL that points to a known spam site. A score of three or more might get the message quarantined.
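The scoring scheme above might look roughly like the following. The five rules and the threshold of three come from the example in the text; the rule implementations, the host `spam.example`, and the function names are assumptions made for illustration, not any vendor's actual rule set.

```python
import re

# Hypothetical set of known spam hosts.
KNOWN_SPAM_HOSTS = {"spam.example"}


def links_to_known_spam_site(text: str) -> bool:
    hosts = re.findall(r"https?://([^/\s]+)", text.lower())
    return any(host in KNOWN_SPAM_HOSTS for host in hosts)


# Each rule is worth one point, as in the article's example.
# Note: in this sketch an unsubscribe link also matches the plain
# "click here" rule, so it contributes two points.
RULES = [
    ("mentions Viagra", lambda t: "viagra" in t.lower()),
    ("click-here link", lambda t: "click here" in t.lower()),
    ("contains a price", lambda t: re.search(r"\$\d+\.\d{2}", t) is not None),
    ("unsubscribe link", lambda t: "click here to unsubscribe" in t.lower()),
    ("known spam URL", links_to_known_spam_site),
]

QUARANTINE_THRESHOLD = 3


def score(text: str) -> int:
    """Count how many rules the message triggers."""
    return sum(1 for _, rule in RULES if rule(text))


def is_quarantined(text: str) -> bool:
    return score(text) >= QUARANTINE_THRESHOLD
```

With these assumed rules, a message such as "Viagra for only $19.99! Click here: http://spam.example/buy" triggers four rules and is quarantined, while "Please review the attached RFP by Friday." triggers none.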
Bayesian filtering uses statistical analysis to detect spam. It looks at content and assigns a probability that a message is spam based on the number of messages defined as spam (or not spam) with similar content. Thus, a message containing “Viagra” would have a certain probability of being spam, but a message containing “V!agra” would have a much higher probability of being spam, since legitimate e-mail would be unlikely to include the misspelling. Likewise, a message containing “RFP” (request for proposal) or “process,” for example, would have a very high probability of being legitimate mail.
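To make the idea concrete, here is a small naive Bayes sketch. It is a simplified illustration rather than how any tested product works: the class name, the whitespace tokenizer, the add-one smoothing, and the six training messages are all assumptions, and it expects at least one training message of each kind.

```python
import math
from collections import Counter


def tokenize(text: str) -> set:
    # Whitespace tokenizer, so "v!agra" stays a single token.
    return set(text.lower().split())


class NaiveBayesFilter:
    def __init__(self):
        self.token_docs = {"spam": Counter(), "ham": Counter()}
        self.doc_count = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record which tokens appear in a message labeled 'spam' or 'ham'."""
        self.doc_count[label] += 1
        self.token_docs[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text); needs both labels to have been trained."""
        n_spam, n_ham = self.doc_count["spam"], self.doc_count["ham"]
        log_odds = math.log(n_spam) - math.log(n_ham)  # prior odds
        for token in tokenize(text):
            in_spam = self.token_docs["spam"][token]
            in_ham = self.token_docs["ham"][token]
            if in_spam == 0 and in_ham == 0:
                continue  # token never seen in training; carries no evidence
            # Add-one smoothing avoids zero probabilities.
            p_spam = (in_spam + 1) / (n_spam + 2)
            p_ham = (in_ham + 1) / (n_ham + 2)
            log_odds += math.log(p_spam) - math.log(p_ham)
        return 1 / (1 + math.exp(-log_odds))


# Hypothetical training data.
nb = NaiveBayesFilter()
nb.train("cheap v!agra click here", "spam")
nb.train("v!agra best price", "spam")
nb.train("buy viagra now", "spam")
nb.train("please review the rfp process", "ham")
nb.train("viagra study results for the rfp", "ham")
nb.train("meeting notes on the process", "ham")

p_misspelled = nb.spam_probability("v!agra")
p_plain = nb.spam_probability("viagra")
p_rfp = nb.spam_probability("rfp")
```

On this toy data the misspelled token scores higher than the correctly spelled word, which appears in both spam and legitimate mail, and "rfp" scores lowest, mirroring the ordering described in the text.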
All the products I tested use a combination of these techniques. Some combine heuristics with blacklists, or Bayesian analysis with source IP checking. As my tests show, today’s commercial products are extremely good at identifying spam. But even the best of them ultimately block some legitimate mail, if only because marketing e-mails and newsletters that people want have the same characteristics as those they do not want. All enterprise anti-spam solutions include whitelists, because no system is perfect.