
InfoWorld

Inside the spam filter

Anti-spam solutions rely on a variety of increasingly sophisticated techniques to block spam

By Logan G. Harbaugh
November 14, 2003
 

Anti-spam solutions use a variety of techniques to check the contents of e-mail, gathering information from all parts of the message including the header, body, and any attachments. A basic technique of spam filtering involves checking the header of the message for the IP address of the original sender and comparing it to a whitelist or blacklist. Blacklists are lists of addresses of known spammers, and whitelists are lists of senders whose e-mail should be allowed through even if it appears to be spam. The filter may also look for signs that the message header has been forged to hide the original sender.
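
The whitelist and blacklist lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's implementation: the address ranges are reserved documentation ranges chosen for the example, and the function name classify_sender is hypothetical. The whitelist is consulted first, so a trusted sender is allowed through regardless of what the other checks would say.

```python
import ipaddress

# Hypothetical example lists (reserved documentation ranges).
# A real filter would load these from maintained sources.
WHITELIST = [ipaddress.ip_network("192.0.2.0/24")]
BLACKLIST = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("203.0.113.7/32"),
]


def classify_sender(ip: str) -> str:
    """Return 'allow', 'block', or 'scan' for the originating IP address."""
    addr = ipaddress.ip_address(ip)
    # Whitelisted senders bypass all further checks.
    if any(addr in net for net in WHITELIST):
        return "allow"
    # Known spam sources are rejected outright.
    if any(addr in net for net in BLACKLIST):
        return "block"
    # Everything else goes on to content checking.
    return "scan"
```

An address that appears on neither list returns "scan", meaning the message still has to pass the content checks.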


Content checking is the basis for most anti-spam technology, and includes simple filtering based on certain words, attachment types (such as MP3 files), signatures, heuristics, and statistical analysis. The problem with simple filtering -- filtering all messages containing the words “Viagra” or “spam,” for example -- is not only that legitimate messages might contain those words, but also that it’s easy to disguise the words, either by deliberate misspelling (“V!agra”) or by using an HTML message and inserting invisible characters between the visible ones. The same is true of signatures, where the filter looks for content similar to known spam. The tools spammers use to conceal or obfuscate the content that filters look for continue to become more sophisticated.
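
A short Python sketch shows both weaknesses of simple word filtering. The word list and the function name naive_filter are assumptions made for illustration only.

```python
import re

# Hypothetical blocked-word list for illustration.
BLOCKED_WORDS = {"viagra", "spam"}


def naive_filter(text: str) -> bool:
    """Return True if any blocked word appears as a whole word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return any(word in BLOCKED_WORDS for word in words)
```

Under these assumptions, naive_filter("Cheap Viagra today") flags the message, but the misspelled "Cheap V!agra today" slips through because the tokenizer never sees the blocked word. Meanwhile a legitimate message such as "Our spam filter report is attached" is flagged, which is the false-positive problem the paragraph describes.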

As spammers become more adept at bypassing filters, anti-spam vendors must find more sophisticated methods of detecting spam. Heuristic filtering uses a series of rules to score an e-mail: a message might get one point for containing the word “Viagra,” one point for a “click here” link, one point for a price ($19.99), one point for a “click here to unsubscribe” link, and one point for a URL that points to a known spam site. A score of three or more might get the message quarantined.
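
The scoring scheme in this example might be sketched as follows. The rules mirror the five listed above, but the regular expressions, the host name spam.example, and all function names are assumptions for illustration; commercial products use far larger and more finely weighted rule sets.

```python
import re

# Hypothetical list of hosts known to be advertised in spam.
KNOWN_SPAM_HOSTS = {"spam.example"}


def links_to_known_spam_site(text: str) -> bool:
    """True if any URL in the text points at a host on the known-spam list."""
    hosts = re.findall(r"https?://([^/\s]+)", text.lower())
    return any(host in KNOWN_SPAM_HOSTS for host in hosts)


# Each rule is worth one point, as in the article's example.
RULES = [
    ("mentions Viagra", lambda t: "viagra" in t.lower()),
    ("click here link", lambda t: "click here" in t.lower()),
    ("contains a price", lambda t: re.search(r"\$\d+(\.\d{2})?", t) is not None),
    ("unsubscribe link", lambda t: "click here to unsubscribe" in t.lower()),
    ("known spam URL", links_to_known_spam_site),
]

QUARANTINE_THRESHOLD = 3


def score(text: str) -> int:
    """Count how many rules the message triggers."""
    return sum(1 for _, rule in RULES if rule(text))


def is_quarantined(text: str) -> bool:
    return score(text) >= QUARANTINE_THRESHOLD
```

Note that in this sketch an unsubscribe link triggers both the "click here" rule and the unsubscribe rule, so a newsletter footer alone scores two points and stays just under the threshold.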

Bayesian filtering uses statistical analysis to detect spam. It looks at content and assigns a probability that a message is spam based on how many messages with similar content have previously been classified as spam (or not spam). Thus, a message containing “Viagra” would have a certain probability of being spam, but a message containing “V!agra” would have a much higher probability of being spam, since legitimate e-mail wouldn’t likely include the misspelling. Likewise, a message containing “RFP” (request for proposal) or “process,” for example, would have a very high probability of being legitimate mail.
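
To make the idea concrete, here is a deliberately small naive Bayes sketch in Python. It is a simplification rather than a description of any tested product: it splits on whitespace, scores only the words present in the message, and uses add-one smoothing so unseen words do not produce zero probabilities. The class name and the training messages are invented for the example.

```python
import math
from collections import Counter


def tokenize(text: str) -> set:
    """Lowercase and split on whitespace, keeping tokens such as 'v!agra' intact."""
    return set(text.lower().split())


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record which words appeared in one message labeled 'spam' or 'ham'."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | words in text) with add-one smoothing."""
        total = sum(self.doc_counts.values())
        log_p = {}
        for label in ("spam", "ham"):
            n = self.doc_counts[label]
            lp = math.log((n + 1) / (total + 2))  # smoothed prior
            for word in tokenize(text):
                seen = self.word_counts[label][word]
                lp += math.log((seen + 1) / (n + 2))  # smoothed likelihood
            log_p[label] = lp
        # Normalize in log space to avoid underflow on long messages.
        peak = max(log_p.values())
        spam, ham = (math.exp(log_p[k] - peak) for k in ("spam", "ham"))
        return spam / (spam + ham)


# Tiny invented training set for illustration.
f = NaiveBayesFilter()
for msg in ("buy v!agra now", "cheap viagra click here", "v!agra best price"):
    f.train(msg, "spam")
for msg in ("please review the rfp", "viagra study results attached", "rfp process update"):
    f.train(msg, "ham")
```

With this toy training set, f.spam_probability("v!agra") comes out higher than f.spam_probability("viagra"), because the misspelling never appears in the legitimate messages, while "rfp process" scores well below one half.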

All the products I tested use a combination of these techniques. Some combine heuristics with blacklists, or Bayesian analysis with source IP checking. As my tests show, today’s commercial products are extremely good at identifying spam. But even the best of them ultimately block some legitimate mail, if only because marketing e-mails and newsletters that people want have the same characteristics as those they do not want. All enterprise anti-spam solutions include whitelists, because no system is perfect.





 


 
IT consultant Logan Harbaugh is the author of two books on networking. Contact him at logan@lharba.com.
 
