The best way to can spam

It’s a security risk, a productivity drain, and just plain annoying. Put a lid on unwanted e-mail with a multi-layered strategy

Spontaneous end-to-end communication used to be the Internet’s magic ingredient. But scarcity of IPv4 address space and legions of vandals resulted in NATs and firewalls. Now, unfiltered end-to-end communication happens, for the most part, by invitation only.

Until recently, the lone exception was e-mail. You didn’t need permission to contact someone by e-mail, and you could be reasonably certain that a message you sent would land in the recipient’s inbox. Inevitably that had to change, too. The spam epidemic compels us to create and use the e-mail equivalent of NATs and firewalls: a combination of content filters, white lists, and blacklists.

The immediate tactical question is not whether to use these techniques, but how. There are also long-term strategic questions about the things we expect e-mail to do. But first things first: If more than a trickle of spam is landing in your organization’s inboxes, you need to solve that problem now.

Identifying the sender

The two main types of anti-spam solutions (those based on the identity of the sender and those based on the content of the message) can be, and usually are, deployed in combination. The two models for deploying anti-spam solutions (on gateways and servers, or on clients) can also be used in combination, although many enterprises would prefer a server-based approach that won’t add to existing desktop support and training burdens.

Enterprise-oriented products, such as Proofpoint’s Protection Server and ActiveState’s PureMessage, run inbound e-mail through a gauntlet of checks defined by corporate policy. These can include virus scans as well as spam detection. In the latter case, the customer decides which identity- and/or content-oriented spam-detection modules to deploy and whether to reject, quarantine, or merely tag a message when its score tips the spam scale.
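The policy decision described above can be sketched as a mapping from score to action. This is a minimal Python sketch, not Proofpoint’s or ActiveState’s actual configuration: the `Policy` class, the `Action` names, and the threshold values are all hypothetical, chosen only to show how a single score can drive three different responses.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    DELIVER = "deliver"
    TAG = "tag"
    QUARANTINE = "quarantine"
    REJECT = "reject"


@dataclass(frozen=True)
class Policy:
    # Hypothetical default thresholds; each site would tune its own.
    tag_at: float = 5.0
    quarantine_at: float = 10.0
    reject_at: float = 20.0

    def decide(self, score: float) -> Action:
        """Map an aggregate spam score to the action this policy prescribes."""
        if score >= self.reject_at:
            return Action.REJECT
        if score >= self.quarantine_at:
            return Action.QUARANTINE
        if score >= self.tag_at:
            return Action.TAG
        return Action.DELIVER


# A cautious site that never rejects outright: no finite score reaches infinity.
cautious = Policy(reject_at=float("inf"))
```

Keeping the thresholds as data rather than code is what lets each customer set its own policy; the `cautious` instance, for example, quarantines heavily scored mail but never rejects it.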

Identity is, however, a double-edged sword that can vilify or sanctify a sender. Modules that use DNSBLs (DNS-based blacklists) look up the sender’s IP address in databases that track misconfigured mail servers and reported spammers. These services exhibit varying degrees of transparency and accountability, making them useful yet controversial (see “Blacklists: The New Neighborhood Watch”). Some anti-spam vendors ship with DNSBL modules disabled, subject to customer override. Others use them by default, but judiciously, as a component of an overall score.
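A DNSBL lookup piggybacks on ordinary DNS: the filter reverses the octets of the relaying server’s IPv4 address, appends the list’s zone name, and asks for an A record. By common convention, an answer in 127.0.0.0/8 means “listed” and a nonexistent name means “not listed.” The Python sketch below assumes that convention; `bl.example.org` is a placeholder rather than a real list, and each real list documents its own zone and return codes.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name a DNSBL expects: reversed IPv4 octets plus the list zone.

    192.0.2.99 checked against bl.example.org becomes 99.2.0.192.bl.example.org.
    """
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    octets = str(addr).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the list answers with an address in 127.0.0.0/8."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except OSError:
        # Any lookup failure, including NXDOMAIN, counts as "not listed" here;
        # a production filter would distinguish timeouts from real answers.
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

`is_listed` needs network access and a real zone. It returns a boolean rather than a verdict, which fits the advice above: the result is one input to an overall score.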

Eric Allman, CTO of Emeryville, Calif.-based Sendmail and creator of the company’s eponymous e-mail solution, calls DNSBLs “a dull sword”; a sample of my own messages caught by DNSBL filtering shows why this method should not be used in isolation:

Subject: Caldwell and Associates, Inc. Expands Grant Writing Department

Subject: Final Reminder — SOHO Reception

These are classic DNSBL “false positives”: bulk mailings from what are probably legitimate senders who got blacklisted. To certify such senders and avoid misidentifying them, another set of checks and balances is helpful. IronPort, for example, has created a Bonded Sender program that inverts the DNSBL idea. In this scheme, a high-volume sender (for example, CNET) puts up a bond that’s forfeited if one or more of its registered IP addresses violates a list’s opt-in policy or otherwise engages in spam.

In effect, this is a DNSWL (DNS-based white list), in which a positive response means “trustworthy sender.” How an anti-spam system makes use of that judgment is, again, a matter of policy; skipping content checks would be a reasonable and likely choice.

Another new strategy for certifying the sender’s identity is the RMX (Reverse Mail eXchange) proposal. A DNS MX record creates a mail route for a domain name. A domain owner would use RMX records to identify the hosts within the domain that are specifically authorized to send mail, and a server receiving mail would check whether the sending host’s IP address was so identified. Mail from an unauthorized host could be rejected or quarantined.

This is a nice idea that can be rolled out incrementally to combat forged From: addresses. The IP address of the mail server that delivers a message is nearly impossible to forge, but the address in the From: header is easy to rewrite. Spammers do that routinely, playing havoc with white lists or blacklists that depend on those addresses.
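On the receiving side, an RMX-style check amounts to asking whether the connecting IP address falls inside the set of hosts that the claimed sender domain has authorized. RMX was a proposal, and this sketch does not implement its record format: the `AUTHORIZED_SENDERS` table is a hypothetical stand-in for records that would really be fetched from DNS, and the domains, networks, and `"pass"`/`"fail"`/`"none"` result names are illustrative.

```python
import ipaddress

# Hypothetical stand-in for published RMX data: for each domain, the networks
# its owner has authorized to send mail. Real records would come from DNS.
AUTHORIZED_SENDERS = {
    "example.com": [ipaddress.ip_network("192.0.2.0/28")],
    "example.net": [ipaddress.ip_network("198.51.100.25/32")],
}


def rmx_check(sender_domain: str, connecting_ip: str) -> str:
    """Return "pass", "fail", or "none" (the domain publishes no records).

    The connecting IP comes from the TCP session, so it is hard to forge;
    the sender domain comes from the message and may be forged.
    """
    networks = AUTHORIZED_SENDERS.get(sender_domain.lower())
    if networks is None:
        return "none"
    ip = ipaddress.ip_address(connecting_ip)
    return "pass" if any(ip in net for net in networks) else "fail"
```

Because a domain that publishes nothing yields `"none"` rather than `"fail"`, receivers can adopt the check before every sender has published records, which is what makes incremental rollout possible.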

RMX is a bit problematic for road warriors who lack remote access to company mail servers and consequently transmit directly from their laptops. But once again, a missing or negative RMX response can be used as just one component of a message’s overall score.

“It’s like caller ID,” says Jesse Dougherty, Vancouver, B.C.-based ActiveState’s director of development. “If I don’t recognize your number, that’s one strike against you, but I may still choose to take the call.”
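Following the caller-ID analogy, each identity signal can be treated as a weighted strike rather than a verdict. The sketch below is illustrative only: the `Evidence` fields, the weights, and the rule that a whitelisted (for example, bonded) sender skips content checks entirely are assumptions reflecting the policies discussed above, not any vendor’s scoring scheme.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical weights: each identity signal is a "strike", not a verdict.
DNSBL_WEIGHT = 3.0
RMX_WEIGHTS = {"pass": -1.0, "none": 1.0, "fail": 4.0}


@dataclass(frozen=True)
class Evidence:
    dnsbl_hits: int     # how many blacklists listed the relay's IP address
    dnswl_listed: bool  # True if a white list (e.g. a bonded-sender list) vouches
    rmx_result: str     # "pass", "fail", or "none" (no records published)


def overall_score(ev: Evidence, content_check: Callable[[], float]) -> float:
    """Combine identity evidence and content analysis into one spam score."""
    if ev.dnswl_listed:
        # Policy choice: trust the white list and skip content checks.
        return float("-inf")
    identity = DNSBL_WEIGHT * ev.dnsbl_hits + RMX_WEIGHTS[ev.rmx_result]
    return identity + content_check()
```

Passing the content check as a function means it never runs for whitelisted senders, which is the point of skipping it; for everyone else, a missing RMX record adds one strike but cannot by itself push a message over a threshold.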

These approaches deal with organizational identity, and so they cut a wide swath. Ideally, they should be complemented with finer distinctions based on the identity of individual senders. One possible addition is the challenge/response protocol offered by EarthLink and other e-mail providers. In this scheme, an unknown sender is challenged to read digits embedded in an image on a Web page; if successful, the sender is then exempt from future challenges. It’s highly effective, but it’s inappropriate and rude in a customer-service-oriented business setting.
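The bookkeeping behind a challenge/response scheme is simple: hold mail from unknown senders, issue a challenge, and whitelist the sender once the challenge is answered. This Python sketch is a hypothetical illustration, not EarthLink’s implementation; the image and Web-page steps are reduced to a random digit string, and `ChallengeGate` is an invented name.

```python
import secrets


class ChallengeGate:
    """Minimal challenge/response gate (illustrative only).

    Mail from unknown senders is held and a challenge is issued; a correct
    answer whitelists the sender and releases everything held for them.
    """

    def __init__(self) -> None:
        self.whitelist: set[str] = set()
        # sender -> (expected answer, messages held until the sender responds)
        self.pending: dict[str, tuple[str, list[str]]] = {}

    def receive(self, sender: str, message: str) -> str:
        """Return "deliver" for known senders, otherwise "held"."""
        sender = sender.lower()
        if sender in self.whitelist:
            return "deliver"
        if sender not in self.pending:
            # A real system would render these digits as an image on a
            # Web page and mail the sender a link to it.
            digits = "".join(secrets.choice("0123456789") for _ in range(6))
            self.pending[sender] = (digits, [])
        self.pending[sender][1].append(message)
        return "held"

    def respond(self, sender: str, answer: str) -> list[str]:
        """Check an answer; on success whitelist the sender and release mail."""
        sender = sender.lower()
        entry = self.pending.get(sender)
        if entry is None or answer != entry[0]:
            return []
        del self.pending[sender]
        self.whitelist.add(sender)
        return entry[1]
```

Every first message from a new customer lands in the held state, which is exactly why the scheme sits poorly with customer-service mail.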

Another candidate is the S/MIME (Secure MIME) digital signature. Although all the major e-mail clients have supported that scheme for years, most enterprises opt not to deploy the client certificates that would enable digital signing and encryption.

Were such a policy implemented now, it would simplify spam prevention in one way but complicate it in another: A digital signature, in and of itself, wouldn’t mean that a message was good, because spammers would sign their messages, too. But it would let anti-spam engines look up a message’s signer in online databases, a more granular check than a DNSBL lookup of the relaying server’s IP address.
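A signer-based lookup keys reputation on the signing identity rather than on the relaying server. The sketch below assumes that the S/MIME signature has already been verified by a mail library and that the signer’s certificate is available as DER-encoded bytes; `REPUTATION` is a hypothetical database, and naming a certificate by its SHA-256 fingerprint is one common convention, not a scheme the article specifies.

```python
import hashlib
from typing import Optional


def cert_fingerprint(der_cert: bytes) -> str:
    """Hex SHA-256 fingerprint of a DER-encoded signer certificate."""
    return hashlib.sha256(der_cert).hexdigest()


# Hypothetical online reputation database: fingerprint -> verdict.
REPUTATION: dict[str, str] = {}


def signer_verdict(der_cert: Optional[bytes]) -> str:
    """Look up the signer of an already-verified S/MIME message.

    A DNSBL entry covers every sender behind a relay's IP address; a
    certificate fingerprint names exactly one signing identity.
    """
    if der_cert is None:
        return "unsigned"
    return REPUTATION.get(cert_fingerprint(der_cert), "unknown")
```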

If people also used their digital IDs to encrypt messages, IT would find itself in a bit of a quandary. Plain-text e-mail leaks confidential data like a sieve, so if users and their correspondents in other organizations make routine use of encryption, it will help seal the holes. The downside is that encrypted mail becomes opaque to another weapon in the spam detector’s arsenal: content analysis.

Inspecting the content

Anti-spam vendors collect and analyze vast databases of spam, boiling messages down to fingerprints (“fuzzy checksums”). These can be compared to corresponding fingerprints derived from incoming mail to weed out unwanted messages. These vendors also maintain vocabularies of objectionable terms and use Bayesian and other kinds of content classifiers to watch for common spam terms, but these approaches require a lot of feedback and training.
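The idea behind a fuzzy checksum is to fingerprint a normalized form of a message so that trivial variations still match. The sketch below is deliberately simplistic and hypothetical (vendors’ actual algorithms are more robust and are not described here): it lowercases the body, discards everything but letters, collapses whitespace, and hashes the result with SHA-256.

```python
import hashlib
import re


def fuzzy_fingerprint(body: str) -> str:
    """Hash a normalized form of the body so that trivial variants collide."""
    text = body.lower()
    text = re.sub(r"[^a-z]+", " ", text)  # digits, punctuation, symbols -> space
    text = " ".join(text.split())         # collapse runs of whitespace
    return hashlib.sha256(text.encode("ascii")).hexdigest()


# Hypothetical database of fingerprints built from collected spam.
KNOWN_SPAM = {fuzzy_fingerprint("Lose 30 pounds in 30 days! Click here now.")}


def matches_known_spam(body: str) -> bool:
    return fuzzy_fingerprint(body) in KNOWN_SPAM
```

Because digits are stripped, changing a number in the pitch does not change the fingerprint; reordering or rewording sentences still would, which is one reason production systems use more elaborate techniques.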

Bayesian filtering, which computes a score for each message based on statistical analysis, is proving both a popular and an effective spam-fighting technology (infoworld.com/29). It does require users to make an initial investment in classifying their messages, and that’s something IT is rightly reluctant to impose. Once trained, a client-side Bayesian filter adapts to the ever-changing stream of mail easily and naturally: the Delete key becomes a more powerful Classify-as-Spam key.
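A minimal Bayesian filter can be sketched as a two-class naive Bayes classifier over word tokens. This is the textbook multinomial formulation with add-one smoothing, offered as an illustration; the tokenizer and class names are hypothetical, and popular spam filters of this kind often combine per-token probabilities differently. The `train` method is the operation a Classify-as-Spam key would invoke.

```python
import math
import re
from collections import Counter


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z0-9$']+", text.lower())


class NaiveBayesFilter:
    """Two-class (spam/ham) multinomial naive Bayes with add-one smoothing."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text: str, is_spam: bool) -> None:
        """Record a user's classification of one message."""
        label = "spam" if is_spam else "ham"
        self.counts[label].update(tokens(text))
        self.docs[label] += 1

    def spam_probability(self, text: str) -> float:
        """Return P(spam | text) under the naive independence assumption."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_post = {}
        for label in ("spam", "ham"):
            # Smoothed prior, so an untrained class never yields log(0).
            lp = math.log((self.docs[label] + 1) / (total_docs + 2))
            n = sum(self.counts[label].values())
            for tok in tokens(text):
                # Add-one smoothing; the extra 1 leaves room for unseen tokens.
                lp += math.log((self.counts[label][tok] + 1) / (n + len(vocab) + 1))
            log_post[label] = lp
        # Normalize in log space to avoid underflow on long messages.
        m = max(log_post.values())
        ps = math.exp(log_post["spam"] - m)
        ph = math.exp(log_post["ham"] - m)
        return ps / (ps + ph)
```

An untrained filter returns 0.5 for every message, which is why users must invest in classifying mail before the filter pulls its weight.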

Some users love this ability to define spam in a personal and unique way; others would rather not have to think about the problem. But until Bayesian filtering or related techniques are part of the standard enterprise desktop — as will begin to happen when, for example, Outlook 2003 rolls out — many companies are reluctant to embrace end-user deployment.

“Most users can’t even figure out how to use Outlook rules, so it’s a struggle to make the case for an individualized solution,” says Andrés Kohn, director of marketing at Cupertino, Calif.-based Proofpoint.

IT would rather do things in a centralized way, and that’s the right instinct — but client-side technologies can be powerfully complementary to the gateway technologies. The combination works well when users want to be more engaged in the anti-spam processes and enterprises support those efforts. Overall, it makes sense to combine the two. If you can pick only one, start with a centralized gateway solution — it will keep users slightly removed from the processes, making both users and IT departments happy.

Rethinking e-mail’s purpose

Anti-spam vendors want to deliver centralized turnkey solutions, and customers want to have them. The devil, however, is in the details. Because losing even a single legitimate message could do severe damage, anti-spam systems quarantine caught messages so that users can review them and, if necessary, release false positives. Of course, once the junk stops reaching their inboxes, users don’t want to root around in the quarantine haystack looking for a few needles.

When the only tool you have is a hammer, every problem looks like a nail. But e-mail is no longer the only communication tool at your disposal. RSS is an easier and better way to run a subscription service. IM trumps e-mail for intensive real-time collaboration. These methods are useful in their own right, but the spam epidemic makes them even more compelling.

Even as our supply of anti-spam weapons grows, it makes sense to find ways to reduce the demand for them. Reliable, direct, and spontaneous communication, from anyone to anyone, is the special magic of e-mail — what makes it such a precious resource. Reserving e-mail for these purposes will make spam detection easier and will help us conserve that resource.

Copyright © 2003 IDG Communications, Inc.