
Spam Task Force Network and Technology Working Group Anti-Spam Technology Overview

Telecommunications Engineering and Certification Industry Canada May 2005

Contents
Abstract
Scope
Abbreviations
1. Introduction
2. Overview
2.1 Email Technologies
2.1.1 Simple Mail Transfer Protocol and Extensions
2.1.2 Post Office Protocol and Internet Message Access Protocol
2.1.3 Multipurpose Internet Mail Extensions, Pretty Good Privacy, and Secure Multipurpose Internet Mail Extensions
2.1.4 Web-Based Technologies
2.2 Categorizing Spam
2.2.1 Email Spam
2.2.2 Spam for Instant Messaging
2.2.3 Spam Over Internet Telephony
2.3 Spam Sources
2.3.1 Open Relays
2.3.2 Disposable Accounts
2.3.3 Proxies
2.3.4 Compromised Hosts
3. Anti-Spam Technologies
3.1 Message Filtering
3.1.1 Content Filters
3.1.2 Hashing Filters
3.1.3 Statistical Filters
3.2 Address Lists
3.2.1 Domain Name System-Based Systems
3.2.2 Dynamic Users Lists
3.3 Client Server Authentication
3.3.1 Simple Mail Transfer Protocol Authentication
3.3.2 Post Office Protocol Before Simple Mail Transfer Protocol
3.3.3 Transport-Layer Security
3.4 Packet Filtering and Inspection
3.4.1 Simple Mail Transfer Protocol Egress Filter
3.4.2 Firewalls
3.4.3 Traffic Monitoring and Rate Limiting
4. Emerging Technologies
4.1 Domain Authentication
4.1.1 Sender Policy Framework
4.1.2 Sender Identification
4.1.3 Domain Keys
4.1.4 Identified Internet Mail
4.2 Internet Protocol Version 6
4.3 Presence
Discussion
Conclusion


Abstract
This paper details the issues relating to the distribution and prevention of unsolicited electronic messaging, or spam. The document provides an overview of existing and emerging technologies used to combat spam. The methods used to distribute spam and evade anti-spam technologies are also discussed. The goal of this paper is to help explain the technical methods involved, in order to improve understanding of the issues at stake.

Scope
This paper covers the topics surrounding email-based spam and the technologies used to prevent it. The technologies explained in this report are meant to capture the current state of the art, and should not be considered exhaustive. This document will be revised in the future to reflect the changing landscape of anti-spam technologies and, as such, should be considered a living document.

Abbreviations
CAN-SPAM: Controlling the assault of non-solicited pornography and marketing
DNS: Domain name system
DNSBL: Domain name system block list
IIM: Identified Internet mail
IMAP: Internet message access protocol
MAPS: Mail abuse prevention system
MARID: Mail transfer agent authorization records in DNS
MASS: Message authentication signature standards
MIME: Multipurpose Internet mail extensions
MTA: Mail transfer agent (server)
MUA: Mail user agent (client)
MX: Mail exchanger
POP: Post office protocol
RBL: Real-time black hole list
SMTP: Simple mail transfer protocol
SPAM: Self-promotional advertising message
SPIM: Spam for instant messaging
SPIT: Spam for Internet telephony
SRS: Sender rewriting scheme
TCP: Transmission control protocol
TLS: Transport layer security
UBE: Unsolicited bulk email
UCE: Unsolicited commercial email
VoIP: Voice over Internet protocol


1. Introduction
Electronic messaging has been one of the key factors in the growth of the Internet. Users' ability to send messages to recipients on the other side of the world at nearly no cost has been very disruptive for other message-delivery methods such as fax and letter mail. The low cost of message delivery has also enabled senders of unsolicited messages to deliver them using the same media. Some of these unsolicited messages have been classified as spam by users.

In the past, spam was simply considered a nuisance by many users. However, in recent years, the volume of this type of unsolicited message has increased. Often, the message content is deceptive, fraudulent or offensive, and the source cannot easily be identified.

The current state of electronic messaging has caused concern for many and has led to the development of anti-spam solutions. Solutions have come from a range of areas, including technological, judicial and political. This paper will explain the various technical solutions used to combat spam in electronic mail and related messaging technologies.


2. Overview
Email transmission remains relatively unchanged from the original model developed in the early 1980s. Figure 1 depicts a generic message exchange between a sender and receiver. There are variations to this model that may change the message flow, but the basic exchange remains the same. Variations might include mail relays, mail gateways or proxies, some sender-authentication techniques, web-based email, etc.

Figure 1: Typical Email Message Sequence

1. A client or mail user agent (MUA) first assembles a message to be sent. The MUA then establishes a simple mail transfer protocol (SMTP) connection with its mail transfer agent (MTA). The MUA uses SMTP commands to identify the sender and recipient, and transmits the message to the MTA. In this case, the MTA is a mail server hosted by the sender's Internet service provider (ISP).
2. Once the message has been received by the sender's MTA, the recipient's MTA must be located. Using the domain portion of the recipient's email address, a query to the domain name system (DNS) is issued requesting the mail exchanger (MX) record for the recipient's domain. The query will return a listing of the recipient's mail servers.
3. Once the address is known, a subsequent SMTP session is established and the message is transmitted to the recipient's MTA. Once the message is received, it will be stored for retrieval.
4. The recipient's MUA uses the post office protocol (POP) or Internet message access protocol (IMAP) to contact the server and retrieve any stored mail.
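
Step 2 of the sequence above can be sketched in a few lines of Python. This is only an illustration: the function names, the example address and the MX answer are hypothetical, and the DNS query itself is replaced by a hard-coded list so the sketch runs without network access. It assumes the usual MX convention that hosts with lower preference values are tried first.

```python
from typing import List, Tuple


def recipient_domain(address: str) -> str:
    """Return the domain portion of an email address (the text after the last '@')."""
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not a valid email address: {address!r}")
    return domain.lower()


def order_mail_exchangers(mx_records: List[Tuple[int, str]]) -> List[str]:
    """Order MX hosts for delivery attempts; lower preference values come first."""
    return [host for _, host in sorted(mx_records)]


# Hypothetical MX answer for example.net, as (preference, host) pairs.
EXAMPLE_MX = [(20, "backup.example.net"), (10, "mail.example.net")]

domain = recipient_domain("alice@example.net")
delivery_order = order_mail_exchangers(EXAMPLE_MX)
```

The sending MTA would then open the SMTP session of step 3 with the first host in `delivery_order` and fall back to the next one if that host is unreachable.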


2.1 Email Technologies


Various protocols and technologies are available for sending and receiving electronic mail. The following sections briefly describe the most common protocols used in email messaging.

2.1.1 Simple Mail Transfer Protocol and Extensions


SMTP is the most widely used protocol for the transmission of email. The protocol was originally used for exchanging text messages between nodes on the United States Department of Defense's Defense Advanced Research Projects Agency (DARPA) internetwork, and became widely adopted with the expansion of the Internet. The protocol has since been refined [1], and extensions have been added [2].

SMTP was intended to be simple and robust. The protocol was designed to be open, use human-readable commands, and support relaying across disparate networks. These features have aided in the widespread deployment of SMTP for email transmission, but have contributed to the abuse of the protocol as well. Because of the global acceptance of, and reliance on, SMTP, it is thought that it would be very difficult to replace the protocol outright. Extensions and enhancements have improved the protocol throughout its lifetime, but some of its underlying weaknesses still remain.

For robust message delivery, SMTP allows for relaying, which enables an MTA to send a message to the nearest available MTA if the intended receiving MTA is offline or unreachable. When the receiving MTA becomes available, the message is delivered; otherwise, the relay point notifies the sender that the destination MTA is unreachable. Relaying is necessary, and is a legitimate function of SMTP. Relaying is used for remote users, web-mail services, spam-message filtering and other functions. To prevent abuse of relaying services, access should be limited through address restrictions; SMTP authentication; or network-security mechanisms, such as Internet protocol (IP) security (IPSec) or transport layer security (TLS).

SMTP application-level gateways, also known as SMTP proxies, are used to transmit mail across network boundaries. Often, corporate networks use an SMTP gateway to process mail to and from external sources, and to modify internal corporate address information when passing through the gateway. Similar to relays, proxies can also be used for sending spam if they are not properly secured.
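
The access restrictions on relaying described above can be illustrated with a small decision function. This is a sketch under stated assumptions, not the configuration of any particular MTA: the trusted network, the local domain and the function name `may_accept` are hypothetical, and only two of the mechanisms mentioned (address restrictions and SMTP authentication) are modelled.

```python
import ipaddress

# Hypothetical policy values, chosen only for illustration.
TRUSTED_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
LOCAL_DOMAINS = {"example.net"}


def may_accept(client_ip: str, authenticated: bool, recipient: str) -> bool:
    """Decide whether this MTA should accept a message for the given recipient."""
    domain = recipient.rpartition("@")[2].lower()
    if domain in LOCAL_DOMAINS:
        return True  # final delivery to a local mailbox, not relaying
    if authenticated:
        return True  # the client logged in with SMTP authentication
    # Otherwise relay only for clients inside a trusted address range.
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in TRUSTED_NETWORKS)
```

An MTA that returned `True` unconditionally would relay for any client, which is the unsecured situation the paragraph above warns about.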

1. J. Klensin, RFC 2821, Simple Mail Transfer Protocol, April 2001.
2. J. Klensin, N. Freed, M. Rose, E. Stefferud and D. Crocker, RFC 1869, SMTP Service Extensions, November 1995.


2.1.2 Post Office Protocol and Internet Message Access Protocol


POP was developed to retrieve mail for users from mail servers. Without a message-retrieval method, all messages would remain on the server, and clients would need to access the server in some way to use email or run their own server locally. POP became widely accepted after a third revision (POP3) was created. A POP user can connect and retrieve all messages, but POP does not provide adequate server-side storage or categorization. A subsequent protocol was designed to address the message-storage issues associated with POP. IMAP allows for synchronization, strong authentication, and server-based message management (e.g. folder-based storage, searches and message states). Both protocols are still used today.

2.1.3 Multipurpose Internet Mail Extensions, Pretty Good Privacy, and Secure Multipurpose Internet Mail Extensions
Originally, plain text (7-bit ASCII text) was the only format for content specified for SMTP messages. To extend this format, multipurpose Internet mail extensions (MIME) were developed to carry pictures, data and multimedia content in email. MIME provided a way to send many types of content, but did not address the need for confidentiality in email. Confidentiality and authenticity have been addressed through two privacy methods: pretty good privacy (PGP) extensions and secure/MIME (S/MIME). Both of these standards are intended to ensure that the message content cannot be viewed by anyone but the intended recipient and cannot be altered without detection. To protect the message using either technology, both the sender and recipient must be capable of handling PGP or S/MIME message content. Both clients need to support the same method, either PGP or S/MIME, but the mail servers do not, since they only transfer the message based on the headers, which are not encrypted.
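
As an illustration of how MIME extends a plain-text message, the following sketch uses Python's standard `email` package to build a message with a text body and a binary attachment. The addresses, subject, file name and attachment bytes are placeholders invented for the example.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "sender@example.net"
msg["To"] = "recipient@example.org"
msg["Subject"] = "Report attached"

# A plain-text body, as in the original SMTP message format.
msg.set_content("Plain-text body.")

# Adding binary content turns the message into a multipart MIME container.
msg.add_attachment(
    b"placeholder bytes standing in for an image",
    maintype="image",
    subtype="png",
    filename="chart.png",
)

part_types = [part.get_content_type() for part in msg.iter_parts()]
```

Printing `msg.as_string()` shows the headers in readable form above the encoded parts, which is consistent with the point above that mail servers can route a message on its headers without needing to interpret its content.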

2.1.4 Web-Based Technologies


Web-based email and email-enabled Internet applications will also be discussed in this report. Web-based-email, or web-mail, services have become commonplace, and play the role of a traditional email client. A web-mail user only needs a web browser, and does not need a mail client to handle SMTP, POP, or IMAP sessions. Features such as security, antivirus protection, storage space and spam filters can be easily introduced for users. Frequently, web-mail accounts are offered at no cost to the user and are subsidized by advertising. Similar to web mail, server-side applications can enable direct access to an MTA. Most applications use a common interface for web-based clients to communicate with the application. One example is a website-feedback application, which is typically used by website visitors to send email inquiries to website administrators.

2.2 Categorizing Spam


The general term "spam" is commonly used to refer to any unwanted, unsolicited electronic message. In this report, we focus on email-related spam, but other types, such as instant-messaging (IM) spam or Internet-telephony spam, are also of concern. This section describes the properties of the various types of spam.

2.2.1 Email Spam


Spam, for purposes of discussion here, is considered unwanted, unsolicited bulk email (UBE) or unsolicited commercial email (UCE). The main objective of UBE and UCE is to provide direct advertising to the largest possible audience at the lowest possible cost. New technologies such as email lower the costs of distributing advertising and have enabled the growth of spam.

There are many costs associated with email spam, including the costs of lost productivity, user education, network-infrastructure loads, and development and deployment of anti-spam technologies.

The common properties that help identify a message as spam are that: the source of the message cannot be trusted or authenticated; the costs incurred by the sender are often less than the total costs incurred by the recipients; and the content of the message contains unwanted offensive, fraudulent or deceptive material.

For spam email to be delivered, a list of target email addresses has to be compiled. A common way to acquire valid email addresses is through harvesting. Harvesting is done using automated software that can scour public databases and websites for email addresses. Once a list of recipients is obtained, software tools can be used to formulate the content and header of each message. These headers and contents are customized to hide the source and bypass anti-spam technologies such as filters. The messages can then be sent using automated tools to distribute the transmission load across various sources (relays, proxies, etc.). Sources of email spam are discussed further in Section 2.3.

A user should be aware of how spammers get addresses, as well as how they collect data about their message recipients. For instance, some spam email may provide an "unsubscribe" option at the end of the email. A user should be cautious in using this option, however, since it can be used for malicious purposes. The link can let a spammer identify which recipients have read the message, which can, in turn, increase the amount of spam a user receives. The unsubscribe option may also link to content that contains a method for exploiting vulnerabilities in the recipient's web-browser software. In some cases, these exploits are used to install malicious software on the recipient's computer, which can then be used to transmit spam to others.


2.2.2 Spam for Instant Messaging


Instant messaging (IM) can also be used as a medium for sending spam, hence the name spam for instant messaging (SPIM). Internet-based IM services, such as MSN Messenger, Yahoo! Messenger and Jabber, often use a directory to locate and identify subscribers. The directory services can be used to harvest usernames and, subsequently, to send individual messages to each subscriber. This process can be automated using software; however, the sender must also be a user of the IM system. Most IM systems now block, by default, messages from unknown senders, and provide users with the ability to block specific senders.

2.2.3 Spam Over Internet Telephony


Voice over Internet protocol (VoIP) has the potential to lower the cost to advertisers of sending direct voice communications to a target audience. The transmission of spam using VoIP, also dubbed spam over Internet telephony (SPIT), involves using an Internet connection to place calls to, or leave messages for, VoIP subscribers. The costs associated with VoIP spam are higher than for email spam or SPIM, since senders of audio messages incur higher costs for the bandwidth used. The method for sending SPIT is similar to that for sending other spam: automated software can be used to establish connections to a VoIP terminal. Once a connection is made, a message can be sent as audio or recorded as voicemail. Unlike telemarketing, VoIP spam can be sent over Internet connections, where trust and traceability are often lacking. The use of secure network architectures can protect subscribers from receiving calls from untrusted callers. However, non-technical methods may be needed to mitigate spam for VoIP services.

2.3 Spam Sources


This section discusses the various sources of email spam and the techniques used to send spam messages from these sources.

2.3.1 Open Relays


The ability to relay electronic mail is a core feature of SMTP. Historically, most relays would accept mail from any host, in order to best ensure message delivery. When an MTA relays mail from any host, it is referred to as an "open relay." Normally, an MTA is configured to relay mail only from known clients, with identification usually based on Internet address or authentication information. Users unknown to an MTA receive a "relaying prohibited" notification if they try to send mail through a secured relay. Open relays have become obsolete with the enhancements in SMTP; permitting unknown clients to relay mail is considered poor practice.

The open relays that are publicly available have been widely abused by anonymous senders of spam. Many network operators have forbidden open relays on their network, and have methods to detect them. Once an open relay is located, it is usually listed so that other mail servers can be aware of the open relay. The CAN-SPAM Act of 2003 includes penalties if a sender transmits spam through an open relay [3]. However, these penalties are only effective if the sender is not anonymous.

To transmit spam via an open relay, a sender must first identify an open relay. Once a relay has been identified, a sender can use it to relay messages to targeted recipients. Senders often use dial-up accounts or connect via a proxy to conceal their identity. Once a connection is made, the message, including its list of recipients, is sent to the MTA, and the client can disconnect. The relay will then proceed to send the message to each recipient, without any further involvement from the client. The message headers will record each relay the message travels through, as well as the source address of the sender. This information can be used to notify the sender's service provider of the abuses, and usually results in termination of the source account.

Automation software can be used to manage lists of open relays, lists of recipients and the success of message delivery. These tools are commercially available, and have enabled the distribution of unsolicited bulk email.
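
The relay path recorded in the message headers can be read back with Python's standard `email` package. The sketch below assumes that each relay records its hop in a `Received` header placed above the existing ones, so the headers are reversed to obtain the order in which the message travelled. The sample message, host names and addresses are invented for illustration.

```python
from email import message_from_string
from typing import List

# Hypothetical message that passed through one relay before final delivery.
RAW_MESSAGE = """\
Received: from relay.example.org (relay.example.org [198.51.100.7])
\tby mx.example.net; Mon, 2 May 2005 10:00:05 -0400
Received: from dialup-42.example.com (dialup-42.example.com [203.0.113.42])
\tby relay.example.org; Mon, 2 May 2005 10:00:01 -0400
From: someone@example.com
To: victim@example.net
Subject: hello

body
"""


def relay_path(raw: str) -> List[str]:
    """Return the Received headers, earliest hop first, with folding removed."""
    msg = message_from_string(raw)
    hops = msg.get_all("Received", [])
    # Newer hops sit above older ones, so reverse for chronological order.
    return [" ".join(hop.split()) for hop in reversed(hops)]


path = relay_path(RAW_MESSAGE)
```

Here `path[0]` names the host that first handed the message to the relay, which is the kind of information a recipient's provider could pass on to the sender's service provider.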

2.3.2 Disposable Accounts


One of the earliest and simplest ways to transmit spam was through Internet service providers (ISPs) that offered dial-up accounts. A customer was able to sign up for a dial-up account using a credit card. Once an account was obtained, a customer could use it to send and receive mail. This process was easily abused by spammers, who used providers' mail servers to transmit spam. Once complaints were received, an ISP could terminate an account due to violations of the terms of service. However, by the time complaints were received, an abuser would, most likely, already have disposed of the account.

To maximize the effectiveness of email spam, a large volume of messages must be sent. With the deployment of broadband technologies, dial-up connections have fallen out of favour with spammers.

Similar to dial-up accounts, web-mail accounts can also be considered disposable and be abused by spammers. Because of the typically high volumes of spam needed to achieve positive results, many accounts are required. Automation scripts can be used to sign up for several accounts, or to send messages through active accounts. Web-mail service providers now offer ways to prevent automated account subscriptions, such as by displaying a human-readable image of text that the user must decipher to confirm the validity of the account user.

3. United States Federal Trade Commission, Public Law 108-187, Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM Act of 2003), 2003.

2.3.3 Proxies
An SMTP proxy can give spammers anonymity, which is essential in avoiding prosecution. A proxy acts as a broker or intermediary between a client and its desired resource. In the case of email spam, a sender transmits a message to the proxy and the proxy then transmits the same message to the recipient. The recipient of the message knows only the address of the proxy, not the sender. Proxies have many legitimate uses, such as between corporate and public networks, and are typically used to temporarily store network resources or inspect the network traffic that passes through them.

Similar to open relays, there are also open proxies, which fulfill requests on behalf of any client that connects to them. Open proxies can result from poor software configuration or, more likely, from software installed by a malicious user who exploits software vulnerabilities (this method is further examined in Section 2.3.4). One of the obvious uses for open proxies is in sending spam while concealing the sender's identity. In order to identify a spammer behind a proxy, access to the proxy is often required. Proxies can also be chained together, adding another degree of difficulty for those trying to trace the source of the spam. Open-proxy lists are maintained, and network operators can use them to block incoming spam from these sources [4].

2.3.4 Compromised Hosts


In order to find new hosts to relay spam messages, spammers have turned to new techniques. Many of these techniques have been used by malicious users for some time, but have only recently been exploited by spammers. A networked computer, or host, is considered to be compromised when a third party installs software, without the user's knowledge, to control the host. These compromised hosts, also known as bots or zombies, can be used to launch distributed denial of service (DDoS) attacks or to transmit spam. There are several ways to compromise a host connected to a network. The simplest is to trick the user into installing malicious software (i.e. a Trojan program). These malicious programs are often found on peer-to-peer file-sharing systems or as email attachments.

4. http://opm.blitzed.org

A more advanced way of compromising a host is to exploit a software vulnerability that exists in either an operating system or application software. Once a vulnerability is found, a specific exploit needs to be developed. The exploit then needs a method of delivery. Delivery methods can be simple or complex, and can involve custom network packets, infected email attachments, infected peer-to-peer file-sharing downloads, web-based applications, or any of many other methods. To protect hosts from these exploits, software must be maintained (e.g. through application of vendor-issued patches, antivirus definition updates and secure configuration) and network connections must be protected (e.g. with a firewall).

Some Trojan software provides only a control channel that can be used as a conduit to install other malicious programs in the future without the user's consent. In order for someone to control a group of compromised hosts, a communication channel needs to be established between the malicious user and the hosts. Often, these channels can be closed with a firewall, and antivirus software can be used to remove the malicious software. An example of a virus that carried an open proxy as payload is SoBig, which increased email traffic in early 2003.


3. Anti-Spam Technologies
There are many solutions that can be used to combat the various types of email-based spam. Filtering messages based on their properties, blocking message senders, ensuring senders are authentic, and authorizing clients are all methods used to combat spam.

3.1 Message Filtering


In general, implementing message filtering is straightforward and does not require any modifications to existing email protocols. A good filter design will minimize the number of false positives (filtering of a message that is not spam) and maximize the efficiency of the filter. Filters simply prevent spam from entering inboxes, but do not stop the generation of spam. This section gives an overview of the common types of filters, including hybrid filters that use combinations of filtering methods.

3.1.1 Content Filters


There are many varieties of content filters, but, as the name suggests, they all simply filter based on the content of the message. The filter rules are normally defined for all local users on an MTA by a system administrator. Rules can be set for any content in the header, body or extensions of a message. The filter can be configured to analyze or parse the header for malformed fields, to parse the body of the message for spam-related content, or to examine message extensions such as attachments. Most content filters have a high rate of false positives, especially when legitimate mail contains content similar to that in the filter's rule set. The content filter rules must be constantly updated to stay effective. Spammers adjust their messages to bypass filters, and these bypass methods have led to oddly worded content and image-only spam.
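
A rule-based content filter of the kind described above can be sketched as a list of patterns applied to parts of a message. The rule names and patterns below are hypothetical examples, not rules from any deployed filter, and only the subject and body are examined.

```python
import re
from email import message_from_string
from typing import List

# Hypothetical administrator-defined rules: (rule name, message part, pattern).
RULES = [
    ("subject-free-money", "subject", re.compile(r"free\s+money", re.IGNORECASE)),
    ("body-click-here", "body", re.compile(r"click\s+here\s+now", re.IGNORECASE)),
]


def matched_rules(raw: str) -> List[str]:
    """Return the names of all rules whose pattern appears in the message."""
    msg = message_from_string(raw)
    parts = {"subject": msg.get("Subject", ""), "body": str(msg.get_payload())}
    return [name for name, part, pattern in RULES if pattern.search(parts[part])]


spam_hits = matched_rules("Subject: FREE   MONEY inside\n\nClick here now!\n")
legit_hits = matched_rules(
    "Subject: Is there such a thing as free money?\n\nSee you at lunch.\n"
)
```

The second message is ordinary correspondence, yet it still triggers the subject rule, which is the false-positive problem noted above.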

3.1.2 Hashing Filters


Once enough spam has been observed, common elements can be collected. These common elements can be hashed to give a unique value that, in turn, is stored and used as a filter rule. When a hashing filter processes a message, the common elements are collected and hashed. The unique value is then used to determine if the common elements were previously categorized as spam; if so, the message can be filtered as spam. The filters can be bypassed, however, by inserting insignificant content into the message to disrupt the processing of common message elements.
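
The following sketch shows one way a hashing filter could work. It assumes, purely for illustration, that the "common elements" are the whitespace-normalized, lower-cased message body and that SHA-256 is the hash function; the source does not specify either choice.

```python
import hashlib
import re


def fingerprint(body: str) -> str:
    """Hash a normalized form of the body (collapsed whitespace, lower case)."""
    normalized = re.sub(r"\s+", " ", body).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Fingerprints of messages previously categorized as spam (hypothetical sample).
known_spam = {fingerprint("Buy cheap watches   today!")}


def is_known_spam(body: str) -> bool:
    """Check whether the body's fingerprint was previously stored as spam."""
    return fingerprint(body) in known_spam


same_message = is_known_spam("buy CHEAP watches today!\n")
padded_message = is_known_spam("Buy cheap watches today! xq7z")
```

The reformatted copy still matches, but appending a few insignificant characters changes the hash and the message is no longer recognized, which is the bypass described above.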

3.1.3 Statistical Filters


As an improvement to content and hashing filters, statistical filters use rules to measure the frequency and patterns of email messages. The most popular statistical filter used for spam is the Bayesian filter. Bayesian filters calculate the likelihood of known elements combining with additional elements in order to obtain an overall likelihood ratio that can be used to categorize a message as being legitimate or spam.


Bayesian filters produce a low percentage of false positives and do not require their rules to be updated by an administrator [5]. The filter adapts by monitoring what the user classifies as spam, and adjusts likelihood ratios accordingly. Methods spammers have used to bypass Bayesian filters include inserting random low-likelihood elements in their messages. Insertion of these elements lowers the overall ratio so the message may not be filtered.
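
A minimal naive Bayes sketch can make the likelihood ratio concrete. It assumes whitespace-separated words as the "elements", add-one smoothing, and a tiny invented training set; real filters differ in tokenization and tuning. A positive score leans toward spam and a negative score toward legitimate mail.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-based filter; train() is called as the user classifies mail."""

    def __init__(self) -> None:
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.messages = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(text.lower().split())
        self.messages[label] += 1

    def log_ratio(self, text: str) -> float:
        """Log of P(spam | text) / P(ham | text) with add-one smoothing."""
        vocab = len(set(self.counts["spam"]) | set(self.counts["ham"]))
        spam_total = sum(self.counts["spam"].values())
        ham_total = sum(self.counts["ham"].values())
        score = math.log((self.messages["spam"] + 1) / (self.messages["ham"] + 1))
        for token in text.lower().split():
            p_spam = (self.counts["spam"][token] + 1) / (spam_total + vocab)
            p_ham = (self.counts["ham"][token] + 1) / (ham_total + vocab)
            score += math.log(p_spam / p_ham)
        return score


f = NaiveBayesFilter()
f.train("cheap pills online", "spam")
f.train("cheap watches online", "spam")
f.train("lunch meeting tomorrow", "ham")
f.train("project meeting notes", "ham")

plain = f.log_ratio("cheap pills")
padded = f.log_ratio("cheap pills meeting tomorrow lunch notes project")
```

With this training data, `plain` is positive while `padded` is negative: the words borrowed from legitimate mail pull the overall ratio down, which mirrors the bypass method described above.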

3.2 Address Lists


Address lists allow or deny messages based on a sender's network address or domain. Similar to filters, address lists are a defensive measure against spam but cannot prevent its generation.

3.2.1 Domain Name System-Based Systems


DNS-based listing systems have become an essential tool in identifying hosts or network addresses that have been used for sending spam. These listing systems use DNS to create lists of network addresses, which can then be used to identify spam sources. To operate a listing system, an operator needs a domain. Under this domain, network addresses are listed as entries of the domain in reverse order. For example, for the address 1.2.3.4 the operator would list the entry in its domain as 4.3.2.1.domain.net. Clients who want to use the list will make a DNS query for a specific network address and send it to the operator's DNS server. If the host exists in the DNS records, appropriate action can be taken by the client for that address.

The first system to use this method was called the real-time black hole list (RBL), maintained by Mail Abuse Prevention Systems (MAPS) [6]. The RBL initialism has since been used interchangeably for other variations of DNS-based listing systems. These similar systems each have their own advantages, such as only listing known open proxies or open relays. DNS-based lists must be constantly updated to reflect the ever-changing addresses used by spammers. These DNS-based systems can also block legitimate mail servers, however, if the servers meet the listing criteria. The process of removing legitimate servers should be straightforward and quick to resolve.
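
The reversed-address naming scheme can be expressed directly in code. In the sketch below, `dnsbl_query_name` builds the name a client would look up, using the 1.2.3.4 and domain.net example from the text; `is_listed` is a hypothetical helper that performs the actual DNS query and treats any successful resolution as "listed", which is an assumption about how a given list publishes its entries.

```python
import ipaddress
import socket


def dnsbl_query_name(address: str, list_domain: str) -> str:
    """Build the DNS name for an IPv4 address under a listing domain."""
    ip = ipaddress.IPv4Address(address)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(ip).split(".")))
    return f"{reversed_octets}.{list_domain}"


def is_listed(address: str, list_domain: str) -> bool:
    """Query the list; assumes a listed address resolves and others do not."""
    try:
        socket.gethostbyname(dnsbl_query_name(address, list_domain))
        return True
    except socket.gaierror:
        return False


query = dnsbl_query_name("1.2.3.4", "domain.net")
```

Only `dnsbl_query_name` is exercised here, so the example runs without network access; a mail server would call something like `is_listed` with the connecting client's address before accepting mail.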

5. Kai Wei, A Naive Bayes Spam Filter, fall 2003 (www.eecs.berkeley.edu/~kwei/courses/cs281a/cs281a.pdf).
6. www.mail-abuse.com

3.2.2 Dynamic Users Lists


Dynamic users lists identify hosts in a network that do not have static network addresses and whose addresses may change over time. Dynamic network addresses are used in dial-up and residential broadband connections to simplify provisioning and to allow efficient use of the address space. Hosts with dynamic addresses do not typically run mail servers, and when they do, they are in many cases violating the acceptable use policies of most service providers. Dynamic users lists are maintained either by the network operators themselves or by third-party organizations. Like other address lists, they must be rigorously maintained to remain effective.
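As a minimal sketch, a dynamic users list can be thought of as a set of address ranges against which a connecting host is checked. The ranges below are hypothetical and come from address blocks reserved for documentation; a real list would be published by an operator or a third party.

```python
import ipaddress

# Hypothetical dynamic ranges, taken from documentation-only address blocks.
DYNAMIC_RANGES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.128/25"),
]

def is_dynamic(address, ranges=DYNAMIC_RANGES):
    """Return True if the address falls inside any listed dynamic range."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in ranges)

print(is_dynamic("192.0.2.55"))     # True: inside 192.0.2.0/24
print(is_dynamic("198.51.100.10"))  # False: below the listed /25
```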

3.3 Client Server Authentication


This section covers various methods used to authenticate clients who connect to mail servers or MTAs before these clients are able to send mail.

3.3.1 Simple Mail Transfer Protocol Authentication


In public networks, where trust can no longer be assumed, authentication, confidentiality and integrity have become necessities. SMTP authentication is an extension to SMTP that allows clients to log in to a mail server. By itself, authentication does not prevent forging of a sender's address, nor does it ensure the confidentiality or integrity of a message. The authenticated connection is normally established on a separate transmission control protocol (TCP) port (i.e. TCP port 587) rather than on an open SMTP connection (i.e. TCP port 25).

The authentication extension provides two authentication techniques, one for client-to-server (MUA-MTA) communications and the other for server-to-server (MTA-MTA) communications.7 In the first, a client is authenticated by supplying the password associated with a given username before being permitted to send mail. This method also allows users to authenticate themselves to a mail server from a remote location. The second method is meant to be used only in a trusted environment and also makes use of the authentication command. When used between servers, the command indicates to the receiving MTA that the sender has been previously authenticated.

SMTP authentication is best used in combination with the transport layer security (TLS) extension. By using both authentication and TLS, SMTP sessions can be secured.

7. J. Myers, RFC 2554 SMTP Service Extension for Authentication, March 1999.


SMTP authentication can prevent spammers from gaining unauthorized use of mail servers. However, if a spammer is able to compromise a user account, they can then send messages without further restrictions. The original specifications allowed for the use of weak passwords, so it is essential to ensure that a strong password algorithm is used.8
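The weakness of a plain-text password mechanism can be illustrated with a short sketch. It assumes the PLAIN mechanism, which this paper does not describe in detail: the client sends the username and password separated by NUL bytes and encoded in base64. The username and password are hypothetical.

```python
import base64

def auth_plain_response(username, password):
    """Build the base64 response a client sends for the PLAIN mechanism.

    The payload is: authorization-id NUL username NUL password, with the
    authorization-id left empty in this sketch.
    """
    payload = f"\0{username}\0{password}".encode("utf-8")
    return base64.b64encode(payload).decode("ascii")

def recover_credentials(response):
    """Show that anyone who observes the response can decode it."""
    decoded = base64.b64decode(response).decode("utf-8")
    _, username, password = decoded.split("\0")
    return username, password

response = auth_plain_response("alice", "secret")
print(response)
print(recover_credentials(response))
```

Because base64 is an encoding and not encryption, an eavesdropper on an unprotected connection recovers the password directly. This is one reason to run such a mechanism only inside a TLS-protected session, or to use a mechanism that never sends the password itself.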

3.3.2 Post Office Protocol Before Simple Mail Transfer Protocol


A weaker method than SMTP authentication is to require that clients first authenticate themselves through their mail-receiving protocol (e.g. POP). Once a client has been authenticated, the server records the client's network address and allows connections from that address to the SMTP server.
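A minimal sketch of the bookkeeping involved is shown below. The class name and the expiry window are assumptions made for illustration; the window reflects the fact that a recorded address cannot be trusted indefinitely, since it may later be assigned to another host.

```python
import time

class PopBeforeSmtp:
    """Remember client addresses that recently authenticated over POP."""

    def __init__(self, window_seconds=1800, clock=time.monotonic):
        self.window = window_seconds   # hypothetical validity period
        self.clock = clock             # injectable so the sketch can be tested
        self._seen = {}                # address -> time of last POP login

    def record_pop_login(self, address):
        """Called by the POP server after a successful login."""
        self._seen[address] = self.clock()

    def may_relay(self, address):
        """Called by the SMTP server before accepting mail for relay."""
        seen = self._seen.get(address)
        return seen is not None and self.clock() - seen <= self.window
```

The method is weaker than SMTP authentication because it authorizes an address rather than a user: any host that shares the address, or acquires it within the window, is also allowed to send.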

3.3.3 Transport Layer Security


TLS, or its predecessor secure sockets layer version 3 (SSLv3), can also be used to ensure a secure connection between a client and a server. TLS can be seen in web-mail implementations that use a secure HTTP connection. It is also used with IMAP for secure mail transfer between client and server. TLS can be used with SMTP implementations as well, to allow secure transmission of mail by both MTAs and MUAs. In SMTP, TLS is also referred to by its command, STARTTLS, which is typically used in conjunction with SMTP authentication.

3.4 Packet Filtering and Inspection


Packet filtering and inspection is a very large field and applies to many areas other than spam. This section addresses the use of packet filtering and inspection with respect to spam.

3.4.1 Simple Mail Transfer Protocol Egress Filter


The most common sources of spam are zombies, or compromised machines. Zombies can send spam over a client's connection without the knowledge of the client. To block such traffic, an egress filter stops unwanted traffic that originates with a client. An SMTP egress filter can be used to block outbound connections from hosts within a network to external mail servers. Egress filters are useful when a compromised host tries to send spam by connecting directly to external MTAs. If a connection attempt is blocked by an egress filter, the spam messages cannot reach their destination. This method reduces spam-email traffic on the network and can help stop spam before it reaches its intended recipients. Blocking connections can be controversial, however, and should not prevent legitimate traffic from being transmitted.
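The decision an SMTP egress filter makes can be sketched as a simple rule. The addresses of the authorized mail servers are hypothetical, and a real filter would be implemented in a router or firewall rather than in application code.

```python
SMTP_PORT = 25

# Hypothetical addresses of the network's own outbound mail servers.
AUTHORIZED_MAIL_HOSTS = {"192.0.2.25", "192.0.2.26"}

def allow_outbound(source_address, destination_port):
    """Return True if an outbound connection should be allowed.

    Only connections to the SMTP port are restricted; they are permitted
    solely from the network's designated mail servers.
    """
    if destination_port != SMTP_PORT:
        return True
    return source_address in AUTHORIZED_MAIL_HOSTS

print(allow_outbound("192.0.2.25", 25))   # True: designated mail server
print(allow_outbound("192.0.2.140", 25))  # False: ordinary client host
print(allow_outbound("192.0.2.140", 587)) # True: authenticated submission port
```

Under this rule, ordinary clients can still send mail through the provider's own servers or over the authenticated port 587, which limits the impact on legitimate traffic.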

8. Ibid.

3.4.2 Firewalls
Firewalls can prevent the unauthorized sending of mail by infected hosts, and can help protect insecure hosts from becoming infected. A firewall can allow or deny inbound or outbound connection attempts by or to a host. For instance, most network worms spread by sending connection attempts to random hosts. If an incoming connection attempt is blocked by a firewall, the host will be protected and the worm will be unable to infect the host. Firewalls can also be used to allow or deny outgoing connections (i.e. egress filtering) and can be applied not just to SMTP, as described in the previous section, but to any service. If a host is compromised and is being used to transmit spam, a host-based firewall can also alert the user of the activity.

3.4.3 Traffic Monitoring and Rate Limiting


An alternative to blocking connections is to monitor the traffic in a network and limit its volume, a technique called rate limiting. Spammers who compromise systems often send large volumes of messages, resulting in abnormal traffic patterns. Traffic monitoring can be done at any point in the transmission path of a message and can be used to combat spam by observing the traffic flows of various hosts in a network. Once a host has been identified as having abnormal traffic patterns, its owner can be notified and the abnormal traffic can be limited or blocked. Limiting reduces the available bandwidth and thereby restricts the rate at which the host can send messages. One advantage of this technique over blocking is that legitimate traffic can still flow, even if at a lower rate.
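One common way to implement rate limiting is a token bucket, sketched below. The choice of a token bucket, the per-message granularity and the parameter values are assumptions made for illustration; this paper does not prescribe a particular algorithm.

```python
import time

class TokenBucket:
    """Allow short bursts while capping the long-run sending rate."""

    def __init__(self, rate_per_second, burst, clock=time.monotonic):
        self.rate = rate_per_second   # tokens added per second
        self.capacity = burst         # maximum tokens that can accumulate
        self.clock = clock            # injectable so the sketch can be tested
        self.tokens = float(burst)
        self.updated = clock()

    def allow(self):
        """Return True and spend a token if one message may be sent now."""
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```

A host that sends a handful of messages never notices the limit, while a compromised host attempting thousands of messages is slowed to the configured rate instead of being cut off entirely.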


4. Emerging Technologies
4.1 Domain Authentication
Domain-authentication technologies are used to ensure that a sender's domain is not forged or spoofed. Because of the openness of SMTP, a sender is able to forge another sender's identity. This protocol weakness is frequently exploited by spammers. To avoid prosecution by legal authorities or termination of service by an ISP, spammers must remain anonymous. Protocol enhancements are, therefore, necessary to ensure the authenticity of a sender's address. The following sections discuss these enhancements. The primary stakeholders behind the technologies are AOL for the sender policy framework (SPF), Microsoft for Sender ID, Yahoo! for Domain Keys, and Cisco Systems for Identified Internet Mail (IIM).

4.1.1 Sender Policy Framework


SPF has a large user base and has become generally accepted as the current method of sender authentication. Similar technologies that have competed with SPF include reverse mail exchange (RMX) and the designated mailers protocol (DMP). All of these solutions use DNS to help authenticate senders' addresses. The main goal in implementing a sender-authentication scheme is to have a single solution in order to ensure global adoption. The need for a single, global standard has been the cause of much debate in the technology-development community.

The Internet Research Task Force (IRTF) originally formed a research group on anti-spam technologies. This group brought forth several draft specifications to the Internet Engineering Task Force (IETF). Once the need for a single solution became apparent, the MTA Authorization Records in DNS (MARID) working group was formed and was tasked with developing the next version of SPF. However, the working group was unable to agree on a common method. As a result, MARID was closed on September 22, 2004.9

Throughout the debates, SPF version 1, or SPF Classic, has gained greater acceptance among ISPs as a method for validating message sources. SPF authenticates a sender's domain through a DNS look-up that works as the reverse of an MX look-up: where MX records identify the servers that receive mail for a recipient's domain, the SPF record identifies the hosts permitted to send mail for a sender's domain. A receiving mail server takes the sender's domain and makes a DNS query for that domain's SPF record. The response from the domain-name server lists the addresses that the domain allows to send mail. The receiving MTA uses the SPF record to verify that the sending address is a valid mail source. If the address cannot be verified, it is assumed that the sender has forged the address, and the message can be filtered on that basis.
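The verification step can be sketched with a deliberately simplified evaluator. It assumes the record has already been retrieved from DNS as a text string and understands only the ip4, ip6 and all terms; other mechanisms, and the record and addresses used in the example, are outside what this sketch covers or are hypothetical.

```python
import ipaddress

QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}

def check_spf(record, sender_ip):
    """Evaluate a simplified SPF record against a connecting address.

    Returns 'pass', 'fail', 'softfail', 'neutral', or 'none' when the text
    is not an SPF version 1 record. Unsupported terms are skipped.
    """
    ip = ipaddress.ip_address(sender_ip)
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"
    for term in terms[1:]:
        result = "pass"  # a term without a qualifier means pass on match
        if term[0] in QUALIFIERS:
            result, term = QUALIFIERS[term[0]], term[1:]
        name, _, value = term.partition(":")
        name = name.lower()
        if name in ("ip4", "ip6"):
            if ip in ipaddress.ip_network(value, strict=False):
                return result
        elif name == "all":
            return result
    return "neutral"

record = "v=spf1 ip4:192.0.2.0/24 -all"   # hypothetical published record
print(check_spf(record, "192.0.2.10"))    # pass
print(check_spf(record, "198.51.100.7"))  # fail
```

A receiving MTA would apply local policy to the result, for example rejecting on fail and accepting but scoring on softfail.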

9. www.imc.org/ietf-mxcomp/mail-archive/msg05054.html

One known problem with SPF occurs when an MTA forwards mail on behalf of a recipient. The domain of the original sender must be passed through without modification, so that the verification will not fail. The sender rewriting scheme (SRS) has been one way of solving the forwarding issue for SPF; the other has been Sender ID. A general concern is that spammers may still be able to register SPF records for domains that can then be used to send authenticated spam email. In this case, the spam may reach the recipient if alternative methods are not in place, but the message's source remains known. If the source of a message is known, methods such as blacklisting, registrar notification and legal prosecution can be used. Disposable domains can let spammers send authenticated messages and then simply register a new domain once the abuse has been detected. This problem has not yet been solved, but a partial solution may be domain accreditation.

4.1.2 Sender Identification


Microsoft originally developed the Caller ID for E-mail proposal for sender authentication as an alternative to SPF. Once the need for a global solution was understood to be essential, technology developers, with the help of the MARID working group, tried to merge the two projects into a new draft specification called Sender ID. The Sender ID proposal is backwards-compatible with SPF. When a recipient receives a message, the sender's From address is verified by checking a DNS record for that domain. The information in that record is used to ensure that the sender's mail did originate from the purported domain. To address the forwarding issue that was identified in SPF version 1 (SPFv1), a purported responsible address (PRA) was introduced into the draft. The PRA algorithm within the Sender ID draft contained licensing terms for intellectual property held by Microsoft. Some open-source projects were unable to agree to the license terms and, therefore, could not support the draft. Since unanimous consent could not be obtained for the Sender ID proposal, the efforts of the MARID working group were discontinued.10 In November 2004, Microsoft presented the Sender ID proposal at the U.S. Federal Trade Commission's Email Authentication Summit. The proposal has now been endorsed by AOL, which had supported SPF following the closure of the MARID working group.
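The idea behind the PRA can be sketched as choosing the header field that names the party most recently responsible for injecting the message. The sketch below is a simplification under stated assumptions: the order of header fields follows the PRA draft as generally understood (Resent-Sender, Resent-From, Sender, From), the draft's additional rules for multiple resent blocks and malformed headers are omitted, and the message and addresses are hypothetical.

```python
from email import message_from_string
from email.utils import parseaddr

# Header fields consulted, in order of preference, in this simplified sketch.
PRA_HEADERS = ("Resent-Sender", "Resent-From", "Sender", "From")

def purported_responsible_address(raw_message):
    """Return the first usable address among the PRA header fields, or None."""
    message = message_from_string(raw_message)
    for header in PRA_HEADERS:
        value = message.get(header)
        if value:
            address = parseaddr(value)[1]
            if address:
                return address
    return None

forwarded = (
    "Resent-From: Forwarder <relay@forwarder.example>\n"
    "From: Alice <alice@sender.example>\n"
    "To: bob@recipient.example\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(purported_responsible_address(forwarded))  # relay@forwarder.example
```

Because the forwarder's address is selected rather than the original author's, the DNS check is made against the forwarder's domain, which is how the PRA is intended to address the forwarding issue.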

4.1.3 Domain Keys


Domain Keys uses a different method for authenticating senders than do Sender ID and SPF. To ensure that users are able to trust the authenticity of a sender, Domain Keys signs all messages with a cryptographic key specific to the domain.

10. www.imc.org/ietf-mxcomp/mail-archive/msg04673.html

The Domain Keys method uses an asymmetric key algorithm with public and private keys. The sending domain requires a Domain Keys-enabled MTA, which uses a private key to sign the message. Upon receiving a signed message, the receiving MTA looks up the sender's domain to find the public key for that domain. The public key is then used to verify that the signature of the sender is valid. If the signature proves to be valid, the recipient can be sure of the sender's domain. As with other domain-authentication protocols, only the MTAs have to support the technology.
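The sign-and-verify round trip can be illustrated with a toy example. It is not the Domain Keys message format: it uses textbook RSA with a deliberately small key built from two known primes, a dictionary stands in for the DNS look-up of the public key, and the domain name is hypothetical. It should not be used for real signatures.

```python
import hashlib

# Toy RSA key built from two known Mersenne primes. Far too small for real
# use; it only makes the sign/verify round trip visible.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (Python 3.8+)

# Stand-in for the DNS: maps a hypothetical domain to its public key.
PUBLIC_KEYS = {"sender.example": (N, E)}

def digest(message, modulus):
    """Hash the message and reduce it to an integer below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus

def sign(message):
    """Sending MTA: sign the message digest with the private key."""
    return pow(digest(message, N), D, N)

def verify(domain, message, signature):
    """Receiving MTA: look up the domain's public key and check the signature."""
    key = PUBLIC_KEYS.get(domain)
    if key is None:
        return False
    n, e = key
    return pow(signature, e, n) == digest(message, n)

mail = b"From: alice@sender.example\r\n\r\nHello"
signature = sign(mail)
print(verify("sender.example", mail, signature))          # True
print(verify("sender.example", mail + b"!", signature))   # False
```

Only the holder of the private key can produce a signature that verifies against the published public key, so a valid signature ties the message to the domain that published the key.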

4.1.4 Identified Internet Mail


The proposed Identified Internet Mail (IIM) solution is similar to the Domain Keys approach, but does not depend on DNS to provide the keys. Instead, it uses a key-registration server that is linked through DNS. The key-registration server allows for the authentication of hosts or groups of hosts, not just the domain. The IIM protocol is now being drafted, along with Domain Keys, by the mail-signatures draft working group of the IETF.

4.2 Internet Protocol Version 6


The current version of IP, IP version 4 (IPv4), has a limit to the number of addressable hosts that can be supported in the network. Many of these addresses have been allocated, and shortages have become apparent. A new version of IP, IP version 6 (IPv6), has been designed with substantially more address space. To take advantage of the increased network size and additional features, IPv6 has begun to be deployed, notably in the Asia-Pacific region. As IPv6 is generally deployed, applications for mail servers and clients will need to support IPv6. The software must be able to correctly interpret the larger addresses that will be used to identify MTAs. Most software applications are capable of handling IPv6 addresses; however, some may need to be upgraded. Similarly, the systems that email depends on, such as DNS, SMTP gateways and filters, will also need to be able to support IPv6. The domain-based authentication lists that are used to block abusive senders will need to be modified to include IPv6-enabled senders. As spammers and their recipients move to IPv6, there will be an increasing need for IPv6-enabled anti-spam measures.
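As one illustration of the changes involved, a DNS-based address list keyed on reversed IPv4 octets needs a different encoding for the longer IPv6 addresses. The sketch below assumes a nibble-reversed form, in the style of reverse DNS for IPv6; that choice, the function name and the zone are assumptions, since this paper does not specify how such lists would encode IPv6 addresses.

```python
import ipaddress

def listing_query_name(address, zone):
    """Build a listing-zone query name for an IPv4 or IPv6 address.

    IPv4 addresses are reversed octet by octet. IPv6 addresses are expanded
    to 32 hexadecimal digits and reversed digit by digit.
    """
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        labels = str(ip).split(".")
    else:
        labels = list(ip.exploded.replace(":", ""))
    return ".".join(reversed(labels)) + "." + zone

print(listing_query_name("1.2.3.4", "domain.net"))
print(listing_query_name("2001:db8::1", "domain.net"))
```

Software that parses addresses with a general-purpose library, as here, handles both families in one code path; software that assumes four dotted octets is the kind that would need to be upgraded.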

4.3 Presence
The concept of presence is being used to provide location-aware services for IM and VoIP applications. Information on a user's state and geographical location can be accessed by these applications. This information must be dealt with carefully, however. Some wireless spam already exploits this information by using location-specific advertising, for example, in airports. If the information is not properly secured, both IM and VoIP technologies could be vulnerable to abuse by spammers.


Discussion
With the volumes of spam increasing, a more coordinated approach is needed to combat spam. Several network operator industry associations have developed recommendations and best practices to combat spam. One of the first to put forth recommendations was the Anti-Spam Technical Alliance, which is supported by large-network operators and service providers.11 Their recommendations address known issues in limiting the conventional sources of spam, such as open relays, proxies and compromised hosts. The recommendations are seen as largely beneficial and, if implemented, would increase costs for the senders of spam.

It has yet to be seen if a single sender-authentication method can be globally adopted and, if so, whether it can decrease the volume of spam that is on the network today. As mentioned in Section 4.1.2, the IETF's MARID working group was not able to come to a consensus on a single, global method. As a result, discussions continue about various sender-authentication methods. Other emerging technologies, such as cryptographic signature methods like IIM, may still prove to be a better solution. However, the most widely adopted and currently available method of sender authentication continues to be the classic SPF, SPF version 1.

Spam will only stop once the methods for transmitting spam through given media are no longer cost-effective. The costs of sending spam using SMTP will increase with the technical measures discussed here. In the future, spam will migrate to other, more cost-effective technologies, such as IM or VoIP.

Conclusion
The current drive to fight spam has increased the costs of sending email-based spam, and has produced methods that can be applied to other media. Anti-spam technologies must continually be developed and implemented in a coordinated way in order for them to be effective. Emerging technologies have helped remove anonymity from spam, and can also be applied to media other than email. These technologies, when properly used, should also benefit other anti-spam measures, such as legal action. Even with the use of advanced anti-spam technologies, an insecure host can easily be used to sidestep many of these measures. This issue must be addressed through education and the increased awareness of common users. To lessen the burden on users, technological solutions must be as transparent as possible to them.

11. Anti-Spam Technical Alliance, Anti-Spam Technical Alliance Technology and Policy Proposal, June 2004 (http://docs.yahoo.com/docs/pr/pdf/asta_soi.pdf).


The descriptions of the anti-spam technologies discussed in this paper are intended to provide an overview of current technological methods. New issues that arise will be included in subsequent revisions of this living document.

References
Anti-Spam Technical Alliance, Anti-Spam Technical Alliance Technology and Policy Proposal, June 2004 (http://docs.yahoo.com/docs/pr/pdf/asta_soi.pdf).
J. Klensin, RFC 2821, Simple Mail Transfer Protocol, April 2001.
J. Klensin, N. Freed, M. Rose, E. Stefferud and D. Crocker, RFC 1869, SMTP Service Extensions, November 1995.
J. Lyon, Purported Responsible Address in E-Mail Messages, August 2004 (http://draft-ietf-marid-pra-00.txt).
J. Myers, RFC 2554, SMTP Service Extension for Authentication, March 1999.
P. Resnick, RFC 2822, Internet Message Format, April 2001.
United States Federal Trade Commission, Public Law 108-187, Controlling the Assault of Non-Solicited Pornography and Marketing Act of 2003 (CAN-SPAM Act of 2003), 2003.
Kai Wei, A Naive Bayes Spam Filter, fall 2003 (www.eecs.berkeley.edu/~kwei/courses/cs281a/cs281a.pdf).
