Reverse MX: Difference between revisions

From Citizendium
imported>Hadmut Danisch
Revision as of 19:06, 3 December 2009

This article is a stub and thus not approved.

Reverse MX (RMX) is an email authentication method developed by Hadmut Danisch. It became a basis for the two most commonly used methods, Sender Policy Framework and Sender ID.

Development of RMX

Background and motivation

Between 1990 and 1998, the author of RMX, Hadmut Danisch, worked as a security researcher and system administrator at the European Institute for System Security (E.I.S.S.) at the University of Karlsruhe, Germany. Research at the E.I.S.S. covered both cryptographic methods (e.g. RFC 1824) and non-cryptographic methods (such as early firewall technology), with a special focus on authentication and communication security. As a demonstration of a so-called 'organizational security measure', Danisch had developed a scheme to prevent forgery of SMTP e-mail sender addresses, implemented as a complex, recursive rule set for sendmail. The basic idea was to perform a recursive sequence of database lookups using both the sender's IP address and the e-mail address given after the 'MAIL FROM' command in the SMTP protocol. The first matching database lookup would determine whether to accept or decline the message for delivery. At that time, the system worked well under lab conditions and in experimental implementations, but did not yet have a particular purpose beyond demonstrating security technologies.

Around 1996/1997 the system became practically useful when an increasing number of spam messages suddenly began to fill and jam the mail system and mailboxes. While spam had formerly been seen only on Usenet, spammers now began to systematically collect e-mail addresses and send spam by e-mail. At first, spammers used their real sender addresses, which were soon blacklisted. Later they used random sender addresses, but these could easily be filtered by checking whether the sender's domain had a valid MX record. Then spammers started to forge sender addresses using real domains, which brought the first spam storms that SMTP-based mail systems could not systematically detect and block. Mailboxes started to contain more spam than regular mail. Since the number of registered domains and personal e-mail contacts was rather small and manageable at that time, the sendmail ruleset drastically reduced the amount of spam reaching the institute's mailboxes once the most important sender domains and their legitimate sender machines had been entered in the local database (and optionally whitelisted).

In 1998, Danisch left the E.I.S.S. and became a security consultant at the first German internet provider, where he was soon confronted with an increasing number of harsh complaints from commercial internet customers whose leased lines had been jammed by spammers and who had, on top of that, been billed for the spam traffic. At that time, internet traffic was expensive and billed by volume, and spam could easily increase the costs tenfold or more, so a technical solution was urgently needed. Although (and because) the provider had collapsed along with the dot-com bubble in spring 2002, Danisch was still busy looking for a technical solution.

Beyond that, there was an ongoing, intense dispute with the University of Karlsruhe about the nature, basics, and principles of IT security in general, and the asserted necessity of cryptography in particular. Therefore, a hard, large-scale technical proof of concept in the real world outside the university labs was needed to prove that real, robust, and easy-to-use security can be achieved by organizational methods without cryptography.

Design criteria

The development of RMX was based on the following design criteria:

  • The main strategy to fight spam was not to detect spam by content, but to prevent forgery of sender addresses in order to enable domain-based black- and whitelisting, reputation databases, and easier tracking of spammers.
  • There should be no governmental or centralized measures. Measures should be implemented and controlled by the sender and receiver of a message without interaction with any third party.
  • The mechanism should be smart, simple, and robust, easy to understand, easy to test and debug, and easy to repair. Knowledge typical for mail and domain administrators should be sufficient to work with the system.
  • The system should be cheap and durable: no costs, no fees, no costly subscriptions or updates.
  • The system should be implemented on the mail relays (MTA) without the need to interfere with the end user mail programs (MUA).
  • The system should avoid cryptography for several reasons:
    • The system should in principle be resistant against bots, hackers, and malware. Since this obviously cannot be achieved in general, because an authorized but infected system can by design send spam, the transport of spam should be terminated immediately and reliably once the system is cleaned or taken down after notification (in contrast to, e.g., cryptographic systems, where secret keys could have been leaked and fast methods to revoke and replace keys would be required).
    • Cryptography is secure only in small-scale use and within closed user groups. There are about a billion internet users worldwide and millions of domains. Even if only 0.1% of the secret keys of these domains were compromised (which is an extremely low estimate), this would still leave thousands of domains that spammers could use.
    • Cryptography requires the key holders to cooperate and to behave well. There is no way for a receiver to detect whether the sender intentionally revealed his key to spammers.
    • It must be possible to easily and immediately verify the integrity of the system (in contrast to, e.g., cryptographic systems, where it cannot be verified that no unauthorized party has knowledge of a secret key).
    • It must be easy for programmers to implement, with no skills required beyond regular internet programming.
    • It should be available all over the world, even in countries that ban the use of cryptography or the possession of secret keys. The system should not use cryptography or provide any infrastructure that could be used for secret communication, since this could prevent the worldwide use and legality of the system. No country should be excluded.
    • It was meant as a proof of concept for non-cryptographic security methods.
    • Cryptography is error-prone and complex. Even today, more than 30 years after the invention of public key cryptography, cryptography is still not in common use, and the services that are in use, such as HTTPS, IPSec, PGP, X.509, and S/MIME, are complicated to use and configure and require experts or specially trained users. Any cryptography-based system would necessarily mean many misconfigurations and much lost legitimate mail. Even if it worked, the overhead of cryptography would make it expensive.

The concept of reverse MX records

The old sendmail-based system suffered from a major shortcoming. It was the receiver's task to continuously keep his database of the systems authorized to send mail from a given domain up to date. This is infeasible, since the receiver cannot know which systems the owners of all domains consider authorized to send mail from their domains, and it would require continuous maintenance.

The basic idea was that the owner of a domain publishes a list of all systems authorized to send e-mail from that domain. The receiver would query this list when receiving a message, or on a regular basis, and verify whether the sending machine is authorized to do so. This is a typical task for a distributed directory service, and the only domain-based distributed directory service established worldwide is currently DNS.

DNS is already used for mail delivery: it lists the authorized receivers in the MX records of a given domain. A first approach would be to use the MX records for senders as well, but it is apparent at first glance that sending and receiving machines are not necessarily identical, so MX records cannot serve both purposes at the same time. Furthermore, sender lists are far more complex than receiver lists and can include large networks, e.g. a full /8 address range (or even more with IPv6). While a list of receiving machines can be just a list of a few machines, the list of sending machines can contain network address ranges.

Another problem is delegation to remote networks, such as e-mail service providers. In the case of MX records, delegation is achieved by using a symbolic name in a domain under the control of the delegate, who can then assign arbitrary IP addresses. In the case of the sender, the delegation of complete lists of different network ranges would be required. None of the existing record types had these properties, so a new record type was designed to list the authorized senders of a domain. Since it does the reverse task of an MX record, it was called the Reverse MX record (RMX).

Publication at the IRTF and IETF

By coincidence, and due to the dramatic increase in spam and the lack of solutions, the IRTF had opened a research group and a mailing list, which had remained completely silent. In fact, the announcement of the first RMX task was the very first message on that mailing list, and it initiated an extremely intensive and long-lasting debate on the list. Thus RMX was the starting point of the IRTF and IETF research on spam fighting and the first approach to establishing a worldwide spam and fraud protection system. Until then, only commercial solutions limited to some customers existed, basically subscriptions to lists of spam patterns, comparable to virus filters.

Technical description

Overall principle of RMX

The idea of RMX is to detect forged sender addresses in e-mails and in their transfer. While RMX can be used for both the sender's address given in the e-mail header (RFC 2822) and the envelope address (RFC 2821), RMX was intended primarily for, and would be most efficient with, the envelope sender address. After transmission of the MAIL FROM SMTP command, the receiving mail relay would perform a DNS lookup for the RMX record(s) of the DNS name used as the domain part of the address (what is on the right side of the @). These RMX records would describe a list of machines authorized to send e-mail with that sender domain. In other words, the owner of a domain can publish a statement about where e-mail from his domain can legitimately come from. If the sending machine is listed in the RMX record, the mail is accepted. Otherwise, depending on the receiver's policy, the mail could be rejected right after the MAIL FROM command, or tagged as spam. Put simply, MX records tell where mail for a given domain should go, and RMX records tell where mail from a given domain should come from.
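The check performed after MAIL FROM can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table RMX_TABLE is a hypothetical in-memory stand-in for the DNS lookup, the function names are invented for this example, and only address-range entries are modelled.

```python
import ipaddress

# Hypothetical stand-in for the DNS lookup: sender domain -> authorized ranges.
RMX_TABLE = {
    "example.org": ["192.0.2.0/24", "2001:db8::/32"],
}

def domain_of(mail_from):
    """Return the domain part (right of the last @) of an envelope address."""
    return mail_from.rsplit("@", 1)[1].lower()

def rmx_check(client_ip, mail_from, table=RMX_TABLE):
    """Return 'authorized', 'unauthorized', or 'no-record' for one SMTP client."""
    networks = table.get(domain_of(mail_from))
    if networks is None:
        return "no-record"
    addr = ipaddress.ip_address(client_ip)
    for cidr in networks:
        net = ipaddress.ip_network(cidr)
        # Compare only addresses and networks of the same IP version.
        if addr.version == net.version and addr in net:
            return "authorized"
    return "unauthorized"

if __name__ == "__main__":
    print(rmx_check("192.0.2.25", "alice@example.org"))
    print(rmx_check("203.0.113.9", "alice@example.org"))
```

What the receiver does with an "unauthorized" or "no-record" result (reject, tag, or accept) is left to local policy, as described above.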

To be precise in terms of security science, RMX is an authorization scheme, and DNS is its database. The identity of an entity is its IP address, and the underlying (hidden) authentication method is the TCP handshake needed to establish a TCP connection for SMTP. Due to the TCP sequence numbers used in that handshake, it is difficult to forge the sender's IP address. While this is not a strong security mechanism, it is cheap, robust, and appropriate for reducing spam.

Implementation as binary DNS records

When using DNS as a directory service, it is important to keep DNS records as small as possible, since there is a (historical) limit of 512 bytes per UDP message (not counting the IP or UDP headers; see section 4.2.1 in RFC 1035). Although DNS also allows TCP to be used for queries and replies, TCP considerably slows down DNS, and many firewall rulesets would not let DNS TCP traffic through. There are two methods to keep DNS records small: a compact binary encoding of the records, where the encoding scheme is determined by the resource record type and type number, and a compression scheme for DNS names. In contrast to its successors like SPF, which used TXT records and plain text, RMX was originally designed to use both methods: the compact, DNS-like binary encoding and the DNS compression scheme for DNS names. Due to technical flaws in DNS explained below, the use of compression had to be omitted.
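The size difference between a binary and a plain-text encoding can be illustrated with a small Python sketch. The binary layout used here (one type byte, one prefix-length byte, four address bytes) is a hypothetical example and not the actual RMX wire format, which is not specified in this article; the text form imitates an SPF-style "ip4:" term.

```python
import ipaddress
import struct

def encode_binary(cidr):
    """Pack an IPv4 range into a hypothetical 6-byte binary entry."""
    net = ipaddress.ip_network(cidr)
    entry_type = 1  # assumed type code for "IPv4 range"
    return struct.pack("!BB4s", entry_type, net.prefixlen,
                       net.network_address.packed)

def encode_text(cidr):
    """Encode the same range as an SPF-style plain-text term."""
    return ("ip4:" + cidr).encode("ascii")

if __name__ == "__main__":
    for cidr in ["192.0.2.0/24", "198.51.100.128/25"]:
        print(cidr, len(encode_binary(cidr)), len(encode_text(cidr)))
```

Under these assumptions the binary entry always takes 6 bytes, while the text form grows with the number of digits, which matters when many ranges have to fit into a single 512-byte answer.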

Structure of RMX records

An RMX record is basically a simple list of entries (a sequence). Since DNS allows a domain name to have more than one record of the same type, a domain could have more than one RMX record. This is interpreted as a concatenation of these records into one large RMX record. However, DNS does not preserve the order of records, and therefore multiple RMX records must be avoided unless order does not matter. As usual with security rulesets, the entries are processed in sequence, and the first matching entry determines whether the sender, identified by its IP address, is authorized or not. To make this possible, every entry can be negated. Positive entries grant permission, negated entries deny it. These entry types were supported:


unused
This domain will not be used for sending e-mail; no sender can be authorized.
ipv4 and ipv6
The entry contains an IP address and a CIDR mask length, to authorize a single IP address or an address range.
DNS hostnames
The entry contains a DNS name, which has to be resolved to its A or AAAA record in a subsequent DNS query. This makes it possible to delegate authorization to someone else and to authorize machines with dynamically assigned IP addresses (with DynDNS).
APL reference
RFC 3123 introduces DNS APL records, which are basically lists of IP address ranges. This also makes it possible to delegate authorization to someone else (e.g. an e-mail service provider or a company branch).
Domain member
This entry type does not take parameters. It means that the sender is authorized if the reverse (and verified) DNS name of its IP address belongs to the domain, thus effectively authorizing all machines that have a DNS name within that domain.
Full Mail Address Lookup
This entry type does not take parameters. It means that the RMX lookup is to be done with the sender's full e-mail address instead of the domain part only, to allow per-address granularity.
MX Reference
This entry type does not take parameters. It means that all MX hosts of the domain are also authorized to send mail.
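The first-match evaluation with negated entries can be sketched as follows. This is a minimal Python illustration that covers only the ipv4/ipv6 entry type; the Entry class, the evaluate function, and the choice to treat "no entry matched" as not authorized are assumptions made for this example.

```python
import ipaddress
from typing import NamedTuple

class Entry(NamedTuple):
    network: str           # address range in CIDR notation
    negated: bool = False  # a negated entry denies instead of permits

def evaluate(entries, client_ip):
    """Return the verdict of the first entry matching client_ip."""
    addr = ipaddress.ip_address(client_ip)
    for entry in entries:
        net = ipaddress.ip_network(entry.network)
        if addr.version == net.version and addr in net:
            return not entry.negated
    return False  # assumption: without a matching entry the sender is not authorized

if __name__ == "__main__":
    # Deny one host first, then permit the rest of its network.
    rules = [Entry("192.0.2.66/32", negated=True), Entry("192.0.2.0/24")]
    print(evaluate(rules, "192.0.2.66"), evaluate(rules, "192.0.2.10"))
    # Reversing the order changes the verdict for 192.0.2.66.
    print(evaluate(list(reversed(rules)), "192.0.2.66"))
```

The reversed list shows why entry order matters, and hence why splitting such a rule set across several RMX records, whose order DNS does not preserve, has to be avoided.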


In addition, RMX had some experimental entry types:


TLS fingerprint
This authorizes a sending machine that initiated SMTP with TLS encryption, identifying itself with a client certificate matching the fingerprint.
TLS and LDAP
Verify the sender's certificate with an LDAP lookup.
SASL
The sender has to authenticate through SMTP SASL mechanisms.
PGP or S/MIME
Accept messages with a content signed by the given key (which differs from all other entry types in that it authorizes the content, not the sender).

Limitations of the Reverse MX approach

Reverse MX is actually not a method to detect spam. By design, it does not take any part of the e-mail message itself into consideration (apart from the experimental signature-based entry types and, optionally, the From: header entry). RMX is based solely on the IP address of the sending relay and the sender's envelope e-mail address as given in the SMTP MAIL FROM command. That way, RMX can deny a message before it is transmitted. In contrast to many other anti-spam proposals like greylisting, content-based heuristics, or external blacklists, RMX is fast, precise, predictable, and easy to debug. On the other hand, RMX has serious limitations:


  • RMX provides protection against forgery only. Unforged messages will never be blocked or marked. Spammers who legally use their own domains are not affected by RMX.
  • Since RMX is an authorization scheme only and is based on the (weak and cheap) TCP handshake authentication of the sender's IP address, it does not protect against forgery of the IP address itself or other ways of taking it over. Malware such as bots or TCP relays installed on an authorized machine can still send messages on behalf of the alleged sender.
  • Since RMX relies on DNS, it is vulnerable to DNS attacks like cache poisoning.
  • With the RMX records, RMX reveals the IP addresses of the sending machines to the public. If these addresses are to be kept secret, either a central relay should be used or other measures should be negotiated with those authorized to know the sending SMTP addresses. Alternatively, symbolic DNS names can be used that can be resolved through local hosts files only.
  • Because unforged spam is not affected, RMX requires (and enables) additional black- or whitelisting. Black- and whitelisting in turn require protection against forgery to be effective and reliable.
  • Automatic mail forwarding has become very common on the internet. The recipient's relay then rewrites the envelope recipient address and forwards the message as if it were a regular part of the delivery chain. Since this way of forwarding is indistinguishable from forgery of the sender address, RMX breaks this simple forwarding, but still allows forwarding in principle. When forwarding, the forwarding relay must verify the RMX entry of the message's origin. The final recipient's relay must have the forwarding relay whitelisted as generally trusted, rely on it having already verified RMX, and bypass its own RMX routines.
  • RMX can be effective even when only a very limited number of domains support it. For example, a mailbox can whitelist all messages from a given domain if its RMX records prevent forgery. But to become effective against spam in general, RMX entries would have to be used on a worldwide basis, or alternatively for all hosts of a given top-level domain (like .com). An important prerequisite would be that mail relays could deny mail from domains without an RMX entry or with implausibly wide RMX entries (like allowing mail from 0.0.0.0/0).
  • RMX does not really support domains where a large number of users send e-mail from their personal computers outside a central area, and thus from remote and changing IP addresses. Such a domain would basically authorize anyone to send e-mail and thus defeat the idea of sender verification based on IP addresses and make spamming easy. RMX protects the traffic between domains, but relies on the domain authority itself doing the job of authentication and protecting against abusive mail. Such domains should therefore maintain central mail relays to which users or machines deliver their outgoing messages and where they authenticate with any method chosen by the domain administration, like SMTP password authentication.
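Two of the receiver-side policies mentioned in the list above, bypassing the check for a trusted forwarding relay and refusing implausibly wide entries, can be sketched in Python. Everything concrete here is an assumption for illustration: the forwarder address, the width thresholds in MIN_PREFIX, and the function names are hypothetical and not part of RMX itself.

```python
import ipaddress

# Hypothetical local policy data of the receiving relay.
TRUSTED_FORWARDERS = {"198.51.100.7"}  # relays assumed to have verified RMX already
MIN_PREFIX = {4: 8, 6: 16}             # assumed limit: wider ranges are implausible

def implausibly_wide(cidr):
    """True if the range covers more addresses than local policy tolerates."""
    net = ipaddress.ip_network(cidr)
    return net.prefixlen < MIN_PREFIX[net.version]

def accept(client_ip, authorized_ranges):
    """Decide whether to accept mail, given the sender domain's published ranges."""
    if client_ip in TRUSTED_FORWARDERS:
        return True   # rely on the forwarder's own RMX verification
    if not authorized_ranges:
        return False  # strict policy: no RMX entry, no mail
    if any(implausibly_wide(r) for r in authorized_ranges):
        return False  # e.g. 0.0.0.0/0 would authorize everyone
    addr = ipaddress.ip_address(client_ip)
    nets = [ipaddress.ip_network(r) for r in authorized_ranges]
    return any(addr.version == n.version and addr in n for n in nets)

if __name__ == "__main__":
    print(accept("198.51.100.7", ["192.0.2.0/24"]))  # trusted forwarder
    print(accept("203.0.113.9", ["0.0.0.0/0"]))      # implausibly wide entry
```

A threshold of /8 for IPv4 is chosen here only because a full /8 range is mentioned earlier as a legitimate sender list; a real deployment would have to pick its own limits.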

Reasons for failure

While RMX technically worked exactly as expected in the lab, it never succeeded in reality, for various reasons. More can be learned from why RMX failed, and why security mechanisms in general fail, than from the way it actually worked.

Shortcomings of the DNS design

DNS was designed between 1983 and 1987 (RFC 882 and RFC 883), and its design matched the state of the art of programming of that time. Although DNS was extensible in principle with new record types, DNS lacks the ability to transport and process new data types without being explicitly extended for these data types (in contrast to e.g. LDAP, which can be extended to new objectClasses by configuration and handles all data encoded as ASN.1). As a consequence, four major design and implementation details prevent DNS from being easily extended:

  • Authoritative DNS servers need to be explicitly extended to be able to handle new DNS resource record types. Without a software update, a DNS server will not be able to serve new record types. Upgrading the internet's DNS infrastructure to new software within a limited time is virtually impossible.
  • DNS cache and resolver implementations originally did not forward unknown record types, and would require a software update for the same reason. Only newer implementations are able to forward unknown record types.
  • DNS UDP messages are by design limited to 512 bytes. RMX records would tend to be rather complex and to refer to other record types such as A, MX, or APL records, which could be transported in the same message as additional answers. It is therefore essential to keep messages short by making use of the DNS name compression scheme. Unfortunately, compressed records require decompression and recompression for caching and forwarding, and thus binary decoding and re-encoding. Even if a DNS server forwarded unknown new record types untouched, it would break the compression and make the record unusable.
  • Client-side DNS implementations do not support querying arbitrary record types through the standard API, and tie each known record type to a particular API function such as gethostbyname(). Implementing the client side for a new record type would require either an extension of the C/Unix or POSIX runtime environment, or moving the implementation details into the application program (usually a mail transfer agent), causing incompatibilities and the use of undocumented DNS functions.
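Why compressed names cannot simply be copied by a server that does not understand the record type can be seen from how DNS name compression works (RFC 1035, section 4.1.4). The following Python sketch is a simplified encoder written for this illustration, not code from any DNS implementation; the offsets in the usage example are arbitrary.

```python
def encode_name(name, offset, seen):
    """Encode a DNS name located at the given message offset.

    seen maps already written name suffixes to their absolute offset in
    the message, so that later names can point back to them.
    """
    out = bytearray()
    labels = [label for label in name.split(".") if label]
    for i in range(len(labels)):
        suffix = ".".join(labels[i:]).lower()
        if suffix in seen:
            # Compression pointer: two bytes, top two bits set,
            # the remaining 14 bits hold an absolute message offset.
            out += (0xC000 | seen[suffix]).to_bytes(2, "big")
            return bytes(out)
        if offset + len(out) < 0x4000:
            seen[suffix] = offset + len(out)
        label = labels[i].encode("ascii")
        out.append(len(label))
        out += label
    out.append(0)  # root label terminates an uncompressed name
    return bytes(out)

if __name__ == "__main__":
    seen = {}
    first = encode_name("mail.example.org", 12, seen)
    second = encode_name("relay.example.org", 30, seen)
    print(len(first), len(second))
```

In this example the second name shrinks to 8 bytes because "example.org" is replaced by a pointer to offset 17 of the same message. Since the pointer is only valid inside that one message, record data copied byte for byte into a different message would point at unrelated content, which is the breakage described above.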

While the correct and intended way to transport new types of data in DNS would be to define new record types, these characteristics of DNS make new record types very difficult to deploy on a worldwide scale. Even for experimental purposes, all DNS servers and libraries involved in the tests would have to be patched and recompiled every time the record type definitions change.

To overcome this problem and to be able to at least perform experiments easily, the IRTF working group proposed to use the existing and widely implemented TXT resource records, with plain text instead of binary encoding, so that any DNS server and resolver could be used without an update. This was named SPF (see below). However, this method significantly increased the length of DNS answers, which could easily exceed the size limit for DNS UDP packets, forcing DNS into the much slower TCP mode (which many firewall configurations do not even allow).

Problems of the SMTP e-mail protocol and its implementations

During the experiments made for the Anti Spam Research Group of the IRTF and IETF, further complications showed up. Although the protocol definitions in RFC 821 and RFC 2821 seem to be rather precise, they were not precise enough, and too many SMTP implementations did not conform exactly to the specifications. In particular, the different ways of forwarding e-mail from the first recipient's mailbox to other mailboxes, a common practice on the internet, caused problems. Usually, mail forwarders simply replace the recipient's envelope address (and sometimes the address given in the header's To: or Cc: field), but not the sender's address. That way they act on behalf of the original sender without the sender's knowledge and approval, and without taking any care about how the final recipient could verify the origin or trust the forwarder. Technically, there is no difference at all between a forwarder and a spammer who forges the sender address, since both act in the very same way. The process of forwarding e-mail had never been precisely defined.

Another problem was that the RMX principle primarily focuses on and verifies the envelope sender address. While most traditional mail transport and user agents in the Unix environment preserve and display the envelope sender address (particularly in the first "From ..." line in the mbox format), some commercial e-mail systems turned out not to be completely compatible with the SMTP world. Some of them internally used proprietary message and storage formats that did not support the SMTP idea of having and storing separate header and envelope addresses, and an envelope sender address that is not preserved cannot be verified.

Further discussion showed that, due to the lack of a sufficiently precise and enforced definition of the e-mail protocols, the worldwide e-mail infrastructure had grown uncontrolled and developed a huge variety of flavors, dialects, and versions of SMTP, its extensions, and background mechanisms. This may be compared to the unmanageable differences in HTML formats and interpretations that existed for several years between browser types (until stricter definitions were published).

Spam and forgery could be seen as trivial exploits of fundamental security flaws of SMTP. It seems that the deployed worldwide e-mail infrastructure can neither easily be fixed nor replaced with or updated to better and more secure protocols. Most probably, the worldwide internet infrastructure has grown so far out of any central control that the problem of spam and forgery can no longer be fixed in reasonable time and on a worldwide scale. Maybe the next chance to fix these problems will come when SMTP in general, or even the Internet Protocol itself, is replaced by more modern and secure protocols.

SPF and IRTF/IETF-specific reasons

Microsoft's role

Economical reasons

Freedom of speech and cultural reasons

Education

Perceptions of identity and juristic reasons

Personal and European reasons

References