User:David MacQuigg/Sandbox/MailTransfer: Difference between revisions

From Citizendium
imported>David MacQuigg
No edit summary
imported>David MacQuigg
No edit summary
Line 1: Line 1:
== Talk ==
== Talk ==
The challenge in this article is to introduce a topic that has a huge amount of detail without overwhelming the non-expert reader.  We need to avoid the "written by committee" style, where every contributor gets to squeeze in a few facts that he considers important.  Luckily, we have an authoritative reference (RFC-5321) which covers all the details in 94 pages.  We will present just those details that are needed for a coherent presentation of the basics, and refer the reader needing more to the RFC, or to the Wikipedia article, which has many more facts.
The challenge in this article is to introduce a topic that has a huge amount of detail without overwhelming the non-expert reader.  We need to avoid the "written by committee" style, where every contributor gets to squeeze in a few facts that he considers important.  Luckily, we have an authoritative reference (RFC-5321) which covers all the details in 94 pages.  We will include just those details that are needed for a coherent presentation of the basics, or that are interesting enough to outweigh the burden of including them.  The reader needing more facts can also go to the Wikipedia article, which is a lot more verbose than this one.


Terminology is also a challenge.  Should we use the same terms the experts use (MTA, Reverse Path, etc.) or terms that are more meaningful to non-experts (Mail Relay, Return Address, etc.)?  We have chosen the latter, because our articles are intended for non-experts.  Experts will have no trouble understanding what we mean, as long as we avoid mis-using any of their special terminology.  We will capitalize terms that we intend to have a special meaning (e.g. Relay instead of relay).
Terminology is also a challenge.  Should we use the same terms the experts use (MTA, Reverse Path, etc.) or terms that are more meaningful to non-experts (Mail Relay, Return Address, etc.)?  We have chosen the latter, because our articles are intended for non-experts.  Experts will have no trouble understanding what we mean, as long as we avoid mis-using any of their special terminology.  We will capitalize terms that we intend to have a special meaning (e.g. Relay instead of relay).


== Subtopic: Message Transfer ==
== Email System > Message Transfer ==
This subtopic provides a brief explanation of the Simple Mail Transfer Protocol (SMTP) used to move email messages across the Internet.  We will assume the reader has a basic understanding of the [[Email System]].  See RFC-5321 [Klensin08] for the latest revision of this Internet standard.
This subtopic provides a brief explanation of the Simple Mail Transfer Protocol (SMTP) used to move email messages across the Internet.  A complete explanation of SMTP is found in RFC-5321 [Klensin08].  We will assume the reader understands the basic operation of the email system and the role of Mail Relays, as described in the [[Email System|parent article]].


Message transfer is done by establishing a [[TCP]] connection between a Client machine initiating the transfer and a Server machine receiving the message.  The Client may be a [[Relay]], or it may be the email program running on a user's machine.  All transfers are done using SMTP, except the last transfer to a recipient's machine, which uses the [[POP]] or [[IMAP]] protocols.
Message transfer at each "hop" is done by establishing a [[TCP]] connection between a Client SMTP process initiating the transfer and a Server SMTP process receiving the message.  The initial transfer is done using a Client on the message Author's machine, typically a part of his email program.  Intermediate transfers involve Relays having both Server and Client processes.  The final transfer to a recipient's machine uses the [[POP]] or [[IMAP]] protocols.  SMTP "pushes" messages to the next Relay.  POP and IMAP "pull" messages from a mailstore at the destination.


After establishing a TCP connection, the message transfer is guided by a sequence of plain-text commands from the Client and reply codes from the Server.  The purpose of the commands is to provide "envelope" information so that the message can be handled without having to read its contents.  This separation of function allows the email system to work reliably and efficiently, without putting any constraints on the content or syntax of the message itself.  The content may even be encrypted, making it totally unintelligible to the mail handling system.
After establishing a TCP connection, the message transfer is guided by a sequence of plain-text commands from the Client and reply codes from the Server.  The purpose of the commands is to provide "envelope" information so that the message can be handled without having to read its contents.  This separation of function allows the email system to work reliably and efficiently, without putting any constraints on the content or syntax of the message itself.  The content may even be encrypted, making it totally unintelligible to the mail handling system.
Line 31: Line 31:
  C: HELO mailout1.phrednet.com
  C: HELO mailout1.phrednet.com
  S: 250 example.org Hello ip068.subnet71.gci-net.com [216.183.71.68], pleased to meet you
  S: 250 example.org Hello ip068.subnet71.gci-net.com [216.183.71.68], pleased to meet you
  C: MAIL FROM:<xxxx@box67.com>
  C: MAIL FROM:<xxxx@example.com>
  S: 250 2.1.0 <xxxx@box67.com>... Sender ok
  S: 250 2.1.0 <xxxx@example.com>... Sender ok
  C: RCPT TO:<yyyy@box67.com>
  C: RCPT TO:<yyyy@example.com>
  S: 250 2.1.5 <yyyy@box67.com>... Recipient ok
  S: 250 2.1.5 <yyyy@example.com>... Recipient ok
  C: DATA
  C: DATA
  S: 354 Enter mail, end with "." on a line by itself
  S: 354 Enter mail, end with "." on a line by itself
Line 47: Line 47:


Here is a step-by-step explanation of the session above:
Here is a step-by-step explanation of the session above:
1) The telnet program requests a TCP connection to port 25 at the IP address of a server for example.org.  Telnet uses a [[DNS]] query to find this address.


220 is the standard three-digit reply code for an email server to accept a connection request.  If this were an automated process instead of telnet, the client machine would read the standard code and ignore the rest of the line, which is intended for humans reading a log file.  There is no standard form for the information after a reply code.  The administrator at example.org might decide, for example, that it is not a good idea to advertise exactly what version of Sendmail he is running.  If a vulnerability is discovered in that version, within hours there could be a hundred criminals scanning the Internet for any systems running that version.
'''1)''' The telnet program requests a TCP connection to port 25 at the IP address of a server for example.org.  Telnet uses a [[DNS]] query to find this address.


2) The HELO command {{EHLO}} requests a mail session and identifies the client machine.  The identifier should end in the domain name registered to the organization or individual who is responsible for this machine.
220 is the standard three-digit reply code for an email server to accept a connection request.  If this were an automated process instead of telnet, the Client machine would read the standard code and ignore the rest of the line, which is intended for humans reading a log file.  There is no standard form for the information after a reply code.  The administrator at example.org might decide, for example, that it is not a good idea to advertise exactly what version of Sendmail he is running.  If a vulnerability is discovered in that version, within hours there could be a hundred criminals scanning the Internet for any systems running that version.


250 is the reply code for OK (the mail session is accepted).  Following that code Sendmail provides a more complete greeting message with information (the IPname and IP address of the client machine) that might be useful to the sender if there is a problem.  The IP address is the source address of the TCP connection.  The IPname is found by a [[Reverse DNS]] query on the IP address.
'''2)''' The HELO command
<ref>
An alternative command, '''EHLO''', is actually seen more often.  This is a request to use [[ESMTP]], an "extended" version of the original SMTP that supports new functionality.  If the Server doesn't support ESMTP, the command fails, and the ESMTP Client repeats the request using the old HELO command.  This awkward procedure was necessary because the syntax of the original HELO command did not allow for future options.
</ref>
requests a mail session and identifies the Client machine.  The identifier should end in the domain name registered to the organization or individual who is responsible for this machine.


Notice that the IPname assigned by the client's network owner can be different than the HELO name used by the client, so an IPname is not a good way to identify the client.  Network owners often assign thousands of these names using a script which generates the name from the IP address.  Savvy mail admins will avoid using these names in their HELO commands, because some anti-spam filters treat these automated names as a sign of spam.
250 is the reply code for OK (the command was run without error).  Following that code Sendmail provides a more complete greeting message with information (the IPname and IP address of the Client machine) that might be useful to the sender if there is a problem.  The IP address is the source address of the TCP connection.  The IPname is found by a [[Reverse DNS]] query on the IP address.


3) The MAIL FROM command provides a Return Address
Notice that the IPname assigned by the Client's network owner can be different than the HELO name used by the Client, so an IPname is not a good way to identify the Client.  Network owners often assign thousands of these names using a script which generates the name from the IP address.  Savvy mail admins will avoid using these names in their HELO commands, because some anti-spam filters treat these automated names as a sign of spam.
 
'''3)''' The MAIL FROM command provides a Return Address
<ref>
<ref>
The Return Address is also known as the Return Path or the Reverse Path.  "Path" however, is a misnomer, since this is really a destination point, not a complete path.  We prefer Return Address, since this SMTP envelope item has exactly the same function as the return address on a paper envelope.
The '''Return Address''' is also known as the Return Path or the Reverse Path.  "Path" however, is a misnomer, since this is really a destination point, not a complete path.  We prefer Return Address, since this SMTP envelope item has the same function as the return address on a paper envelope.
</ref>
</ref>
for an error report if there is a problem at any Relay between the sender and the recipient.  The Return Address is usually the same as the From Address in the headers of the message, but it can be different.  It might, for example, be re-written by a [[Forwarder]] so as to intercept any error reports from downstream [[Agents]].  It might also be null, indicating that no error reports should be sent.  This option should always be used for the error messages themselves, avoiding the risk of one error message generating another, ad infinitum.
for an error report if there is a problem at any Relay between the sender and the recipient.  The Return Address is usually the same as the From Address in the headers of the message, but it can be different.  It might, for example, be re-written by a [[Forwarder]] so as to intercept any error reports from downstream [[Agents]].  It might also be null, indicating that no error reports should be sent.  This option should always be used for the error messages themselves, avoiding the risk of one error message generating another, ad infinitum.


The 250 reply code is common to four of the commands in this session.  Notice there is an additional "enhanced status code"  on three of these commands.  These enhanced codes {{RFC-1893 Enhanced Mail System Status Codes}} were standardized after years of experience using just the original reply codes.  They provide a more detailed machine-readable classification of the various replies to a command.  Enhanced codes are optional, and all mail servers should work with just the original three-digit codes.
The 250 reply code is common to four of the commands in this session.  Notice there is an additional "enhanced status code"  on some of these replies.  These enhanced codes
<ref>
G. Vaudreuil (1996) RFC-1893 Enhanced Mail System Status Codes, http://www.ietf.org/rfc/rfc1893.txt
</ref>
were standardized after years of experience using just the original reply codes.  They provide a more detailed machine-readable classification of the various replies to a command.  Enhanced codes are optional, and all mail servers should work with just the original three-digit codes.


4) The RCPT TO command specifies the address of one recipient.  This command is repeated for each recipient.  Any or all of the recipients can be rejected, and delivery will be attemted for those recipients whose address was accepted.  If none are accepted, the entire message is rejected.
'''4)''' The RCPT TO command specifies the address of one recipient.  This command is repeated for each recipient.  Any or all of the recipients can be rejected, and delivery will be attempted for those recipients whose address was accepted.  If none are accepted, the entire message is rejected.


One of the functions of an email Relay is to group recipient addresses so that only one copy of a message must be sent to each group at a single destination.  This can be a problem if some of those addresses were designated BCC by the message author.  SMTP makes no distinction between TO, CC, and BCC addresses.  Senders should never assume that BCC addresses are truly hidden.
One of the functions of an email Relay is to group recipient addresses so that only one copy of a message must be sent to each group at a single destination.  This can be a problem if some of those addresses were designated BCC by the message author.  SMTP makes no distinction between TO, CC, and BCC addresses.  Senders should never assume that BCC addresses are truly hidden.


5) The DATA command initiates the transfer of the message data, which can include headers, plain text, HTML text, and various other blocks of data [[encoded]] so that even binary data can be sent as simple strings of ASCII characters. {{Raw binary data cannot be sent, since there might be confusion if the end-of-line sequence (\r\n) occurs anywhere in the data.}} Headers come first, terminated by a blank line.  The blank line in our example appears as two consecutive end-of-lines.  The end of the entire message appears as a "." on a line by itself (\r\n.\r\n).
'''5)''' The DATA command initiates the transfer of the message data, which can include headers, plain text, HTML text, and various other blocks of data [[encoded]] so that even binary data can be sent as simple strings of ASCII characters. Raw binary data cannot be sent, since there might be confusion if the end-of-line sequence (\r\n) occurs anywhere in the data.  Headers come first, terminated by a blank line.  The blank line in our example appears as two consecutive end-of-lines.  The end of the entire message appears as a "." on a line by itself (\r\n.\r\n).


6) The QUIT command terminates the mail session.
'''6)''' The QUIT command terminates the mail session.


7) The TCP connection is then closed by the mail server.
'''7)''' The TCP connection is then closed by the mail server.


== Notes ==
== Notes ==
Line 84: Line 93:


== Bibliography ==
== Bibliography ==
{{subpages}}
[Klensin08] J. Klensin, ed. (2008) "Simple Mail Transfer Protocol", RFC-5321, http://tools.ietf.org/html/rfc5321.
[Klensin08] J. Klensin, ed. (2008) "Simple Mail Transfer Protocol", RFC-5321, http://tools.ietf.org/html/rfc5321.



Revision as of 18:38, 20 November 2008

Talk

The challenge in this article is to introduce a topic that has a huge amount of detail without overwhelming the non-expert reader. We need to avoid the "written by committee" style, where every contributor gets to squeeze in a few facts that he considers important. Luckily, we have an authoritative reference (RFC-5321) which covers all the details in 94 pages. We will include just those details that are needed for a coherent presentation of the basics, or that are interesting enough to outweigh the burden of including them. The reader needing more facts can also go to the Wikipedia article, which is a lot more verbose than this one.

Terminology is also a challenge. Should we use the same terms the experts use (MTA, Reverse Path, etc.) or terms that are more meaningful to non-experts (Mail Relay, Return Address, etc.)? We have chosen the latter, because our articles are intended for non-experts. Experts will have no trouble understanding what we mean, as long as we avoid mis-using any of their special terminology. We will capitalize terms that we intend to have a special meaning (e.g. Relay instead of relay).

Email System > Message Transfer

This subtopic provides a brief explanation of the Simple Mail Transfer Protocol (SMTP) used to move email messages across the Internet. A complete explanation of SMTP is found in RFC-5321 [Klensin08]. We will assume the reader understands the basic operation of the email system and the role of Mail Relays, as described in the parent article.

Message transfer at each "hop" is done by establishing a TCP connection between a Client SMTP process initiating the transfer and a Server SMTP process receiving the message. The initial transfer is done using a Client on the message Author's machine, typically a part of his email program. Intermediate transfers involve Relays having both Server and Client processes. The final transfer to a recipient's machine uses the POP or IMAP protocols. SMTP "pushes" messages to the next Relay. POP and IMAP "pull" messages from a mailstore at the destination.

After establishing a TCP connection, the message transfer is guided by a sequence of plain-text commands from the Client and reply codes from the Server. The purpose of the commands is to provide "envelope" information so that the message can be handled without having to read its contents. This separation of function allows the email system to work reliably and efficiently, without putting any constraints on the content or syntax of the message itself. The content may even be encrypted, making it totally unintelligible to the mail handling system.

A good way to understand the message transfer process is to send a small message and issue the commands manually using the 'telnet' program available from the command prompt window on most machines. Telnet is a general-purpose program for communication using TCP. Using telnet, you should be able to connect to your email service provider on port 25, the standard email service port (or port 587 if your provider expects you to authenticate your identity).

Here are the steps in a typical message transfer:

1) Establish a TCP connection.
2) Establish a mail session.
3) Provide a Return Address for the next message.
4) Provide a Recipient Address for the next message.
4a) Repeat step 4 for additional recipients.
5) Transfer the message and all its attachments.
5a) Repeat from 3 for additional messages.
6) Terminate the mail session.
7) Close the TCP connection.

and here is what an actual email session looks like (names changed to avoid abuse). $ is the command prompt. C: means Client, and S: means Server:

$ telnet example.org 25
S: 220 example.org ESMTP Sendmail 8.13.1/8.13.1; Wed, 30 Aug 2006 07:36:42 -0400
C: HELO mailout1.phrednet.com
S: 250 example.org Hello ip068.subnet71.gci-net.com [216.183.71.68], pleased to meet you
C: MAIL FROM:<xxxx@example.com>
S: 250 2.1.0 <xxxx@example.com>... Sender ok
C: RCPT TO:<yyyy@example.com>
S: 250 2.1.5 <yyyy@example.com>... Recipient ok
C: DATA
S: 354 Enter mail, end with "." on a line by itself
From: Dave\r\nTo: Test Recipient\r\nSubject: SPAM SPAM SPAM\r\n\r\nThis is message 1 from our test script.\r\n.\r\n
S: 250 2.0.0 k7TKIBYb024731 Message accepted for delivery
C: QUIT
S: 221 2.0.0 example.org closing connection 
Connection closed by foreign host.
$

In this session, we have used the five most basic SMTP commands (HELO, MAIL FROM, RCPT TO, DATA, and QUIT) to communicate with a server running Sendmail, one of the most popular mail server programs. The commands and the numerical response codes are standard for all server programs, but the response text following the standard codes may differ. Complete details of these and other less common commands are found in [Klensin08].
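
The order of the Client commands in the session can be sketched as a small Python function. This is only an illustration: the name client_commands and its parameters are our own, not part of SMTP, and the message data that follows the DATA command is left out.

```python
def client_commands(helo_name, return_address, recipients):
    """List the Client commands for one message, following steps 2 to 6."""
    commands = [f"HELO {helo_name}"]                  # step 2: establish a mail session
    commands.append(f"MAIL FROM:<{return_address}>")  # step 3: Return Address
    for recipient in recipients:                      # steps 4 and 4a: one command per recipient
        commands.append(f"RCPT TO:<{recipient}>")
    commands.append("DATA")                           # step 5: message data follows the 354 reply
    commands.append("QUIT")                           # step 6: terminate the mail session
    return commands


for command in client_commands("mailout1.phrednet.com",
                               "xxxx@example.com",
                               ["yyyy@example.com"]):
    print("C:", command)
```

Adding a second address to the recipients list produces a second RCPT TO line, which is step 4a.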

Here is a step-by-step explanation of the session above:

1) The telnet program requests a TCP connection to port 25 at the IP address of a server for example.org. Telnet uses a DNS query to find this address.

220 is the standard three-digit reply code for an email server to accept a connection request. If this were an automated process instead of telnet, the Client machine would read the standard code and ignore the rest of the line, which is intended for humans reading a log file. There is no standard form for the information after a reply code. The administrator at example.org might decide, for example, that it is not a good idea to advertise exactly what version of Sendmail he is running. If a vulnerability is discovered in that version, within hours there could be a hundred criminals scanning the Internet for any systems running that version.
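
A minimal sketch of how an automated Client might separate the reply code from the human-readable text is shown here. The function name parse_reply is our own, and the sketch assumes a single-line reply with the code in the first three characters.

```python
def parse_reply(line):
    """Split a Server reply line into its numeric code and free-form text."""
    code = int(line[:3])      # the first three characters are the reply code
    text = line[4:].strip()   # everything after the separator is for humans
    return code, text


code, text = parse_reply(
    "220 example.org ESMTP Sendmail 8.13.1/8.13.1; Wed, 30 Aug 2006 07:36:42 -0400"
)
if code == 220:
    # The Client acts on the code alone; the text is only logged.
    print("connection accepted:", text)
```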

2) The HELO command [1] requests a mail session and identifies the Client machine. The identifier should end in the domain name registered to the organization or individual who is responsible for this machine.

250 is the reply code for OK (the command was run without error). Following that code Sendmail provides a more complete greeting message with information (the IPname and IP address of the Client machine) that might be useful to the sender if there is a problem. The IP address is the source address of the TCP connection. The IPname is found by a Reverse DNS query on the IP address.

Notice that the IPname assigned by the Client's network owner can be different than the HELO name used by the Client, so an IPname is not a good way to identify the Client. Network owners often assign thousands of these names using a script which generates the name from the IP address. Savvy mail admins will avoid using these names in their HELO commands, because some anti-spam filters treat these automated names as a sign of spam.
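
The fallback from EHLO to HELO described in note 1 can be sketched as follows. The callable send is a hypothetical stand-in for whatever transmits one command line and returns the Server's reply line; old_server is a made-up Server used only to exercise the sketch.

```python
def start_session(send, client_name):
    """Try EHLO first; fall back to HELO if the Server rejects it.

    `send` is a hypothetical callable that transmits one command line
    and returns the Server's reply line as a string.
    """
    reply = send(f"EHLO {client_name}")
    if reply.startswith("250"):
        return "ESMTP"
    reply = send(f"HELO {client_name}")   # the original command
    if reply.startswith("250"):
        return "SMTP"
    raise RuntimeError(f"mail session refused: {reply}")


def old_server(command):
    """A stand-in for a Server that only understands the original HELO."""
    if command.startswith("HELO "):
        return "250 example.org Hello, pleased to meet you"
    return "500 Command unrecognized"


print(start_session(old_server, "mailout1.phrednet.com"))  # prints SMTP
```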

3) The MAIL FROM command provides a Return Address [2] for an error report if there is a problem at any Relay between the sender and the recipient. The Return Address is usually the same as the From Address in the headers of the message, but it can be different. It might, for example, be re-written by a Forwarder so as to intercept any error reports from downstream Agents. It might also be null, indicating that no error reports should be sent. This option should always be used for the error messages themselves, avoiding the risk of one error message generating another, ad infinitum.
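
As a small illustration, a null Return Address is written as an empty pair of angle brackets. The helper mail_from below is our own name, and using None to mean "null" is an assumption of this sketch.

```python
def mail_from(return_address=None):
    """Build the MAIL FROM command; None means a null Return Address."""
    if return_address is None:
        return "MAIL FROM:<>"   # null address: no error reports should be sent
    return f"MAIL FROM:<{return_address}>"


print(mail_from("xxxx@example.com"))  # an ordinary message
print(mail_from())                    # an error report, which must not trigger further reports
```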

The 250 reply code is common to four of the commands in this session. Notice there is an additional "enhanced status code" on some of these replies. These enhanced codes [3] were standardized after years of experience using just the original reply codes. They provide a more detailed machine-readable classification of the various replies to a command. Enhanced codes are optional, and all mail servers should work with just the original three-digit codes.
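
The difference between the two kinds of code can be seen by parsing replies from the session. This is a sketch under assumptions: the regular expression and the name parse_enhanced are ours, and the pattern only covers the class.subject.detail form with classes 2, 4, and 5.

```python
import re

# Three-digit reply code, a separator, an optional enhanced status code, then text.
REPLY = re.compile(r"^(\d{3})[ -](?:([245])\.(\d{1,3})\.(\d{1,3}) )?(.*)$")


def parse_enhanced(line):
    """Return (reply code, enhanced code or None, text) for one reply line."""
    match = REPLY.match(line)
    if match is None:
        raise ValueError(f"not a reply line: {line!r}")
    code = int(match.group(1))
    enhanced = None
    if match.group(2) is not None:
        enhanced = tuple(int(match.group(i)) for i in (2, 3, 4))
    return code, enhanced, match.group(5)


print(parse_enhanced("250 2.1.0 <xxxx@example.com>... Sender ok"))
print(parse_enhanced("250 example.org Hello ip068.subnet71.gci-net.com [216.183.71.68], pleased to meet you"))
```

A Client that ignores enhanced codes can simply use the first element of the result, which is present in every reply.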

4) The RCPT TO command specifies the address of one recipient. This command is repeated for each recipient. Any or all of the recipients can be rejected, and delivery will be attempted for those recipients whose address was accepted. If none are accepted, the entire message is rejected.
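
The per-recipient handling can be sketched like this. The callable send is again a hypothetical stand-in for the connection, picky_server is a made-up Server, and treating only a 250 reply as acceptance is a simplification.

```python
def offer_recipients(send, recipients):
    """Issue one RCPT TO per recipient and sort them by the Server's answer."""
    accepted, rejected = [], []
    for address in recipients:
        reply = send(f"RCPT TO:<{address}>")
        if reply.startswith("250"):
            accepted.append(address)
        else:
            rejected.append(address)
    if not accepted:
        # With no accepted recipient there is nobody to deliver to.
        raise RuntimeError("no recipients accepted; the message is rejected")
    return accepted, rejected


def picky_server(command):
    """A stand-in Server that knows only one mailbox."""
    if command == "RCPT TO:<yyyy@example.com>":
        return "250 2.1.5 <yyyy@example.com>... Recipient ok"
    return "550 5.1.1 User unknown"


print(offer_recipients(picky_server, ["yyyy@example.com", "zzzz@example.com"]))
```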

One of the functions of an email Relay is to group recipient addresses so that only one copy of a message must be sent to each group at a single destination. This can be a problem if some of those addresses were designated BCC by the message author. SMTP makes no distinction between TO, CC, and BCC addresses. Senders should never assume that BCC addresses are truly hidden.
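
A rough sketch of the grouping shows why BCC addresses get no special treatment: by the time the envelope is built, all three kinds of address are in one list. The function names are our own, and grouping by the domain part of the address is an assumption about how a Relay might choose destinations.

```python
from collections import defaultdict


def envelope_recipients(to, cc, bcc):
    """The envelope is one flat list; TO, CC, and BCC are not distinguished."""
    return list(to) + list(cc) + list(bcc)


def group_by_domain(recipients):
    """Group addresses so that one copy can be sent per destination domain."""
    groups = defaultdict(list)
    for address in recipients:
        domain = address.rsplit("@", 1)[1].lower()
        groups[domain].append(address)
    return dict(groups)


recipients = envelope_recipients(["aaaa@example.org"], ["bbbb@example.com"], ["cccc@example.org"])
print(group_by_domain(recipients))
```

Here the BCC address cccc@example.org travels in the same transfer, as an ordinary RCPT TO, alongside aaaa@example.org.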

5) The DATA command initiates the transfer of the message data, which can include headers, plain text, HTML text, and various other blocks of data encoded so that even binary data can be sent as simple strings of ASCII characters. Raw binary data cannot be sent, since there might be confusion if the end-of-line sequence (\r\n) occurs anywhere in the data. Headers come first, terminated by a blank line. The blank line in our example appears as two consecutive end-of-lines. The end of the entire message appears as a "." on a line by itself (\r\n.\r\n).
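
The layout of the message data can be sketched as follows, using the message from our session. The function name build_data is our own. The doubling of a leading "." in a body line is not described above; it is the transparency rule from RFC-5321, included so that a body line cannot be mistaken for the end of the message.

```python
CRLF = "\r\n"


def build_data(headers, body_lines):
    """Assemble headers, a blank line, the body, and the final "." line."""
    lines = [f"{name}: {value}" for name, value in headers]
    lines.append("")                       # blank line ends the headers
    for line in body_lines:
        # A body line starting with "." gets an extra "." in front of it.
        lines.append("." + line if line.startswith(".") else line)
    lines.append(".")                      # "." on a line by itself ends the message
    return CRLF.join(lines) + CRLF


data = build_data(
    [("From", "Dave"), ("To", "Test Recipient"), ("Subject", "SPAM SPAM SPAM")],
    ["This is message 1 from our test script."],
)
print(repr(data))
```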

6) The QUIT command terminates the mail session.

7) The TCP connection is then closed by the mail server.

Notes

  1. An alternative command, EHLO, is actually seen more often. This is a request to use ESMTP, an "extended" version of the original SMTP that supports new functionality. If the Server doesn't support ESMTP, the command fails, and the ESMTP Client repeats the request using the old HELO command. This awkward procedure was necessary because the syntax of the original HELO command did not allow for future options.
  2. The Return Address is also known as the Return Path or the Reverse Path. "Path" however, is a misnomer, since this is really a destination point, not a complete path. We prefer Return Address, since this SMTP envelope item has the same function as the return address on a paper envelope.
  3. G. Vaudreuil (1996) RFC-1893 Enhanced Mail System Status Codes, http://www.ietf.org/rfc/rfc1893.txt

Related Articles

Email System
Message Formats
Email Authentication

Bibliography

[Klensin08] J. Klensin, ed. (2008) "Simple Mail Transfer Protocol", RFC-5321, http://tools.ietf.org/html/rfc5321.

[PnD07] L. Peterson, B. Davie (2007) "Computer Networks: A Systems Approach" 4th ed. Sect. 9.1.1 "Electronic Mail", ISBN 0-12-370548-7.

[Resnick08] P. Resnick, ed. (2008) "Internet Message Format", RFC-5322, http://tools.ietf.org/html/rfc5322.

[Stevens94] W.R. Stevens (1994) "TCP/IP Illustrated, vol. 1, The Protocols", Chapter 28, "SMTP", ISBN 0-201-63346-9.

[Wikipedia08] Simple Mail Transfer Protocol. More details on SMTP, including history, abuse, and related protocols.

Possible Additional Topics

  ESMTP - RFC-5321
  Port 587 - RFC-4409
  Reply Codes
    550, 450 - greylisting
  Options
    MAIL FROM