Network Working Group                                    P. Resnick, Ed.
Request for Comments: 5322                         Qualcomm Incorporated
Obsoletes: 2822                                             October 2008
Updates: 4021
Category: Standards Track

Internet Message Format

Status of This Memo
This document specifies an Internet standards track protocol for the Internet community, and requests discussion and suggestions for improvements. Please refer to the current edition of the "Internet Official Protocol Standards" (STD 1) for the standardization state and status of this protocol. Distribution of this memo is unlimited.

Abstract
This document specifies the Internet Message Format (IMF), a syntax for text messages that are sent between computer users, within the framework of "electronic mail" messages. This specification is a revision of Request For Comments (RFC) 2822, which itself superseded Request For Comments (RFC) 822, "Standard for the Format of ARPA Internet Text Messages", updating it to reflect current practice and incorporating incremental changes that were specified in other RFCs.
Table of Contents
1. Introduction
  1.1. Scope
  1.2. Notational Conventions
    1.2.1. Requirements Notation
    1.2.2. Syntactic Notation
    1.2.3. Structure of This Document
2. Lexical Analysis of Messages
  2.1. General Description
    2.1.1. Line Length Limits
  2.2. Header Fields
    2.2.1. Unstructured Header Field Bodies
    2.2.2. Structured Header Field Bodies
    2.2.3. Long Header Fields
  2.3. Body
3. Syntax
  3.1. Introduction
  3.2. Lexical Tokens
    3.2.1. Quoted characters
    3.2.2. Folding White Space and Comments
    3.2.3. Atom
    3.2.4. Quoted Strings
    3.2.5. Miscellaneous Tokens
  3.3. Date and Time Specification
  3.4. Address Specification
    3.4.1. Addr-Spec Specification
  3.5. Overall Message Syntax
  3.6. Field Definitions
    3.6.1. The Origination Date Field
    3.6.2. Originator Fields
    3.6.3. Destination Address Fields
    3.6.4. Identification Fields
    3.6.5. Informational Fields
    3.6.6. Resent Fields
    3.6.7. Trace Fields
    3.6.8. Optional Fields
4. Obsolete Syntax
  4.1. Miscellaneous Obsolete Tokens
  4.2. Obsolete Folding White Space
  4.3. Obsolete Date and Time
  4.4. Obsolete Addressing
  4.5. Obsolete Header Fields
    4.5.1. Obsolete Origination Date Field
    4.5.2. Obsolete Originator Fields
    4.5.3. Obsolete Destination Address Fields
    4.5.4. Obsolete Identification Fields
    4.5.5. Obsolete Informational Fields
    4.5.6. Obsolete Resent Fields
    4.5.7. Obsolete Trace Fields
    4.5.8. Obsolete optional fields
5. Security Considerations
6. IANA Considerations
Appendix A. Example Messages
  Appendix A.1. Addressing Examples
    Appendix A.1.1. A Message from One Person to Another with Simple Addressing
    Appendix A.1.2. Different Types of Mailboxes
    Appendix A.1.3. Group Addresses
  Appendix A.2. Reply Messages
  Appendix A.3. Resent Messages
  Appendix A.4. Messages with Trace Fields
  Appendix A.5. White Space, Comments, and Other Oddities
  Appendix A.6. Obsoleted Forms
    Appendix A.6.1. Obsolete Addressing
    Appendix A.6.2. Obsolete Dates
    Appendix A.6.3. Obsolete White Space and Comments
Appendix B. Differences from Earlier Specifications
Appendix C. Acknowledgements
7. References
  7.1. Normative References
  7.2. Informative References
1. Introduction
1.1. Scope
This document specifies the Internet Message Format (IMF), a syntax for text messages that are sent between computer users, within the framework of "electronic mail" messages. This specification is an update to [RFC2822], which itself superseded [RFC0822], updating it to reflect current practice and incorporating incremental changes that were specified in other RFCs such as [RFC1123]. This document specifies a syntax only for text messages. In particular, it makes no provision for the transmission of images, audio, or other sorts of structured data in electronic mail messages. There are several extensions published, such as the MIME document series ([RFC2045], [RFC2046], [RFC2049]), which describe mechanisms for the transmission of such data through electronic mail, either by extending the syntax provided here or by structuring such messages to conform to this syntax. Those mechanisms are outside of the scope of this specification. In the context of electronic mail, messages are viewed as having an envelope and contents. The envelope contains whatever information is needed to accomplish transmission and delivery. (See [RFC5321] for a discussion of the envelope.) The contents comprise the object to be delivered to the recipient. This specification applies only to the format and some of the semantics of message contents. It contains no specification of the information in the envelope. However, some message systems may use information from the contents to create the envelope. It is intended that this specification facilitate the acquisition of such information by programs. This specification is intended as a definition of what message content format is to be passed between systems. Though some message systems locally store messages in this format (which eliminates the need for translation between formats) and others use formats that differ from the one specified in this specification, local storage is outside of the scope of this specification. 
Note: This specification is not intended to dictate the internal formats used by sites, the specific message system features that they are expected to support, or any of the characteristics of user interface programs that create or read messages. In addition, this document does not specify an encoding of the characters for either transport or storage; that is, it does not specify the number of bits used or how those bits are specifically transferred over the wire or stored on disk.
1.2. Notational Conventions
1.2.1. Requirements Notation
This document occasionally uses terms that appear in capital letters. When the terms "MUST", "SHOULD", "RECOMMENDED", "MUST NOT", "SHOULD NOT", and "MAY" appear capitalized, they are being used to indicate particular requirements of this specification. A discussion of the meanings of these terms appears in [RFC2119].
1.2.2. Syntactic Notation
This specification uses the Augmented Backus-Naur Form (ABNF) [RFC5234] notation for the formal definitions of the syntax of messages. Characters will be specified either by a decimal value (e.g., the value %d65 for uppercase A and %d97 for lowercase A) or by a case-insensitive literal value enclosed in quotation marks (e.g., "A" for either uppercase or lowercase A).
1.2.3. Structure of This Document
This document is divided into several sections. This section, section 1, is a short introduction to the document. Section 2 lays out the general description of a message and its constituent parts. This is an overview to help the reader understand some of the general principles used in the later portions of this document. Any examples in this section MUST NOT be taken as specification of the formal syntax of any part of a message. Section 3 specifies formal ABNF rules for the structure of each part of a message (the syntax) and describes the relationship between those parts and their meaning in the context of a message (the semantics). That is, it lays out the actual rules for the structure of each part of a message (the syntax) as well as a description of the parts and instructions for their interpretation (the semantics). This includes analysis of the syntax and semantics of subparts of messages that have specific structure. The syntax included in section 3 represents messages as they MUST be created. There are also notes in section 3 to indicate if any of the options specified in the syntax SHOULD be used over any of the others. Both sections 2 and 3 describe messages that are legal to generate for purposes of this specification.
Section 4 of this document specifies an "obsolete" syntax. There are references in section 3 to these obsolete syntactic elements. The rules of the obsolete syntax are elements that have appeared in earlier versions of this specification or have previously been widely used in Internet messages. As such, these elements MUST be interpreted by parsers of messages in order to be conformant to this specification. However, since items in this syntax have been determined to be non-interoperable or to cause significant problems for recipients of messages, they MUST NOT be generated by creators of conformant messages.
Section 5 details security considerations to take into account when implementing this specification.
Appendix A lists examples of different sorts of messages. These examples are not exhaustive of the types of messages that appear on the Internet, but give a broad overview of certain syntactic forms.
Appendix B lists the differences between this specification and earlier specifications for Internet messages.
Appendix C contains acknowledgements.
2. Lexical Analysis of Messages
2.1. General Description
At the most basic level, a message is a series of characters. A message that is conformant with this specification is composed of characters with values in the range of 1 through 127 and interpreted as US-ASCII [ANSI.X3-4.1986] characters. For brevity, this document sometimes refers to this range of characters as simply "US-ASCII characters". Note: This document specifies that messages are made up of characters in the US-ASCII range of 1 through 127. There are other documents, specifically the MIME document series ([RFC2045], [RFC2046], [RFC2047], [RFC2049], [RFC4288], [RFC4289]), that extend this specification to allow for values outside of that range. Discussion of those mechanisms is not within the scope of this specification. Messages are divided into lines of characters. A line is a series of characters that is delimited with the two characters carriage-return and line-feed; that is, the carriage return (CR) character (ASCII value 13) followed immediately by the line feed (LF) character (ASCII value 10). (The carriage return/line feed pair is usually written in this document as "CRLF".)
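The character range and line delimiter described above can be illustrated with a minimal Python sketch. The function names `is_us_ascii` and `split_lines` are hypothetical and not part of this specification, and the sketch assumes the message is available as a byte string with one octet per character (this document does not itself specify an encoding).

```python
from typing import List

CRLF = b"\r\n"  # carriage return (13) followed immediately by line feed (10)


def is_us_ascii(message: bytes) -> bool:
    """Return True if every octet has a value in the range 1 through 127."""
    return all(1 <= octet <= 127 for octet in message)


def split_lines(message: bytes) -> List[bytes]:
    """Split a message into lines, using CRLF as the only delimiter.

    A CRLF at the very end terminates the last line; it does not
    start an additional empty line.
    """
    lines = message.split(CRLF)
    if lines and lines[-1] == b"":
        lines.pop()
    return lines
```

Note that the value 0 (NUL) and any value above 127 fall outside the range, so a check of this kind rejects them before any further lexical analysis takes place.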
A message consists of header fields (collectively called "the header section of the message") followed, optionally, by a body. The header section is a sequence of lines of characters with special syntax as defined in this specification. The body is simply a sequence of characters that follows the header section and is separated from the header section by an empty line (i.e., a line with nothing preceding the CRLF).
Note: Common parlance and earlier versions of this specification use the term "header" to either refer to the entire header section or to refer to an individual header field. To avoid ambiguity, this document does not use the terms "header" or "headers" in isolation, but instead always uses "header field" to refer to the individual field and "header section" to refer to the entire collection.
2.1.1. Line Length Limits
There are two limits that this specification places on the number of characters in a line. Each line of characters MUST be no more than 998 characters, and SHOULD be no more than 78 characters, excluding the CRLF. The 998 character limit is due to limitations in many implementations that send, receive, or store IMF messages and simply cannot handle more than 998 characters on a line. Receiving implementations would do well to handle an arbitrarily large number of characters in a line for the sake of robustness. However, so many implementations (in compliance with the transport requirements of [RFC5321]) do not accept messages containing more than 1000 characters per line, including the CR and LF, that it is important for implementations not to create such messages. The more conservative 78 character recommendation is to accommodate the many implementations of user interfaces that display these messages and may truncate, or disastrously wrap, the display of more than 78 characters per line, in spite of the fact that such implementations are non-conformant to the intent of this specification (and that of [RFC5321] if they actually cause information to be lost). Again, even though this limitation is put on messages, it is incumbent upon implementations that display messages to handle an arbitrarily large number of characters in a line (certainly at least up to the 998 character limit) for the sake of robustness.
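As an illustration of the two limits, the following Python sketch reports every line that exceeds either one. The function name `check_line_lengths` and the "MUST"/"SHOULD" severity labels are hypothetical choices made for this example; line lengths are counted excluding the CRLF, as stated above.

```python
from typing import List, Tuple

MAX_LINE = 998   # MUST limit, excluding the CRLF
SOFT_LINE = 78   # SHOULD limit, excluding the CRLF


def check_line_lengths(message: bytes) -> List[Tuple[int, int, str]]:
    """Return (line_number, length, severity) for each over-long line.

    Line numbers start at 1. Severity is "MUST" for lines longer than
    998 characters and "SHOULD" for lines longer than 78 characters.
    """
    findings = []
    for number, line in enumerate(message.split(b"\r\n"), start=1):
        length = len(line)
        if length > MAX_LINE:
            findings.append((number, length, "MUST"))
        elif length > SOFT_LINE:
            findings.append((number, length, "SHOULD"))
    return findings
```

A generator of messages would typically treat a "MUST" finding as an error and a "SHOULD" finding as a prompt to fold or re-wrap the line; a receiver, for the sake of robustness, might only log such findings and continue.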
2.2. Header Fields
Header fields are lines beginning with a field name, followed by a colon (":"), followed by a field body, and terminated by CRLF. A field name MUST be composed of printable US-ASCII characters (i.e., characters that have values between 33 and 126, inclusive), except colon. A field body may be composed of printable US-ASCII characters as well as the space (SP, ASCII value 32) and horizontal tab (HTAB, ASCII value 9) characters (together known as the white space characters, WSP). A field body MUST NOT include CR and LF except when used in "folding" and "unfolding", as described in section 2.2.3. All field bodies MUST conform to the syntax described in sections 3 and 4 of this specification.
2.2.1. Unstructured Header Field Bodies
Some field bodies in this specification are defined simply as "unstructured" (which is specified in section 3.2.5 as any printable US-ASCII characters plus white space characters) with no further restrictions. These are referred to as unstructured field bodies. Semantically, unstructured field bodies are simply to be treated as a single line of characters with no further processing (except for "folding" and "unfolding" as described in section 2.2.3).
2.2.2. Structured Header Field Bodies
Some field bodies in this specification have a syntax that is more restrictive than the unstructured field bodies described above. These are referred to as "structured" field bodies. Structured field bodies are sequences of specific lexical tokens as described in sections 3 and 4 of this specification. Many of these tokens are allowed (according to their syntax) to begin or end with comments (as described in section 3.2.2) as well as the white space characters, and those white space characters are subject to "folding" and "unfolding" as described in section 2.2.3. Semantic analysis of structured field bodies is given along with their syntax.
2.2.3. Long Header Fields
Each header field is logically a single line of characters comprising the field name, the colon, and the field body. For convenience however, and to deal with the 998/78 character limitations per line, the field body portion of a header field can be split into a multiple-line representation; this is called "folding". The general rule is that wherever this specification allows for folding white space (not simply WSP characters), a CRLF may be inserted before any WSP.
For example, the header field:
   Subject: This is a test
can be represented as:
   Subject: This
    is a test
Note: Though structured field bodies are defined in such a way that folding can take place between many of the lexical tokens (and even within some of the lexical tokens), folding SHOULD be limited to placing the CRLF at higher-level syntactic breaks. For instance, if a field body is defined as comma-separated values, it is recommended that folding occur after the comma separating the structured items in preference to other places where the field could be folded, even if it is allowed elsewhere.
The process of moving from this folded multiple-line representation of a header field to its single line representation is called "unfolding". Unfolding is accomplished by simply removing any CRLF that is immediately followed by WSP. Each header field should be treated in its unfolded form for further syntactic and semantic evaluation. An unfolded header field has no length restriction and therefore may be indeterminately long.
2.3. Body
The body of a message is simply lines of US-ASCII characters. The only two limitations on the body are as follows:
o CR and LF MUST only occur together as CRLF; they MUST NOT appear independently in the body.
o Lines of characters in the body MUST be limited to 998 characters, and SHOULD be limited to 78 characters, excluding the CRLF.
Note: As was stated earlier, there are other documents, specifically the MIME documents ([RFC2045], [RFC2046], [RFC2049], [RFC4288], [RFC4289]), that extend (and limit) this specification to allow for different sorts of message bodies. Again, these mechanisms are beyond the scope of this document.
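Taken together, the lexical rules of section 2 (header section and body separated by an empty line, unfolding, field names, and the body limitations) can be sketched in Python as follows. This is an illustration only, not a conformant parser: the names `split_message`, `parse_header_section`, and `body_is_valid` are hypothetical, the sketch assumes the header section contains at least one header field, and it performs none of the syntax checks of sections 3 and 4.

```python
import re

# Unfolding: remove any CRLF that is immediately followed by WSP (SP or HTAB).
_FOLD = re.compile(rb"\r\n(?=[ \t])")
# A CR not followed by LF, or an LF not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def split_message(message: bytes):
    """Return (header_section, body); body is None if there is no empty line."""
    head, separator, body = message.partition(b"\r\n\r\n")
    if separator:
        # Give the last header field back its terminating CRLF.
        return head + b"\r\n", body
    return message, None


def parse_header_section(header_section: bytes):
    """Unfold the header section and return a list of (name, body) pairs.

    The field body is returned exactly as it appears after the colon,
    including any leading white space.
    """
    fields = []
    for line in _FOLD.sub(b"", header_section).split(b"\r\n"):
        if not line:
            continue
        name, colon, body = line.partition(b":")
        # Field names: printable US-ASCII (33 through 126), except colon.
        if not colon or not name or not all(33 <= c <= 126 for c in name):
            raise ValueError(f"malformed header field: {line!r}")
        fields.append((name.decode("ascii"), body.decode("ascii")))
    return fields


def body_is_valid(body: bytes) -> bool:
    """Check the two body limitations: CRLF pairing and the 998 limit."""
    if _BARE_CR_OR_LF.search(body):
        return False
    return all(len(line) <= 998 for line in body.split(b"\r\n"))
```

For example, given the message `b"From: jdoe@example.org\r\nSubject: This\r\n is a test\r\n\r\nHello.\r\n"`, `split_message` separates the two header fields from the body `b"Hello.\r\n"`, and `parse_header_section` yields the Subject field in its unfolded form, with the body " This is a test". Unfolding is done before the fields are split because each header field is logically a single line.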