Independent Submission                                        H. Spencer
Request for Comments: 1849                                    SP Systems
Obsoleted by: 5536, 5537                                      March 2010
Category: Historic
ISSN: 2070-1721

"Son of 1036": News Article Format and Transmission

Abstract

By the early 1990s, it had become clear that RFC 1036, then the specification for the Interchange of USENET Messages, was badly in need of repair. This "Internet-Draft-to-be", though never formally published at that time, was widely circulated and became the de facto standard for implementors of News Servers and User Agents, rapidly acquiring the nickname "Son of 1036". Indeed, under that name, it could fairly be described as the best-known Internet Draft (n)ever published, and it formed the starting point for the recently adopted Proposed Standards for Netnews.

It is being published now in order to provide the historical background out of which those standards have grown. Present-day implementors should be aware that it is NOT NOW APPROPRIATE for use in current implementations.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for the historical record.

This document defines a Historic Document for the Internet community. This is a contribution to the RFC Series, independently of any other RFC stream. The RFC Editor has chosen to publish this document at its discretion and makes no statement about its value for implementation or deployment. Documents approved for publication by the RFC Editor are not a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc1849.
Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

This document may not be modified, and derivative works of it may not be created, except to format it for publication as an RFC or to translate it into languages other than English.
Table of Contents

   Preface
   Original Abstract
   1. Introduction
   2. Definitions, Notations, and Conventions
      2.1. Textual Notations
      2.2. Syntax Notation
      2.3. Definitions
      2.4. End-of-Line
      2.5. Case-Sensitivity
      2.6. Language
   3. Relation to MAIL (RFC822, etc.)
   4. Basic Format
      4.1. Overall Syntax
      4.2. Headers
         4.2.1. Names and Contents
         4.2.2. Undesirable Headers
         4.2.3. White Space and Continuations
      4.3. Body
         4.3.1. Body Format Issues
         4.3.2. Body Conventions
      4.4. Characters and Character Sets
      4.5. Non-ASCII Characters in Headers
      4.6. Size Limits
      4.7. Example
   5. Mandatory Headers
      5.1. Date
      5.2. From
      5.3. Message-ID
      5.4. Subject
      5.5. Newsgroups
      5.6. Path
   6. Optional Headers
      6.1. Followup-To
      6.2. Expires
      6.3. Reply-To
      6.4. Sender
      6.5. References
      6.6. Control
      6.7. Distribution
      6.8. Keywords
      6.9. Summary
      6.10. Approved
      6.11. Lines
      6.12. Xref
      6.13. Organization
      6.14. Supersedes
      6.15. Also-Control
      6.16. See-Also
      6.17. Article-Names
      6.18. Article-Updates
   7. Control Messages
      7.1. cancel
      7.2. ihave, sendme
      7.3. newgroup
      7.4. rmgroup
      7.5. sendsys, version, whogets
      7.6. checkgroups
   8. Transmission Formats
      8.1. Batches
      8.2. Encoded Batches
      8.3. News within Mail
      8.4. Partial Batches
   9. Propagation and Processing
      9.1. Relayer General Issues
      9.2. Article Acceptance and Propagation
      9.3. Administrator Contact
   10. Gatewaying
      10.1. General Gatewaying Issues
      10.2. Header Synthesis
      10.3. Message ID Mapping
      10.4. Mail to and from News
      10.5. Gateway Administration
   11. Security and Related Issues
      11.1. Leakage
      11.2. Attacks
      11.3. Anarchy
      11.4. Liability
   12. References
   Appendix A. Archaeological Notes
      A.1. "A News" Article Format
      A.2. Early "B News" Article Format
      A.3. Obsolete Headers
      A.4. Obsolete Control Messages
   Appendix B. A Quick Tour of MIME
   Appendix C. Summary of Changes Since RFC 1036
   Appendix D. Summary of Completely New Features
   Appendix E. Summary of Differences from RFCs 822 and 1123
Preface

Although [RFC1036] was published in 1987, for many years it remained the only formally published specification for Netnews format and processing. It was widely considered obsolete within a few years, and it has now been superseded by the work of the USEFOR Working Group, leading to the publication of [RFC5536] and [RFC5537].

However, there was an intermediate step that is of some historical interest. In 1993-4, Henry Spencer wrote and informally circulated a document that became known as "Son of 1036", meant as a first draft of a replacement for [RFC1036]. It went no further at the time (although, more recently, the USEFOR Working Group started from it), but has nevertheless seen considerable use as a technical reference and even a de facto standard, despite its informal status.

The USEFOR work has eliminated any further relevance of Son of 1036 as a technical reference, but it remains of historical interest. The USEFOR Working Group has asked that it be published as an Historic RFC, to ensure its preservation in an accessible form and facilitate referencing it.

This document is identical to the last distributed version of Son of 1036, dated 2 June 1994, except for reformatting, correction of a few minor factual or formatting errors, completion of the then-empty Appendix D and of the References section, minor editing to match preferred RFC style, and changes to leading and trailing material. Remarks enclosed within "{...}" indicate explanatory material not present in the original version. References to the current MIME standards (and a few others) have been added (that was an unresolved issue in 1994).

The technical content remains unchanged, including the references to the document itself as a Draft rather than an RFC and the presence of unresolved issues. The original section numbering has been preserved, although the original pagination has not (among other reasons, it did not fully follow IETF formatting standards).
READERS ARE CAUTIONED THAT THIS DOCUMENT IS OBSOLETE AND SHOULD NOT BE USED AS A TECHNICAL REFERENCE. Although Son of 1036 largely documented existing practice, it also proposed some changes, some of which did not catch on or are no longer considered good ideas. (Of particular note, the MIME type "message/news" should not be used.) Consult [RFC5536] and [RFC5537] for modern technical information.
Although a number of people contributed useful comments or criticism during the preparation of this document, its contents are entirely the opinions of the author circa 1994. Not even the author himself agrees with them all now.

The author thanks Charles Lindsey for his assistance in getting this document cleaned up and formally published at last (not least, for supplying some prodding to actually get it done!).

The author thanks Luc Rooijakkers for supplying the MIME summary that Appendix B is based on.

Original Abstract

This Draft defines the format and procedures for interchange of network news articles. It is hoped that a later version of this Draft will obsolete RFC 1036, reflecting more recent experience and accommodating future directions.

Network news articles resemble mail messages but are broadcast to potentially large audiences, using a flooding algorithm that propagates one copy to each interested host (or group thereof), typically stores only one copy per host, and does not require any central administration or systematic registration of interested users.

Network news originated as the medium of communication for Usenet, circa 1980. Since then, Usenet has grown explosively, and many Internet sites participate in it. In addition, the news technology is now in widespread use for other purposes, on the Internet and elsewhere.

This Draft primarily codifies and organizes existing practice. A few small extensions have been added in an attempt to solve problems that are considered serious. Major extensions (e.g., cryptographic authentication) that need significant development effort are left to be undertaken as independent efforts.

1. Introduction

Network news articles resemble mail messages but are broadcast to potentially large audiences, using a flooding algorithm that propagates one copy to each interested host (or groups thereof), typically stores only one copy per host, and does not require any central administration or systematic registration of interested users.

Network news originated as the medium of communication for Usenet, circa 1980. Since then, Usenet has grown explosively, and many Internet sites participate in it. In addition, the news technology is now in widespread use for other purposes, on the Internet and elsewhere.
The earliest news interchange used the so-called "A News" article format. Shortly thereafter, an article format vaguely resembling Internet mail was devised and used briefly. Both of those formats are completely obsolete; they are documented in Appendix A for historical reasons only. With the publication of [RFC850] in 1983, news articles came to closely resemble Internet mail messages, with some restrictions and some additional headers. In 1987, [RFC1036] updated [RFC850] without making major changes. In the intervening five years, the [RFC1036] article format has proven quite satisfactory, although minor extensions appear desirable to match recent developments in areas such as multi-media mail. [RFC1036] itself has not proven quite so satisfactory. It is often rather vague and does not address some issues at all; this has caused significant interoperability problems at times, and implementations have diverged somewhat. Worse, although it was intended primarily to document existing practice, it did not precisely match existing practice even at the time it was published, and the deviations have grown since. This Draft attempts to specify the format of articles, and the procedures used to exchange them and process them, in sufficient detail to allow full interoperability. In addition, some tentative suggestions are made about directions for future development, in an attempt to avert unnecessary divergence and consequent loss of interoperability. Major extensions (e.g., cryptographic authentication) that need significant development effort are left to be undertaken as independent efforts. NOTE: One question all of this may raise is: why is there no News- Version header, analogous to MIME-Version, specifying a version number corresponding to this specification? The answer is: it doesn't appear to be useful, given news's backward-compatibility constraints. The major use of a version number is indicating which of several INCOMPATIBLE interpretations is relevant. 
The impossibility of orchestrating any sort of simultaneous change over news's installed base makes it necessary to avoid such incompatible changes (as opposed to extensions) entirely. MIME has a version number mostly because it introduced incompatible changes to the interpretation of several "Content-" headers. This Draft attempts no changes in interpretation, and it appears doubtful that future Drafts will find it feasible to introduce any. UNRESOLVED ISSUE: Should this be reconsidered? Only if the header has SPECIFIC IDENTIFIABLE uses today. Otherwise, it's just useless added bulk.
As in this Draft's predecessors, the exact means used to transmit articles from one host to another is not specified. Network News Transfer Protocol (NNTP) [RFC977] {since replaced by [RFC3977]} is probably the most common transmission method on the Internet, but a number of others are known to be in use, including the Unix-To-Unix Copy Protocol [UUCP], which was extensively used in the early days of Usenet and is still much used on its fringes today.

Several of the mechanisms described in this Draft may seem somewhat strange or even bizarre at first reading. As with Internet mail, there is no reasonable possibility of updating the entire installed base of news software promptly, so interoperability with old software is crucial and will remain so. Compatibility with existing practice and robustness in an imperfect world necessarily take priority over elegance.

2. Definitions, Notations, and Conventions

2.1. Textual Notations

Throughout this Draft, "MAIL" is short for "[RFC822] as amended by [RFC1123]". ([RFC1123]'s amendments are mostly relatively small, but they are not insignificant.) See also the discussion in Section 3 about this Draft's relationship to MAIL.

"MIME" is short for "[RFC1341] and [RFC1342]" (or their {since} updated replacements {[RFC2045], [RFC2046], and [RFC2047]}).

UNRESOLVED ISSUE: Update these numbers {now resolved!}.

{NOTE: Since the original publication of this Draft [RFC822] has been updated, firstly to [RFC2822] and more recently to [RFC5322]; however, this Draft is firmly rooted in the original [RFC822]. Similarly, [RFC821] has also received two upgrades in the meantime.}

"ASCII" is short for "the ANSI X3.4 character set" [X3.4]. While "ASCII" is often misused to refer to various character sets somewhat similar to X3.4, in this Draft, "ASCII" means [X3.4] and only [X3.4].

NOTE: The name is traditional (to the point where the ANSI standard sanctions it), even though it is no longer an acronym for the name of the standard.
NOTE: ASCII, X3.4, contains 128 characters, not all of them printable. Character sets with more characters are not ASCII, although they may include it as a subset.
Certain words used to define the significance of individual requirements are capitalized. "MUST" means that the item is an absolute requirement of the specification. "SHOULD" means that the item is a strong recommendation: there may be valid reasons to ignore it in unusual circumstances, but this should be done only after careful study of the full implications and a firm conclusion that it is necessary, because there are serious disadvantages to doing so. "MAY" means that the item is truly optional, and implementors and users are warned that conformance is possible but not to be relied on.

The term "compliant", applied to implementations, etc., indicates satisfaction of all relevant "MUST" and "SHOULD" requirements. The term "conditionally compliant" indicates satisfaction of all relevant "MUST" requirements but violation of at least one relevant "SHOULD" requirement.

This Draft contains explanatory notes using the following format. These may be skipped by persons interested solely in the content of the specification. The purpose of the notes is to explain why choices were made, to place them in context, or to suggest possible implementation techniques.

NOTE: While such explanatory notes may seem superfluous in principle, they often help the less-than-omniscient reader grasp the purpose of the specification and the constraints involved. Given the limitations of natural language for descriptive purposes, this improves the probability that implementors and users will understand the true intent of the specification in cases where the wording is not entirely clear.

All numeric values are given in decimal unless otherwise indicated. Octets are assumed to be unsigned values for this purpose. Large numbers are written using the North American convention, in which "," separates groups of three digits but otherwise has no significance.

2.2. Syntax Notation

Although the mechanisms specified in this Draft are all described in prose, most are also described formally in the modified BNF notation of [RFC822]. Implementors will need to be familiar with this notation to fully understand this specification and are referred to [RFC822] for a complete explanation of the modified BNF notation. Here is a brief illustrative example:
      sentence  = clause *( punct clause ) "."
      punct     = ":" / ";"
      clause    = 1*word [ "(" clause ")" / "," 1*word ]
      word      = <any English word>

This defines a sentence as some clauses separated by puncts and ended by a period, a punct as a colon or semicolon, a clause as at least one <word> optionally followed by either a parenthesized clause or a comma and at least one more <word>, and a <word> as (informally) any English word.

The characters "<>" are used to enclose names when (and only when) distinguishing them from surrounding text is useful.

The full form of the repetition notation is "<m>*<n><thing>", denoting <m> through <n> repetitions of <thing>; <m> defaults to zero, <n> to infinity, and the "*" and <n> can be omitted if <m> and <n> are equal, so 1*word is one or more words, 1*5word is one through five words, and 2word is exactly two words.

The character "\" is not special in any way in this notation.

This Draft is intended to be self-contained; all syntax rules used in it are defined within it, and a rule with the same name as one found in MAIL does not necessarily have the same definition. The lexical layer of MAIL is NOT, repeat NOT, used in this Draft, and its presence must not be assumed; notably, this Draft spells out all places where white space is permitted/required and all places where constructs resembling MAIL comments can occur.

NOTE: News parsers historically have been much less permissive than MAIL parsers.

2.3. Definitions

The term "character set", wherever it is used in this Draft, refers to a coded character set, in the sense of ISO character set standardization work, and must not be misinterpreted as meaning merely "a set of characters".

In this Draft, ASCII character 32 is referred to as "blank"; the word "space" has a more generic meaning.

An "article" is the unit of news, analogous to a MAIL "message".
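{NOTE: As an illustration of the repetition notation "<m>*<n><thing>" described in Section 2.2, the following Python sketch decodes a single repetition item into its bounds. The function name parse_repeat, the returned tuple shape, and the pattern assumed for rule names are inventions of this sketch and are not part of the Draft or of [RFC822].}

```python
import re
from typing import Optional, Tuple

# Optional <m>, optional "*", optional <n>, then the rule name.
# The rule-name pattern is an assumption made for this sketch.
_REPEAT = re.compile(r"^(\d*)(\*?)(\d*)([A-Za-z][A-Za-z0-9-]*)$")


def parse_repeat(item: str) -> Tuple[int, Optional[int], str]:
    """Return (minimum, maximum, name); a maximum of None means unbounded."""
    match = _REPEAT.match(item)
    if match is None:
        raise ValueError(f"not a repetition item: {item!r}")
    low, star, high, name = match.groups()
    if star:
        # With "*": <m> defaults to zero and <n> to infinity.
        minimum = int(low) if low else 0
        maximum = int(high) if high else None
    else:
        # Without "*": a bare count means exactly that many; no count means one.
        minimum = int(low) if low else 1
        maximum = minimum
    return minimum, maximum, name


if __name__ == "__main__":
    for text in ("1*word", "1*5word", "2word", "*word", "word"):
        print(text, parse_repeat(text))
```

{The three examples given in the text (1*word, 1*5word, and 2word) decode to "one or more", "one through five", and "exactly two" respectively under this sketch.}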
A "poster" is a human being (or software equivalent) submitting a possibly compliant article to be "posted", i.e., made available for reading on all relevant hosts. A "posting agent" is software that assists posters to prepare articles, including determining whether the final article is compliant, passing it on to a relayer for posting if so, and returning it to the poster with an explanation if
not. A "relayer" is software that receives allegedly compliant articles from posting agents and/or other relayers, files copies in a "news database", and possibly passes copies on to other relayers. NOTE: While the same software may well function both as a relayer and as part of a posting agent, the two functions are distinct and should not be confused. The posting agent's purpose is (in part) to validate an article, supply header information that can or should be supplied automatically, and generally take reasonable actions in an attempt to transform the poster's submission into a compliant article. The relayer's purpose is to move already- compliant articles around efficiently without damaging them. A "reader" is a human being reading news articles. A "reading agent" is software that presents articles to a reader. NOTE: Informal usage often uses "reader" for both these meanings, but this introduces considerable potential for confusion and misunderstanding, so this Draft takes care to make the distinction. A "newsgroup" is a single news forum, a logical bulletin board, having a name and nominally intended for articles on a specific topic. An article is "posted to" a single newsgroup or several newsgroups. When an article is posted to more than one newsgroup, it is said to be "cross-posted"; note that this differs from posting the same text as part of each of several articles, one per newsgroup. A "hierarchy" is the set of all newsgroups whose names share a first component (see the name syntax in Section 5.5). A newsgroup may be "moderated", in which case submissions are not posted directly, but mailed to a "moderator" for consideration and possible posting. Moderators are typically human but may be implemented partially or entirely in software. A "followup" is an article containing a response to the contents of an earlier article (the followup's "precursor"). 
A "followup agent" is a combination of reading agent and posting agent that aids in the preparation and posting of a followup. Text comparisons are "case-sensitive" if they consider uppercase letters (e.g., "A") different from lowercase letters (e.g., "a"), and "case-insensitive" if letters differing only in case (e.g., "A" and "a") are considered identical. Categories of text are said to be case-(in)sensitive if comparisons of such texts to others are case- (in)sensitive.
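{NOTE: The two kinds of comparison defined above can be sketched in Python as follows. The function name texts_equal is hypothetical, and folding only the ASCII letters "A" through "Z" (leaving every other character untouched) is an assumption of this sketch rather than something the Draft states.}

```python
import string

# Map only ASCII "A"-"Z" to "a"-"z"; all other characters are left alone.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def texts_equal(a: str, b: str, case_sensitive: bool = True) -> bool:
    """Compare two texts case-sensitively or case-insensitively."""
    if case_sensitive:
        return a == b
    return a.translate(_ASCII_FOLD) == b.translate(_ASCII_FOLD)


if __name__ == "__main__":
    print(texts_equal("comp.lang.c", "Comp.Lang.C"))
    print(texts_equal("comp.lang.c", "Comp.Lang.C", case_sensitive=False))
```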
A "cooperating subnet" is a set of news-exchanging hosts that is sufficiently well-coordinated (typically via a central administration of some sort) that stronger assumptions can be made about hosts in the set than about news hosts in general. This is typically used to relax restrictions that are otherwise required for worst-case interoperability; members of a cooperating subnet MAY interchange articles that do not conform to this Draft's specifications, provided all members have agreed to this and provided the articles are not permitted to leak out of the subnet. The word "subnet" is used to emphasize that a cooperating subnet is typically not an isolated universe; care must be taken that traffic leaving the subnet complies with the restrictions of the larger net, not just those of the cooperating subnet. A "message ID" is a unique identifier for an article, usually supplied by the posting agent that posted it. It distinguishes the article from every other article ever posted anywhere (in theory). Articles with the same message ID are treated as identical copies of the same article even if they are not in fact identical. A "gateway" is software that receives news articles and converts them to messages of some other kind (e.g., mail to a mailing list), or vice versa; in essence, it is a translating relayer that straddles boundaries between different methods of message exchange. The most common type of gateway connects newsgroup(s) to mailing list(s), either unidirectionally or bidirectionally, but there are also gateways between news networks using this Draft's news format and those using other formats. A "control message" is an article that is marked as containing control information; a relayer receiving such an article will (subject to permissions, etc.) take actions beyond just filing and passing on the article. NOTE: "Control article" would be more consistent terminology, but "control message" is already well established. 
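{NOTE: The rule that articles with the same message ID are treated as identical copies, even if they are not in fact identical, might be realized by a relayer roughly as in the Python sketch below. The class NewsDatabase, its methods, and the example message ID are hypothetical names chosen for illustration; a real news database would be persistent and considerably more elaborate.}

```python
from typing import Dict


class NewsDatabase:
    """Toy in-memory article store keyed by message ID (illustrative only)."""

    def __init__(self) -> None:
        self._articles: Dict[str, str] = {}

    def offer(self, message_id: str, article: str) -> bool:
        """File the article unless its message ID has been seen before.

        Returns True if the article was filed (and so could be passed on),
        False if it was treated as a duplicate, whatever its content.
        """
        if message_id in self._articles:
            return False
        self._articles[message_id] = article
        return True

    def get(self, message_id: str) -> str:
        """Return the copy that was filed under this message ID."""
        return self._articles[message_id]


if __name__ == "__main__":
    db = NewsDatabase()
    print(db.offer("<1@site.example>", "first copy"))
    print(db.offer("<1@site.example>", "a different text"))
    print(db.get("<1@site.example>"))
```

{In this sketch the second offer is rejected purely on the basis of its message ID, so the first copy received is the one retained.}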
An article's "reply address" is the address to which mailed replies should be sent. This is the address specified in the article's From header (see Section 5.2), unless it also has a Reply-To header (see Section 6.3). The notation (for example) "(ASCII 17)" following a name means "this name refers to the ASCII character having value 17". An "ASCII printable character" is an ASCII character in the range 33-126. An "ASCII control character" is an ASCII character in the range 0-31, or the character DEL (ASCII 127). A "non-ASCII character" is a character having a value exceeding 127.
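{NOTE: The character categories just defined can be summarized by the following Python sketch. The function name classify and the returned label strings are assumptions of this sketch; the numeric ranges are the ones given in the definitions above, with ASCII character 32 reported separately as "blank".}

```python
def classify(ch: str) -> str:
    """Classify a single character according to the ranges in Section 2.3."""
    code = ord(ch)
    if code > 127:
        return "non-ASCII"
    if code <= 31 or code == 127:
        return "ASCII control"
    if 33 <= code <= 126:
        return "ASCII printable"
    # The only remaining value is 32, which the Draft calls "blank".
    return "blank"


if __name__ == "__main__":
    for sample in ("A", "~", " ", "\t", "\x7f", "\u00e9"):
        print(repr(sample), classify(sample))
```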
NOTE: Blank is neither an "ASCII printable character" nor an "ASCII control character".

2.4. End-of-Line

How the end of a text line is represented depends on the context and the implementation. For Internet transmission via protocols such as SMTP [RFC821], an end-of-line is a CR (ASCII 13) followed by an LF (ASCII 10). ISO C [ISO/IEC9899] and many modern operating systems indicate end-of-line with a single character, typically ASCII LF (aka "newline"), and this is the normal convention when news is transmitted via UUCP. A variety of other methods are in use, including out-of-band methods in which there is no specific character that means end-of-line.

This Draft does not constrain how end-of-line is represented in news, except that characters other than CR and LF MUST NOT be usurped for use in end-of-line representations. Also, obviously, all software dealing with a particular copy of an article must agree on the convention to be used.

"EOL" is used to mean "whatever end-of-line representation is appropriate"; it is not necessarily a character or sequence of characters.

NOTE: If faced with picking an EOL representation in the absence of other constraints, use of a single character simplifies processing, and the ASCII standard [X3.4] specifies that if one character is to be used for this purpose, it should be LF (ASCII 10).

NOTE: Inside MIME encodings, use of the Internet canonical EOL representation (CR followed by LF) is mandatory. See [RFC2049].

2.5. Case-Sensitivity

Text in newsgroup names, header parameters, etc. is case-sensitive unless stated otherwise.

NOTE: This is at variance with MAIL, which is case-insensitive unless stated otherwise, but is consistent with news historical practice and existing news software. See the comments on backward compatibility in Section 1.

2.6. Language

Various constant strings in this Draft, such as header names and month names, are derived from English words.
Despite their derivation, these words do NOT change when the poster or reader employing them is interacting in a language other than English.
Posting and reading agents SHOULD translate as appropriate in their interaction with the poster or reader, but the forms that actually appear in articles are always the English-derived ones defined in this Draft.

3. Relation to MAIL (RFC822, etc.)

The primary intent of this Draft is to completely describe the news article format as a subset of MAIL's message format (augmented by some new headers). Unless explicitly noted otherwise, the intent throughout is that an article MUST also be a valid MAIL message.

NOTE: Despite obvious similarities between news and mail, opinions vary on whether it is possible or desirable to unify them into a single service. However, it is unquestionably both possible and useful to employ some of the same tools for manipulating both mail messages and news articles, so there is specific advantage to be had in defining them compatibly. Furthermore, there is no apparent need to re-invent the wheel when slight extensions to an existing definition will suffice.

Given that this Draft attempts to be self-contained, it inevitably contains considerable repetition of information found in MAIL. This raises the possibility of unintentional conflicts. Unless specifically noted otherwise, any wording in this Draft that permits behavior that is not MAIL-compliant is erroneous and should be followed only to the extent that the result remains compliant with MAIL.

NOTE: [RFC1036] said "where this standard conflicts with the Internet Standard, RFC 822 should be considered correct and this standard in error". Taken literally, this was obviously incorrect, since [RFC1036] imposed a number of restrictions not found in [RFC822]. The intent, however, was reasonable: to indicate that UNINTENTIONAL differences were errors in [RFC1036].

Implementors and users should note that MAIL is deliberately an extensible standard, and most extensions devised for mail are also relevant to (and compatible with) news.
Note particularly MIME, summarized briefly in Appendix B, which extends MAIL in a number of useful ways that are definitely relevant to news. Also of note is the work in progress on reconciling Privacy Enhanced Mail (PEM, which defines extensions for authentication and security) with MIME, after which this may also be relevant to news.

UNRESOLVED ISSUE: Update the MIME/PEM information.
Similarly, descriptions here of MIME facilities should be considered correct only to the extent that they do not require or legitimize practices that would violate those RFCs. (Note that this Draft does extend the application of some MIME facilities, but this is an extension rather than an alteration.)