9. Augmented BNF Syntax for NNTP
9.1. Introduction
Each of the following sections describes the syntax of a major element of NNTP. This syntax extends and refines the descriptions elsewhere in this specification and should be given precedence when resolving apparent conflicts. Note that ABNF [RFC4234] strings are case insensitive. Non-terminals used in several places are defined in a separate section at the end.
Between them, the non-terminals <command-line>, <command-datastream>, <command-continuation>, and <response> specify the text that flows between client and server. A consistent naming scheme is used in this document for the non-terminals relating to each command, and SHOULD be used by the specification of registered extensions.

For each command, the sequence is as follows:

o The client sends an instance of <command-line>; the syntax for the EXAMPLE command is <example-command>.

o If the command is one that immediately streams data, the client sends an instance of <command-datastream>; the syntax for the EXAMPLE command is <example-datastream>.

o The server sends an instance of <response>.

   * The initial response line is independent of the command that generated it; if the 000 response has arguments, the syntax of the initial line is <response-000-content>.

   * If the response is multi-line, the initial line is followed by a <multi-line-data-block>. The syntax for the contents of this block after "dot-stuffing" has been removed is (for the 000 response to the EXAMPLE command) <example-000-ml-content> and is an instance of <multi-line-response-content>.

o While the latest response is one that indicates more data is required (in general, a 3xx response):

   * the client sends an instance of <command-continuation>; the syntax for the EXAMPLE continuation following a 333 response is <example-333-continuation>;

   * the server sends another instance of <response>, as above.

(There are no commands in this specification that immediately stream data, but <command-datastream> is defined for the convenience of extensions.)
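The relationship between a <multi-line-data-block> on the wire and the content that the per-command syntax describes can be illustrated with a minimal sketch. It assumes the whole block, including the terminating "." CRLF line, has already been read into memory; the function name undo_dot_stuffing is hypothetical and not part of this specification.

```python
def undo_dot_stuffing(block: bytes) -> list[bytes]:
    """Return the content lines of a multi-line data block.

    `block` holds the octets that follow the initial response line, up to
    and including the terminating "." CRLF line.
    """
    lines = block.split(b"\r\n")
    # The block ends with CRLF, so split() leaves a trailing empty element,
    # and the element before it must be the lone "." termination line.
    if len(lines) < 2 or lines[-1] != b"" or lines[-2] != b".":
        raise ValueError("missing termination line")
    content = []
    for line in lines[:-2]:
        if line.startswith(b".."):
            # A leading ".." on the wire stands for one leading "." in the content.
            line = line[1:]
        elif line.startswith(b"."):
            # Not permitted by <content-text>: a leading "." must be doubled.
            raise ValueError("line starts with an unstuffed dot")
        content.append(line)
    return content
```

The returned lines are what a non-terminal such as <example-000-ml-content> would be matched against, with each line understood to end in CRLF.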
9.2. Commands
This syntax defines the non-terminal <command-line>, which represents what is sent from the client to the server (see Section 3.1 for limits on lengths).

   command-line = command EOL
   command = X-command
   X-command = keyword *(WS token)

   command =/ article-command /
         body-command /
         capabilities-command /
         date-command /
         group-command /
         hdr-command /
         head-command /
         help-command /
         ihave-command /
         last-command /
         list-command /
         listgroup-command /
         mode-reader-command /
         newgroups-command /
         newnews-command /
         next-command /
         over-command /
         post-command /
         quit-command /
         stat-command

   article-command = "ARTICLE" [WS article-ref]
   body-command = "BODY" [WS article-ref]
   capabilities-command = "CAPABILITIES" [WS keyword]
   date-command = "DATE"
   group-command = "GROUP" [WS newsgroup-name]
   hdr-command = "HDR" WS header-meta-name [WS range-ref]
   head-command = "HEAD" [WS article-ref]
   help-command = "HELP"
   ihave-command = "IHAVE" WS message-id
   last-command = "LAST"
   list-command = "LIST" [WS list-arguments]
   listgroup-command = "LISTGROUP" [WS newsgroup-name [WS range]]
   mode-reader-command = "MODE" WS "READER"
   newgroups-command = "NEWGROUPS" WS date-time
   newnews-command = "NEWNEWS" WS wildmat WS date-time
   next-command = "NEXT"
   over-command = "OVER" [WS range-ref]
   post-command = "POST"
   quit-command = "QUIT"
   stat-command = "STAT" [WS article-ref]

   article-ref = article-number / message-id
   date = date2y / date4y
   date4y = 4DIGIT 2DIGIT 2DIGIT
   date2y = 2DIGIT 2DIGIT 2DIGIT
   date-time = date WS time [WS "GMT"]
   header-meta-name = header-name / metadata-name
   list-arguments = keyword [WS token]
   metadata-name = ":" 1*A-NOTCOLON
   range = article-number ["-" [article-number]]
   range-ref = range / message-id
   time = 2DIGIT 2DIGIT 2DIGIT

9.3. Command Continuation
This syntax defines the further material sent by the client in the case of multi-stage commands and those that stream data.

   command-datastream = UNDEFINED
         ; not used, provided as a hook for extensions
   command-continuation = ihave-335-continuation /
         post-340-continuation

   ihave-335-continuation = encoded-article
   post-340-continuation = encoded-article

   encoded-article = multi-line-data-block
         ; after undoing the "dot-stuffing", this MUST match <article>

9.4. Responses
9.4.1. Generic Responses
This syntax defines the non-terminal <response>, which represents the generic form of responses; that is, what is sent from the server to the client in response to a <command> or a <command-continuation>.

   response = simple-response / multi-line-response
   simple-response = initial-response-line
   multi-line-response = initial-response-line multi-line-data-block

   initial-response-line =
         initial-response-content [SP trailing-comment] CRLF
   initial-response-content = X-initial-response-content
   X-initial-response-content = 3DIGIT *(SP response-argument)
   response-argument = 1*A-CHAR
   trailing-comment = *U-CHAR

9.4.2. Initial Response Line Contents
This syntax defines the specific initial response lines for the various commands in this specification (see Section 3.1 for limits on lengths). Only those response codes with arguments are listed.

   initial-response-content =/ response-111-content /
         response-211-content /
         response-220-content /
         response-221-content /
         response-222-content /
         response-223-content /
         response-401-content

   response-111-content = "111" SP date4y time
   response-211-content = "211" 3(SP article-number) SP newsgroup-name
   response-220-content = "220" SP article-number SP message-id
   response-221-content = "221" SP article-number SP message-id
   response-222-content = "222" SP article-number SP message-id
   response-223-content = "223" SP article-number SP message-id
   response-401-content = "401" SP capability-label

9.4.3. Multi-line Response Contents
This syntax defines the content of the various multi-line responses; more precisely, it defines the part of the response in the multi-line data block after any "dot-stuffing" has been undone. The numeric portion of each non-terminal name indicates the response code that is followed by this data.

   multi-line-response-content = article-220-ml-content /
         body-222-ml-content /
         capabilities-101-ml-content /
         hdr-225-ml-content /
         head-221-ml-content /
         help-100-ml-content /
         list-215-ml-content /
         listgroup-211-ml-content /
         newgroups-231-ml-content /
         newnews-230-ml-content /
         over-224-ml-content

   article-220-ml-content = article
   body-222-ml-content = body
   capabilities-101-ml-content = version-line CRLF
         *(capability-line CRLF)
   hdr-225-ml-content = *(article-number SP hdr-content CRLF)
   head-221-ml-content = 1*header
   help-100-ml-content = *(*U-CHAR CRLF)
   list-215-ml-content = list-content
   listgroup-211-ml-content = *(article-number CRLF)
   newgroups-231-ml-content = active-groups-list
   newnews-230-ml-content = *(message-id CRLF)
   over-224-ml-content = *(article-number over-content CRLF)

   active-groups-list = *(newsgroup-name SPA article-number
         SPA article-number SPA newsgroup-status CRLF)
   hdr-content = *S-NONTAB
   hdr-n-content = [(header-name ":" / metadata-name) SP hdr-content]
   list-content = body
   newsgroup-status = %x79 / %x6E / %x6D / private-status
   over-content = 1*6(TAB hdr-content) /
         7(TAB hdr-content) *(TAB hdr-n-content)
   private-status = token ; except the values in newsgroup-status

9.5. Capability Lines
This syntax defines the generic form of a capability line in the capabilities list (see Section 3.3.1).

   capability-line = capability-entry
   capability-entry = X-capability-entry
   X-capability-entry = capability-label *(WS capability-argument)
   capability-label = keyword
   capability-argument = token

This syntax defines the specific capability entries for the capabilities in this specification.

   capability-entry =/ hdr-capability /
         ihave-capability /
         implementation-capability /
         list-capability /
         mode-reader-capability /
         newnews-capability /
         over-capability /
         post-capability /
         reader-capability

   hdr-capability = "HDR"
   ihave-capability = "IHAVE"
   implementation-capability = "IMPLEMENTATION" *(WS token)
   list-capability = "LIST" 1*(WS keyword)
   mode-reader-capability = "MODE-READER"
   newnews-capability = "NEWNEWS"
   over-capability = "OVER" [WS "MSGID"]
   post-capability = "POST"
   reader-capability = "READER"

   version-line = "VERSION" 1*(WS version-number)
   version-number = nzDIGIT *5DIGIT

9.6. LIST Variants
This section defines more specifically the keywords for the LIST command and the syntax of the corresponding response contents.

   ; active
   list-arguments =/ "ACTIVE" [WS wildmat]
   list-content =/ list-active-content
   list-active-content = active-groups-list

   ; active.times
   list-arguments =/ "ACTIVE.TIMES" [WS wildmat]
   list-content =/ list-active-times-content
   list-active-times-content =
         *(newsgroup-name SPA 1*DIGIT SPA newsgroup-creator CRLF)
   newsgroup-creator = U-TEXT

   ; distrib.pats
   list-arguments =/ "DISTRIB.PATS"
   list-content =/ list-distrib-pats-content
   list-distrib-pats-content =
         *(1*DIGIT ":" wildmat ":" distribution CRLF)
   distribution = token

   ; headers
   list-arguments =/ "HEADERS" [WS ("MSGID" / "RANGE")]
   list-content =/ list-headers-content
   list-headers-content = *(header-meta-name CRLF) /
         *((metadata-name / ":") CRLF)

   ; newsgroups
   list-arguments =/ "NEWSGROUPS" [WS wildmat]
   list-content =/ list-newsgroups-content
   list-newsgroups-content =
         *(newsgroup-name WS newsgroup-description CRLF)
   newsgroup-description = S-TEXT

   ; overview.fmt
   list-arguments =/ "OVERVIEW.FMT"
   list-content =/ list-overview-fmt-content
   list-overview-fmt-content = "Subject:" CRLF
         "From:" CRLF
         "Date:" CRLF
         "Message-ID:" CRLF
         "References:" CRLF
         ( ":bytes" CRLF ":lines" / "Bytes:" CRLF "Lines:") CRLF
         *((header-name ":full" / metadata-name) CRLF)

9.7. Articles
This syntax defines the non-terminal <article>, which represents the format of an article as described in Section 3.6.

   article = 1*header CRLF body
   header = header-name ":" [CRLF] SP header-content CRLF
   header-content = *(S-CHAR / [CRLF] WS)
   body = *(*B-CHAR CRLF)

9.8. General Non-terminals
These non-terminals are used at various places in the syntax and are collected here for convenience. A few of these non-terminals are not used in this specification but are provided for the consistency and convenience of extension authors.

   multi-line-data-block = content-lines termination
   content-lines = *([content-text] CRLF)
   content-text = (".." / B-NONDOT) *B-CHAR
   termination = "." CRLF

   article-number = 1*16DIGIT
   header-name = 1*A-NOTCOLON
   keyword = ALPHA 2*(ALPHA / DIGIT / "." / "-")
   message-id = "<" 1*248A-NOTGT ">"
   newsgroup-name = 1*wildmat-exact
   token = 1*P-CHAR

   wildmat = wildmat-pattern *("," ["!"] wildmat-pattern)
   wildmat-pattern = 1*wildmat-item
   wildmat-item = wildmat-exact / wildmat-wild
   wildmat-exact = %x22-29 / %x2B / %x2D-3E / %x40-5A / %x5E-7E /
         UTF8-non-ascii ; exclude ! * , ? [ \ ]
   wildmat-wild = "*" / "?"

   base64 = *(4base64-char) [base64-terminal]
   base64-char = UPPER / LOWER / DIGIT / "+" / "/"
   base64-terminal = 2base64-char "==" / 3base64-char "="

   ; Assorted special character sets
   ;   A- means based on US-ASCII, excluding controls and SP
   ;   P- means based on UTF-8, excluding controls and SP
   ;   U- means based on UTF-8, excluding NUL CR and LF
   ;   B- means based on bytes, excluding NUL CR and LF
   A-CHAR = %x21-7E
   A-NOTCOLON = %x21-39 / %x3B-7E ; exclude ":"
   A-NOTGT = %x21-3D / %x3F-7E ; exclude ">"
   P-CHAR = A-CHAR / UTF8-non-ascii
   U-CHAR = CTRL / TAB / SP / A-CHAR / UTF8-non-ascii
   U-NONTAB = CTRL / SP / A-CHAR / UTF8-non-ascii
   U-TEXT = P-CHAR *U-CHAR
   B-CHAR = CTRL / TAB / SP / %x21-FF
   B-NONDOT = CTRL / TAB / SP / %x21-2D / %x2F-FF ; exclude "."

   ALPHA = UPPER / LOWER ; use only when case-insensitive
   CR = %x0D
   CRLF = CR LF
   CTRL = %x01-08 / %x0B-0C / %x0E-1F
   DIGIT = %x30-39
   nzDIGIT = %x31-39
   EOL = *(SP / TAB) CRLF
   LF = %x0A
   LOWER = %x61-7A
   SP = %x20
   SPA = 1*SP
   TAB = %x09
   UPPER = %x41-5A
   UTF8-non-ascii = UTF8-2 / UTF8-3 / UTF8-4
   UTF8-2 = %xC2-DF UTF8-tail
   UTF8-3 = %xE0 %xA0-BF UTF8-tail / %xE1-EC 2UTF8-tail /
         %xED %x80-9F UTF8-tail / %xEE-EF 2UTF8-tail
   UTF8-4 = %xF0 %x90-BF 2UTF8-tail / %xF1-F3 3UTF8-tail /
         %xF4 %x80-8F 2UTF8-tail
   UTF8-tail = %x80-BF
   WS = 1*(SP / TAB)

The following non-terminals require special consideration. They represent situations where material SHOULD be restricted to UTF-8, but implementations MUST be able to cope with other character encodings. Therefore, there are two sets of definitions for them.

Implementations MUST accept any content that meets this syntax:

   S-CHAR = %x21-FF
   S-NONTAB = CTRL / SP / S-CHAR
   S-TEXT = (CTRL / S-CHAR) *B-CHAR

and MAY pass such content on unaltered.

When generating new content or re-encoding existing content, implementations SHOULD conform to this syntax:

   S-CHAR = P-CHAR
   S-NONTAB = U-NONTAB
   S-TEXT = U-TEXT

9.9. Extensions and Validation
The specification of a registered extension MUST include formal syntax that defines additional forms for the following non-terminals:

command
   for each new command other than a variant of the LIST command - the syntax of each command MUST be compatible with the definition of <X-command>;

command-datastream
   for each new command that immediately streams data;

command-continuation
   for each new command that sends further material after the initial command line - the syntax of each continuation MUST be exactly what is sent to the server, including any escape mechanisms such as "dot-stuffing";

initial-response-content
   for each new response code that has arguments - the syntax of each response MUST be compatible with the definition of <X-initial-response-content>;

multi-line-response-content
   for each new response code that has a multi-line response - the syntax MUST show the response after the lines containing the response code and the terminating octet have been removed and any "dot-stuffing" undone;

capability-entry
   for each new capability label - the syntax of each entry MUST be compatible with the definition of <X-capability-entry>;

list-arguments
   for each new variant of the LIST command - the syntax of each entry MUST be compatible with the definition of <X-command>;

list-content
   for each new variant of the LIST command - the syntax MUST show the response after the lines containing the 215 response code and the terminating octet have been removed and any "dot-stuffing" undone.

The =/ notation of ABNF [RFC4234] and the naming conventions described in Section 9.1 SHOULD be used for this.

When the syntax in this specification, or syntax based on it, is validated, it should be noted that:

o the non-terminals <command-line>, <command-datastream>, <command-continuation>, <response>, and <multi-line-response-content> describe basic concepts of the protocol and are not referred to by any other rule;

o the non-terminal <base64> is provided for the convenience of extension authors and is not referred to by any rule in this specification;

o for the reasons given above, the non-terminals <S-CHAR>, <S-NONTAB>, and <S-TEXT> each have two definitions; and

o the non-terminal <UNDEFINED> is deliberately not defined.

10. Internationalisation Considerations
10.1. Introduction and Historical Situation
RFC 977 [RFC977] was written at a time when internationalisation was not seen as a significant issue. As such, it was written on the assumption that all communication would be in ASCII and use only a 7-bit transport layer, although in practice just about all known implementations are 8-bit clean.

Since then, Usenet and NNTP have spread throughout the world. In the absence of standards for handling the issues of language and character sets, countries, newsgroup hierarchies, and individuals have found a variety of solutions that work for them but that are not necessarily appropriate elsewhere. For example, some have adopted a default 8-bit character set appropriate to their needs (such as ISO/IEC 8859-1 in Western Europe or KOI-8 in Russia), others have used ASCII (either US-ASCII or national variants) in headers but local 16-bit character sets in article bodies, and still others have gone for a combination of MIME [RFC2045] and UTF-8. With the increased use of MIME in email, it is becoming more common to find NNTP articles containing MIME headers that identify the character set of the body, but this is far from universal. The resulting confusion does not help interoperability.

One point that has been generally accepted is that articles can contain octets with the top bit set, and NNTP is only expected to operate on 8-bit clean transport paths.

10.2. This Specification
Part of the role of this present specification is to eliminate this confusion and promote interoperability as far as possible. At the same time, it is necessary to accept the existence of the present situation and not break existing implementations and arrangements gratuitously, even if they are less than optimal. Therefore, the current practice described above has been taken into consideration in producing this specification.

This specification extends NNTP from US-ASCII [ANSI1986] to UTF-8 [RFC3629]. Except in the two areas discussed below, UTF-8 (which is a superset of US-ASCII) is mandatory, and implementations MUST NOT use any other encoding.

Firstly, the use of MIME for article headers and bodies is strongly recommended. However, given widely divergent existing practices, an attempt to require a particular encoding and tagging standard would be premature at this time. Accordingly, this specification allows the use of arbitrary 8-bit data in articles subject to the following requirements and recommendations.

o The names of headers (e.g., "From" or "Subject") MUST be in US-ASCII.

o Header values SHOULD use US-ASCII or an encoding based on it, such as RFC 2047 [RFC2047], until such time as another approach has been standardised. At present, 8-bit encodings (including UTF-8) SHOULD NOT be used because they are likely to cause interoperability problems.

o The character set of article bodies SHOULD be indicated in the article headers, and this SHOULD be done in accordance with MIME.

o Where an article is obtained from an external source, an implementation MAY pass it on and derive data from it (such as the response to the HDR command), even though the article or the data does not meet the above requirements. Implementations MUST transfer such articles and data correctly and unchanged; they MUST NOT attempt to convert or re-encode the article or derived data. (Nevertheless, a client or server MAY elect not to post or forward the article if, after further examination of the article, it deems it inappropriate to do so.)

This requirement affects the ARTICLE (Section 6.2.1), BODY (Section 6.2.3), HDR (Section 8.5), HEAD (Section 6.2.2), IHAVE (Section 6.3.2), OVER (Section 8.3), and POST (Section 6.3.1) commands.

Secondly, the following requirements are placed on the newsgroups list returned by the LIST NEWSGROUPS command (Section 7.6.6):

o Although this specification allows UTF-8 for newsgroup names, they SHOULD be restricted to US-ASCII until a successor to RFC 1036 [RFC1036] standardises another approach. 8-bit encodings SHOULD NOT be used because they are likely to cause interoperability problems.

o The newsgroup description SHOULD be in US-ASCII or UTF-8 unless and until a successor to RFC 1036 standardises other encoding arrangements. 8-bit encodings other than UTF-8 SHOULD NOT be used because they are likely to cause interoperability problems.

o Implementations that obtain this data from an external source MUST handle it correctly even if it does not meet the above requirements. Implementations (in particular, clients) MUST handle such data correctly.

10.3. Outstanding Issues
While the primary use of NNTP is for transmitting articles that conform to RFC 1036 (Netnews articles), it is also used for other formats (see Appendix A). It is therefore most appropriate that internationalisation issues related to article formats be addressed in the relevant specifications. For Netnews articles, this is any successor to RFC 1036. For email messages, it is RFC 2822 [RFC2822].

Of course, any article transmitted via NNTP needs to conform to this specification as well.

Restricting newsgroup names to UTF-8 is not a complete solution. In particular, when new newsgroup names are created or a user is asked to enter a newsgroup name, some scheme of canonicalisation will need to take place. This specification does not attempt to define that canonicalisation; further work is needed in this area, in conjunction with the article format specifications. Until such specifications are published, implementations SHOULD match newsgroup names octet by octet. It is anticipated that any approved scheme will be applied "at the edges", and therefore octet-by-octet comparison will continue to apply to most, if not all, uses of newsgroup names in NNTP.

In the meantime, any implementation experimenting with UTF-8 newsgroup names is strongly cautioned that a future specification may require that those names be canonicalised when used with NNTP in a way that is not compatible with their experiments.

Since the primary use of NNTP is with Netnews, and since newsgroup descriptions are normally distributed through specially formatted articles, it is recommended that the internationalisation issues related to them be addressed in any successor to RFC 1036.

11. IANA Considerations
This specification requires IANA to keep a registry of capability labels. The initial contents of this registry are specified in Section 3.3.4. As described in Section 3.3.3, labels beginning with X are reserved for private use, while all other names are expected to be associated with a specification in an RFC on the standards track or defining an IESG-approved experimental protocol.

Different entries in the registry MUST use different capability labels.

Different entries in the registry MUST NOT use the same command name. For this purpose, variants distinguished by a second or subsequent keyword (e.g., "LIST HEADERS" and "LIST OVERVIEW.FMT") count as different commands. If there is a need for two extensions to use the same command, a single harmonised specification MUST be registered.

12. Security Considerations
This section is meant to inform application developers, information providers, and users of the security limitations in NNTP as described by this document. The discussion does not include definitive solutions to the problems revealed, though it does make some suggestions for reducing security risks.
12.1. Personal and Proprietary Information
NNTP, because it was created to distribute network news articles, will forward whatever information is stored in those articles. Specification of that information is outside the scope of this document, but it is likely that some personal and/or proprietary information is available in some of those articles. It is very important that designers and implementers provide informative warnings to users so that personal and/or proprietary information in material that is added automatically to articles (e.g., in headers) is not disclosed inadvertently. Additionally, effective and easily understood mechanisms to manage the distribution of news articles SHOULD be provided to NNTP Server administrators, so that they are able to report with confidence the likely spread of any particular set of news articles.

12.2. Abuse of Server Log Information
A server is in the position to save session data about a user's requests that might identify their reading patterns or subjects of interest. This information is clearly confidential in nature, and its handling can be constrained by law in certain countries. People using this protocol to provide data are responsible for ensuring that such material is not distributed without the permission of any individuals that are identifiable by the published results.

12.3. Weak Authentication and Access Control
There is no user-based or token-based authentication in the basic NNTP specification. Access is normally controlled by server configuration files. Those files specify access by using domain names or IP addresses. However, this specification does permit the creation of extensions to NNTP for such purposes; one such extension is [NNTP-AUTH]. While including such mechanisms is optional, doing so is strongly encouraged.

Other mechanisms are also available. For example, a proxy server could be put in place that requires authentication before connecting via the proxy to the NNTP server.

12.4. DNS Spoofing
Many existing NNTP implementations authorize incoming connections by checking the IP address of that connection against the IP addresses obtained via DNS lookups of lists of domain names given in local configuration files. Servers that use this type of authentication and clients that find a server by doing a DNS lookup of the server name rely very heavily on the Domain Name Service, and are thus generally prone to security attacks based on the deliberate misassociation of IP addresses and DNS names. Clients and servers need to be cautious in assuming the continuing validity of an IP number/DNS name association.

In particular, NNTP clients and servers SHOULD rely on their name resolver for confirmation of an IP number/DNS name association, rather than cache the result of previous host name lookups. Many platforms already can cache host name lookups locally when appropriate, and they SHOULD be configured to do so. It is proper for these lookups to be cached, however, only when the TTL (Time To Live) information reported by the name server makes it likely that the cached information will remain useful.

If NNTP clients or servers cache the results of host name lookups in order to achieve a performance improvement, they MUST observe the TTL information reported by DNS.

If NNTP clients or servers do not observe this rule, they could be spoofed when a previously accessed server's IP address changes. As network renumbering is expected to become increasingly common, the possibility of this form of attack will increase. Observing this requirement thus reduces this potential security vulnerability.

This requirement also improves the load-balancing behaviour of clients for replicated servers using the same DNS name and reduces the likelihood of a user's experiencing failure in accessing sites that use that strategy.

12.5. UTF-8 Issues
UTF-8 [RFC3629] permits only certain sequences of octets and designates others as either malformed or "illegal". The Unicode standard identifies a number of security issues related to illegal sequences and forbids their generation by conforming implementations.

Implementations of this specification MUST NOT generate malformed or illegal sequences and SHOULD detect them and take some appropriate action. This could include the following:

o Generating a 501 response code.

o Replacing such sequences by the sequence %xEF.BF.BD, which encodes the "replacement character" U+FFFD.

o Closing the connection.

o Replacing such sequences by a "guessed" valid sequence (based on properties of the UTF-8 encoding).
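As a minimal sketch of the second option in the list above, the following Python fragment decodes a command line with malformed sequences replaced by U+FFFD and only then splits it into arguments. The function names are hypothetical, and the sketch assumes the complete line has already been read; it relies on Python's built-in UTF-8 decoder, which rejects over-long forms such as %xC0.A0.

```python
import re


def decode_command_line(raw: bytes) -> str:
    """Decode a command line, replacing malformed UTF-8 with U+FFFD.

    With errors="replace", octets that do not form valid UTF-8 become
    U+FFFD (UTF-8: EF BF BD) instead of being guessed at.
    """
    return raw.decode("utf-8", errors="replace")


def split_arguments(raw: bytes) -> list[str]:
    """Sanitise the whole line first, then split it on SP and TAB."""
    line = decode_command_line(raw.rstrip(b"\r\n"))
    return [part for part in re.split(r"[ \t]+", line) if part]
```

Because the over-long octets are never turned into a real space, an input such as b"GROUP misc\xc0\xa0test\r\n" still yields exactly two arguments, and the second one contains replacement characters rather than an embedded space.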
In the last case, the implementation MUST ensure that any replacement cannot be used to bypass validity or security checks. For example, the illegal sequence %xC0.A0 is an over-long encoding for space (%x20). If it is replaced by the correct encoding in a command line, this needs to happen before the command line is parsed into individual arguments. If the replacement came after parsing, it would be possible to generate an argument with an embedded space, which is forbidden. Use of the "replacement character" does not have this problem, since it is permitted wherever non-US-ASCII characters are. Implementations SHOULD use one of the first two solutions where the general structure of the NNTP stream remains intact and SHOULD close the connection if it is no longer possible to parse it sensibly.

12.6. Caching of Capability Lists
The CAPABILITIES command provides a capability list, which is information about the current capabilities of the server. Whenever there is a relevant change to the server state, the results of this command are required to change accordingly.

In most situations, the capabilities list in a given server state will not change from session to session; for example, a given extension will be installed permanently on a server. Some clients may therefore wish to remember which extensions a server supports to avoid the delay of an additional command and response, particularly if they open multiple connections in the same session.

However, information about extensions related to security and privacy MUST NOT be cached, since this could allow a variety of attacks.

For example, consider a server that permits the use of cleartext passwords on links that are encrypted but not otherwise:

   [Initial connection set-up completed.]
   [S] 200 NNTP Service Ready, posting permitted
   [C] CAPABILITIES
   [S] 101 Capability list:
   [S] VERSION 2
   [S] READER
   [S] NEWNEWS
   [S] POST
   [S] XENCRYPT
   [S] LIST ACTIVE NEWSGROUPS
   [S] .
   [C] XENCRYPT
   [Client and server negotiate encryption on the link]
   [S] 283 Encrypted link established
   [C] CAPABILITIES
   [S] 101 Capability list:
   [S] VERSION 2
   [S] READER
   [S] NEWNEWS
   [S] POST
   [S] XSECRET
   [S] LIST ACTIVE NEWSGROUPS
   [S] .
   [C] XSECRET fred flintstone
   [S] 290 Password for fred accepted

If the client caches the last capabilities list, then on the next session it will attempt to use XSECRET on an unencrypted link:

   [Initial connection set-up completed.]
   [S] 200 NNTP Service Ready, posting permitted
   [C] XSECRET fred flintstone
   [S] 483 Only permitted on secure links

This exposes the password to any eavesdropper. While the primary cause of this is passing a secret without first checking the security of the link, caching of capability lists can increase the risk.

Any security extension should include requirements to check the security state of the link in a manner appropriate to that extension.

Caching should normally only be considered for anonymous clients that do not use any security or privacy extensions and for which the time required for an additional command and response is a noticeable issue.

13. Acknowledgements
This document is the result of much effort by the present and past members of the NNTP Working Group, chaired by Russ Allbery and Ned Freed. It could not have been produced without them.

The author acknowledges the original authors of NNTP as documented in RFC 977 [RFC977]: Brian Kantor and Phil Lapsley.

The author gratefully acknowledges the following:

o The work of the NNTP committee chaired by Eliot Lear. The organization of this document was influenced by the last available version from this working group. A special thanks to Eliot for generously providing the original machine-readable sources for that document.

o The work of the DRUMS working group, specifically RFC 1869 [RFC1869], that drove the original thinking that led to the CAPABILITIES command and the extensions mechanism detailed in this document.

o The authors of RFC 2616 [RFC2616] for providing specific and relevant examples of security issues that should be considered for HTTP. Since many of the same considerations exist for NNTP, those examples that are relevant have been included here with some minor rewrites.

o The comments and additional information provided by the following individuals in preparing one or more of the progenitors of this document:

   Russ Allbery <rra@stanford.edu>
   Wayne Davison <davison@armory.com>
   Chris Lewis <clewis@bnr.ca>
   Tom Limoncelli <tal@mars.superlink.net>
   Eric Schnoebelen <eric@egsner.cirr.com>
   Rich Salz <rsalz@osf.org>

This work was motivated by the work of various news reader authors and news server authors, including those listed below:

Rick Adams
   Original author of the NNTP extensions to the RN news reader and last maintainer of Bnews.

Stan Barber
   Original author of the NNTP extensions to the news readers that are part of Bnews.

Geoff Collyer
   Original author of the OVERVIEW database proposal and one of the original authors of CNEWS.

Dan Curry
   Original author of the xvnews news reader.

Wayne Davison
   Author of the first threading extensions to the RN news reader (commonly called TRN).

Geoff Huston
   Original author of ANU NEWS.

Phil Lapsley
   Original author of the UNIX reference implementation for NNTP.

Iain Lea
   Original maintainer of the TIN news reader.

Chris Lewis
   First known implementer of the AUTHINFO GENERIC extension.

Rich Salz
   Original author of INN.

Henry Spencer
   One of the original authors of CNEWS.

Kim Storm
   Original author of the NN news reader.

Other people who contributed to this document include:

   Matthias Andree, Greg Andruk, Daniel Barclay, Maurizio Codogno, Mark Crispin, Andrew Gierth, Juergen Helbing, Scott Hollenbeck, Urs Janssen, Charles Lindsey, Ade Lovett, David Magda, Ken Murchison, Francois Petillon, Peter Robinson, Rob Siemborski, Howard Swinehart, Ruud van Tol, Jeffrey Vinocur, Erik Warmelink

The author thanks them all and apologises to anyone omitted.

Finally, the present author gratefully acknowledges the vast amount of work put into previous versions by the previous author:

   Stan Barber <sob@academ.com>
14. References
14.1. Normative References
[ANSI1986] American National Standards Institute, "Coded Character Set - 7-bit American Standard Code for Information Interchange", ANSI X3.4, 1986.

[RFC977] Kantor, B. and P. Lapsley, "Network News Transfer Protocol", RFC 977, February 1986.

[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.

[RFC2047] Moore, K., "MIME (Multipurpose Internet Mail Extensions) Part Three: Message Header Extensions for Non-ASCII Text", RFC 2047, November 1996.

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.

[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.

[RFC4234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", RFC 4234, October 2005.

[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, October 2006.

[TF.686-1] International Telecommunications Union - Radio, "Glossary, ITU-R Recommendation TF.686-1", ITU-R Recommendation TF.686-1, October 1997.

14.2. Informative References
[NNTP-AUTH] Vinocur, J., Murchison, K., and C. Newman, "Network News Transfer Protocol (NNTP) Extension for Authentication", RFC 4643, October 2006.

[NNTP-STREAM] Vinocur, J. and K. Murchison, "Network News Transfer Protocol (NNTP) Extension for Streaming Feeds", RFC 4644, October 2006.

[NNTP-TLS] Murchison, K., Vinocur, J., and C. Newman, "Using Transport Layer Security (TLS) with Network News Transfer Protocol (NNTP)", RFC 4642, October 2006.

[RFC1036] Horton, M. and R. Adams, "Standard for interchange of USENET messages", RFC 1036, December 1987.

[RFC1305] Mills, D., "Network Time Protocol (Version 3) Specification, Implementation and Analysis", RFC 1305, March 1992.

[RFC1869] Klensin, J., Freed, N., Rose, M., Stefferud, E., and D. Crocker, "SMTP Service Extensions", STD 10, RFC 1869, November 1995.

[RFC2616] Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and T. Berners-Lee, "Hypertext Transfer Protocol -- HTTP/1.1", RFC 2616, June 1999.

[RFC2629] Rose, M., "Writing I-Ds and RFCs using XML", RFC 2629, June 1999.

[RFC2822] Resnick, P., "Internet Message Format", RFC 2822, April 2001.

[RFC2980] Barber, S., "Common NNTP Extensions", RFC 2980, October 2000.

[ROBE1995] Robertson, R., "FAQ: Overview database / NOV General Information", January 1995. There is no definitive copy of this document known to the author. It was previously posted as the Usenet article <news:nov-faq-1-930909720@agate.Berkeley.EDU>

[SALZ1992] Salz, R., "Manual Page for wildmat(3) from the INN 1.4 distribution, Revision 1.10", April 1992. There is no definitive copy of this document known to the author.
Appendix A. Interaction with Other Specifications
NNTP is most often used for transferring articles that conform to RFC 1036 [RFC1036] (such articles are called "Netnews articles" here). It is also sometimes used for transferring email messages that conform to RFC 2822 [RFC2822] (such articles are called "email articles" here). In this situation, articles must conform both to this specification and to that other one; this appendix describes some relevant issues.

A.1. Header Folding
NNTP allows a header line to be folded (by inserting a CRLF pair) before any space or TAB character.

Both email and Netnews articles are required to have at least one octet other than space or TAB on each header line. Thus, folding can only happen at one point in each sequence of consecutive spaces or TABs. Netnews articles are further required to have the header name, colon, and following space all on the first line; folding may only happen beyond that space. Finally, some non-conforming software will remove trailing spaces and TABs from a line. Therefore, it might be inadvisable to fold a header after a space or TAB.

For maximum safety, header lines SHOULD conform to the following syntax rather than to that in Section 9.7.

   header = header-name ":" SP [header-content] CRLF
   header-content = [WS] token *( [CRLF] WS token )

A.2. Message-IDs
Every article handled by an NNTP server MUST have a unique message-id. For the purposes of this specification, a message-id is an arbitrary opaque string that merely needs to meet certain syntactic requirements and is just a way to refer to the article.

Because there is a significant risk that old articles will be reinjected into the global Usenet system, RFC 1036 [RFC1036] requires that message-ids are globally unique for all time.

This specification states that message-ids are the same if and only if they consist of the same sequence of octets. Other specifications may define two different sequences as being equal because they are putting an interpretation on particular characters. RFC 2822 [RFC2822] has a concept of "quoted" and "escaped" characters. It therefore considers the three message-ids:

   <ab.cd@example.com>
   <"ab.cd"@example.com>
   <"ab.\cd"@example.com>

as being identical. Therefore, an NNTP implementation handling email articles must ensure that only one of these three appears in the protocol and that the other two are converted to it as and when necessary, such as when a client checks the results of a NEWNEWS command against an internal database of message-ids. Note that RFC 1036 [RFC1036] never treats two different strings as being identical. Its successor (as of the time of writing) restricts the syntax of message-ids so that, whenever RFC 2822 would treat two strings as equivalent, only one of them is valid (in the above example, only the first string is valid).

This specification does not describe how the message-id of an article is determined; it may be deduced from the contents of the article or derived from some external source. If the server is also conforming to another specification that contains a definition of message-id compatible with this one, the server SHOULD use those message-ids. A common approach, and one that SHOULD be used for email and Netnews articles, is to extract the message-id from the contents of a header with name "Message-ID". This may not be as simple as copying the entire header contents; it may be necessary to strip off comments and undo quoting, or to reduce "equivalent" message-ids to a canonical form.

If an article is obtained through the IHAVE command, there will be a message-id provided with the command. The server MAY either use it or determine one from the article contents. However, whichever it does, it SHOULD ensure that, if the IHAVE command is repeated with the same argument and article, it will be recognized as a duplicate.

If an article does not contain a message-id that the server can identify, it MUST synthesize one. This could, for example, be a simple sequence number or be based on the date and time when the article arrived.
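As a rough illustration of the two steps just described (reducing RFC 2822 "equivalent" message-ids to one canonical octet sequence, and synthesizing a message-id when none can be identified), a server might do something like the following. This is a minimal sketch in Python and not part of this specification: the function names and the "host" parameter are hypothetical, the choice of the unquoted form as canonical is an assumption, and the parsing deliberately ignores comments and domain literals.

```python
import itertools
import re
import time

# RFC 2822 "atext" characters; a dot-atom is atext runs joined by dots.
_ATEXT = r"[A-Za-z0-9!#$%&'*+/=?^_`{|}~-]+"
_DOT_ATOM = re.compile(_ATEXT + r"(?:\." + _ATEXT + r")*")


def canonical_message_id(msgid):
    """Map RFC 2822-equivalent message-ids onto a single string.

    Assumption: the unquoted form is used as canonical whenever the
    unquoted left-hand side is a valid dot-atom; otherwise the input
    is returned unchanged.
    """
    if not (msgid.startswith("<") and msgid.endswith(">")):
        raise ValueError("message-id must be enclosed in angle brackets")
    left, sep, right = msgid[1:-1].rpartition("@")
    if not sep:
        return msgid  # no "@": nothing this sketch knows how to normalise
    if len(left) >= 2 and left[0] == '"' and left[-1] == '"':
        # Undo backslash escapes inside the quoted string.
        unquoted = re.sub(r"\\(.)", r"\1", left[1:-1])
        if _DOT_ATOM.fullmatch(unquoted):
            left = unquoted
    return "<" + left + "@" + right + ">"


_sequence = itertools.count(1)


def synthesize_message_id(host):
    """Build a message-id from arrival time plus a sequence number.

    "host" is a hypothetical name identifying this server; uniqueness
    here only holds within one running process.
    """
    return "<%d.%d@%s>" % (int(time.time()), next(_sequence), host)
```

With this sketch, all three message-ids in the example above map to <ab.cd@example.com>, so a lookup keyed on the canonical form treats them as one article.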
When email or Netnews articles are handled, a Message-ID header SHOULD be added to ensure global consistency and uniqueness. Note that, because the message-id might not have been derived from the Message-ID header in the article, the following example is legitimate (though unusual):
   [C] HEAD <45223423@example.com>
   [S] 221 0 <45223423@example.com>
   [S] Path: pathost!demo!whitehouse!not-for-mail
   [S] Message-ID: <1234@example.net>
   [S] From: "Demo User" <nobody@example.net>
   [S] Newsgroups: misc.test
   [S] Subject: I am just a test article
   [S] Date: 6 Oct 1998 04:38:40 -0500
   [S] Organization: An Example Net, Uncertain, Texas
   [S] .

A.3. Article Posting
As far as NNTP is concerned, the POST and IHAVE commands provide the same basic facilities in a slightly different way. However, they have rather different intentions.

The IHAVE command is intended for transmitting conforming articles between a system of NNTP servers, with all articles perhaps also conforming to another specification (e.g., all articles are Netnews articles). It is expected that the client will already have done any necessary validation (or that it has in turn obtained the article from a third party that has done so); therefore, the contents SHOULD be left unchanged.

In contrast, the POST command is intended for use when an end-user is injecting a newly created article into such a system. The article being transferred might not be a conforming email or Netnews article, and the server is expected to validate it and, if necessary, to convert it to the right form for onward distribution. This is often done by a separate piece of software on the server installation; if so, the NNTP server SHOULD pass the incoming article to that software unaltered, making no attempt to filter characters, to fold or limit lines, or to process the incoming text otherwise.

The POST command can fail in various ways, and clients should be prepared to re-send an article. When doing so, however, it is often important to ensure (as far as possible) that the same message-id is allocated to both attempts so that the server, or other servers, can recognize the two articles as duplicates. In the case of email or Netnews articles, therefore, the posted article SHOULD contain a header with the name "Message-ID", and the contents of this header SHOULD be identical on each attempt. The server SHOULD ensure that two POSTed articles with the same contents for this header are recognized as identical and that the same message-id is allocated, whether or not those contents are suitable for use as the message-id.
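The client-side advice on re-sending can be illustrated with a short sketch. It is only an example under stated assumptions: "send" is a hypothetical callable that performs one complete POST exchange and returns True on a 240 response, "domain" is a hypothetical name owned by the client, and the use of a random UUID for the left-hand side is one possible choice rather than anything this specification requires. The point shown is that the Message-ID header is fixed once, before the first attempt, so every retry carries identical contents.

```python
import uuid


def ensure_message_id(article, domain):
    """Return the article with a Message-ID header, adding one if absent.

    "article" is the full text with CRLF line endings and a blank line
    between headers and body.
    """
    head, sep, body = article.partition("\r\n\r\n")
    for line in head.split("\r\n"):
        # Folded continuation lines start with space or TAB, so this
        # only matches a real header line.
        if line.lower().startswith("message-id:"):
            return article
    msgid = "<%s@%s>" % (uuid.uuid4().hex, domain)
    return head + "\r\nMessage-ID: " + msgid + sep + body


def post_with_retry(send, article, domain, attempts=3):
    """Try to post an article, re-sending the identical text on failure."""
    prepared = ensure_message_id(article, domain)  # decided exactly once
    for _ in range(attempts):
        if send(prepared):
            return True
    return False
```

Generating the header inside the retry loop instead would give each attempt a different message-id, which is exactly the situation the paragraph above warns against.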
Appendix B. Summary of Commands
This section contains a list of every command defined in this document, ordered by command name and by indicating capability.

Ordered by command name:

+-------------------+-----------------------+---------------+
| Command           | Indicating capability | Definition    |
+-------------------+-----------------------+---------------+
| ARTICLE           | READER                | Section 6.2.1 |
| BODY              | READER                | Section 6.2.3 |
| CAPABILITIES      | mandatory             | Section 5.2   |
| DATE              | READER                | Section 7.1   |
| GROUP             | READER                | Section 6.1.1 |
| HDR               | HDR                   | Section 8.5   |
| HEAD              | mandatory             | Section 6.2.2 |
| HELP              | mandatory             | Section 7.2   |
| IHAVE             | IHAVE                 | Section 6.3.2 |
| LAST              | READER                | Section 6.1.3 |
| LIST              | LIST                  | Section 7.6.1 |
| LIST ACTIVE.TIMES | LIST                  | Section 7.6.4 |
| LIST ACTIVE       | LIST                  | Section 7.6.3 |
| LIST DISTRIB.PATS | LIST                  | Section 7.6.5 |
| LIST HEADERS      | HDR                   | Section 8.6   |
| LIST NEWSGROUPS   | LIST                  | Section 7.6.6 |
| LIST OVERVIEW.FMT | OVER                  | Section 8.4   |
| LISTGROUP         | READER                | Section 6.1.2 |
| MODE READER       | MODE-READER           | Section 5.3   |
| NEWGROUPS         | READER                | Section 7.3   |
| NEWNEWS           | NEWNEWS               | Section 7.4   |
| NEXT              | READER                | Section 6.1.4 |
| OVER              | OVER                  | Section 8.3   |
| POST              | POST                  | Section 6.3.1 |
| QUIT              | mandatory             | Section 5.4   |
| STAT              | mandatory             | Section 6.2.4 |
+-------------------+-----------------------+---------------+
Ordered by indicating capability:

+-------------------+-----------------------+---------------+
| Command           | Indicating capability | Definition    |
+-------------------+-----------------------+---------------+
| CAPABILITIES      | mandatory             | Section 5.2   |
| HEAD              | mandatory             | Section 6.2.2 |
| HELP              | mandatory             | Section 7.2   |
| QUIT              | mandatory             | Section 5.4   |
| STAT              | mandatory             | Section 6.2.4 |
| HDR               | HDR                   | Section 8.5   |
| LIST HEADERS      | HDR                   | Section 8.6   |
| IHAVE             | IHAVE                 | Section 6.3.2 |
| LIST              | LIST                  | Section 7.6.1 |
| LIST ACTIVE       | LIST                  | Section 7.6.3 |
| LIST ACTIVE.TIMES | LIST                  | Section 7.6.4 |
| LIST DISTRIB.PATS | LIST                  | Section 7.6.5 |
| LIST NEWSGROUPS   | LIST                  | Section 7.6.6 |
| MODE READER       | MODE-READER           | Section 5.3   |
| NEWNEWS           | NEWNEWS               | Section 7.4   |
| OVER              | OVER                  | Section 8.3   |
| LIST OVERVIEW.FMT | OVER                  | Section 8.4   |
| POST              | POST                  | Section 6.3.1 |
| ARTICLE           | READER                | Section 6.2.1 |
| BODY              | READER                | Section 6.2.3 |
| DATE              | READER                | Section 7.1   |
| GROUP             | READER                | Section 6.1.1 |
| LAST              | READER                | Section 6.1.3 |
| LISTGROUP         | READER                | Section 6.1.2 |
| NEWGROUPS         | READER                | Section 7.3   |
| NEXT              | READER                | Section 6.1.4 |
+-------------------+-----------------------+---------------+
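For implementers, the two tables above amount to a lookup from command to indicating capability. The following Python sketch shows one way a client might encode that lookup and consult it before issuing a command. The data is transcribed from the tables; the names INDICATING_CAPABILITY and command_available are hypothetical, and the sketch assumes the caller has already collected the capability labels advertised by the server.

```python
# Command -> indicating capability, transcribed from the tables above.
INDICATING_CAPABILITY = {
    "ARTICLE": "READER",
    "BODY": "READER",
    "CAPABILITIES": "mandatory",
    "DATE": "READER",
    "GROUP": "READER",
    "HDR": "HDR",
    "HEAD": "mandatory",
    "HELP": "mandatory",
    "IHAVE": "IHAVE",
    "LAST": "READER",
    "LIST": "LIST",
    "LIST ACTIVE.TIMES": "LIST",
    "LIST ACTIVE": "LIST",
    "LIST DISTRIB.PATS": "LIST",
    "LIST HEADERS": "HDR",
    "LIST NEWSGROUPS": "LIST",
    "LIST OVERVIEW.FMT": "OVER",
    "LISTGROUP": "READER",
    "MODE READER": "MODE-READER",
    "NEWGROUPS": "READER",
    "NEWNEWS": "NEWNEWS",
    "NEXT": "READER",
    "OVER": "OVER",
    "POST": "POST",
    "QUIT": "mandatory",
    "STAT": "mandatory",
}


def command_available(command, capabilities):
    """Report whether a command from this document should be usable.

    "capabilities" is a collection of capability labels (for example
    {"READER", "OVER"}) taken from a CAPABILITIES response. Commands
    not defined in this document are reported as unavailable.
    """
    name = " ".join(command.upper().split())  # fold case and spacing
    needed = INDICATING_CAPABILITY.get(name)
    if needed is None:
        return False
    return needed == "mandatory" or needed in capabilities
```

For example, command_available("head", set()) is true because HEAD is mandatory, whereas command_available("OVER", {"READER"}) is false because OVER is indicated by the OVER capability rather than by READER.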
Appendix C. Summary of Response Codes
This section contains a list of every response code defined in this document and indicates whether it is multi-line, which commands can generate it, what arguments it has, and what its meaning is.

Response code 100 (multi-line)
   Generated by: HELP
   Meaning: help text follows.

Response code 101 (multi-line)
   Generated by: CAPABILITIES
   Meaning: capabilities list follows.

Response code 111
   Generated by: DATE
   1 argument: yyyymmddhhmmss
   Meaning: server date and time.

Response code 200
   Generated by: initial connection, MODE READER
   Meaning: service available, posting allowed.

Response code 201
   Generated by: initial connection, MODE READER
   Meaning: service available, posting prohibited.

Response code 205
   Generated by: QUIT
   Meaning: connection closing (the server immediately closes the connection).

Response code 211
   The 211 response code has two completely different forms, depending on which command generated it:

      (not multi-line)
      Generated by: GROUP
      4 arguments: number low high group
      Meaning: group selected.

      (multi-line)
      Generated by: LISTGROUP
      4 arguments: number low high group
      Meaning: article numbers follow.

Response code 215 (multi-line)
   Generated by: LIST
   Meaning: information follows.

Response code 220 (multi-line)
   Generated by: ARTICLE
   2 arguments: n message-id
   Meaning: article follows.

Response code 221 (multi-line)
   Generated by: HEAD
   2 arguments: n message-id
   Meaning: article headers follow.

Response code 222 (multi-line)
   Generated by: BODY
   2 arguments: n message-id
   Meaning: article body follows.

Response code 223
   Generated by: LAST, NEXT, STAT
   2 arguments: n message-id
   Meaning: article exists and selected.

Response code 224 (multi-line)
   Generated by: OVER
   Meaning: overview information follows.

Response code 225 (multi-line)
   Generated by: HDR
   Meaning: headers follow.

Response code 230 (multi-line)
   Generated by: NEWNEWS
   Meaning: list of new articles follows.

Response code 231 (multi-line)
   Generated by: NEWGROUPS
   Meaning: list of new newsgroups follows.

Response code 235
   Generated by: IHAVE (second stage)
   Meaning: article transferred OK.

Response code 240
   Generated by: POST (second stage)
   Meaning: article received OK.

Response code 335
   Generated by: IHAVE (first stage)
   Meaning: send article to be transferred.

Response code 340
   Generated by: POST (first stage)
   Meaning: send article to be posted.

Response code 400
   Generic response and generated by initial connection
   Meaning: service not available or no longer available (the server immediately closes the connection).

Response code 401
   Generic response
   1 argument: capability-label
   Meaning: the server is in the wrong mode; the indicated capability should be used to change the mode.

Response code 403
   Generic response
   Meaning: internal fault or problem preventing action being taken.

Response code 411
   Generated by: GROUP, LISTGROUP
   Meaning: no such newsgroup.

Response code 412
   Generated by: ARTICLE, BODY, GROUP, HDR, HEAD, LAST, LISTGROUP, NEXT, OVER, STAT
   Meaning: no newsgroup selected.

Response code 420
   Generated by: ARTICLE, BODY, HDR, HEAD, LAST, NEXT, OVER, STAT
   Meaning: current article number is invalid.

Response code 421
   Generated by: NEXT
   Meaning: no next article in this group.

Response code 422
   Generated by: LAST
   Meaning: no previous article in this group.

Response code 423
   Generated by: ARTICLE, BODY, HDR, HEAD, OVER, STAT
   Meaning: no article with that number or in that range.

Response code 430
   Generated by: ARTICLE, BODY, HDR, HEAD, OVER, STAT
   Meaning: no article with that message-id.

Response code 435
   Generated by: IHAVE (first stage)
   Meaning: article not wanted.

Response code 436
   Generated by: IHAVE (either stage)
   Meaning: transfer not possible (first stage) or failed (second stage); try again later.

Response code 437
   Generated by: IHAVE (second stage)
   Meaning: transfer rejected; do not retry.

Response code 440
   Generated by: POST (first stage)
   Meaning: posting not permitted.

Response code 441
   Generated by: POST (second stage)
   Meaning: posting failed.

Response code 480
   Generic response
   Meaning: command unavailable until the client has authenticated itself.

Response code 483
   Generic response
   Meaning: command unavailable until suitable privacy has been arranged.

Response code 500
   Generic response
   Meaning: unknown command.

Response code 501
   Generic response
   Meaning: syntax error in command.

Response code 502
   Generic response and generated by initial connection
   Meaning for the initial connection and the MODE READER command: service permanently unavailable (the server immediately closes the connection).
   Meaning for all other commands: command not permitted (and there is no way for the client to change this).

Response code 503
   Generic response
   Meaning: feature not supported.

Response code 504
   Generic response
   Meaning: error in base64-encoding [RFC4648] of an argument.

Appendix D. Changes from RFC 977
In general, every attempt has been made to ensure that the protocol specification in this document is compatible with the version specified in RFC 977 [RFC977] and the various facilities adopted from RFC 2980 [RFC2980]. However, there have been a number of changes, some compatible and some not.

This appendix lists these changes. It is not guaranteed to be exhaustive or correct and MUST NOT be relied on.

o A formal syntax specification (Section 9) has been added.

o The default character set is changed from US-ASCII [ANSI1986] to UTF-8 [RFC3629] (note that US-ASCII is a subset of UTF-8). This matter is discussed further in Section 10.

o All articles are required to have a message-id, eliminating the "<0>" placeholder used in RFC 977 in some responses.

o The newsgroup name matching capabilities already documented in RFC 977 ("wildmats", Section 4) are clarified and extended. The new facilities (e.g., the use of commas and exclamation marks) are allowed wherever wildmats appear in the protocol.

o Support for pipelining of commands (Section 3.5) is made mandatory.

o The principles behind response codes (Section 3.2) have been tidied up. In particular:

   * the x8x response code family, formerly used for private extensions, is now reserved for authentication and privacy extensions;

   * the x9x response code family, formerly intended for debugging facilities, is now reserved for private extensions;

   * the 502 and 503 generic response codes (Section 3.2.1) have been redefined;

   * new 401, 403, 480, 483, and 504 generic response codes have been added.

o The rules for article numbering (Section 6) have been clarified (also see Section 6.1.1.2).

o The SLAVE command (which was ill-defined) is removed from the protocol.

o Four-digit years are permitted in the NEWNEWS (Section 7.4) and NEWGROUPS (Section 7.3) commands (two-digit years are still permitted). The optional distribution parameter to these commands has been removed.

o The LIST command (Section 7.6.1) is greatly extended; the original is available as LIST ACTIVE, while new variants include ACTIVE.TIMES, DISTRIB.PATS, and NEWSGROUPS. A new "m" status flag is added to the LIST ACTIVE response.

o A new CAPABILITIES command (Section 5.2) allows clients to determine what facilities are supported by a server.

o The DATE command (Section 7.1) is adopted from RFC 2980 effectively unchanged.

o The LISTGROUP command (Section 6.1.2) is adopted from RFC 2980. An optional range argument has been added, and the 211 initial response line now has the same format as the 211 response from the GROUP command.

o The MODE READER command (Section 5.3) is adopted from RFC 2980 and its meaning and effects clarified.

o The XHDR command in RFC 2980 has been formalised as the new HDR (Section 8.5) and LIST HEADERS (Section 8.6) commands.

o The XOVER command in RFC 2980 has been formalised as the new OVER (Section 8.3) and LIST OVERVIEW.FMT (Section 8.4) commands. The former can be applied to a message-id as well as to a range.

o The concept of article metadata (Section 8.1) has been formalised, allowing the Bytes and Lines pseudo-headers to be deprecated.

Client authors should note in particular that lack of support for the CAPABILITIES command is a good indication that the server does not support this specification.
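The closing note to client authors can be turned into a simple probe: send CAPABILITIES and treat anything other than a 101 response as a sign that the server predates this specification. The Python sketch below shows only the response-handling half, under stated assumptions: "stream" is a hypothetical text-mode file-like object wrapping the connection, the CAPABILITIES command has already been written to the server, and the function name is invented for this example. Dot-unstuffing is omitted on the assumption that capability lines do not begin with a dot.

```python
import io


def read_capabilities(stream):
    """Read the reply to CAPABILITIES from a text stream.

    Returns the list of capability lines on a 101 response, or None
    for any other response code (for instance 500 from a server that
    does not recognise the command).
    """
    first = stream.readline().rstrip("\r\n")
    if not first.startswith("101"):
        return None
    capabilities = []
    for line in stream:
        line = line.rstrip("\r\n")
        if line == ".":  # lone dot terminates the multi-line block
            break
        capabilities.append(line)
    return capabilities


# Usage with canned server output standing in for a real connection.
modern = io.StringIO("101 Capability list:\r\nVERSION 2\r\nREADER\r\n.\r\n")
legacy = io.StringIO("500 What?\r\n")
modern_caps = read_capabilities(modern)
legacy_caps = read_capabilities(legacy)
```

A client receiving None might fall back to the behaviour documented in RFC 977 and RFC 2980 rather than assuming any of the facilities listed in this appendix.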
Author's Address
Clive D.W. Feather
THUS plc
322 Regents Park Road
London
N3 2QQ
United Kingdom

Phone: +44 20 8495 6138
Fax: +44 870 051 9937
EMail: clive@demon.net
URI: http://www.davros.org/
Full Copyright Statement

Copyright (C) The Internet Society (2006).

This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.

This document and the information contained herein are provided on an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

The IETF takes no position regarding the validity or scope of any Intellectual Property Rights or other rights that might be claimed to pertain to the implementation or use of the technology described in this document or the extent to which any license under such rights might or might not be available; nor does it represent that it has made any independent effort to identify any such rights. Information on the procedures with respect to rights in RFC documents can be found in BCP 78 and BCP 79.

Copies of IPR disclosures made to the IETF Secretariat and any assurances of licenses to be made available, or the result of an attempt made to obtain a general license or permission for the use of such proprietary rights by implementers or users of this specification can be obtained from the IETF on-line IPR repository at http://www.ietf.org/ipr.

The IETF invites any interested party to bring to its attention any copyrights, patents or patent applications, or other proprietary rights that may cover technology that may be required to implement this standard. Please address the information to the IETF at ietf-ipr@ietf.org.

Acknowledgement

Funding for the RFC Editor function is provided by the IETF Administrative Support Activity (IASA).