Internet Architecture Board (IAB)                             P. Hoffman
Request for Comments: 7991                                         ICANN
Obsoletes: 7749                                            December 2016
Category: Informational
ISSN: 2070-1721

                  The "xml2rfc" Version 3 Vocabulary

Abstract
This document defines the "xml2rfc" version 3 vocabulary: an XML-based language used for writing RFCs and Internet-Drafts. It is heavily derived from the version 2 vocabulary that is also under discussion. This document obsoletes the v2 grammar described in RFC 7749.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Architecture Board (IAB) and represents information that the IAB has deemed valuable to provide for permanent record. It represents the consensus of the Internet Architecture Board (IAB). Documents approved for publication by the IAB are not a candidate for any level of Internet Standard; see Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc7991.

Copyright Notice

Copyright (c) 2016 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.

Table of Contents

1. Introduction
   1.1. Expected Updates to the Specification
   1.2. Design Criteria for the Changes in v3
   1.3. Differences from v2 to v3
      1.3.1. New Elements in v3
      1.3.2. New Attributes for Existing Elements
      1.3.3. Elements and Attributes Deprecated from v2
      1.3.4. Additional Changes from v2
   1.4. Syntax Notation
2. Elements
   2.1. <abstract>
   2.2. <address>
   2.3. <annotation>
   2.4. <area>
   2.5. <artwork>
   2.6. <aside>
   2.7. <author>
   2.8. <back>
   2.9. <bcp14>
   2.10. <blockquote>
   2.11. <boilerplate>
   2.12. <br>
   2.13. <city>
   2.14. <code>
   2.15. <country>
   2.16. <cref>
   2.17. <date>
   2.18. <dd>
   2.19. <displayreference>
   2.20. <dl>
   2.21. <dt>
   2.22. <em>
   2.23. <email>
   2.24. <eref>
   2.25. <figure>
   2.26. <front>
   2.27. <iref>
   2.28. <keyword>
   2.29. <li>
   2.30. <link>
   2.31. <middle>
   2.32. <name>
   2.33. <note>
   2.34. <ol>
   2.35. <organization>
   2.36. <phone>
   2.37. <postal>
   2.38. <postalLine>
   2.39. <refcontent>
   2.40. <reference>
   2.41. <referencegroup>
   2.42. <references>
   2.43. <region>
   2.44. <relref>
   2.45. <rfc>
   2.46. <section>
   2.47. <seriesInfo>
   2.48. <sourcecode>
   2.49. <street>
   2.50. <strong>
   2.51. <sub>
   2.52. <sup>
   2.53. <t>
   2.54. <table>
   2.55. <tbody>
   2.56. <td>
   2.57. <tfoot>
   2.58. <th>
   2.59. <thead>
   2.60. <title>
   2.61. <tr>
   2.62. <tt>
   2.63. <ul>
   2.64. <uri>
   2.65. <workgroup>
   2.66. <xref>
3. Elements from v2 That Have Been Deprecated
   3.1. <c>
   3.2. <facsimile>
   3.3. <format>
   3.4. <list>
   3.5. <postamble>
   3.6. <preamble>
   3.7. <spanx>
   3.8. <texttable>
   3.9. <ttcol>
   3.10. <vspace>
4. SVG
5. Use of CDATA Structures and Escaping
6. Internationalization Considerations
7. Security Considerations
8. IANA Considerations
   8.1. Internet Media Type Registration
   8.2. Link Relation Registration
9. References
   9.1. Normative References
   9.2. Informative References
Appendix A. Front-Page ("Boilerplate") Generation
   A.1. The "ipr" Attribute
      A.1.1. Current Values: "*trust200902"
      A.1.2. Historic Values
   A.2. The "submissionType" Attribute
   A.3. The "consensus" Attribute
Appendix B. The v3 Format and Processing Tools
   B.1. Including External Text with XInclude
   B.2. Anchors and IDs
      B.2.1. Overlapping Values
   B.3. Attributes Controlled by the Prep Tool
Appendix C. RELAX NG Schema
Appendix D. Schema Differences from v2
IAB Members at the Time of Approval
Acknowledgements
Author's Address

1. Introduction

This document describes version 3 ("v3") of the "xml2rfc" vocabulary: an XML-based language ("Extensible Markup Language" [XML]) used for writing RFCs [RFC7322] and Internet-Drafts [IDGUIDE]. This document obsoletes the version 2 vocabulary ("v2") [RFC7749], which contains the extended language definition. That document in turn obsoletes the original version ("v1") [RFC2629]. This document directly copies the material from [RFC7749] where possible.

The v3 format will be used as part of the new RFC Series format described in [RFC6949]. The new format will be handled by one or more new tools for preparing the XML and converting it to other representations. Features of the expected tools are described in Appendix B. That section defines some terms used throughout this document, such as "prep tool" and "formatter".

Note that the vocabulary contains certain constructs that might not be used when generating the final text; however, they can provide useful data for other uses (such as index generation, populating a keyword database, or syntax checks).

In this document, the term "format" is used when describing types of documents, primarily XML and HTML. The term "representation" is used when talking about a specific instantiation of a format, such as an XML document or an HTML document that was created by an XML document.

1.1. Expected Updates to the Specification

Non-interoperable changes in later versions of this specification are likely based on experience gained in implementing the new publication toolsets. Revised documents will be published capturing those changes as the toolsets are completed. Other implementers must not expect those changes to remain backwards-compatible with the details described in this document.
1.2. Design Criteria for the Changes in v3

The design criteria of the changes from v2 to v3 are as follows:

o The intention is that starting and editing a v3 document will be easier than for a v2 document.

o There will be good v2-to-v3 conversion tools for when an author wants to change versions.

o There are no current plans to make v3 XML the required submission format for drafts or RFCs. That might happen eventually, but it is likely to be years away.

There is a desire to keep as much of the v2 grammar as makes sense within the above design criteria and not to make gratuitous changes to the v2 grammar. Another way to say this is "we would rather encourage backwards compatibility but not be constrained by it." Still, the goal of starting and editing a v3 document being easier than for a v2 document is more important than backwards compatibility with v2, given the latter two design criteria.

v3 is upwards compatible with v2, meaning that a v2 document is meant to be a valid v3 document as well. However, some features of v2 are deprecated in v3 in favor of new elements. Deprecated features are listed in Section 1.3.3 and are described in [RFC7749].

1.3. Differences from v2 to v3

This is a (hopefully) complete list of all the technical changes between [RFC7749] and this document.

1.3.1. New Elements in v3

o Add <dl>, <ul>, and <ol> as new ways to make lists. This is a significant change from v2 in that the child under these elements is <li>, not <t>. <li> has a model of either containing one or more <t> elements, or containing the flowing text normally found in <t>. These lists are children of <section>s and other lists instead of <t>.

o Add <strong>, <em>, <tt>, <sub>, and <sup> for character formatting.

o Add <aside> for incidental text that will be indented when displayed.

o Add <sourcecode> to differentiate from <artwork>.

o Add <table>, <thead>, <tbody>, <tfoot>, <tr>, <td>, and <th> to give table functionality like that in HTML.

o Add <boilerplate> to hold the automatically generated boilerplate text.

o Add <blockquote> to indicate a quotation as in a paragraph-like format.

o Add <name> to sections, notes, figures, and texttables to allow character formatting (fixed-width font) in their titles and to allow references in the names.

o Add <postalLine>, free text that represents one line of the address.

o Add <displayreference> to allow display of more mnemonic anchor names for automatically included references.

o Add <refcontent> to allow better control of text in a reference.

o Add <referencegroup> to allow referencing multi-RFC documents such as STDs and BCPs.

o Add <relref> to allow referencing specific sections or anchors in references.

o Add <link> to point to a resource related to the RFC.

o Add <br> to allow line breaks (but not blank lines) in the generated output for table cells.

o Add <svg> to allow easy inclusion of SVG drawings in <artwork>.

1.3.2. New Attributes for Existing Elements

o Add "sortRefs", "symRefs", "tocDepth", and "tocInclude" attributes to <rfc> to cover Processing Instructions (PIs) that were in v2 that are still needed in the grammar. Add "prepTime" to indicate the time that the XML went through a preparation step. Add "version" to indicate the version of xml2rfc vocabulary used in the document. Add "scripts" to indicate which scripts are needed to render the document. Add "expiresDate" to indicate when an Internet-Draft expires.

o Add "ascii" attributes to <email>, <organization>, <street>, <city>, <region>, <country>, and <code>. Also add "asciiFullname", "asciiInitials", and "asciiSurname" to <author>. This allows an author to specify their information in their native scripts as the primary entry and still allow the ASCII-equivalent values to appear in the processed documents.

o Add "anchor" attributes to many block elements to allow them to be linked with <relref> and <xref>.

o Add the "section", "relative", and "sectionFormat" attributes to <xref>.

o Add the "numbered" and "removeInRFC" attributes to <section>.

o Add the "removeInRFC" attribute to <note>.

o Add "pn" to <artwork>, <aside>, <blockquote>, <boilerplate>, <dt>, <figure>, <iref>, <li>, <references>, <section>, <sourcecode>, <t>, and <table> to hold automatically generated numbers for items in a section that don't have their own numbering (namely figures and tables).

o Add "display" to <cref> to indicate to tools whether or not to display the comment.

o Add "keepWithNext" and "keepWithPrevious" to <t> as a hint to tools that do pagination that they should try to keep the paragraph with the next/previous element.

1.3.3. Elements and Attributes Deprecated from v2

Deprecated elements and attributes are legacy vocabulary from v2 that are supported for input to v3 tools. They are likely to be removed from those tools in the future. Deprecated attributes are still listed in Section 2, and deprecated elements are listed in Section 3.

See Appendix B for more information on tools and how they will handle deprecated features.

o Deprecate <list> in favor of <dl>, <ul>, and <ol>.

o Deprecate <spanx>; replace it with <strong>, <em>, and <tt>.

o Deprecate <vspace> because the major use for it, creating pseudo-paragraph-breaks in lists, is now handled properly.

o Deprecate <texttable>, <ttcol>, and <c>; replace them with the new table elements (<table> and the elements that can be contained within it).

o Deprecate <facsimile> because it is rarely used.

o Deprecate <format> because it is not useful and has caused surprise for authors in the past. If the goal is to provide a single URI (Uniform Resource Identifier) for a reference, use the "target" attribute in <reference> instead.

o Deprecate <preamble> and <postamble> in favor of simply using <t> before or after the figure. This also deprecates the "align" attribute in <figure>.

o Deprecate the "title" attribute in <section>, <note>, <figure>, <references>, and <texttable> in favor of the new <name>.

o Deprecate the "alt" and "src" attributes in <figure> because they overlap with the attributes in <artwork>.

o Deprecate the "xml:space" attribute in <artwork> because there was only one useful value. Deprecate the "height" and "width" attributes in both <artwork> and <figure> because they are not needed for the new output formats.

o Deprecate the "pageno" attribute in <xref> because it was unused in v2. Deprecate the "none" value for the "format" attribute in <xref> because it makes no sense semantically.

1.3.4. Additional Changes from v2

o Allow non-ASCII characters in the format; the characters that are actually allowed will be determined by the RFC Series Editor.

o Allow <artwork> and <sourcecode> to be used on their own in <section> (no longer confine them to a figure).

o Give more specifics of handling the "type" attribute in <artwork>.

o Allow <strong>, <em>, <tt>, <eref>, and <xref> in <cref>.

o Allow the sub-elements inside a <reference> to be in any order.

o Turn off the autogeneration of anchors in <cref> because there is no use case for them that cannot be achieved in other ways.

o Allow more than one <artwork>, or more than one <sourcecode>, in <figure>.

o In <front>, make <date> optional.

o In <date>, add restrictions to the "date" and "year" attributes when used in the <front> for the document's boilerplate text.

o In <postal>, allow the sub-elements to be in any order. Also allow the inclusion of the new <postalLine> instead of the older elements.

o In <section>, restrict the names of the anchors that can be used on some types of sections.

o Make <seriesInfo> a child of <front>, and deprecate it as a child of <reference>. This also deprecates some of the attributes from <rfc> and moves them into <seriesInfo>.

o <t> now only contains non-block elements, so it no longer contains <figure> elements.

o Do not generate the grammar from a DTD, but instead get it directly from the RELAX Next Generation (RNG) grammar [RNG].

1.4. Syntax Notation

The XML vocabulary here is defined in prose, based on the RELAX NG schema [RNC] contained in Appendix C (specified in RELAX NG Compact Notation (RNC)).

Note that the schema can be used for automated validity checks, but certain constraints are only described in prose (example: the conditionally required presence of the "abbrev" attribute).

2. Elements

The sections below describe all elements and their attributes. Note that attributes not labeled "mandatory" are optional. Many elements have an optional "anchor" attribute. In all cases, the value of the "anchor" attribute needs to be a valid XML "Name" (Section 2.3 of [XML]), additionally constrained to US-ASCII characters [USASCII]. Thus, the character repertoire consists of "A-Z", "a-z", "0-9", "_", "-", ".", and ":", where "0-9", ".", and "-" are disallowed as start characters. Anchors are described in more detail in Appendix B.2.
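
The anchor constraint above can be checked mechanically. The following is a minimal Python sketch and is not part of this specification: the function name "is_valid_anchor" is hypothetical, and the pattern only encodes the character repertoire and start-character rule stated above.

```python
import re

# Start character: "A-Z", "a-z", "_", or ":" (digits, "." and "-" are
# disallowed at the start). Remaining characters: the full repertoire.
_ANCHOR_RE = re.compile(r"[A-Za-z_:][A-Za-z0-9_.:-]*")


def is_valid_anchor(value: str) -> bool:
    """Return True if value fits the US-ASCII anchor repertoire (hypothetical helper)."""
    return _ANCHOR_RE.fullmatch(value) is not None


print(is_valid_anchor("sec-intro"))    # True
print(is_valid_anchor("2nd-section"))  # False: starts with a digit
```

A tool would likely run such a check alongside, not instead of, schema validation, since the schema cannot express every prose constraint.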

Tools interpreting the XML described here will collapse horizontal whitespace and line breaks to a single whitespace (except inside <artwork> and <sourcecode>) and will trim leading and trailing whitespace. Tab characters (U+0009) inside <artwork> and <sourcecode> are prohibited.

Some of the elements have attributes that are not described in this section because those attributes are specific to the prep tool. People writing tools to process this format should read all of the appendices for a complete description of these attributes.

Every element in the v3 vocabulary can have an "xml:lang" attribute, an "xml:base" attribute, or both. The xml:lang attribute specifies the language used in the element. This is sometimes useful for renderers that display different fonts for ideographic characters used in China and Japan. The xml:base attribute is sometimes added to an XML file when doing XML-to-XML conversion where the base file has XInclude attributes (see Appendix B.1).

2.1. <abstract>

Contains the Abstract of the document. See [RFC7322] for more information on restrictions for the Abstract.

This element appears as a child element of <front> (Section 2.26).

Content model:

In any order, but at least one of:

o <dl> elements (Section 2.20)

o <ol> elements (Section 2.34)

o <t> elements (Section 2.53)

o <ul> elements (Section 2.63)

2.1.1. "anchor" Attribute

Document-wide unique identifier for the Abstract.
2.2. <address>

Provides address information for the author.

This element appears as a child element of <author> (Section 2.7).

Content model:

In this order:

1. One optional <postal> element (Section 2.37)

2. One optional <phone> element (Section 2.36)

3. One optional <facsimile> element (Section 3.2)

4. One optional <email> element (Section 2.23)

5. One optional <uri> element (Section 2.64)

2.3. <annotation>

Provides additional prose augmenting a bibliographic reference. This text is intended to be shown after the rest of the generated reference text.

This element appears as a child element of <reference> (Section 2.40).

Content model:

In any order:

o Text

o <bcp14> elements (Section 2.9)

o <cref> elements (Section 2.16)

o <em> elements (Section 2.22)

o <eref> elements (Section 2.24)

o <iref> elements (Section 2.27)

o <relref> elements (Section 2.44)

o <spanx> elements (Section 3.7)

o <strong> elements (Section 2.50)

o <sub> elements (Section 2.51)

o <sup> elements (Section 2.52)

o <tt> elements (Section 2.62)

o <xref> elements (Section 2.66)

2.4. <area>

Provides information about the IETF area to which this document relates (currently not used when generating documents).

The value ought to be either the full name or the abbreviation of one of the IETF areas as listed on <http://www.ietf.org/iesg/area.html>. A list of full names and abbreviations will be kept by the RFC Series Editor.

This element appears as a child element of <front> (Section 2.26).

Content model: only text content.

2.5. <artwork>

This element allows the inclusion of "artwork" in the document. <artwork> provides full control of horizontal whitespace and line breaks; thus, it is used for a variety of things, such as diagrams ("line art") and protocol unit diagrams. Tab characters (U+0009) inside of this element are prohibited.

Alternatively, the "src" attribute allows referencing an external graphics file, such as a vector drawing in SVG or a bitmap graphic file, using a URI. In this case, the textual content acts as a fallback for output representations that do not support graphics; thus, it ought to contain either (1) a "line art" variant of the graphics or (2) prose that describes the included image in sufficient detail.

In [RFC7749], the <artwork> element was also used for source code and formal languages; in v3, this is now done with <sourcecode>.

There are at least five ways to include SVG in artwork in Internet-Drafts:

o Inline, by including all of the SVG in the content of the element, such as: <artwork type="svg"><svg xmlns="http://www.w3.org/2000/svg...">

o Inline, but using XInclude (see Appendix B.1), such as: <artwork type="svg"><xi:include href=...>

o As a data: URI, such as: <artwork type="svg" src="data:image/svg+xml,%3Csvg%20xmlns%3D%22http%3A%2F%2Fwww.w3...">

o As a URI to an external entity, such as: <artwork type="svg" src="http://www.example.com/...">

o As a local file, such as: <artwork type="svg" src="diagram12.svg">

The use of SVG in Internet-Drafts and RFCs is covered in much more detail in [RFC7996].

The above methods for inclusion of SVG art can also be used for including text artwork, but using a data: URI is probably confusing for text artwork.

Formatters that do pagination should attempt to keep artwork on a single page. This is to prevent artwork that is split across pages from looking like two separate pieces of artwork.

See Section 5 for a description of how to deal with issues of using "&" and "<" characters in artwork.

This element appears as a child element of <aside> (Section 2.6), <blockquote> (Section 2.10), <dd> (Section 2.18), <figure> (Section 2.25), <li> (Section 2.29), <section> (Section 2.46), <td> (Section 2.56), and <th> (Section 2.58).

Content model:

Either:

Text

Or:

<svg> elements (Section 4)

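
The inclusion methods listed above can be told apart by inspecting the element. The following Python sketch is illustrative only: the function name "artwork_source" and its return labels are hypothetical, and it assumes that an XInclude element has not yet been expanded. A "src" value without a URI scheme is treated as a local filename, as described for the "src" attribute.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

SVG_NS = "{http://www.w3.org/2000/svg}"
XI_NS = "{http://www.w3.org/2001/XInclude}"


def artwork_source(artwork_xml: str) -> str:
    """Classify how an <artwork> element supplies its content (hypothetical labels)."""
    elem = ET.fromstring(artwork_xml)
    src = elem.get("src")
    if src is not None:
        scheme = urlsplit(src).scheme
        if scheme == "data":
            return "data-uri"
        # No scheme: considered a local filename.
        return "external-uri" if scheme else "local-file"
    children = list(elem)
    if any(child.tag == XI_NS + "include" for child in children):
        return "xinclude"
    if any(child.tag == SVG_NS + "svg" for child in children):
        return "inline-svg"
    return "text"


print(artwork_source('<artwork type="svg" src="diagram12.svg"/>'))  # local-file
```

The sketch only classifies; it does not fetch or read anything named by "src".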
2.5.1. "align" Attribute

Controls whether the artwork appears left justified (default), centered, or right justified. Artwork is aligned relative to the left margin of the document.

Allowed values:

o "left" (default)

o "center"

o "right"

2.5.2. "alt" Attribute

Alternative text description of the artwork (which is more than just a summary or caption). When the art comes from the "src" attribute and the format of that artwork supports alternate text, the alternative text comes from the text of the artwork itself, not from this attribute. The contents of this attribute are important to readers who are visually impaired, as well as those reading on devices that cannot show the artwork well, or at all.

2.5.3. "anchor" Attribute

Document-wide unique identifier for this artwork.

2.5.4. "height" Attribute

Deprecated.

2.5.5. "name" Attribute

A filename suitable for the contents (such as for extraction to a local file). This attribute can be helpful for other kinds of tools (such as automated syntax checkers, which work by extracting the artwork). Note that the "name" attribute does not need to be unique for <artwork> elements in a document. If multiple <artwork> elements have the same "name" attribute, a processing tool might assume that the elements are all fragments of a single file, and the tool can collect those fragments for later processing. See Section 7 for a discussion of possible problems with the value of this attribute.
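
As an illustration of the fragment collection described above, the following Python sketch groups the text of <artwork> elements by their "name" attribute. It is one possible reading, not a required behavior: the function name "collect_named_artwork" is hypothetical, and the sketch deliberately never writes to disk, because the attribute value would need to be vetted first (see Section 7).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def collect_named_artwork(document_xml: str) -> dict[str, str]:
    """Concatenate, in document order, the text of <artwork> elements sharing a "name"."""
    root = ET.fromstring(document_xml)
    fragments: dict[str, list[str]] = defaultdict(list)
    for artwork in root.iter("artwork"):
        name = artwork.get("name")
        if name:  # artwork without a "name" is not collected
            fragments[name].append(artwork.text or "")
    # The keys are untrusted strings; do not use them as paths unchecked.
    return {name: "".join(parts) for name, parts in fragments.items()}
```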
2.5.6. "src" Attribute

The URI reference of a graphics file [RFC3986], or the name of a file on the local disk. This can be a "data" URI [RFC2397] that contains the contents of the graphics file. Note that the inclusion of art with the "src" attribute depends on the capabilities of the processing tool reading the XML document. Tools need to be able to handle the file: URI, and they should be able to handle http: and https: URIs as well. The prep tool will be able to handle reading the "src" attribute.

If no URI scheme is given in the attribute, the attribute is considered to be a local filename relative to the current directory. Processing tools must be careful to not accept dangerous values for the filename, particularly those that contain absolute references outside the current directory. Document creators should think hard before using relative URIs due to possible later problems if files move around on the disk. Also, documents should most likely use explicit URI schemes wherever possible.

In some cases, the prep tool may remove the "src" attribute after processing its value. See [RFC7998] for a description of this.

It is an error to have both a "src" attribute and content in the <artwork> element.

2.5.7. "type" Attribute

Specifies the type of the artwork. The value of this attribute is free text with certain values designated as preferred.

The preferred values for <artwork> types are:

o ascii-art

o binary-art

o call-flow

o hex-dump

o svg
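
Because the value is free text, a consumer can treat the preferred values listed above as a hint rather than as a closed set. The following Python sketch shows one tolerant way to do so. The names "PREFERRED_ARTWORK_TYPES" and "describe_artwork_type" are hypothetical, and the set copies only the five values listed in this section, not the list maintained by the RFC Series Editor.

```python
from typing import Optional, Tuple

# Snapshot of the preferred values listed in this section only.
PREFERRED_ARTWORK_TYPES = {"ascii-art", "binary-art", "call-flow", "hex-dump", "svg"}


def describe_artwork_type(type_value: Optional[str]) -> Tuple[str, bool]:
    """Return (type, is_preferred); never raises for unknown or missing types."""
    if not type_value:
        return ("", False)
    return (type_value, type_value in PREFERRED_ARTWORK_TYPES)


print(describe_artwork_type("svg"))        # ('svg', True)
print(describe_artwork_type("pseudo-3d"))  # ('pseudo-3d', False)
```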

The RFC Series Editor will maintain a complete list of the preferred values on the RFC Editor web site, and that list is expected to be updated over time. Thus, a consumer of v3 XML should not cause a failure when it encounters an unexpected type or no type is specified. The table will also indicate which type of art can appear in plain-text output (for example, type="svg" cannot).

2.5.8. "width" Attribute

Deprecated.

2.5.9. "xml:space" Attribute

Deprecated.

2.6. <aside>

This element is a container for content that is semantically less important or tangential to the content that surrounds it.

This element appears as a child element of <section> (Section 2.46).

Content model:

In any order:

o <artwork> elements (Section 2.5)

o <dl> elements (Section 2.20)

o <figure> elements (Section 2.25)

o <iref> elements (Section 2.27)

o <list> elements (Section 3.4)

o <ol> elements (Section 2.34)

o <t> elements (Section 2.53)

o <table> elements (Section 2.54)

o <ul> elements (Section 2.63)

2.6.1. "anchor" Attribute

Document-wide unique identifier for this aside.
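
Since "anchor" values are document-wide unique identifiers, a tool may want to detect collisions across all elements. The following Python sketch shows one way to do so. The function name "duplicate_anchors" is hypothetical, and the check covers only uniqueness, not the character constraints from Section 2.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def duplicate_anchors(document_xml: str) -> list[str]:
    """Return the sorted "anchor" values that occur on more than one element."""
    root = ET.fromstring(document_xml)
    counts = Counter(
        elem.get("anchor") for elem in root.iter() if elem.get("anchor")
    )
    return sorted(anchor for anchor, n in counts.items() if n > 1)
```

An empty result means that no anchor value is reused anywhere in the parsed document.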