Network Working Group                                            S. Legg
Request for Comments: 4910                                       eB2Bcom
Category: Experimental                                         D. Prager
                                                               July 2007

Robust XML Encoding Rules (RXER) for Abstract Syntax Notation One (ASN.1)

Status of This Memo

This memo defines an Experimental Protocol for the Internet community. It does not specify an Internet standard of any kind. Discussion and suggestions for improvement are requested. Distribution of this memo is unlimited.

Copyright Notice

Copyright (C) The IETF Trust (2007).

Abstract
This document defines a set of Abstract Syntax Notation One (ASN.1) encoding rules, called the Robust XML Encoding Rules or RXER, that produce an Extensible Markup Language (XML) representation for values of any given ASN.1 data type. Rules for producing a canonical RXER encoding are also defined.
Table of Contents
1. Introduction
2. Conventions
3. Definitions
4. Additional Basic Types
   4.1. The Markup Type
      4.1.1. Self-Containment
      4.1.2. Normalization for Canonical Encoding Rules
   4.2. The AnyURI Type
   4.3. The NCName Type
   4.4. The Name Type
   4.5. The QName Type
5. Expanded Names for ASN.1 Types
6. Encoding Rules
   6.1. Identifiers
   6.2. Component Encodings
      6.2.1. Referenced Components
      6.2.2. Element Components
         6.2.2.1. Namespace Properties for Elements
         6.2.2.2. Namespace Prefixes for Element Names
      6.2.3. Attribute Components
         6.2.3.1. Namespace Prefixes for Attribute Names
      6.2.4. Unencapsulated Components
      6.2.5. Examples
   6.3. Standalone Encodings
   6.4. Embedded ASN.1 Values
   6.5. Type Referencing Notations
   6.6. TypeWithConstraint, SEQUENCE OF Type, and SET OF Type
   6.7. Character Data Translations
      6.7.1. Restricted Character String Types
      6.7.2. BIT STRING
      6.7.3. BOOLEAN
      6.7.4. ENUMERATED
      6.7.5. GeneralizedTime
      6.7.6. INTEGER
      6.7.7. NULL
      6.7.8. ObjectDescriptor
      6.7.9. OBJECT IDENTIFIER and RELATIVE-OID
      6.7.10. OCTET STRING
      6.7.11. QName
         6.7.11.1. Namespace Prefixes for Qualified Names
      6.7.12. REAL
      6.7.13. UTCTime
      6.7.14. CHOICE as UNION
      6.7.15. SEQUENCE OF as LIST
   6.8. Combining Types
      6.8.1. CHARACTER STRING
      6.8.2. CHOICE
      6.8.3. EMBEDDED PDV
      6.8.4. EXTERNAL
      6.8.5. INSTANCE OF
      6.8.6. SEQUENCE and SET
      6.8.7. SEQUENCE OF and SET OF
      6.8.8. Extensible Combining Types
         6.8.8.1. Unknown Elements in Extensions
         6.8.8.2. Unknown Attributes in Extensions
   6.9. Open Type
   6.10. Markup
   6.11. Namespace Prefixes for CRXER
   6.12. Serialization
      6.12.1. Non-Canonical Serialization
      6.12.2. Canonical Serialization
      6.12.3. Unicode Normalization in XML Version 1.1
   6.13. Syntax-Based Canonicalization
7. Transfer Syntax Identifiers
   7.1. RXER Transfer Syntax
   7.2. CRXER Transfer Syntax
8. Relationship to XER
9. Security Considerations
10. Acknowledgements
11. IANA Considerations
12. References
   12.1. Normative References
   12.2. Informative References
Appendix A. Additional Basic Definitions Module

1. Introduction
This document defines a set of Abstract Syntax Notation One (ASN.1) [X.680] encoding rules, called the Robust XML Encoding Rules or RXER, that produce an Extensible Markup Language (XML) [XML10][XML11] representation of ASN.1 values of any given ASN.1 type. An ASN.1 value is regarded as analogous to the content and attributes of an XML element, or in some cases, just an XML attribute value. The RXER encoding of an ASN.1 value is the well-formed and valid content and attributes of an element, or an attribute value, in an XML document [XML10][XML11] conforming to XML namespaces [XMLNS10][XMLNS11]. Simple ASN.1 data types such as PrintableString, INTEGER, and BOOLEAN define character data content or attribute values, while the ASN.1 combining types (i.e., SET, SEQUENCE, SET OF, SEQUENCE OF, and CHOICE) define element content and attributes. The attribute and child element names are generally provided by the identifiers of the components in combining type definitions, i.e., elements and attributes correspond to the NamedType notation.
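As a rough illustration of this correspondence (not the RXER rules themselves, which are given in Section 6), the Python sketch below turns a simple abstract value into element content. The helper `encode_content` and the sample type and component names are hypothetical. Integers and booleans become character data (the sketch assumes the literals "true" and "false" for BOOLEAN), and a dictionary standing in for a SEQUENCE value becomes child elements named by the component identifiers. Encoding instructions, such as those that turn a component into an attribute, are not modeled.

```python
import xml.etree.ElementTree as ET


def encode_content(element: ET.Element, value) -> None:
    """Fill `element` with content representing `value` (toy illustration only)."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        element.text = "true" if value else "false"
    elif isinstance(value, int):
        element.text = str(value)  # decimal character data
    elif isinstance(value, str):
        element.text = value  # character string types map to character data
    elif isinstance(value, dict):
        # A dict stands in for a SEQUENCE value: each component identifier
        # becomes the name of a child element holding that component's value.
        for identifier, component in value.items():
            encode_content(ET.SubElement(element, identifier), component)
    else:
        raise TypeError(f"unsupported value: {value!r}")


# Hypothetical type: Example ::= SEQUENCE { count INTEGER, enabled BOOLEAN }
root = ET.Element("example")
encode_content(root, {"count": 3, "enabled": True})
print(ET.tostring(root, encoding="unicode"))
# <example><count>3</count><enabled>true</enabled></example>
```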
RXER leaves some formatting details to the discretion of the encoder, so there is not a single unique RXER encoding for an ASN.1 value. However, this document also defines a restriction of RXER, called the Canonical Robust XML Encoding Rules (CRXER), which does produce a single unique encoding for an ASN.1 value. Obviously, the CRXER encoding of a value is also a valid RXER encoding of that value. The restrictions on RXER to produce the CRXER encoding are interspersed with the description of the rules for RXER.

Note that "ASN.1 value" does not mean a Basic Encoding Rules (BER) [X.690] encoding. The ASN.1 value is an abstract concept that is independent of any particular encoding. BER is just one possible way to encode an ASN.1 value. This document defines an alternative way to encode an ASN.1 value.

A separate document [RXEREI] defines encoding instructions [X.680-1] that may be used in an ASN.1 specification to modify how values are encoded in RXER, for example, to encode a component of a combining ASN.1 type as an attribute rather than as a child element. A pre-existing ASN.1 specification will not have RXER encoding instructions, so any mention of encoding instructions in this document can be ignored when dealing with such specifications. Encoding instructions for other encoding rules have no effect on RXER encodings.

2. Conventions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", and "MAY" in this document are to be interpreted as described in BCP 14, RFC 2119 [BCP14]. The key word "OPTIONAL" is exclusively used with its ASN.1 meaning.

A reference to an ASN.1 production [X.680] (e.g., Type, NamedType) is a reference to the text in an ASN.1 specification corresponding to that production.

The specification of RXER makes use of definitions from the XML Information Set (Infoset) [INFOSET]. In particular, information item property names follow the Infoset convention of being shown in square brackets, e.g., [local name]. Literal values of Infoset properties are enclosed in double quotes; however, the double quotes are not part of the property values.

In the sections that follow, "information item" will be abbreviated to "item", e.g., "element information item" is abbreviated to "element item". The term "element" or "attribute" (without the "item") refers to an element or attribute in an XML document, rather than an information item.
Literal character strings to be used in an RXER encoding appear within double quotes; however, the double quotes are not part of the literal value and do not appear in the encoding.

This document uses the namespace prefix [XMLNS10][XMLNS11] "asnx:" to stand for the namespace name "urn:ietf:params:xml:ns:asnx", uses the namespace prefix "xs:" to stand for the namespace name "http://www.w3.org/2001/XMLSchema", and uses the namespace prefix "xsi:" to stand for the namespace name "http://www.w3.org/2001/XMLSchema-instance". However, in practice, any valid namespace prefixes are permitted in non-canonical RXER encodings (namespace prefixes are deterministically generated for CRXER).

The encoding instructions [X.680-1] referenced by name in this specification are encoding instructions for RXER [RXEREI].

Throughout this document, references to the Markup, AnyURI, NCName, Name, and QName ASN.1 types are references to the types described in Section 4 and consolidated in the AdditionalBasicDefinitions module in Appendix A. Any provisions associated with the reference do not apply to types defined in other ASN.1 modules that happen to have these same names.

Code points for characters [UCS][UNICODE] are expressed using the Unicode convention U+n, where n is four to six hexadecimal digits, e.g., the space character is U+0020.

3. Definitions
Definition (white space character): A white space character is a space (U+0020), tab (U+0009), carriage return (U+000D), or line feed (U+000A) character.

Definition (white space): White space is a sequence of one or more white space characters.

Definition (line break): A line break is any sequence of characters that is normalized to a line feed by XML End-of-Line Handling [XML10][XML11].

Definition (serialized white space): Serialized white space is a sequence of one or more white space characters and/or line breaks.

Definition (declaring the default namespace): A namespace declaration attribute item is declaring the default namespace if the [prefix] of the attribute item has no value, the [local name] of the attribute item is "xmlns" and the [normalized value] is not empty.
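The white space and line break definitions above translate directly into code. The following Python sketch is illustrative only, and its function names are hypothetical. It implements XML End-of-Line Handling with regular expressions: XML 1.0 normalizes CR LF and a lone CR to a line feed, while XML 1.1 additionally normalizes CR NEL, NEL (U+0085), and LINE SEPARATOR (U+2028). Serialized white space is then just text that becomes white space once its line breaks are normalized.

```python
import re

# White space characters: space, tab, carriage return and line feed.
WHITE_SPACE_CHARS = frozenset("\u0020\u0009\u000D\u000A")

# Sequences that XML End-of-Line Handling turns into a single line feed.
# Alternatives are tried left to right, so CR LF (and CR NEL) win over a lone CR.
LINE_BREAKS = {
    "1.0": re.compile("\r\n|\r"),
    "1.1": re.compile("\r\n|\r\u0085|\r|\u0085|\u2028"),
}


def is_white_space(text: str) -> bool:
    """True if `text` is one or more white space characters."""
    return bool(text) and all(c in WHITE_SPACE_CHARS for c in text)


def normalize_line_breaks(text: str, xml_version: str = "1.0") -> str:
    """Apply XML End-of-Line Handling for the given XML version."""
    return LINE_BREAKS[xml_version].sub("\n", text)


def is_serialized_white_space(text: str, xml_version: str = "1.0") -> bool:
    """True if `text` is one or more white space characters and/or line breaks.

    After line break normalization, only white space characters may remain.
    """
    return is_white_space(normalize_line_breaks(text, xml_version))
```

NEL and LINE SEPARATOR count as serialized white space only in XML 1.1, which is why the version is a parameter rather than a fixed choice.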
Definition (undeclaring the default namespace): A namespace declaration attribute item is undeclaring the default namespace if the [prefix] of the attribute item has no value, the [local name] of the attribute item is "xmlns" and the [normalized value] is empty (i.e., xmlns="").

Definition (canonical namespace prefix): A canonical namespace prefix is an NCName [XMLNS10] beginning with the letter 'n' (U+006E) followed by a non-negative number string. A non-negative number string is either the digit character '0' (U+0030), or a non-zero decimal digit character (U+0031-U+0039) followed by zero, one, or more of the decimal digit characters '0' to '9' (U+0030-U+0039).

For convenience, a CHOICE type where the ChoiceType is subject to a UNION encoding instruction will be referred to as a UNION type, and a SEQUENCE OF type where the SequenceOfType is subject to a LIST encoding instruction will be referred to as a LIST type.

4. Additional Basic Types
This section defines an ASN.1 type for representing markup in abstract values, as well as basic types that are useful in encoding instructions [RXEREI] and other related specifications [ASN.X]. The ASN.1 definitions in this section are consolidated in the AdditionalBasicDefinitions ASN.1 module in Appendix A.

4.1. The Markup Type
A value of the Markup ASN.1 type holds the [prefix], [attributes], [namespace attributes], and [children] of an element item, i.e., the content and attributes of an element.

RXER has special provisions for encoding values of the Markup type (see Section 6.10). For other encoding rules, a value of the Markup type is encoded according to the following ASN.1 type definition (with AUTOMATIC TAGS):

   Markup ::= CHOICE {
       text  SEQUENCE {
           prolog      UTF8String (SIZE(1..MAX)) OPTIONAL,
           prefix      NCName OPTIONAL,
           attributes  UTF8String (SIZE(1..MAX)) OPTIONAL,
           content     UTF8String (SIZE(1..MAX)) OPTIONAL
       }
   }
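The text alternative can be mirrored by a small Python data class, as a sketch for experimenting with how the components map onto an element; the names `MarkupText` and `serialize_element` are hypothetical. `None` stands for an absent OPTIONAL component, and the check in `__post_init__` enforces only the SIZE(1..MAX) constraints (plus non-emptiness of the NCName prefix); the other requirements on Markup values described in this section are not checked. The helper takes the element's local name from the caller, because a Markup value carries the [prefix] but not the local name.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarkupText:
    """Hypothetical mirror of the `text` alternative of the Markup CHOICE.

    None stands for an absent OPTIONAL component.
    """
    prolog: Optional[str] = None
    prefix: Optional[str] = None
    attributes: Optional[str] = None
    content: Optional[str] = None

    def __post_init__(self) -> None:
        # SIZE(1..MAX): a present UTF8String component must not be empty.
        for name in ("prolog", "attributes", "content"):
            if getattr(self, name) == "":
                raise ValueError(f"{name} must be absent or non-empty")
        # An NCName is never empty either.
        if self.prefix == "":
            raise ValueError("prefix must be absent or a non-empty NCName")


def serialize_element(local_name: str, value: MarkupText) -> str:
    """Rebuild an element whose prefix, attributes and content come from `value`.

    The local name is supplied by the caller; in RXER it typically comes
    from the enclosing NamedType rather than from the Markup value.
    """
    qname = f"{value.prefix}:{local_name}" if value.prefix else local_name
    attrs = f" {value.attributes}" if value.attributes is not None else ""
    body = value.content if value.content is not None else ""
    return f"{value.prolog or ''}<{qname}{attrs}>{body}</{qname}>"


# The second, equivalent Markup value from the example in this section.
example = MarkupText(
    attributes='bar="0" ns:foo="1" xmlns:ns="http://www.example.com/ABD"',
    content="\n <this>true</this>\n <that/>\n ",
)
root = ET.fromstring(serialize_element("messageValue", example))
print(root.tag, root.get("bar"), root.get("{http://www.example.com/ABD}foo"))
# messageValue 0 1
```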
The text alternative of the Markup CHOICE type provides for the [prefix], [attributes], [namespace attributes], and [children] of an element item to be represented as serialized XML using the UTF-8 character encoding [UTF-8].

Aside: The CHOICE allows for one or more alternative compact representations of the content and attributes of an element to be supported in a future specification.

With respect to some element item whose content and attributes are represented by a value of the text alternative of the Markup type:

(1) the prolog component of the value contains text that, after line break normalization, conforms to the XML prolog production [XML10][XML11],

(2) the prefix component is absent if the [prefix] of the element item has no value; otherwise, the prefix component contains the [prefix] of the element item,

(3) the attributes component of the value contains an XML serialization of the [attributes] and [namespace attributes] of the element item, if any, with each attribute separated from the next by serialized white space, and

(4) the content component is absent if the [children] property of the element item is empty; otherwise, the content component of the value contains an XML serialization of the [children] of the element item.

All the components of a value of the Markup type MUST use the same version of XML, either version 1.0 [XML10] or version 1.1 [XML11]. If XML version 1.1 is used, then the prolog component MUST be present and MUST have an XMLDecl for version 1.1. If the prolog component is absent, then XML version 1.0 is assumed.

If the prefix component is present, then there MUST be a namespace declaration attribute in the attributes component that defines that namespace prefix (since an element whose content and attributes are described by a value of Markup is required to be self-contained; see Section 4.1.1). Note that the prefix component is critically related to the NamedType that has Markup as its type.
If a Markup value is extracted from one enclosing abstract value and embedded in another enclosing abstract value (i.e., becomes associated with a different NamedType), then the prefix may no longer be appropriate, in which case it will need to be revised. It may also be necessary to add another namespace declaration attribute to the attributes component so as to declare a new namespace prefix.

Leading and/or trailing serialized white space is permitted in the attributes component. A value of the attributes component consisting only of serialized white space (i.e., no actual attributes) is permitted.

The attributes and content components MAY contain entity references [XML10][XML11]. If any entity references are used (other than references to the predefined entities), then the prolog component MUST be present and MUST contain entity declarations for those entities in the internal or external subset of the document type definition.

Example

Given the following ASN.1 module:

   MyModule DEFINITIONS
   AUTOMATIC TAGS ::= BEGIN

   Message ::= SEQUENCE {
       messageType   INTEGER,
       messageValue  Markup
   }

   ENCODING-CONTROL RXER

       TARGET-NAMESPACE "http://example.com/ns/MyModule"

       COMPONENT message Message
           -- a top-level NamedType

   END

consider the following XML document:

   <?xml version='1.0'?>
   <!DOCTYPE message [
    <!ENTITY TRUE 'true'>
   ]>
   <message>
   <messageType>1</messageType>
   <messageValue xmlns:ns="http://www.example.com/ABD"
    ns:foo="1" bar="0">
    <this>&TRUE;</this>
    <that/>
    </messageValue>
   </message>

A Markup value corresponding to the content and attributes of the <messageValue> element is, in ASN.1 value notation [X.680] (where "lf" represents the line feed character):

   text:{
       prolog {
           "<?xml version='1.0'?>", lf,
           "<!DOCTYPE message [", lf,
           " <!ENTITY TRUE 'true'>", lf,
           "]>", lf
       },
       attributes {
           " xmlns:ns=""http://www.example.com/ABD""", lf,
           " ns:foo=""1"" bar=""0"""
       },
       content {
           lf,
           " <this>&TRUE;</this>", lf,
           " <that/>", lf,
           " "
       }
   }

The following Markup value is an equivalent representation of the content and attributes of the <messageValue> element:

   text:{
       attributes {
           "bar=""0"" ns:foo=""1"" ",
           "xmlns:ns=""http://www.example.com/ABD"""
       },
       content {
           lf,
           " <this>true</this>", lf,
           " <that/>", lf,
           " "
       }
   }

By itself, the Markup ASN.1 type imposes no data type restriction on the markup contained by its values and is therefore analogous to the XML Schema anyType [XSD1]. There is no ASN.1 basic notation that can directly impose the constraint that the markup represented by a value of the Markup type must conform to the markup allowed by a specific type definition. However, certain encoding instructions (i.e., the reference encoding instructions [RXEREI]) have been defined to have this effect.

4.1.1. Self-Containment
An element, its attributes and its content, including descendant elements, may contain qualified names [XMLNS10][XMLNS11] as the names of elements and attributes, in the values of attributes, and as character data content of elements. The binding between namespace prefix and namespace name for these qualified names is potentially determined by the namespace declaration attributes of ancestor elements (which in the Infoset representation are inherited as namespace items in the [in-scope namespaces]).

In the absence of complete knowledge of the data type of an element item whose content and attributes are described by a value of the Markup type, it is not possible to determine with absolute certainty which of the namespace items inherited from the [in-scope namespaces] of the [parent] element item are significant in interpreting the Markup value. The safe and easy option would be to assume that all the namespace items from the [in-scope namespaces] of the [parent] element item are significant and need to be retained within the Markup value. When the Markup value is re-encoded, any of the retained namespace items that do not appear in the [in-scope namespaces] of the enclosing element item in the new encoding could be made to appear by outputting corresponding namespace declaration attribute items in the [namespace attributes] of the enclosing element item.

From the perspective of the receiver of the new encoding, this enlarges the set of attribute items in the [namespace attributes] represented by the Markup value. In addition, there is no guarantee that the sender of the new encoding has recreated the original namespace declaration attributes on the ancestor elements, so the [in-scope namespaces] of the enclosing element item is likely to have new namespace declarations that the receiver will retain and pass on in the [namespace attributes] when it in turn re-encodes the Markup value. This unbounded growth in the set of attribute items in the [namespace attributes] defeats any attempt to produce a canonical encoding. The principle of self-containment is introduced to avoid this problem.
An element item (the subject element item) is self-contained if the constraints of Namespaces in XML 1.0 [XMLNS10] are satisfied (i.e., that prefixes are properly declared) and none of the following bindings are determined by a namespace declaration attribute item in the [namespace attributes] of an ancestor element item of the subject element item:

(1) the binding between the [prefix] and [namespace name] of the subject element item,

(2) the binding between the [prefix] and [namespace name] of any descendant element item of the subject element item,

(3) the binding between the [prefix] and [namespace name] of any attribute item in the [attributes] of the subject element item or the [attributes] of any descendant element item of the subject element item,

(4) the binding between the namespace prefix and namespace name of any qualified name in the [normalized value] of any attribute item in the [attributes] of the subject element item or the [attributes] of any descendant element item of the subject element item, or

(5) the binding between the namespace prefix and namespace name of any qualified name represented by a series of character items (ignoring processing instruction and comment items) in the [children] of the subject element item or the [children] of any descendant element item of the subject element item.

Aside: If an element is self-contained, then separating the element from its parent does not change the semantic interpretation of its name and any names in its content and attributes.

A supposedly self-contained element in a received RXER encoding that is in fact not self-contained SHALL be treated as an ASN.1 constraint violation.

Aside: ASN.1 does not require an encoding with a constraint violation to be immediately rejected; however, the constraint violation must be reported at some point, possibly in a separate validation step.

Implementors should note that an RXER decoder will be able to detect some, but not all, violations of self-containment. For example, it can detect element and attribute names that depend on namespace declarations appearing in the ancestors of a supposedly self-contained element. Similarly, where type information is available, it can detect qualified names in character data that depend on the namespace declarations of ancestor elements. However, type information is not always available, so some qualified names will escape constraint checking. Thus, the onus is on the creator of the original encoding to ensure that element items required to be self-contained really are completely self-contained.

An element item whose content and attributes are described by a value of the Markup type MUST be self-contained.
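The name-based part of this check can be sketched in Python with the standard `xml.dom.minidom` module. The function `name_violations` is hypothetical: it reports element and attribute names inside a subject element whose prefix (or, for an unprefixed element name, whose non-empty default namespace) is bound only by a namespace declaration on an ancestor, which corresponds to items (1) to (3) above. Items (4) and (5) need type information to locate the qualified names, so the sketch does not attempt them.

```python
from xml.dom import minidom


def _prefix(qname: str) -> str:
    """Return the namespace prefix of a qualified name, or "" if unprefixed."""
    return qname.split(":", 1)[0] if ":" in qname else ""


def _declarations(element) -> dict:
    """Map each prefix declared on `element` ("" for the default) to its URI."""
    decls = {}
    attrs = element.attributes
    for i in range(attrs.length):
        attr = attrs.item(i)
        if attr.name == "xmlns":
            decls[""] = attr.value
        elif attr.name.startswith("xmlns:"):
            decls[attr.name[len("xmlns:"):]] = attr.value
    return decls


def name_violations(subject) -> list:
    """List names in `subject` whose namespace binding comes from an ancestor.

    Only element and attribute names are checked; qualified names inside
    attribute values or character data would need type information.
    """
    # Bindings inherited from the ancestors, applied outermost first so
    # that nearer declarations override farther ones.
    ancestors = []
    node = subject.parentNode
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        ancestors.append(node)
        node = node.parentNode
    inherited = {}
    for ancestor in reversed(ancestors):
        inherited.update(_declarations(ancestor))

    problems = []

    def walk(element, local: frozenset) -> None:
        # `local` holds the prefixes declared on the subject element or on
        # descendants along the path down to `element`.
        local = local | set(_declarations(element))
        prefix = _prefix(element.tagName)
        if prefix not in local:
            if prefix == "":
                # Unprefixed: depends on an ancestor only if the inherited
                # default namespace is non-empty.
                if inherited.get(""):
                    problems.append(element.tagName)
            elif prefix != "xml":  # "xml" is bound by definition
                problems.append(element.tagName)
        attrs = element.attributes
        for i in range(attrs.length):
            name = attrs.item(i).name
            # Unprefixed attributes have no namespace; "xml" and "xmlns"
            # prefixes are bound by definition.
            attr_prefix = _prefix(name)
            if attr_prefix not in ("", "xml", "xmlns") and attr_prefix not in local:
                problems.append(name)
        for child in element.childNodes:
            if child.nodeType == child.ELEMENT_NODE:
                walk(child, local)

    walk(subject, frozenset())
    return problems


doc = minidom.parseString(
    '<outer xmlns:ns="http://example.com/a">'
    '<inner><ns:child ns:attr="1"/></inner></outer>'
)
print(name_violations(doc.getElementsByTagName("inner")[0]))
# ['ns:child', 'ns:attr']
```

Redeclaring xmlns:ns on the <inner> element would make it self-contained with respect to these names, and the function would then return an empty list.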
Aside: The procedures in Section 6 take account of the requirements for self-containment so that an RXER encoder following these procedures will not create violations of self-containment.

4.1.2. Normalization for Canonical Encoding Rules
Implementations are given some latitude in how the content and attributes of an element are represented as an abstract value of the Markup type, in part because an Infoset can have different equivalent serializations. For example, the order of attributes and the amount and kind of white space characters between attributes are irrelevant to the Infoset representation. The content can also include one or more elements corresponding to an ASN.1 top-level NamedType or having a data type that is an ASN.1 type. It is only necessary to preserve the abstract value for such elements, and a particular abstract value can have different Infoset representations.

These two characteristics mean that when an RXER encoded value of the Markup type is decoded, the components of the recovered Markup value may not be exactly the same, character for character, as the original value that was encoded, though the recovered value will be semantically equivalent.

However, canonical ASN.1 encoding rules such as the Distinguished Encoding Rules (DER) and the Canonical Encoding Rules (CER) [X.690], which encode Markup values according to the ASN.1 definition of the Markup type, depend on character-for-character preservation of string values. This requirement can be accommodated if values of the Markup type are normalized when they are encoded according to a set of canonical encoding rules.

Aside: The RXER encoding and decoding of a Markup value might change the character string components of the value from the perspective of BER, but there will be a single, repeatable encoding for DER.

A value of the Markup type will appear as the content and attributes of an element in an RXER encoding.
When the value is encoded using a set of ASN.1 canonical encoding rules other than CRXER, the components of the text alternative of the value MUST be normalized as follows, by reference to the element as it would appear in a CRXER encoding:

(1) The value of the prolog component SHALL be the XMLDecl <?xml version="1.1"?> with no other leading or trailing characters.

(2) If the element's name is unprefixed in the CRXER encoding, then the prefix component SHALL be absent; otherwise, the value of the prefix component SHALL be the prefix of the element's name in the CRXER encoding.

(3) Take the character string representing the element's attributes, including namespace declarations, in the CRXER encoding. If the first attribute is a namespace declaration that undeclares the default namespace (i.e., xmlns=""), then remove it. Remove any leading space characters. If the resulting character string is empty, then the attributes component SHALL be absent; otherwise, the value of the attributes component SHALL be the resulting character string.

Aside: Note that the attributes of an element can change if an RXER encoding is re-encoded in CRXER.

(4) If the element has no characters between the start-tag and end-tag [XML11] in the CRXER encoding, then the content component SHALL be absent; otherwise, the value of the content component SHALL be identical to the character string in the CRXER encoding bounded by the element's start-tag and end-tag.

Aside: A consequence of invoking the CRXER encoding is that any nested element corresponding to an ASN.1 top-level NamedType, or indeed the element itself, will be normalized according to its ASN.1 value rather than its Infoset representation. Likewise for an element whose data type is an ASN.1 type. Section 6.4 describes how these situations can arise.

Aside: It is only through values of the Markup type that processing instructions and comments can appear in CRXER encodings.

If an application uses DER, but has no knowledge of RXER, then it will not know to normalize values of the Markup type. If RXER is deployed into an environment containing such applications, then Markup values SHOULD be normalized, even when encoding using non-canonical encoding rules.

4.2. The AnyURI Type
A value of the AnyURI ASN.1 type is a character string conforming to the format of a Uniform Resource Identifier (URI) [URI].

   AnyURI ::= UTF8String (CONSTRAINED BY {
                  -- conforms to the format of a URI -- })
4.3. The NCName Type
A value of the NCName ASN.1 type is a character string conforming to the NCName production of Namespaces in XML 1.0 [XMLNS10].

   NCName ::= UTF8String (CONSTRAINED BY {
                  -- conforms to the NCName production of
                  -- Namespaces in XML 1.0 -- })

Aside: The NCName production for Namespaces in XML 1.1 [XMLNS11] allows a wider range of characters than the NCName production for Namespaces in XML 1.0. The NCName type for ASN.1 is currently restricted to the characters allowed by Namespaces in XML 1.0, though this may change in a future specification of RXER.

4.4. The Name Type
A value of the Name ASN.1 type is a character string conforming to the Name production of XML version 1.0 [XML10].

   Name ::= UTF8String (CONSTRAINED BY {
                -- conforms to the Name production of XML -- })

4.5. The QName Type
A value of the QName ASN.1 type describes an expanded name [XMLNS10], which appears as a qualified name [XMLNS10] in an RXER encoding. RXER has special provisions for encoding values of the QName type (see Section 6.7.11). For other encoding rules, a value of the QName type is encoded according to the following ASN.1 type definition (with AUTOMATIC TAGS):

   QName ::= SEQUENCE {
       namespace-name  AnyURI OPTIONAL,
       local-name      NCName
   }

The namespace-name component holds the namespace name of the expanded name. If the namespace name of the expanded name has no value, then the namespace-name component is absent.

Aside: A namespace name can be associated with ASN.1 types and top-level NamedType instances by using the TARGET-NAMESPACE encoding instruction.

The local-name component holds the local name of the expanded name.
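A value of the QName type can be mirrored by a Python data class, with `None` standing for an absent namespace-name component. The sketch is illustrative only: the names `QName` and `expand` are hypothetical, the local name is checked against a simplified ASCII-only approximation of the NCName production, and an unprefixed qualified name is simply assumed to have no namespace name (Section 6.7.11 gives RXER's actual provisions). It shows how a qualified name, resolved against in-scope prefix bindings, yields the two components of the value.

```python
import re
from dataclasses import dataclass
from typing import Mapping, Optional

# Simplified, ASCII-only approximation of the NCName production; the real
# production also admits many non-ASCII letters and digits.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


@dataclass(frozen=True)
class QName:
    """Hypothetical mirror of the QName SEQUENCE (None = absent namespace-name)."""
    namespace_name: Optional[str]
    local_name: str

    def __post_init__(self) -> None:
        if not NCNAME_ASCII.match(self.local_name):
            raise ValueError(f"not an (ASCII) NCName: {self.local_name!r}")


def expand(qualified_name: str, in_scope: Mapping[str, str]) -> QName:
    """Resolve a qualified name against in-scope prefix -> namespace name bindings.

    Assumption of this sketch: an unprefixed name has no namespace name.
    """
    prefix, colon, local = qualified_name.partition(":")
    if not colon:
        return QName(None, qualified_name)
    if prefix not in in_scope:
        raise ValueError(f"undeclared namespace prefix: {prefix!r}")
    return QName(in_scope[prefix], local)


print(expand("ns:foo", {"ns": "http://www.example.com/ABD"}))
# QName(namespace_name='http://www.example.com/ABD', local_name='foo')
```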