
RFC 7940

Representing Label Generation Rulesets Using XML

Pages: 82
Proposed Standard
Part 4 of 4 – Pages 63 to 82


Appendix B. How to Translate Tables Based on RFC 3743 into the XML Format

   As background, the rules specified in [RFC3743] work as follows:

   1.  The original (requested) label is checked to make sure that all
       the code points are a subset of the repertoire.

   2.  If it passes the check, the original label is allocatable.

   3.  Generate the all-simplified and all-traditional variant labels
       (union of all the labels generated using all the simplified
       variants of the code points) for allocation.

   To illustrate by example, here is one of the more complicated sets of
   variants:

       U+4E7E
       U+4E81
       U+5E72
       U+5E79
       U+69A6
       U+6F27

   The following shows the relevant section of the Chinese language
   table published by the .ASIA registry [ASIA-TABLE].  Its entries
   read:

       <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

   These are the lines corresponding to the set of variants listed
   above:

       U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
       U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
       U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
       U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
       U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
       U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

   The corresponding "data" section XML format would look like this:

     <data>
       <char cp="4E7E">
       <var cp="4E7E" type="both" comment="identity" />
       <var cp="4E81" type="blocked" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="4E81">
       <var cp="4E7E" type="trad" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="5E72">
       <var cp="4E7E" type="trad"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="both" comment="identity"/>
       <var cp="5E79" type="trad"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="5E79">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="trad" comment="identity"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="69A6">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="trad" comment="identity"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="6F27">
       <var cp="4E7E" type="simp"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="blocked"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="trad" comment="identity"/>
       </char>
     </data>

   Here, the simplified variants have been given a type of "simp" and
   the traditional variants one of "trad", and all other ones are given
   "blocked".

   Because some variant mappings appear in both the simplified and the
   traditional columns, while the XML format allows only a single type
   value per mapping, they have been given the type of "both".

   Note that some variant mappings map to themselves (identity); that
   is, the mapping is reflexive (see Section 5.3.4).  In creating the
   permutation of all variant labels, these mappings have no effect,
   other than adding a value to the variant type list for the variant
   label containing them.

   In the example so far, all of the entries with type="both" are also
   mappings where source and target are identical.  That is, they are
   reflexive mappings as defined in Section 5.3.4.
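
   The column-to-type mapping described above is mechanical, so it can
   be sketched in a few lines of code.  The following Python fragment is
   a minimal illustration only and not part of the format; the helper
   names "parse_cps" and "char_element" are hypothetical, and the sketch
   assumes every table line has exactly four semicolon-separated fields.

```python
import xml.etree.ElementTree as ET


def parse_cps(field):
    """Split a comma-separated 'U+XXXX' list into bare hex strings."""
    return [cp.strip()[2:] for cp in field.split(",") if cp.strip()]


def char_element(line):
    """Turn one RFC 3743-style table line into a <char> element.

    Assumed layout: <codepoint>;<simpl>;<trad>;<other>
    """
    cp_field, simp_field, trad_field, other_field = line.split(";")
    source = parse_cps(cp_field)[0]
    simp = set(parse_cps(simp_field))
    trad = set(parse_cps(trad_field))
    other = set(parse_cps(other_field))

    char = ET.Element("char", cp=source)
    for target in sorted(simp | trad | other):
        if target in simp and target in trad:
            vtype = "both"       # listed in both preferred columns
        elif target in simp:
            vtype = "simp"
        elif target in trad:
            vtype = "trad"
        else:
            vtype = "blocked"    # listed only in the "other" column
        var = ET.SubElement(char, "var", cp=target, type=vtype)
        if target == source:
            var.set("comment", "identity")
    return char


line = "U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6"
print(ET.tostring(char_element(line), encoding="unicode"))
```

   Applied to the first table line, the sketch yields the same six
   "var" entries, with the same types, as the first "char" element of
   the "data" section shown earlier.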

   Given a label "U+4E7E U+4E81", the following labels would be ruled
   allocatable per [RFC3743], based on how that standard is commonly
   implemented in domain registries:

       Original label:     U+4E7E U+4E81
       Simplified label 1: U+4E7E U+5E72
       Simplified label 2: U+5E72 U+5E72
       Traditional label:  U+4E7E U+4E7E

   However, if allocatable labels were generated simply by a straight
   permutation of all variants with type other than type="blocked" and
   without regard to the simplified and traditional variants, we would
   end up with an extra allocatable label of "U+5E72 U+4E7E".  This
   label is composed of both a Simplified Chinese character and a
   Traditional Chinese code point and therefore shouldn't be
   allocatable.

   To more fully resolve the dispositions requires several actions to be
   defined, as described in Section 7.2.2, that will override the
   default actions from Section 7.6.  After blocking all labels that
   contain a variant with type "blocked", these actions will set to
   "allocatable" labels based on the following variant types: "simp",
   "trad", and "both".  Note that these variant types do not directly
   relate to dispositions for the variant label, but that the actions
   will resolve them to the Standard Dispositions on labels, i.e.,
   "blocked" and "allocatable".

   To resolve label dispositions requires five actions to be defined (in
   the "rules" section of the XML document in question); these actions
   apply in order, and the first one triggered defines the disposition
   for the label.  The actions are as follows:

   1.  Block all variant labels containing at least one blocked variant.

   2.  Allocate all labels that consist entirely of variants that are
       "simp" or "both".

   3.  Also allocate all labels that are entirely "trad" or "both".

   4.  Block all surviving labels containing any one of the variant
       types "simp" or "trad" or "both", because they are now known to
       be part of an undesirable mixed simplified/traditional label.

   5.  Allocate any remaining label; the original label would be such a
       label.

   The rules declarations would be represented as:

     <rules>
       <!--"action" elements - order defines precedence-->
       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" only-variants="simp both" />
       <action disp="allocatable" only-variants="trad both" />
       <action disp="blocked" any-variant="simp trad" />
       <action disp="allocatable" comment="catch-all" />
     </rules>
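
   The effect of these five actions on the example label
   "U+4E7E U+4E81" can be checked with a short simulation.  The Python
   sketch below is illustrative only.  It assumes the trigger semantics
   as read here: "any-variant" fires if at least one variant in the
   label has a listed type, and "only-variants" fires only if every code
   point in the label is a variant (reflexive mappings included) of a
   listed type.  The names "choices", "disposition", and "evaluate" are
   hypothetical.

```python
from itertools import product

# Variant mappings for the two code points of the example label, copied
# from the "data" section: source -> {target: type}.
VARIANTS = {
    "4E7E": {"4E7E": "both", "4E81": "blocked", "5E72": "simp",
             "5E79": "blocked", "69A6": "blocked", "6F27": "blocked"},
    "4E81": {"4E7E": "trad", "5E72": "simp", "5E79": "blocked",
             "69A6": "blocked", "6F27": "blocked"},
}

# The five actions in document order: (disposition, trigger, type list).
ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "both"}),
    ("allocatable", "only-variants", {"trad", "both"}),
    ("blocked", "any-variant", {"simp", "trad"}),
    ("allocatable", None, set()),          # catch-all
]


def choices(cp):
    """Yield (code point, type) options for one label position.

    Type None marks an original code point without a reflexive mapping.
    """
    mappings = VARIANTS.get(cp, {})
    if cp not in mappings:
        yield cp, None
    yield from mappings.items()


def disposition(types):
    """Return the disposition of the first action that is triggered."""
    present = [t for t in types if t is not None]
    for disp, trigger, listed in ACTIONS:
        if trigger is None:
            return disp
        if trigger == "any-variant" and any(t in listed for t in present):
            return disp
        if trigger == "only-variants" and all(t in listed for t in types):
            return disp
    raise RuntimeError("no action triggered")


def evaluate(label):
    """Map every permuted variant label to its disposition."""
    result = {}
    for combo in product(*(choices(cp) for cp in label)):
        variant_label = " ".join(cp for cp, _ in combo)
        result[variant_label] = disposition([t for _, t in combo])
    return result


results = evaluate(["4E7E", "4E81"])
for variant_label, disp in sorted(results.items()):
    if disp == "allocatable":
        print(variant_label)
```

   Under these assumptions, the allocatable labels are the original
   label plus the two simplified labels and the traditional label listed
   earlier, while the mixed label "U+5E72 U+4E7E" is blocked by the
   fourth action.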

   Up to now, variants with type "both" have occurred only associated
   with reflexive variant mappings.  The "action" elements defined above
   rely on the assumption that this is always the case.  However,
   consider the following set of variants:

       U+62E0;U+636E;U+636E;U+64DA
       U+636E;U+636E;U+64DA;U+62E0
       U+64DA;U+636E;U+64DA;U+62E0

   The corresponding XML would be:

       <char cp="62E0">
       <var cp="636E" type="both" comment="both, but not reflexive" />
       <var cp="64DA" type="blocked" />
       </char>
       <char cp="636E">
       <var cp="636E" type="simp" comment="reflexive, but not both" />
       <var cp="64DA" type="trad" />
       <var cp="62E0" type="blocked" />
       </char>
       <char cp="64DA">
       <var cp="636E" type="simp" />
       <var cp="64DA" type="trad" comment="reflexive" />
       <var cp="62E0" type="blocked" />
       </char>

   To make such variant sets work requires a way to selectively trigger
   an action based on whether a variant type is associated with an
   identity or reflexive mapping, or is associated with an ordinary
   variant mapping.  This can be done by adding a prefix "r-" to the
   "type" attribute on reflexive variant mappings.  For example, the
   "trad" for code point U+64DA in the preceding figure would become
   "r-trad".

   With the dispositions prepared in this way, only a slight
   modification to the actions is needed to yield the correct set of
   allocatable labels:

   <action disp="blocked" any-variant="blocked" />
   <action disp="allocatable" only-variants="simp r-simp both r-both" />
   <action disp="allocatable" only-variants="trad r-trad both r-both" />
   <action disp="blocked" all-variants="simp trad both" />
   <action disp="allocatable" />
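
   Adding the "r-" prefix to the data is a mechanical step.  As a rough
   Python sketch (the function name "mark_reflexive" is hypothetical,
   and the fragment is wrapped in a bare "data" element without the LGR
   namespace purely for brevity), every "var" whose "cp" equals that of
   its parent "char" gets its type prefixed:

```python
import xml.etree.ElementTree as ET

DATA = """
<data>
  <char cp="62E0">
    <var cp="636E" type="both" />
    <var cp="64DA" type="blocked" />
  </char>
  <char cp="636E">
    <var cp="636E" type="simp" />
    <var cp="64DA" type="trad" />
    <var cp="62E0" type="blocked" />
  </char>
  <char cp="64DA">
    <var cp="636E" type="simp" />
    <var cp="64DA" type="trad" />
    <var cp="62E0" type="blocked" />
  </char>
</data>
"""


def mark_reflexive(data):
    """Prefix the type of every reflexive variant mapping with "r-"."""
    for char in data.iter("char"):
        for var in char.findall("var"):
            vtype = var.get("type")
            is_reflexive = var.get("cp") == char.get("cp")
            if is_reflexive and vtype and not vtype.startswith("r-"):
                var.set("type", "r-" + vtype)
    return data


data = mark_reflexive(ET.fromstring(DATA))
for char in data.iter("char"):
    for var in char.findall("var"):
        print(char.get("cp"), "->", var.get("cp"), var.get("type"))
```

   In this sketch, the non-reflexive "both" mapping on U+62E0 is left
   unchanged, while the reflexive mappings on U+636E and U+64DA become
   "r-simp" and "r-trad".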

   The first three actions get triggered by the same labels as before.

   The fourth action blocks any label that combines an original code
   point with any mix of ordinary variant mappings; however, no labels
   that are a combination of only original code points (code points
   having either no variant mappings or a reflexive mapping) would be
   affected.  These are the original labels, and they are allocated in
   the last action.

   Using this scheme of assigning types to ordinary and reflexive
   variants, all tables in the style of RFC 3743 can be converted to
   XML.  By defining a set of actions as outlined above, the LGR will
   yield the correct set of allocatable variants: all variants
   consisting completely of variant code points preferred for simplified
   or traditional, respectively, will be allocated, as will be the
   original label.  All other variant labels will be blocked.
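
   To see why the "r-" prefix matters, the two sets of actions can be
   compared on the label "U+62E0 U+636E".  The Python sketch below is an
   illustration under stated assumptions, not a reference
   implementation: "all-variants" is taken to fire when the label
   contains at least one variant and every variant type present is
   listed (original code points may be mixed in), and "only-variants"
   additionally requires every code point to be a variant.  All function
   and variable names are hypothetical.

```python
from itertools import product


def variant_table(prefix):
    """The U+62E0 variant set; prefix is "" or "r-" for reflexive types."""
    return {
        "62E0": {"636E": "both", "64DA": "blocked"},
        "636E": {"636E": prefix + "simp", "64DA": "trad",
                 "62E0": "blocked"},
        "64DA": {"636E": "simp", "64DA": prefix + "trad",
                 "62E0": "blocked"},
    }


OLD_ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "both"}),
    ("allocatable", "only-variants", {"trad", "both"}),
    ("blocked", "any-variant", {"simp", "trad"}),
    ("allocatable", None, set()),
]

NEW_ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "r-simp", "both", "r-both"}),
    ("allocatable", "only-variants", {"trad", "r-trad", "both", "r-both"}),
    ("blocked", "all-variants", {"simp", "trad", "both"}),
    ("allocatable", None, set()),
]


def choices(cp, variants):
    """Yield (code point, type); None marks a plain original code point."""
    mappings = variants.get(cp, {})
    if cp not in mappings:
        yield cp, None
    yield from mappings.items()


def triggered(trigger, listed, types):
    present = [t for t in types if t is not None]
    if trigger is None:                      # catch-all action
        return True
    if trigger == "any-variant":
        return any(t in listed for t in present)
    if trigger == "all-variants":            # originals may be mixed in
        return bool(present) and all(t in listed for t in present)
    if trigger == "only-variants":           # every code point is a variant
        return all(t in listed for t in types)
    raise ValueError(trigger)


def allocatable(label, variants, actions):
    """Return the sorted variant labels whose disposition is allocatable."""
    found = []
    for combo in product(*(choices(cp, variants) for cp in label)):
        types = [t for _, t in combo]
        disp = next(d for d, trig, listed in actions
                    if triggered(trig, listed, types))
        if disp == "allocatable":
            found.append(" ".join(cp for cp, _ in combo))
    return sorted(found)


label = ["62E0", "636E"]
old = allocatable(label, variant_table(""), OLD_ACTIONS)
new = allocatable(label, variant_table("r-"), NEW_ACTIONS)
print("without r- prefix:", old)
print("with r- prefix:   ", new)
```

   Under these assumptions, the first scheme blocks the original label,
   because the reflexive "simp" on U+636E trips the fourth action.  The
   second scheme allocates the original label together with the
   all-simplified label "U+636E U+636E" and the all-traditional label
   "U+636E U+64DA".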

Appendix C. Indic Syllable Structure Example

   In LGRs for Indic scripts, it may be desirable to restrict valid
   labels to sequences of valid Indic syllables, or aksharas.  This
   appendix gives a sample set of rules designed to enforce this
   restriction.

   Below is an example of BNF for an akshara, which has been published
   in "Devanagari Script Behaviour for Hindi" [TDIL-HINDI].  The rules
   for other languages and scripts used in India are expected to be
   generally similar.

   For Hindi, the BNF has the form:

      V[m]|{C[N]H}C[N](H|[v][m])

   Where:

      V    (uppercase) is any independent vowel
      m    is any vowel modifier (Devanagari Anusvara, Visarga, and
           Candrabindu)
      C    is any consonant (with inherent vowel)
      N    is Nukta
      H    is a halant (or virama)
      v    (lowercase) is any dependent vowel sign (matra)
      {}   encloses items that may be repeated one or more times
      [ ]  encloses items that may or may not be present
      |    separates items, out of which only one can be present

   By using the Unicode character property "InSC" or
   "Indic_Syllabic_Category", which corresponds rather directly to the
   classification of characters in the BNF above, we can translate the
   BNF into a set of WLE rules matching the definition of an akshara.

     <rules>
       <!--Character class definitions go here-->
       <class name="halant" property="InSC:Virama" />
       <union name="vowel-modifier">
         <class property="InSC:Visarga" />
         <class property="InSC:Bindu" comment="includes anusvara" />
       </union>
       <!--Whole label evaluation and context rules go here-->
       <rule name="consonant-with-optional-nukta">
         <class property="InSC:Consonant" />
         <class property="InSC:Nukta" count="0:1" />
       </rule>
       <rule name="independent-vowel-with-optional-modifier">
         <class property="InSC:Vowel_Independent" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="optional-dependent-vowel-with-opt-modifier">
         <class property="InSC:Vowel_Dependent" count="0:1" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="consonant-cluster">
         <rule count="0+">
           <rule by-ref="consonant-with-optional-nukta" />
           <class by-ref="halant" />
         </rule>
         <rule by-ref="consonant-with-optional-nukta" />
         <choice>
           <class by-ref="halant" />
           <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
         </choice>
       </rule>
       <rule name="akshara">
         <choice>
           <rule by-ref="independent-vowel-with-optional-modifier" />
           <rule by-ref="consonant-cluster" />
         </choice>
       </rule>
       <rule name="WLE-akshara-or-other" comment="series of one or
           more aksharas, possibly alternating with other types of
           code points such as digits">
         <start />
         <choice count="1+">
           <class property="InSC:other" />
           <rule by-ref="akshara" />
         </choice>
         <end />
       </rule>
       <!--"action" elements go here - order defines precedence-->
       <action disp="invalid" not-match="WLE-akshara-or-other" />
     </rules>

   With the rules and classes as defined above, the final action assigns
   a disposition of "invalid" to all labels that are not composed of a
   sequence of well-formed aksharas, optionally interspersed with other
   characters, perhaps digits, for example.
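
   The nested rules above correspond closely to a regular expression,
   which can make them easier to experiment with.  The following Python
   sketch is an approximation only: Python's standard library does not
   expose the Indic_Syllabic_Category property, so hand-picked
   Devanagari ranges stand in for the InSC classes, and ASCII digits
   stand in for the "other" class.  These substitutions are assumptions
   made for the example.

```python
import re

# Hand-picked Devanagari subsets standing in for the InSC classes.
V = "[\u0905-\u0914]"     # independent vowels
M = "[\u0901-\u0903]"     # vowel modifiers: candrabindu, anusvara, visarga
C = "[\u0915-\u0939]"     # consonants
N = "\u093C"              # nukta
H = "\u094D"              # halant (virama)
DV = "[\u093E-\u094C]"    # dependent vowel signs (matras)
OTHER = "[0-9]"           # stand-in for the "other" class

# consonant-with-optional-nukta
CN = f"{C}{N}?"
# consonant-cluster: zero or more (consonant + halant), a consonant,
# then either a halant or an optional matra with optional modifier
CLUSTER = f"(?:{CN}{H})*{CN}(?:{H}|{DV}?{M}?)"
# akshara: independent vowel with optional modifier, or a cluster
AKSHARA = f"(?:{V}{M}?|{CLUSTER})"
# whole label: one or more aksharas or "other" code points
WLE = re.compile(f"(?:{OTHER}|{AKSHARA})+")


def is_valid(label):
    """True if the whole label matches the akshara-or-other rule."""
    return WLE.fullmatch(label) is not None


hindi = "\u0939\u093F\u0928\u094D\u0926\u0940"   # the word "Hindi"
print(is_valid(hindi))
print(is_valid("\u093F\u0939"))                  # matra before consonant
print(is_valid("\u0939\u094D\u094D"))            # doubled halant
```

   The first label is accepted, while the other two are rejected: one
   starts with a dependent vowel sign, and the other carries a halant
   that does not follow a consonant.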

   The relevant Unicode character property could be replicated by
   tagging repertoire values directly in the LGR; this would remove the
   dependency on any specific version of the Unicode Standard.

   Generally, dependent vowels may only follow consonant expressions;
   however, for some scripts, like Bengali, the Unicode Standard
   supports sequences of dependent vowels or their application on
   independent vowels.  This makes the definition of akshara less
   restrictive.

C.1. Reducing Complexity

As presented in this example, the rules are rather complex -- although useful in demonstrating the features of the XML format, such complexity would be an undesirable feature in an actual LGR. It is possible to reduce the complexity of the rules in this example by defining alternate rules that simply define the permissible pair-wise context of adjacent code points by character class, such as a rule that a halant can only follow a (nuktated) consonant. Such pair-wise contexts are easier to understand, implement, and verify, and have the additional benefit of allowing tools to better pinpoint why a label failed to validate. They also tend to correspond more directly to the kind of well-formedness requirements that are most relevant to DNS security, like the requirement to limit the application of a combining mark (such as a vowel modifier) to only selected base characters (in this case, vowels). (See the example and discussion in [WLE-RULES].)
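
   A pair-wise formulation can be sketched as a table of permitted
   predecessors per character class.  The Python fragment below is a
   hedged illustration: the predecessor table is derived here from the
   BNF in this appendix rather than taken from [WLE-RULES], the
   Devanagari ranges are hand-picked stand-ins for the InSC classes,
   and the names "classify" and "first_violation" are hypothetical.

```python
# (class name, first code point, last code point); assumed subsets.
CLASSES = [
    ("vowel-modifier", 0x0901, 0x0903),
    ("independent-vowel", 0x0905, 0x0914),
    ("consonant", 0x0915, 0x0939),
    ("nukta", 0x093C, 0x093C),
    ("dependent-vowel", 0x093E, 0x094C),
    ("halant", 0x094D, 0x094D),
]

# For each restricted class, the classes that may immediately precede it.
ALLOWED_BEFORE = {
    "nukta": {"consonant"},
    "halant": {"consonant", "nukta"},
    "dependent-vowel": {"consonant", "nukta"},
    "vowel-modifier": {"independent-vowel", "dependent-vowel",
                       "consonant", "nukta"},
}


def classify(ch):
    """Return the class name of a character, or "other"."""
    cp = ord(ch)
    for name, first, last in CLASSES:
        if first <= cp <= last:
            return name
    return "other"


def first_violation(label):
    """Return (index, reason) for the first bad pair, or None if clean."""
    previous = None
    for index, ch in enumerate(label):
        kind = classify(ch)
        allowed = ALLOWED_BEFORE.get(kind)
        if allowed is not None and previous not in allowed:
            where = previous or "start of label"
            return index, f"{kind} may not follow {where}"
        previous = kind
    return None


print(first_violation("\u0939\u093F\u0928\u094D\u0926\u0940"))
print(first_violation("\u0939\u094D\u094D"))
```

   Because each check looks at one adjacent pair, a failure can be
   reported with the exact position and the rule that was violated,
   which is the diagnostic benefit described above.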

Appendix D. RELAX NG Compact Schema

   This schema is provided in RELAX NG Compact format [RELAX-NG].

   <CODE BEGINS>
   #
   # LGR XML Schema 1.0
   #

   default namespace = "urn:ietf:params:xml:ns:lgr-1.0"

   #
   # SIMPLE TYPES
   #

   # RFC 5646 language tag (e.g., "de", "und-Latn")
   language-tag = xsd:token

   # The scope to which the LGR applies.  For the "domain" scope type,
   # it should be a fully qualified domain name.
   scope-value = xsd:token {
       minLength = "1"
   }

   ## a single code point
   code-point = xsd:token {
       pattern = "[0-9A-F]{4,6}"
   }

   ## a space-separated sequence of code points
   code-point-sequence = xsd:token {
       pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
   }

   ## single code point, or a sequence of code points, or empty string
   code-point-literal = code-point | code-point-sequence | ""

   ## code point or sequence only
   non-empty-code-point-literal = code-point | code-point-sequence

   ## code point set represented in short form
   code-point-set-shorthand = xsd:token {
       pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
                 ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
   }

   ## dates are used in information fields in the meta
   ## section ("YYYY-MM-DD")
   date-pattern = xsd:token {
       pattern = "\d{4}-\d\d-\d\d"
   }

   ## variant type
   ## the variant type MUST be non-empty and MUST NOT
   ## start with a "_"; using xsd:NMTOKEN here because
   ## we need space-separated lists of them
   variant-type = xsd:NMTOKEN

   ## variant type list for action triggers
   ## the list MUST NOT be empty, and entries MUST NOT
   ## start with a "_"
   variant-type-list = xsd:NMTOKENS

   ## reference to a rule name (used in "when" and "not-when"
   ## attributes, as well as the "by-ref" attribute of the "rule"
   ## element).
   rule-ref = xsd:IDREF

   ## a space-separated list of tags.  Tags should generally follow
   ## xsd:Name syntax.  However, we are using the xsd:NMTOKENS here
   ## because there is no native XSD datatype for space-separated
   ## xsd:Name
   tags = xsd:NMTOKENS

   ## The value space of a "from-tag" attribute.  Although it is closer
   ## to xsd:IDREF lexically and semantically, tags are not unique in
   ## the document.  As such, we are unable to take advantage of
   ## facilities provided by a validator.  xsd:NMTOKEN is used instead
   ## of the stricter xsd:Name here so as to be consistent with
   ## the above.
   tag-ref = xsd:NMTOKEN

   ## an identifier type (used by "name" attributes).
   identifier = xsd:ID

   ## used in the class "by-ref" attribute to reference another class of
   ## the same "name" attribute value.
   class-ref = xsd:IDREF

   ## "count" attribute pattern ("n", "n+", or "n:m")
   count-pattern = xsd:token {
       pattern = "\d+(\+|:\d+)?"
   }

   ## "ref" attribute pattern
   ## space-separated list of "id" attribute values for
   ## "reference" elements.  These reference ids
   ## must be declared in a "reference" element
   ## before they can be used in a "ref" attribute
   ref-pattern = xsd:token {
       pattern = "[\-_.:0-9A-Z]+( [\-_.:0-9A-Z]+)*"
   }

   #
   # STRUCTURES
   #

   ## Representation of a single code point or a sequence of code
   ## points
   char = element char {
       attribute cp { code-point-literal },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?,
         variant*
   }

   ## Representation of a range of code points
   range = element range {
       attribute first-cp { code-point },
       attribute last-cp { code-point },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?
   }

   ## Representation of a variant code point or sequence
   variant = element var {
       attribute cp { code-point-literal },
       attribute type { xsd:NMTOKEN }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }

   #
   # Classes
   #

   ## a "class" element that references the name of another "class"
   ## (or set-operator like "union") defined elsewhere.
   ## If used as a matcher (appearing under a "rule" element),
   ## the "count" attribute may be present.
   class-invocation = element class { class-invocation-content }

   class-invocation-content =
       attribute by-ref { class-ref },
       attribute count { count-pattern }?,
       attribute comment { text }?

   ## defines a new class (set of code points) using Unicode property
   ## or code points of the same tag value or code point literals
   class-declaration = element class { class-declaration-content }

   class-declaration-content =
       # "name" attribute MUST be present if this is a "top-level"
       # class declaration, i.e., appearing directly under the "rules"
       # element.  Otherwise, it MUST be absent.
       attribute name { identifier }?,
       # If used as a matcher (appearing in a "rule" element, but not
       # when nested inside a set-operator or class), the "count"
       # attribute may be present.  Otherwise, it MUST be absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (
         # define the class by property (e.g., property="sc:Latn"), OR
         attribute property { xsd:NMTOKEN }
         # define the class by tagged code points, OR
         | attribute from-tag { tag-ref }
         # text node to allow for shorthand notation
         # e.g., "0061 0062-0063"
         | code-point-set-shorthand
       )

   class-invocation-or-declaration = element class {
     class-invocation-content | class-declaration-content
   }

   class-or-set-operator-nested =
     class-invocation-or-declaration | set-operator

   class-or-set-operator-declaration =
     # a "class" element or set-operator (effectively defining a class)
     # directly in the "rules" element.
     class-declaration | set-operator


   #
   # set-operators
   #

   complement-operator = element complement {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested
   }

   union-operator = element union {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       # needs two or more child elements
       class-or-set-operator-nested+
   }

   intersection-operator = element intersection {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   difference-operator = element difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   symmetric-difference-operator = element symmetric-difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   ## operators that transform class(es) into a new class.
   set-operator = complement-operator
                  | union-operator
                  | intersection-operator
                  | difference-operator
                  | symmetric-difference-operator

   #
   # Match operators (matchers)
   #

   any-matcher = element any {
       attribute count { count-pattern }?,
       attribute comment { text }?
   }

   choice-matcher = element choice {
       ## "count" attribute MUST only be used when the choice-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       # two or more match operators
       match-operator-choice,
       match-operator-choice+
   }

   char-matcher =
     # for use as a matcher - like "char" but without a "tag" attribute
     element char {
       attribute cp { non-empty-code-point-literal },
       # If used as a matcher (appearing in a "rule" element), the
       # "count" attribute may be present.  Otherwise, it MUST be
       # absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }

   start-matcher = element start {
       attribute comment { text }?
   }

   end-matcher = element end {
       attribute comment { text }?
   }

   anchor-matcher = element anchor {
       attribute comment { text }?
   }

   look-ahead-matcher = element look-ahead {
       attribute comment { text }?,
       match-operators-non-pos
   }
   look-behind-matcher = element look-behind {
       attribute comment { text }?,
       match-operators-non-pos
   }

   ## non-positional match operator that can be used as a direct child
   ## element of the choice-matcher.
   match-operator-choice = (
     any-matcher | choice-matcher | start-matcher | end-matcher
     | char-matcher | class-or-set-operator-nested | rule-matcher
   )

   ## non-positional match operators do not contain any "anchor",
   ## "look-behind", or "look-ahead" elements.
   match-operators-non-pos = (
     start-matcher?,
     (any-matcher | choice-matcher | char-matcher
      | class-or-set-operator-nested | rule-matcher)*,
     end-matcher?
   )

   ## positional match operators have an "anchor" element, which may be
   ## preceded by a "look-behind" element, or followed by a "look-ahead"
   ## element, or both.
   match-operators-pos =
     look-behind-matcher?, anchor-matcher, look-ahead-matcher?

   match-operators = match-operators-non-pos | match-operators-pos

   #
   # Rules
   #

   # top-level rule must have "name" attribute
   rule-declaration-top = element rule {
       attribute name { identifier },
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       match-operators
   }

   ## "rule" element used as a matcher (either "by-ref" or contains
   ## other match operators itself)
   rule-matcher =
     element rule {
       ## "count" attribute MUST only be used when the rule-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (attribute by-ref { rule-ref } | match-operators)
     }

   #
   # Actions
   #

   action-declaration = element action {
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # dispositions are often named after variant types or vice versa
       attribute disp { variant-type },
       ( attribute match { rule-ref }
         | attribute not-match { rule-ref } )?,
       ( attribute any-variant { variant-type-list }
         | attribute all-variants { variant-type-list }
         | attribute only-variants { variant-type-list } )?
   }

   # DOCUMENT STRUCTURE

   start = lgr
   lgr = element lgr {
       meta-section?,
       data-section,
       rules-section?
   }

   ## Meta section - information recorded with an LGR that generally
   ## does not affect machine processing (except for "unicode-version").
   ## However, if any "class-declaration" uses the "property" attribute,
   ## a "unicode-version" element MUST be present.
   meta-section = element meta {
       element version {
           attribute comment { text }?,
           text
       }?
       & element date { date-pattern }?
       & element language { language-tag }*
       & element scope {
           # type may be "domain" or an application-defined value
           attribute type { xsd:NCName },
           scope-value
       }*
       & element validity-start { date-pattern }?
       & element validity-end { date-pattern }?
       & element unicode-version {
           xsd:token {
               pattern = "\d+\.\d+\.\d+"
           }
       }?
       & element description {
           # this SHOULD be a valid MIME type
           attribute type { text }?,
           text
       }?
       & element references {
           element reference {
               attribute id {
                   xsd:token {
                       # limit "id" attribute to uppercase letters,
                       # digits, and a few punctuation marks; use of
                       # integers is RECOMMENDED
                       pattern = "[\-_.:0-9A-Z]*"
                       minLength = "1"
                   }
                },
                attribute comment { text }?,
                text
           }*
       }?
   }

   data-section = element data { (char | range)+ }

   ## Note that action declarations are strictly order dependent.
   ## class-or-set-operator-declaration and rule-declaration-top
   ## are weakly order dependent; they must precede first use of the
   ## identifier via "by-ref".
   rules-section = element rules {
     ( class-or-set-operator-declaration
       | rule-declaration-top
       | action-declaration)*
   }

   <CODE ENDS>
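
   The simple types in the schema are mostly regular-expression
   patterns, so attribute values can be pre-checked without a RELAX NG
   validator.  The Python sketch below copies the patterns from the
   schema and applies them as full-string matches, which is how XSD
   patterns are interpreted.  It does not model the whitespace
   normalization of xsd:token, and the helper names "conforms" and
   "parse_count" are hypothetical.  The reading of "count" values as
   exactly n ("n"), n or more ("n+"), or between n and m ("n:m") is an
   assumption based on the pattern's own comment.

```python
import re

# Patterns copied from the schema's simple types.
PATTERNS = {
    "code-point": r"[0-9A-F]{4,6}",
    "code-point-sequence": r"[0-9A-F]{4,6}( [0-9A-F]{4,6})+",
    "code-point-set-shorthand":
        r"([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
        r"( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*",
    "date-pattern": r"\d{4}-\d\d-\d\d",
    "count-pattern": r"\d+(\+|:\d+)?",
    "ref-pattern": r"[\-_.:0-9A-Z]+( [\-_.:0-9A-Z]+)*",
}


def conforms(type_name, value):
    """True if value matches the named pattern over its full length."""
    return re.fullmatch(PATTERNS[type_name], value) is not None


def parse_count(value):
    """Return (minimum, maximum) for a "count" value.

    A maximum of None means "no upper bound".
    """
    if not conforms("count-pattern", value):
        raise ValueError(f"not a valid count: {value!r}")
    if value.endswith("+"):
        return int(value[:-1]), None
    if ":" in value:
        low, high = value.split(":")
        return int(low), int(high)
    return int(value), int(value)


print(conforms("code-point", "4E7E"))        # uppercase hex, 4 to 6 digits
print(conforms("code-point", "4e7e"))        # lowercase is not permitted
print(conforms("code-point-set-shorthand", "0061 0062-0063"))
print(parse_count("0:1"), parse_count("1+"), parse_count("3"))
```

   Such a pre-check only covers lexical constraints; the MUST-level
   conditions stated in the schema comments, such as declaring names
   before their first use via "by-ref", still need separate checks.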

Acknowledgements

   This format builds upon the work on documenting IDN tables by many
   different registry operators.  Notably, a comprehensive language
   table for Chinese, Japanese, and Korean was developed by the "Joint
   Engineering Team" [RFC3743]; this table is the basis of many registry
   policies.  Also, a set of guidelines for Arabic script registrations
   [RFC5564] was published by the Arabic-language community.

   Contributions that have shaped this document have been provided by
   Francisco Arias, Julien Bernard, Mark Davis, Martin Duerst, Paul
   Hoffman, Sarmad Hussain, Barry Leiba, Alexander Mayrhofer, Alexey
   Melnikov, Nicholas Ostler, Thomas Roessler, Audric Schiltknecht,
   Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan, and John
   Yunker.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA 90094
   United States of America

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/


   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org