RFC 7940

Representing Label Generation Rulesets Using XML

Pages: 82
Proposed Standard

Appendix B.  How to Translate Tables Based on RFC 3743 into the XML
             Format

   As background, the rules specified in [RFC3743] work as follows:

   1.  The original (requested) label is checked to make sure that all
       the code points are a subset of the repertoire.

   2.  If it passes the check, the original label is allocatable.

   3.  Generate the all-simplified and all-traditional variant labels
       (the union of all the labels generated using all the simplified
       variants of the code points and, likewise, all the traditional
       variants) for allocation.

   To illustrate by example, here is one of the more complicated sets
   of variants:

       U+4E7E
       U+4E81
       U+5E72
       U+5E79
       U+69A6
       U+6F27

   The following shows the relevant section of the Chinese language
   table published by the .ASIA registry [ASIA-TABLE].  Its
   entries read:

    <codepoint>;<simpl-variant(s)>;<trad-variant(s)>;<other-variant(s)>

   These are the lines corresponding to the set of variants
   listed above:

   U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6
   U+4E81;U+5E72;U+4E7E;U+5E72,U+6F27,U+5E79,U+69A6
   U+5E72;U+5E72;U+5E72,U+4E7E,U+5E79;U+4E7E,U+4E81,U+69A6,U+6F27
   U+5E79;U+5E72;U+5E79;U+69A6,U+4E7E,U+4E81,U+6F27
   U+69A6;U+5E72;U+69A6;U+5E79,U+4E7E,U+4E81,U+6F27
   U+6F27;U+4E7E;U+6F27;U+4E81,U+5E72,U+5E79,U+69A6

   The corresponding "data" section XML format would look like this:

     <data>
       <char cp="4E7E">
       <var cp="4E7E" type="both" comment="identity" />
       <var cp="4E81" type="blocked" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="4E81">
       <var cp="4E7E" type="trad" />
       <var cp="5E72" type="simp" />
       <var cp="5E79" type="blocked" />
       <var cp="69A6" type="blocked" />
       <var cp="6F27" type="blocked" />
       </char>
       <char cp="5E72">
       <var cp="4E7E" type="trad"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="both" comment="identity"/>
       <var cp="5E79" type="trad"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="5E79">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="trad" comment="identity"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="blocked"/>
       </char>
       <char cp="69A6">
       <var cp="4E7E" type="blocked"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="simp"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="trad" comment="identity"/>
       <var cp="6F27" type="blocked"/>
       </char>

       <char cp="6F27">
       <var cp="4E7E" type="simp"/>
       <var cp="4E81" type="blocked"/>
       <var cp="5E72" type="blocked"/>
       <var cp="5E79" type="blocked"/>
       <var cp="69A6" type="blocked"/>
       <var cp="6F27" type="trad" comment="identity"/>
       </char>
     </data>

   Here, the simplified variants have been given a type of "simp" and
   the traditional variants one of "trad", and all other ones are given
   "blocked".

   Because some variant mappings appear in more than one column, while the
   XML format allows only a single type value, they have been given the
   type of "both".
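
   The column-to-type assignment described above can be sketched in
   Python as follows.  This is only an illustration under stated
   assumptions: the function name row_to_char_xml, the sorted output
   order, and the exact attribute layout are choices made here, not
   something defined by [RFC3743] or by this format.  A variant listed
   in the "other" column is typed "blocked" unless it also appears in
   the simplified or traditional column.

```python
def row_to_char_xml(row: str) -> str:
    """Convert one '<cp>;<simp>;<trad>;<other>' table row into a
    <char> element (hypothetical helper, for illustration only)."""

    def cps(field: str) -> list:
        # "U+4E7E,U+5E72" -> ["4E7E", "5E72"]; an empty field gives [].
        return [c.strip()[2:] for c in field.split(",") if c.strip()]

    cp_field, simp_field, trad_field, other_field = row.strip().split(";")
    cp = cp_field.strip()[2:]
    simp, trad = set(cps(simp_field)), set(cps(trad_field))

    # Start with "blocked" for the other-variants column, then let the
    # simplified/traditional columns override it.
    types = {v: "blocked" for v in cps(other_field)}
    for v in simp | trad:
        if v in simp and v in trad:
            types[v] = "both"
        elif v in simp:
            types[v] = "simp"
        else:
            types[v] = "trad"

    lines = [f'<char cp="{cp}">']
    for v in sorted(types):
        # Mark identity (reflexive) mappings with a comment.
        comment = ' comment="identity"' if v == cp else ""
        lines.append(f'<var cp="{v}" type="{types[v]}"{comment} />')
    lines.append("</char>")
    return "\n".join(lines)


row = "U+4E7E;U+4E7E,U+5E72;U+4E7E;U+4E81,U+5E72,U+6F27,U+5E79,U+69A6"
print(row_to_char_xml(row))
```

   Applied to the first table row, the sketch yields the same six
   "var" entries as the first "char" element shown above.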

   Note that some variant mappings map to themselves (identity); that
   is, the mapping is reflexive (see Section 5.3.4).  In creating the
   permutation of all variant labels, these mappings have no effect,
   other than adding a value to the variant type list for the variant
   label containing them.

   In the example so far, all of the entries with type="both" are also
   mappings where source and target are identical.  That is, they are
   reflexive mappings as defined in Section 5.3.4.

   Given a label "U+4E7E U+4E81", the following labels would be ruled
   allocatable per [RFC3743], based on how that standard is commonly
   implemented in domain registries:

       Original label:     U+4E7E U+4E81
       Simplified label 1: U+4E7E U+5E72
       Simplified label 2: U+5E72 U+5E72
       Traditional label:  U+4E7E U+4E7E
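
   As an informal cross-check of the list above, the following Python
   sketch derives the simplified and traditional labels directly from
   the table rows.  It assumes that the repertoire consists of just
   the six code points of this example; the names TABLE and
   allocatable_labels are hypothetical and do not come from [RFC3743].

```python
from itertools import product

# code point -> (simplified variants, traditional variants),
# copied from the table rows of this example.
TABLE = {
    "4E7E": (["4E7E", "5E72"], ["4E7E"]),
    "4E81": (["5E72"], ["4E7E"]),
    "5E72": (["5E72"], ["5E72", "4E7E", "5E79"]),
    "5E79": (["5E72"], ["5E79"]),
    "69A6": (["5E72"], ["69A6"]),
    "6F27": (["4E7E"], ["6F27"]),
}


def allocatable_labels(label):
    """Return the original, simplified, and traditional labels, or
    None if a code point is outside the (assumed) repertoire."""
    # Step 1: every code point must be in the repertoire.
    if any(cp not in TABLE for cp in label):
        return None
    # Step 3: all combinations of simplified variants, and separately
    # all combinations of traditional variants.
    simplified = set(product(*(TABLE[cp][0] for cp in label)))
    traditional = set(product(*(TABLE[cp][1] for cp in label)))
    # Step 2: the original label itself is allocatable.
    return {
        "original": tuple(label),
        "simplified": simplified,
        "traditional": traditional,
    }


result = allocatable_labels(["4E7E", "4E81"])
print(sorted(result["simplified"]))
print(sorted(result["traditional"]))
```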

   However, if allocatable labels were generated simply by a straight
   permutation of all variants with type other than type="blocked" and
   without regard to the simplified and traditional variants, we would
   end up with an extra allocatable label of "U+5E72 U+4E7E".  This
   label mixes a Simplified Chinese code point with a Traditional
   Chinese code point and therefore shouldn't be allocatable.

   To more fully resolve the dispositions requires several actions to be
   defined, as described in Section 7.2.2, that will override the
   default actions from Section 7.6.  After blocking all labels that
   contain a variant with type "blocked", these actions will set to
   "allocatable" labels based on the following variant types: "simp",
   "trad", and "both".  Note that these variant types do not directly
   relate to dispositions for the variant label, but that the actions
   will resolve them to the Standard Dispositions on labels, i.e.,
   "blocked" and "allocatable".

   To resolve label dispositions requires five actions to be defined (in
   the "rules" section of the XML document in question); these actions
   apply in order, and the first one triggered defines the disposition
   for the label.  The actions are as follows:

   1.  Block all variant labels containing at least one blocked variant.

   2.  Allocate all labels that consist entirely of variants that are
       "simp" or "both".

   3.  Also allocate all labels that are entirely "trad" or "both".

   4.  Block all surviving labels containing any one of the variant
       types "simp" or "trad" or "both", because they are now known to
       be part of an undesirable mixed simplified/traditional label.

   5.  Allocate any remaining label; the original label would be such a
       label.
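
   The five steps above can be sketched as a small first-match
   evaluator.  This is a simplified model, not a conforming LGR
   processor: it assumes that a variant label is given as a list with
   one entry per code point, holding the variant type through which
   that code point was reached, or None for an original code point
   that has no reflexive mapping.  The trigger names mirror the
   "any-variant" and "only-variants" attributes of Section 7.2.2; the
   names ACTIONS and disposition are hypothetical.

```python
# (disposition, trigger, variant types) in order of precedence.
ACTIONS = [
    ("blocked", "any-variant", {"blocked"}),
    ("allocatable", "only-variants", {"simp", "both"}),
    ("allocatable", "only-variants", {"trad", "both"}),
    ("blocked", "any-variant", {"simp", "trad"}),
    ("allocatable", None, None),  # catch-all
]


def disposition(types):
    """Return the disposition set by the first action that triggers."""
    variant_types = [t for t in types if t is not None]
    for disp, trigger, wanted in ACTIONS:
        if trigger is None:
            return disp
        if trigger == "any-variant":
            # At least one variant has one of the listed types.
            if any(t in wanted for t in variant_types):
                return disp
        elif trigger == "only-variants":
            # Every code point is a variant, and every type is listed.
            if (variant_types
                    and len(variant_types) == len(types)
                    and all(t in wanted for t in variant_types)):
                return disp
    return None


print(disposition(["both", "simp"]))  # U+4E7E U+5E72
print(disposition(["simp", "trad"]))  # U+5E72 U+4E7E
```

   Under these assumptions, the mixed label "U+5E72 U+4E7E" falls
   through the two allocating steps and is caught by the fourth step,
   while the original label reaches the catch-all.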

   The rules declarations would be represented as:

     <rules>
       <!--"action" elements - order defines precedence-->
       <action disp="blocked" any-variant="blocked" />
       <action disp="allocatable" only-variants="simp both" />
       <action disp="allocatable" only-variants="trad both" />
       <action disp="blocked" any-variant="simp trad" />
       <action disp="allocatable" comment="catch-all" />
     </rules>

   Up to now, variants with type "both" have occurred only associated
   with reflexive variant mappings.  The "action" elements defined above
   rely on the assumption that this is always the case.  However,
   consider the following set of variants:

       U+62E0;U+636E;U+636E;U+64DA
       U+636E;U+636E;U+64DA;U+62E0
       U+64DA;U+636E;U+64DA;U+62E0

   The corresponding XML would be:

       <char cp="62E0">
       <var cp="636E" type="both" comment="both, but not reflexive" />
       <var cp="64DA" type="blocked" />
       </char>
       <char cp="636E">
       <var cp="636E" type="simp" comment="reflexive, but not both" />
       <var cp="64DA" type="trad" />
       <var cp="62E0" type="blocked" />
       </char>
       <char cp="64DA">
       <var cp="636E" type="simp" />
       <var cp="64DA" type="trad" comment="reflexive" />
       <var cp="62E0" type="blocked" />
       </char>

   To make such variant sets work requires a way to selectively trigger
   an action based on whether a variant type is associated with an
   identity or reflexive mapping, or is associated with an ordinary
   variant mapping.  This can be done by adding a prefix "r-" to the
   "type" attribute on reflexive variant mappings.  For example, the
   "trad" for code point U+64DA in the preceding figure would become
   "r-trad".

   With the dispositions prepared in this way, only a slight
   modification to the actions is needed to yield the correct set of
   allocatable labels:

   <action disp="blocked" any-variant="blocked" />
   <action disp="allocatable" only-variants="simp r-simp both r-both" />
   <action disp="allocatable" only-variants="trad r-trad both r-both" />
   <action disp="blocked" all-variants="simp trad both" />
   <action disp="allocatable" />

   The first three actions get triggered by the same labels as before.

   The fourth action blocks any label that combines an original code
   point with any mix of ordinary variant mappings; however, no labels
   that are a combination of only original code points (code points
   having either no variant mappings or a reflexive mapping) would be
   affected.  These are the original labels, and they are allocated in
   the last action.
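
   The "r-" prefixing described above lends itself to automation.  The
   following Python sketch uses the standard xml.etree.ElementTree
   module to rewrite the "type" attribute of every reflexive mapping.
   It is an illustration only: the helper name mark_reflexive is
   hypothetical, and the embedded snippet omits the LGR namespace and
   the "comment" attributes for brevity.

```python
import xml.etree.ElementTree as ET

SNIPPET = """
<data>
  <char cp="62E0">
    <var cp="636E" type="both" />
    <var cp="64DA" type="blocked" />
  </char>
  <char cp="636E">
    <var cp="636E" type="simp" />
    <var cp="64DA" type="trad" />
    <var cp="62E0" type="blocked" />
  </char>
  <char cp="64DA">
    <var cp="636E" type="simp" />
    <var cp="64DA" type="trad" />
    <var cp="62E0" type="blocked" />
  </char>
</data>
"""


def mark_reflexive(data):
    """Prefix the type of each reflexive variant mapping with 'r-'."""
    for char in data.iter("char"):
        for var in char.findall("var"):
            vtype = var.get("type")
            # A mapping is reflexive when source and target are equal.
            is_reflexive = var.get("cp") == char.get("cp")
            if is_reflexive and vtype and not vtype.startswith("r-"):
                var.set("type", "r-" + vtype)
    return data


data = mark_reflexive(ET.fromstring(SNIPPET))
result = {
    (c.get("cp"), v.get("cp")): v.get("type")
    for c in data.iter("char")
    for v in c.findall("var")
}
print(result[("64DA", "64DA")])
```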

   Using this scheme of assigning types to ordinary and reflexive
   variants, all tables in the style of RFC 3743 can be converted to
   XML.  By defining a set of actions as outlined above, the LGR will
   yield the correct set of allocatable variants: all variants
   consisting completely of variant code points preferred for simplified
   or traditional, respectively, will be allocated, as will be the
   original label.  All other variant labels will be blocked.

Appendix C.  Indic Syllable Structure Example

   In LGRs for Indic scripts, it may be desirable to restrict valid
   labels to sequences of valid Indic syllables, or aksharas.  This
   appendix gives a sample set of rules designed to enforce this
   restriction.

   Below is an example of BNF for an akshara, which has been published
   in "Devanagari Script Behaviour for Hindi" [TDIL-HINDI].  The rules
   for other languages and scripts used in India are expected to be
   generally similar.

   For Hindi, the BNF has the form:

       V[m]|{C[N]H}C[N](H|[v][m])

   Where:

   V    (uppercase) is any independent vowel

   m    is any vowel modifier (Devanagari Anusvara, Visarga, and
        Candrabindu)

   C    is any consonant (with inherent vowel)

   N    is Nukta

   H    is a halant (or virama)

   v    (lowercase) is any dependent vowel sign (matra)

   {}   encloses items that may be repeated zero or more times

   [ ]  encloses items that may or may not be present

   |    separates items, out of which only one can be present
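
   For readers more familiar with regular expressions, the BNF can be
   transcribed as a pattern over a string of category letters (one of
   V, m, C, N, H, v per code point).  This Python sketch is an
   assumption-laden illustration: it treats {} as a group that may be
   absent, which is what the count="0+" in the XML rules of this
   appendix expresses, and the names AKSHARA and is_akshara are
   hypothetical.

```python
import re

# V[m] | {C[N]H} C[N] (H | [v][m]), written over category letters.
AKSHARA = re.compile(r"Vm?|(?:CN?H)*CN?(?:H|v?m?)")


def is_akshara(categories: str) -> bool:
    """True if the whole category string forms exactly one akshara."""
    return AKSHARA.fullmatch(categories) is not None


for sample in ["V", "Vm", "C", "CNHCv", "CH", "Cvm", "v", "CC"]:
    print(sample, is_akshara(sample))
```

   Note that "CC" is rejected here because it consists of two aksharas
   rather than one; sequences of aksharas are handled at the level of
   the whole label.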

   By using the Unicode character property "InSC" or
   "Indic_Syllabic_Category", which corresponds rather directly to the
   classification of characters in the BNF above, we can translate the
   BNF into a set of WLE rules matching the definition of an akshara.

     <rules>
       <!--Character class definitions go here-->
       <class name="halant" property="InSC:Virama" />
       <union name="vowel-modifier">
         <class property="InSC:Visarga" />
         <class property="InSC:Bindu" comment="includes anusvara" />
       </union>
       <!--Whole label evaluation and context rules go here-->
       <rule name="consonant-with-optional-nukta">
         <class property="InSC:Consonant" />
         <class property="InSC:Nukta" count="0:1" />
       </rule>
       <rule name="independent-vowel-with-optional-modifier">
         <class property="InSC:Vowel_Independent" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="optional-dependent-vowel-with-opt-modifier">
         <class property="InSC:Vowel_Dependent" count="0:1" />
         <class by-ref="vowel-modifier" count="0:1" />
       </rule>
       <rule name="consonant-cluster">
         <rule count="0+">
           <rule by-ref="consonant-with-optional-nukta" />
           <class by-ref="halant" />
         </rule>
         <rule by-ref="consonant-with-optional-nukta" />
         <choice>
           <class by-ref="halant" />
           <rule by-ref="optional-dependent-vowel-with-opt-modifier" />
         </choice>
       </rule>
       <rule name="akshara">
         <choice>
           <rule by-ref="independent-vowel-with-optional-modifier" />
           <rule by-ref="consonant-cluster" />
         </choice>
       </rule>

       <rule name="WLE-akshara-or-other" comment="series of one or
           more aksharas, possibly alternating with other types of
           code points such as digits">
         <start />
         <choice count="1+">
           <class property="InSC:other" />
           <rule by-ref="akshara" />
         </choice>
         <end />
       </rule>
       <!--"action" elements go here - order defines precedence-->
       <action disp="invalid" not-match="WLE-akshara-or-other" />
     </rules>

   With the rules and classes as defined above, the final action assigns
   a disposition of "invalid" to all labels that are not composed of a
   sequence of well-formed aksharas, optionally interspersed with other
   characters, perhaps digits, for example.
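
   The effect of the whole-label rule can be approximated with a
   regular expression over per-code-point categories.  The following
   Python sketch is not an LGR processor; it assumes a small,
   hand-tagged Devanagari repertoire (the CATEGORY table and the
   function name triggers_invalid are hypothetical) in place of a
   lookup of the "InSC" property, and it treats ASCII digits as
   "other" code points.

```python
import re

# Hypothetical hand-tagged repertoire: code point -> category letter.
CATEGORY = {
    0x0905: "V",  # DEVANAGARI LETTER A (independent vowel)
    0x0915: "C",  # DEVANAGARI LETTER KA
    0x0924: "C",  # DEVANAGARI LETTER TA
    0x0928: "C",  # DEVANAGARI LETTER NA
    0x092E: "C",  # DEVANAGARI LETTER MA
    0x0938: "C",  # DEVANAGARI LETTER SA
    0x093C: "N",  # DEVANAGARI SIGN NUKTA
    0x093F: "v",  # DEVANAGARI VOWEL SIGN I (dependent vowel)
    0x0947: "v",  # DEVANAGARI VOWEL SIGN E (dependent vowel)
    0x094D: "H",  # DEVANAGARI SIGN VIRAMA (halant)
    0x0902: "m",  # DEVANAGARI SIGN ANUSVARA (vowel modifier)
    0x0903: "m",  # DEVANAGARI SIGN VISARGA (vowel modifier)
}
for digit in range(0x30, 0x3A):
    CATEGORY[digit] = "O"  # "other" code points, here ASCII digits

# One or more items, each either an "other" code point or an akshara.
WLE = re.compile(r"(?:O|Vm?|(?:CN?H)*CN?(?:H|v?m?))+")


def triggers_invalid(label: str) -> bool:
    """True if the label would trigger the 'invalid' action."""
    try:
        categories = "".join(CATEGORY[ord(ch)] for ch in label)
    except KeyError:
        return True  # code point outside the assumed repertoire
    return WLE.fullmatch(categories) is None


print(triggers_invalid("\u0928\u092e\u0938\u094d\u0924\u0947"))
print(triggers_invalid("\u093f\u0915"))
```

   The first sample is a sequence of three well-formed aksharas; the
   second starts with a dependent vowel sign and therefore does not
   match.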

   The relevant Unicode character property could be replicated by
   tagging repertoire values directly in the LGR; this would remove the
   dependency on any specific version of the Unicode Standard.

   Generally, dependent vowels may only follow consonant expressions;
   however, for some scripts, like Bengali, the Unicode Standard
   supports sequences of dependent vowels or their application on
   independent vowels.  This makes the definition of akshara less
   restrictive.

C.1.  Reducing Complexity

   As presented in this example, the rules are rather complex --
   although useful in demonstrating the features of the XML format, such
   complexity would be an undesirable feature in an actual LGR.

   It is possible to reduce the complexity of the rules in this example
   by defining alternate rules that simply define the permissible
   pair-wise context of adjacent code points by character class, such as
   a rule that a halant can only follow a (nuktated) consonant.  Such
   pair-wise contexts are easier to understand, implement, and verify,
   and have the additional benefit of allowing tools to better pinpoint
   why a label failed to validate.  They also tend to correspond more
   directly to the kind of well-formedness requirements that are most
   relevant to DNS security, like the requirement to limit the
   application of a combining mark (such as a vowel modifier) to only
   selected base characters (in this case, vowels).  (See the example
   and discussion in [WLE-RULES].)
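
   A pair-wise formulation might look like the following Python
   sketch, which again works on category letters.  The table of
   permitted predecessors is an assumption made here for
   illustration: the halant entry comes from the text above, while
   the other entries are read off the BNF in this appendix.  The names
   ALLOWED_BEFORE and first_violation are hypothetical.

```python
# For each restricted class, the classes that may directly precede it.
# Classes not listed (C, V, O) may appear in any context.
ALLOWED_BEFORE = {
    "H": {"C", "N"},            # halant follows a (nuktated) consonant
    "N": {"C"},                 # nukta follows a consonant
    "v": {"C", "N"},            # dependent vowel follows a consonant
    "m": {"V", "v", "C", "N"},  # vowel modifier, per the BNF
}


def first_violation(categories: str):
    """Return (index, message) for the first bad pair, else None."""
    prev = None  # None stands for the start of the label
    for index, cat in enumerate(categories):
        allowed = ALLOWED_BEFORE.get(cat)
        if allowed is not None and prev not in allowed:
            return index, f"{cat!r} may not follow {prev!r}"
        prev = cat
    return None


print(first_violation("CCCHCv"))
print(first_violation("CHH"))
```

   Because each check involves only two adjacent positions, a failure
   can be reported together with the offending index, which is the
   diagnostic benefit mentioned above.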

Appendix D.  RELAX NG Compact Schema

   This schema is provided in RELAX NG Compact format [RELAX-NG].

   <CODE BEGINS>
   #
   # LGR XML Schema 1.0
   #

   default namespace = "urn:ietf:params:xml:ns:lgr-1.0"

   #
   # SIMPLE TYPES
   #

   # RFC 5646 language tag (e.g., "de", "und-Latn")
   language-tag = xsd:token

   # The scope to which the LGR applies.  For the "domain" scope type,
   # it should be a fully qualified domain name.
   scope-value = xsd:token {
       minLength = "1"
   }

   ## a single code point
   code-point = xsd:token {
       pattern = "[0-9A-F]{4,6}"
   }

   ## a space-separated sequence of code points
   code-point-sequence = xsd:token {
       pattern = "[0-9A-F]{4,6}( [0-9A-F]{4,6})+"
   }

   ## single code point, or a sequence of code points, or empty string
   code-point-literal = code-point | code-point-sequence | ""

   ## code point or sequence only
   non-empty-code-point-literal = code-point | code-point-sequence

   ## code point set represented in short form
   code-point-set-shorthand = xsd:token {
       pattern = "([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6})"
                 ~ "( ([0-9A-F]{4,6}|[0-9A-F]{4,6}-[0-9A-F]{4,6}))*"
   }

   ## dates are used in information fields in the meta
   ## section ("YYYY-MM-DD")
   date-pattern = xsd:token {
       pattern = "\d{4}-\d\d-\d\d"
   }

   ## variant type
   ## the variant type MUST be non-empty and MUST NOT
   ## start with a "_"; using xsd:NMTOKEN here because
   ## we need space-separated lists of them
   variant-type = xsd:NMTOKEN

   ## variant type list for action triggers
   ## the list MUST NOT be empty, and entries MUST NOT
   ## start with a "_"
   variant-type-list = xsd:NMTOKENS

   ## reference to a rule name (used in "when" and "not-when"
   ## attributes, as well as the "by-ref" attribute of the "rule"
   ## element).
   rule-ref = xsd:IDREF

   ## a space-separated list of tags.  Tags should generally follow
   ## xsd:Name syntax.  However, we are using the xsd:NMTOKENS here
   ## because there is no native XSD datatype for space-separated
   ## xsd:Name
   tags = xsd:NMTOKENS

   ## The value space of a "from-tag" attribute.  Although it is closer
   ## to xsd:IDREF lexically and semantically, tags are not unique in
   ## the document.  As such, we are unable to take advantage of
   ## facilities provided by a validator.  xsd:NMTOKEN is used instead
   ## of the stricter xsd:Names here so as to be consistent with
   ## the above.
   tag-ref = xsd:NMTOKEN

   ## an identifier type (used by "name" attributes).
   identifier = xsd:ID

   ## used in the class "by-ref" attribute to reference another class of
   ## the same "name" attribute value.
   class-ref = xsd:IDREF

   ## "count" attribute pattern ("n", "n+", or "n:m")
   count-pattern = xsd:token {
       pattern = "\d+(\+|:\d+)?"
   }

   ## "ref" attribute pattern
   ## space-separated list of "id" attribute values for
   ## "reference" elements.  These reference ids
   ## must be declared in a "reference" element
   ## before they can be used in a "ref" attribute
   ref-pattern = xsd:token {
       pattern = "[\-_.:0-9A-Z]+( [\-_.:0-9A-Z]+)*"
   }

   #
   # STRUCTURES
   #

   ## Representation of a single code point or a sequence of code
   ## points
   char = element char {
       attribute cp { code-point-literal },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?,
         variant*
   }

   ## Representation of a range of code points
   range = element range {
       attribute first-cp { code-point },
       attribute last-cp { code-point },
       attribute comment { text }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute tag { tags }?,
       attribute ref { ref-pattern }?
   }

   ## Representation of a variant code point or sequence
   variant = element var {
       attribute cp { code-point-literal },
       attribute type { xsd:NMTOKEN }?,
       attribute when { rule-ref }?,
       attribute not-when { rule-ref }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }

   #
   # Classes
   #

   ## a "class" element that references the name of another "class"
   ## (or set-operator like "union") defined elsewhere.
   ## If used as a matcher (appearing under a "rule" element),
   ## the "count" attribute may be present.
   class-invocation = element class { class-invocation-content }

   class-invocation-content =
       attribute by-ref { class-ref },
       attribute count { count-pattern }?,
       attribute comment { text }?

   ## defines a new class (set of code points) using Unicode property
   ## or code points of the same tag value or code point literals
   class-declaration = element class { class-declaration-content }

   class-declaration-content =
       # "name" attribute MUST be present if this is a "top-level"
       # class declaration, i.e., appearing directly under the "rules"
       # element.  Otherwise, it MUST be absent.
       attribute name { identifier }?,
       # If used as a matcher (appearing in a "rule" element, but not
       # when nested inside a set-operator or class), the "count"
       # attribute may be present.  Otherwise, it MUST be absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (
         # define the class by property (e.g., property="sc:Latn"), OR
         attribute property { xsd:NMTOKEN }
         # define the class by tagged code points, OR
         | attribute from-tag { tag-ref }
         # text node to allow for shorthand notation
         # e.g., "0061 0062-0063"
         | code-point-set-shorthand
       )

   class-invocation-or-declaration = element class {
     class-invocation-content | class-declaration-content
   }

   class-or-set-operator-nested =
     class-invocation-or-declaration | set-operator

   class-or-set-operator-declaration =
     # a "class" element or set-operator (effectively defining a class)
     # directly in the "rules" element.
     class-declaration | set-operator


   #
   # set-operators
   #

   complement-operator = element complement {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested
   }

   union-operator = element union {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       # needs two or more child elements
       class-or-set-operator-nested+
   }

   intersection-operator = element intersection {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   difference-operator = element difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   symmetric-difference-operator = element symmetric-difference {
       attribute name { identifier }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # "count" attribute MUST only be used when this set-operator is
       # used as a matcher (i.e., nested in a "rule" element but not
       # inside a set-operator or class)
       attribute count { count-pattern }?,
       class-or-set-operator-nested,
       class-or-set-operator-nested
   }

   ## operators that transform class(es) into a new class.
   set-operator = complement-operator
                  | union-operator
                  | intersection-operator
                  | difference-operator
                  | symmetric-difference-operator

   #
   # Match operators (matchers)
   #

   any-matcher = element any {
       attribute count { count-pattern }?,
       attribute comment { text }?
   }

   choice-matcher = element choice {
       ## "count" attribute MUST only be used when the choice-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       # two or more match operators
       match-operator-choice,
       match-operator-choice+
   }

   char-matcher =
     # for use as a matcher - like "char" but without a "tag" attribute
     element char {
       attribute cp { non-empty-code-point-literal },
       # If used as a matcher (appearing in a "rule" element), the
       # "count" attribute may be present.  Otherwise, it MUST be
       # absent.
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?
   }

   start-matcher = element start {
       attribute comment { text }?
   }

   end-matcher = element end {
       attribute comment { text }?
   }

   anchor-matcher = element anchor {
       attribute comment { text }?
   }

   look-ahead-matcher = element look-ahead {
       attribute comment { text }?,
       match-operators-non-pos
   }
   look-behind-matcher = element look-behind {
       attribute comment { text }?,
       match-operators-non-pos
   }

   ## non-positional match operator that can be used as a direct child
   ## element of the choice-matcher.
   match-operator-choice = (
     any-matcher | choice-matcher | start-matcher | end-matcher
     | char-matcher | class-or-set-operator-nested | rule-matcher
   )

   ## non-positional match operators do not contain any "anchor",
   ## "look-behind", or "look-ahead" elements.
   match-operators-non-pos = (
     start-matcher?,
     (any-matcher | choice-matcher | char-matcher
      | class-or-set-operator-nested | rule-matcher)*,
     end-matcher?
   )

   ## positional match operators have an "anchor" element, which may be
   ## preceded by a "look-behind" element, or followed by a "look-ahead"
   ## element, or both.
   match-operators-pos =
     look-behind-matcher?, anchor-matcher, look-ahead-matcher?

   match-operators = match-operators-non-pos | match-operators-pos

   #
   # Rules
   #

   # top-level rule must have "name" attribute
   rule-declaration-top = element rule {
       attribute name { identifier },
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       match-operators
   }

   ## "rule" element used as a matcher (either "by-ref" or contains
   ## other match operators itself)
   rule-matcher =
     element rule {
       ## "count" attribute MUST only be used when the rule-matcher
       ## contains no nested "start", "end", "anchor", "look-behind",
       ## or "look-ahead" operators and no nested rule-matchers
       ## containing any of these elements
       attribute count { count-pattern }?,
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       (attribute by-ref { rule-ref } | match-operators)
     }

   #
   # Actions
   #

   action-declaration = element action {
       attribute comment { text }?,
       attribute ref { ref-pattern }?,
       # dispositions are often named after variant types or vice versa
       attribute disp { variant-type },
       ( attribute match { rule-ref }
         | attribute not-match { rule-ref } )?,
       ( attribute any-variant { variant-type-list }
         | attribute all-variants { variant-type-list }
         | attribute only-variants { variant-type-list } )?
   }

   # DOCUMENT STRUCTURE

   start = lgr
   lgr = element lgr {
       meta-section?,
       data-section,
       rules-section?
   }

   ## Meta section - information recorded with an LGR that generally
   ## does not affect machine processing (except for "unicode-version").
   ## However, if any "class-declaration" uses the "property" attribute,
   ## a "unicode-version" element MUST be present.
   meta-section = element meta {
       element version {
           attribute comment { text }?,
           text
       }?
       & element date { date-pattern }?
       & element language { language-tag }*
       & element scope {
           # type may be "domain" or an application-defined value
           attribute type { xsd:NCName },
           scope-value
       }*
       & element validity-start { date-pattern }?
       & element validity-end { date-pattern }?
       & element unicode-version {
           xsd:token {
               pattern = "\d+\.\d+\.\d+"
           }
       }?
       & element description {
           # this SHOULD be a valid MIME type
           attribute type { text }?,
           text
       }?

       & element references {
           element reference {
               attribute id {
                   xsd:token {
                       # limit "id" attribute to uppercase letters,
                       # digits, and a few punctuation marks; use of
                       # integers is RECOMMENDED
                       pattern = "[\-_.:0-9A-Z]*"
                       minLength = "1"
                   }
                },
                attribute comment { text }?,
                text
           }*
       }?
   }

   data-section = element data { (char | range)+ }

   ## Note that action declarations are strictly order dependent.
   ## class-or-set-operator-declaration and rule-declaration-top
   ## are weakly order dependent; they must precede first use of the
   ## identifier via "by-ref".
   rules-section = element rules {
     ( class-or-set-operator-declaration
       | rule-declaration-top
       | action-declaration)*
   }

   <CODE ENDS>
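
   Some of the simple types in the schema can be checked without a
   RELAX NG validator.  The following Python sketch copies the
   "code-point", "code-point-sequence", and "count-pattern"
   expressions from the schema and applies them with full-string
   matching, since XSD patterns are implicitly anchored.  It is an
   informal aid, not a replacement for schema validation: it ignores
   the whitespace normalization of xsd:token, the function names are
   hypothetical, and the reading of "n", "n+", and "n:m" as "exactly
   n", "n or more", and "between n and m" is an assumption based on
   how the "count" attribute is used in the examples of this document.

```python
import re

CODE_POINT = re.compile(r"[0-9A-F]{4,6}")
CODE_POINT_SEQUENCE = re.compile(r"[0-9A-F]{4,6}( [0-9A-F]{4,6})+")
COUNT = re.compile(r"(\d+)(\+|:(\d+))?")


def is_code_point_literal(value: str) -> bool:
    """Single code point, sequence of code points, or empty string."""
    return (
        value == ""
        or CODE_POINT.fullmatch(value) is not None
        or CODE_POINT_SEQUENCE.fullmatch(value) is not None
    )


def parse_count(value: str):
    """Return (minimum, maximum); maximum is None when unbounded."""
    match = COUNT.fullmatch(value)
    if match is None:
        raise ValueError(f"not a count pattern: {value!r}")
    low = int(match.group(1))
    if match.group(2) is None:
        return low, low   # "n"
    if match.group(2) == "+":
        return low, None  # "n+"
    return low, int(match.group(3))  # "n:m"


print(is_code_point_literal("4E7E 4E81"))
print(parse_count("0:1"))
```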

Acknowledgements

   This format builds upon the work on documenting IDN tables by many
   different registry operators.  Notably, a comprehensive language
   table for Chinese, Japanese, and Korean was developed by the "Joint
   Engineering Team" [RFC3743]; this table is the basis of many registry
   policies.  Also, a set of guidelines for Arabic script registrations
   [RFC5564] was published by the Arabic-language community.

   Contributions that have shaped this document have been provided by
   Francisco Arias, Julien Bernard, Mark Davis, Martin Duerst, Paul
   Hoffman, Sarmad Hussain, Barry Leiba, Alexander Mayrhofer, Alexey
   Melnikov, Nicholas Ostler, Thomas Roessler, Audric Schiltknecht,
   Steve Sheng, Michel Suignard, Andrew Sullivan, Wil Tan, and John
   Yunker.

Authors' Addresses

   Kim Davies
   Internet Corporation for Assigned Names and Numbers
   12025 Waterfront Drive
   Los Angeles, CA  90094
   United States of America

   Phone: +1 310 301 5800
   Email: kim.davies@icann.org
   URI:   http://www.icann.org/


   Asmus Freytag
   ASMUS, Inc.

   Email: asmus@unicode.org
