Tech-invite3GPPspaceIETFspace
96959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 7564

PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols

Pages: 40
Obsoletes:  3454
Obsoleted by:  8264
Part 2 of 3 – Pages 17 to 26
First   Prev   Next

ToP   noToC   RFC7564 - Page 17   prevText

6. Applications

6.1. How to Use PRECIS in Applications

Although PRECIS has been designed with applications in mind, internationalization is not suddenly made easy through the use of PRECIS. Application developers still need to give some thought to how they will use the PRECIS string classes, or profiles thereof, in their applications. This section provides some guidelines to application developers (and to expert reviewers of application protocol specifications). o Don't define your own profile unless absolutely necessary (see Section 5.1). Existing profiles have been designed for wide reuse. It is highly likely that an existing profile will meet your needs, especially given the ability to specify further excluded characters (Section 6.2) and to build application-layer constructs (see Section 6.3). o Do specify: * Exactly which entities are responsible for preparation, enforcement, and comparison of internationalized strings (e.g., servers or clients). * Exactly when those entities need to complete their tasks (e.g., a server might need to enforce the rules of a profile before allowing a client to gain network access).
ToP   noToC   RFC7564 - Page 18
      *  Exactly which protocol slots need to be checked against which
         profiles (e.g., checking the address of a message's intended
         recipient against the UsernameCaseMapped profile
         [PRECIS-Users-Pwds] of the IdentifierClass, or checking the
         password of a user against the OpaqueString profile
         [PRECIS-Users-Pwds] of the FreeformClass).

      See [PRECIS-Users-Pwds] and [XMPP-Addr-Format] for definitions of
      these matters for several applications.

6.2. Further Excluded Characters

An application protocol that uses a profile MAY specify particular code points that are not allowed in relevant slots within that application protocol, above and beyond those excluded by the string class or profile. That is, an application protocol MAY do either of the following: 1. Exclude specific code points that are allowed by the relevant string class. 2. Exclude characters matching certain Unicode properties (e.g., math symbols) that are included in the relevant PRECIS string class. As a result of such exclusions, code points that are defined as valid for the PRECIS string class or profile will be defined as disallowed for the relevant protocol slot. Typically, such exclusions are defined for the purpose of backward compatibility with legacy formats within an application protocol. These are defined for application protocols, not profiles, in order to prevent multiplication of profiles beyond necessity (see Section 5.1).

6.3. Building Application-Layer Constructs

Sometimes, an application-layer construct does not map in a straightforward manner to one of the base string classes or a profile thereof. Consider, for example, the "simple user name" construct in the Simple Authentication and Security Layer (SASL) [RFC4422]. Depending on the deployment, a simple user name might take the form of a user's full name (e.g., the user's personal name followed by a space and then the user's family name). Such a simple user name cannot be defined as an instance of the IdentifierClass or a profile thereof, since space characters are not allowed in the
ToP   noToC   RFC7564 - Page 19
   IdentifierClass; however, it could be defined using a space-separated
   sequence of IdentifierClass instances, as in the following ABNF
   [RFC5234] from [PRECIS-Users-Pwds]:

      username   = userpart *(1*SP userpart)
      userpart   = 1*(idbyte)
                   ;
                   ; an "idbyte" is a byte used to represent a
                   ; UTF-8 encoded Unicode code point that can be
                   ; contained in a string that conforms to the
                   ; PRECIS "IdentifierClass"
                   ;

   Similar techniques could be used to define many application-layer
   constructs, say of the form "user@domain" or "/path/to/file".

7. Order of Operations

To ensure proper comparison, the rules specified for a particular string class or profile MUST be applied in the following order: 1. Width Mapping Rule 2. Additional Mapping Rule 3. Case Mapping Rule 4. Normalization Rule 5. Directionality Rule 6. Behavioral rules for determining whether a code point is valid, allowed under a contextual rule, disallowed, or unassigned As already described, the width mapping, additional mapping, case mapping, normalization, and directionality rules are specified for each profile, whereas the behavioral rules are specified for each string class. Some of the logic behind this order is provided under Section 5.2.1 (see also the PRECIS mappings document [PRECIS-Mappings]).
ToP   noToC   RFC7564 - Page 20

8. Code Point Properties

In order to implement the string classes described above, this document does the following: 1. Reviews and classifies the collections of code points in the Unicode character set by examining various code point properties. 2. Defines an algorithm for determining a derived property value, which can vary depending on the string class being used by the relevant application protocol. This document is not intended to specify precisely how derived property values are to be applied in protocol strings. That information is the responsibility of the protocol specification that uses or profiles a PRECIS string class from this document. The value of the property is to be interpreted as follows. PROTOCOL VALID Those code points that are allowed to be used in any PRECIS string class (currently, IdentifierClass and FreeformClass). The abbreviated term "PVALID" is used to refer to this value in the remainder of this document. SPECIFIC CLASS PROTOCOL VALID Those code points that are allowed to be used in specific string classes. In the remainder of this document, the abbreviated term *_PVAL is used, where * = (ID | FREE), i.e., either "FREE_PVAL" or "ID_PVAL". In practice, the derived property ID_PVAL is not used in this specification, since every ID_PVAL code point is PVALID. CONTEXTUAL RULE REQUIRED Some characteristics of the character, such as its being invisible in certain contexts or problematic in others, require that it not be used in labels unless specific other characters or properties are present. As in IDNA2008, there are two subdivisions of CONTEXTUAL RULE REQUIRED -- the first for Join_controls (called "CONTEXTJ") and the second for other characters (called "CONTEXTO"). A character with the derived property value CONTEXTJ or CONTEXTO MUST NOT be used unless an appropriate rule has been established and the context of the character is consistent with that rule. The most notable of the CONTEXTUAL RULE REQUIRED characters are the Join Control characters U+200D ZERO WIDTH JOINER and U+200C ZERO WIDTH NON-JOINER, which have a derived property value of CONTEXTJ. See Appendix A of [RFC5892] for more information. DISALLOWED Those code points that are not permitted in any PRECIS string class.
ToP   noToC   RFC7564 - Page 21
   SPECIFIC CLASS DISALLOWED  Those code points that are not to be
      included in one of the string classes but that might be permitted
      in others.  In the remainder of this document, the abbreviated
      term *_DIS is used, where * = (ID | FREE), i.e., either "FREE_DIS"
      or "ID_DIS".  In practice, the derived property FREE_DIS is not
      used in this specification, since every FREE_DIS code point is
      DISALLOWED.

   UNASSIGNED  Those code points that are not designated (i.e., are
      unassigned) in the Unicode Standard.

   The algorithm to calculate the value of the derived property is as
   follows (implementations MUST NOT modify the order of operations
   within this algorithm, since doing so would cause inconsistent
   results across implementations):

   If .cp. .in. Exceptions Then Exceptions(cp);
   Else If .cp. .in. BackwardCompatible Then BackwardCompatible(cp);
   Else If .cp. .in. Unassigned Then UNASSIGNED;
   Else If .cp. .in. ASCII7 Then PVALID;
   Else If .cp. .in. JoinControl Then CONTEXTJ;
   Else If .cp. .in. OldHangulJamo Then DISALLOWED;
   Else If .cp. .in. PrecisIgnorableProperties Then DISALLOWED;
   Else If .cp. .in. Controls Then DISALLOWED;
   Else If .cp. .in. HasCompat Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. LetterDigits Then PVALID;
   Else If .cp. .in. OtherLetterDigits Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Spaces Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Symbols Then ID_DIS or FREE_PVAL;
   Else If .cp. .in. Punctuation Then ID_DIS or FREE_PVAL;
   Else DISALLOWED;

   The value of the derived property calculated can depend on the string
   class; for example, if an identifier used in an application protocol
   is defined as profiling the PRECIS IdentifierClass then a space
   character such as U+0020 would be assigned to ID_DIS, whereas if an
   identifier is defined as profiling the PRECIS FreeformClass then the
   character would be assigned to FREE_PVAL.  For the sake of brevity,
   the designation "FREE_PVAL" is used herein, instead of the longer
   designation "ID_DIS or FREE_PVAL".  In practice, the derived
   properties ID_PVAL and FREE_DIS are not used in this specification,
   since every ID_PVAL code point is PVALID and every FREE_DIS code
   point is DISALLOWED.

   Use of the name of a rule (such as "Exceptions") implies the set of
   code points that the rule defines, whereas the same name as a
   function call (such as "Exceptions(cp)") implies the value that the
   code point has in the Exceptions table.
ToP   noToC   RFC7564 - Page 22
   The mechanisms described here allow determination of the value of the
   property for future versions of Unicode (including characters added
   after Unicode 5.2 or 7.0 depending on the category, since some
   categories mentioned in this document are simply pointers to IDNA2008
   and therefore were defined at the time of Unicode 5.2).  Changes in
   Unicode properties that do not affect the outcome of this process
   therefore do not affect this framework.  For example, a character can
   have its Unicode General_Category value (at the time of this writing,
   see Chapter 4 of [Unicode7.0]) change from So to Sm, or from Lo to
   Ll, without affecting the algorithm results.  Moreover, even if such
   changes were to result, the BackwardCompatible list (Section 9.7) can
   be adjusted to ensure the stability of the results.

9. Category Definitions Used to Calculate Derived Property

The derived property obtains its value based on a two-step procedure: 1. Characters are placed in one or more character categories either (1) based on core properties defined by the Unicode Standard or (2) by treating the code point as an exception and addressing the code point based on its code point value. These categories are not mutually exclusive. 2. Set operations are used with these categories to determine the values for a property specific to a given string class. These operations are specified under Section 8. Note: Unicode property names and property value names might have short abbreviations, such as "gc" for the General_Category property and "Ll" for the Lowercase_Letter property value of the gc property. In the following specification of character categories, the operation that returns the value of a particular Unicode character property for a code point is designated by using the formal name of that property (from the Unicode PropertyAliases.txt file [PropertyAliases] followed by "(cp)" for "code point". For example, the value of the General_Category property for a code point is indicated by General_Category(cp). The first ten categories (A-J) shown below were previously defined for IDNA2008 and are referenced from [RFC5892] to ease the understanding of how PRECIS handles various characters. Some of these categories are reused in PRECIS, and some of them are not; however, the lettering of categories is retained to prevent overlap and to ease implementation of both IDNA2008 and PRECIS in a single software application. The next eight categories (K-R) are specific to PRECIS.
ToP   noToC   RFC7564 - Page 23

9.1. LetterDigits (A)

This category is defined in Section 2.1 of [RFC5892] and is included by reference for use in PRECIS.

9.2. Unstable (B)

This category is defined in Section 2.2 of [RFC5892]. However, it is not used in PRECIS.

9.3. IgnorableProperties (C)

This category is defined in Section 2.3 of [RFC5892]. However, it is not used in PRECIS. Note: See the PrecisIgnorableProperties ("M") category below for a more inclusive category used in PRECIS identifiers.

9.4. IgnorableBlocks (D)

This category is defined in Section 2.4 of [RFC5892]. However, it is not used in PRECIS.

9.5. LDH (E)

This category is defined in Section 2.5 of [RFC5892]. However, it is not used in PRECIS. Note: See the ASCII7 ("K") category below for a more inclusive category used in PRECIS identifiers.

9.6. Exceptions (F)

This category is defined in Section 2.6 of [RFC5892] and is included by reference for use in PRECIS.

9.7. BackwardCompatible (G)

This category is defined in Section 2.7 of [RFC5892] and is included by reference for use in PRECIS. Note: Management of this category is handled via the processes specified in [RFC5892]. At the time of this writing (and also at the time that RFC 5892 was published), this category consisted of the empty set; however, that is subject to change as described in RFC 5892.
ToP   noToC   RFC7564 - Page 24

9.8. JoinControl (H)

This category is defined in Section 2.8 of [RFC5892] and is included by reference for use in PRECIS.

9.9. OldHangulJamo (I)

This category is defined in Section 2.9 of [RFC5892] and is included by reference for use in PRECIS.

9.10. Unassigned (J)

This category is defined in Section 2.10 of [RFC5892] and is included by reference for use in PRECIS.

9.11. ASCII7 (K)

This PRECIS-specific category consists of all printable, non-space characters from the 7-bit ASCII range. By applying this category, the algorithm specified under Section 8 exempts these characters from other rules that might be applied during PRECIS processing, on the assumption that these code points are in such wide use that disallowing them would be counter-productive. K: cp is in {0021..007E}

9.12. Controls (L)

This PRECIS-specific category consists of all control characters. L: Control(cp) = True

9.13. PrecisIgnorableProperties (M)

This PRECIS-specific category is used to group code points that are discouraged from use in PRECIS string classes. M: Default_Ignorable_Code_Point(cp) = True or Noncharacter_Code_Point(cp) = True The definition for Default_Ignorable_Code_Point can be found in the DerivedCoreProperties.txt file [DerivedCoreProperties].
ToP   noToC   RFC7564 - Page 25

9.14. Spaces (N)

This PRECIS-specific category is used to group code points that are space characters. N: General_Category(cp) is in {Zs}

9.15. Symbols (O)

This PRECIS-specific category is used to group code points that are symbols. O: General_Category(cp) is in {Sm, Sc, Sk, So}

9.16. Punctuation (P)

This PRECIS-specific category is used to group code points that are punctuation characters. P: General_Category(cp) is in {Pc, Pd, Ps, Pe, Pi, Pf, Po}

9.17. HasCompat (Q)

This PRECIS-specific category is used to group code points that have compatibility equivalents as explained in the Unicode Standard (at the time of this writing, see Chapters 2 and 3 of [Unicode7.0]). Q: toNFKC(cp) != cp The toNFKC() operation returns the code point in normalization form KC. For more information, see Section 5 of Unicode Standard Annex #15 [UAX15].

9.18. OtherLetterDigits (R)

This PRECIS-specific category is used to group code points that are letters and digits other than the "traditional" letters and digits grouped under the LetterDigits (A) class (see Section 9.1). R: General_Category(cp) is in {Lt, Nl, No, Me}
ToP   noToC   RFC7564 - Page 26

10. Guidelines for Designated Experts

Experience with internationalization in application protocols has shown that protocol designers and application developers usually do not understand the subtleties and tradeoffs involved with internationalization and that they need considerable guidance in making reasonable decisions with regard to the options before them. Therefore: o Protocol designers are strongly encouraged to question the assumption that they need to define new profiles, since existing profiles are designed for wide reuse (see Section 5 for further discussion). o Those who persist in defining new profiles are strongly encouraged to clearly explain a strong justification for doing so, and to publish a stable specification that provides all of the information described under Section 11.3. o The designated experts for profile registration requests ought to seek answers to all of the questions provided under Section 11.3 and to encourage applicants to provide a stable specification documenting the profile (even though the registration policy for PRECIS profiles is Expert Review and a stable specification is not strictly required). o Developers of applications that use PRECIS are strongly encouraged to apply the guidelines provided under Section 6 and to seek out the advice of the designated experts or other knowledgeable individuals in doing so. o All parties are strongly encouraged to help prevent the multiplication of profiles beyond necessity, as described under Section 5.1, and to use PRECIS in ways that will minimize user confusion and insecure application behavior. Internationalization can be difficult and contentious; designated experts, profile registrants, and application developers are strongly encouraged to work together in a spirit of good faith and mutual understanding to achieve rough consensus on profile registration requests and the use of PRECIS in particular applications. They are also encouraged to bring additional expertise into the discussion if that would be helpful in adding perspective or otherwise resolving issues.


(next page on part 3)

Next Section