RFC 4997

Formal Notation for RObust Header Compression (ROHC-FN)

Pages: 62
Proposed Standard

Part 3 of 3 – Pages 41 to 62

5.  Security Considerations

   This document describes a formal notation similar to ABNF [RFC4234],
   and hence is not believed to raise any security issues (note that
   ABNF has a completely separate purpose to the ROHC formal notation).

6.  Contributors

   Richard Price did much of the foundational work on the formal
   notation.  He authored the initial document describing a formal
   notation on which this document is based.

   Kristofer Sandlund contributed to this work by applying new ideas to
   the ROHC-TCP profile, by providing feedback, and by helping resolve
   different issues during the entire development of the notation.

   Carsten Bormann provided the translation of the formal notation
   syntax using ABNF in Appendix A, and also contributed feedback
   and reviews to validate the completeness and correctness of the
   notation.

7.  Acknowledgements

   A number of important concepts and ideas have been borrowed from ROHC
   [RFC3095].

   Thanks to Mark West, Eilert Brinkmann, Alan Ford, and Lars-Erik
   Jonsson for their contributions, reviews, and feedback that led to
   significant improvements to the readability, completeness, and
   overall quality of the notation.

   Thanks to Stewart Sadler, Caroline Daniels, Alan Finney, and David
   Findlay for their reviews and comments.  Thanks to Rob Hancock and
   Stephen McCann for their early work on the formal notation.  The
   authors would also like to thank Christian Schmidt, Qian Zhang,
   Hongbin Liao, and Max Riegel for their comments and valuable input.

   Additional thanks: this document was reviewed during working group
   last-call by committed reviewers Mark West, Carsten Bormann, and Joe
   Touch, as well as by Sally Floyd who provided a review at the request
   of the Transport Area Directors.  Thanks also to Magnus Westerlund
   for his feedback in preparation for the IESG review.

8.  References

8.1.  Normative References

   [C90]      ISO/IEC, "ISO/IEC 9899:1990 Information technology --
              Programming Language C", ISO 9899:1990, April 1990.

   [RFC2822]  Resnick, P., Ed., "STANDARD FOR THE FORMAT OF ARPA
              INTERNET TEXT MESSAGES", RFC 2822, April 2001.

   [RFC4234]  Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax
              Specifications: ABNF", RFC 4234, October 2005.

   [RFC4995]  Jonsson, L-E., Pelletier, G., and K. Sandlund, "The RObust
              Header Compression (ROHC) Framework", RFC 4995, July 2007.

8.2.  Informative References

   [RFC3095]  Bormann, C., Burmeister, C., Degermark, M., Fukushima, H.,
              Hannu, H., Jonsson, L-E., Hakenberg, R., Koren, T., Le,
              K., Liu, Z., Martensson, A., Miyazaki, A., Svanbro, K.,
              Wiebke, T., Yoshimura, T., and H. Zheng, "RObust Header
              Compression (ROHC): Framework and four profiles: RTP, UDP,
              ESP, and uncompressed", RFC 3095, July 2001.

   [RFC791]   University of Southern California, "DARPA INTERNET PROGRAM
              PROTOCOL SPECIFICATION", RFC 791, September 1981.

Appendix A.  Formal Syntax of ROHC-FN

   This section gives a definition of the syntax of ROHC-FN in ABNF
   [RFC4234], using "fnspec" as the start rule.

   ; overall structure
   fnspec     = S *(constdef S) [globctl S] 1*(methdef S)
   constdef   = constname S "=" S expn S ";"
   globctl    = CONTROL S formbody
   methdef    = id S [parmlist S] "{" S 1*(formatdef S) "}"
              / id S [parmlist S] STRQ *STRCHAR STRQ S ";"
   parmlist   = "(" S id S *( "," S id S ) ")"
   formatdef  = formhead S formbody
   formhead   = UNCOMPRESSED [ 1*WS id ]
              / COMPRESSED [ 1*WS id ]
              / CONTROL / INITIAL / DEFAULT
   formbody   = "{" S *((fielddef/enforcer) S) "}"
   fielddef   = fieldgroup S ["=:=" S encspec S] [lenspec S] ";"
   fieldgroup = fieldname *( S ":" S fieldname )
   fieldname  = id
   encspec    = "'" *("0"/"1") "'"
              / id [ S "(" S expn S *( "," S expn S ) ")"]
   lenspec    = "[" S expn S *("," S expn S) "]"
   enforcer   = ENFORCE S "(" S expn S ")" S ";"


   ; expressions
   expn  = *(expnb S "||" S) expnb
   expnb = *(expna S "&&" S) expna
   expna = *(expn7 S ("=="/"!=") S) expn7
   expn7 = *(expn6 S ("<"/"<="/">"/">=") S) expn6
   expn6 = *(expn4 S ("+"/"-") S) expn4
   expn4 = *(expn3 S ("*"/"/"/"%") S) expn3
   expn3 = expn2 [S "^" S expn3]
   expn2 = ["!" S] expn1
   expn1 = expn0 / attref / constname / litval / id
   expn0 = "(" S expn S ")" / VARIABLE
   attref       = fieldnameref "." attname
   fieldnameref = fieldname / THIS
   attname      = ( U / C ) ( LENGTH / VALUE )
   litval       = ["-"] "0b" 1*("0"/"1")
                / ["-"] "0x" 1*(DIGIT/"a"/"b"/"c"/"d"/"e"/"f")
                / ["-"] 1*DIGIT
                / false / true


   ; lexical categories
   constname = UPCASE *(UPCASE / DIGIT / "_")
   id        = ALPHA *(ALPHA / DIGIT / "_")
   ALPHA     = %x41-5A / %x61-7A
   UPCASE    = %x41-5A
   DIGIT     = %x30-39
   COMMENT   = "//" *(SP / HTAB / VCHAR) CRLF
   SP        = %x20
   HTAB      = %x09
   VCHAR     = %x21-7E
   CRLF      = %x0A / %x0D.0A
   NL        = COMMENT / CRLF
   WS        = SP / HTAB / NL
   S         = *WS
   STRCHAR   = SP / HTAB / %x21 / %x23-7E
   STRQ      = %x22


   ; case-sensitive literals
   C            = %d67
   COMPRESSED   = %d67.79.77.80.82.69.83.83.69.68
   CONTROL      = %d67.79.78.84.82.79.76
   DEFAULT      = %d68.69.70.65.85.76.84
   ENFORCE      = %d69.78.70.79.82.67.69
   INITIAL      = %d73.78.73.84.73.65.76
   LENGTH       = %d76.69.78.71.84.72
   THIS         = %d84.72.73.83
   U            = %d85
   UNCOMPRESSED = %d85.78.67.79.77.80.82.69.83.83.69.68
   VALUE        = %d86.65.76.85.69
   VARIABLE     = %d86.65.82.73.65.66.76.69
   false        = %d102.97.108.115.101
   true         = %d116.114.117.101
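
   As an illustration of the "litval" rule above, the following Python
   sketch converts a literal into a value.  It is not part of the
   specification, and the name "parse_litval" is hypothetical.  It
   assumes the usual ABNF convention [RFC4234] that quoted strings such
   as "0x" match case-insensitively, whereas "false" and "true", being
   defined with %d values, are case-sensitive.

```python
import re

# One pattern per numeric "litval" alternative.  Quoted ABNF strings are
# assumed to be case-insensitive, hence re.IGNORECASE on "0b" and "0x".
_BIN = re.compile(r"(-?)0b([01]+)\Z", re.IGNORECASE)
_HEX = re.compile(r"(-?)0x([0-9a-f]+)\Z", re.IGNORECASE)
_DEC = re.compile(r"(-?)([0-9]+)\Z")


def parse_litval(text):
    """Return the int or bool denoted by a literal (hypothetical helper)."""
    # "false" and "true" are defined with %d values: exact case only.
    if text == "false":
        return False
    if text == "true":
        return True
    for pattern, base in ((_BIN, 2), (_HEX, 16), (_DEC, 10)):
        match = pattern.match(text)
        if match:
            sign, digits = match.groups()
            value = int(digits, base)
            return -value if sign else value
    raise ValueError(f"not a litval: {text!r}")
```

   For example, parse_litval("0b101") yields 5 and parse_litval("-0x1f")
   yields -31, while "True" is rejected.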

Appendix B.  Bit-level Worked Example

   This section gives a worked example at the bit level, showing how a
   simple ROHC-FN specification describes the compression of real data
   from an imaginary protocol header.  The example used has been kept
   fairly simple, whilst still aiming to illustrate some of the
   intricacies that arise in use of the notation.  In particular, fields
   have been kept short to make it possible to read the binary
   representation of the headers without too much difficulty.

B.1.  Example Packet Format

   Our imaginary header is just 16 bits long, and consists of the
   following fields:

   1.  version number -- 2 bits

   2.  type -- 2 bits

   3.  flow id -- 4 bits

   4.  sequence number -- 4 bits

   5.  flag bits -- 4 bits

   So for example 0101000100010000 indicates a header with a version
   number of one, a type of one, a flow id of one, a sequence number of
   one, and all flag bits set to zero.

   Here is an ASCII box notation diagram of the imaginary header:

     0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+
   |version| type  |    flow_id    |
   +---+---+---+---+---+---+---+---+
   |  sequence_no  |   flag_bits   |
   +---+---+---+---+---+---+---+---+
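
   The layout above can be sketched in Python as follows.  This is an
   illustration only: the helper name "parse_header" and the choice of
   a bit string as input are assumptions made for readability, and the
   field names are borrowed from the notation used later in this
   appendix.

```python
# Field names and widths in bits, in transmission order (illustrative).
FIELDS = (
    ("version_no", 2),
    ("type", 2),
    ("flow_id", 4),
    ("sequence_no", 4),
    ("flag_bits", 4),
)


def parse_header(bits):
    """Split a 16-character bit string into a dict of integer fields."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 16 bits")
    values, pos = {}, 0
    for name, width in FIELDS:
        values[name] = int(bits[pos:pos + width], 2)
        pos += width
    return values
```

   Applied to 0101000100010000, the sketch returns one for the version
   number, type, flow id, and sequence number, and zero for the flag
   bits, matching the description above.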

B.2.  Initial Encoding

   An initial definition based solely on the above information is as
   follows:

     eg_header
     {
       UNCOMPRESSED {
         version_no   [ 2 ];
         type         [ 2 ];
         flow_id      [ 4 ];
         sequence_no  [ 4 ];
         flag_bits    [ 4 ];
       }

       COMPRESSED initial_definition {
         version_no  =:= irregular(2);
         type        =:= irregular(2);
         flow_id     =:= irregular(4);
         sequence_no =:= irregular(4);
         flag_bits   =:= irregular(4);
       }
     }

   This defines the format nicely, but doesn't actually offer any
   compression.  If we use it to encode the above header, we get:

     Uncompressed header: 0101000100010000
     Compressed header:   0101000100010000

   This is because we have stated that all fields are "irregular" --
   i.e., we haven't specified anything about their behaviour.

   Note that since we have only one compressed format and one
   uncompressed format, it makes no difference whether the encoding
   methods for each field are specified in the compressed or
   uncompressed format.  It would make no difference at all if we wrote
   the following instead:

     eg_header
     {
       UNCOMPRESSED {
         version_no  =:= irregular(2);
         type        =:= irregular(2);
         flow_id     =:= irregular(4);
         sequence_no =:= irregular(4);
         flag_bits   =:= irregular(4);
       }

       COMPRESSED initial_definition {
         version_no   [ 2 ];
         type         [ 2 ];
         flow_id      [ 4 ];
         sequence_no  [ 4 ];
         flag_bits    [ 4 ];
       }
     }

B.3.  Basic Compression

   In order to achieve any compression we need to notate more knowledge
   about the header and its behaviour in a flow.  For example, we may
   know the following facts about the header:

   1.  version number -- indicates which version of the protocol this
       is: always one for this version of the protocol.

   2.  type -- may take any value.

   3.  flow id -- may take any value.

   4.  sequence number -- may take any value.

   5.  flag bits -- contains three flags, a, b, and c, each of which may
       be set or clear, and a reserved flag bit, which is always clear
       (i.e., zero).

   We could notate this knowledge as follows:

     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       COMPRESSED basic {
         version_no    =:= uncompressed_value(2, 1)  [ 0 ];
         type          =:= irregular(2)              [ 2 ];
         flow_id       =:= irregular(4)              [ 4 ];
         sequence_no   =:= irregular(4)              [ 4 ];
         abc_flag_bits =:= irregular(3)              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0)  [ 0 ];
       }
     }

   Using this simple scheme, we have successfully encoded the fact that
   one of the fields has a permanently fixed value of one, and therefore
   contains no useful information.  We have also encoded the fact that
   the final flag bit is always zero, which again contains no useful
   information.  Both of these facts have been notated using the
   "uncompressed_value" encoding method (see Section 4.11.1).

   Using this new encoding on the above header, we get:

     Uncompressed header: 0101000100010000
     Compressed header:   0100010001000

   This reduces the amount of data we need to transmit by roughly 20%.
   However, this encoding fails to take advantage of relationships
   between values of a field in one packet and its value in subsequent
   packets.  For example, every header in the following sequence is
   compressed by the same amount despite the similarities between them:

     Uncompressed header: 0101000100010000
     Compressed header:   0100010001000


     Uncompressed header: 0101000101000000
     Compressed header:   0100010100000


     Uncompressed header: 0110000101110000
     Compressed header:   1000010111000
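
   The effect of the "basic" format can be reproduced with a short
   Python sketch.  The function names are hypothetical, and headers are
   modelled as bit strings purely for readability; the sketch simply
   drops the two fields that the format declares constant and restores
   them on decompression.

```python
def compress_basic(bits):
    """Apply the "basic" format: omit version_no and reserved_flag."""
    if len(bits) != 16:
        raise ValueError("expected 16 bits")
    version_no, rest, reserved_flag = bits[:2], bits[2:15], bits[15]
    # uncompressed_value(2, 1) and uncompressed_value(1, 0) only hold if
    # the fields really carry those values; otherwise the format fails.
    if version_no != "01" or reserved_flag != "0":
        raise ValueError("header does not match the 'basic' format")
    return rest  # type, flow_id, sequence_no, abc_flag_bits: 13 bits


def decompress_basic(compressed):
    """Restore the two constant fields around the 13 transmitted bits."""
    return "01" + compressed + "0"
```

   Each of the three headers above loses exactly 3 of its 16 bits
   (18.75%), which is the "roughly 20%" figure quoted earlier.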

B.4.  Inter-Packet Compression

   The profile we have defined so far has not compressed the sequence
   number or flow ID fields at all, since they can take any value.
   However, the value of each of these fields in one header has a very
   simple relationship to its value in previous headers:

   o  the sequence number is unusual -- it increases by three each time,

   o  the flow_id stays the same -- it always has the same value that it
      did in the previous header in the flow,

   o  the abc_flag_bits stay the same most of the time -- they usually
      have the same value that they did in the previous header in the
      flow.

   An obvious way of notating this is as follows:

     // This obvious encoding will not work (correct encoding below)
     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       COMPRESSED obvious {
         version_no    =:= uncompressed_value(2, 1);
         type          =:= irregular(2);
         flow_id       =:= static;
         sequence_no   =:= lsb(0, -3);
         abc_flag_bits =:= irregular(3);
         reserved_flag =:= uncompressed_value(1, 0);
       }
     }

   The dependency on previous packets is notated using the "static" and
   "lsb" encoding methods (see Section 4.11.4 and Section 4.11.5
   respectively).  However there are a few problems with the above
   notation.

   Firstly, and most importantly, the "flow_id" field is notated as
   "static", which means that it doesn't change from packet to packet.
   However, the notation does not indicate how to communicate the value
   of the field initially.  There is no point saying "it's the same
   value as last time" if there has not been a first time where we
   define what that value is, so that it can be referred back to.  The
   above notation provides no way of communicating that.  Similarly with
   the sequence number -- there needs to be a way of communicating its
   initial value.  In fact, except for the explicit notation indicating
   their lengths, even the lengths of these two fields would be left
   undefined.  This problem will be solved below, in Appendix B.5.

   Secondly, the sequence number field is communicated very efficiently
   in zero bits, but it is not at all robust against packet loss.  If a
   packet is lost then there is no way to handle the missing sequence
   number.  When communicating sequence numbers, or any other field
   encoded with "lsb" encoding, a very important consideration for the
   notator is how robust against packet loss the compressed protocol
   should be.  This will vary a lot from protocol stack to protocol
   stack.  For the example protocol we'll assume short, low overhead
   flows and say we need to be robust to the loss of just one packet,
   which we can achieve with two bits of "lsb" encoding (one bit isn't
   enough since the sequence number increases by three each time -- see
   Section 4.11.5).  This will be addressed below in Appendix B.5.

   Finally, although the flag bits are usually the same as in the
   previous header in the flow, the profile doesn't make any use of this
   fact; since they are sometimes not the same as those in the previous
   header, it is not safe to say that they are always the same, so
   "static" encoding can't be used exclusively.  This problem will be
   solved later through the use of multiple formats in Appendix B.6.
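
   The robustness argument for the sequence number can be checked with
   a small Python sketch.  It assumes that "lsb(k, p)" interprets the
   received bits within the interval [ref - p, ref - p + 2^k - 1], as
   in the LSB encoding of [RFC3095], and that the 4-bit field wraps
   modulo 16; Section 4.11.5 remains the authoritative definition.  The
   function names are hypothetical.

```python
def lsb_decode(ref, received, k, p, width=4):
    """Return the value in the assumed interval whose k LSBs match.

    Assumed interval: [ref - p, ref - p + 2**k - 1], modulo 2**width.
    """
    start = ref - p
    for candidate in range(start, start + 2 ** k):
        value = candidate % (2 ** width)
        if value % (2 ** k) == received:
            return value
    raise AssertionError("unreachable when k <= width")


def survives_one_loss(k, p, step=3, width=4):
    """True if decoding succeeds with zero or one lost packet."""
    modulus = 2 ** width
    for ref in range(modulus):
        for lost in (0, 1):
            # The reference stays at the last header actually received.
            actual = (ref + step * (lost + 1)) % modulus
            received = actual % (2 ** k)
            if lsb_decode(ref, received, k, p, width) != actual:
                return False
    return True
```

   Under these assumptions, survives_one_loss(2, -3) holds, whereas
   survives_one_loss(1, -3) and survives_one_loss(0, -3) do not: with
   one bit the interval reaches only four above the reference, but
   after a single loss the sequence number is six above it.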

B.5.  Specifying Initial Values

   To communicate initial values for fields compressed with a context
   dependent encoding such as "static" or "lsb" we use an "INITIAL"
   field list.  This can help with fields whose start value is fixed and
   known.  For example, if we knew that at the start of the flow
   "flow_id" would always be 1 and "sequence_no" would always be 0, we
   could notate that like this:

     // This encoding will not work either (correct encoding below)
     eg_header
     {
       UNCOMPRESSED {
         version_no     [ 2 ];
         type           [ 2 ];
         flow_id        [ 4 ];
         sequence_no    [ 4 ];
         abc_flag_bits  [ 3 ];
         reserved_flag  [ 1 ];
       }

       INITIAL {
         // set initial values of fields before flow starts
         flow_id     =:= uncompressed_value(4, 1);
         sequence_no =:= uncompressed_value(4, 0);
       }

       COMPRESSED obvious {
         version_no    =:= uncompressed_value(2, 1);
         type          =:= irregular(2);
         flow_id       =:= static;
         sequence_no   =:= lsb(2, -3);
         abc_flag_bits =:= irregular(3);
         reserved_flag =:= uncompressed_value(1, 0);
       }
     }

   However, this use of "INITIAL" is no good since the initial values of
   both "flow_id" and "sequence_no" vary from flow to flow.  "INITIAL"
   is only applicable where the initial value of a field is fixed, as is
   often the case with control fields.

B.6.  Multiple Packet Formats

   To communicate initial values for the sequence number and flow ID
   fields correctly, and to take advantage of the fact that the flag
   bits are usually the same as in the previous header, we need to
   depart from the single format encoding we are currently using and
   instead use multiple formats.  Here, we have expressed the encodings
   for two of the fields in the uncompressed format, since they will
   always be true for uncompressed headers of that format.  The
   remaining fields, whose encoding method may depend on exactly how the
   header is being compressed, have their encodings specified in the
   compressed formats.

     eg_header
     {
       UNCOMPRESSED {
         version_no    =:= uncompressed_value(2, 1) [ 2 ];
         type                                       [ 2 ];
         flow_id                                    [ 4 ];
         sequence_no                                [ 4 ];
         abc_flag_bits                              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
       }


       COMPRESSED irregular_format {
         discriminator =:= '0'          [ 1 ];
         version_no                     [ 0 ];
         type          =:= irregular(2) [ 2 ];
         flow_id       =:= irregular(4) [ 4 ];
         sequence_no   =:= irregular(4) [ 4 ];
         abc_flag_bits =:= irregular(3) [ 3 ];
         reserved_flag                  [ 0 ];
       }

       COMPRESSED compressed_format {
         discriminator =:= '1'          [ 1 ];
         version_no                     [ 0 ];
         type          =:= irregular(2) [ 2 ];
         flow_id       =:= static       [ 0 ];
         sequence_no   =:= lsb(2, -3)   [ 2 ];
         abc_flag_bits =:= static       [ 0 ];
         reserved_flag                  [ 0 ];
       }
     }

   Note that we have added a discriminator field, so that the
   decompressor can tell which format has been used by the compressor.
   The format with a "static" flow ID and "lsb" encoded sequence number
   is now 5 bits long.  Note that despite having to add the
   discriminator field, this format is still the same size as the
   original incorrect "obvious" format because it takes advantage of the
   fact that the abc flag bits rarely change.

   However, the original "basic" format has also grown by one bit due to
   the addition of the discriminator ("irregular_format").  An important
   consideration when creating multiple formats is whether each format
   occurs frequently enough that the average compressed header length is
   shorter as a result of its usage.  For example, if in fact the flag
   bits always changed between packets, the "compressed_format" encoding
   could never be used; all we would have achieved is lengthening the
   "basic" format by one bit.

   Using the above notation, we now get:

     Uncompressed header: 0101000100010000
     Compressed header:   00100010001000


     Uncompressed header: 0101000101000000
     Compressed header:   10100 ; 00100010100000


     Uncompressed header: 0110000101110000
     Compressed header:   11011 ; 01000010111000

   The first header in the stream is compressed the same way as before,
   except that it now has the extra 1-bit discriminator at the start
   (0).  When a second header arrives with the same flow ID as the first
   and its sequence number three higher, it can be compressed in two
   possible ways: either by using "compressed_format" or, in the same
   way as previously, by using "irregular_format".

   Note that we show all theoretically possible encodings of a header as
   defined by the ROHC-FN specification, separated by semi-colons.
   Either of the above encodings for each header could be produced by a
   valid implementation, although a good implementation would always aim
   to pick the encoding that leads to the best compression.  A good
   implementation would also take robustness into account and therefore
   probably wouldn't assume on the second packet that the decompressor
   had available the context necessary to decompress the shorter
   "compressed_format" form.

   Finally, note that the fields whose encoding methods are specified in
   the uncompressed format have zero length when compressed.  This means
   their position in the compressed format is not significant.  In this
   case, there is no need to notate them when defining the compressed
   formats.  In the next part of the example we will see that they have
   been removed from the compressed formats altogether.
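
   The two formats of this section can be sketched in Python to show
   how the alternative encodings listed above arise.  The function name
   "encodings" is hypothetical, headers are modelled as bit strings,
   the input header is assumed to be valid (version one, reserved flag
   clear), and "lsb(2, -3)" is assumed to accept sequence numbers
   three to six above the value in the context.

```python
def encodings(header, context):
    """Return every valid compressed form of `header`, shortest first.

    `header` and `context` are 16-bit strings; `context` is the previous
    header in the flow, or None if no context exists yet.
    """
    type_, flow_id = header[2:4], header[4:8]
    sequence_no, abc = header[8:12], header[12:15]
    results = []
    if context is not None:
        same_flow = context[4:8] == flow_id        # flow_id =:= static
        same_flags = context[12:15] == abc         # abc_flag_bits =:= static
        # Assumed lsb(2, -3) window: 3..6 above the reference, modulo 16.
        delta = (int(sequence_no, 2) - int(context[8:12], 2)) % 16
        if same_flow and same_flags and 3 <= delta <= 6:
            # compressed_format: discriminator, type, two sequence LSBs.
            results.append("1" + type_ + sequence_no[-2:])
    # irregular_format needs no context, so it is always available.
    results.append("0" + type_ + flow_id + sequence_no + abc)
    return results
```

   With the first header as context, the second header yields both
   10100 and 00100010100000, as in the example output.  A real
   compressor would additionally weigh robustness before choosing the
   shorter form.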

B.7.  Variable Length Discriminators

   Suppose we do some analysis on flows of our example protocol and
   discover that whilst it is usual for successive packets to have the
   same flags, on the occasions when they don't, the packet is almost
   always a "flags set" packet in which all three of the abc flags are
   set.  To encode the flow more efficiently a format needs to be
   written to reflect this.

   This now gives a total of three formats, which means we need three
   discriminators to differentiate between them.  The obvious solution
   here is to increase the number of bits in the discriminator from one
   to two and use discriminators 00, 01, and 10 for example.  However we
   can do slightly better than this.

   Any uniquely identifiable discriminator will suffice, so we can use
   00, 01, and 1.  If the discriminator starts with 1, that's the whole
   thing.  If it starts with 0, the decompressor knows it has to check
   one more bit to determine the kind of format.

   Note that care must be taken when using variable length
   discriminators.  For example, it would be erroneous to use 0, 01, and
   10 as discriminators since after reading an initial 0, the
   decompressor would have no way of knowing if the next bit was a
   second bit of discriminator, or the first bit of the next field in
   the format.  However, 0, 10, and 11 would be correct, as the first
   bit again indicates whether or not there are further discriminator
   bits to follow.

   This gives us the following:

     eg_header
     {
       UNCOMPRESSED {
         version_no    =:= uncompressed_value(2, 1) [ 2 ];
         type                                       [ 2 ];
         flow_id                                    [ 4 ];
         sequence_no                                [ 4 ];
         abc_flag_bits                              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
       }

       COMPRESSED irregular_format {
         discriminator =:= '00'         [ 2 ];
         type          =:= irregular(2) [ 2 ];
         flow_id       =:= irregular(4) [ 4 ];
         sequence_no   =:= irregular(4) [ 4 ];
         abc_flag_bits =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator =:= '01'                     [ 2 ];
         type          =:= irregular(2)             [ 2 ];
         flow_id       =:= static                   [ 0 ];
         sequence_no   =:= lsb(2, -3)               [ 2 ];
         abc_flag_bits =:= uncompressed_value(3, 7) [ 0 ];
       }

       COMPRESSED flags_static {
         discriminator =:= '1'          [ 1 ];
         type          =:= irregular(2) [ 2 ];
         flow_id       =:= static       [ 0 ];
         sequence_no   =:= lsb(2, -3)   [ 2 ];
         abc_flag_bits =:= static       [ 0 ];
       }
     }

   Here is some example output:

     Uncompressed header: 0101000100010000
     Compressed header:   000100010001000


     Uncompressed header: 0101000101000000
     Compressed header:   10100 ; 000100010100000


     Uncompressed header: 0110000101110000
     Compressed header:   11011 ; 001000010111000


     Uncompressed header: 0111000110101110
     Compressed header:   011110 ; 001100011010111

   Here we have a very similar sequence to last time, except that there
   is now an extra message on the end that has the flag bits set.  The
   encoding for the first message in the stream is now one bit larger,
   but the encoding for the next two messages is the same as before:
   thanks to the use of variable length discriminators, that format has
   not grown.  Finally, the packet that comes through with all the
   flag bits set can be encoded in just six bits, only one bit more than
   the most common format.  Without the extra format, this last packet
   would have to be encoded using the longest format and would have
   taken up 14 bits.
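
   The rule for choosing variable length discriminators described in
   this section amounts to requiring that no discriminator is a prefix
   of another.  A minimal Python check, with the hypothetical name
   "is_prefix_free", might look like this:

```python
from itertools import permutations


def is_prefix_free(discriminators):
    """True if no discriminator is a prefix of a different one."""
    # permutations() yields every ordered pair of distinct list entries,
    # so duplicates in the input are also reported as a clash.
    return not any(
        longer.startswith(shorter)
        for shorter, longer in permutations(discriminators, 2)
    )
```

   The sets { 00, 01, 1 } and { 0, 10, 11 } pass this check, whereas
   { 0, 01, 10 } fails because 0 is a prefix of 01.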

B.8.  Default Encoding

   Some of the common encoding methods used so far have been "factored
   out" into the definition of the uncompressed format, meaning that
   they don't need to be defined for every compressed format.  However,
   there is still some redundancy in the notation.  For a number of
   fields, the same encoding method is used several times in different
   formats (though not necessarily in all of them), but the field
   encoding is redefined explicitly each time.  If the encoding for any
   of these fields changed in the future, then every format that uses
   that encoding would have to be modified to reflect this change.

   This problem can be avoided by specifying default encoding methods
   for these fields.  Doing so can also lead to a more concisely notated
   profile:

     eg_header
     {
       UNCOMPRESSED {
         version_no    =:= uncompressed_value(2, 1) [ 2 ];
         type                                       [ 2 ];
         flow_id                                    [ 4 ];
         sequence_no                                [ 4 ];
         abc_flag_bits                              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
       }

       DEFAULT {
         type          =:= irregular(2);
         flow_id       =:= static;
         sequence_no   =:= lsb(2, -3);
       }

       COMPRESSED irregular_format {
         discriminator =:= '00'         [ 2 ];
         type                           [ 2 ]; // Uses default
         flow_id       =:= irregular(4) [ 4 ]; // Overrides default
         sequence_no   =:= irregular(4) [ 4 ]; // Overrides default
         abc_flag_bits =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator =:= '01' [ 2 ];
         type                   [ 2 ]; // Uses default
         sequence_no            [ 2 ]; // Uses default
         abc_flag_bits =:= uncompressed_value(3, 7);
       }

       COMPRESSED flags_static {
         discriminator =:= '1' [ 1 ];
         type                  [ 2 ]; // Uses default
         sequence_no           [ 2 ]; // Uses default
         abc_flag_bits =:= static;
       }
     }

   The above profile behaves in exactly the same way as the one notated
   previously, since it has the same meaning.  Note that the purpose
   behind the different formats becomes clearer with the default
   encoding methods factored out: all that remains are the encodings
   that are specific to each format.  Note also that default encoding
   methods that compress down to zero bits have become completely
   implicit.  For example the compressed formats using the default
   encoding for "flow_id" don't mention it (the default is "static"
   encoding that compresses to zero bits).
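
   The way default encodings combine with per-format encodings can be
   modelled as a dictionary merge in which the format's own entries
   take precedence.  The following Python sketch is only a model: the
   names are hypothetical, each encoding is reduced to a label and a
   compressed length, and field order within a format is ignored.

```python
# Each encoding is modelled as (method label, compressed length in bits).
DEFAULT = {
    "type": ("irregular(2)", 2),
    "flow_id": ("static", 0),
    "sequence_no": ("lsb(2, -3)", 2),
}

# Only the encodings that a format states explicitly are listed here.
FORMATS = {
    "irregular_format": {
        "discriminator": ("'00'", 2),
        "flow_id": ("irregular(4)", 4),      # overrides default
        "sequence_no": ("irregular(4)", 4),  # overrides default
        "abc_flag_bits": ("irregular(3)", 3),
    },
    "flags_set": {
        "discriminator": ("'01'", 2),
        "abc_flag_bits": ("uncompressed_value(3, 7)", 0),
    },
    "flags_static": {
        "discriminator": ("'1'", 1),
        "abc_flag_bits": ("static", 0),
    },
}


def resolve(format_name):
    """Merge DEFAULT with a format's own encodings; the format wins."""
    return {**DEFAULT, **FORMATS[format_name]}


def compressed_length(format_name):
    """Total number of bits occupied by the resolved format."""
    return sum(length for _method, length in resolve(format_name).values())
```

   In this model the three formats resolve to 15, 6, and 5 bits
   respectively, the same lengths as in Appendix B.7.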

B.9.  Control Fields

   One inefficiency in the compression scheme we have produced thus far
   is that it uses two bits to provide the "lsb" encoded sequence number
   with robustness for the loss of just one packet.  In theory, only one
   bit should be needed.  The root of the problem is the unusual
   sequence number that the protocol uses -- it counts up in increments
   of three.  In order to encode it at maximum efficiency we need to
   translate this into a field that increments by one each time.  We do
   this using a control field.

   A control field is extra data that is communicated in the compressed
   format, but which is not a direct encoding of part of the
   uncompressed header.  Control fields can be used to communicate extra
   information in the compressed format, that allows other fields to be
   compressed more efficiently.

   The control field that we introduce scales the sequence number down
   by a factor of three.  Instead of encoding the original sequence
   number in the compressed packet, we encode the scaled sequence
   number, allowing us to have robustness to the loss of one packet by
   using just one bit of "lsb" encoding:

     eg_header
     {
       UNCOMPRESSED {
         version_no    =:= uncompressed_value(2, 1) [ 2 ];
         type                                       [ 2 ];
         flow_id                                    [ 4 ];
         sequence_no                                [ 4 ];
         abc_flag_bits                              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
       }

       CONTROL {
         // need modulo maths to calculate scaling correctly,
         // due to 4 bit wrap around
         scaled_seq_no   [ 4 ];
         ENFORCE(sequence_no.UVALUE
                   == (scaled_seq_no.UVALUE * 3) % 16);
       }

       DEFAULT {
         type          =:= irregular(2);
         flow_id       =:= static;
         scaled_seq_no =:= lsb(1, -1);
       }

       COMPRESSED irregular_format {
         discriminator =:= '00'         [ 2 ];
         type                           [ 2 ];
         flow_id       =:= irregular(4) [ 4 ];
         scaled_seq_no =:= irregular(4) [ 4 ]; // Overrides default
         abc_flag_bits =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         discriminator =:= '01' [ 2 ];
         type                   [ 2 ];
         scaled_seq_no          [ 1 ]; // Uses default
         abc_flag_bits =:= uncompressed_value(3, 7);
       }

       COMPRESSED flags_static {
         discriminator =:= '1' [ 1 ];
         type                  [ 2 ];
         scaled_seq_no         [ 1 ]; // Uses default
         abc_flag_bits =:= static;
       }
     }

   Normally, the encoding method(s) used to encode a field specifies the
   length of the field.  In the above notation, since there is no
   encoding method using "sequence_no" directly, its length needs to be
   defined explicitly using an "ENFORCE" statement.  This is done using
   the abbreviated syntax, both for consistency and also for ease of
   readability.  Note that this is unusual: whereas the majority of
   field length indications are redundant (and thus optional), this one
   isn't.  If it were removed from the above notation, the length of the
   "sequence_no" field would be undefined.

   Here is some example output:

     Uncompressed header: 0101000100010000
     Compressed header:   000100011011000


     Uncompressed header: 0101000101000000
     Compressed header:   1010 ; 000100011100000


     Uncompressed header: 0110000101110000
     Compressed header:   1101 ; 001000011101000


     Uncompressed header: 0111000110101110
     Compressed header:   01110 ; 001100011110111

   In this form, we see that this gives us a saving of a further bit in
   most packets.  Assuming the bulk of a flow is made up of
   "flags_static" headers, the mean size of the headers in a compressed
   flow is now just over a quarter of their size in an uncompressed
   flow.
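
   The "ENFORCE" statement in the "CONTROL" block fixes the relationship
   between the two sequence numbers but does not say how a compressor
   finds "scaled_seq_no".  One possible approach, sketched below in
   Python under the hypothetical name "scale_sequence_no", is a simple
   search; because 3 and 16 share no common factor, exactly one 4-bit
   value satisfies the constraint.

```python
def scale_sequence_no(sequence_no):
    """Find scaled_seq_no with (scaled_seq_no * 3) % 16 == sequence_no."""
    for scaled in range(16):
        if (scaled * 3) % 16 == sequence_no:
            return scaled
    raise ValueError("sequence_no must be in the range 0..15")
```

   For the sequence numbers 1, 4, 7, and 10 used in the example output,
   this gives 11, 12, 13, and 14 (1011, 1100, 1101, and 1110), so the
   scaled value increments by one per packet as intended.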

B.10.  Use of "ENFORCE" Statements as Conditionals

   Earlier, we created a new format "flags_set" to handle packets with
   all three of the flag bits set.  As it happens, these three flags are
   always all set for "type 3" packets, and are never all set for other
   packet types (a "type 3" packet is one where the type field is set to
   three).

   This allows extra efficiency in encoding such packets.  We know the
   type is three, so we don't need to encode the type field in the
   compressed header.  The type field was previously encoded as
   "irregular(2)", which is two bits long.  Removing this reduces the
   size of the "flags_set" format from five bits to three, making it the
   smallest format in the encoding method definition.

   To specify that the "flags_set" format should only be used for
   "type 3" headers, and the "flags_static" format only when the type
   isn't three, it is necessary to state these conditions inside each
   format.  This can be done with an "ENFORCE" statement:

     eg_header
     {
       UNCOMPRESSED {
         version_no    =:= uncompressed_value(2, 1) [ 2 ];
         type                                       [ 2 ];
         flow_id                                    [ 4 ];
         sequence_no                                [ 4 ];
         abc_flag_bits                              [ 3 ];
         reserved_flag =:= uncompressed_value(1, 0) [ 1 ];
       }

       CONTROL {
         // need modulo maths to calculate scaling correctly,
         // due to 4 bit wrap around
         scaled_seq_no   [ 4 ];
         ENFORCE(sequence_no.UVALUE
                   == (scaled_seq_no.UVALUE * 3) % 16);
       }

       DEFAULT {
         type          =:= irregular(2);
         scaled_seq_no =:= lsb(1, -1);
         flow_id       =:= static;
       }

       COMPRESSED irregular_format {
         discriminator =:= '00'         [ 2 ];
         type                           [ 2 ];
         flow_id       =:= irregular(4) [ 4 ];
         scaled_seq_no =:= irregular(4) [ 4 ];
         abc_flag_bits =:= irregular(3) [ 3 ];
       }

       COMPRESSED flags_set {
         ENFORCE(type.UVALUE == 3); // redundant condition
         discriminator =:= '01'                      [ 2 ];
         type          =:= uncompressed_value(2, 3)  [ 0 ];
         scaled_seq_no                               [ 1 ];
         abc_flag_bits =:= uncompressed_value(3, 7)  [ 0 ];
       }

       COMPRESSED flags_static {
         ENFORCE(type.UVALUE != 3);
         discriminator =:= '1'    [ 1 ];
         type                     [ 2 ];
         scaled_seq_no            [ 1 ];
         abc_flag_bits =:= static [ 0 ];
       }
     }

   The two "ENFORCE" statements in the last two formats act as "guards".
   Guards prevent formats from being used under the wrong circumstances.
   In fact, the "ENFORCE" statement in "flags_set" is redundant.  The
   condition it guards for is already enforced by the new encoding
   method used for the "type" field.  The encoding method
   "uncompressed_value(2, 3)" binds the "UVALUE" attribute to three.
   This is exactly what the "ENFORCE" statement does, so it can be
   removed without any change in meaning.  The "uncompressed_value"
   encoding method, on the other hand, is not redundant.  It specifies
   other bindings on the type field in addition to the one that the
   "ENFORCE" statement specifies.  Therefore, it would not be possible
   to remove the encoding method and leave just the "ENFORCE"
   statement.
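
   As an informal illustration (not part of the notation), the guards
   can be pictured as predicates that filter the set of formats a
   compressor may consider.  The following Python sketch assumes a
   hypothetical function "usable_formats" and collapses all
   context-dependent bindings ("static" and "lsb") into a single
   boolean argument, "context_ok"; both names are illustrative
   assumptions.

```python
# Illustrative sketch only; function and argument names are hypothetical.

def usable_formats(type_value, abc_flag_bits, context_ok):
    """List the formats whose bindings hold for one header.

    context_ok is a simplifying assumption: it stands for every
    context-dependent binding ("static", "lsb") being satisfied.
    """
    # irregular_format carries type, flow_id, scaled_seq_no and
    # abc_flag_bits in full, so no guard restricts it (assuming
    # version_no and reserved_flag hold their fixed values).
    formats = ["irregular_format"]
    # flags_set: uncompressed_value(2, 3) and uncompressed_value(3, 7)
    # bind type to 3 and abc_flag_bits to 7.
    if context_ok and type_value == 3 and abc_flag_bits == 7:
        formats.append("flags_set")
    # flags_static: ENFORCE(type.UVALUE != 3) is the guard.
    if context_ok and type_value != 3:
        formats.append("flags_static")
    return formats
```

   For a "type 3" header with all three flags set, the result contains
   both "irregular_format" and "flags_set".  The guard has removed
   "flags_static" from consideration, but it has not selected a format.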

   Note that a guard is solely preventative.  A guard can never force a
   format to be chosen by the compressor.  A format can only be
   guaranteed to be chosen in a given situation if there are no other
   formats that can be used instead.  This is demonstrated in the
   example output below.  The compressor can still choose the
   "irregular_format" format if it wishes:

     Uncompressed header: 0101000100010000
     Compressed header:   000100011011000


     Uncompressed header: 0101000101000000
     Compressed header:   1010 ; 000100011100000


     Uncompressed header: 0110000101110000
     Compressed header:   1101 ; 001000011101000


     Uncompressed header: 0111000110101110
     Compressed header:   010 ; 001100011110111

   This saves just two extra bits (a 7% saving) in the example flow.
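
   The figure above can be checked with a few lines of arithmetic.  The
   Python sketch below is illustrative only: it assumes that the
   compressor sends the shortest header listed for each of the four
   example packets, and the names "SIZES_BEFORE", "SIZES_AFTER", and
   "flow_size" are hypothetical.  Format sizes are the sums of the
   bracketed field lengths in each "COMPRESSED" definition, before and
   after the "type" field is dropped from "flags_set".

```python
# Illustrative sketch only; all names are hypothetical.

# Compressed header sizes in bits for each format.
SIZES_BEFORE = {"irregular_format": 15, "flags_set": 5, "flags_static": 4}
SIZES_AFTER = {"irregular_format": 15, "flags_set": 3, "flags_static": 4}

# Shortest format listed for each of the four example packets, in order.
EXAMPLE_FLOW = [
    "irregular_format",
    "flags_static",
    "flags_static",
    "flags_set",
]


def flow_size(sizes, flow):
    """Total number of compressed header bits for a sequence of formats."""
    return sum(sizes[name] for name in flow)


before = flow_size(SIZES_BEFORE, EXAMPLE_FLOW)
after = flow_size(SIZES_AFTER, EXAMPLE_FLOW)
saving_percent = 100 * (before - after) / before
```

   Under these assumptions the totals are 28 and 26 bits, so the two
   bits saved amount to roughly 7% of the compressed example flow.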

Authors' Addresses

   Robert Finking
   Siemens/Roke Manor Research
   Old Salisbury Lane
   Romsey, Hampshire  SO51 0ZN
   UK

   Phone: +44 (0)1794 833189
   EMail: robert.finking@roke.co.uk
   URI:   http://www.roke.co.uk


   Ghyslain Pelletier
   Ericsson
   Box 920
   Lulea  SE-971 28
   Sweden

   Phone: +46 (0) 8 404 29 43
   EMail: ghyslain.pelletier@ericsson.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE IETF TRUST AND
   THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS
   OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF
   THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org.

Acknowledgement

   Funding for the RFC Editor function is currently provided by the
   Internet Society.