RFC 3320

Signaling Compression (SigComp)

Pages: 62
Proposed Standard
Updated by: 4896

Part 2 of 3 – Pages 15 to 37

RFC3320 - Page 15

4.  SigComp Dispatchers

   This chapter defines the behavior of the compressor and decompressor
   dispatcher.  The function of these entities is to provide an
   interface between SigComp and its environment, minimizing the effort
   needed to integrate SigComp into an existing protocol stack.

4.1.  Compressor Dispatcher

   The compressor dispatcher receives messages from the application and
   passes the compressed version of each message to the transport layer.

   Note that SigComp invokes compressors on a per-compartment basis, so
   when the application provides a message to be compressed it must also
   provide a compartment identifier.  The compressor dispatcher forwards
   the application message to the correct compressor based on the
   compartment identifier (invoking a new compressor if a new
   compartment identifier is encountered).  The compressor returns a
   SigComp message that can be passed to the transport layer.

   Additionally, the application should indicate to the compressor
   dispatcher when it wishes to close a particular compartment, so that
   the resources taken by the corresponding compressor can be reclaimed.
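
   The per-compartment dispatch described above can be sketched as
   follows.  This is a minimal illustration only: the class name, the
   method names, and the use of a string as compartment identifier are
   assumptions made for the example and are not defined by SigComp.

```python
from typing import Callable, Dict

# Hypothetical compressor type: takes an application message and
# returns a SigComp message ready for the transport layer.
Compressor = Callable[[bytes], bytes]


class CompressorDispatcher:
    """Routes application messages to one compressor per compartment."""

    def __init__(self, new_compressor: Callable[[], Compressor]) -> None:
        self._new_compressor = new_compressor
        self._compressors: Dict[str, Compressor] = {}

    def compress(self, compartment_id: str, message: bytes) -> bytes:
        # Invoke a new compressor the first time an identifier is seen.
        if compartment_id not in self._compressors:
            self._compressors[compartment_id] = self._new_compressor()
        return self._compressors[compartment_id](message)

    def close(self, compartment_id: str) -> None:
        # Reclaim the resources held by the compartment's compressor.
        self._compressors.pop(compartment_id, None)

    def open_compartments(self) -> int:
        return len(self._compressors)
```

   An application would call compress() once per outgoing message and
   close() when it has finished with a compartment.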

4.2.  Decompressor Dispatcher

   The decompressor dispatcher receives messages from the transport
   layer and passes the decompressed version of each message to the
   application.

   To ensure that SigComp can run over an unsecured transport layer, the
   decompressor dispatcher invokes a new instance of the UDVM for each
   new SigComp message.  Resources for the UDVM are released as soon as
   the message has been decompressed.

   The dispatcher MUST NOT make more than one SigComp message available
   to a given instance of the UDVM.  In particular, the dispatcher MUST
   NOT concatenate two SigComp messages to form a single message.

4.2.1.  Decompressor Dispatcher Strategies

   Once the UDVM has been invoked it is initialized using the SigComp
   message of Chapter 7.  The message is then decompressed by the UDVM,
   returned to the decompressor dispatcher, and passed on to the
   receiving application.  Note that the UDVM has no awareness of
   whether the underlying transport is message-based or stream-based,
   and so it always outputs decompressed data as a stream.  It is the
   responsibility of the dispatcher to provide the decompressed message
   to the application in the expected form (i.e., as a stream or as a
   distinct, bounded message).  The dispatcher knows that the end of a
   decompressed message has been reached when the UDVM instruction END-
   MESSAGE is invoked (see Section 9.4.9).

   For a stream-based transport, two strategies are therefore possible
   for the decompressor dispatcher:

   1) The dispatcher collects a complete SigComp message and then
      invokes the UDVM.  The advantage is that, even in implementations
      that have multiple incoming compressed streams, only one instance
      of the UDVM is ever required.

   2) The dispatcher collects the SigComp header (see Section 7) and
      invokes the UDVM; the UDVM stays active while the rest of the
      message arrives.  The advantage is that there is no need to buffer
      up the rest of the message; the message can be decompressed as it
      arrives, and any decompressed output can be relayed to the
      application immediately.

   In general, which of the strategies is used is an implementation
   choice.

   However, the compressor may want to take advantage of strategy 2 by
   expecting that some of the application message is passed on to the
   application before the SigComp message is terminated, e.g., by
   keeping the UDVM active while expecting the application to
   continuously receive decompressed output.  This approach ("continuous
   mode") invalidates some assumptions of the SigComp security model and
   can only be used if the transport itself can provide the required
   protection against denial of service attacks.  Also, since only
   strategy 2 works in this approach, the use of continuous mode
   requires previous agreement between the two endpoints.

4.2.2.  Record Marking

   For a stream-based transport, the dispatcher delimits messages by
   parsing the compressed data stream for instances of 0xFF and taking
   the following actions:

   Occurs in data stream:     Action:

   0xFF 00                    one 0xFF byte in the data stream
   0xFF 01                    same, but the next byte is quoted (could
                              be another 0xFF)
      :                                           :
   0xFF 7F                    same, but the next 127 bytes are quoted
   0xFF 80 to 0xFF FE         (reserved for future standardization)
   0xFF FF                    end of SigComp message

   The combinations 0xFF01 to 0xFF7F are useful to limit the worst case
   expansion of the record marking scheme:  the 1 (0xFF01) to 127
   (0xFF7F) bytes following the byte combination are copied literally by
   the decompressor without taking any special action on 0xFF.  (Note
   that 0xFF00 is just a special case of this, where zero following
   bytes are copied literally.)

   In UDVM version 0x01, any occurrence of the combinations 0xFF80 to
   0xFFFE that are not protected by quoting causes decompression
   failure; the decompressor SHOULD close the stream-based transport in
   this case.
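
   As an illustration of the record marking scheme, the sketch below
   delimits SigComp messages in a buffer that has already been read
   from the stream.  The function names are hypothetical.  The sketch
   also simplifies matters by working on a complete buffer: a real
   dispatcher would have to wait for more data when an escape sequence
   or a quoted run is split across reads.  The encoder uses only the
   0xFF00 form, which is valid but does not limit worst-case expansion
   the way the quoting forms do.

```python
from typing import List


class DecompressionFailure(Exception):
    """Raised on an unquoted reserved combination 0xFF80 to 0xFFFE."""


def mark_record(message: bytes) -> bytes:
    # Escape each 0xFF as 0xFF00, then append the end marker 0xFFFF.
    return message.replace(b"\xff", b"\xff\x00") + b"\xff\xff"


def split_records(stream: bytes) -> List[bytes]:
    """Return the complete SigComp messages found in the buffer."""
    messages: List[bytes] = []
    current = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        i += 1
        if byte != 0xFF:
            current.append(byte)
            continue
        if i >= len(stream):
            break  # escape code not yet received (simplification)
        code = stream[i]
        i += 1
        if code == 0xFF:
            # 0xFFFF: end of SigComp message.
            messages.append(bytes(current))
            current = bytearray()
        elif code <= 0x7F:
            # One 0xFF byte, then `code` bytes copied literally.
            current.append(0xFF)
            current += stream[i:i + code]
            i += code
        else:
            raise DecompressionFailure("reserved record marking code")
    return messages
```

   For example, split_records(mark_record(b"a\xffb")) yields the single
   message b"a\xffb".  Data left over after the last end marker is
   dropped by this sketch.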

4.3.  Returning a Compartment Identifier

   Upon receiving a decompressed message the application may supply the
   dispatcher with a compartment identifier.  Supplying this identifier
   grants permission for the following:

   1. Items of state accompanying the decompressed message can be saved
      using the state memory reserved for the specified compartment.

   2. The feedback data accompanying the decompressed message can be
      trusted sufficiently that it can be used when sending SigComp
      messages that relate to the compressor's equivalent for the
      compartment.

   The dispatcher passes the compartment identifier to the UDVM, where
   it is used as per the END-MESSAGE instruction (see Section 9.4.9).

   The application uses a suitable authentication mechanism to determine
   whether the decompressed message belongs to a legitimate compartment
   or not.  If the application fails to authenticate the message with
   sufficient confidence to allow state to be saved or feedback data to
   be trusted, it supplies a "no valid compartment" error to the
   dispatcher and the UDVM is terminated without creating any state or
   forwarding any feedback data.

5.  SigComp Compressor

   An important feature of SigComp is that decompression functionality
   is provided by a Universal Decompressor Virtual Machine (UDVM).  This
   means that the compressor can choose any algorithm to generate
   compressed SigComp messages, and then upload bytecode for the
   corresponding decompression algorithm to the UDVM as part of the
   SigComp message.

   To help with the implementation and testing of a SigComp endpoint,
   further Internet Documents and RFCs may be published to describe
   particular compression algorithms.

   The overall requirement placed on the compressor is that of
   transparency, i.e., the compressor MUST NOT send bytecode which
   causes the UDVM to incorrectly decompress a given SigComp message.

   The following more specific requirements are also placed on the
   compressor (they can be considered particular instances of the
   transparency requirement):

   1. For robustness, it is recommended that the compressor supply some
      form of integrity check (not necessarily of cryptographic
      strength) over the application message to ensure that successful
      decompression has occurred.  A UDVM instruction is provided for
      CRC verification; also, another instruction can be used to compute
      a SHA-1 cryptographic hash.

   2. The compressor MUST ensure that the message can be decompressed
      using the resources available at the remote endpoint.

   3. If the transport is message-based, then the compressor MUST map
      each application message to exactly one SigComp message.

   4. If the transport is stream-based but the application defines its
      own internal message boundaries, then the compressor SHOULD map
      each application message to exactly one SigComp message.

   Message boundaries should be preserved over a stream-based transport
   so that accidental or malicious damage to one SigComp message does
   not affect the decompression of subsequent messages.

   Additionally, if the state handler passes some requested feedback to
   the compressor, then it SHOULD be returned in the next SigComp
   message generated by the compressor (unless the state handler passes
   some newer requested feedback before the older feedback has been
   sent, in which case the older feedback is deleted).

   If present, the requested feedback item SHOULD be copied unmodified
   into the returned_feedback_item field provided in the SigComp
   message.  Note that there is no need to transmit any requested
   feedback item more than once.

   The compressor SHOULD also upload the local SigComp parameters to the
   remote endpoint, unless the endpoint has indicated that it does not
   wish to receive these parameters or the compressor determines that
   the parameters have already successfully arrived (see Section 5.1 for
   details of how this can be achieved).  The SigComp parameters are
   uploaded to the UDVM memory at the remote endpoint as described in
   Section 9.4.9.

5.1.  Ensuring Successful Decompression

   A compressor MUST be certain that all of the data needed to
   decompress a SigComp message is available at the receiving endpoint.
   One way to ensure this is to send all of the needed information in
   every SigComp message (including bytecode to decompress the message).
   However, the compression ratio for this method will be relatively
   low.

   To obtain the best overall compression ratio the compressor needs to
   request the creation of new state items at the remote endpoint.  The
   information saved in these state items can then be accessed by later
   SigComp messages, avoiding the need to upload the data on a per-
   message basis.

   Before the compressor can access saved state, however, it must
   ensure that the SigComp message carrying the state creation request
   arrived successfully at the receiving endpoint.  For a reliable
   transport (e.g., TCP or SCTP) this is guaranteed.  For an unreliable
   transport, however, the compressor must provide a suitable mechanism
   itself (see [RFC-3321] for further details).

   The compressor must also ensure that the state item it wishes to
   access has not been rejected due to a lack of state memory.  This can
   be accomplished by checking the state_memory_size parameter using the
   SigComp feedback mechanism (see Section 9.4.9 for further details).

5.2.  Compression Failure

   The compressor SHOULD make every effort to successfully compress an
   application message, but in certain cases this might not be possible
   (particularly if resources are scarce at the receiving endpoint). In
   this case a "compression failure" is called.

   If a compression failure occurs then the compressor informs the
   dispatcher and takes no further action.  The dispatcher MUST report
   this failure to the application so that it can try other methods to
   deliver the message.

6.  State Handling and Feedback

   This chapter defines the behavior of the SigComp state handler.  The
   function of the state handler is to retain information between
   received SigComp messages; it is the only SigComp entity that is
   capable of this function, and so it is of particular importance from
   a security perspective.

6.1.  Creating and Accessing State

   To provide security against the malicious insertion or modification
   of SigComp messages, a separate instance of the UDVM is invoked to
   decompress each message.  This ensures that damaged SigComp messages
   do not prevent the successful decompression of subsequent valid
   messages.

   Note, however, that the overall compression ratio is often
   significantly higher if messages can be compressed relative to the
   information contained in previous messages.  For this reason, it is
   possible to create state items for access when a later message is
   being decompressed.  Both the creation and access of state are
   designed to be secure against malicious tampering with the compressed
   data.  The UDVM can only create a state item when a complete message
   has been successfully decompressed and the application has returned a
   compartment identifier under which the state can be saved.

   State access cannot be protected by relying on the application alone,
   since the authentication mechanism may require information from the
   decompressed message (which of course is not available until after
   the state has been accessed).  Instead, SigComp protects state access
   by creating a state identifier that is a hash over the item of state
   to be retrieved.  This state_identifier must be supplied to retrieve
   an item of state from the state handler.

   Also note that state is not deleted when it is accessed.  So even if
   a malicious sender manages to access some state information,
   subsequent messages compressed relative to this state can still be
   successfully decompressed.

   Each state item contains a state_identifier that is used to access
   the state.  One state identifier can be supplied in the SigComp
   message header to initialize the UDVM (see Chapter 7); additional
   state items can be retrieved using the STATE-ACCESS instruction.  The
   UDVM can also request the creation of a new state item by using the
   STATE-CREATE and END-MESSAGE instructions (see Chapter 9 for further
   details).

6.2.  Memory Management

   The state handler manages state memory on a per-compartment basis.
   Each compartment can store state up to a certain state_memory_size
   (where the application may assign different values for the
   state_memory_size parameter to different compartments).

   As well as storing the state items themselves, the state handler
   maintains a list of the state items created by a particular
   compartment and ensures that no compartment exceeds its allocated
   state_memory_size.  For the purpose of calculation, each state item
   is considered to cost (state_length + 64) bytes.

   Each instance of the UDVM can pass up to four state creation requests
   to the state handler, as well as up to four state free requests (the
   latter are requests to free the memory taken by a state item in a
   certain compartment).  When the state handler receives a state
   creation request from the UDVM it takes the following steps:

   1. The state handler MUST reject all state creation requests that are
      not accompanied by a valid compartment identifier, or if the
      compartment is allocated 0 bytes of state memory. Note that if a
      state creation request fails due to lack of state memory then it
      does not mean that the corresponding SigComp message is damaged;
      compressors will often make state creation requests in the first
      SigComp message of a compartment, before they have discovered the
      state_memory_size using the SigComp feedback mechanism.

   2. If the state creation request needs more state memory than the
      total state_memory_size for the compartment, the state handler
      deletes all but the first (state_memory_size - 64) bytes from the
      state_value.  It sets the state_length to (state_memory_size -
      64), and recalculates the state_identifier as defined in Section
      9.4.9.

   3. If the state creation request contains a state_identifier that
      already exists then the state handler checks whether the requested
      state item is identical to the established state item and counts
      the state creation request as successful if this is the case.  If
      not then the state creation request is unsuccessful (although the
      probability that this will occur is vanishingly small).

   4. If the state creation request exceeds the state memory allocated
      to the compartment, sufficient items of state created by the same
      compartment are freed until enough memory is available to
      accommodate the new state.  When a state item is freed, it is
      removed from the list of states created by the compartment and the
      memory cost of the state item no longer counts towards the total
      cost for the compartment.  Note, however, that identical state
      items may be created by several different compartments, so a state
      item must not be physically deleted unless the state handler
      determines that it is no longer required by any compartment.

   5. The order in which the existing state items are freed is
      determined by the state_retention_priority, which is set when the
      state items are created.  The state_retention_priority of 65535 is
      reserved for locally available states; these states must always be
      freed first.  Apart from this special case, states with the lowest
      state_retention_priority are always freed first.  In the event of
      a tie, then the state item created first in the compartment is
      also the first to be freed.
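
   The accounting and freeing order of steps 1, 4 and 5 can be sketched
   as follows.  The class and field layout is an assumption made for
   illustration.  The sketch omits the truncation of step 2 and the
   duplicate check of step 3, and it does not model the sharing of
   identical state items between compartments.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

STATE_OVERHEAD = 64     # each item costs (state_length + 64) bytes
LOCAL_PRIORITY = 65535  # reserved for locally available state


@dataclass
class StateEntry:
    state_identifier: bytes
    state_length: int
    state_retention_priority: int

    @property
    def cost(self) -> int:
        return self.state_length + STATE_OVERHEAD


@dataclass
class Compartment:
    state_memory_size: int
    entries: List[StateEntry] = field(default_factory=list)  # creation order

    def used(self) -> int:
        return sum(e.cost for e in self.entries)

    def _free_order(self, index: int) -> Tuple[bool, int, int]:
        e = self.entries[index]
        remote = e.state_retention_priority != LOCAL_PRIORITY
        # Local state first, then lowest priority, then oldest.
        return (remote, e.state_retention_priority, index)

    def create(self, new: StateEntry) -> bool:
        if self.state_memory_size == 0:
            return False  # step 1: no state memory allocated
        if new.cost > self.state_memory_size:
            raise ValueError("truncate the state first (step 2)")
        # Steps 4 and 5: free existing items until the new one fits.
        while self.used() + new.cost > self.state_memory_size:
            victim = min(range(len(self.entries)), key=self._free_order)
            del self.entries[victim]
        self.entries.append(new)
        return True
```

   With a state_memory_size of 2048 bytes, for instance, three items of
   500, 500 and 300 bytes cost 564 + 564 + 364 = 1492 bytes, so a fourth
   item of 600 bytes (cost 664) forces at least one item to be freed.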

   The state_retention_priority is always stored on a per-compartment
   basis as part of the list of state items created by each compartment.
   In particular, the same state item might have several priority values
   if it has been created by several different compartments.

   Note that locally available state items (as described in Section
   3.3.3) need not be mapped to any particular compartment.  However, if
   they are created on a per-compartment basis, then they must not
   interfere with the state created at the request of the remote
   endpoint.  The special state_retention_priority of 65535 is reserved
   for locally available state items to ensure that this is the case.

   The UDVM may also explicitly request the state handler to free a
   specific state item in a compartment.  In this case, the state
   handler deletes the state item from the list of state items created
   by the compartment (as before, the state item itself must not be
   physically deleted unless the state handler determines that it is no
   longer required by any compartment).

   The application should indicate to the state handler when it wishes
   to close a particular compartment, so that the resources taken by the
   corresponding state can be reclaimed.

6.3.  Feedback Data

   The SigComp feedback mechanism allows feedback data to be received by
   a UDVM and forwarded via the state handler to the correct compressor.

   Since this feedback data is retained between SigComp messages, it is
   considered to be part of the overall state and can only be forwarded
   if accompanied by a valid compartment identifier.  If this is the
   case, then the state handler forwards the feedback data to the
   compressor responsible for sending messages that pertain to the peer
   compartment of the specified compartment.

7.  SigComp Message Format

   This chapter describes the format of the SigComp message and how the
   message is used to initialize the UDVM memory.

   Note that the SigComp message is not copied into the UDVM memory as
   soon as it arrives; instead, the UDVM indicates when it requires
   compressed data using a specific instruction.  It then pauses and
   waits for the information to be supplied before executing the next
   instruction.  This means that the UDVM can begin to decompress a
   SigComp message before the entire message has been received.

   A consequence of the above behavior is that when the UDVM is invoked,
   the size of the UDVM memory depends on whether the transport used to
   provide the SigComp message is stream-based or message-based.  If the
   transport is message-based then sufficient memory must be available
   to buffer the entire SigComp message before it is passed to the UDVM.
   So if the message is n bytes long, then the UDVM memory size is set
   to (decompression_memory_size - n), up to a maximum of 65536 bytes.

   If the transport is stream-based, however, then a fixed-size input
   buffer is required to accommodate the stream, independently of the
   size of each SigComp message.  So, for simplicity, the UDVM memory
   size is set to (decompression_memory_size / 2).
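
   The two rules for the UDVM memory size can be written as one small
   function.  The function name and the convention of passing None for
   a stream-based transport are assumptions of this sketch, as is the
   choice to raise an error when the message leaves no memory at all
   (this section does not say what happens in that case).

```python
from typing import Optional

MAX_UDVM_MEMORY = 65536


def udvm_memory_size(decompression_memory_size: int,
                     message_length: Optional[int] = None) -> int:
    """Return the UDVM memory size in bytes.

    message_length is the size n of the buffered SigComp message for a
    message-based transport, or None for a stream-based transport.
    """
    if message_length is None:
        # Stream-based: a fixed-size input buffer is needed, so the
        # UDVM is given half of the decompression memory.
        return decompression_memory_size // 2
    remaining = decompression_memory_size - message_length
    if remaining <= 0:
        # Assumption of this sketch: treat as an error.
        raise ValueError("message does not fit in decompression memory")
    return min(remaining, MAX_UDVM_MEMORY)
```

   For example, with decompression_memory_size = 8192 a 1000-byte
   message over a message-based transport leaves 7192 bytes of UDVM
   memory, while a stream-based transport leaves 4096 bytes.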

   As a separate instance of the UDVM is invoked on a per-message basis,
   each SigComp message must explicitly indicate its chosen
   decompression algorithm as well as any additional information that is
   needed to decompress the message (e.g., one or more previously
   received messages, a dictionary of common SIP phrases etc.).  This
   information can either be uploaded as part of the SigComp message or
   retrieved from an item of state.

   A SigComp message takes one of two forms depending on whether it
   accesses a state item at the receiving endpoint.  The two variants of
   a SigComp message are given in Figure 3.  (The T-bit controls the
   format of the returned feedback item and is defined in Section 7.1.)

     0   1   2   3   4   5   6   7       0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   | 1   1   1   1   1 | T |  len  |   | 1   1   1   1   1 | T |   0   |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |                               |
   :    returned feedback item     :   :    returned feedback item     :
   |                               |   |                               |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |           code_len            |
   :   partial state identifier    :   +---+---+---+---+---+---+---+---+
   |                               |   |   code_len    |  destination  |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |                               |
   :   remaining SigComp message   :   :    uploaded UDVM bytecode     :
   |                               |   |                               |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
                                       |                               |
                                       :   remaining SigComp message   :
                                       |                               |
                                       +---+---+---+---+---+---+---+---+

                   Figure 3: Format of a SigComp message

   Decompression failure occurs if the SigComp message is too short to
   contain the expected fields (see Section 8.7 for further details).

   The fields except for the "remaining SigComp message" are referred to
   as the "SigComp header" (note that this may include the uploaded UDVM
   bytecode).
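
   As a reading aid for Figure 3, the first byte of either variant can
   be split into its fields as shown below.  The names are hypothetical,
   and rejecting a byte that lacks the 11111 prefix is an assumption of
   the sketch rather than something this section specifies.

```python
from typing import NamedTuple


class FirstByte(NamedTuple):
    t_bit: int      # 1 if a returned feedback item follows
    len_field: int  # 0 means bytecode is uploaded in the message


def parse_first_byte(byte: int) -> FirstByte:
    # Bit 0 in Figure 3 is the most significant bit of the byte.
    if byte & 0xF8 != 0xF8:
        raise ValueError("first five bits are not 11111")
    return FirstByte(t_bit=(byte >> 2) & 1, len_field=byte & 0x03)
```

   For example, the byte 0xFD (binary 11111101) has the T-bit set and a
   len field of 01.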

7.1.  Returned feedback item

   For both variants of the SigComp message, the T-bit is set to 1
   whenever the SigComp message contains a returned feedback item.  The
   format of the returned feedback item is illustrated in Figure 4.

     0   1   2   3   4   5   6   7       0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   | 0 |  returned_feedback_field  |   | 1 | returned_feedback_length  |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
                                       |                               |
                                       :    returned_feedback_field    :
                                       |                               |
                                       +---+---+---+---+---+---+---+---+

                Figure 4: Format of returned feedback item

   Note that the returned feedback length specifies the size of the
   returned feedback field (from 0 to 127 bytes).  So the total size of
   the returned feedback item lies between 1 and 128 bytes.

   The returned feedback item is not copied to the UDVM memory; instead,
   it is buffered until the UDVM has successfully decompressed the
   SigComp message.  It is then forwarded to the state handler with the
   rest of the feedback data (see Section 9.4.9 for further details).
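
   The two layouts of Figure 4 can be told apart by the first bit of
   the item.  The sketch below extracts the whole returned feedback
   item from a message buffer so that it can be held until
   decompression succeeds.  The function name and the (item, next
   offset) return convention are assumptions of the example.

```python
from typing import Tuple


def read_returned_feedback_item(message: bytes,
                                offset: int) -> Tuple[bytes, int]:
    """Return the feedback item starting at offset and the next offset."""
    if offset >= len(message):
        raise ValueError("message too short for returned feedback item")
    first = message[offset]
    if first & 0x80 == 0:
        size = 1                   # 7-bit returned_feedback_field
    else:
        size = 1 + (first & 0x7F)  # length byte plus 0 to 127 bytes
    end = offset + size
    if end > len(message):
        raise ValueError("message too short for returned feedback item")
    return message[offset:end], end
```

   The item is therefore always between 1 and 128 bytes long, matching
   the range given above.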

7.2.  Accessing Stored State

   The len field of the SigComp message determines which fields follow
   the returned feedback item.  If the len field is non-zero, then the
   SigComp message contains a state identifier to access a state item at
   the receiving endpoint.  All state items include a 20-byte state
   identifier as per Section 3.3.3, but it is possible to transmit as
   few as 6 bytes from the identifier if the sender believes that this
   is sufficient to match a unique state item at the receiving endpoint.

   The len field encodes the number of transmitted bytes as follows:

   Encoding:   Length of partial state identifier

   01          6 bytes
   10          9 bytes
   11          12 bytes

   The partial state identifier is passed to the state handler, which
   compares it with the most significant bytes of the state_identifier
   in every currently stored state item.  Decompression failure occurs
   if no state item is matched or if more than one state item is
   matched.

   Decompression failure also occurs if exactly one state item is
   matched but the state item contains a minimum_access_length greater
   than the length of the partial state identifier.  This prevents
   especially sensitive state items from being accessed maliciously by
   brute force guessing of the state_identifier.
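
   The matching rules for a partial state identifier can be sketched
   as follows.  The StoredState class and the access_state function are
   hypothetical names.  The sketch compares the partial identifier with
   the leading bytes of each stored state_identifier and applies the
   three failure conditions described above.

```python
from dataclasses import dataclass
from typing import Iterable

# len field encoding -> number of identifier bytes transmitted
PARTIAL_ID_LENGTH = {0b01: 6, 0b10: 9, 0b11: 12}


class DecompressionFailure(Exception):
    """Raised when the partial identifier does not grant access."""


@dataclass
class StoredState:
    state_identifier: bytes     # full 20-byte identifier
    minimum_access_length: int
    state_value: bytes = b""


def access_state(partial_id: bytes,
                 states: Iterable[StoredState]) -> StoredState:
    matches = [s for s in states
               if s.state_identifier.startswith(partial_id)]
    if len(matches) != 1:
        # No match, or more than one match.
        raise DecompressionFailure("no unique state item matched")
    state = matches[0]
    if state.minimum_access_length > len(partial_id):
        raise DecompressionFailure("partial identifier too short")
    return state
```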

   If a state item is successfully accessed then the state_value byte
   string is copied into the UDVM memory beginning at state_address.

   The first 32 bytes of UDVM memory are then initialized to special
   values as illustrated in Figure 5.

                      0             7 8            15
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |       UDVM_memory_size        |  0 - 1
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        cycles_per_bit         |  2 - 3
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        SigComp_version        |  4 - 5
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |    partial_state_ID_length    |  6 - 7
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |         state_length          |  8 - 9
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |                               |
                     :           reserved            :  10 - 31
                     |                               |
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 5: Initializing Useful Values in UDVM memory

   The first five 2-byte words are initialized to contain some values
   that might be useful to the UDVM bytecode (Useful Values).  Note that
   these values are for information only and can be overwritten when
   executing the UDVM bytecode without any effect on the endpoint.  The
   MSBs of each 2-byte word are stored preceding the LSBs.

   Addresses 0 to 5 indicate the resources available to the receiving
   endpoint.  The UDVM memory size is expressed in bytes modulo 2^16, so
   in particular, it is set to 0 if the UDVM memory size is 65536 bytes.
   The cycles_per_bit is expressed as a 2-byte integer taking the value
   16, 32, 64 or 128.  The SigComp_version is expressed as a 2-byte
   value as per Section 3.3.2.

   Addresses 6 to 9 are initialized to the length of the partial state
   identifier, followed by the state_length from the retrieved state
   item.  Both are expressed as 2-byte values.

   Addresses 10 to 31 are reserved and are initialized to 0 for Version
   0x01 of SigComp.  Future versions of SigComp can use these locations
   for additional Useful Values, so a decompressor MUST NOT rely on
   these values being zero.

   Any remaining addresses in the UDVM memory that have not yet been
   initialized MUST be set to 0.
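
   The layout of Figure 5 for Version 0x01 can be produced with a few
   lines of code.  The function below is an illustrative sketch with a
   hypothetical name.  It returns only the first 32 bytes and leaves
   the copying of state_value and the zeroing of the rest of the UDVM
   memory to the caller.

```python
import struct


def useful_values(udvm_memory_size: int,
                  cycles_per_bit: int,
                  sigcomp_version: int,
                  partial_state_id_length: int = 0,
                  state_length: int = 0) -> bytes:
    """Build addresses 0 to 31 of the UDVM memory."""
    # Five 2-byte words, most significant byte first.
    words = struct.pack(
        ">5H",
        udvm_memory_size % 65536,  # 65536 bytes is stored as 0
        cycles_per_bit,
        sigcomp_version,
        partial_state_id_length,
        state_length,
    )
    # Addresses 10 to 31 are reserved and zero in Version 0x01.
    return words + bytes(22)
```

   For a 65536-byte UDVM memory, cycles_per_bit of 16, version 0x01, a
   6-byte partial state identifier and a state_length of 100, the
   first ten bytes are 00 00 00 10 00 01 00 06 00 64 in hexadecimal.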

   The UDVM then begins executing instructions at the memory address
   contained in state_instruction (which is part of the retrieved item
   of state).  Note that the remaining SigComp message is held by the
   decompressor dispatcher until requested by the UDVM.

   (Note that the Useful Values are only set at UDVM startup; there is
   no special significance to this memory area afterwards.  This means
   that the UDVM bytecode is free to use these locations for any other
   purpose a memory location might be used for; it just has to be aware
   they are not necessarily initialized to zero.)

7.3.  Uploading UDVM bytecode

   If the len field is set to 0 then the bytecode needed to decompress
   the SigComp message is supplied as part of the message itself.  The
   12-bit code_len field specifies the size of the uploaded UDVM
   bytecode (from 0 to 4095 bytes inclusive); eight most significant
   bits are in the first byte, followed by the four least significant
   bits in the most significant bits in the second byte.  The remaining
   bits in the second byte are interpreted as a 4-bit destination field
   that specifies the starting memory address to which the bytecode is
   copied.  The destination field is encoded as follows:

                     Encoding:   Destination address:

                     0000        reserved
                     0001        2  *  64  =  128
                     0010        3  *  64  =  192
                     0011        4  *  64  =  256
                       :                :
                     1111        16 *  64  =  1024

   Note that the encoding 0000 is reserved for future SigComp versions,
   and causes a decompression failure in Version 0x01.
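
   The packing of code_len and destination into two bytes can be
   decoded as shown in the sketch below.  The function name is
   hypothetical.  The destination address follows the table above,
   i.e., (destination + 1) * 64.

```python
from typing import Tuple


def parse_code_len_destination(byte0: int, byte1: int) -> Tuple[int, int]:
    """Return (code_len, destination address) from the two header bytes."""
    # Eight MSBs in the first byte, four LSBs in the top of the second.
    code_len = (byte0 << 4) | (byte1 >> 4)
    destination = byte1 & 0x0F
    if destination == 0:
        # Encoding 0000 is reserved: decompression failure in 0x01.
        raise ValueError("reserved destination encoding 0000")
    return code_len, (destination + 1) * 64
```

   For example, the bytes 0x01 0x21 give a code_len of 18 and a
   destination address of 128.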

   The UDVM memory is initialized as per Figure 5, except that addresses
   6 to 9 inclusive are set to 0 because no state item has been
   accessed.  The UDVM then begins executing instructions at the memory
   address specified by the destination field.  As above, the remaining
   SigComp message is held by the decompressor dispatcher until needed
   by the UDVM.

8.  Overview of the UDVM

   Decompression functionality for SigComp is provided by a Universal
   Decompressor Virtual Machine (UDVM).  The UDVM is a virtual machine
   much like the Java Virtual Machine but with a key difference:  it is
   designed solely for the purpose of running decompression algorithms.

   The motivation for creating the UDVM is to provide flexibility when
   choosing how to compress a given application message.  Rather than
   picking one of a small number of pre-negotiated algorithms, the
   compressor implementer has the freedom to select an algorithm of
   their choice.  The compressed data is then combined with a set of
   UDVM instructions that allow the original data to be extracted, and
   the result is output as a SigComp message.  Since the UDVM is
   optimized specifically for running decompression algorithms, the code
   size of a typical algorithm is small (often under 100 bytes).
   Moreover, the UDVM approach does not add significant extra processing
   or memory requirements compared to running a fixed preprogrammed
   decompression algorithm.

   Figure 6 gives a detailed view of the interfaces between the UDVM and
   its environment.

   +----------------+                                 +----------------+
   |                |     Request compressed data     |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |     Provide compressed data     |                |
   |                |                                 |                |
   |                |    Output decompressed data     |  Decompressor  |
   |                |-------------------------------->|   dispatcher   |
   |                |                                 |                |
   |                |     Indicate end of message     |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |      UDVM      | Provide compartment identifier  |                |
   |                |                                 +----------------+
   |                |
   |                |                                 +----------------+
   |                |    Request state information    |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |    Provide state information    |     State      |
   |                |                                 |    handler     |
   |                |   Make state creation request   |                |
   |                |-------------------------------->|                |
   |                |  Forward feedback information   |                |
   +----------------+                                 +----------------+

         Figure 6: Interfaces between the UDVM and its environment

   Note that once the UDVM has been initialized, additional compressed
   data and state information are only provided at the request of a
   specific UDVM instruction.

   This chapter describes the basic features of the UDVM including the
   UDVM registers and the format of UDVM bytecode.

8.1.  UDVM Registers

   The UDVM registers are 2-byte words in the UDVM memory that have
   special tasks, for example specifying the location of the stack used
   by the CALL and RETURN instructions.

   The UDVM registers are illustrated in Figure 7.

                      0             7 8            15
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        byte_copy_left         |  64 - 65
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        byte_copy_right        |  66 - 67
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        input_bit_order        |  68 - 69
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        stack_location         |  70 - 71
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 7: Memory addresses of the UDVM registers

   The MSBs of each register are always stored before the LSBs.  So, for
   example, the MSBs of byte_copy_left are stored at Address 64 whilst
   the LSBs are stored at Address 65.
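
   A minimal, non-normative sketch of this byte order is given below.
   It assumes the UDVM memory is modelled as a Python bytearray; the
   helper names read_register and write_register are hypothetical.

```python
# Addresses of the UDVM registers, as shown in Figure 7.
REGISTER_ADDRESSES = {
    "byte_copy_left": 64,
    "byte_copy_right": 66,
    "input_bit_order": 68,
    "stack_location": 70,
}


def read_register(memory: bytearray, name: str) -> int:
    """Read a 2-byte register: MSBs at the lower address, LSBs after."""
    addr = REGISTER_ADDRESSES[name]
    return (memory[addr] << 8) | memory[addr + 1]


def write_register(memory: bytearray, name: str, value: int) -> None:
    """Store a 16-bit value with the MSBs first."""
    addr = REGISTER_ADDRESSES[name]
    memory[addr] = (value >> 8) & 0xFF
    memory[addr + 1] = value & 0xFF
```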

   The use of each UDVM register is defined in the following sections.

   (Note that the UDVM registers start at Address 64, that is 32 bytes
   after the area reserved for Useful Values.  The intention is that the
   gap, i.e., the area between Address 32 and Address 63, will often be
   used as scratch-pad memory that is guaranteed to be zero at UDVM
   startup and is efficiently addressable in operand types reference ($)
   and multitype (%).)

8.2.  Requesting Additional Compressed Data

   The decompressor dispatcher stores the compressed data from the
   SigComp message before it is requested by the UDVM via one of the
   INPUT instructions.  When the UDVM bytecode is first executed, the
   dispatcher contains the remaining SigComp message after the header
   has been used to initialize the UDVM as per Chapter 7.

   Note that the INPUT-BITS and INPUT-HUFFMAN instructions retrieve a
   stream of individual compressed bits from the dispatcher.  To provide
   bitwise compatibility with various well-known compression algorithms,
   the input_bit_order register can modify the order in which individual
   bits are passed within a byte.

   The input_bit_order register contains the following three flags:

                      0             7 8            15
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |         reserved        |F|H|P|  68 - 69
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   The P-bit controls the order in which bits are passed from the
   dispatcher to the INPUT instructions.  If set to 0, it indicates that
   the bits within an individual byte are passed to the INPUT
   instructions in MSB to LSB order.  If it is set to 1, the bits are
   passed in LSB to MSB order.

   Note that the input_bit_order register cannot change the order in
   which the bytes themselves are passed to the INPUT instructions
   (bytes are always passed in the same order as they occur in the
   SigComp message).

   The following diagram illustrates the order in which bits are passed
   to the INPUT instructions for both cases:

    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Byte 0           Byte 1              Byte 0          Byte 1

                 P = 0                               P = 1

   Note that after one or more INPUT instructions the dispatcher may
   hold a fraction of a byte (what used to be the LSBs if P = 0, or the
   MSBs if P = 1).  If an INPUT instruction is encountered and the
   P-bit has changed since the last INPUT instruction, any fraction of a
   byte still held by the dispatcher MUST be discarded (even if the
   INPUT instruction requests zero bits).  The first bit passed to the
   INPUT instruction is taken from the subsequent byte.

   When an INPUT instruction requests n bits of compressed data, it
   interprets the received bits as an integer between 0 and 2^n - 1.
   The F-bit and the H-bit specify whether the bits in these integers
   are considered to arrive in MSB to LSB order (bit set to 0) or in LSB
   to MSB order (bit set to 1).

   If the F-bit is set to 0, the INPUT-BITS instruction interprets the
   received bits as arriving MSBs first, and if it is set to 1, it
   interprets the bits as arriving LSBs first.  The H-bit performs the
   same function for the INPUT-HUFFMAN instruction.  Note that it is
   possible to set these two bits to different values in order to use
   different bit orders for the two instructions (certain algorithms
   actually require this, e.g., DEFLATE [RFC-1951]).  (Note that there
   are no special considerations for changing the F- or H-bit between
   INPUT instructions, unlike the discard rule for the P-bit described
   above.)

   Decompression failure occurs if an INPUT-BITS or an INPUT-HUFFMAN
   instruction is encountered and the input_bit_order register does not
   lie between 0 and 7 inclusive.
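
   The interaction of the P-, F- and H-bits can be sketched as follows.
   This is a non-normative illustration under stated assumptions: the
   class name BitSource is hypothetical, ValueError stands in for a
   decompression failure, and EOFError is only a placeholder for the
   case where too little compressed data remains (the behavior of the
   INPUT instructions in that case is defined with the instructions
   themselves and is not modelled here).

```python
class BitSource:
    """Hypothetical model of the dispatcher's bit-level input buffer."""

    def __init__(self, data: bytes):
        self.data = data
        self.byte_pos = 0    # index of the next unread byte
        self.pending = []    # unread bits of a partially consumed byte
        self.last_p = None   # P-bit seen by the previous INPUT instruction

    def read(self, n: int, input_bit_order: int, huffman: bool = False) -> int:
        if not 0 <= input_bit_order <= 7:
            raise ValueError("decompression failure: bad input_bit_order")
        p = input_bit_order & 1
        h = (input_bit_order >> 1) & 1
        f = (input_bit_order >> 2) & 1
        if self.last_p is not None and p != self.last_p:
            # P-bit changed: discard the held fraction of a byte,
            # even when zero bits are requested.
            self.pending = []
        self.last_p = p
        available = len(self.pending) + 8 * (len(self.data) - self.byte_pos)
        if n > available:
            raise EOFError("not enough compressed data")  # placeholder only
        received = []
        while len(received) < n:
            if not self.pending:
                byte = self.data[self.byte_pos]
                self.byte_pos += 1
                # P = 0 passes bits MSB to LSB; P = 1 passes LSB to MSB.
                shifts = range(7, -1, -1) if p == 0 else range(8)
                self.pending = [(byte >> s) & 1 for s in shifts]
            received.append(self.pending.pop(0))
        # F applies to INPUT-BITS, H to INPUT-HUFFMAN.
        lsb_first = h if huffman else f
        if lsb_first:
            received.reverse()
        value = 0
        for bit in received:
            value = (value << 1) | bit
        return value
```

   With the single input byte 10110000 and P = 0, a request for 4 bits
   yields 11 when F = 0 and 13 when F = 1, since the same four bits
   (1, 0, 1, 1) are read in opposite orders.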

8.3.  UDVM Stack

   Certain UDVM instructions make use of a stack of 2-byte words stored
   at the memory address specified by the 2-byte word stack_location.
   The stack contains the following words:

               Name:                 Starting memory address:

               stack_fill            stack_location
               stack[0]              stack_location + 2
               stack[1]              stack_location + 4
               stack[2]              stack_location + 6
                  :                       :

   The notation stack_location is an abbreviation for the contents of
   the stack_location register, i.e., the 2-byte word at locations 70
   and 71.  The notation stack_fill is an abbreviation for the 2-byte
   word at stack_location and stack_location+1.  Similarly, the notation
   stack[n] is an abbreviation for the 2-byte word at
   stack_location+2*n+2 and stack_location+2*n+3.  (As always, the
   arithmetic is modulo 2^16.)

   The stack is used by the CALL, RETURN, PUSH and POP instructions.

   "Pushing" a value on the stack is an abbreviation for copying the
   value to stack[stack_fill] and then increasing stack_fill by 1.  CALL
   and PUSH push values on the stack.

   "Popping" a value from the stack is an abbreviation for decreasing
   stack_fill by 1, and then using the value stored in
   stack[stack_fill].  Decompression failure occurs if stack_fill is
   zero at the commencement of a popping operation.  POP and RETURN pop
   values from the stack.

   For both of these abstract operations, the UDVM first takes note of
   the current value of stack_location and uses this value for both
   sub-operations (accessing the stack and manipulating stack_fill),
   i.e., overwriting stack_location in the course of the operation is
   inconsequential for the operation.
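
   The pushing and popping operations can be sketched as below.  This is
   a non-normative illustration that assumes the UDVM memory is a Python
   bytearray; the function names are hypothetical, ValueError models the
   decompression failure on an empty stack, and an access beyond the
   bytearray surfaces as Python's IndexError rather than as an explicit
   decompression failure.

```python
MASK = 0xFFFF  # all address arithmetic is modulo 2^16


def _read_word(memory: bytearray, addr: int) -> int:
    return (memory[addr & MASK] << 8) | memory[(addr + 1) & MASK]


def _write_word(memory: bytearray, addr: int, value: int) -> None:
    memory[addr & MASK] = (value >> 8) & 0xFF
    memory[(addr + 1) & MASK] = value & 0xFF


def push(memory: bytearray, value: int) -> None:
    # stack_location is sampled once and used for both sub-operations.
    stack_location = _read_word(memory, 70)
    stack_fill = _read_word(memory, stack_location)
    _write_word(memory, stack_location + 2 * stack_fill + 2, value)
    _write_word(memory, stack_location, (stack_fill + 1) & MASK)


def pop(memory: bytearray) -> int:
    stack_location = _read_word(memory, 70)
    stack_fill = _read_word(memory, stack_location)
    if stack_fill == 0:
        raise ValueError("decompression failure: pop from empty stack")
    stack_fill -= 1
    _write_word(memory, stack_location, stack_fill)
    return _read_word(memory, stack_location + 2 * stack_fill + 2)
```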

8.4.  Byte copying

   A number of UDVM instructions require a string of bytes to be copied
   to and from areas of the UDVM memory.  This section defines how the
   byte copying operation should be performed.

   The string of bytes is copied in ascending order of memory address,
   respecting the bounds set by byte_copy_left and byte_copy_right.
   More precisely, if a byte is copied from/to Address m then the next
   byte is copied from/to Address n where n is calculated as follows:

   Set k := m + 1 (modulo 2^16)
   If k = byte_copy_right then set n := byte_copy_left, else set n := k

   Decompression failure occurs if a byte is copied from/to an address
   beyond the UDVM memory.

   Note that the string of bytes is copied one byte at a time.  In
   particular, some of the later bytes to be copied may themselves have
   been written into the UDVM memory by the byte copying operation
   currently being performed.

   Equally, it is possible for a byte copying operation to overwrite the
   instruction that invoked the byte copy.  If this occurs, then the
   byte copying operation MUST be completed as if the original
   instruction were still in place in the UDVM memory (this also applies
   if byte_copy_left or byte_copy_right are overwritten).
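
   The address rule and the byte-at-a-time behavior described above can
   be sketched as follows.  This is a non-normative illustration: the
   function names are hypothetical, the memory is assumed to be a Python
   bytearray, and ValueError models a decompression failure.  The values
   of byte_copy_left and byte_copy_right are passed in as arguments so
   that they stay fixed for the duration of the copy.

```python
def next_copy_address(m: int, byte_copy_left: int, byte_copy_right: int) -> int:
    """Return the address that follows Address m in a byte copy."""
    k = (m + 1) % 65536
    return byte_copy_left if k == byte_copy_right else k


def copy_bytes(memory: bytearray, source: int, destination: int, length: int,
               byte_copy_left: int, byte_copy_right: int) -> None:
    """Copy one byte at a time, applying the wrap rule to both addresses."""
    for _ in range(length):
        if source >= len(memory) or destination >= len(memory):
            raise ValueError("decompression failure: beyond UDVM memory")
        # A later source byte may have been written earlier in this loop.
        memory[destination] = memory[source]
        source = next_copy_address(source, byte_copy_left, byte_copy_right)
        destination = next_copy_address(destination, byte_copy_left,
                                        byte_copy_right)
```

   Copying 6 bytes from Address 0 to Address 2 in a memory that starts
   with "ab" therefore produces "abababab", because the copy reads bytes
   that it has itself just written.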

   Byte copying is used by the following UDVM instructions:

   SHA-1, COPY, COPY-LITERAL, COPY-OFFSET, MEMSET, INPUT-BYTES, STATE-
   ACCESS, OUTPUT, END-MESSAGE

8.5.  Instruction operands and UDVM bytecode

   Each of the UDVM instructions in a piece of UDVM bytecode is
   represented by a single byte, followed by 0 or more bytes containing
   the operands required by the instruction.

   During instruction execution, conceptually the UDVM first fetches the
   first byte of the instruction, determines the number and types of
   operands required for this instruction, and then decodes all the
   operands in sequence before starting to act on the instruction.
   (Note that the UDVM instructions have been designed in such a way
   that this sequence remains conceptual in those cases where it would
   result in an unreasonable burden on the implementation.)

   To reduce the size of typical UDVM bytecode, each operand for a UDVM
   instruction is compressed using variable-length encoding.  The aim is
   to store more common operand values using fewer bytes than rarely
   occurring values.

   Four different types of operand are available: the literal, the
   reference, the multitype and the address.  Chapter 9 gives a complete
   list of UDVM instructions and the operand types that follow each
   instruction.

   The UDVM bytecode for each operand type is illustrated in Figure 8 to
   Figure 10, together with the integer values represented by the
   bytecode.

   Note that the MSBs in the bytecode are illustrated as preceding the
   LSBs.  Also, any string of bits marked with k consecutive "n"s is to
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the
   MSBs of n illustrated as preceding the LSBs).

   The decoded integer value of the bytecode can be interpreted in two
   ways.  In some cases it is taken to be the actual value of the
   operand.  In other cases it is taken to be a memory address at which
   the 2-byte operand value can be found (MSBs found at the specified
   address, LSBs found at the following address).  The latter cases are
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X.

   The simplest operand type is the literal (#), which encodes a
   constant integer from 0 to 65535 inclusive.  A literal operand may
   require between 1 and 3 bytes depending on its value.

   Bytecode:                       Operand value:      Range:

   0nnnnnnn                        N                   0 - 127
   10nnnnnn nnnnnnnn               N                   0 - 16383
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535

               Figure 8: Bytecode for a literal (#) operand
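
   A non-normative sketch of decoding a literal operand per Figure 8 is
   given below.  The function name decode_literal, its return
   convention (value, number of bytes consumed) and the use of
   ValueError for an unknown operand are assumptions of this example.

```python
def decode_literal(memory: bytes, addr: int) -> tuple:
    """Decode a literal (#) operand starting at addr.

    Returns (operand value, number of bytecode bytes consumed).
    """
    first = memory[addr]
    if (first & 0x80) == 0x00:       # 0nnnnnnn
        return first, 1
    if (first & 0xC0) == 0x80:       # 10nnnnnn nnnnnnnn
        return ((first & 0x3F) << 8) | memory[addr + 1], 2
    if first == 0xC0:                # 11000000 nnnnnnnn nnnnnnnn
        return (memory[addr + 1] << 8) | memory[addr + 2], 3
    raise ValueError("decompression failure: unknown operand")
```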

   The second operand type is the reference ($), which is always used to
   access a 2-byte value located elsewhere in the UDVM memory.  The
   bytecode for a reference operand is decoded to be a constant integer
   from 0 to 65535 inclusive, which is interpreted as the memory address
   containing the actual value of the operand.

   Bytecode:                       Operand value:      Range:

   0nnnnnnn                        memory[2 * N]       0 - 65535
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

              Figure 9: Bytecode for a reference ($) operand

   Note that the range of a reference operand is always 0 - 65535
   independently of how many bits are used to encode the reference,
   because the operand always references a 2-byte value in the memory.
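
   A non-normative sketch of decoding a reference operand per Figure 9
   follows.  The function name and the returned triple (referenced
   address, 2-byte value, bytes consumed) are assumptions of this
   example; ValueError models an unknown operand, and a reference that
   points beyond the bytearray surfaces as Python's IndexError.

```python
def decode_reference(memory: bytearray, addr: int) -> tuple:
    """Decode a reference ($) operand starting at addr.

    Returns (referenced address, 2-byte value found there, bytes consumed).
    """
    first = memory[addr]
    if (first & 0x80) == 0x00:       # 0nnnnnnn -> memory[2 * N]
        target, size = 2 * first, 1
    elif (first & 0xC0) == 0x80:     # 10nnnnnn nnnnnnnn -> memory[2 * N]
        target, size = 2 * (((first & 0x3F) << 8) | memory[addr + 1]), 2
    elif first == 0xC0:              # 11000000 nnnnnnnn nnnnnnnn -> memory[N]
        target, size = (memory[addr + 1] << 8) | memory[addr + 2], 3
    else:
        raise ValueError("decompression failure: unknown operand")
    # The operand value is the 2-byte word at the target, MSBs first.
    value = (memory[target] << 8) | memory[target + 1]
    return target, value, size
```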

   The third kind of operand is the multitype (%), which can be used to
   encode both actual values and memory addresses.  The multitype
   operand also offers efficient encoding for small integer values (both
   positive and negative) and for powers of 2.

   Bytecode:                       Operand value:      Range:

   00nnnnnn                        N                   0 - 63
   01nnnnnn                        memory[2 * N]       0 - 65535
   1000011n                        2 ^ (N + 6)        64 , 128
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768
   111nnnnn                        N + 65504       65504 - 65535
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535
   101nnnnn nnnnnnnn               N                   0 - 8191
   110nnnnn nnnnnnnn               memory[N]           0 - 65535
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

              Figure 10: Bytecode for a multitype (%) operand
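
   The rows of Figure 10 can be decoded as in the following
   non-normative sketch.  The function name decode_multitype and its
   return convention (value, bytes consumed) are assumptions of this
   example, and ValueError models the "unknown operand" failure for the
   first-byte values that Figure 10 does not define (10000010 to
   10000101).

```python
def decode_multitype(memory: bytes, addr: int) -> tuple:
    """Decode a multitype (%) operand; returns (value, bytes consumed)."""

    def word(a: int) -> int:
        # 2-byte value at Address a, MSBs first.
        return (memory[a] << 8) | memory[a + 1]

    b = memory[addr]
    if b < 0x40:                         # 00nnnnnn
        return b, 1
    if b < 0x80:                         # 01nnnnnn -> memory[2 * N]
        return word(2 * (b & 0x3F)), 1
    if b == 0x80:                        # 10000000 nnnnnnnn nnnnnnnn
        return word(addr + 1), 3
    if b == 0x81:                        # 10000001 ... -> memory[N]
        return word(word(addr + 1)), 3
    if b in (0x86, 0x87):                # 1000011n -> 2 ^ (N + 6)
        return 1 << ((b & 0x01) + 6), 1
    if 0x88 <= b <= 0x8F:                # 10001nnn -> 2 ^ (N + 8)
        return 1 << ((b & 0x07) + 8), 1
    if 0x90 <= b <= 0x9F:                # 1001nnnn nnnnnnnn -> N + 61440
        return (((b & 0x0F) << 8) | memory[addr + 1]) + 61440, 2
    if 0xA0 <= b <= 0xBF:                # 101nnnnn nnnnnnnn -> N
        return ((b & 0x1F) << 8) | memory[addr + 1], 2
    if 0xC0 <= b <= 0xDF:                # 110nnnnn nnnnnnnn -> memory[N]
        return word(((b & 0x1F) << 8) | memory[addr + 1]), 2
    if b >= 0xE0:                        # 111nnnnn -> N + 65504
        return (b & 0x1F) + 65504, 1
    raise ValueError("decompression failure: unknown operand")
```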

   The fourth operand type is the address (@).  This operand is decoded
   as a multitype operand followed by a further step: the memory address
   of the UDVM instruction containing the address operand is added to
   obtain the correct operand value.  So if the operand value from
   Figure 10 is D then the actual operand value of an address is
   calculated as follows:

   operand_value = (memory_address_of_instruction + D) modulo 2^16

   Address operands are always used in instructions that control program
   flow, because they ensure that the UDVM bytecode is position-
   independent code (i.e., it will run independently of where it is
   placed in the UDVM memory).
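
   The final step can be sketched as below (non-normative; the function
   name is hypothetical).  Because the sum is taken modulo 2^16, a
   decoded value D close to 65535 acts as a small backward offset.

```python
def resolve_address_operand(memory_address_of_instruction: int, d: int) -> int:
    """Turn the decoded multitype value D into an absolute address."""
    return (memory_address_of_instruction + d) % 65536
```

   For instance, for an instruction located at Address 100, D = 7
   yields 107, whereas D = 65535 yields 99.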

8.6.  UDVM Cycles

   Once the UDVM has been invoked it executes the instructions contained
   in its memory consecutively unless otherwise indicated (for example
   when the UDVM encounters a JUMP instruction).  If the next
   instruction to be executed lies outside the available memory then
   decompression failure occurs (see Section 8.7).

   To ensure that a SigComp message cannot consume excessive processing
   resources, SigComp limits the number of "UDVM cycles" allocated to
   each message.  The number of available UDVM cycles is initialized to
   1000 plus the number of bits in the SigComp header (as described in
   Section 7); this sum is then multiplied by cycles_per_bit.  Each time
   an instruction is executed the number of available UDVM cycles is
   decreased by the amount specified in Chapter 9.  Additionally, if the
   UDVM successfully requests n bits of compressed data using one of the
   INPUT instructions then the number of available UDVM cycles is
   increased by n * cycles_per_bit once the instruction has been
   executed.

   This means that the maximum number of UDVM cycles available for
   processing an n-byte SigComp message is given by the formula:

           maximum_UDVM_cycles = (8 * n + 1000) * cycles_per_bit

   The reason that this total is not allocated to the UDVM when it is
   invoked is that the UDVM can begin to decompress a message that has
   only been partially received.  So the total message size may not be
   known when the UDVM is initialized.

   Note that the number of UDVM cycles MUST NOT be increased if a
   request for additional compressed data fails.
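
   The accounting described in this section can be sketched as follows.
   This is a non-normative illustration: the class name CycleBudget and
   its method names are hypothetical, and ValueError models the
   decompression failure that occurs when an instruction costs more
   than the remaining cycles.

```python
class CycleBudget:
    """Hypothetical tracker for the UDVM cycles available to a message."""

    def __init__(self, header_bits: int, cycles_per_bit: int):
        self.cycles_per_bit = cycles_per_bit
        # (1000 + number of bits in the SigComp header) * cycles_per_bit
        self.available = (1000 + header_bits) * cycles_per_bit

    def charge(self, cost: int) -> None:
        """Deduct the cost of one instruction."""
        if cost > self.available:
            raise ValueError("decompression failure: out of UDVM cycles")
        self.available -= cost

    def credit_input(self, bits_read: int) -> None:
        """Call only after a successful request for compressed data."""
        self.available += bits_read * self.cycles_per_bit
```

   For a 10-byte message with a 2-byte header and cycles_per_bit = 16,
   the initial budget is 1016 * 16 = 16256.  Reading the remaining 64
   bits adds 64 * 16 = 1024, for a total of 17280, which matches
   (8 * 10 + 1000) * 16.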

   The UDVM stops executing instructions when it encounters an END-
   MESSAGE instruction or if decompression failure occurs (see Section
   8.7 for further details).

8.7.  Decompression Failure

   If a compressed message given to the UDVM is corrupted (either
   accidentally or maliciously), then the UDVM may terminate with a
   decompression failure.

   Reasons for decompression failure include the following:

   1. A SigComp message contains an invalid header as per Chapter 7.

   2. A SigComp message is larger than the decompression_memory_size.

   3. An instruction costs more than the number of remaining UDVM
      cycles.

   4. The UDVM attempts to read from or write to a memory address beyond
      its memory size.

   5. An unknown instruction is encountered.

   6. An unknown operand is encountered.

   7. An instruction is encountered that cannot be processed
      successfully by the UDVM (for example a RETURN instruction when no
      CALL instruction has previously been encountered).

   8. A request to access some state information fails.

   9. A manual decompression failure is triggered using the
      DECOMPRESSION-FAILURE instruction.

   If a decompression failure occurs when decompressing a message then
   the UDVM informs the dispatcher and takes no further action.  It is
   the responsibility of the dispatcher to decide how to cope with the
   decompression failure.  In general a dispatcher SHOULD discard the
   compressed message (or the compressed stream if the transport is
   stream-based) and any decompressed data that has been output but
   not yet passed to the application.
