Tech-invite3GPPspaceIETFspace
9796959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 3320

Signaling Compression (SigComp)

Pages: 62
Proposed Standard
Updated by:  4896
Part 2 of 3 – Pages 15 to 37
First   Prev   Next

Top   ToC   RFC3320 - Page 15   prevText

4. SigComp Dispatchers

This chapter defines the behavior of the compressor and decompressor dispatcher. The function of these entities is to provide an interface between SigComp and its environment, minimizing the effort needed to integrate SigComp into an existing protocol stack.

4.1. Compressor Dispatcher

The compressor dispatcher receives messages from the application and passes the compressed version of each message to the transport layer. Note that SigComp invokes compressors on a per-compartment basis, so when the application provides a message to be compressed it must also provide a compartment identifier. The compressor dispatcher forwards the application message to the correct compressor based on the compartment identifier (invoking a new compressor if a new compartment identifier is encountered). The compressor returns a SigComp message that can be passed to the transport layer. Additionally, the application should indicate to the compressor dispatcher when it wishes to close a particular compartment, so that the resources taken by the corresponding compressor can be reclaimed.

4.2. Decompressor Dispatcher

The decompressor dispatcher receives messages from the transport layer and passes the decompressed version of each message to the application. To ensure that SigComp can run over an unsecured transport layer, the decompressor dispatcher invokes a new instance of the UDVM for each new SigComp message. Resources for the UDVM are released as soon as the message has been decompressed. The dispatcher MUST NOT make more than one SigComp message available to a given instance of the UDVM. In particular, the dispatcher MUST NOT concatenate two SigComp messages to form a single message.
Top   ToC   RFC3320 - Page 16

4.2.1. Decompressor Dispatcher Strategies

Once the UDVM has been invoked it is initialized using the SigComp message of Chapter 7. The message is then decompressed by the UDVM, returned to the decompressor dispatcher, and passed on to the receiving application. Note that the UDVM has no awareness of whether the underlying transport is message-based or stream-based, and so it always outputs decompressed data as a stream. It is the responsibility of the dispatcher to provide the decompressed message to the application in the expected form (i.e., as a stream or as a distinct, bounded message). The dispatcher knows that the end of a decompressed message has been reached when the UDVM instruction END- MESSAGE is invoked (see Section 9.4.9). For a stream-based transport, two strategies are therefore possible for the decompressor dispatcher: 1) The dispatcher collects a complete SigComp message and then invokes the UDVM. The advantage is that, even in implementations that have multiple incoming compressed streams, only one instance of the UDVM is ever required. 2) The dispatcher collects the SigComp header (see Section 7) and invokes the UDVM; the UDVM stays active while the rest of the message arrives. The advantage is that there is no need to buffer up the rest of the message; the message can be decompressed as it arrives, and any decompressed output can be relayed to the application immediately. In general, which of the strategies is used is an implementation choice. However, the compressor may want to take advantage of strategy 2 by expecting that some of the application message is passed on to the application before the SigComp message is terminated, e.g., by keeping the UDVM active while expecting the application to continuously receive decompressed output. This approach ("continuous mode") invalidates some assumptions of the SigComp security model and can only be used if the transport itself can provide the required protection against denial of service attacks. Also, since only strategy 2 works in this approach, the use of continuous mode requires previous agreement between the two endpoints.

4.2.2. Record Marking

For a stream-based transport, the dispatcher delimits messages by parsing the compressed data stream for instances of 0xFF and taking the following actions:
Top   ToC   RFC3320 - Page 17
   Occurs in data stream:     Action:

   0xFF 00                    one 0xFF byte in the data stream
   0xFF 01                    same, but the next byte is quoted (could
                              be another 0xFF)
      :                                           :
   0xFF 7F                    same, but the next 127 bytes are quoted
   0xFF 80 to 0xFF FE         (reserved for future standardization)
   0xFF FF                    end of SigComp message

   The combinations 0xFF01 to 0xFF7F are useful to limit the worst case
   expansion of the record marking scheme:  the 1 (0xFF01) to 127
   (0xFF7F) bytes following the byte combination are copied literally by
   the decompressor without taking any special action on 0xFF.  (Note
   that 0xFF00 is just a special case of this, where zero following
   bytes are copied literally.)

   In UDVM version 0x01, any occurrence of the combinations 0xFF80 to
   0xFFFE that are not protected by quoting causes decompression
   failure; the decompressor SHOULD close the stream-based transport in
   this case.

4.3. Returning a Compartment Identifier

Upon receiving a decompressed message the application may supply the dispatcher with a compartment identifier. Supplying this identifier grants permission for the following: 1. Items of state accompanying the decompressed message can be saved using the state memory reserved for the specified compartment. 2. The feedback data accompanying the decompressed message can be trusted sufficiently that it can be used when sending SigComp messages that relate to the compressor's equivalent for the compartment. The dispatcher passes the compartment identifier to the UDVM, where it is used as per the END-MESSAGE instruction (see Section 9.4.9). The application uses a suitable authentication mechanism to determine whether the decompressed message belongs to a legitimate compartment or not. If the application fails to authenticate the message with sufficient confidence to allow state to be saved or feedback data to be trusted, it supplies a "no valid compartment" error to the dispatcher and the UDVM is terminated without creating any state or forwarding any feedback data.
Top   ToC   RFC3320 - Page 18

5. SigComp Compressor

An important feature of SigComp is that decompression functionality is provided by a Universal Decompressor Virtual Machine (UDVM). This means that the compressor can choose any algorithm to generate compressed SigComp messages, and then upload bytecode for the corresponding decompression algorithm to the UDVM as part of the SigComp message. To help with the implementation and testing of a SigComp endpoint, further Internet Documents and RFCs may be published to describe particular compression algorithms. The overall requirement placed on the compressor is that of transparency, i.e., the compressor MUST NOT send bytecode which causes the UDVM to incorrectly decompress a given SigComp message. The following more specific requirements are also placed on the compressor (they can be considered particular instances of the transparency requirement): 1. For robustness, it is recommended that the compressor supply some form of integrity check (not necessarily of cryptographic strength) over the application message to ensure that successful decompression has occurred. A UDVM instruction is provided for CRC verification; also, another instruction can be used to compute a SHA-1 cryptographic hash. 2. The compressor MUST ensure that the message can be decompressed using the resources available at the remote endpoint. 3. If the transport is message-based, then the compressor MUST map each application message to exactly one SigComp message. 4. If the transport is stream-based but the application defines its own internal message boundaries, then the compressor SHOULD map each application message to exactly one SigComp message. Message boundaries should be preserved over a stream-based transport so that accidental or malicious damage to one SigComp message does not affect the decompression of subsequent messages. Additionally, if the state handler passes some requested feedback to the compressor, then it SHOULD be returned in the next SigComp message generated by the compressor (unless the state handler passes some newer requested feedback before the older feedback has been sent, in which case the older feedback is deleted).
Top   ToC   RFC3320 - Page 19
   If present, the requested feedback item SHOULD be copied unmodified
   into the returned_feedback_item field provided in the SigComp
   message.  Note that there is no need to transmit any requested
   feedback item more than once.

   The compressor SHOULD also upload the local SigComp parameters to the
   remote endpoint, unless the endpoint has indicated that it does not
   wish to receive these parameters or the compressor determines that
   the parameters have already successfully arrived (see Section 5.1 for
   details of how this can be achieved).  The SigComp parameters are
   uploaded to the UDVM memory at the remote endpoint as described in
   Section 9.4.9.

5.1. Ensuring Successful Decompression

A compressor MUST be certain that all of the data needed to decompress a SigComp message is available at the receiving endpoint. One way to ensure this is to send all of the needed information in every SigComp message (including bytecode to decompress the message). However, the compression ratio for this method will be relatively low. To obtain the best overall compression ratio the compressor needs to request the creation of new state items at the remote endpoint. The information saved in these state items can then be accessed by later SigComp messages, avoiding the need to upload the data on a per- message basis. Before the compressor can access saved state however, it must ensure that the SigComp message carrying the state creation request arrived successfully at the receiving endpoint. For a reliable transport (e.g., TCP or SCTP) this is guaranteed. For an unreliable transport however, the compressor must provide a suitable mechanism itself (see [RFC-3321] for further details). The compressor must also ensure that the state item it wishes to access has not been rejected due to a lack of state memory. This can be accomplished by checking the state_memory_size parameter using the SigComp feedback mechanism (see Section 9.4.9 for further details).

5.2. Compression Failure

The compressor SHOULD make every effort to successfully compress an application message, but in certain cases this might not be possible (particularly if resources are scarce at the receiving endpoint). In this case a "compression failure" is called.
Top   ToC   RFC3320 - Page 20
   If a compression failure occurs then the compressor informs the
   dispatcher and takes no further action.  The dispatcher MUST report
   this failure to the application so that it can try other methods to
   deliver the message.

6. State Handling and Feedback

This chapter defines the behavior of the SigComp state handler. The function of the state handler is to retain information between received SigComp messages; it is the only SigComp entity that is capable of this function, and so it is of particular importance from a security perspective.

6.1. Creating and Accessing State

To provide security against the malicious insertion or modification of SigComp messages, a separate instance of the UDVM is invoked to decompress each message. This ensures that damaged SigComp messages do not prevent the successful decompression of subsequent valid messages. Note, however, that the overall compression ratio is often significantly higher if messages can be compressed relative to the information contained in previous messages. For this reason, it is possible to create state items for access when a later message is being decompressed. Both the creation and access of state are designed to be secure against malicious tampering with the compressed data. The UDVM can only create a state item when a complete message has been successfully decompressed and the application has returned a compartment identifier under which the state can be saved. State access cannot be protected by relying on the application alone, since the authentication mechanism may require information from the decompressed message (which of course is not available until after the state has been accessed). Instead, SigComp protects state access by creating a state identifier that is a hash over the item of state to be retrieved. This state_identifier must be supplied to retrieve an item of state from the state handler. Also note that state is not deleted when it is accessed. So even if a malicious sender manages to access some state information, subsequent messages compressed relative to this state can still be successfully decompressed. Each state item contains a state_identifier that is used to access the state. One state identifier can be supplied in the SigComp message header to initialize the UDVM (see Chapter 7); additional state items can be retrieved using the STATE-ACCESS instruction. The
Top   ToC   RFC3320 - Page 21
   UDVM can also request the creation of a new state item by using the
   STATE-CREATE and END-MESSAGE instructions (see Chapter 9 for further
   details).

6.2. Memory Management

The state handler manages state memory on a per-compartment basis. Each compartment can store state up to a certain state_memory_size (where the application may assign different values for the state_memory_size parameter to different compartments). As well as storing the state items themselves, the state handler maintains a list of the state items created by a particular compartment and ensures that no compartment exceeds its allocated state_memory_size. For the purpose of calculation, each state item is considered to cost (state_length + 64) bytes. Each instance of the UDVM can pass up to four state creation requests to the state handler, as well as up to four state free requests (the latter are requests to free the memory taken by a state item in a certain compartment). When the state handler receives a state creation request from the UDVM it takes the following steps: 1. The state handler MUST reject all state creation requests that are not accompanied by a valid compartment identifier, or if the compartment is allocated 0 bytes of state memory. Note that if a state creation request fails due to lack of state memory then it does not mean that the corresponding SigComp message is damaged; compressors will often make state creation requests in the first SigComp message of a compartment, before they have discovered the state_memory_size using the SigComp feedback mechanism. 2. If the state creation request needs more state memory than the total state_memory_size for the compartment, the state handler deletes all but the first (state_memory_size - 64) bytes from the state_value. It sets the state_length to (state_memory_size - 64), and recalculates the state_identifier as defined in Section 9.4.9. 3. If the state creation request contains a state_identifier that already exists then the state handler checks whether the requested state item is identical to the established state item and counts the state creation request as successful if this is the case. If not then the state creation request is unsuccessful (although the probability that this will occur is vanishingly small).
Top   ToC   RFC3320 - Page 22
   4. If the state creation request exceeds the state memory allocated
      to the compartment, sufficient items of state created by the same
      compartment are freed until enough memory is available to
      accommodate the new state.  When a state item is freed, it is
      removed from the list of states created by the compartment and the
      memory cost of the state item no longer counts towards the total
      cost for the compartment.  Note, however, that identical state
      items may be created by several different compartments, so a state
      item must not be physically deleted unless the state handler
      determines that it is no longer required by any compartment.

   5. The order in which the existing state items are freed is
      determined by the state_retention_priority, which is set when the
      state items are created.  The state_retention_priority of 65535 is
      reserved for locally available states; these states must always be
      freed first.  Apart from this special case, states with the lowest
      state_retention_priority are always freed first.  In the event of
      a tie, then the state item created first in the compartment is
      also the first to be freed.

   The state_retention_priority is always stored on a per-compartment
   basis as part of the list of state items created by each compartment.
   In particular, the same state item might have several priority values
   if it has been created by several different compartments.

   Note that locally available state items (as described in Section
   3.3.3) need not be mapped to any particular compartment.  However, if
   they are created on a per-compartment basis, then they must not
   interfere with the state created at the request of the remote
   endpoint.  The special state_retention_priority of 65535 is reserved
   for locally available state items to ensure that this is the case.

   The UDVM may also explicitly request the state handler to free a
   specific state item in a compartment.  In this case, the state
   handler deletes the state item from the list of state items created
   by the compartment (as before the state item itself must not be
   physically deleted unless the state handler determines that it is not
   longer required by any compartment).

   The application should indicate to the state handler when it wishes
   to close a particular compartment, so that the resources taken by the
   corresponding state can be reclaimed.
Top   ToC   RFC3320 - Page 23

6.3. Feedback Data

The SigComp feedback mechanism allows feedback data to be received by a UDVM and forwarded via the state handler to the correct compressor. Since this feedback data is retained between SigComp messages, it is considered to be part of the overall state and can only be forwarded if accompanied by a valid compartment identifier. If this is the case, then the state handler forwards the feedback data to the compressor responsible for sending messages that pertain to the peer compartment of the specified compartment.

7. SigComp Message Format

This chapter describes the format of the SigComp message and how the message is used to initialize the UDVM memory. Note that the SigComp message is not copied into the UDVM memory as soon as it arrives; instead, the UDVM indicates when it requires compressed data using a specific instruction. It then pauses and waits for the information to be supplied before executing the next instruction. This means that the UDVM can begin to decompress a SigComp message before the entire message has been received. A consequence of the above behavior is that when the UDVM is invoked, the size of the UDVM memory depends on whether the transport used to provide the SigComp message is stream-based or message-based. If the transport is message-based then sufficient memory must be available to buffer the entire SigComp message before it is passed to the UDVM. So if the message is n bytes long, then the UDVM memory size is set to (decompression_memory_size - n), up to a maximum of 65536 bytes. If the transport is stream-based however, then a fixed-size input buffer is required to accommodate the stream, independently of the size of each SigComp message. So, for simplicity, the UDVM memory size is set to (decompression_memory_size / 2). As a separate instance of the UDVM is invoked on a per-message basis, each SigComp message must explicitly indicate its chosen decompression algorithm as well as any additional information that is needed to decompress the message (e.g., one or more previously received messages, a dictionary of common SIP phrases etc.). This information can either be uploaded as part of the SigComp message or retrieved from an item of state.
Top   ToC   RFC3320 - Page 24
   A SigComp message takes one of two forms depending on whether it
   accesses a state item at the receiving endpoint.  The two variants of
   a SigComp message are given in Figure 3.  (The T-bit controls the
   format of the returned feedback item and is defined in Section 7.1.)

     0   1   2   3   4   5   6   7       0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   | 1   1   1   1   1 | T |  len  |   | 1   1   1   1   1 | T |   0   |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |                               |
   :    returned feedback item     :   :    returned feedback item     :
   |                               |   |                               |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |           code_len            |
   :   partial state identifier    :   +---+---+---+---+---+---+---+---+
   |                               |   |   code_len    |  destination  |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   |                               |   |                               |
   :   remaining SigComp message   :   :    uploaded UDVM bytecode     :
   |                               |   |                               |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
                                       |                               |
                                       :   remaining SigComp message   :
                                       |                               |
                                       +---+---+---+---+---+---+---+---+

                   Figure 3: Format of a SigComp message

   Decompression failure occurs if the SigComp message is too short to
   contain the expected fields (see Section 8.7 for further details).

   The fields except for the "remaining SigComp message" are referred to
   as the "SigComp header" (note that this may include the uploaded UDVM
   bytecode).

7.1. Returned feedback item

For both variants of the SigComp message, the T-bit is set to 1 whenever the SigComp message contains a returned feedback item. The format of the returned feedback item is illustrated in Figure 4.
Top   ToC   RFC3320 - Page 25
     0   1   2   3   4   5   6   7       0   1   2   3   4   5   6   7
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
   | 0 |  returned_feedback_field  |   | 1 | returned_feedback_length  |
   +---+---+---+---+---+---+---+---+   +---+---+---+---+---+---+---+---+
                                       |                               |
                                       :    returned_feedback_field    :
                                       |                               |
                                       +---+---+---+---+---+---+---+---+

                Figure 4: Format of returned feedback item

   Note that the returned feedback length specifies the size of the
   returned feedback field (from 0 to 127 bytes).  So the total size of
   the returned feedback item lies between 1 and 128 bytes.

   The returned feedback item is not copied to the UDVM memory; instead,
   it is buffered until the UDVM has successfully decompressed the
   SigComp message.  It is then forwarded to the state handler with the
   rest of the feedback data (see Section 9.4.9 for further details).

7.2. Accessing Stored State

The len field of the SigComp message determines which fields follow the returned feedback item. If the len field is non-zero, then the SigComp message contains a state identifier to access a state item at the receiving endpoint. All state items include a 20-byte state identifier as per Section 3.3.3, but it is possible to transmit as few as 6 bytes from the identifier if the sender believes that this is sufficient to match a unique state item at the receiving endpoint. The len field encodes the number of transmitted bytes as follows: Encoding: Length of partial state identifier 01 6 bytes 10 9 bytes 11 12 bytes The partial state identifier is passed to the state handler, which compares it with the most significant bytes of the state_identifier in every currently stored state item. Decompression failure occurs if no state item is matched or if more than one state item is matched.
Top   ToC   RFC3320 - Page 26
   Decompression failure also occurs if exactly one state item is
   matched but the state item contains a minimum_access_length greater
   than the length of the partial state identifier.  This prevents
   especially sensitive state items from being accessed maliciously by
   brute force guessing of the state_identifier.

   If a state item is successfully accessed then the state_value byte
   string is copied into the UDVM memory beginning at state_address.

   The first 32 bytes of UDVM memory are then initialized to special
   values as illustrated in Figure 5.

                      0             7 8            15
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |       UDVM_memory_size        |  0 - 1
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        cycles_per_bit         |  2 - 3
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        SigComp_version        |  4 - 5
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |    partial_state_ID_length    |  6 - 7
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |         state_length          |  8 - 9
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |                               |
                     :           reserved            :  10 - 31
                     |                               |
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

            Figure 5: Initializing Useful Values in UDVM memory

   The first five 2-byte words are initialized to contain some values
   that might be useful to the UDVM bytecode (Useful Values).  Note that
   these values are for information only and can be overwritten when
   executing the UDVM bytecode without any effect on the endpoint.  The
   MSBs of each 2-byte word are stored preceding the LSBs.

   Addresses 0 to 5 indicate the resources available to the receiving
   endpoint.  The UDVM memory size is expressed in bytes modulo 2^16, so
   in particular, it is set to 0 if the UDVM memory size is 65536 bytes.
   The cycles_per_bit is expressed as a 2-byte integer taking the value
   16, 32, 64 or 128.  The SigComp_version is expressed as a 2-byte
   value as per Section 3.3.2.

   Addresses 6 to 9 are initialized to the length of the partial state
   identifier, followed by the state_length from the retrieved state
   item.  Both are expressed as 2-byte values.
Top   ToC   RFC3320 - Page 27
   Addresses 10 to 31 are reserved and are initialized to 0 for Version
   0x01 of SigComp.  Future versions of SigComp can use these locations
   for additional Useful Values, so a decompressor MUST NOT rely on
   these values being zero.

   Any remaining addresses in the UDVM memory that have not yet been
   initialized MUST be set to 0.

   The UDVM then begins executing instructions at the memory address
   contained in state_instruction (which is part of the retrieved item
   of state).  Note that the remaining SigComp message is held by the
   decompressor dispatcher until requested by the UDVM.

   (Note that the Useful Values are only set at UDVM startup; there is
   no special significance to this memory area afterwards.  This means
   that the UDVM bytecode is free to use these locations for any other
   purpose a memory location might be used for; it just has to be aware
   they are not necessarily initialized to zero.)

7.3. Uploading UDVM bytecode

If the len field is set to 0 then the bytecode needed to decompress the SigComp message is supplied as part of the message itself. The 12-bit code_len field specifies the size of the uploaded UDVM bytecode (from 0 to 4095 bytes inclusive); eight most significant bits are in the first byte, followed by the four least significant bits in the most significant bits in the second byte. The remaining bits in the second byte are interpreted as a 4-bit destination field that specifies the starting memory address to which the bytecode is copied. The destination field is encoded as follows: Encoding: Destination address: 0000 reserved 0001 2 * 64 = 128 0010 3 * 64 = 196 0011 4 * 64 = 256 : : 1111 16 * 64 = 1024 Note that the encoding 0000 is reserved for future SigComp versions, and causes a decompression failure in Version 0x01.
Top   ToC   RFC3320 - Page 28
   The UDVM memory is initialized as per Figure 5, except that addresses
   6 to 9 inclusive are set to 0 because no state item has been
   accessed.  The UDVM then begins executing instructions at the memory
   address specified by the destination field.  As above, the remaining
   SigComp message is held by the decompressor dispatcher until needed
   by the UDVM.

8. Overview of the UDVM

Decompression functionality for SigComp is provided by a Universal Decompressor Virtual Machine (UDVM). The UDVM is a virtual machine much like the Java Virtual Machine but with a key difference: it is designed solely for the purpose of running decompression algorithms. The motivation for creating the UDVM is to provide flexibility when choosing how to compress a given application message. Rather than picking one of a small number of pre-negotiated algorithms, the compressor implementer has the freedom to select an algorithm of their choice. The compressed data is then combined with a set of UDVM instructions that allow the original data to be extracted, and the result is outputted as a SigComp message. Since the UDVM is optimized specifically for running decompression algorithms, the code size of a typical algorithm is small (often sub 100 bytes). Moreover, the UDVM approach does not add significant extra processing or memory requirements compared to running a fixed preprogrammed decompression algorithm. Figure 6 gives a detailed view of the interfaces between the UDVM and its environment.
Top   ToC   RFC3320 - Page 29
   +----------------+                                 +----------------+
   |                |     Request compressed data     |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |     Provide compressed data     |                |
   |                |                                 |                |
   |                |    Output decompressed data     |  Decompressor  |
   |                |-------------------------------->|   dispatcher   |
   |                |                                 |                |
   |                |     Indicate end of message     |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |      UDVM      | Provide compartment identifier  |                |
   |                |                                 +----------------+
   |                |
   |                |                                 +----------------+
   |                |    Request state information    |                |
   |                |-------------------------------->|                |
   |                |<--------------------------------|                |
   |                |    Provide state information    |     State      |
   |                |                                 |    handler     |
   |                |   Make state creation request   |                |
   |                |-------------------------------->|                |
   |                |  Forward feedback information   |                |
   +----------------+                                 +----------------+

         Figure 6: Interfaces between the UDVM and its environment

   Note that once the UDVM has been initialized, additional compressed
   data and state information are only provided at the request of a
   specific UDVM instruction.

   This chapter describes the basic features of the UDVM including the
   UDVM registers and the format of UDVM bytecode.

8.1. UDVM Registers

The UDVM registers are 2-byte words in the UDVM memory that have special tasks, for example specifying the location of the stack used by the CALL and RETURN instructions. The UDVM registers are illustrated in Figure 7.
Top   ToC   RFC3320 - Page 30
                      0             7 8            15
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        byte_copy_left         |  64 - 65
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        byte_copy_right        |  66 - 67
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        input_bit_order        |  68 - 69
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
                     |        stack_location         |  70 - 71
                     +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

             Figure 7: Memory addresses of the UDVM registers

   The MSBs of each register are always stored before the LSBs.  So, for
   example, the MSBs of byte_copy_left are stored at Address 64 whilst
   the LSBs are stored at Address 65.

   The use of each UDVM register is defined in the following sections.

   (Note that the UDVM registers start at Address 64, that is 32 bytes
   after the area reserved for Useful Values.  The intention is that the
   gap, i.e., the area between Address 32 and Address 63, will often be
   used as scratch-pad memory that is guaranteed to be zero at UDVM
   startup and is efficiently addressable in operand types reference ($)
   and multitype (%).)

8.2. Requesting Additional Compressed Data

The decompressor dispatcher stores the compressed data from the SigComp message before it is requested by the UDVM via one of the INPUT instructions. When the UDVM bytecode is first executed, the dispatcher contains the remaining SigComp message after the header has been used to initialize the UDVM as per Chapter 7. Note that the INPUT-BITS and INPUT-HUFFMAN instructions retrieve a stream of individual compressed bits from the dispatcher. To provide bitwise compatibility with various well-known compression algorithms, the input_bit_order register can modify the order in which individual bits are passed within a byte. The input_bit_order register contains the following three flags: 0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | reserved |F|H|P| 68 - 69 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Top   ToC   RFC3320 - Page 31
   The P-bit controls the order in which bits are passed from the
   dispatcher to the INPUT instructions.  If set to 0, it indicates that
   the bits within an individual byte are passed to the INPUT
   instructions in MSB to LSB order.  If it is set to 1, the bits are
   passed in LSB to MSB order.

   Note that the input_bit_order register cannot change the order in
   which the bytes themselves are passed to the INPUT instructions
   (bytes are always passed in the same order as they occur in the
   SigComp message).

   The following diagram illustrates the order in which bits are passed
   to the INPUT instructions for both cases:

    MSB         LSB MSB         LSB     MSB         LSB MSB         LSB

   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |0 1 2 3 4 5 6 7|8 9 ...        |   |7 6 5 4 3 2 1 0|        ... 9 8|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

        Byte 0           Byte 1              Byte 0          Byte 1

                 P = 0                               P = 1

   Note that after one or more INPUT instructions the dispatcher may
   hold a fraction of a byte (what used to be the LSBs if P = 0, or, the
   MSBs, if P = 1).  If an INPUT instruction is encountered and the P-
   bit has changed since the last INPUT instruction, any fraction of a
   byte still held by the dispatcher MUST be discarded (even if the
   INPUT instruction requests zero bits).  The first bit passed to the
   INPUT instruction is taken from the subsequent byte.

   When an INPUT instruction requests n bits of compressed data, it
   interprets the received bits as an integer between 0 and 2^n - 1.
   The F-bit and the H-bit specify whether the bits in these integers
   are considered to arrive in MSB to LSB order (bit set to 0) or in LSB
   to MSB order (bit set to 1).

   If the F-bit is set to 0, the INPUT-BITS instruction interprets the
   received bits as arriving MSBs first, and if it is set to 1, it
   interprets the bits as arriving LSBs first.  The H-bit performs the
   same function for the INPUT-HUFFMAN instruction.  Note that it is
   possible to set these two bits to different values in order to use
   different bit orders for the two instructions (certain algorithms
   actually require this, e.g., DEFLATE [RFC-1951]).  (Note that there
   are no special considerations for changing the F- or H-bit between
   INPUT instructions, unlike the discard rule for the P-bit described
   above.)
Top   ToC   RFC3320 - Page 32
   Decompression failure occurs if an INPUT-BITS or an INPUT-HUFFMAN
   instruction is encountered and the input_bit_order register does not
   lie between 0 and 7 inclusive.

8.3. UDVM Stack

Certain UDVM instructions make use of a stack of 2-byte words stored at the memory address specified by the 2-byte word stack_location. The stack contains the following words: Name: Starting memory address: stack_fill stack_location stack[0] stack_location + 2 stack[1] stack_location + 4 stack[2] stack_location + 6 : : The notation stack_location is an abbreviation for the contents of the stack_location register, i.e., the 2-byte word at locations 70 and 71. The notation stack_fill is an abbreviation for the 2-byte word at stack_location and stack_location+1. Similarly, the notation stack[n] is an abbreviation for the 2-byte word at stack_location+2*n+2 and stack_location+2*n+3. (As always, the arithmetic is modulo 2^16.) The stack is used by the CALL, RETURN, PUSH and POP instructions. "Pushing" a value on the stack is an abbreviation for copying the value to stack[stack_fill] and then increasing stack_fill by 1. CALL and PUSH push values on the stack. "Popping" a value from the stack is an abbreviation for decreasing stack_fill by 1, and then using the value stored in stack[stack_fill]. Decompression failure occurs if stack_fill is zero at the commencement of a popping operation. POP and RETURN pop values from the stack. For both of these abstract operations, the UDVM first takes note of the current value of stack_location and uses this value for both sub-operations (accessing the stack and manipulating stack_fill), i.e., overwriting stack_location in the course of the operation is inconsequential for the operation.
Top   ToC   RFC3320 - Page 33

8.4. Byte copying

A number of UDVM instructions require a string of bytes to be copied to and from areas of the UDVM memory. This section defines how the byte copying operation should be performed. The string of bytes is copied in ascending order of memory address, respecting the bounds set by byte_copy_left and byte_copy_right. More precisely, if a byte is copied from/to Address m then the next byte is copied from/to Address n where n is calculated as follows: Set k := m + 1 (modulo 2^16) If k = byte_copy_right then set n := byte_copy_left, else set n := k Decompression failure occurs if a byte is copied from/to an address beyond the UDVM memory. Note that the string of bytes is copied one byte at a time. In particular, some of the later bytes to be copied may themselves have been written into the UDVM memory by the byte copying operation currently being performed. Equally, it is possible for a byte copying operation to overwrite the instruction that invoked the byte copy. If this occurs, then the byte copying operation MUST be completed as if the original instruction were still in place in the UDVM memory (this also applies if byte_copy_left or byte_copy_right are overwritten). Byte copying is used by the following UDVM instructions: SHA-1, COPY, COPY-LITERAL, COPY-OFFSET, MEMSET, INPUT-BYTES, STATE- ACCESS, OUTPUT, END-MESSAGE

8.5. Instruction operands and UDVM bytecode

Each of the UDVM instructions in a piece of UDVM bytecode is represented by a single byte, followed by 0 or more bytes containing the operands required by the instruction. During instruction execution, conceptually the UDVM first fetches the first byte of the instruction, determines the number and types of operands required for this instruction, and then decodes all the operands in sequence before starting to act on the instruction. (Note that the UDVM instructions have been designed in such a way that this sequence remains conceptual in those cases where it would result in an unreasonable burden on the implementation.)
Top   ToC   RFC3320 - Page 34
   To reduce the size of typical UDVM bytecode, each operand for a UDVM
   instruction is compressed using variable-length encoding.  The aim is
   to store more common operand values using fewer bytes than rarely
   occurring values.

   Four different types of operand are available: the literal, the
   reference, the multitype and the address.  Chapter 9 gives a complete
   list of UDVM instructions and the operand types that follow each
   instruction.

   The UDVM bytecode for each operand type is illustrated in Figure 8 to
   Figure 10, together with the integer values represented by the
   bytecode.

   Note that the MSBs in the bytecode are illustrated as preceding the
   LSBs.  Also, any string of bits marked with k consecutive "n"s is to
   be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the
   MSBs of n illustrated as preceding the LSBs).

   The decoded integer value of the bytecode can be interpreted in two
   ways.  In some cases it is taken to be the actual value of the
   operand.  In other cases it is taken to be a memory address at which
   the 2-byte operand value can be found (MSBs found at the specified
   address, LSBs found at the following address).  The latter cases are
   denoted by memory[X] where X is the address and memory[X] is the 2-
   byte value starting at Address X.

   The simplest operand type is the literal (#), which encodes a
   constant integer from 0 to 65535 inclusive.  A literal operand may
   require between 1 and 3 bytes depending on its value.

   Bytecode:                       Operand value:      Range:

   0nnnnnnn                        N                   0 - 127
   10nnnnnn nnnnnnnn               N                   0 - 16383
   11000000 nnnnnnnn nnnnnnnn      N                   0 - 65535

               Figure 8: Bytecode for a literal (#) operand

   The second operand type is the reference ($), which is always used to
   access a 2-byte value located elsewhere in the UDVM memory.  The
   bytecode for a reference operand is decoded to be a constant integer
   from 0 to 65535 inclusive, which is interpreted as the memory address
   containing the actual value of the operand.
Top   ToC   RFC3320 - Page 35
   Bytecode:                       Operand value:      Range:

   0nnnnnnn                        memory[2 * N]       0 - 65535
   10nnnnnn nnnnnnnn               memory[2 * N]       0 - 65535
   11000000 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

              Figure 9: Bytecode for a reference ($) operand

   Note that the range of a reference operand is always 0 - 65535
   independently of how many bits are used to encode the reference,
   because the operand always references a 2-byte value in the memory.

   The third kind of operand is the multitype (%), which can be used to
   encode both actual values and memory addresses.  The multitype
   operand also offers efficient encoding for small integer values (both
   positive and negative) and for powers of 2.

   Bytecode:                       Operand value:      Range:

   00nnnnnn                        N                   0 - 63
   01nnnnnn                        memory[2 * N]       0 - 65535
   1000011n                        2 ^ (N + 6)        64 , 128
   10001nnn                        2 ^ (N + 8)    256 , ... , 32768
   111nnnnn                        N + 65504       65504 - 65535
   1001nnnn nnnnnnnn               N + 61440       61440 - 65535
   101nnnnn nnnnnnnn               N                   0 - 8191
   110nnnnn nnnnnnnn               memory[N]           0 - 65535
   10000000 nnnnnnnn nnnnnnnn      N                   0 - 65535
   10000001 nnnnnnnn nnnnnnnn      memory[N]           0 - 65535

              Figure 10: Bytecode for a multitype (%) operand

   The fourth operand type is the address (@).  This operand is decoded
   as a multitype operand followed by a further step: the memory address
   of the UDVM instruction containing the address operand is added to
   obtain the correct operand value.  So if the operand value from
   Figure 10 is D then the actual operand value of an address is
   calculated as follows:

   operand_value = (memory_address_of_instruction + D) modulo 2^16

   Address operands are always used in instructions that control program
   flow, because they ensure that the UDVM bytecode is position-
   independent code (i.e., it will run independently of where it is
   placed in the UDVM memory).
Top   ToC   RFC3320 - Page 36

8.6. UDVM Cycles

Once the UDVM has been invoked it executes the instructions contained in its memory consecutively unless otherwise indicated (for example when the UDVM encounters a JUMP instruction). If the next instruction to be executed lies outside the available memory then decompression failure occurs (see Section 8.7). To ensure that a SigComp message cannot consume excessive processing resources, SigComp limits the number of "UDVM cycles" allocated to each message. The number of available UDVM cycles is initialized to 1000 plus the number of bits in the SigComp header (as described in Section 7); this sum is then multiplied by cycles_per_bit. Each time an instruction is executed the number of available UDVM cycles is decreased by the amount specified in Chapter 9. Additionally, if the UDVM successfully requests n bits of compressed data using one of the INPUT instructions then the number of available UDVM cycles is increased by n * cycles_per_bit once the instruction has been executed. This means that the maximum number of UDVM cycles available for processing an n-byte SigComp message is given by the formula: maximum_UDVM_cycles = (8 * n + 1000) * cycles_per_bit The reason that this total is not allocated to the UDVM when it is invoked is that the UDVM can begin to decompress a message that has only been partially received. So the total message size may not be known when the UDVM is initialized. Note that the number of UDVM cycles MUST NOT be increased if a request for additional compressed data fails. The UDVM stops executing instructions when it encounters an END- MESSAGE instruction or if decompression failure occurs (see Section 8.7 for further details).

8.7. Decompression Failure

If a compressed message given to the UDVM is corrupted (either accidentally or maliciously), then the UDVM may terminate with a decompression failure.
Top   ToC   RFC3320 - Page 37
   Reasons for decompression failure include the following:

   1. A SigComp message contains an invalid header as per Chapter 7.

   2. A SigComp message is larger than the decompression_memory_size.

   3. An instruction costs more than the number of remaining UDVM
      cycles.

   4. The UDVM attempts to read from or write to a memory address beyond
      its memory size.

   5. An unknown instruction is encountered.

   6. An unknown operand is encountered.

   7. An instruction is encountered that cannot be processed
      successfully by the UDVM (for example a RETURN instruction when no
      CALL instruction has previously been encountered).

   8. A request to access some state information fails.

   9. A manual decompression failure is triggered using the
      DECOMPRESSION-FAILURE instruction.

   If a decompression failure occurs when decompressing a message then
   the UDVM informs the dispatcher and takes no further action.  It is
   the responsibility of the dispatcher to decide how to cope with the
   decompression failure.  In general a dispatcher SHOULD discard the
   compressed message (or the compressed stream if the transport is
   stream-based) and any decompressed data that has been outputted but
   not yet passed to the application.



(page 37 continued on part 3)

Next Section