4. SigComp Dispatchers
This chapter defines the behavior of the compressor and decompressor dispatcher. The function of these entities is to provide an interface between SigComp and its environment, minimizing the effort needed to integrate SigComp into an existing protocol stack.4.1. Compressor Dispatcher
The compressor dispatcher receives messages from the application and passes the compressed version of each message to the transport layer. Note that SigComp invokes compressors on a per-compartment basis, so when the application provides a message to be compressed it must also provide a compartment identifier. The compressor dispatcher forwards the application message to the correct compressor based on the compartment identifier (invoking a new compressor if a new compartment identifier is encountered). The compressor returns a SigComp message that can be passed to the transport layer. Additionally, the application should indicate to the compressor dispatcher when it wishes to close a particular compartment, so that the resources taken by the corresponding compressor can be reclaimed.4.2. Decompressor Dispatcher
The decompressor dispatcher receives messages from the transport layer and passes the decompressed version of each message to the application. To ensure that SigComp can run over an unsecured transport layer, the decompressor dispatcher invokes a new instance of the UDVM for each new SigComp message. Resources for the UDVM are released as soon as the message has been decompressed. The dispatcher MUST NOT make more than one SigComp message available to a given instance of the UDVM. In particular, the dispatcher MUST NOT concatenate two SigComp messages to form a single message.
4.2.1. Decompressor Dispatcher Strategies
Once the UDVM has been invoked it is initialized using the SigComp message of Chapter 7. The message is then decompressed by the UDVM, returned to the decompressor dispatcher, and passed on to the receiving application. Note that the UDVM has no awareness of whether the underlying transport is message-based or stream-based, and so it always outputs decompressed data as a stream. It is the responsibility of the dispatcher to provide the decompressed message to the application in the expected form (i.e., as a stream or as a distinct, bounded message). The dispatcher knows that the end of a decompressed message has been reached when the UDVM instruction END- MESSAGE is invoked (see Section 9.4.9). For a stream-based transport, two strategies are therefore possible for the decompressor dispatcher: 1) The dispatcher collects a complete SigComp message and then invokes the UDVM. The advantage is that, even in implementations that have multiple incoming compressed streams, only one instance of the UDVM is ever required. 2) The dispatcher collects the SigComp header (see Section 7) and invokes the UDVM; the UDVM stays active while the rest of the message arrives. The advantage is that there is no need to buffer up the rest of the message; the message can be decompressed as it arrives, and any decompressed output can be relayed to the application immediately. In general, which of the strategies is used is an implementation choice. However, the compressor may want to take advantage of strategy 2 by expecting that some of the application message is passed on to the application before the SigComp message is terminated, e.g., by keeping the UDVM active while expecting the application to continuously receive decompressed output. This approach ("continuous mode") invalidates some assumptions of the SigComp security model and can only be used if the transport itself can provide the required protection against denial of service attacks. Also, since only strategy 2 works in this approach, the use of continuous mode requires previous agreement between the two endpoints.4.2.2. Record Marking
For a stream-based transport, the dispatcher delimits messages by parsing the compressed data stream for instances of 0xFF and taking the following actions:
Occurs in data stream: Action: 0xFF 00 one 0xFF byte in the data stream 0xFF 01 same, but the next byte is quoted (could be another 0xFF) : : 0xFF 7F same, but the next 127 bytes are quoted 0xFF 80 to 0xFF FE (reserved for future standardization) 0xFF FF end of SigComp message The combinations 0xFF01 to 0xFF7F are useful to limit the worst case expansion of the record marking scheme: the 1 (0xFF01) to 127 (0xFF7F) bytes following the byte combination are copied literally by the decompressor without taking any special action on 0xFF. (Note that 0xFF00 is just a special case of this, where zero following bytes are copied literally.) In UDVM version 0x01, any occurrence of the combinations 0xFF80 to 0xFFFE that are not protected by quoting causes decompression failure; the decompressor SHOULD close the stream-based transport in this case.4.3. Returning a Compartment Identifier
Upon receiving a decompressed message the application may supply the dispatcher with a compartment identifier. Supplying this identifier grants permission for the following: 1. Items of state accompanying the decompressed message can be saved using the state memory reserved for the specified compartment. 2. The feedback data accompanying the decompressed message can be trusted sufficiently that it can be used when sending SigComp messages that relate to the compressor's equivalent for the compartment. The dispatcher passes the compartment identifier to the UDVM, where it is used as per the END-MESSAGE instruction (see Section 9.4.9). The application uses a suitable authentication mechanism to determine whether the decompressed message belongs to a legitimate compartment or not. If the application fails to authenticate the message with sufficient confidence to allow state to be saved or feedback data to be trusted, it supplies a "no valid compartment" error to the dispatcher and the UDVM is terminated without creating any state or forwarding any feedback data.
5. SigComp Compressor
An important feature of SigComp is that decompression functionality is provided by a Universal Decompressor Virtual Machine (UDVM). This means that the compressor can choose any algorithm to generate compressed SigComp messages, and then upload bytecode for the corresponding decompression algorithm to the UDVM as part of the SigComp message. To help with the implementation and testing of a SigComp endpoint, further Internet Documents and RFCs may be published to describe particular compression algorithms. The overall requirement placed on the compressor is that of transparency, i.e., the compressor MUST NOT send bytecode which causes the UDVM to incorrectly decompress a given SigComp message. The following more specific requirements are also placed on the compressor (they can be considered particular instances of the transparency requirement): 1. For robustness, it is recommended that the compressor supply some form of integrity check (not necessarily of cryptographic strength) over the application message to ensure that successful decompression has occurred. A UDVM instruction is provided for CRC verification; also, another instruction can be used to compute a SHA-1 cryptographic hash. 2. The compressor MUST ensure that the message can be decompressed using the resources available at the remote endpoint. 3. If the transport is message-based, then the compressor MUST map each application message to exactly one SigComp message. 4. If the transport is stream-based but the application defines its own internal message boundaries, then the compressor SHOULD map each application message to exactly one SigComp message. Message boundaries should be preserved over a stream-based transport so that accidental or malicious damage to one SigComp message does not affect the decompression of subsequent messages. Additionally, if the state handler passes some requested feedback to the compressor, then it SHOULD be returned in the next SigComp message generated by the compressor (unless the state handler passes some newer requested feedback before the older feedback has been sent, in which case the older feedback is deleted).
If present, the requested feedback item SHOULD be copied unmodified into the returned_feedback_item field provided in the SigComp message. Note that there is no need to transmit any requested feedback item more than once. The compressor SHOULD also upload the local SigComp parameters to the remote endpoint, unless the endpoint has indicated that it does not wish to receive these parameters or the compressor determines that the parameters have already successfully arrived (see Section 5.1 for details of how this can be achieved). The SigComp parameters are uploaded to the UDVM memory at the remote endpoint as described in Section 9.4.9.5.1. Ensuring Successful Decompression
A compressor MUST be certain that all of the data needed to decompress a SigComp message is available at the receiving endpoint. One way to ensure this is to send all of the needed information in every SigComp message (including bytecode to decompress the message). However, the compression ratio for this method will be relatively low. To obtain the best overall compression ratio the compressor needs to request the creation of new state items at the remote endpoint. The information saved in these state items can then be accessed by later SigComp messages, avoiding the need to upload the data on a per- message basis. Before the compressor can access saved state however, it must ensure that the SigComp message carrying the state creation request arrived successfully at the receiving endpoint. For a reliable transport (e.g., TCP or SCTP) this is guaranteed. For an unreliable transport however, the compressor must provide a suitable mechanism itself (see [RFC-3321] for further details). The compressor must also ensure that the state item it wishes to access has not been rejected due to a lack of state memory. This can be accomplished by checking the state_memory_size parameter using the SigComp feedback mechanism (see Section 9.4.9 for further details).5.2. Compression Failure
The compressor SHOULD make every effort to successfully compress an application message, but in certain cases this might not be possible (particularly if resources are scarce at the receiving endpoint). In this case a "compression failure" is called.
If a compression failure occurs then the compressor informs the dispatcher and takes no further action. The dispatcher MUST report this failure to the application so that it can try other methods to deliver the message.6. State Handling and Feedback
This chapter defines the behavior of the SigComp state handler. The function of the state handler is to retain information between received SigComp messages; it is the only SigComp entity that is capable of this function, and so it is of particular importance from a security perspective.6.1. Creating and Accessing State
To provide security against the malicious insertion or modification of SigComp messages, a separate instance of the UDVM is invoked to decompress each message. This ensures that damaged SigComp messages do not prevent the successful decompression of subsequent valid messages. Note, however, that the overall compression ratio is often significantly higher if messages can be compressed relative to the information contained in previous messages. For this reason, it is possible to create state items for access when a later message is being decompressed. Both the creation and access of state are designed to be secure against malicious tampering with the compressed data. The UDVM can only create a state item when a complete message has been successfully decompressed and the application has returned a compartment identifier under which the state can be saved. State access cannot be protected by relying on the application alone, since the authentication mechanism may require information from the decompressed message (which of course is not available until after the state has been accessed). Instead, SigComp protects state access by creating a state identifier that is a hash over the item of state to be retrieved. This state_identifier must be supplied to retrieve an item of state from the state handler. Also note that state is not deleted when it is accessed. So even if a malicious sender manages to access some state information, subsequent messages compressed relative to this state can still be successfully decompressed. Each state item contains a state_identifier that is used to access the state. One state identifier can be supplied in the SigComp message header to initialize the UDVM (see Chapter 7); additional state items can be retrieved using the STATE-ACCESS instruction. The
UDVM can also request the creation of a new state item by using the STATE-CREATE and END-MESSAGE instructions (see Chapter 9 for further details).6.2. Memory Management
The state handler manages state memory on a per-compartment basis. Each compartment can store state up to a certain state_memory_size (where the application may assign different values for the state_memory_size parameter to different compartments). As well as storing the state items themselves, the state handler maintains a list of the state items created by a particular compartment and ensures that no compartment exceeds its allocated state_memory_size. For the purpose of calculation, each state item is considered to cost (state_length + 64) bytes. Each instance of the UDVM can pass up to four state creation requests to the state handler, as well as up to four state free requests (the latter are requests to free the memory taken by a state item in a certain compartment). When the state handler receives a state creation request from the UDVM it takes the following steps: 1. The state handler MUST reject all state creation requests that are not accompanied by a valid compartment identifier, or if the compartment is allocated 0 bytes of state memory. Note that if a state creation request fails due to lack of state memory then it does not mean that the corresponding SigComp message is damaged; compressors will often make state creation requests in the first SigComp message of a compartment, before they have discovered the state_memory_size using the SigComp feedback mechanism. 2. If the state creation request needs more state memory than the total state_memory_size for the compartment, the state handler deletes all but the first (state_memory_size - 64) bytes from the state_value. It sets the state_length to (state_memory_size - 64), and recalculates the state_identifier as defined in Section 9.4.9. 3. If the state creation request contains a state_identifier that already exists then the state handler checks whether the requested state item is identical to the established state item and counts the state creation request as successful if this is the case. If not then the state creation request is unsuccessful (although the probability that this will occur is vanishingly small).
4. If the state creation request exceeds the state memory allocated to the compartment, sufficient items of state created by the same compartment are freed until enough memory is available to accommodate the new state. When a state item is freed, it is removed from the list of states created by the compartment and the memory cost of the state item no longer counts towards the total cost for the compartment. Note, however, that identical state items may be created by several different compartments, so a state item must not be physically deleted unless the state handler determines that it is no longer required by any compartment. 5. The order in which the existing state items are freed is determined by the state_retention_priority, which is set when the state items are created. The state_retention_priority of 65535 is reserved for locally available states; these states must always be freed first. Apart from this special case, states with the lowest state_retention_priority are always freed first. In the event of a tie, then the state item created first in the compartment is also the first to be freed. The state_retention_priority is always stored on a per-compartment basis as part of the list of state items created by each compartment. In particular, the same state item might have several priority values if it has been created by several different compartments. Note that locally available state items (as described in Section 3.3.3) need not be mapped to any particular compartment. However, if they are created on a per-compartment basis, then they must not interfere with the state created at the request of the remote endpoint. The special state_retention_priority of 65535 is reserved for locally available state items to ensure that this is the case. The UDVM may also explicitly request the state handler to free a specific state item in a compartment. In this case, the state handler deletes the state item from the list of state items created by the compartment (as before the state item itself must not be physically deleted unless the state handler determines that it is not longer required by any compartment). The application should indicate to the state handler when it wishes to close a particular compartment, so that the resources taken by the corresponding state can be reclaimed.
6.3. Feedback Data
The SigComp feedback mechanism allows feedback data to be received by a UDVM and forwarded via the state handler to the correct compressor. Since this feedback data is retained between SigComp messages, it is considered to be part of the overall state and can only be forwarded if accompanied by a valid compartment identifier. If this is the case, then the state handler forwards the feedback data to the compressor responsible for sending messages that pertain to the peer compartment of the specified compartment.7. SigComp Message Format
This chapter describes the format of the SigComp message and how the message is used to initialize the UDVM memory. Note that the SigComp message is not copied into the UDVM memory as soon as it arrives; instead, the UDVM indicates when it requires compressed data using a specific instruction. It then pauses and waits for the information to be supplied before executing the next instruction. This means that the UDVM can begin to decompress a SigComp message before the entire message has been received. A consequence of the above behavior is that when the UDVM is invoked, the size of the UDVM memory depends on whether the transport used to provide the SigComp message is stream-based or message-based. If the transport is message-based then sufficient memory must be available to buffer the entire SigComp message before it is passed to the UDVM. So if the message is n bytes long, then the UDVM memory size is set to (decompression_memory_size - n), up to a maximum of 65536 bytes. If the transport is stream-based however, then a fixed-size input buffer is required to accommodate the stream, independently of the size of each SigComp message. So, for simplicity, the UDVM memory size is set to (decompression_memory_size / 2). As a separate instance of the UDVM is invoked on a per-message basis, each SigComp message must explicitly indicate its chosen decompression algorithm as well as any additional information that is needed to decompress the message (e.g., one or more previously received messages, a dictionary of common SIP phrases etc.). This information can either be uploaded as part of the SigComp message or retrieved from an item of state.
A SigComp message takes one of two forms depending on whether it accesses a state item at the receiving endpoint. The two variants of a SigComp message are given in Figure 3. (The T-bit controls the format of the returned feedback item and is defined in Section 7.1.) 0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | 1 1 1 1 1 | T | len | | 1 1 1 1 1 | T | 0 | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | | : returned feedback item : : returned feedback item : | | | | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | code_len | : partial state identifier : +---+---+---+---+---+---+---+---+ | | | code_len | destination | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | | | : remaining SigComp message : : uploaded UDVM bytecode : | | | | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | : remaining SigComp message : | | +---+---+---+---+---+---+---+---+ Figure 3: Format of a SigComp message Decompression failure occurs if the SigComp message is too short to contain the expected fields (see Section 8.7 for further details). The fields except for the "remaining SigComp message" are referred to as the "SigComp header" (note that this may include the uploaded UDVM bytecode).7.1. Returned feedback item
For both variants of the SigComp message, the T-bit is set to 1 whenever the SigComp message contains a returned feedback item. The format of the returned feedback item is illustrated in Figure 4.
0 1 2 3 4 5 6 7 0 1 2 3 4 5 6 7 +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | 0 | returned_feedback_field | | 1 | returned_feedback_length | +---+---+---+---+---+---+---+---+ +---+---+---+---+---+---+---+---+ | | : returned_feedback_field : | | +---+---+---+---+---+---+---+---+ Figure 4: Format of returned feedback item Note that the returned feedback length specifies the size of the returned feedback field (from 0 to 127 bytes). So the total size of the returned feedback item lies between 1 and 128 bytes. The returned feedback item is not copied to the UDVM memory; instead, it is buffered until the UDVM has successfully decompressed the SigComp message. It is then forwarded to the state handler with the rest of the feedback data (see Section 9.4.9 for further details).7.2. Accessing Stored State
The len field of the SigComp message determines which fields follow the returned feedback item. If the len field is non-zero, then the SigComp message contains a state identifier to access a state item at the receiving endpoint. All state items include a 20-byte state identifier as per Section 3.3.3, but it is possible to transmit as few as 6 bytes from the identifier if the sender believes that this is sufficient to match a unique state item at the receiving endpoint. The len field encodes the number of transmitted bytes as follows: Encoding: Length of partial state identifier 01 6 bytes 10 9 bytes 11 12 bytes The partial state identifier is passed to the state handler, which compares it with the most significant bytes of the state_identifier in every currently stored state item. Decompression failure occurs if no state item is matched or if more than one state item is matched.
Decompression failure also occurs if exactly one state item is matched but the state item contains a minimum_access_length greater than the length of the partial state identifier. This prevents especially sensitive state items from being accessed maliciously by brute force guessing of the state_identifier. If a state item is successfully accessed then the state_value byte string is copied into the UDVM memory beginning at state_address. The first 32 bytes of UDVM memory are then initialized to special values as illustrated in Figure 5. 0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | UDVM_memory_size | 0 - 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cycles_per_bit | 2 - 3 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SigComp_version | 4 - 5 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | partial_state_ID_length | 6 - 7 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | state_length | 8 - 9 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | : reserved : 10 - 31 | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: Initializing Useful Values in UDVM memory The first five 2-byte words are initialized to contain some values that might be useful to the UDVM bytecode (Useful Values). Note that these values are for information only and can be overwritten when executing the UDVM bytecode without any effect on the endpoint. The MSBs of each 2-byte word are stored preceding the LSBs. Addresses 0 to 5 indicate the resources available to the receiving endpoint. The UDVM memory size is expressed in bytes modulo 2^16, so in particular, it is set to 0 if the UDVM memory size is 65536 bytes. The cycles_per_bit is expressed as a 2-byte integer taking the value 16, 32, 64 or 128. The SigComp_version is expressed as a 2-byte value as per Section 3.3.2. Addresses 6 to 9 are initialized to the length of the partial state identifier, followed by the state_length from the retrieved state item. Both are expressed as 2-byte values.
Addresses 10 to 31 are reserved and are initialized to 0 for Version 0x01 of SigComp. Future versions of SigComp can use these locations for additional Useful Values, so a decompressor MUST NOT rely on these values being zero. Any remaining addresses in the UDVM memory that have not yet been initialized MUST be set to 0. The UDVM then begins executing instructions at the memory address contained in state_instruction (which is part of the retrieved item of state). Note that the remaining SigComp message is held by the decompressor dispatcher until requested by the UDVM. (Note that the Useful Values are only set at UDVM startup; there is no special significance to this memory area afterwards. This means that the UDVM bytecode is free to use these locations for any other purpose a memory location might be used for; it just has to be aware they are not necessarily initialized to zero.)7.3. Uploading UDVM bytecode
If the len field is set to 0 then the bytecode needed to decompress the SigComp message is supplied as part of the message itself. The 12-bit code_len field specifies the size of the uploaded UDVM bytecode (from 0 to 4095 bytes inclusive); eight most significant bits are in the first byte, followed by the four least significant bits in the most significant bits in the second byte. The remaining bits in the second byte are interpreted as a 4-bit destination field that specifies the starting memory address to which the bytecode is copied. The destination field is encoded as follows: Encoding: Destination address: 0000 reserved 0001 2 * 64 = 128 0010 3 * 64 = 196 0011 4 * 64 = 256 : : 1111 16 * 64 = 1024 Note that the encoding 0000 is reserved for future SigComp versions, and causes a decompression failure in Version 0x01.
The UDVM memory is initialized as per Figure 5, except that addresses 6 to 9 inclusive are set to 0 because no state item has been accessed. The UDVM then begins executing instructions at the memory address specified by the destination field. As above, the remaining SigComp message is held by the decompressor dispatcher until needed by the UDVM.8. Overview of the UDVM
Decompression functionality for SigComp is provided by a Universal Decompressor Virtual Machine (UDVM). The UDVM is a virtual machine much like the Java Virtual Machine but with a key difference: it is designed solely for the purpose of running decompression algorithms. The motivation for creating the UDVM is to provide flexibility when choosing how to compress a given application message. Rather than picking one of a small number of pre-negotiated algorithms, the compressor implementer has the freedom to select an algorithm of their choice. The compressed data is then combined with a set of UDVM instructions that allow the original data to be extracted, and the result is outputted as a SigComp message. Since the UDVM is optimized specifically for running decompression algorithms, the code size of a typical algorithm is small (often sub 100 bytes). Moreover, the UDVM approach does not add significant extra processing or memory requirements compared to running a fixed preprogrammed decompression algorithm. Figure 6 gives a detailed view of the interfaces between the UDVM and its environment.
+----------------+ +----------------+ | | Request compressed data | | | |-------------------------------->| | | |<--------------------------------| | | | Provide compressed data | | | | | | | | Output decompressed data | Decompressor | | |-------------------------------->| dispatcher | | | | | | | Indicate end of message | | | |-------------------------------->| | | |<--------------------------------| | | UDVM | Provide compartment identifier | | | | +----------------+ | | | | +----------------+ | | Request state information | | | |-------------------------------->| | | |<--------------------------------| | | | Provide state information | State | | | | handler | | | Make state creation request | | | |-------------------------------->| | | | Forward feedback information | | +----------------+ +----------------+ Figure 6: Interfaces between the UDVM and its environment Note that once the UDVM has been initialized, additional compressed data and state information are only provided at the request of a specific UDVM instruction. This chapter describes the basic features of the UDVM including the UDVM registers and the format of UDVM bytecode.8.1. UDVM Registers
The UDVM registers are 2-byte words in the UDVM memory that have special tasks, for example specifying the location of the stack used by the CALL and RETURN instructions. The UDVM registers are illustrated in Figure 7.
0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | byte_copy_left | 64 - 65 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | byte_copy_right | 66 - 67 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | input_bit_order | 68 - 69 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | stack_location | 70 - 71 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: Memory addresses of the UDVM registers The MSBs of each register are always stored before the LSBs. So, for example, the MSBs of byte_copy_left are stored at Address 64 whilst the LSBs are stored at Address 65. The use of each UDVM register is defined in the following sections. (Note that the UDVM registers start at Address 64, that is 32 bytes after the area reserved for Useful Values. The intention is that the gap, i.e., the area between Address 32 and Address 63, will often be used as scratch-pad memory that is guaranteed to be zero at UDVM startup and is efficiently addressable in operand types reference ($) and multitype (%).)8.2. Requesting Additional Compressed Data
The decompressor dispatcher stores the compressed data from the SigComp message before it is requested by the UDVM via one of the INPUT instructions. When the UDVM bytecode is first executed, the dispatcher contains the remaining SigComp message after the header has been used to initialize the UDVM as per Chapter 7. Note that the INPUT-BITS and INPUT-HUFFMAN instructions retrieve a stream of individual compressed bits from the dispatcher. To provide bitwise compatibility with various well-known compression algorithms, the input_bit_order register can modify the order in which individual bits are passed within a byte. The input_bit_order register contains the following three flags: 0 7 8 15 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | reserved |F|H|P| 68 - 69 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The P-bit controls the order in which bits are passed from the dispatcher to the INPUT instructions. If set to 0, it indicates that the bits within an individual byte are passed to the INPUT instructions in MSB to LSB order. If it is set to 1, the bits are passed in LSB to MSB order. Note that the input_bit_order register cannot change the order in which the bytes themselves are passed to the INPUT instructions (bytes are always passed in the same order as they occur in the SigComp message). The following diagram illustrates the order in which bits are passed to the INPUT instructions for both cases: MSB LSB MSB LSB MSB LSB MSB LSB +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 2 3 4 5 6 7|8 9 ... | |7 6 5 4 3 2 1 0| ... 9 8| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Byte 0 Byte 1 Byte 0 Byte 1 P = 0 P = 1 Note that after one or more INPUT instructions the dispatcher may hold a fraction of a byte (what used to be the LSBs if P = 0, or, the MSBs, if P = 1). If an INPUT instruction is encountered and the P- bit has changed since the last INPUT instruction, any fraction of a byte still held by the dispatcher MUST be discarded (even if the INPUT instruction requests zero bits). The first bit passed to the INPUT instruction is taken from the subsequent byte. When an INPUT instruction requests n bits of compressed data, it interprets the received bits as an integer between 0 and 2^n - 1. The F-bit and the H-bit specify whether the bits in these integers are considered to arrive in MSB to LSB order (bit set to 0) or in LSB to MSB order (bit set to 1). If the F-bit is set to 0, the INPUT-BITS instruction interprets the received bits as arriving MSBs first, and if it is set to 1, it interprets the bits as arriving LSBs first. The H-bit performs the same function for the INPUT-HUFFMAN instruction. Note that it is possible to set these two bits to different values in order to use different bit orders for the two instructions (certain algorithms actually require this, e.g., DEFLATE [RFC-1951]). (Note that there are no special considerations for changing the F- or H-bit between INPUT instructions, unlike the discard rule for the P-bit described above.)
Decompression failure occurs if an INPUT-BITS or an INPUT-HUFFMAN instruction is encountered and the input_bit_order register does not lie between 0 and 7 inclusive.8.3. UDVM Stack
Certain UDVM instructions make use of a stack of 2-byte words stored at the memory address specified by the 2-byte word stack_location. The stack contains the following words: Name: Starting memory address: stack_fill stack_location stack[0] stack_location + 2 stack[1] stack_location + 4 stack[2] stack_location + 6 : : The notation stack_location is an abbreviation for the contents of the stack_location register, i.e., the 2-byte word at locations 70 and 71. The notation stack_fill is an abbreviation for the 2-byte word at stack_location and stack_location+1. Similarly, the notation stack[n] is an abbreviation for the 2-byte word at stack_location+2*n+2 and stack_location+2*n+3. (As always, the arithmetic is modulo 2^16.) The stack is used by the CALL, RETURN, PUSH and POP instructions. "Pushing" a value on the stack is an abbreviation for copying the value to stack[stack_fill] and then increasing stack_fill by 1. CALL and PUSH push values on the stack. "Popping" a value from the stack is an abbreviation for decreasing stack_fill by 1, and then using the value stored in stack[stack_fill]. Decompression failure occurs if stack_fill is zero at the commencement of a popping operation. POP and RETURN pop values from the stack. For both of these abstract operations, the UDVM first takes note of the current value of stack_location and uses this value for both sub-operations (accessing the stack and manipulating stack_fill), i.e., overwriting stack_location in the course of the operation is inconsequential for the operation.
8.4. Byte copying
A number of UDVM instructions require a string of bytes to be copied to and from areas of the UDVM memory. This section defines how the byte copying operation should be performed. The string of bytes is copied in ascending order of memory address, respecting the bounds set by byte_copy_left and byte_copy_right. More precisely, if a byte is copied from/to Address m then the next byte is copied from/to Address n where n is calculated as follows: Set k := m + 1 (modulo 2^16) If k = byte_copy_right then set n := byte_copy_left, else set n := k Decompression failure occurs if a byte is copied from/to an address beyond the UDVM memory. Note that the string of bytes is copied one byte at a time. In particular, some of the later bytes to be copied may themselves have been written into the UDVM memory by the byte copying operation currently being performed. Equally, it is possible for a byte copying operation to overwrite the instruction that invoked the byte copy. If this occurs, then the byte copying operation MUST be completed as if the original instruction were still in place in the UDVM memory (this also applies if byte_copy_left or byte_copy_right are overwritten). Byte copying is used by the following UDVM instructions: SHA-1, COPY, COPY-LITERAL, COPY-OFFSET, MEMSET, INPUT-BYTES, STATE- ACCESS, OUTPUT, END-MESSAGE8.5. Instruction operands and UDVM bytecode
Each of the UDVM instructions in a piece of UDVM bytecode is represented by a single byte, followed by 0 or more bytes containing the operands required by the instruction. During instruction execution, conceptually the UDVM first fetches the first byte of the instruction, determines the number and types of operands required for this instruction, and then decodes all the operands in sequence before starting to act on the instruction. (Note that the UDVM instructions have been designed in such a way that this sequence remains conceptual in those cases where it would result in an unreasonable burden on the implementation.)
To reduce the size of typical UDVM bytecode, each operand for a UDVM instruction is compressed using variable-length encoding. The aim is to store more common operand values using fewer bytes than rarely occurring values. Four different types of operand are available: the literal, the reference, the multitype and the address. Chapter 9 gives a complete list of UDVM instructions and the operand types that follow each instruction. The UDVM bytecode for each operand type is illustrated in Figure 8 to Figure 10, together with the integer values represented by the bytecode. Note that the MSBs in the bytecode are illustrated as preceding the LSBs. Also, any string of bits marked with k consecutive "n"s is to be interpreted as an integer N from 0 to 2^k - 1 inclusive (with the MSBs of n illustrated as preceding the LSBs). The decoded integer value of the bytecode can be interpreted in two ways. In some cases it is taken to be the actual value of the operand. In other cases it is taken to be a memory address at which the 2-byte operand value can be found (MSBs found at the specified address, LSBs found at the following address). The latter cases are denoted by memory[X] where X is the address and memory[X] is the 2- byte value starting at Address X. The simplest operand type is the literal (#), which encodes a constant integer from 0 to 65535 inclusive. A literal operand may require between 1 and 3 bytes depending on its value. Bytecode: Operand value: Range: 0nnnnnnn N 0 - 127 10nnnnnn nnnnnnnn N 0 - 16383 11000000 nnnnnnnn nnnnnnnn N 0 - 65535 Figure 8: Bytecode for a literal (#) operand The second operand type is the reference ($), which is always used to access a 2-byte value located elsewhere in the UDVM memory. The bytecode for a reference operand is decoded to be a constant integer from 0 to 65535 inclusive, which is interpreted as the memory address containing the actual value of the operand.
Bytecode: Operand value: Range: 0nnnnnnn memory[2 * N] 0 - 65535 10nnnnnn nnnnnnnn memory[2 * N] 0 - 65535 11000000 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 9: Bytecode for a reference ($) operand Note that the range of a reference operand is always 0 - 65535 independently of how many bits are used to encode the reference, because the operand always references a 2-byte value in the memory. The third kind of operand is the multitype (%), which can be used to encode both actual values and memory addresses. The multitype operand also offers efficient encoding for small integer values (both positive and negative) and for powers of 2. Bytecode: Operand value: Range: 00nnnnnn N 0 - 63 01nnnnnn memory[2 * N] 0 - 65535 1000011n 2 ^ (N + 6) 64 , 128 10001nnn 2 ^ (N + 8) 256 , ... , 32768 111nnnnn N + 65504 65504 - 65535 1001nnnn nnnnnnnn N + 61440 61440 - 65535 101nnnnn nnnnnnnn N 0 - 8191 110nnnnn nnnnnnnn memory[N] 0 - 65535 10000000 nnnnnnnn nnnnnnnn N 0 - 65535 10000001 nnnnnnnn nnnnnnnn memory[N] 0 - 65535 Figure 10: Bytecode for a multitype (%) operand The fourth operand type is the address (@). This operand is decoded as a multitype operand followed by a further step: the memory address of the UDVM instruction containing the address operand is added to obtain the correct operand value. So if the operand value from Figure 10 is D then the actual operand value of an address is calculated as follows: operand_value = (memory_address_of_instruction + D) modulo 2^16 Address operands are always used in instructions that control program flow, because they ensure that the UDVM bytecode is position- independent code (i.e., it will run independently of where it is placed in the UDVM memory).
8.6. UDVM Cycles
Once the UDVM has been invoked it executes the instructions contained in its memory consecutively unless otherwise indicated (for example when the UDVM encounters a JUMP instruction). If the next instruction to be executed lies outside the available memory then decompression failure occurs (see Section 8.7). To ensure that a SigComp message cannot consume excessive processing resources, SigComp limits the number of "UDVM cycles" allocated to each message. The number of available UDVM cycles is initialized to 1000 plus the number of bits in the SigComp header (as described in Section 7); this sum is then multiplied by cycles_per_bit. Each time an instruction is executed the number of available UDVM cycles is decreased by the amount specified in Chapter 9. Additionally, if the UDVM successfully requests n bits of compressed data using one of the INPUT instructions then the number of available UDVM cycles is increased by n * cycles_per_bit once the instruction has been executed. This means that the maximum number of UDVM cycles available for processing an n-byte SigComp message is given by the formula: maximum_UDVM_cycles = (8 * n + 1000) * cycles_per_bit The reason that this total is not allocated to the UDVM when it is invoked is that the UDVM can begin to decompress a message that has only been partially received. So the total message size may not be known when the UDVM is initialized. Note that the number of UDVM cycles MUST NOT be increased if a request for additional compressed data fails. The UDVM stops executing instructions when it encounters an END- MESSAGE instruction or if decompression failure occurs (see Section 8.7 for further details).8.7. Decompression Failure
If a compressed message given to the UDVM is corrupted (either accidentally or maliciously), then the UDVM may terminate with a decompression failure.
Reasons for decompression failure include the following: 1. A SigComp message contains an invalid header as per Chapter 7. 2. A SigComp message is larger than the decompression_memory_size. 3. An instruction costs more than the number of remaining UDVM cycles. 4. The UDVM attempts to read from or write to a memory address beyond its memory size. 5. An unknown instruction is encountered. 6. An unknown operand is encountered. 7. An instruction is encountered that cannot be processed successfully by the UDVM (for example a RETURN instruction when no CALL instruction has previously been encountered). 8. A request to access some state information fails. 9. A manual decompression failure is triggered using the DECOMPRESSION-FAILURE instruction. If a decompression failure occurs when decompressing a message then the UDVM informs the dispatcher and takes no further action. It is the responsibility of the dispatcher to decide how to cope with the decompression failure. In general a dispatcher SHOULD discard the compressed message (or the compressed stream if the transport is stream-based) and any decompressed data that has been outputted but not yet passed to the application.