2.9. Forwarded Message Transactions A Server may invoke another Server to handle a Request. It is fairly common for the invocation of the second Server to be the last action performed by the first Server as part of handling the Request. For example, the original Server may function primarily to select a process to handle the Request. Also, the Server may simply check the authorization on the Request. Describing this situation in the context of RPC, a nested remote procedure call may be the last action in the remote procedure and the return parameters are exactly those of the nested call. (This situation is analogous to tail recursion.) As an optimization to support this case, VMTP provides a Forward operation that allows the server to send the nested Request to the other server and have this other server respond directly to the Client. If the message transaction being forwarded was not multicast, not secure or the two Servers are the same principal and the ForwardCount of the Request is less than the maximum forward count of 15, the Forward operation is implemented by the Server sending a Request onto the next Server with the forwarded Request identified by the same Client and Transaction as the original Request and a ForwardCount one greater than the Request received from the Client. In this case, the new Server responds directly to the Client. A forwarded Request is illustrated in the following figure. +---------+ Request +----------+ | Client +---------------->| Server 1 | +---------+ +----------+ ^ | | | forwarded Request | V | Response +----------+ +----------------------| Server 2 | +----------+ If the message transaction does not meet the above requirements, the Server's VMTP module issues a nested call and simply maps the returned Response to a Response to original Request without further Server-level processing. In this case, the only optimization over a user-level nested call is one fewer VMTP service operation; the VMTP module handles the return to the invoking call directly. The Server may also use this form of forwarding when the Request is part of a stream of message transactions. Otherwise, it must wait until the forwarded message transaction completes before proceeding with the subsequent message transactions in the stream.
Implementation of the user-level Forward operation is optional, depending on whether the server modules require this facility. Handling an incoming forwarded Request is a minor modification of handling a normal incoming Request. In particular, it is only necessary to examine the ForwardCount field when the Transaction of the Request matches that of the last message transaction received from the Client. Thus, the additional complexity in the VMTP module for the required forwarding support is minimal; the complexity is concentrated in providing a highly optimized user-level Forward primitive, and that is optional. 2.10. VMTP Management VMTP management includes operations for creating, deleting, modifying and querying VMTP entities and entity groups. VMTP management is logically implemented by a VMTP management server module that is invoked using a message transaction addressed to the Server, VMTP_MANAGER_GROUP, a well-known group entity identifier, in conjunction with Coresident Entity mechanism introduced in Section 2.7. A particular Request may address the local module, the module managing a particular entity, the set of modules managing those entities contained in a specific group or all management modules, as appropriate. The VMTP management procedures are specified in Appendix III. 2.11. Streamed Message Transactions Streamed message transactions refer to two or more message transactions initiated by a Client before it receives the response to the first message transaction, with each transaction being processed and responded to in order but asynchronous relative to the initiation of the transactions. A Client streams messages transactions, and thereby has multiple message transactions outstanding, by sending them as part of a single run of message transactions. A run of message transactions is a sequence of message transactions with the same Client and Server and consecutive Transaction identifiers, with all but the first and last Requests and Responses flagged with the NSR (Not Start Run) and NER (Not End Run) control bits. (Conversely, the first Request and Response does not have the NSR set and the last Request and Response does not have the NER bit set.) The message transactions in a run use
consecutive transaction identifiers (except if the STI bit <4> is used in one, in which case the transaction identifier for the next message transaction is 256 greater, rather than 1). The Client retains a record for each outstanding transaction until it gets a Response or is timed out in error. The record provides the information required to retransmit the Request. On retransmission timeout, the client retransmits the last Request for which it has not received a Response the same as is done with non-streamed communication. (I.e. there need be only one timeout for all the outstanding message transactions associated with a single client.) The consecutive transaction identifiers within a run of message transactions are used as sequence numbers for error control. The Server handles each message transaction in the sequence specified by its transaction identifier. When it receives a message transaction that is not marked as the beginning of a run, it checks that it previously received a message transaction with the predecessor transaction identifier, either 1 less than the current one or 256 less if the previous one had the STI bit set. If not, the Server sends a NotifyVmtpClient operation to the Client's manager indicating either: (1) the first message transaction was not fully received, or else (2) it has no record of the last one received. If the NRT control flag is set, it does not await nor expect retransmission but proceeds with handling this Request. This flag is used primarily when datagram Requests are used as part of a stream of message transactions. If NRT was not specified, the Client must retransmit from the first message transaction not fully received (either at all or in part) before the Server can proceed with handling this run of Requests or else restart the run of message transactions. The Client expects to receive the Responses in a consecutive sequence, using the Transaction identifier to detect missing Responses. Thus, the Server must return Responses in sequence except possibly for some gaps, as follows. The Server can specify in the PGcount field in a Response, the number of consecutively previous Responses that this Response _______________ <4> The STI bit is used by the Client to effectively allocate 255 transaction identifiers for use by the Server in returning a large Response or stream of Responses.
corresponds to, up to a maximum of 255 previous Responses <5>. Thus, for example, a Response with Transaction identifier 46 and PGcount 3 represents Responses 43, 44, 45 and 46. This facility allows the Server to eliminate sending Responses to Requests that require no Response, effectively batching the Responses into one. It also allows the Server to effectively maintain strictly consecutive sequencing when the Client has skipped 256 Transaction identifiers using the STI bit and the Server does not have that many Responses to return. If the Client receives a Response that is not consecutive, it retransmits the Request(s) for which the Response(s) is/are missing (unless, of course, the corresponding Requests were sent as datagrams). The Client should wait at the end of a run of message transactions for the last one to complete. When a Server receives a Request with the NSR bit clear and a higher transaction identifier than it currently has for the Client, it terminates all processing and discards Responses associated with the previous Requests. Thus, a stream of message transactions is effectively aborted by starting a new run, even if the Server was in the middle of handling the previous run. Using a mixture of datagram and normal Requests as part of a stream of message transactions, particularly with the use of the NRT bit, can lead to complex behavior under packet loss. It is recommended that a run of message transactions be all of one type to avoid problems, i.e. all normal or all datagrams. Finally, when a Server forwards a Request that is part of a run, it must suspend further processing of the subsequent Requests until the forwarded Request has been handled, to preserve order of processing. The simplest handling of this situation is to use a real nested call when forwarding with streamed message transactions. Flow control of streamed message transactions relies on rate control at the Client plus receipt (or non-receipt) of management notify operations indicating the presence of overrunning. A Client must reduce the number of outstanding message transactions at the Server when it receives a NotifyVmtpServer operation with the MSGTRANS_OVERFLOW ResponseCode. The transact parameter indicates the last packet group that was accepted. _______________ <5> PGcount actually corresponds to packet groups which are described in Section 2.13. This (simplified) description is accurate when there is one Request or Response per packet group.
The implementation of multiple outstanding message transactions requires the ability to record, timeout and buffer multiple outstanding message transactions at the Client end as well as the Server end. However, this facility is optional for both the Client and the Server. Client systems with heavy-weight processes and high network access cost are most likely to benefit from this facility. Servers that serve a wide variety of client machines should implement streaming to accommodate these types of clients. 2.12. Fault-Tolerant Applications One approach to fault-tolerant systems is to maintain a log of all messages sent at each node and replay the messages at a node when the node fails, after restarting it from the last checkpoint <6>. As an experimental facility, VMTP provides a Receive Sequence Number field in the NotifyVmtpClient and NotifyVmtpServer operations as well as the Next Receive Sequence (NRS) flag in the Response packet to allow a sender to log a receive sequence number with each message sent, allowing the packets to be replayed at a recovering node in the same sequence as they were originally received, thereby recovering to the same state as before. Basically, each sending node maintains a receive sequence number for each receiving node. On sending a Request to a node, it presume that the receive sequence number is one greater than the one it has recorded for that node. If not, the receiving node sends a notify operation indicating the receive sequence number assigned the Request. The NRS in the Response confirms that the Request message was the next receive sequence number, so the sender can detect if it failed to receive the notify operation in the previous case. With Responses, the packets are ordered by the Transaction identifier except for multicast message transactions, in which there may be multiple Responses with the same identification. In this case, NotifyVmtpServer operations are used to provide receive sequence numbers. This experimental extension of the protocol is focused on support for fault-tolerant real-time distributed systems required in various critical applications. It may be removed or extended, depending on further investigations. _______________ <6> The sender-based logging is being investigated by Willy Zwaenepoel of Rice University.
2.13. Packet Groups A message (whether Request or Response) is sent as one or more packet groups. A packet group is one or more packets, each containing the same transaction identification and message control block. Each packet is formatted as below with the message control block logically embedded in the VMTP header. +------------------------------------++---------------------+ | VMTP Header || | +------------+-----------------------|| segment data | |VMTP Control| Message Control Block || | +------------+-----------------------++---------------------+ The some fields of the VMTP control portion of the packet and data segment portion can differ between packets within the same packet group. The segment data portion of a packet group represents up to 16 kilooctets of the segment specified in the message control block. The portion contained in each packet is indicated by the PacketDelivery field contained in the VMTP header. The PacketDelivery field as a bit mask has a similar interpretation to the MsgDelivery field in that each bit corresponds to a segment data block of 512 octets. The PacketDelivery field limits a packet group to 16 kilooctets and a maximum of 32 VMTP packets (with a minimum of 1 packet). Data can be sent in fewer packets by sending multiple data blocks per packet. We require that the underlying datagram service support delivery of (at minimum) the basic 580 octet VMTP packet <7>. To illustrate the use of the PacketDelivery field, consider for example the Ethernet which has a MTU of 1536 octets. so one would send 2 512-octet segment data blocks per packet. (In fact, if a third block is last in the segment and less than 512 octets and fits in the packet without making it too big, an Ethernet packet could contain three data blocks. Thus, an Ethernet packet group for a segment of size 0x1D00 octets (14.5 blocks) and MsgDelivery 0x000074FF consists of 6 packets indicated as follows <8>. _______________ <7> Note that with a 20 octet IP header, a VMTP packet is 600 octets. We propose the convention that any host implementing VMTP implicitly agrees to accept IP/VMTP packets of at least 600 octets. <8> We use the C notation 0xHHHH to represent a hexadecimal number.
Packet Delivery 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 0 0 0 0 0 0 . . . 0000 0400 0800 0C00 1000 1400 1800 1C00 +----+----+----+----+----+----+----+-+ Segment |....|....|....|....|....|....|....|.| +----+----+----+----+----+----+----+-+ : : : : : : : / / : v v v v v v v /| v +----+----+----+----+ +----+ +---+ Packets | 1 | 2 | 3 | 4 | | 5 | | 6 | +----+----+----+----+ +----+ +---+ Each '.' is 256 octets of data. The PacketDelivery masks for the 6 packets are: 0x00000003, 0x0000000C, 0x00000030, 0x000000C0, 0x00001400 and 0x00006000, indicating the segment blocks contained in each of the packets. (Note that the delivery bits are in little endian order.) A packet group is sent as a single "blast" of packets with no explicit flow control. However, the sender should estimate and transmit at a rate of packet transmission to avoid congesting the network or overwhelming the receiver, as described in Section 2.5.6. Packets in a packet group can be sent in any order with no change in semantics. When the first packet of a packet group is received (assuming the Server does not decide to discard the packet group), the Server saves a copy of the VMTP packet header, indicates it is currently receiving a packet group, initializes a "current delivery mask" (indicating the data in the segment received so far) to 0, accepts this packet (updating the current delivery mask) and sets the timer for the packet group. Subsequent packets in the packet group update the current delivery mask. Reception of a packet group is terminated when either the current delivery mask indicates that all the packets in the packet group have been received or the packet group reception timer expires (set to TC3 or TS1). If the packet group reception timer expires, if the NRT bit is set in the Control flags then the packet group is discarded if not complete unless MDM is set. In this case, the MsgDelivery field in the message control block is set to indicate the segment data blocks actually received and the message control block and segment data received is delivered to application level. If NRT is not set and not all data blocks have been received, a NotifyVmtpClient (if a Request) or NotifyVmtpServer (if a Response) is sent back with a PacketDelivery field indicating the blocks received. The source of the packet group is then expected to retransmit the missing blocks. If not all blocks of a Request are received after RequestAckRetries(Client) retransmissions, the Request is discarded and
a NotifyVmtpClient operation with an error response code is sent to the client's manager unless MDM is set. With a Response, there are ResponseAckRetries(Server) retransmissions and then, if MDM is not set, the requesting entity is returned the message control block with an indication of the amount of segment data received extending contiguously from the start of the segment. E.g. if the sender sent 6 512-octet blocks and only the first two and the last two arrived, the receiver would be told that 1024 octets were received. The ResponseCode field is set to BAD_REPLY_SEGMENT. (Note that VMTP is only able to indicate the specific segment blocks received if MDM is set.) The parameters RequestAckRetries(Client) and ResponseAckRetries(Server) could be set on a per-client and per-server basis in a sophisticated implementation based on knowledge of packet loss. If the APG flag is set, a NotifyVmtpClient or NotifyVmtpServer operation is sent back at the end of the packet group reception, depending on whether it is a Request or a Response. At minimum, a Server should check that each packet in the packet group contains the same Client, Server, Transaction identifier and SegmentSize fields. It is a protocol error for any field other than the Checksum, packet group control flags, Length and PacketDelivery in the VMTP header to differ between any two packets in one packet group. A packet group containing a protocol error of this nature should be discarded. Notify operations should be sent (or invoked) in the manager whenever there is a problem with a unicast packet. i.e. negative acknowledgments are always sent in this case. In the case of problems with multicast packets, the default is to send nothing in response to an error condition unless there is some clear reason why no other node can respond positively. For example, the packet might be a Probe for an entity that is known to have been recently existing on the receiving host but now invalid and could not have migrated. In this case, the receiving host responds to the Probe indicating the entity is nonexistent, knowing that no other host can respond to the Probe. For packets and packet groups that are received and processed without problems, a Notify operation is invoked only if the APG bit is set. 2.14. Runs of Packet Groups A run of packet groups is a sequence of packet groups, all Request packets or all Response packets, with the same Client and consecutive transaction identifiers, all but the first and last packets flagged with the NSR (Not Start Run) and NER (Not End Run) control bits. When each packet group in the run corresponds to a single Request or Response, it
is identical to a run of message transactions. (See Section 2.11) However, a Request message or a Response message may consists of up to 256 packet groups within a run, for a maximum of 4 megaoctets of segment data. A message that is continued in the next packet group in the run is flagged in the current packet group by the CMG flag. Otherwise, the next packet group in the run (if any) is treated as a separate Request or Response. Normally, each Request and Response message is sent as a single packet group and each run consists of a single packet group. In this case neither NSR or NER are set. For multi-packet group messages, the PacketDelivery mask in the i-th packet group of a message corresponds to the portion of the segment offset by i-1 times 16 kilooctets, designating the the first packet group to have i = 1. 2.15. Byte Order For purposes of transmission and reception, the MCB is treated as consisting of 8 32-bit fields and the segment is a sequence of bytes. VMTP transmits the MCB in big-endian order, performing byte-swapping, if necessary, before transmission. A little-endian host must byte-swap the MCB on reception. (The data segment is transmitted as a sequence of bytes with no reordering.) The byte order of the sender of a message is indicated by the LEE bit in the entity identifier for the sender, the Client field if a Request and the Server field if a Response. The sender and receiver of a message are required to agree in some higher level protocol (such as an RPC presentation protocol) on who does further swapping of the MCB and data segment if required by the types of the data actually being transmitted. For example, the segment data may contain a record with 8-bit, 16-bit and 32-bit fields, so additional transformation is required to move the segment from a host of one byte order to another. VMTP to date has used a higher-level presentation protocol in which segment data is sent in the native order of the sending host and byte-swapped as necessary by the receiving host. This approach minimizes the byte-swapping overhead between machines of common byte order (including when the communication is transparently local to one host), avoids a strong bias in the protocol to one byte-order, and allows for the sending entity to be sending to a group of hosts with different byte orders. (Note that the byte-swap overhead for the MCB is minimal.) The presentation-level overhead is minimal because most common operations, such as file access operations, have parameters that fit the MCB and data segment data types exactly.
2.16. Minimal VMTP Implementation A minimal VMTP client needs to be able to send a Request packet group and receive a Response packet group as well as accept and respond to Requests sent to its management module, including Probe and NotifyClient operations. It may also require the ability to invoke Probe and Notify operations to locate a Server and acknowledge responses. (the latter only if it is involved in transactions that are not idempotent or datagram message transactions. However, a simple sensor, for example, can transmit VMTP datagram Requests indicating its current state with even less mechanism.) The minimal client thus requires very little code and is suitable as a basis for (e.g.) a network boot loader. A minimal VMTP server implements idempotent, non-encrypted message transactions, possibly with no segment data support. It should use an entity state record for each Request but need only retain it while processing the Request. Without segment data larger than a packet, there is no need for any timers, buffering (outside of immediate request processing) or queuing. In particular, it needs only as many records as message transactions it handles simultaneously (e.g. 1). The entity state record is required to recognize and respond to Request retransmissions during request processing. The minimal server need only receive Requests and and be able to send Response packets. It need have only a minimal management module supporting Probe operations. (Support for the NotifyVmtpClient operation is only required if it does not respond immediately to a Request.) Thus the VMTP support for say a time server, sensor, or actuator can be extremely simple. Note that the server need never issue a Probe operation if it uses the host address of the Request for the Response and does not require the Client information returned by the Probe operation. The minimal server should also support reception of forwarded Requests. 2.17. Message vs. Procedural Request Handling A request-response protocol can be used to implement two forms of semantics on reception. With procedural handling of a Request, a Request is handled by a process associated with the Server that effectively takes on the identity of the calling process, treating the Request message as invoking a procedure, and relinquishing its association to the calling process on return. VMTP supports multiple nested calls spanning multiple machines. In this case, the distributed call stack that results is associated with a single process from the standpoint of authentication and resource management, using the ProcessId field supported by VMTP. The entity identifiers effectively
link these call frames together. That is, the Client field in a Request is effectively the return link to the previous call frame. With message handling of a Request, a Request message is queued for a server process. The server process dequeues, reads, processes and responds to the Request message, executing as a separate process. Subsequent Requests to the same server are queued until the server asks to receive the next Request. Procedural semantics have the advantage of allowing each Request (up to the resource limits of the Server) to execute concurrently at the Server, with Request-specific synchronization. Message semantics have the advantage that Requests are serialized at the Server and that the request processing logically executes with the priority, protection and independent execution of a separate process. Note that procedural and message handling of a request appear no differently to the client invoking the message transaction, except possibly for differences in performance. We view the two Request handling approaches as appropriate under different circumstances. VMTP supports both models. 2.18. Bibliography The basic protocol is similar to that used in the original form of the V kernel [3, 4] as well as the transport protocol of Birrell and Nelson's [2] remote procedure call mechanism. An earlier version of the protocol was described in SIGCOMM'86 [6]. The rate-based flow control is similar to the techniques of Netblt [9]. The support for idempotency draws, in part, on the favorable experience with idempotency in the V distributed system. Its use was originally inspired by the Woodstock File Server [11]. The multicast support draws on the multicast facilities in V [5] and is designed to work with, and is now implemented using, the multicast extensions to the Internet [8] described in RFC 966 and 988. The secure version of the protocol is similar to that described by Birrell [1] for secure RPC. The use of runs of packet groups is similar to Fletcher and Watson's delta-T protocol [10]. The use of "management" operations implemented using VMTP in place of specialized packet types is viewed as part of a general strategy of using recursion to simplify protocol architectures [7]. Finally, this protocol was designed, in part, to respond to the requirements identified by Braden in RFC 955. We believe that VMTP satisfies the requirements stated in RFC 955.
[1] A.D. Birrell, "Secure Communication using Remote Procedure Calls", ACM. Trans. on Computer Systems 3(1), February, 1985. [2] A. Birrell and B. Nelson, "Implementing Remote Procedure Calls", ACM Trans. on Computer Systems 2(1), February, 1984. [3] D.R. Cheriton and W. Zwaenepoel, "The Distributed V Kernel and its Performance for Diskless Workstations", In Proceedings of the 9th Symposium on Operating System Principles, ACM, 1983. [4] D.R. Cheriton, "The V Kernel: A Software Base for Distributed Systems", IEEE Software 1(2), April, 1984. [5] D.R. Cheriton and W. Zwaenepoel, "Distributed Process Groups in the V Kernel", ACM Trans. on Computer Systems 3(2), May, 1985. [6] D.R. Cheriton, "VMTP: A Transport Protocol for the Next Generation of Communication Systems", In Proceedings of SIGCOMM'86, ACM, Aug 5-7, 1986. [7] D.R. Cheriton, "Exploiting Recursion to Simplify an RPC Communication Architecture", in preparation, 1988. [8] D.R. Cheriton and S.E. Deering, "Host Groups: A Multicast Extension for Datagram Internetworks", In 9th Data Communication Symposium, IEEE Computer Society and ACM SIGCOMM, September, 1985. [9] D.D. Clark and M. Lambert and L. Zhang, "NETBLT: A Bulk Data Transfer Protocol", Technical Report RFC 969, Defense Advanced Research Projects Agency, 1985. [10] J.G. Fletcher and R.W. Watson, "Mechanism for a Reliable Timer- based Protocol", Computer Networks 2:271-290, 1978.
[11] D. Swinehart and G. McDaniel and D. Boggs, "WFS: A Simple File System for a Distributed Environment", In Proc. 7th Symp. Operating Systems Principles, 1979.
3. VMTP Packet Formats VMTP uses 2 basic packet formats corresponding to Request packets and Response packets. These packet formats are identical in most of the fields to simplify the implementation. We first describe the entity identifier format and the packet fields that are used in general, followed by a detailed description of each of the packet formats. These fields are described below in detail. The individual packet formats are described in the following subsections. The reader and VMTP implementor may wish to refer to Chapters 4 and 5 for a description of VMTP event handling and only refer to this detailed description as needed. 3.1. Entity Identifier Format The 64-bit non-group entity identifiers have the following substructure. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R| |L|R| |A|0|E|E| Domain-specific structure |E| |E|S| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Domain-specific structure | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The field meanings are as follows: RAE Remote Alias Entity - the entity identifier identifies an entity that is acting as an alias for some entity outside this entity domain. This bit is used by higher-level protocols. For instance, servers may take extra security and protection measures with aliases. GRP Group - 0, for non-group entity identifiers. LEE Little-Endian Entity - the entity transmits data in little-endian (VAX) order. RES Reserved - must be 0. The 64-bit entity group identifiers have the following substructure.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |R| |U|R| |A|1|G|E| Domain-specific structure |E| |P|S| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Domain-specific structure | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ The field meanings are as follows: RAE Remote Alias Entity - same as for non-group entity identifier. GRP Group - 1, for entity group identifiers. UGP Unrestricted Group - no restrictions are placed on joining this group. I.e. any entity can join limited only by implementation resources. RES Reserved - must be 0. The all-zero entity identifier is reserved and guaranteed to be unallocated in all domains. In addition, a domain may reserve part of the entity identifier space for statically allocated identifiers. However, this is domain-specific. Description of currently defined entity identifier domains is provided in Appendix IV. 3.2. Packet Fields Client 64-bit identifier for the client entity associated with this packet. The structure, allocation and binding of this identifier is specific to the specified Domain. An entity identifier always includes 4 types bits as specified in Section 3.1. Version The 3-bit identifier specifying the version of the protocol. Current version is version 0. Domain The 13-bit identifier specifying the naming and administration domain for the client and server named in the packet.
Packet Flags: 3 bits. (The normal case has none of the flags set.) HCO Header checksum only - checksum has only been calculated on the header. This is used in some real-time applications where the strict correctness of the data is not needed. EPG Encrypted packet group - part of a secure message transaction. MPG Multicast packet group - packet was multicast on transmission. Length A 13-bit field that specifies the number of 32-bit words in the segment data portion of the packet (if any), excluding the checksum field. (Every VMTP packet is required to be a multiple of 64 bits, possibly by padding out the segment data.) The minimum legal Length is 0, the maximum length is 4096 and it must be an even number. Control Flags: 9 bits. (The normal case has none of the flags set.) NRS Next Receive Sequence - the associated Request message (in a Response) or previous Response (if a Request) was received consecutive with the last Request from this entity. That is, there was no interfering messages received. APG Acknowledge Packet Group - Acknowledge packet group on receipt. If a Request, send back a Request to the client's manager providing an update on the state of the transaction as soon as the request packet group is received, independent of the response being available. If a Response, send an update to the server's manager as soon as possible after response packet group is received providing an update on the state of the transaction at the client NSR Not Start Run - 1 if this packet is not part of the first packet group of a run of packet groups. NER Not End Run - 1 if this packet is not part of the last packet group of a run of packet groups. NRT No Retransmission - do not ask for retransmissions of this packet group if not all received within timeout
period, just deliver or discard. MDG Member of Destination Group - this packet is sent to a group and the client is a member of this group. CMG Continued Message - the message (Request or Response) is continued in the next packet group. The next packet group has to be part of the same run of packet groups. STI Skip Transaction Identifiers - the next transaction identifier that the Client plans to use is the current transaction plus 256, if part of the same run and at least this big if not. In a Request, this authorizes the Server to send back up to 256 packet groups containing the Response. DRT Delay Response Transmission - set by request sender if multiple responses are expected (as indicated by the MRD flag in the RequestCode) and it may be overrun by multiple responses. The responder(s) should then introduce a short random delay in sending the Response to minimize the danger of overrunning the Client. This is normally only used for responding to multicast Requests where the Client may be receiving a large number of Responses, as indicated by the MRD flag in the Request flags. Otherwise, the Response is sent immediately. RetransmitCount: 3 bits - the ordinal number of transmissions of this packet group prior to this one, modulo 8. This field is used in estimation of roundtrip times. This count may wrap around during a message transaction. However, it should be sufficient to match acknowledgments and responses with a particular transmission. ForwardCount: 4 bits indicating the number of times this Request has been forwarded. The original Request is always sent with a ForwardCount of 0. Interpacket Gap: 8 bits. Indicates the recommended time to use between subsequent packet transmissions within a multi-packet packet group transmission. The Interpacket Gap time is in 1/32nd of a network packet transmission time for a packet of size MTU for the node. (Thus, the maximum gap time is 8 packet times.)
PGcount: 8 bits The number of packet groups that this packet group represents in addition to that specified by the Transaction field. This is used in acknowledging multiple packet groups in streamed communication. Priority 4-bit identifier for priority for the processing of this request both on transmission and reception. The interpretation is: 1100 urgent/emergency 1000 important 0000 normal 0100 background Viewing the higher-order bit as a sign bit (with 1 meaning negative), low values are high priority and high values are low priority. The low-order 2 bits indicate additional (lower) gradations for each level. Function Code: 1 bit - types of VMTP packets. If the low-order bit of the function code is 0, the packet is sent to the Server, else it is sent to the Client. 0 Request 1 Response Transaction: 32 bits: Identifier for this message transaction. PacketDelivery: 32 bits: Delivery indicates the segment blocks contained in this packet. Each bit corresponds to one 512-octet block of segment data. A 1 bit in the i-th bit position (counting the LSB as 0) indicates the presence of the i-th segment block. Server: 64 bits Entity identifier for the server or server group associated with this transaction. This is the receiver when a Request packet and the sender when a Response packet.
Code: 32 bits The Request Code and Response Code, set either at the user level or VMTP level depending on use and packet type. Both the Request and Response codes include 8 high-order bits from the following set of control bits: CMD Conditional Message Delivery - only deliver the request or response if the receiving entity is waiting for it at the time of delivery, otherwise drop the message. DGM DataGram Message - indicates that the message is being sent as a datagram. If a Request message, do not wait for reply, or retransmit. If a Response message, treat this message transaction as idempotent. MDM Message Delivery Mask - indicates that the MsgDelivery field is being used. Otherwise, the MsgDelivery field is available for general use. SDA Segment Data Appended - segment data is appended to the message control block, with the total size of the segment specified by the SegmentSize field. Otherwise, the segment data is null and the SegmentSize field is not used by VMTP and available for user- or RPC-level uses. CRE CoResident Entity - indicates that the CoResidentEntity field in the message should be interpreted by VMTP. Otherwise, this field is available for additional user data. MRD Multiple Responses Desired - multiple Responses are desired to to this Request if it is multicast. Otherwise, the VMTP module can discard subsequent Responses after the first Response. PIC Public Interface Code - Values for Code with this bit set are reserved for definition by the VMTP specification and other standard protocols defined on top of VMTP. RES Reserved for future use. Must be 0. CoResidentEntity 64-bit Identifier for an entity or group of entities with which the Server entity or entities must be co-resident, i.e. route only to entities (identified by Server) on the same host(s) as that specified by
CoResidentEntity, Only meaningful if CRE is set in the Code field. User Data 12 octets Space in the header for the VMTP user to specify user-specific control and data. MsgDelivery: 32 bits The segment blocks being transmitted (in total) in this packet group following the conventions for the PacketDelivery field. This field is ignored by the protocol and treated as an additional user data field if MDM is 0. On transmission, the user level sets the MsgDelivery to indicate those portions of the segment to be transmitted. On receipt, the MsgDelivery field is modified by the VMTP module to indicate the segment data blocks that were actually received before the message control block is passed to the user or RPC level. In particular, the kernel does not discard the packet group if segment data blocks are missing. A Server or Client entity receiving a message with a MsgDelivery in use must check the field to ensure adequate delivery and retry the operation if necessary. SegmentSize: 32 bits Size of segment in octets, up to a maximum of 16 kilooctets without streaming and 4 megaoctets with streaming, if SDA is set. Otherwise, this field is ignored by the protocol and treated as an additional user data field. Segment Data: 0-16 kilooctets 0 octets if SDA is 0, else the portion of the segment corresponding to the Delivery Mask, limited by the SegmentSize and the MTU, padded out to a multiple of 64 bits. Checksum: 32 bits. The 32-bit checksum for the header and segment data. The VMTP checksum algorithm <9> develops a 32-bit checksum by computing _______________ <9> This algorithm and description are largely due to Steve Deering of Stanford University.
two 16-bit, ones-complement sums (like IP), each covering different parts of the packet. The packet is divided into clusters of 16 16-bit words. The first, third, fifth,... clusters are added to the first sum, and the second, fourth, sixth,... clusters are added to the second sum. Addition stops at the end of the packet; there is no need to pad out to a cluster boundary (although it is necessary that the packet be an integral multiple of 64 bits; padding octets may have any value and are included in the checksum and in the transmitted packet). If either of the resulting sums is zero, it is changed to 0xFFFF. The two sums are appended to the transmitted packet, with the first sum being transmitted first. Four bytes of zero in place of the checksum may be used to indicate that no checksum was computed. The 16-bit, ones-complement addition in this algorithm is the same as used in IP and, therefore, subject to the same optimizations. In particular, the words may be added up 32-bits at a time as long as the carry-out of each addition is added to the sum on the following addition, using an "add-with-carry" type of instruction. (64-bit or 128-bit additions would also work on machines that have registers that big.) A particular weakness of this algorithm (shared by IP) is that it does not detect the erroneous swapping of 16-bit words, which may easily occur due to software errors. A future version of VMTP is expected to include a more secure algorithm, but such an algorithm appears to require hardware support for efficient execution. Not all of these fields are used in every packet. The specific packet formats are described below. If a field is not mentioned in the description of a packet type, its use is assumed to be clear from the above description.
3.3. Request Packet The Request packet (or packet group) is sent from the client to the server or group of servers to solicit processing plus the return of zero or more responses. A Request packet is identified by a 0 in the LSB of the fourth 32-bit word in the packet. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + Client (8 octets) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Ver | |H|E|M| | |sion | Domain |C|P|P| Length | | | |O|G|G| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |N|A|N|N|N|M|C|S|D|Retra|Forward| Inter- | |R|R|R| | |R|P|S|E|R|D|M|T|R|nsmit| Count | Packet | Prior |E|E|E|0| |S|G|R|R|T|G|G|I|T|Count| | Gap | -ity |S|S|S| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Transaction | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | PacketDelivery | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + Server (8 octets) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |C|D|M|S|R|C|M|P| | |M|G|D|D|E|R|R|I| RequestCode | |D|M|M|A|S|E|D|C| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + CoResidentEntity (8 octets) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > User Data (12 octets) < +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MsgDelivery | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SegmentSize | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > segment data, if any < +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 3-1: Request Packet Format The fields of the Request packet are set according to the semantics described in Section 3.2 with the following qualifications.
InterPacketGap The estimated interpacket gap time the client would like for the Response packet group to be sent by the Server in responding to this Request. Transaction Identifier for transaction, at least one greater than the previously issued Request from this Client. Server Server to which this Request is destined. RequestCode Request code for this request, indicating the operation to perform.
3.4. Response Packet The Response packet is sent from the Server to the Client in response to a Request, identified by a 1 in the LSB of the fourth 32-bit word in the packet. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + Client (8 octets) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |Ver | |H|E|M| | |sion | Domain |C|P|P| Length | | | |O|G|G| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |N|A|N|N|N|R|C|S|R|Retra|Forward| | |R|R|R| | |R|P|S|E|R|E|M|T|E|nsmit| Count | PGcount | Prior |E|E|E|1| |S|G|R|R|T|S|G|I|S|Count| | | -ity |S|S|S| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Transaction | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | PacketDelivery | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ + Server (8 octets) + +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |C|D|M|S|R|R|R|R| | |M|G|D|D|E|E|E|E| ResponseCode | |D|M|M|A|S|S|S|S| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > UserData (20 octets) < +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MsgDelivery | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Segment Size | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ > segment data, if any < +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Checksum | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 3-2: Response Packet Format The fields of the Response packet are set according to the semantics described in Section 3.2 with the following qualifications. Client, Version, Domain, Transaction Match those in the Request packet group to which this is
a response. STI 1 if this Response is using one or more of the transaction identifiers skipped by the Client after the Request to which this is a Response. STI in the Request essentially allocates up to 256 transaction identifiers for the Server to use in a run of Response packet groups. RetransmitCount The retransmit count from the last Request packet received to which this is a response. ForwardCount The number of times the corresponding Request was forwarded before this Response was generated. PGcount The number of consecutively previous packet groups that this response is acknowledging in addition to the one identified by the Transaction identifier. Server Server sending this response. This may differ from that originally specified in the Request packet if the original Server was a server group, or the request was forwarded. The next two chapters describes the protocol operation using these packet formats, with the the Client and the Server portions described separately.