6.3. Message Structure
RELOAD is a message-oriented request/response protocol. The messages are encoded using binary fields. All integers are represented in network byte order. The general philosophy behind the design was to use Type, Length, Value (TLV) fields to allow for extensibility. However, for the parts of a structure that were required in all messages, we just define these in a fixed position, as adding a type and length for them is unnecessary and would only increase bandwidth and introduce new potential interoperability issues.

Each message has three parts, which are concatenated, as shown below:

     +-------------------------+
     |    Forwarding Header    |
     +-------------------------+
     |    Message Contents     |
     +-------------------------+
     |     Security Block      |
     +-------------------------+
The contents of these parts are as follows:

Forwarding Header: Each message has a generic header which is used to forward the message between peers and to its final destination. This header is the only information that an intermediate peer (i.e., one that is not the target of a message) needs to examine. Section 6.3.2 describes the format of this part.

Message Contents: The message being delivered between the peers. From the perspective of the forwarding layer, the contents are opaque; however, they are interpreted by the higher layers. Section 6.3.3 describes the format of this part.

Security Block: A security block containing certificates and a digital signature over the "Message Contents" section. Note that this signature can be computed without parsing the message contents. All messages MUST be signed by their originator. Section 6.3.4 describes the format of this part.

6.3.1. Presentation Language
The structures defined in this document are defined using a C-like syntax based on the presentation language used to define TLS [RFC5246]. Advantages of this style include:

o It is familiar enough that most readers can grasp it quickly.
o The ability to define nested structures allows a separation between high-level and low-level message structures.
o It has a straightforward wire encoding that allows quick implementation, but the structures can be comprehended without knowing the encoding.
o It is possible to mechanically compile encoders and decoders.

Several idiosyncrasies of this language are worth noting:

o All lengths are denoted in bytes, not objects.
o Variable-length values are denoted like arrays, with angle brackets.
o "select" is used to indicate variant structures.

For instance, "uint16 array<0..2^8-2>;" represents up to 254 bytes, which corresponds to up to 127 values of two bytes (16 bits) each.
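The byte-counted length convention can be illustrated with a short Python sketch. It is not part of the specification: the helper names are hypothetical, and it assumes the TLS convention that a variable-length vector is preceded by a length field just wide enough to hold the declared ceiling.

```python
import struct


def length_prefix_width(max_bytes: int) -> int:
    """Smallest number of bytes that can hold the ceiling of a <floor..ceiling> range."""
    return max(1, (max_bytes.bit_length() + 7) // 8)


def encode_uint16_vector(values, max_bytes=2**8 - 2) -> bytes:
    """Encode `uint16 array<0..2^8-2>`: a length prefix counted in bytes, then the items."""
    body = b"".join(struct.pack(">H", v) for v in values)  # network byte order
    if len(body) > max_bytes:
        raise ValueError("vector exceeds its declared ceiling")
    width = length_prefix_width(max_bytes)
    return len(body).to_bytes(width, "big") + body
```

With these assumptions, two uint16 values occupy four bytes of payload, so the prefix carries 4 rather than 2, and the 254-byte ceiling admits at most 127 values.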
A repetitive structure member shares a common notation with a member containing a variable-length block of data. The latter always starts with "opaque", whereas the former does not. For instance, the following denotes a variable block of data:

     opaque data<0..2^32-1>;

whereas the following denotes a list of 0, 1, or more instances of the Name element:

     Name names<0..2^32-1>;

6.3.1.1. Common Definitions
This section provides an introduction to the presentation language used throughout RELOAD.

An enum represents an enumerated type. The values associated with each possibility are represented in parentheses, and the maximum value is represented as a nameless value, for purposes of describing the width of the containing integral type. For instance, Boolean represents a true or false:

     enum { false(0), true(1), (255) } Boolean;

A boolean value is either a 1 or a 0. The max value of 255 indicates that this is represented as a single byte on the wire.

The NodeId, shown below, represents a single Node-ID.

     typedef opaque NodeId[NodeIdLength];

A NodeId is a fixed-length structure represented as a series of bytes, with the most significant byte first. The length is set on a per-overlay basis within the range of 16-20 bytes (128 to 160 bits). (See Section 11.1 for how NodeIdLength is set.) Note that the use of "typedef" here is an extension to the TLS language, but its meaning should be relatively obvious. Also note that the [ size ] syntax defines a fixed-length element that does not include the length of the element in the on-the-wire encoding.

A ResourceId, shown below, represents a single Resource-ID.

     typedef opaque ResourceId<0..2^8-1>;

Like a NodeId, a ResourceId is an opaque string of bytes, but unlike NodeIds, ResourceIds are variable length, up to 254 bytes (2032 bits) in length. On the wire, each ResourceId is preceded by a single length byte (allowing lengths up to 255 bytes). Thus, the 3-byte value "FOO" would be encoded as: 03 46 4f 4f.

Note the < range > syntax defines a variable length element that includes the length of the element in the on-the-wire encoding. The number of bytes to encode the length on the wire is derived by range; i.e., it is the minimum number of bytes which can encode the largest range value.

A more complicated example is IpAddressPort, which represents a network address and can be used to carry either an IPv6 or IPv4 address:

     enum {
       invalidAddressType(0), ipv4_address(1), ipv6_address(2),
       (255)
     } AddressType;

     struct {
       uint32                  addr;
       uint16                  port;
     } IPv4AddrPort;

     struct {
       uint128                 addr;
       uint16                  port;
     } IPv6AddrPort;

     struct {
       AddressType             type;
       uint8                   length;

       select (type) {
         case ipv4_address:
           IPv4AddrPort        v4addr_port;

         case ipv6_address:
           IPv6AddrPort        v6addr_port;

         /* This structure can be extended */
       };
     } IpAddressPort;

The first two fields in the structure are the same no matter what kind of address is being represented:

type: The type of address (IPv4 or IPv6).

length: The length of the rest of the structure.
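As a rough illustration of the IpAddressPort definition, the following Python sketch encodes and decodes the structure. The function names are hypothetical, and the choice to return None for an address type the decoder does not recognize (skipping it by means of the length field) is an assumption about how an implementation might behave, not a requirement stated here.

```python
import ipaddress
import struct

IPV4_ADDRESS, IPV6_ADDRESS = 1, 2          # AddressType values from the enum above
BODY_LENGTHS = {IPV4_ADDRESS: 6, IPV6_ADDRESS: 18}  # address bytes + 2-byte port


def encode_ip_address_port(host: str, port: int) -> bytes:
    """Encode type, length, then the address and port in network byte order."""
    addr = ipaddress.ip_address(host)
    body = addr.packed + struct.pack(">H", port)
    addr_type = IPV4_ADDRESS if addr.version == 4 else IPV6_ADDRESS
    return struct.pack(">BB", addr_type, len(body)) + body


def decode_ip_address_port(buf: bytes, offset: int = 0):
    """Return ((host, port) or None, next_offset); None marks a skipped unknown type."""
    addr_type, length = struct.unpack_from(">BB", buf, offset)
    end = offset + 2 + length
    body = buf[offset + 2:end]
    if len(body) != length:
        raise ValueError("truncated IpAddressPort")
    expected = BODY_LENGTHS.get(addr_type)
    if expected is None:
        return None, end                    # unknown type: step over it using length
    if length != expected:
        raise ValueError("length does not match address type")
    host = str(ipaddress.ip_address(body[:-2]))
    (port,) = struct.unpack(">H", body[-2:])
    return (host, port), end
```

Because the decoder always returns the offset just past the structure, a caller can continue parsing the enclosing message whether or not the address type was understood.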
By having the type and the length appear at the beginning of the structure regardless of the kind of address being represented, an implementation which does not understand new address type X can still parse the IpAddressPort field and then discard it if it is not needed.

The rest of the IpAddressPort structure is either an IPv4AddrPort or an IPv6AddrPort. Both of these simply consist of an address represented as an integer and a 16-bit port. As an example, here is the wire representation of the IPv4 address "192.0.2.1" with port "6084".

     01           ; type    = IPv4
     06           ; length  = 6
     c0 00 02 01  ; address = 192.0.2.1
     17 c4        ; port    = 6084

Unless a given structure that uses a select explicitly allows for unknown types in the select, any unknown type SHOULD be treated as a parsing error, and the whole message SHOULD be discarded with no response.

6.3.2. Forwarding Header
The forwarding header is defined as a ForwardingHeader structure, as shown below.

     struct {
       uint32             relo_token;
       uint32             overlay;
       uint16             configuration_sequence;
       uint8              version;
       uint8              ttl;
       uint32             fragment;
       uint32             length;
       uint64             transaction_id;
       uint32             max_response_length;
       uint16             via_list_length;
       uint16             destination_list_length;
       uint16             options_length;
       Destination        via_list[via_list_length];
       Destination        destination_list
                            [destination_list_length];
       ForwardingOption   options[options_length];
     } ForwardingHeader;
The contents of the structure are:

relo_token: The first four bytes identify this message as a RELOAD message. This field MUST contain the value 0xd2454c4f (the string "RELO" with the high bit of the first byte set).

overlay: The 32-bit checksum/hash of the overlay being used. This MUST be formed by taking the lower 32 bits of the SHA-1 [RFC3174] hash of the overlay name. The purpose of this field is to allow nodes to participate in multiple overlays and to detect accidental misconfiguration. This is not a security-critical function. The overlay name MUST consist of a sequence of characters that would be allowable as a DNS name. Specifically, as it is used in a DNS lookup, it will need to be compliant with the grammar for the domain as specified in Section 2.3.1 of [RFC1035].

configuration_sequence: The sequence number of the configuration file. See Section 6.3.2.1 for details.

version: The version of the RELOAD protocol being used times 10. RELOAD version numbers are fixed-point decimal numbers between 0.1 and 25.4. This document describes version 1.0, with a value of 0x0a. (Note that versions used prior to the publication of this RFC used version number 0.1.) Nodes MUST reject messages with other versions.

ttl: An 8-bit field indicating the number of iterations, or hops, a message can experience before it is discarded. The TTL (time-to-live) value MUST be decremented by one at every hop along the route the message traverses just before transmission. If a received message has a TTL of 0 and the message is not destined for the receiving node, then the message MUST NOT be propagated further, and an Error_TTL_Exceeded error should be generated. The initial value of the TTL SHOULD be 100 and MUST NOT exceed 100 unless defined otherwise by the overlay configuration.
Implementations which receive messages with a TTL greater than the current value of initial-ttl (or the default of 100) MUST discard the message and send an Error_TTL_Exceeded error. fragment: This field is used to handle fragmentation. The high bit (0x80000000) MUST be set for historical reasons. If the next bit (0x40000000) is set to 1, it indicates that this is the last (or only) fragment. The next six bits (0x20000000 through 0x01000000) are reserved and SHOULD be set to zero. The remainder of the field is used to indicate the fragment offset; see Section 6.7 for details.
length: The count in bytes of the size of the message, including the header, after the eventual fragmentation.

transaction_id: A unique 64-bit number that identifies this transaction and also allows receivers to disambiguate transactions which are otherwise identical. In order to provide a high probability that transaction IDs are unique, they MUST be randomly generated. Responses use the same transaction ID as the request to which they correspond. Transaction IDs are also used for fragment reassembly. See Section 6.7 for details.

max_response_length: The maximum size in bytes of a response. This is used by requesting nodes to avoid receiving (unexpected) very large responses. If this value is non-zero, responding peers MUST check that any response would not exceed it and if so generate an Error_Response_Too_Large value. This value SHOULD be set to zero for responses.

via_list_length: The length of the Via List in bytes. Note that in this field and the following two length fields, we depart from the usual variable-length convention of having the length immediately precede the value, in order to make it easier for hardware decoding engines to quickly determine the length of the header.

destination_list_length: The length of the Destination List in bytes.

options_length: The length of the header options in bytes.

via_list: The via_list contains the sequence of destinations through which the message has passed. The via_list starts out empty and grows as the message traverses each peer. In stateless cases, the previous hop that the message is from is appended to the Via List as specified in Section 6.1.2.

destination_list: The destination_list contains a sequence of destinations through which the message should pass. The Destination List is constructed by the message originator. The first element on the Destination List is where the message goes next.
Generally, the list shrinks as the message traverses each listed peer, though if list compression is used, this may not be true. options: Contains a series of ForwardingOption entries. See Section 6.3.2.3.
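The fixed-position part of the ForwardingHeader (everything before via_list) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a reference implementation: the function and field-tuple names are hypothetical, "lower 32 bits" of the SHA-1 hash is assumed to mean the last four bytes of the digest, and the overlay name is assumed to be ASCII.

```python
import hashlib
import struct

RELO_TOKEN = 0xD2454C4F
VERSION_1_0 = 0x0A
FIXED_FORMAT = ">IIHBBIIQIHHH"              # relo_token .. options_length
FIXED_SIZE = struct.calcsize(FIXED_FORMAT)  # bytes preceding via_list
FIELDS = ("relo_token", "overlay", "configuration_sequence", "version", "ttl",
          "fragment", "length", "transaction_id", "max_response_length",
          "via_list_length", "destination_list_length", "options_length")


def overlay_hash(overlay_name: str) -> int:
    digest = hashlib.sha1(overlay_name.encode("ascii")).digest()
    return int.from_bytes(digest[-4:], "big")  # assumption: last 4 digest bytes


def fragment_field(offset: int, last: bool) -> int:
    """High bit always set, next bit marks the last fragment, low 24 bits hold the offset."""
    if not 0 <= offset < (1 << 24):
        raise ValueError("fragment offset out of range")
    return 0x80000000 | (0x40000000 if last else 0) | offset


def pack_fixed_header(overlay_name, config_seq, ttl, fragment, length, transaction_id,
                      max_response_length=0, via_len=0, dest_len=0, options_len=0):
    return struct.pack(FIXED_FORMAT, RELO_TOKEN, overlay_hash(overlay_name), config_seq,
                       VERSION_1_0, ttl, fragment, length, transaction_id,
                       max_response_length, via_len, dest_len, options_len)


def parse_fixed_header(buf: bytes) -> dict:
    header = dict(zip(FIELDS, struct.unpack_from(FIXED_FORMAT, buf)))
    if header["relo_token"] != RELO_TOKEN:
        raise ValueError("not a RELOAD message")
    if header["version"] != VERSION_1_0:
        raise ValueError("unsupported version")
    header["last_fragment"] = bool(header["fragment"] & 0x40000000)
    header["fragment_offset"] = header["fragment"] & 0x00FFFFFF
    return header


def check_received_ttl(ttl: int, destined_here: bool, initial_ttl: int = 100):
    """Return the error name to generate, or None if the TTL is acceptable."""
    if ttl > initial_ttl or (ttl == 0 and not destined_here):
        return "Error_TTL_Exceeded"
    return None
```

A real sender would draw transaction_id from a cryptographically strong random source (for example, Python's secrets.randbits(64)) and would append the encoded Via List, Destination List, and options after the fixed part.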
6.3.2.1. Processing Configuration Sequence Numbers
In order to be part of the overlay, a node MUST have a copy of the overlay Configuration Document. In order to allow for configuration document changes, each version of the Configuration Document MUST contain a sequence number which MUST be monotonically increasing mod 65535. Because the sequence number may, in principle, wrap, greater than or less than are interpreted by modulo arithmetic as in TCP. When a destination node receives a request, it MUST check that the configuration_sequence field is equal to its own configuration sequence number. If they do not match, the node MUST generate an error, either Error_Config_Too_Old or Error_Config_Too_New. In addition, if the configuration file in the request is too old, the node MUST generate a ConfigUpdate message to update the requesting node. This allows new Configuration Documents to propagate quickly throughout the system. The one exception to this rule is that if the configuration_sequence field is equal to 65535 and the message type is ConfigUpdate, then the message MUST be accepted regardless of the receiving node's configuration sequence number. Since 65535 is a special value, peers sending a new configuration when the configuration sequence is currently 65534 MUST set the configuration sequence number to 0 when they send a new configuration.
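The sequence-number rules above might be sketched as follows. The names are hypothetical, and the half-range test used to decide which of two numbers is "newer" is an assumption modeled on TCP-style serial comparison; the text does not spell out the exact comparison.

```python
MODULUS = 65535            # ordinary sequence numbers run from 0 to 65534
WILDCARD_SEQUENCE = 65535  # special value, accepted only with ConfigUpdate


def next_sequence(current: int) -> int:
    """Sequence number for a new Configuration Document (65534 wraps to 0)."""
    return (current + 1) % MODULUS


def is_newer(a: int, b: int) -> bool:
    """True if a is ahead of b under modular comparison (assumed half-range rule)."""
    return 0 < (a - b) % MODULUS <= MODULUS // 2


def check_request(request_seq: int, local_seq: int, is_config_update: bool = False) -> str:
    """Return 'accept' or the name of the error the destination node generates."""
    if request_seq == WILDCARD_SEQUENCE and is_config_update:
        return "accept"
    if request_seq == local_seq:
        return "accept"
    if is_newer(request_seq, local_seq):
        return "Error_Config_Too_New"
    # In this case the node also sends a ConfigUpdate to the requesting node.
    return "Error_Config_Too_Old"
```

Under these assumptions, a request carrying sequence 0 that arrives at a node still at 65534 is treated as newer, since 0 is the successor of 65534.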
6.3.2.2. Destination and Via Lists
The Destination List and Via List are sequences of Destination values:

     enum {
       invalidDestinationType(0),
       node(1),
       resource(2),
       opaque_id_type(3),
       /* 128-255 not allowed */
       (255)
     } DestinationType;

     select (destination_type) {
       case node:
         NodeId               node_id;

       case resource:
         ResourceId           resource_id;

       case opaque_id_type:
         opaque               opaque_id<0..2^8-1>;

       /* This structure may be extended with new types */
     } DestinationData;

     struct {
       DestinationType        type;
       uint8                  length;
       DestinationData        destination_data;
     } Destination;

     struct {
       uint16                 opaque_id; /* Top bit MUST be 1 */
     } Destination;

If the destination structure is a 16-bit integer, then the first bit MUST be set to 1, and it MUST be treated as if it were a full structure with a DestinationType of opaque_id_type and an opaque_id that was 2 bytes long with the value of the 16-bit integer. If the destination structure starts with DestinationType, then the first bit MUST be set to 0, and the destination structure must use a TLV structure with the following contents:

type
   The type of the DestinationData Payload Data Unit (PDU). It may be one of "node", "resource", or "opaque_id_type".

length
   The length of the destination_data.
destination_data The destination value itself, which is an encoded DestinationData structure that depends on the value of "type". Note that the destination structure encodes a Type, Length, Value. The Length field specifies the length of the DestinationData values, which allows the addition of new DestinationTypes. It also allows an implementation which does not understand a given DestinationType to skip over it. A DestinationData can be one of three types: node A Node-ID. opaque A compressed list of Node-IDs and an eventual Resource-ID. Because this value has been compressed by one of the peers, it is meaningful only to that peer and cannot be decoded by other peers. Thus, it is represented as an opaque string. resource The Resource-ID of the resource which is desired. This type MUST appear only in the final location of a Destination List and MUST NOT appear in a Via List. It is meaningless to try to route through a resource. One possible encoding of the 16-bit integer version as an opaque identifier is to encode an index into a Connection Table. To avoid misrouting responses in the event a response is delayed and the Connection Table entry has changed, the identifier SHOULD be split between an index and a generation counter for that index. When a Node first joins the overlay, the generation counters SHOULD be initialized to random values. An implementation MAY use 12 bits for the Connection Table index and 3 bits for the generation counter. (Note that this does not suggest a 4096-entry Connection Table for every peer, only the ability to encode for a larger Connection Table.) When a Connection Table slot is used for a new connection, the generation counter is incremented (with wrapping). Connection Table slots are used on a rotating basis to maximize the time interval between uses of the same slot for different connections. 
When routing a message to an entry in the Destination List encoding a Connection Table entry, the peer MUST confirm that the generation counter matches the current generation counter of that index before forwarding the message. If it does not match, the message MUST be silently dropped.
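One way to picture the 16-bit compressed form is the sketch below. It is only an illustration: the ConnectionTable class and helper names are hypothetical, and placing the 3-bit generation counter above the 12-bit index is an assumed layout, since the text leaves the bit arrangement to the implementation.

```python
import secrets

INDEX_BITS = 12
GENERATION_BITS = 3


def pack_compressed_destination(index: int, generation: int) -> bytes:
    """Top bit set to 1, then (assumed layout) 3 generation bits and a 12-bit index."""
    if not 0 <= index < (1 << INDEX_BITS) or not 0 <= generation < (1 << GENERATION_BITS):
        raise ValueError("index or generation out of range")
    return (0x8000 | (generation << INDEX_BITS) | index).to_bytes(2, "big")


def unpack_compressed_destination(two_bytes: bytes):
    value = int.from_bytes(two_bytes, "big")
    if not value & 0x8000:
        raise ValueError("top bit clear: this is a TLV Destination, not the 16-bit form")
    return value & 0x0FFF, (value >> INDEX_BITS) & 0x7


class ConnectionTable:
    def __init__(self, size: int):
        # Generation counters start at random values when the node joins.
        self.generations = [secrets.randbelow(1 << GENERATION_BITS) for _ in range(size)]
        self.connections = [None] * size

    def assign(self, index: int, connection) -> bytes:
        """Reuse a slot for a new connection, incrementing its counter with wrapping."""
        self.generations[index] = (self.generations[index] + 1) % (1 << GENERATION_BITS)
        self.connections[index] = connection
        return pack_compressed_destination(index, self.generations[index])

    def lookup(self, two_bytes: bytes):
        """Return the connection, or None when the message must be silently dropped."""
        index, generation = unpack_compressed_destination(two_bytes)
        if index >= len(self.connections) or generation != self.generations[index]:
            return None
        return self.connections[index]
```

A delayed response that carries an identifier issued before the slot was reused fails the generation check and is dropped instead of being misrouted to the new connection.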
6.3.2.3. Forwarding Option
The forwarding header can be extended with forwarding header options, which are a series of ForwardingOption structures:

     enum {
       invalidForwardingOptionType(0),
       (255)
     } ForwardingOptionType;

     struct {
       ForwardingOptionType      type;
       uint8                     flags;
       uint16                    length;
       select (type) {
         /* This type may be extended */
       };
     } ForwardingOption;

Each ForwardingOption consists of the following values:

type
   The type of the option. This structure allows for unknown option types.

flags
   Three flags are defined: FORWARD_CRITICAL(0x01), DESTINATION_CRITICAL(0x02), and RESPONSE_COPY(0x04). These flags MUST NOT be set in a response. If the FORWARD_CRITICAL flag is set, any peer that would forward the message but does not understand this option MUST reject the request with an Error_Unsupported_Forwarding_Option error response. If the DESTINATION_CRITICAL flag is set, any node that generates a response to the message but does not understand the forwarding option MUST reject the request with an Error_Unsupported_Forwarding_Option error response. If the RESPONSE_COPY flag is set, any node generating a response MUST copy the option from the request to the response except that the RESPONSE_COPY, FORWARD_CRITICAL, and DESTINATION_CRITICAL flags MUST be cleared.

length
   The length of the rest of the structure. Note that a 0 length may be reasonable if the mere presence of the option is meaningful and no value is required.

option
   The option value.
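The flag handling described above can be sketched as follows. The helper names are hypothetical, and since this section defines no option types other than the invalid value 0, any type number used with these helpers is purely illustrative.

```python
import struct

FORWARD_CRITICAL = 0x01
DESTINATION_CRITICAL = 0x02
RESPONSE_COPY = 0x04
ALL_FLAGS = FORWARD_CRITICAL | DESTINATION_CRITICAL | RESPONSE_COPY


def parse_options(buf: bytes):
    """Split the options area of the header into (type, flags, value) tuples."""
    options, offset = [], 0
    while offset < len(buf):
        if len(buf) - offset < 4:
            raise ValueError("truncated ForwardingOption header")
        opt_type, flags, length = struct.unpack_from(">BBH", buf, offset)
        value = buf[offset + 4:offset + 4 + length]
        if len(value) != length:
            raise ValueError("truncated ForwardingOption value")
        options.append((opt_type, flags, value))
        offset += 4 + length
    return options


def must_reject(options, understood, forwarding: bool) -> bool:
    """True if Error_Unsupported_Forwarding_Option applies for this role."""
    critical = FORWARD_CRITICAL if forwarding else DESTINATION_CRITICAL
    return any(flags & critical and opt_type not in understood
               for opt_type, flags, _ in options)


def options_for_response(options) -> bytes:
    """Copy RESPONSE_COPY options into a response with the three flags cleared."""
    return b"".join(
        struct.pack(">BBH", opt_type, flags & ~ALL_FLAGS & 0xFF, len(value)) + value
        for opt_type, flags, value in options
        if flags & RESPONSE_COPY
    )
```

An intermediate peer would call must_reject with forwarding=True, while the node generating the response would call it with forwarding=False before building the response options.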
6.3.3. Message Contents Format
The second major part of a RELOAD message is the contents part, which is defined by MessageContents:

     enum {
       invalidMessageExtensionType(0),
       (2^16-1)
     } MessageExtensionType;

     struct {
       MessageExtensionType  type;
       Boolean               critical;
       opaque                extension_contents<0..2^32-1>;
     } MessageExtension;

     struct {
       uint16                message_code;
       opaque                message_body<0..2^32-1>;
       MessageExtension      extensions<0..2^32-1>;
     } MessageContents;

The contents of this structure are as follows:

message_code
   This indicates the message that is being sent. The code space is broken up as follows:

   0x0
      Invalid Message Code. This code will never be assigned.

   0x1 .. 0x7FFF
      Requests and responses. These code points are always paired, with requests being an odd value and the corresponding response being the request code plus 1. Thus, "probe_request" (the Probe request) has the value 1 and "probe_answer" (the Probe response) has the value 2.

   0x8000 .. 0xFFFE
      Reserved

   0xFFFF
      Error

   The message codes are defined in Section 14.8.

message_body
   The message body itself, represented as a variable-length string of bytes. The bytes themselves are dependent on the code value. See the sections describing the various RELOAD methods (Join, Update, Attach, Store, Fetch, etc.) for the definitions of the payload contents.
extensions
   Extensions to the message. Currently no extensions are defined, but new extensions can be defined by the process described in Section 14.14.

All extensions have the following form:

type
   The extension type.

critical
   Whether this extension needs to be understood in order to process the message. If critical = True and the recipient does not understand the extension, it MUST generate an Error_Unknown_Extension error. If critical = False, the recipient MAY choose to process the message even if it does not understand the extension.

extension_contents
   The contents of the extension (which are extension dependent).

The subsections 6.4.2, 6.5, and 7 describe structures that are inserted inside the message_body member, depending on the value of the message_code value. For example, a message_code value of join_req means that the structure named JoinReq is inserted inside message_body. This document does not contain a mapping between message_code values and structure names, as the conversion between the two is obvious.

Similarly, this document uses the name of the structure without the "Req" or "Ans" suffix to mean the execution of a transaction consisting of the matching request and answer. For example, when the text says "perform an Attach", it must be understood as performing a transaction composed of an AttachReq and an AttachAns.

6.3.3.1. Response Codes and Response Errors
A node processing a request MUST return its status in the message_code field. If the request was a success, then the message code MUST be set to the response code that matches the request (i.e., the next code up). The response payload is then as defined in the request/response descriptions. If the request has failed, then the message code MUST be set to 0xFFFF (error) and the payload MUST be an ErrorResponse structure.
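The pairing of request and response codes can be sketched in a few lines of Python. The function names are hypothetical; the ranges follow the code space given for message_code in Section 6.3.3.

```python
ERROR_CODE = 0xFFFF


def classify(code: int) -> str:
    """Place a message_code in the code space described in Section 6.3.3."""
    if code == 0:
        return "invalid"
    if code == ERROR_CODE:
        return "error"
    if code >= 0x8000:
        return "reserved"
    return "request" if code % 2 else "response"  # requests are odd, responses even


def response_code(request_code: int, succeeded: bool) -> int:
    """Code to place in the response: the next code up on success, 0xFFFF on failure."""
    if classify(request_code) != "request":
        raise ValueError("not a request code")
    return request_code + 1 if succeeded else ERROR_CODE
```

For instance, a successful Probe request (code 1) is answered with code 2, while a failed one is answered with 0xFFFF.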
When the message code is 0xFFFF, the payload MUST be an ErrorResponse:

     public struct {
       uint16             error_code;
       opaque             error_info<0..2^16-1>;
     } ErrorResponse;

The contents of this structure are as follows:

error_code
   A numeric error code indicating the error that occurred.

error_info
   An optional arbitrary byte string. Unless otherwise specified, this will be a UTF-8 text string that provides further information about what went wrong. Developers are encouraged to include enough diagnostic information to be useful in error_info. The specific text to be used and any relevant language or encoding thereof is left to the implementation.

The following error code values are defined. The numeric values for these are defined in Section 14.9.

Error_Forbidden
   The requesting node does not have permission to make this request.

Error_Not_Found
   The resource or node cannot be found or does not exist.

Error_Request_Timeout
   A response to the request has not been received in a suitable amount of time. The requesting node MAY resend the request at a later time.

Error_Data_Too_Old
   A store cannot be completed because the storage_time precedes the existing value.

Error_Data_Too_Large
   A store cannot be completed because the requested object exceeds the size limits for that Kind.

Error_Generation_Counter_Too_Low
   A store cannot be completed because the generation counter precedes the existing value.

Error_Incompatible_with_Overlay
   A peer receiving the request is using a different overlay, overlay algorithm, or hash algorithm, or some other parameter that is inconsistent with the overlay configuration.

Error_Unsupported_Forwarding_Option
   A node received the request with a forwarding option flagged as critical, but the node does not support this option. See Section 6.3.2.3.

Error_TTL_Exceeded
   A peer received the request in which the TTL was decremented to zero. See Section 6.3.2.

Error_Message_Too_Large
   A peer received a request that was too large. See Section 6.6.

Error_Response_Too_Large
   A node would have generated a response that is too large per the max_response_length field.

Error_Config_Too_Old
   A destination node received a request with a configuration sequence that is too old. See Section 6.3.2.1.

Error_Config_Too_New
   A destination node received a request with a configuration sequence that is too new. See Section 6.3.2.1.

Error_Unknown_Kind
   A destination peer received a request with an unknown Kind-ID. See Section 7.4.1.2.

Error_In_Progress
   An Attach to this peer is already in progress. See Section 6.5.1.2.

Error_Unknown_Extension
   A destination node received a request with an unknown extension.

Error_Invalid_Message
   Something about this message is invalid, but it does not fit the other error codes. When this message is sent, implementations SHOULD provide some meaningful description in error_info to aid in debugging.

Error_Exp_A
   For the purposes of experimentation. It is not meant for vendor-specific use of any sort and MUST NOT be used for operational deployments.

Error_Exp_B
   For the purposes of experimentation. It is not meant for vendor-specific use of any sort and MUST NOT be used for operational deployments.

6.3.4. Security Block
The third part of a RELOAD message is the security block. The security block is represented by a SecurityBlock structure:

     struct {
       CertificateType    type;            // From RFC 6091
       opaque             certificate<0..2^16-1>;
     } GenericCertificate;

     struct {
       GenericCertificate certificates<0..2^16-1>;
       Signature          signature;
     } SecurityBlock;

The contents of this structure are:

certificates
   A bucket of certificates.

signature
   A signature.

The certificates bucket SHOULD contain all the certificates necessary to verify every signature in both the message and the internal message objects, except for those certificates in a root-cert element of the current configuration file. This is the only location in the message which contains certificates, thus allowing only a single copy of each certificate to be sent. In systems that have an alternative certificate distribution mechanism, some certificates MAY be omitted. However, unless an alternative mechanism for immediately generating certificates, such as shared secret security (Section 13.4), is used, implementers MUST include all referenced certificates.

NOTE TO IMPLEMENTERS: This requirement implies that a peer storing data is obligated to retain certificates for the data that it holds.
Each certificate is represented by a GenericCertificate structure, which has the following contents:

type
   The type of the certificate, as defined in [RFC6091]. Only the use of X.509 certificates is defined in this document.

certificate
   The encoded version of the certificate. For X.509 certificates, it is the Distinguished Encoding Rules (DER) form.

The signature is computed over the payload and parts of the forwarding header. In case of a Store, the payload MUST contain an additional signature computed as described in Section 7.1. All signatures MUST be formatted using the Signature element. This element is also used in other contexts where signatures are needed. The input structure to the signature computation MAY vary depending on the data element being signed.

     enum {
       invalidSignerIdentityType(0),
       cert_hash(1), cert_hash_node_id(2),
       none(3),
       (255)
     } SignerIdentityType;

     struct {
       select (identity_type) {
         case cert_hash:
           HashAlgorithm     hash_alg;         // From TLS
           opaque            certificate_hash<0..2^8-1>;

         case cert_hash_node_id:
           HashAlgorithm     hash_alg;         // From TLS
           opaque            certificate_node_id_hash<0..2^8-1>;

         case none:
           /* empty */

         /* This structure may be extended with new types
            if necessary */
       };
     } SignerIdentityValue;

     struct {
       SignerIdentityType    identity_type;
       uint16                length;
       SignerIdentityValue   identity[SignerIdentity.length];
     } SignerIdentity;
     struct {
       SignatureAndHashAlgorithm   algorithm;   // From TLS
       SignerIdentity              identity;
       opaque                      signature_value<0..2^16-1>;
     } Signature;

The Signature construct contains the following values:

algorithm
   The signature algorithm in use. The algorithm definitions are found in the IANA TLS SignatureAlgorithm and HashAlgorithm registries. All implementations MUST support RSASSA-PKCS1-v1_5 [RFC3447] signatures with SHA-256 hashes [RFC6234].

identity
   The identity, as defined in the two paragraphs following this list, used to form the signature.

signature_value
   The value of the signature.

Note that storage operations allow for special values of algorithm and identity. See the Store Request definition (Section 7.4.1.1) and the Fetch Response definition (Section 7.4.2.2).

There are two permitted identity formats, one for a certificate with only one Node-ID and one for a certificate with multiple Node-IDs. In the first case, the cert_hash type MUST be used. The hash_alg field is used to indicate the algorithm used to produce the hash. The certificate_hash contains the hash of the certificate object (i.e., the DER-encoded certificate).

In the second case, the cert_hash_node_id type MUST be used. The hash_alg is as in cert_hash, but the certificate_node_id_hash is computed over the NodeId used to sign concatenated with the certificate; i.e., H(NodeId || certificate). The NodeId is represented without any framing or length fields, as simple raw bytes. This is safe because NodeIds are a fixed length for a given overlay.

For signatures over messages, the input to the signature is computed over:

     overlay || transaction_id || MessageContents || SignerIdentity

where overlay and transaction_id come from the forwarding header and || indicates concatenation.
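The construction of the SignerIdentity and of the signature input can be sketched as follows. This is a sketch under assumptions, not a normative encoding: the function names are hypothetical, SHA-256 is used as the identity hash (HashAlgorithm value 4 in TLS [RFC5246]), and overlay and transaction_id are assumed to be serialized exactly as they appear in the forwarding header (4 and 8 bytes, network byte order). The actual signing step is omitted because it requires a cryptographic library.

```python
import hashlib
import struct

CERT_HASH, CERT_HASH_NODE_ID = 1, 2   # SignerIdentityType values
TLS_HASH_SHA256 = 4                   # HashAlgorithm.sha256 from TLS


def signer_identity(cert_der: bytes, node_id: bytes = None) -> bytes:
    """Encode a SignerIdentity: H(certificate), or H(NodeId || certificate)."""
    if node_id is None:
        identity_type = CERT_HASH
        digest = hashlib.sha256(cert_der).digest()
    else:
        identity_type = CERT_HASH_NODE_ID
        digest = hashlib.sha256(node_id + cert_der).digest()  # raw NodeId bytes, no framing
    value = bytes([TLS_HASH_SHA256, len(digest)]) + digest    # hash_alg, 1-byte length, hash
    return struct.pack(">BH", identity_type, len(value)) + value


def signature_input(overlay: int, transaction_id: int,
                    message_contents: bytes, identity: bytes) -> bytes:
    """overlay || transaction_id || MessageContents || SignerIdentity."""
    return struct.pack(">IQ", overlay, transaction_id) + message_contents + identity
```

The returned byte string is what would be handed to the signature algorithm; message_contents is passed as already-encoded bytes, which reflects the earlier note that the signature can be computed without parsing the message contents.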
The input to signatures over data values is different and is described in Section 7.1.

All RELOAD messages MUST be signed. Intermediate nodes do not verify signatures. Upon receipt (and fragment reassembly, if needed), the destination node MUST verify the signature and the authorizing certificate. If the signature fails, the implementation SHOULD simply drop the message and MUST NOT process it. This check provides a minimal level of assurance that the sending node is a valid part of the overlay, and it provides cryptographic authentication of the sending node. In addition, responses MUST be checked as follows by the requesting node:

1. The response to a message sent to a Node-ID MUST have been sent by that Node-ID unless the response has been sent to the wildcard Node-ID.

2. The response to a message sent to a Resource-ID MUST have been sent by a Node-ID which is at least as close to the target Resource-ID as any node in the requesting node's Neighbor Table.

The second condition serves as a primitive check for responses from wildly wrong nodes but is not a complete check. Note that in periods of churn, it is possible for the requesting node to obtain a closer neighbor while the request is outstanding. This will cause the response to be rejected and the request to be retransmitted.

In addition, some methods (especially Store) have additional authentication requirements, which are described in the sections covering those methods.

6.4. Overlay Topology
As discussed in previous sections, RELOAD defines a default overlay topology (CHORD-RELOAD) but allows for other topologies through the use of Topology Plug-ins. This section describes the requirements for new Topology Plug-ins and the methods that RELOAD provides for overlay topology maintenance.

6.4.1. Topology Plug-in Requirements
When specifying a new overlay algorithm, at least the following MUST be described:

o Joining procedures, including the contents of the Join message.
o Stabilization procedures, including the contents of the Update message, the frequency of topology probes and keepalives, and the mechanism used to detect when peers have disconnected.
o Exit procedures, including the contents of the Leave message.
o The length of the Resource-IDs and for DHTs the hash algorithm to compute the hash of an identifier.
o The procedures that peers use to route messages.
o The replication strategy used to ensure data redundancy.

All overlay algorithms MUST specify maintenance procedures that send Updates to clients and peers that have established connections to the peer responsible for a particular ID when the responsibility for that ID changes. Because tracking this information is difficult, overlay algorithms MAY simply specify that an Update is sent to all members of the Connection Table whenever the range of IDs for which the peer is responsible changes.

6.4.2. Methods and Types for Use by Topology Plug-ins
This section describes the methods that Topology Plug-ins use to join, leave, and maintain the overlay.

6.4.2.1. Join
A new peer (which already has credentials) uses the JoinReq message to join the overlay. The JoinReq is sent to the responsible peer depending on the routing mechanism described in the Topology Plug-in. This message notifies the responsible peer that the new peer is taking over some of the overlay and that it needs to synchronize its state.

     struct {
       NodeId                joining_peer_id;
       opaque                overlay_specific_data<0..2^16-1>;
     } JoinReq;

The minimal JoinReq contains only the Node-ID which the sending peer wishes to assume. Overlay algorithms MAY specify other data to appear in this request. Receivers of the JoinReq MUST verify that the joining_peer_id field matches the Node-ID used to sign the message and, if not, the message MUST be rejected with an Error_Forbidden error.
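A minimal sketch of JoinReq encoding and the receiver-side identity check follows. The function names are hypothetical, and the signer's Node-ID is assumed to have already been extracted from the verified security block by other code.

```python
import struct


def encode_join_req(joining_peer_id: bytes, overlay_specific_data: bytes = b"") -> bytes:
    """Fixed-length NodeId, then overlay_specific_data with a 2-byte length prefix."""
    return joining_peer_id + struct.pack(">H", len(overlay_specific_data)) + overlay_specific_data


def decode_join_req(body: bytes, node_id_length: int):
    """Return (joining_peer_id, overlay_specific_data) from a JoinReq message_body."""
    if len(body) < node_id_length + 2:
        raise ValueError("truncated JoinReq")
    joining_peer_id = body[:node_id_length]
    (data_len,) = struct.unpack_from(">H", body, node_id_length)
    data = body[node_id_length + 2:]
    if len(data) != data_len:
        raise ValueError("overlay_specific_data length mismatch")
    return joining_peer_id, data


def check_join_req(body: bytes, signer_node_id: bytes):
    """Return None if acceptable, or the error name to send back."""
    joining_peer_id, _ = decode_join_req(body, len(signer_node_id))
    return None if joining_peer_id == signer_node_id else "Error_Forbidden"
```

The same pattern would apply to any request whose first member is a NodeId followed by an opaque overlay-specific blob.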
Because joins may be executed only between nodes which are directly adjacent, receiving peers MUST verify that any JoinReq they receive arrives from a transport channel that is bound to the Node-ID to be assumed by the Joining Node. Implementations MUST use DTLS anti-replay mechanisms, thus preventing replay attacks.
If the request succeeds, the responding peer responds with a JoinAns message, as defined below:

      struct {
        opaque                overlay_specific_data<0..2^16-1>;
      } JoinAns;

If the request succeeds, the responding peer MUST follow up by executing the right sequence of Stores and Updates to transfer the appropriate section of the overlay space to the Joining Node. In addition, overlay algorithms MAY define data to appear in the response payload that provides additional information. Joining Nodes MUST verify that the signature on the JoinAns message matches the expected target (i.e., the adjacency over which they are joining). If not, they MUST discard the message.
In general, nodes which cannot form connections SHOULD report an error to the user. However, implementations MUST provide some mechanism whereby nodes can determine that they are potentially the first node and can take responsibility for the overlay. (The idea is to avoid having ordinary nodes try to become responsible for the entire overlay during a partition.) This specification does not mandate any particular mechanism, but a configuration flag or setting seems appropriate.
6.4.2.2. Leave
The LeaveReq message is used to indicate that a node is exiting the overlay. A node SHOULD send this message to each peer with which it is directly connected prior to exiting the overlay.

      struct {
        NodeId                leaving_peer_id;
        opaque                overlay_specific_data<0..2^16-1>;
      } LeaveReq;

LeaveReq contains only the Node-ID of the leaving peer. Overlay algorithms MAY specify other data to appear in this request. Receivers of the LeaveReq MUST verify that the leaving_peer_id field matches the Node-ID used to sign the message and, if not, the message MUST be rejected with an Error_Forbidden error.
Because leaves may be executed only between nodes which are directly adjacent, receiving peers MUST verify that any LeaveReq they receive arrives from a transport channel that is bound to the Node-ID of the leaving peer. This also prevents replay attacks, provided that DTLS anti-replay is used.
Upon receiving a Leave request, a peer MUST update its own Routing Table and send the appropriate Store/Update sequences to re-stabilize the overlay.
LeaveAns is an empty message.
6.4.2.3. Update
Update is the primary overlay-specific maintenance message. It is used by the sender to notify the recipient of the sender's view of the current state of the overlay (that is, its routing state), and it is up to the recipient to take whatever actions are appropriate to deal with the state change. In general, peers send Update messages to all their adjacencies whenever they detect a topology shift.
When a peer receives an Attach request with the send_update flag set to True (Section 6.4.2.4.1), it MUST send an Update message back to the sender of the Attach request after completion of the corresponding ICE check and TLS connection. Note that the sender of such an Attach request may not have joined the overlay yet.
When a peer detects through an Update that it is no longer responsible for any data value it is storing, it MUST attempt to Store a copy to the correct node unless it knows the newly responsible node already has a copy of the data. This prevents data loss during large-scale topology shifts, such as the merging of partitioned overlays.
The contents of the UpdateReq message are completely overlay specific. The UpdateAns response is expected to be either success or an error.
6.4.2.4. RouteQuery
The RouteQuery request allows the sender to ask a peer where they would route a message directed to a given destination. In other words, a RouteQuery for a destination X requests the Node-ID for the node that the receiving peer would next route to in order to get to X. A RouteQuery can also request that the receiving peer initiate an Update request to transfer the receiving peer's Routing Table.
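The next-hop semantics can be illustrated with a toy model. The following Python sketch is not part of RELOAD: the integer identifiers, the contents of TABLES, the absolute-distance metric, and the function names are all assumptions chosen for illustration, and a real RouteQueryAns is overlay specific. The sketch repeats the query against each returned next hop until a peer names itself, which is taken here to mean that it is the closest peer to the destination.

```python
from typing import Callable, Dict, List

# Hypothetical routing tables: each peer knows only a few other peers.
TABLES: Dict[int, List[int]] = {
    10: [40, 90],
    40: [10, 60],
    60: [40, 70],
    70: [60, 90],
    90: [70, 10],
}


def route_query(peer: int, destination: int) -> int:
    """Toy stand-in for a RouteQuery: the peer's next hop toward destination.

    Assumption: "closest" means smallest absolute difference. A real
    Topology Plug-in defines its own metric.
    """
    candidates = [peer] + TABLES[peer]
    return min(candidates, key=lambda node: abs(node - destination))


def iterative_lookup(
    start: int,
    destination: int,
    query: Callable[[int, int], int],
    max_hops: int = 32,
) -> List[int]:
    """Follow next-hop answers until a peer reports itself as the next hop."""
    path = [start]
    current = start
    for _ in range(max_hops):
        next_hop = query(current, destination)
        if next_hop == current:
            return path  # current is the closest peer in this toy model
        # A real node would first establish a connection to next_hop.
        path.append(next_hop)
        current = next_hop
    raise RuntimeError("lookup did not converge within max_hops")


print(iterative_lookup(10, 68, route_query))  # prints [10, 90, 70]
```

The max_hops bound is a defensive choice for the sketch, so that inconsistent routing tables cannot cause an endless loop.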
One important use of the RouteQuery request is to support iterative routing. The sender selects one of the peers in its Routing Table and sends it a RouteQuery message with the destination field set to the Node-ID or Resource-ID to which it wishes to route. The receiving peer responds with information about the peers to which the request would be routed. The sending peer MAY then use the Attach method to attach to that peer(s) and repeat the RouteQuery. Eventually, the sender gets a response from a peer that is closest to the identifier in the destination field as determined by the Topology Plug-in. At that point, the sender can send messages directly to that peer.
6.4.2.4.1. Request Definition
A RouteQueryReq message indicates the peer or resource that the requesting node is interested in. It also contains a "send_update" option that allows the requesting node to request a full copy of the other peer's Routing Table.

      struct {
        Boolean               send_update;
        Destination           destination;
        opaque                overlay_specific_data<0..2^16-1>;
      } RouteQueryReq;

The contents of the RouteQueryReq message are as follows:
send_update
   A single byte. This may be set to True to indicate that the requester wishes the responder to initiate an Update request immediately. Otherwise, this value MUST be set to False.
destination
   The destination which the requester is interested in. This may be any valid destination object, including a Node-ID, opaque ID, or Resource-ID. Note: If implementations are using opaque IDs for privacy purposes, answering RouteQueryReqs for opaque IDs will allow the requester to translate an opaque ID. Implementations MAY wish to consider limiting the use of RouteQuery for opaque IDs in such cases.
overlay_specific_data
   Other data as appropriate for the overlay.
6.4.2.4.2. Response Definition
A response to a successful RouteQueryReq request is a RouteQueryAns message. This message is completely overlay specific.
6.4.2.5. Probe
Probe provides primitive "exploration" services: it allows a node to determine which resources another node is responsible for. A probe can be addressed to a specific Node-ID or to the peer controlling a given location (by using a Resource-ID). In either case, the target node responds with a simple response containing some status information.
6.4.2.5.1. Request Definition
The ProbeReq message contains a list (potentially empty) of the pieces of status information that the requester would like the responder to provide.

      enum {
        invalidProbeInformationType(0),
        responsible_set(1),
        num_resources(2),
        uptime(3),
        (255)
      } ProbeInformationType;

      struct {
        ProbeInformationType  requested_info<0..2^8-1>;
      } ProbeReq;

The currently defined values for ProbeInformationType are:
responsible_set
   Indicates that the peer should respond with the fraction of the overlay for which the responding peer is responsible.
num_resources
   Indicates that the peer should respond with the number of resources currently being stored by the peer. Note that multiple values under the same Resource-ID are counted only once.
uptime
   Indicates that the peer should respond with how long the peer has been up, in seconds.
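The ProbeReq wire format follows directly from the definitions above: a one-byte length, then one byte per requested type. The following Python sketch shows one possible encoding. The class and function names are hypothetical, and the decoder deliberately returns raw integer codes so that values it does not recognize are preserved rather than rejected.

```python
from enum import IntEnum
from typing import Iterable, List


class ProbeInformationType(IntEnum):
    """Code points taken from the ProbeInformationType enum."""
    RESPONSIBLE_SET = 1
    NUM_RESOURCES = 2
    UPTIME = 3


def encode_probe_req(requested: Iterable[int]) -> bytes:
    """Serialize requested_info<0..2^8-1>: one length byte, one byte per type."""
    items = bytes(int(t) for t in requested)
    if len(items) > 0xFF:
        raise ValueError("requested_info exceeds 2^8-1 bytes")
    return bytes([len(items)]) + items


def decode_probe_req(body: bytes) -> List[int]:
    """Return the requested type codes, in order, as plain integers."""
    if not body or body[0] != len(body) - 1:
        raise ValueError("malformed ProbeReq")
    return list(body[1:])
```

For example, encode_probe_req([ProbeInformationType.UPTIME, ProbeInformationType.RESPONSIBLE_SET]) produces the three bytes 02 03 01, and an empty request encodes as the single byte 00.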
6.4.2.5.2. Response Definition
A successful ProbeAns response contains the information elements requested by the peer.

      struct {
        select (type) {
          case responsible_set:
            uint32            responsible_ppb;

          case num_resources:
            uint32            num_resources;

          case uptime:
            uint32            uptime;

          /* This type may be extended */
        };
      } ProbeInformationData;

      struct {
        ProbeInformationType  type;
        uint8                 length;
        ProbeInformationData  value;
      } ProbeInformation;

      struct {
        ProbeInformation      probe_info<0..2^16-1>;
      } ProbeAns;

A ProbeAns message contains a sequence of ProbeInformation structures. Each has a "length" indicating the length of the following value field. This structure allows for unknown option types.
Each of the current possible Probe information types is a 32-bit unsigned integer. For type "responsible_set", the value (responsible_ppb) is the fraction of the overlay for which the peer is responsible, in parts per billion. For type "num_resources", it is the number of resources the peer is storing. For type "uptime", it is the number of seconds the peer has been up.
The responding peer SHOULD include any values that the requesting node requested and that it recognizes. They SHOULD be returned in the requested order. Any other values MUST NOT be returned.
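Because every ProbeInformation element carries its own length, a receiver can step over types it does not understand. The following Python sketch shows one way to build and parse a ProbeAns under that rule. The function names and the KNOWN_TYPES table are hypothetical, and the choice to treat a known type with a length other than four bytes as an error is an assumption of this sketch, not something stated above.

```python
import struct
from typing import Dict, List, Tuple

# Known type codes mapped to the field names used in ProbeInformationData.
KNOWN_TYPES = {1: "responsible_ppb", 2: "num_resources", 3: "uptime"}


def encode_probe_ans(items: List[Tuple[int, bytes]]) -> bytes:
    """Serialize (type, value) pairs as probe_info<0..2^16-1>."""
    body = b"".join(bytes([t, len(v)]) + v for t, v in items)
    if len(body) > 0xFFFF:
        raise ValueError("probe_info exceeds 2^16-1 bytes")
    return struct.pack("!H", len(body)) + body


def decode_probe_ans(data: bytes) -> Dict[str, int]:
    """Parse a ProbeAns, skipping elements whose type is not recognized."""
    if len(data) < 2:
        raise ValueError("truncated ProbeAns")
    (total,) = struct.unpack_from("!H", data, 0)
    if total != len(data) - 2:
        raise ValueError("probe_info length mismatch")
    result: Dict[str, int] = {}
    offset = 2
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated ProbeInformation header")
        type_code, length = data[offset], data[offset + 1]
        offset += 2
        value = data[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated ProbeInformation value")
        offset += length
        name = KNOWN_TYPES.get(type_code)
        if name is None:
            continue  # unknown type: the length field lets us skip it
        if length != 4:
            raise ValueError("known types carry a 32-bit value")
        result[name] = struct.unpack("!I", value)[0]
    return result


answer = encode_probe_ans([
    (1, struct.pack("!I", 250_000_000)),  # 250,000,000 ppb = 25% of the overlay
    (200, b"\xaa\xbb"),                   # hypothetical unknown type, skipped
    (3, struct.pack("!I", 3600)),         # one hour of uptime
])
info = decode_probe_ans(answer)
print(info)  # prints {'responsible_ppb': 250000000, 'uptime': 3600}
```

Dividing responsible_ppb by 10^9 gives the fraction of the overlay as a number between 0 and 1.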