Tech-invite3GPPspaceIETFspace
9796959493929190898887868584838281807978777675747372717069686766656463626160595857565554535251504948474645444342414039383736353433323130292827262524232221201918171615141312111009080706050403020100
in Index   Prev   Next

RFC 6940

REsource LOcation And Discovery (RELOAD) Base Protocol

Pages: 176
Proposed Standard
Errata
Part 5 of 7 – Pages 93 to 124
First   Prev   Next

Top   ToC   RFC6940 - Page 93   prevText

7.3. Access Control Policies

Every Kind which is storable in an overlay MUST be associated with an access control policy. This policy defines whether a request from a given node to operate on a given value should succeed or fail. It is anticipated that only a small number of generic access control policies are required. To that end, this section describes a small set of such policies, and Section 14.4 establishes a registry for new policies, if required. Each policy has a short string identifier which is used to reference it in the Configuration Document. In the following policies, the term "signer" refers to the signer of the StoredValue object and, in the case of non-replica stores, to the signer of the StoreReq message. That is, in a non-replica store, both the signer of the StoredValue and the signer of the StoreReq MUST conform to the policy. In the case of a replica store, the signer of the StoredValue MUST conform to the policy, and the StoreReq itself MUST be checked as described in Section 7.4.1.1.

7.3.1. USER-MATCH

In the USER-MATCH policy, a given value MUST be written (or overwritten) if and only if the signer's certificate has a user name which hashes (using the hash function for the overlay) to the Resource-ID for the resource. Recall that the certificate may, depending on the overlay configuration, be self-signed.

7.3.2. NODE-MATCH

In the NODE-MATCH policy, a given value MUST be written (or overwritten) if and only if the signer's certificate has a specified Node-ID which hashes (using the hash function for the overlay) to the Resource-ID for the resource and that Node-ID is the one indicated in the SignerIdentity value cert_hash.

7.3.3. USER-NODE-MATCH

The USER-NODE-MATCH policy may be used only with dictionary types. In the USER-NODE-MATCH policy, a given value MUST be written (or overwritten) if and only if the signer's certificate has a user name which hashes (using the hash function for the overlay) to the Resource-ID for the resource. In addition, the dictionary key MUST be equal to the Node-ID in the certificate, and that Node-ID MUST be the one indicated in the SignerIdentity value cert_hash.
Top   ToC   RFC6940 - Page 94

7.3.4. NODE-MULTIPLE

In the NODE-MULTIPLE policy, a given value MUST be written (or overwritten) if and only if the signer's certificate contains a Node-ID such that H(Node-ID || i) is equal to the Resource-ID for some small integer value of i and that Node-ID is the one indicated in the SignerIdentity value cert_hash. When this policy is in use, the maximum value of i MUST be specified in the Kind definition. Note that because i is not carried on the wire, the verifier MUST iterate through potential i values, up to the maximum value, to determine whether a store is acceptable.

7.4. Data Storage Methods

RELOAD provides several methods for storing and retrieving data: o Store values in the overlay. o Fetch values from the overlay. o Stat: Get metadata about values in the overlay. o Find the values stored at an individual peer. These methods are described in the following sections.

7.4.1. Store

The Store method is used to store data in the overlay. The format of the Store request depends on the data model, which is determined by the Kind.
7.4.1.1. Request Definition
A StoreReq message is a sequence of StoreKindData values, each of which represents a sequence of stored values for a given Kind. The same Kind-ID MUST NOT be used twice in a given store request. Each value is then processed in turn. These operations MUST be atomic. If any operation fails, the state MUST be rolled back to what it was before the request was received.
Top   ToC   RFC6940 - Page 95
   The store request is defined by the StoreReq structure:

       struct {
           KindId                 kind;
           uint64                 generation_counter;
           StoredData             values<0..2^32-1>;
       } StoreKindData;

       struct {
           ResourceId             resource;
           uint8                  replica_number;
           StoreKindData          kind_data<0..2^32-1>;
       } StoreReq;

   A single Store request stores data of a number of Kinds to a single
   resource location.  The contents of the structure are:

   resource
      The resource at which to store.

   replica_number
      The number of this replica.  When a storing peer saves replicas to
      other peers, each peer is assigned a replica number, starting from
      1, that is sent in the Store message.  This field is set to 0 when
      a node is storing its own data.  This allows peers to distinguish
      replica writes from original writes.  Different topologies may
      choose to allocate or interpret the replica number differently
      (see Section 10.4).

   kind_data
      A series of elements, one for each Kind of data to be stored.

   The peer MUST check that it is responsible for the resource if the
   replica number is zero; if it is not, the peer must reject the
   request.  The peer MUST check that it expects to be a replica for the
   resource and that the request sender is consistent with being the
   responsible node (i.e., that the receiving peer does not know of a
   better node) if the replica number is nonzero; if the request sender
   is not consistent, it should reject the request.

   Each StoreKindData element represents the data to be stored for a
   single Kind-ID.  The contents of the element are:

   kind
      The Kind-ID.  Implementations MUST reject requests corresponding
      to unknown Kinds.
Top   ToC   RFC6940 - Page 96
   generation_counter
      The expected current state of the generation counter
      (approximately the number of times that this object has been
      written; see below for details).

   values
      The value or values to be stored.  This may contain one or more
      stored_data values, depending on the data model associated with
      each Kind.

   The peer MUST perform the following checks:

   o  The Kind-ID is known and supported.

   o  The signatures over each individual data element, if any, are
      valid.  If this check fails, the request MUST be rejected with an
      Error_Forbidden error.

   o  Each element is signed by a credential which is authorized to
      write this Kind at this Resource-ID.  If this check fails, the
      request MUST be rejected with an Error_Forbidden error.

   o  For original (non-replica) stores, the StoreReq is signed by a
      credential which is authorized to write this Kind at this
      Resource-ID.  If this check fails, the request MUST be rejected
      with an Error_Forbidden error.

   o  For replica stores, the StoreReq is signed by a Node-ID which is a
      plausible node to either have originally stored the value or have
      been in the replica set.  What this means is overlay specific, but
      in the case of the Chord-based DHT defined in this specification,
      replica StoreReqs MUST come from nodes which are either in the
      known replica set for a given resource or which are closer than
      some node in the replica set.  If this check fails, the request
      MUST be rejected with an Error_Forbidden error.

   o  For original (non-replica) stores, the peer MUST check that if the
      generation counter is nonzero, it equals the current value of the
      generation counter for this Kind.  This feature allows the
      generation counter to be used in a way similar to the HTTP ETag
      feature.

   o  For replica Stores, the peer MUST set the generation counter to
      match the generation counter in the message and MUST NOT check the
      generation counter against the current value.  Replica Stores MUST
      NOT use a generation counter of 0.
Top   ToC   RFC6940 - Page 97
   o  The storage time values are greater than that of any values which
      would be replaced by this Store.

   o  The size and number of the stored values are consistent with the
      limits specified in the overlay configuration.

   o  If the data is signed with identity_type set to "none" and/or
      SignatureAndHashAlgorithm values set to {0, 0} ("anonymous" and
      "none"), the StoreReq MUST be rejected with an Error_forbidden
      error.  Only synthesized data returned by the storage can use
      these values (see Section 7.4.2.2)

   If all these checks succeed, the peer MUST attempt to store the data
   values.  For non-replica stores, if the store succeeds and the data
   is changed, then the peer MUST increase the generation counter by at
   least 1.  If there are multiple stored values in a single
   StoreKindData, it is permissible for the peer to increase the
   generation counter by only 1 for the entire Kind-ID or by 1 or more
   than 1 for each value.  Accordingly, all stored data values MUST have
   a generation counter of 1 or greater. 0 is used in the Store request
   to indicate that the generation counter should be ignored for
   processing this request.  However, the responsible peer should
   increase the stored generation counter and should return the correct
   generation counter in the response.

   When a peer stores data previously stored by another node (e.g., for
   replicas or topology shifts), it MUST adjust the lifetime value
   downward to reflect the amount of time the value was stored at the
   peer.  The adjustment SHOULD be implemented by an algorithm
   equivalent to the following: at the time the peer initially receives
   the StoreReq, it notes the local time T.  When it then attempts to do
   a StoreReq to another node, it should decrement the lifetime value by
   the difference between the current local time and T.

   Unless otherwise specified by the usage, if a peer attempts to store
   data previously stored by another node (e.g., for replicas or
   topology shifts) and that store fails with either an
   Error_Generation_Counter_Too_Low or an Error_Data_Too_Old error, the
   peer MUST fetch the newer data from the peer generating the error and
   use that to replace its own copy.  This rule allows resynchronization
   after partitions heal.

   When a network partition is being healed and unless otherwise
   specified, the default merging rule is to act as if all the values
   that need to be merged were stored and as if the order they were
   stored in corresponds to the stored time values associated with (and
   carried in) their values.  Because the stored time values are those
   associated with the peer which did the writing, clock skew is
Top   ToC   RFC6940 - Page 98
   generally not an issue.  If two nodes are on different partitions,
   write to the same location, and have clock skew, this can create
   merge conflicts.  However, because RELOAD deliberately segregates
   storage so that data from different users and peers is stored in
   different locations, and a single peer will typically only be in a
   single network partition, this case will generally not arise.

   The properties of stores for each data model are as follows:

   single-value:  A store of a new single-value element creates the
      element if it does not exist and overwrites any existing value
      with the new value.

   array:  A store of an array entry replaces (or inserts) the given
      value at the location specified by the index.  Because arrays are
      sparse, a store past the end of the array extends it with
      nonexistent values (exists = False) as required.  A store at index
      0xffffffff places the new value at the end of the array,
      regardless of the length of the array.  The resulting StoredData
      has the correct index value when it is subsequently fetched.

   dictionary:  A store of a dictionary entry replaces (or inserts) the
      given value at the location specified by the dictionary key.
Top   ToC   RFC6940 - Page 99
   The following figure shows the relationship between these structures
   for an example store which stores the following values at resource
   "1234":

   o  The value "abc" is in the single-value location for Kind X.

   o  The value "foo" at index 0 is in the array for Kind Y.

   o  The value "bar" at index 1 is in the array for Kind Y.

                                     Store
                                resource=1234
                              replica_number = 0
                                   /      \
                                  /        \
                      StoreKindData        StoreKindData
                  kind=X (Single-Value)    kind=Y (Array)
                generation_counter = 99    generation_counter = 107
                           |                    /\
                           |                   /  \
                       StoredData             /    \
             storage_time = xxxxxxx          /      \
                   lifetime = 86400         /        \
                   signature = XXXX        /          \
                           |               |           |
                           |        StoredData       StoredData
                           |    storage_time =       storage_time =
                           |          yyyyyyyy       zzzzzzz
                           |  lifetime = 86400       lifetime = 33200
                           |  signature = YYYY       signature = ZZZZ
                           |               |           |
                    StoredDataValue        |           |
                     value="abc"           |           |
                                           |           |
                                  StoredDataValue  StoredDataValue
                                        index=0      index=1
                                     value="foo"    value="bar"
Top   ToC   RFC6940 - Page 100
7.4.1.2. Response Definition
In response to a successful Store request, the peer MUST return a StoreAns message containing a series of StoreKindResponse elements, which contains the current value of the generation counter for each Kind-ID, as well as a list of the peers where the data will be replicated by the node processing the request. struct { KindId kind; uint64 generation_counter; NodeId replicas<0..2^16-1>; } StoreKindResponse; struct { StoreKindResponse kind_responses<0..2^16-1>; } StoreAns; The contents of each StoreKindResponse are: kind The Kind-ID being represented. generation_counter The current value of the generation counter for that Kind-ID. replicas The list of other peers at which the data was/will be replicated. In overlays and applications where the responsible peer is intended to store redundant copies, this allows the storing node to independently verify that the replicas have in fact been stored. It does this verification by using the Stat method (see Section 7.4.3). Note that the storing node is not required to perform this verification. The response itself is just StoreKindResponse values packed end to end. If any of the generation counters in the request precede the corresponding stored generation counter, then the peer MUST fail the entire request and respond with an Error_Generation_Counter_Too_Low error. The error_info in the ErrorResponse MUST be a StoreAns response containing the correct generation counter for each Kind and the replica list, which will be empty. For original (non-replica) stores, a node which receives such an error SHOULD attempt to fetch the data and, if the storage_time value is newer, replace its own data with that newer data. This rule improves data consistency in the case of partitions and merges.
Top   ToC   RFC6940 - Page 101
   If the data being stored is too large for the allowed limit by the
   given usage, then the peer MUST fail the request and generate an
   Error_Data_Too_Large error.

   If any type of request tries to access a data Kind that the peer does
   not know about, the peer MUST fail the request and generate an
   Error_Unknown_Kind error.  The error_info in the Error_Response is:

              KindId        unknown_kinds<0..2^8-1>;

   which lists all the Kinds that were unrecognized.  A node which
   receives this error MUST generate a ConfigUpdate message which
   contains the appropriate Kind definition (assuming which, in fact, a
   Kind which was defined in the configuration document was used).

7.4.1.3. Removing Values
RELOAD does not have an explicit Remove operation. Rather, values are Removed by storing "nonexistent" values in their place. Each DataValue contains a boolean value called "exists" which indicates whether a value is present at that location. In order to effectively remove a value, the owner stores a new DataValue with "exists" set to False: exists = False value = {} (0 length) The owner SHOULD use a lifetime for the nonexistent value that is at least as long as the remainder of the lifetime of the value it is replacing. Otherwise, it is possible for the original value to be accidentally or maliciously re-stored after the storing node has expired it. Note that a window of vulnerability for replay attack still exists after the original lifetime has expired (as with any store). This attack can be mitigated by doing a nonexistent store with a very long lifetime. Storing nodes MUST treat these nonexistent values the same way they treat any other stored value, including overwriting the existing value, replicating them, and aging them out as necessary when the lifetime expires. When a stored nonexistent value's lifetime expires, it is simply removed from the storing node, as happens when any other stored value expires. Note that in the case of arrays and dictionaries, expiration may create an implicit, unsigned "nonexistent" value to represent a gap in the data structure, as might happen when any value is aged out.
Top   ToC   RFC6940 - Page 102
   However, this value isn't persistent, nor is it replicated.  It is
   simply synthesized by the storing node.

7.4.2. Fetch

The Fetch request retrieves one or more data elements stored at a given Resource-ID. A single Fetch request can retrieve multiple different Kinds.
7.4.2.1. Request Definition
Fetch requests are defined by the FetchReq structure: struct { int32 first; int32 last; } ArrayRange; struct { KindId kind; uint64 generation; uint16 length; select (DataModel) { case single_value: ; /* Empty */ case array: ArrayRange indices<0..2^16-1>; case dictionary: DictionaryKey keys<0..2^16-1>; /* This structure may be extended */ } model_specifier; } StoredDataSpecifier; struct { ResourceId resource; StoredDataSpecifier specifiers<0..2^16-1>; } FetchReq; The contents of the Fetch requests are as follows: resource The Resource-ID to fetch from.
Top   ToC   RFC6940 - Page 103
   specifiers
      A sequence of StoredDataSpecifier values, each specifying some of
      the data values to retrieve.

   Each StoredDataSpecifier specifies a single Kind of data to retrieve
   and, if appropriate, the subset of values that are to be retrieved.
   The contents of the StoredDataSpecifier structure are as follows:

   kind
      The Kind-ID of the data being fetched.  Implementations SHOULD
      reject requests corresponding to unknown Kinds unless specifically
      configured otherwise.

   DataModel
      The data model of the data.  This is not transmitted on the wire,
      but comes from the definition of the Kind.

   generation
      The last generation counter that the requesting node saw.  This
      may be used to avoid unnecessary fetches, or it may be set to
      zero.

   length
      The length of the rest of the structure, thus allowing
      extensibility.

   model_specifier
      A reference to the data value being requested within the data
      model specified for the Kind.  For instance, if the data model is
      "array", it might specify some subset of the values.

   The model_specifier is as follows:

   o  If the data model is single value, the specifier is empty.

   o  If the data model is array, the specifier contains a list of
      ArrayRange elements, each of which contains two integers.  The
      first integer is the beginning of the range, and the second is the
      end of the range.  0 is used to indicate the first element, and
      0xffffffff is used to indicate the final element.  The first
      integer MUST be less than or equal to the second.  While multiple
      ranges MAY be specified, they MUST NOT overlap.

   o  If the data model is dictionary, then the specifier contains a
      list of the dictionary keys being requested.  If no keys are
      specified, then this is a wildcard fetch and all key-value pairs
      are returned.
Top   ToC   RFC6940 - Page 104
   The generation counter is used to indicate the requester's expected
   state of the storing peer.  If the generation counter in the request
   matches the stored counter, then the storing peer returns a response
   with no StoredData values.

7.4.2.2. Response Definition
The response to a successful Fetch request is a FetchAns message containing the data requested by the requester. struct { KindId kind; uint64 generation; StoredData values<0..2^32-1>; } FetchKindResponse; struct { FetchKindResponse kind_responses<0..2^32-1>; } FetchAns; The FetchAns structure contains a series of FetchKindResponse structures. There MUST be one FetchKindResponse element for each Kind-ID in the request. The contents of the FetchKindResponse structure are as follows: kind The Kind that this structure is for. generation The generation counter for this Kind. values The relevant values. If the generation counter in the request matches the generation counter in the stored data, then no StoredData values are returned. Otherwise, all relevant data values MUST be returned. A nonexistent value (i.e., one which the node has no knowledge of) is represented by a synthetic value with "exists" set to False and has an empty signature. Specifically, the identity_type is set to "none", the SignatureAndHashAlgorithm values are set to {0, 0} ("anonymous" and "none", respectively), and the signature value is of zero length. This removes the need for the responding node to do signatures for values which do not exist. These signatures are unnecessary, as the entire response is signed by that node. Note that entries which have been removed by the procedure given in Section 7.4.1.3 and which have not yet expired also have exists = False, but have valid signatures from the node which did the store.
Top   ToC   RFC6940 - Page 105
   Upon receipt of a FetchAns message, nodes MUST verify the signatures
   on all the received values.  Any values with invalid signatures
   (including expired certificates) MUST be discarded.  Note that this
   implies that implementations which wish to store data for long
   periods of time must have certificates with appropriate expiration
   dates or must re-store periodically.  Implementations MAY return the
   subset of values with valid signatures, but in that case, they SHOULD
   somehow signal to the application that a partial response was
   received.

   There is one subtle point about signature computation on arrays.  If
   the storing node uses the append feature (where the
   index=0xffffffff), then the index in the StoredData that is returned
   will not match that used by the storing node, which would break the
   signature.  In order to avoid this issue, the index value in the
   array is set to zero before the signature is computed.  This implies
   that malicious storing nodes can reorder array entries without being
   detected.

7.4.3. Stat

The Stat request is used to get metadata (length, generation counter, digest, etc.) for a stored element without retrieving the element itself. The name is from the UNIX stat(2) system call, which performs a similar function for files in a file system. It also allows the requesting node to get a list of matching elements without requesting the entire element.
7.4.3.1. Request Definition
The Stat request is identical to the Fetch request. It simply specifies the elements to get metadata about. struct { ResourceId resource; StoredDataSpecifier specifiers<0..2^16-1>; } StatReq;
Top   ToC   RFC6940 - Page 106
7.4.3.2. Response Definition
The Stat response contains the same sort of entries that a Fetch response would contain. However, instead of containing the element data, it contains metadata. struct { Boolean exists; uint32 value_length; HashAlgorithm hash_algorithm; opaque hash_value<0..255>; } MetaData; struct { uint32 index; MetaData value; } ArrayEntryMeta; struct { DictionaryKey key; MetaData value; } DictionaryEntryMeta; struct { select (DataModel) { case single_value: MetaData single_value_entry; case array: ArrayEntryMeta array_entry; case dictionary: DictionaryEntryMeta dictionary_entry; /* This structure may be extended */ }; } MetaDataValue; struct { uint32 value_length; uint64 storage_time; uint32 lifetime; MetaDataValue metadata; } StoredMetaData;
Top   ToC   RFC6940 - Page 107
        struct {
          KindId                 kind;
          uint64                 generation;
          StoredMetaData         values<0..2^32-1>;
        } StatKindResponse;

        struct {
          StatKindResponse      kind_responses<0..2^32-1>;
        } StatAns;

   The structures used in StatAns parallel those used in FetchAns: a
   response consists of multiple StatKindResponse values, one for each
   Kind that was in the request.  The contents of the StatKindResponse
   are the same as those in the FetchKindResponse, except that the
   values list contains StoredMetaData entries instead of StoredData
   entries.

   The contents of the StoredMetaData structure are the same as the
   corresponding fields in StoredData, except that there is no signature
   field and the value is a MetaDataValue rather than a StoredDataValue.

   A MetaDataValue is a variant structure, like a StoredDataValue,
   except for the types of each arm, which replace DataValue with
   MetaData.

   The only new structure is MetaData, which has the following contents:

   exists
      Same as in DataValue.

   value_length
      The length of the stored value.

   hash_algorithm
      The hash algorithm used to perform the digest of the value.

   hash_value
      A digest using hash_algorithm on the value field of the DataValue,
      including its 4 leading length bytes.

7.4.4. Find

The Find request can be used to explore the Overlay Instance. A Find request for a Resource-ID R and a Kind-ID T retrieves the Resource-ID, if any, of the resource of Kind T known to the target peer which is closest to R. This method can be used to walk the Overlay Instance by iteratively fetching R_n+1=nearest(1 + R_n).
Top   ToC   RFC6940 - Page 108
7.4.4.1. Request Definition
The FindReq message contains a Resource-ID and a series of Kind-IDs identifying the resource the peer is interested in. struct { ResourceId resource; KindId kinds<0..2^8-1>; } FindReq; The request contains a list of Kind-IDs which the Find is for, as indicated below: resource The desired Resource-ID. kinds The desired Kind-IDs. Each value MUST appear only once. Otherwise, the request MUST be rejected with an error.
7.4.4.2. Response Definition
A response to a successful Find request is a FindAns message containing the closest Resource-ID on the peer for each Kind specified in the request. struct { KindId kind; ResourceId closest; } FindKindData; struct { FindKindData results<0..2^16-1>; } FindAns; If the processing peer is not responsible for the specified Resource-ID, it SHOULD return an Error_Not_Found error code. For each Kind-ID in the request, the response MUST contain a FindKindData indicating the closest Resource-ID for that Kind-ID, unless the Kind is not allowed to be used with Find, in which case a FindKindData for that Kind-ID MUST NOT be included in the response. If a Kind-ID is not known, then the corresponding Resource-ID MUST be 0. Note that different Kind-IDs may have different closest Resource-IDs.
Top   ToC   RFC6940 - Page 109
   The response is simply a series of FindKindData elements, one per
   Kind, concatenated end to end.  The contents of each element are:

   kind
      The Kind-ID.

   closest
      The closest Resource-ID to the specified Resource-ID.  It is 0 if
      no Resource-ID is known.

   Note that the response does not contain the contents of the data
   stored at these Resource-IDs.  If the requester wants this, it must
   retrieve it using Fetch.

7.4.5. Defining New Kinds

There are two ways to define a new Kind. The first is by writing a document and registering the Kind-ID with IANA. This is the preferred method for Kinds which may be widely used and reused. The second method is to simply define the Kind and its parameters in the Configuration Document using the section of Kind-ID space set aside for private use. This method MAY be used to define ad hoc Kinds in new overlays. However a Kind is defined, the definition MUST include: o The meaning of the data to be stored (in some textual form). o The Kind-ID. o The data model (single value, array, dictionary, etc.). o The access control model. In addition, when Kinds are registered with IANA, each Kind is assigned a short string name which is used to refer to it in Configuration Documents. While each Kind needs to define what data model is used for its data, this does not mean that it must define new data models. Where practical, Kinds should use the existing data models. The intention is that the basic data model set be sufficient for most applications/ usages.
Top   ToC   RFC6940 - Page 110

8. Certificate Store Usage

The Certificate Store Usage allows a node to store its certificate in the overlay. A user/node MUST store its certificate at Resource-IDs derived from two Resource Names: o The user name in the certificate. o The Node-ID in the certificate. Note that in the second case, the certificate for a peer is not stored at its Node-ID but rather at a hash of its Node-ID. The intention here (as is common throughout RELOAD) is to avoid making a peer responsible for its own data. New certificates are stored at the end of the list. This structure allows users to store an old and a new certificate that both have the same Node-ID, which allows for migration of certificates when they are renewed. This usage defines the following Kinds: Name: CERTIFICATE_BY_NODE Data Model: The data model for CERTIFICATE_BY_NODE data is array. Access Control: NODE-MATCH Name: CERTIFICATE_BY_USER Data Model: The data model for CERTIFICATE_BY_USER data is array. Access Control: USER-MATCH

9. TURN Server Usage

The TURN Server Usage allows a RELOAD peer to advertise that it is prepared to be a TURN server, as defined in [RFC5766]. When a node starts up, it joins the overlay network and forms several connections in the process. If the ICE stage in any of these connections returns a reflexive address that is not the same as the peer's perceived address, then the peer is behind a NAT and SHOULD NOT be a candidate for a TURN server. Additionally, if the peer's IP address is in the private address space range as defined by [RFC1918], then it is also
Top   ToC   RFC6940 - Page 111
   SHOULD NOT be a candidate for a TURN server.  Otherwise, the peer
   SHOULD assume that it is a potential TURN server and follow the
   procedures below.

   If the node is a candidate for a TURN server, it will insert some
   pointers in the overlay so that other peers can find it.  The overlay
   configuration file specifies a turn-density parameter that indicates
   how many times each TURN server SHOULD record itself in the overlay.
   Typically, this should be set to the reciprocal of the estimate of
   what percentage of peers will act as TURN servers.  If the turn-
   density is not set to zero, for each value, called d, between 1 and
   turn-density, the peer forms a Resource Name by concatenating its
   Node-ID and the value d.  This Resource Name is hashed to form a
   Resource-ID.  The address of the peer is stored at that Resource-ID
   using type TURN-SERVICE and the TurnServer object:

        struct {
          uint8                   iteration;
          IpAddressPort           server_address;
        } TurnServer;

   The contents of this structure are as follows:

   iteration
      The d value.

   server_address
      The address at which the TURN server can be contacted.

   Note:  Correct functioning of this algorithm depends on having turn-
      density be a reasonable estimate of the reciprocal of the
      proportion of nodes in the overlay that can act as TURN servers.
      If the turn-density value in the configuration file is too low,
      the process of finding TURN servers becomes more expensive, as
      multiple candidate Resource-IDs must be probed to find a TURN
      server.

   Peers that provide this service need to support the TURN extensions
   to STUN for media relay, as defined in [RFC5766].

   This usage defines the following Kind to indicate that a peer is
   willing to act as a TURN server:

   Name:  TURN-SERVICE

   Data Model:  The TURN-SERVICE Kind stores a single value for each
      Resource-ID.
Top   ToC   RFC6940 - Page 112
   Access Control:  NODE-MULTIPLE, with a maximum iteration of counter
      20.

   Peers MAY find other servers by selecting a random Resource-ID and
   then doing a Find request for the appropriate Kind-ID with that
   Resource-ID.  The Find request gets routed to a random peer based on
   the Resource-ID.  If that peer knows of any servers, they will be
   returned.  The returned response may be empty if the peer does not
   know of any servers, in which case the process gets repeated with
   some other random Resource-ID.  As long as the ratio of servers
   relative to peers is not too low, this approach will result in
   finding a server relatively quickly.

   Note to implementers: The certificates used by TurnServer entries
   need to be retained, as described in Section 6.3.4.

10. Chord Algorithm

This algorithm is assigned the name CHORD-RELOAD to indicate that it is an adaptation of the basic Chord-based DHT algorithm. This algorithm differs from the Chord algorithm that was originally presented in [Chord]. It has been updated based on more recent research results and implementation experiences, and to adapt it to the RELOAD protocol. Here is a short list of differences: o The original Chord algorithm specified that a single predecessor and a successor list be stored. The CHORD-RELOAD algorithm attempts to have more than one predecessor and successor. The predecessor sets help other neighbors learn their successor list. o The original Chord specification and analysis called for iterative routing. RELOAD specifies recursive routing. In addition to the performance implications, the cost of NAT traversal dictates recursive routing. o Finger Table entries are indexed in the opposite order. Original Chord specifies finger[0] as the immediate successor of the peer. CHORD-RELOAD specifies finger[0] as the peer 180 degrees around the ring from the peer. This change was made to simplify discussion and implementation of variable-sized Finger Tables. However, with either approach, no more than O(log N) entries should typically be stored in a Finger Table. o The stabilize() and fix_fingers() algorithms in the original Chord algorithm are merged into a single periodic process. Stabilization is implemented slightly differently because of the larger neighborhood, and fix_fingers is not as aggressive to
Top   ToC   RFC6940 - Page 113
      reduce load, nor does it search for optimal matches of the Finger
      Table entries.

   o  RELOAD allows for a 128-bit hash instead of a 160-bit hash, as
      RELOAD is not designed to be used in networks with close to or
      more than 2^128 nodes or objects (and it is hard to see how one
      would assemble such a network).

   o  RELOAD uses randomized finger entries, as described in
      Section 10.7.4.2.

   o  The CHORD-RELOAD algorithm allows the use of either reactive or
      periodic recovery.  The original Chord paper used periodic
      recovery.  Reactive recovery provides better performance in small
      overlays, but is believed to be unstable in large overlays
      (greater than 1000) with high levels of churn
      [handling-churn-usenix04].  The overlay configuration file
      specifies a "chord-reactive" element that indicates whether
      reactive recovery should be used.

10.1. Overview

The algorithm described here, CHORD-RELOAD, is a modified version of the Chord algorithm. In Chord (and in the algorithm described here), nodes are arranged in a ring, with node n being adjacent to nodes n-1 and n+1 and with all arithmetic being done modulo 2^{k}, where k is the length of the Node-ID in bits, so that node 2^{k} - 1 is directly before node 0. Each peer keeps track of a Finger Table and a Neighbor Table. The Neighbor Table contains at least the three peers before and after this peer in the DHT ring. There may not be three entries in all cases, such as small rings or while the ring topology is changing. The first entry in the Finger Table contains the peer halfway around the ring from this peer, the second entry contains the peer that is 1/4th of the way around, the third entry contains the peer that is 1/8th of the way around, and so on. Fundamentally, the Chord DHT can be thought of as a doubly linked list formed by knowing the successors and predecessor peers in the Neighbor Table, sorted by the Node-ID. As long as the successor peers are correct, the DHT will return the correct result. The pointers to the prior peers are kept to enable the insertion of new peers into the list structure. Keeping multiple predecessor and successor pointers makes it possible to maintain the integrity of the data structure even when consecutive peers simultaneously fail. The Finger Table forms a skip list [wikiSkiplist] so that entries in the linked list can be found in O(log(N)) time instead of the typical O(N) time that a linked list would provide, where N represents the number of nodes in the DHT.
Top   ToC   RFC6940 - Page 114
   The Neighbor Table and Finger Table entries contain logical Node-IDs
   as values, but the actual mapping of an IP level addressing
   information to reach that Node-ID is kept in the Connection Table.

   A peer, x, is responsible for a particular Resource-ID, k, if k is
   less than or equal to x and k is greater than p, where p is the
   Node-ID of the previous peer in the Neighbor Table.  Care must be
   taken when computing to note that all math is modulo 2^128.

10.2. Hash Function

For this Chord-based Topology Plug-in, the size of the Resource-ID is 128 bits. The hash of a Resource-ID MUST be computed using SHA-1 [RFC3174], and then the SHA-1 result MUST be truncated to the most significant 128 bits.

10.3. Routing

The Routing Table is conceptually the union of the Neighbor Table and the Finger Table. If a peer is not responsible for a Resource-ID k, but is directly connected to a node with Node-ID k, then it MUST route the message to that node. Otherwise, it MUST route the request to the peer in the Routing Table that has the largest Node-ID that is in the interval between the peer and k. If no such node is found, the peer finds the smallest Node-ID that is greater than k and MUST route the message to that node.

10.4. Redundancy

When a peer receives a Store request for Resource-ID k and it is responsible for Resource-ID k, it MUST store the data and return a success response. It MUST then send a Store request to its successor in the Neighbor Table and to that peer's successor, incrementing the replica number for each successor. Note that these Store requests are addressed to those specific peers, even though the Resource-ID they are being asked to store is outside the range that they are responsible for. The peers receiving these SHOULD check that they came from an appropriate predecessor in their Neighbor Table and that they are in a range that this predecessor is responsible for. Then, they MUST store the data. They do not themselves perform further Stores, because they can determine that they are not responsible for the Resource-ID. Note that this Topology Plug-in does not use the replica number for purposes other than knowing the difference between a replica and a non-replica.
Top   ToC   RFC6940 - Page 115
   Managing replicas as the overlay changes is described in
   Section 10.7.3.

   The sequential replicas used in this overlay algorithm protect
   against peer failure but not against malicious peers.  Additional
   replication from the Usage is required to protect resources from such
   attacks, as discussed in Section 13.5.4.

10.5. Joining

The join process for a Joining Node (JN) with Node-ID n is as follows: 1. JN MUST connect to its chosen bootstrap node, as specified in Section 11.4. 2. JN SHOULD send an Attach request to the Admitting Peer (AP) for Resource-ID n+1. The "send_update" flag can be used to acquire the Routing Table of AP. 3. JN SHOULD send Attach requests to initiate connections to each of the peers in the Neighbor Table as well as to the desired peers in the Finger Table. Note that this does not populate their Routing Tables, but only their Connection Tables, so JN will not get messages that it is expected to route to other nodes. 4. JN MUST enter into its Routing Table all the peers that it has successfully contacted. 5. JN MUST send a Join to AP. The AP MUST send the response to the Join. 6. AP MUST do a series of Store requests to JN to store the data that JN will be responsible for. 7. AP MUST send JN an Update explicitly labeling JN as its predecessor. At this point, JN is part of the ring and is responsible for a section of the overlay. AP MAY now forget any data which is assigned to JN and not AP. AP SHOULD NOT forget any data where AP is the replica set for the data. 8. The AP MUST send an Update to all of its neighbors (including JN) with the new values of its neighbor set (including JN). 9. JN MUST send Updates to all of the peers in its Neighbor Table.
Top   ToC   RFC6940 - Page 116
   If JN sends an Attach to AP with send_update, it immediately knows
   most of its expected neighbors from AP's Routing Table update and MAY
   directly connect to them.  This is the RECOMMENDED procedure.

   If for some reason JN does not get AP's Routing Table, it MAY still
   populate its Neighbor Table incrementally.  It SHOULD send a Ping
   directed at Resource-ID n+1 (directly after its own Resource-ID).
   This allows JN to discover its own successor.  Call that node p0.  JN
   then SHOULD send a Ping to p0+1 to discover its successor (p1).  This
   process MAY be repeated to discover as many successors as desired.
   The values for the two peers before p will be found at a later stage,
   when n receives an Update.  An alternate procedure is to send
   Attaches to those nodes rather than Pings, which form the connections
   immediately, but may be slower if the nodes need to collect ICE
   candidates.

   In order to set up its i'th Finger Table entry, JN MUST send an
   Attach to peer n+2^(128-i).  This will be routed to a peer in
   approximately the right location around the ring.  (Note that the
   first entry in the Finger Table has i=1 and not i=0 in this
   formulation.)

   The Joining Node MUST NOT send any Update message placing itself in
   the overlay until it has successfully completed an Attach with each
   peer that should be in its Neighbor Table.

10.6. Routing Attaches

When a peer needs to Attach to a new peer in its Neighbor Table, it MUST source-route the Attach request through the peer from which it learned the new peer's Node-ID. Source-routing these requests allows the overlay to recover from instability. All other Attach requests, such as those for new Finger Table entries, are routed conventionally through the overlay.
Top   ToC   RFC6940 - Page 117

10.7. Updates

An Update for this DHT is defined as: enum { invalidChordUpdateType(0), peer_ready(1), neighbors(2), full(3), (255) } ChordUpdateType; struct { uint32 uptime; ChordUpdateType type; select (type){ case peer_ready: /* Empty */ ; case neighbors: NodeId predecessors<0..2^16-1>; NodeId successors<0..2^16-1>; case full: NodeId predecessors<0..2^16-1>; NodeId successors<0..2^16-1>; NodeId fingers<0..2^16-1>; }; } ChordUpdate; The "uptime" field contains the time this peer has been up in seconds. The "type" field contains the type of the update, which depends on the reason the update was sent. peer_ready This peer is ready to receive messages. This message is used to indicate that a node which has Attached is a peer and can be routed through. It is also used as a connectivity check to non- neighbor peers. neighbors This version is sent to members of the Chord Neighbor Table. full This version is sent to peers which request an Update with a RouteQueryReq.
Top   ToC   RFC6940 - Page 118
   If the message is of type "neighbors", then the contents of the
   message will be:

   predecessors
      The predecessor set of the Updating peer.

   successors
      The successor set of the Updating peer.

   If the message is of type "full", then the contents of the message
   will be:

   predecessors
      The predecessor set of the Updating peer.

   successors
      The successor set of the Updating peer.

   fingers
      The Finger Table of the Updating peer, in numerically ascending
      order.

   A peer MUST maintain an association (via Attach) to every member of
   its neighbor set.  A peer MUST attempt to maintain at least three
   predecessors and three successors, even though this will not be
   possible if the ring is very small.  It is RECOMMENDED that O(log(N))
   predecessors and successors be maintained in the neighbor set.  There
   are many ways to estimate N, some of which are discussed in
   [DHT-RELOAD].

10.7.1. Handling Neighbor Failures

Every time a connection to a peer in the Neighbor Table is lost (as determined by connectivity pings or the failure of some request), the peer MUST remove the entry from its Neighbor Table and replace it with the best match it has from the other peers in its Routing Table. If using reactive recovery, the peer MUST send an immediate Update to all nodes in its Neighbor Table. The update will contain all the Node-IDs of the current entries of the table (after the failed one has been removed). Note that when replacing a successor, the peer SHOULD delay the creation of new replicas for the successor replacement hold-down time (30 seconds) after removing the failed entry from its Neighbor Table in order to allow a triggered update to inform it of a better match for its Neighbor Table. If the neighbor failure affects the peer's range of responsible IDs, then the Update MUST be sent to all nodes in its Connection Table.
Top   ToC   RFC6940 - Page 119
   A peer MAY attempt to reestablish connectivity with a lost neighbor
   either by waiting additional time to see if connectivity returns or
   by actively routing a new Attach to the lost peer.  Details for these
   procedures are beyond the scope of this document.  In the case of an
   attempt to reestablish connectivity with a lost neighbor, the peer
   MUST be removed from the Neighbor Table.  Such a peer is returned to
   the Neighbor Table once connectivity is reestablished.

   If connectivity is lost to all successor peers in the Neighbor Table,
   then this peer SHOULD behave as if it is joining the network and MUST
   use Pings to find a peer and send it a Join.  If connectivity is lost
   to all the peers in the Finger Table, this peer SHOULD assume that it
   has been disconnected from the rest of the network, and it SHOULD
   periodically try to join the DHT.

10.7.2. Handling Finger Table Entry Failure

If a Finger Table entry is found to have failed (as determined by connectivity pings or the failure of some request), all references to the failed peer MUST be removed from the Finger Table and replaced with the closest preceding peer from the Finger Table or Neighbor Table. If using reactive recovery, the peer MUST initiate a search for a new Finger Table entry, as described below.

10.7.3. Receiving Updates

When a peer x receives an Update request, it examines the Node-IDs in the UpdateReq and at its Neighbor Table and decides if this UpdateReq would change its Neighbor Table. This is done by taking the set of peers currently in the Neighbor Table and comparing them to the peers in the Update request. There are two major cases: o The UpdateReq contains peers that match x's Neighbor Table, so no change is needed to the neighbor set. o The UpdateReq contains peers that x does not know about that should be in x's Neighbor Table; i.e., they are closer than entries in the Neighbor Table. In the first case, no change is needed. In the second case, x MUST attempt to Attach to the new peers, and if it is successful, it MUST adjust its neighbor set accordingly. Note that x can maintain the now inferior peers as neighbors, but it MUST remember the closer ones.
Top   ToC   RFC6940 - Page 120
   After any Pings and Attaches are done, if the Neighbor Table changes
   and the peer is using reactive recovery, the peer MUST send an Update
   request to each member of its Connection Table.  These Update
   requests are what end up filling in the predecessor/successor tables
   of peers that this peer is a neighbor to.  A peer MUST NOT enter
   itself in its successor or predecessor table and instead should leave
   the entries empty.

   If peer x is responsible for a Resource-ID R and x discovers that the
   replica set for R (the next two nodes in its successor set) has
   changed, it MUST send a Store for any data associated with R to any
   new node in the replica set.  It SHOULD NOT delete data from peers
   which have left the replica set.

   When peer x detects that it is no longer in the replica set for a
   resource R (i.e., there are three predecessors between x and R), it
   SHOULD delete all data associated with R from its local store.

   When a peer discovers that its range of responsible IDs has changed,
   it MUST send an Update to all entries in its Connection Table.

10.7.4. Stabilization

There are four components to stabilization: 1. Exchange Updates with all peers in its Neighbor Table to exchange state. 2. Search for better peers to place in its Finger Table. 3. Search to determine if the current Finger Table size is sufficiently large. 4. Search to determine if the overlay has partitioned and needs to recover.
10.7.4.1. Updating the Neighbor Table
A peer MUST periodically send an Update request to every peer in its Neighbor Table. The purpose of this is to keep the predecessor and successor lists up to date and to detect failed peers. The default time is about every ten minutes, but the configuration server SHOULD set this in the Configuration Document using the "chord-update- interval" element (denominated in seconds). A peer SHOULD randomly offset these Update requests so they do not occur all at once.
Top   ToC   RFC6940 - Page 121
10.7.4.2. Refreshing the Finger Table
A peer MUST periodically search for new peers to replace invalid entries in the Finger Table. For peer x, the i'th Finger Table entry is valid if it is in the range [ x+2^( 128-i ), x+2^( 128-(i-1) )-1 ]. Invalid entries occur in the Finger Table when a previous Finger Table entry has failed or when no peer has been found in that range. Two possible methods for searching for new peers for the Finger Table entries are presented: Alternative 1: A peer selects one entry in the Finger Table from among the invalid entries. It pings for a new peer for that Finger Table entry. The selection SHOULD be exponentially weighted to attempt to replace earlier (lower i) entries in the Finger Table. A simple way to implement this selection is to search through the Finger Table entries from i=1, and each time an invalid entry is encountered, send a Ping to replace that entry with probability 0.5. Alternative 2: A peer monitors the Update messages received from its connections to observe when an Update indicates a peer that would be used to replace an invalid Finger Table entry, i, and flags that entry in the Finger Table. Every "chord-ping-interval" seconds, the peer selects from among those flagged candidates using an exponentially weighted probability, as above. When searching for a better entry, the peer SHOULD send the Ping to a Node-ID selected randomly from that range. Random selection is preferred over a search for strictly spaced entries to minimize the effect of churn on overlay routing [minimizing-churn-sigcomm06]. An implementation or subsequent specification MAY choose a method for selecting Finger Table entries other than choosing randomly within the range. Any such alternate methods SHOULD be employed only on Finger Table stabilization and not for the selection of initial Finger Table entries unless the alternative method is faster and imposes less overhead on the overlay. A peer SHOULD NOT send Ping requests looking for new finger table entries more often than the configuration element "chord-ping- interval", which defaults to 3600 seconds (one per hour). A peer MAY choose to keep connections to multiple peers that can act for a given Finger Table entry.
Top   ToC   RFC6940 - Page 122
10.7.4.3. Adjusting Finger Table Size
If the Finger Table has fewer than 16 entries, the node SHOULD attempt to discover more fingers to grow the size of the table to 16. The value 16 was chosen to ensure high odds of a node maintaining connectivity to the overlay even with strange network partitions. For many overlays, 16 Finger Table entries will be enough, but as an overlay grows very large, more than 16 entries may be required in the Finger Table for efficient routing. An implementation SHOULD be capable of increasing the number of entries in the Finger Table to 128 entries. Although log(N) entries are all that are required for optimal performance, careful implementation of stabilization will result in no additional traffic being generated when maintaining a Finger Table larger than log(N) entries. Implementers are encouraged to make use of RouteQuery and algorithms for determining where new Finger Table entries may be found. Complete details of possible implementations are outside the scope of this specification. A simple approach to sizing the Finger Table is to ensure that the Finger Table is large enough to contain at least the final successor in the peer's Neighbor Table.
10.7.4.4. Detecting Partitioning
To detect that a partitioning has occurred and to heal the overlay, a peer P MUST periodically repeat the discovery process used in the initial join for the overlay to locate an appropriate bootstrap node, B. P SHOULD then send a Ping for its own Node-ID routed through B. If a response is received from peer S', which is not P's successor, then the overlay is partitioned and P SHOULD send an Attach to S' routed through B, followed by an Update sent to S'. (Note that S' may not be in P's Neighbor Table once the overlay is healed, but the connection will allow S' to discover appropriate neighbor entries for itself via its own stabilization.) Future specifications may describe alternative mechanisms for determining when to repeat the discovery process.
Top   ToC   RFC6940 - Page 123

10.8. Route Query f.in 3

For CHORD-RELOAD, the RouteQueryReq contains no additional information. The RouteQueryAns contains the single Node-ID of the next peer to which the responding peer would have routed the request message in recursive routing: struct { NodeId next_peer; } ChordRouteQueryAns; The contents of this structure are as follows: next_peer The peer to which the responding peer would route the message in order to deliver it to the destination listed in the request. If the requester has set the send_update flag, the responder SHOULD initiate an Update immediately after sending the RouteQueryAns.

10.9. Leaving

To support extensions, such as [DHT-RELOAD], peers SHOULD send a Leave request to all members of their Neighbor Table before exiting the Overlay Instance. The overlay_specific_data field MUST contain the ChordLeaveData structure, defined below: enum { invalidChordLeaveType(0), from_succ(1), from_pred(2), (255) } ChordLeaveType; struct { ChordLeaveType type; select (type) { case from_succ: NodeId successors<0..2^16-1>; case from_pred: NodeId predecessors<0..2^16-1>; }; } ChordLeaveData;
Top   ToC   RFC6940 - Page 124
   The "type" field indicates whether the Leave request was sent by a
   predecessor or a successor of the recipient:

   from_succ
      The Leave request was sent by a successor.

   from_pred
      The Leave request was sent by a predecessor.

   If the type of the request is "from_succ", the contents will be:

   successors
      The sender's successor list.

   If the type of the request is "from_pred", the contents will be:

   predecessors
      The sender's predecessor list.

   Any peer which receives a Leave for a peer n in its neighbor set MUST
   follow procedures as if it had detected a peer failure as described
   in Section 10.7.1.



(page 124 continued on part 6)

Next Section