RFC 5245

Interactive Connectivity Establishment (ICE): A Protocol for Network Address Translator (NAT) Traversal for Offer/Answer Protocols

Pages: 117
Obsoletes: 4091 4092
Obsoleted by: 8445
Updated by: 6336

Part 5 of 5 – Pages 107 to 117

noToC RFC5245 - Page 107 prevText

Appendix A.  Lite and Full Implementations

   ICE allows for two types of implementations.  A full implementation
   supports the controlling and controlled roles in a session, and can
   also perform address gathering.  In contrast, a lite implementation
   is a minimalist implementation that does little but respond to STUN
   checks.

   Because ICE requires both endpoints to support it in order to bring
   benefits to either endpoint, incremental deployment of ICE in a
   network is more complicated.  Many sessions involve an endpoint that
   is, by itself, not behind a NAT and not one that would worry about
   NAT traversal.  A very common case is to have one endpoint that
   requires NAT traversal (such as a VoIP hard phone or soft phone) make
   a call to one of these devices.  Even if the phone supports a full
   ICE implementation, ICE won't be used at all if the other device
   doesn't support it.  The lite implementation allows for a low-cost
   entry point for these devices.  Once they support the lite
   implementation, full implementations can connect to them and get the
   full benefits of ICE.

   Consequently, a lite implementation is only appropriate for devices
   that will *always* be connected to the public Internet and have a
   public IP address at which it can receive packets from any
   correspondent.  ICE will not function when a lite implementation is
   placed behind a NAT.

   ICE allows a lite implementation to have a single IPv4 host candidate
   and several IPv6 addresses.  In that case, candidate pairs are
   selected by the controlling agent using a static algorithm, such as
   the one in RFC 3484, which is recommended by this specification.
   However, static mechanisms for address selection are always prone to
   error, since they cannot ever reflect the actual topology and can
   never provide actual guarantees on connectivity.  They are always
   heuristics.  Consequently, if an agent is implementing ICE just to
   select between its IPv4 and IPv6 addresses, and none of its IP
   addresses are behind NAT, usage of full ICE is still RECOMMENDED in
   order to provide the most robust form of address selection possible.

   It is important to note that the lite implementation was added to
   this specification to provide a stepping stone to full
   implementation.  Even for devices that are always connected to the
   public Internet with just a single IPv4 address, a full
   implementation is preferable if achievable.  A full implementation
   will reduce call setup times, since ICE's aggressive mode can be
   used.  Full implementations also obtain the security benefits of ICE
   unrelated to NAT traversal; in particular, the voice hammer attack
   described in Section 18 is prevented only for full implementations,

noToC RFC5245 - Page 108

   not lite.  Finally, it is often the case that a device that finds
   itself with a public address today will be placed in a network
   tomorrow where it will be behind a NAT.  It is difficult to
   definitively know, over the lifetime of a device or product, that it
   will always be used on the public Internet.  Full implementation
   provides assurance that communications will always work.

Appendix B.  Design Motivations

   ICE contains a number of normative behaviors that may themselves be
   simple, but derive from complicated or non-obvious thinking or use
   cases that merit further discussion.  Since these design motivations
   are not neccesary to understand for purposes of implementation, they
   are discussed here in an appendix to the specification.  This section
   is non-normative.

B.1.  Pacing of STUN Transactions

   STUN transactions used to gather candidates and to verify
   connectivity are paced out at an approximate rate of one new
   transaction every Ta milliseconds.  Each transaction, in turn, has a
   retransmission timer RTO that is a function of Ta as well.  Why are
   these transactions paced, and why are these formulas used?

   Sending of these STUN requests will often have the effect of creating
   bindings on NAT devices between the client and the STUN servers.
   Experience has shown that many NAT devices have upper limits on the
   rate at which they will create new bindings.  Experiments have shown
   that once every 20 ms is well supported, but not much lower than
   that.  This is why Ta has a lower bound of 20 ms.  Furthermore,
   transmission of these packets on the network makes use of bandwidth
   and needs to be rate limited by the agent.  Deployments based on
   earlier draft versions of this document tended to overload rate-
   constrained access links and perform poorly overall, in addition to
   negatively impacting the network.  As a consequence, the pacing
   ensures that the NAT device does not get overloaded and that traffic
   is kept at a reasonable rate.

   The definition of a "reasonable" rate is that STUN should not use
   more bandwidth than the RTP itself will use, once media starts
   flowing.  The formula for Ta is designed so that, if a STUN packet
   were sent every Ta seconds, it would consume the same amount of
   bandwidth as RTP packets, summed across all media streams.  Of
   course, STUN has retransmits, and the desire is to pace those as
   well.  For this reason, RTO is set such that the first retransmit on
   the first transaction happens just as the first STUN request on the
   last transaction occurs.  Pictorially:

noToC RFC5245 - Page 109

              First Packets              Retransmits



                    |                        |
                    |                        |
             -------+------           -------+------
            /               \        /               \
           /                 \      /                 \

           +--+    +--+    +--+    +--+    +--+    +--+
           |A1|    |B1|    |C1|    |A2|    |B2|    |C2|
           +--+    +--+    +--+    +--+    +--+    +--+

        ---+-------+-------+-------+-------+-------+------------ Time
           0       Ta      2Ta     3Ta     4Ta     5Ta

   In this picture, there are three transactions that will be sent (for
   example, in the case of candidate gathering, there are three host
   candidate/STUN server pairs).  These are transactions A, B, and C.
   The retransmit timer is set so that the first retransmission on the
   first transaction (packet A2) is sent at time 3Ta.

   Subsequent retransmits after the first will occur even less
   frequently than Ta milliseconds apart, since STUN uses an exponential
   back-off on its retransmissions.

B.2.  Candidates with Multiple Bases

   Section 4.1.3 talks about eliminating candidates that have the same
   transport address and base.  However, candidates with the same
   transport addresses but different bases are not redundant.  When can
   an agent have two candidates that have the same IP address and port,
   but different bases?  Consider the topology of Figure 10:

noToC RFC5245 - Page 110

          +----------+
          | STUN Srvr|
          +----------+
               |
               |
             -----
           //     \\
          |         |
         |  B:net10  |
          |         |
           \\     //
             -----
               |
               |
          +----------+
          |   NAT    |
          +----------+
               |
               |
             -----
           //     \\
          |    A    |
         |192.168/16 |
          |         |
           \\     //
             -----
               |
               |
               |192.168.1.100      -----
          +----------+           //     \\             +----------+
          |          |          |         |            |          |
          | Offerer  |---------|  C:net10  |-----------| Answerer |
          |          |10.0.1.100|         | 10.0.1.101 |          |
          +----------+           \\     //             +----------+
                                   -----

           Figure 10: Identical Candidates with Different Bases

   In this case, the offerer is multihomed.  It has one IP address,
   10.0.1.100, on network C, which is a net 10 private network.  The
   answerer is on this same network.  The offerer is also connected to
   network A, which is 192.168/16.  The offerer has an IP address of
   192.168.1.100 on this network.  There is a NAT on this network,
   natting into network B, which is another net 10 private network, but
   not connected to network C.  There is a STUN server on network B.

   The offerer obtains a host candidate on its IP address on network C
   (10.0.1.100:2498) and a host candidate on its IP address on network A

noToC RFC5245 - Page 111

   (192.168.1.100:3344).  It performs a STUN query to its configured
   STUN server from 192.168.1.100:3344.  This query passes through the
   NAT, which happens to assign the binding 10.0.1.100:2498.  The STUN
   server reflects this in the STUN Binding response.  Now, the offerer
   has obtained a server reflexive candidate with a transport address
   that is identical to a host candidate (10.0.1.100:2498).  However,
   the server reflexive candidate has a base of 192.168.1.100:3344, and
   the host candidate has a base of 10.0.1.100:2498.

B.3.  Purpose of the <rel-addr> and <rel-port> Attributes

   The candidate attribute contains two values that are not used at all
   by ICE itself -- <rel-addr> and <rel-port>.  Why is it present?

   There are two motivations for its inclusion.  The first is
   diagnostic.  It is very useful to know the relationship between the
   different types of candidates.  By including it, an agent can know
   which relayed candidate is associated with which reflexive candidate,
   which in turn is associated with a specific host candidate.  When
   checks for one candidate succeed and not for others, this provides
   useful diagnostics on what is going on in the network.

   The second reason has to do with off-path Quality of Service (QoS)
   mechanisms.  When ICE is used in environments such as PacketCable
   2.0, proxies will, in addition to performing normal SIP operations,
   inspect the SDP in SIP messages, and extract the IP address and port
   for media traffic.  They can then interact, through policy servers,
   with access routers in the network, to establish guaranteed QoS for
   the media flows.  This QoS is provided by classifying the RTP traffic
   based on 5-tuple, and then providing it a guaranteed rate, or marking
   its Diffserv codepoints appropriately.  When a residential NAT is
   present, and a relayed candidate gets selected for media, this
   relayed candidate will be a transport address on an actual TURN
   server.  That address says nothing about the actual transport address
   in the access router that would be used to classify packets for QoS
   treatment.  Rather, the server reflexive candidate towards the TURN
   server is needed.  By carrying the translation in the SDP, the proxy
   can use that transport address to request QoS from the access router.

B.4.  Importance of the STUN Username

   ICE requires the usage of message integrity with STUN using its
   short-term credential functionality.  The actual short-term
   credential is formed by exchanging username fragments in the SDP
   offer/answer exchange.  The need for this mechanism goes beyond just
   security; it is actually required for correct operation of ICE in the
   first place.

noToC RFC5245 - Page 112

   Consider agents L, R, and Z.  L and R are within private enterprise
   1, which is using 10.0.0.0/8.  Z is within private enterprise 2,
   which is also using 10.0.0.0/8.  As it turns out, R and Z both have
   IP address 10.0.1.1.  L sends an offer to Z.  Z, in its answer,
   provides L with its host candidates.  In this case, those candidates
   are 10.0.1.1:8866 and 10.0.1.1:8877.  As it turns out, R is in a
   session at that same time, and is also using 10.0.1.1:8866 and
   10.0.1.1:8877 as host candidates.  This means that R is prepared to
   accept STUN messages on those ports, just as Z is.  L will send a
   STUN request to 10.0.1.1:8866 and another to 10.0.1.1:8877.  However,
   these do not go to Z as expected.  Instead, they go to R!  If R just
   replied to them, L would believe it has connectivity to Z, when in
   fact it has connectivity to a completely different user, R.  To fix
   this, the STUN short-term credential mechanisms are used.  The
   username fragments are sufficiently random that it is highly unlikely
   that R would be using the same values as Z.  Consequently, R would
   reject the STUN request since the credentials were invalid.  In
   essence, the STUN username fragments provide a form of transient host
   identifiers, bound to a particular offer/answer session.

   An unfortunate consequence of the non-uniqueness of IP addresses is
   that, in the above example, R might not even be an ICE agent.  It
   could be any host, and the port to which the STUN packet is directed
   could be any ephemeral port on that host.  If there is an application
   listening on this socket for packets, and it is not prepared to
   handle malformed packets for whatever protocol is in use, the
   operation of that application could be affected.  Fortunately, since
   the ports exchanged in SDP are ephemeral and usually drawn from the
   dynamic or registered range, the odds are good that the port is not
   used to run a server on host R, but rather is the agent side of some
   protocol.  This decreases the probability of hitting an allocated
   port, due to the transient nature of port usage in this range.
   However, the possibility of a problem does exist, and network
   deployers should be prepared for it.  Note that this is not a problem
   specific to ICE; stray packets can arrive at a port at any time for
   any type of protocol, especially ones on the public Internet.  As
   such, this requirement is just restating a general design guideline
   for Internet applications -- be prepared for unknown packets on any
   port.

noToC RFC5245 - Page 113

B.5.  The Candidate Pair Priority Formula

   The priority for a candidate pair has an odd form.  It is:

      pair priority = 2^32*MIN(G,D) + 2*MAX(G,D) + (G>D?1:0)

   Why is this?  When the candidate pairs are sorted based on this
   value, the resulting sorting has the MAX/MIN property.  This means
   that the pairs are first sorted based on decreasing value of the
   minimum of the two priorities.  For pairs that have the same value of
   the minimum priority, the maximum priority is used to sort amongst
   them.  If the max and the min priorities are the same, the
   controlling agent's priority is used as the tie-breaker in the last
   part of the expression.  The factor of 2*32 is used since the
   priority of a single candidate is always less than 2*32, resulting in
   the pair priority being a "concatenation" of the two component
   priorities.  This creates the MAX/MIN sorting.  MAX/MIN ensures that,
   for a particular agent, a lower-priority candidate is never used
   until all higher-priority candidates have been tried.

B.6.  The remote-candidates Attribute

   The a=remote-candidates attribute exists to eliminate a race
   condition between the updated offer and the response to the STUN
   Binding request that moved a candidate into the Valid list.  This
   race condition is shown in Figure 11.  On receipt of message 4, agent
   L adds a candidate pair to the valid list.  If there was only a
   single media stream with a single component, agent L could now send
   an updated offer.  However, the check from agent R has not yet
   generated a response, and agent R receives the updated offer (message
   7) before getting the response (message 9).  Thus, it does not yet
   know that this particular pair is valid.  To eliminate this
   condition, the actual candidates at R that were selected by the
   offerer (the remote candidates) are included in the offer itself, and
   the answerer delays its answer until those pairs validate.

noToC RFC5245 - Page 114

          Agent A               Network               Agent B
             |(1) Offer            |                     |
             |------------------------------------------>|
             |(2) Answer           |                     |
             |<------------------------------------------|
             |(3) STUN Req.        |                     |
             |------------------------------------------>|
             |(4) STUN Res.        |                     |
             |<------------------------------------------|
             |(5) STUN Req.        |                     |
             |<------------------------------------------|
             |(6) STUN Res.        |                     |
             |-------------------->|                     |
             |                     |Lost                 |
             |(7) Offer            |                     |
             |------------------------------------------>|
             |(8) STUN Req.        |                     |
             |<------------------------------------------|
             |(9) STUN Res.        |                     |
             |------------------------------------------>|
             |(10) Answer          |                     |
             |<------------------------------------------|

                      Figure 11: Race Condition Flow

B.7.  Why Are Keepalives Needed?

   Once media begins flowing on a candidate pair, it is still necessary
   to keep the bindings alive at intermediate NATs for the duration of
   the session.  Normally, the media stream packets themselves (e.g.,
   RTP) meet this objective.  However, several cases merit further
   discussion.  Firstly, in some RTP usages, such as SIP, the media
   streams can be "put on hold".  This is accomplished by using the SDP
   "sendonly" or "inactive" attributes, as defined in RFC 3264
   [RFC3264].  RFC 3264 directs implementations to cease transmission of
   media in these cases.  However, doing so may cause NAT bindings to
   timeout, and media won't be able to come off hold.

   Secondly, some RTP payload formats, such as the payload format for
   text conversation [RFC4103], may send packets so infrequently that
   the interval exceeds the NAT binding timeouts.

   Thirdly, if silence suppression is in use, long periods of silence
   may cause media transmission to cease sufficiently long for NAT
   bindings to time out.

noToC RFC5245 - Page 115

   For these reasons, the media packets themselves cannot be relied
   upon.  ICE defines a simple periodic keepalive utilizing STUN Binding
   indications.  This makes its bandwidth requirements highly
   predictable, and thus amenable to QoS reservations.

B.8.  Why Prefer Peer Reflexive Candidates?

   Section 4.1.2 describes procedures for computing the priority of
   candidate based on its type and local preferences.  That section
   requires that the type preference for peer reflexive candidates
   always be higher than server reflexive.  Why is that?  The reason has
   to do with the security considerations in Section 18.  It is much
   easier for an attacker to cause an agent to use a false server
   reflexive candidate than it is for an attacker to cause an agent to
   use a false peer reflexive candidate.  Consequently, attacks against
   address gathering with Binding requests are thwarted by ICE by
   preferring the peer reflexive candidates.

B.9.  Why Send an Updated Offer?

   Section 11.1 describes rules for sending media.  Both agents can send
   media once ICE checks complete, without waiting for an updated offer.
   Indeed, the only purpose of the updated offer is to "correct" the SDP
   so that the default destination for media matches where media is
   being sent based on ICE procedures (which will be the highest-
   priority nominated candidate pair).

   This begs the question -- why is the updated offer/answer exchange
   needed at all?  Indeed, in a pure offer/answer environment, it would
   not be.  The offerer and answerer will agree on the candidates to use
   through ICE, and then can begin using them.  As far as the agents
   themselves are concerned, the updated offer/answer provides no new
   information.  However, in practice, numerous components along the
   signaling path look at the SDP information.  These include entities
   performing off-path QoS reservations, NAT traversal components such
   as ALGs and Session Border Controllers (SBCs), and diagnostic tools
   that passively monitor the network.  For these tools to continue to
   function without change, the core property of SDP -- that the
   existing, pre-ICE definitions of the addresses used for media -- the
   m and c lines and the rtcp attribute -- must be retained.  For this
   reason, an updated offer must be sent.

B.10.  Why Are Binding Indications Used for Keepalives?

   Media keepalives are described in Section 10.  These keepalives make
   use of STUN when both endpoints are ICE capable.  However, rather
   than using a Binding request transaction (which generates a
   response), the keepalives use an Indication.  Why is that?

noToC RFC5245 - Page 116

   The primary reason has to do with network QoS mechanisms.  Once media
   begins flowing, network elements will assume that the media stream
   has a fairly regular structure, making use of periodic packets at
   fixed intervals, with the possibility of jitter.  If an agent is
   sending media packets, and then receives a Binding request, it would
   need to generate a response packet along with its media packets.
   This will increase the actual bandwidth requirements for the 5-tuple
   carrying the media packets, and introduce jitter in the delivery of
   those packets.  Analysis has shown that this is a concern in certain
   layer 2 access networks that use fairly tight packet schedulers for
   media.

   Additionally, using a Binding Indication allows integrity to be
   disabled, allowing for better performance.  This is useful for large-
   scale endpoints, such as PSTN gateways and SBCs.

B.11.  Why Is the Conflict Resolution Mechanism Needed?

   When ICE runs between two peers, one agent acts as controlled, and
   the other as controlling.  Rules are defined as a function of
   implementation type and offerer/answerer to determine who is
   controlling and who is controlled.  However, the specification
   mentions that, in some cases, both sides might believe they are
   controlling, or both sides might believe they are controlled.  How
   can this happen?

   The condition when both agents believe they are controlled shows up
   in third party call control cases.  Consider the following flow:

             A         Controller          B
             |(1) INV()     |              |
             |<-------------|              |
             |(2) 200(SDP1) |              |
             |------------->|              |
             |              |(3) INV()     |
             |              |------------->|
             |              |(4) 200(SDP2) |
             |              |<-------------|
             |(5) ACK(SDP2) |              |
             |<-------------|              |
             |              |(6) ACK(SDP1) |
             |              |------------->|

                       Figure 12: Role Conflict Flow

   This flow is a variation on flow III of RFC 3725 [RFC3725].  In fact,
   it works better than flow III since it produces fewer messages.  In
   this flow, the controller sends an offerless INVITE to agent A, which

noToC RFC5245 - Page 117

   responds with its offer, SDP1.  The agent then sends an offerless
   INVITE to agent B, which it responds to with its offer, SDP2.  The
   controller then uses the offer from each agent to generate the
   answers.  When this flow is used, ICE will run between agents A and
   B, but both will believe they are in the controlling role.  With the
   role conflict resolution procedures, this flow will function properly
   when ICE is used.

   At this time, there are no documented flows that can result in the
   case where both agents believe they are controlled.  However, the
   conflict resolution procedures allow for this case, should a flow
   arise that would fit into this category.

Author's Address

   Jonathan Rosenberg
   jdrosen.net
   Monmouth, NJ
   US

   Email: jdrosen@jdrosen.net
   URI:   http://www.jdrosen.net