This section details the basic operations that must be present to implement JSEP functionality. The actual API exposed in the W3C API may have somewhat different syntax but should map easily to these concepts.
The PeerConnection constructor allows the application to specify global parameters for the media session, such as the STUN/TURN servers and credentials to use when gathering candidates, as well as the initial ICE candidate policy and pool size, and also the bundle policy to use.
If an ICE candidate policy is specified, it functions as described in Section 3.5.3, causing the JSEP implementation to only surface the permitted candidates (including any implementation-internal filtering) to the application and only use those candidates for connectivity checks. The set of available policies is as follows:
- all: All candidates permitted by implementation policy will be gathered and used.
- relay: All candidates except relay candidates will be filtered out. This obfuscates the location information that might be ascertained by the remote peer from the received candidates. Depending on how the application deploys and chooses relay servers, this could obfuscate location to a metro or possibly even global level.
The default ICE candidate policy MUST be set to "all", as this is generally the desired policy and also typically reduces the use of application TURN server resources significantly.
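As an illustration of the filtering described above, the following Python sketch models how a candidate list might be reduced under each policy. The Candidate class and the surface_candidates function are hypothetical names introduced for this example; they are not part of the JSEP or W3C API, and any implementation-internal filtering is assumed to have happened already.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical stand-in for a gathered ICE candidate."""
    address: str
    kind: str  # ICE candidate type: "host", "srflx", "prflx", or "relay"


def surface_candidates(candidates, policy="all"):
    """Return the candidates that the given ICE candidate policy permits."""
    if policy == "all":
        # Everything permitted by implementation policy is surfaced.
        return list(candidates)
    if policy == "relay":
        # Everything except relay candidates is filtered out.
        return [c for c in candidates if c.kind == "relay"]
    raise ValueError(f"unknown ICE candidate policy: {policy!r}")


gathered = [
    Candidate("192.0.2.10", "host"),
    Candidate("198.51.100.7", "srflx"),
    Candidate("203.0.113.5", "relay"),
]
relay_only = surface_candidates(gathered, "relay")
```

With the "relay" policy, only the TURN-allocated address in this example would reach the application or be used for connectivity checks.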
If a size is specified for the ICE candidate pool, this indicates the number of ICE components to pre-gather candidates for. Because pre-gathering results in utilizing STUN/TURN server resources for potentially long periods of time, this MUST only occur upon application request, and therefore the default candidate pool size MUST be zero.
The application can specify its preferred policy regarding use of bundle, the multiplexing mechanism defined in [RFC 8843]. Regardless of policy, the application will always try to negotiate bundle onto a single transport and will offer a single bundle group across all "m=" sections; use of this single transport is contingent upon the answerer accepting bundle. However, by specifying a policy from the list below, the application can control exactly how aggressively it will try to bundle media streams together, which affects how it will interoperate with a non-bundle-aware endpoint. When negotiating with a non-bundle-aware endpoint, only the streams not marked as bundle-only streams will be established.
The set of available policies is as follows:
- balanced: The first "m=" section of each type (audio, video, or application) will contain transport parameters, which will allow an answerer to unbundle that section. The second and any subsequent "m=" sections of each type will be marked bundle-only. The result is that if there are N distinct media types, then candidates will be gathered for N media streams. This policy balances desire to multiplex with the need to ensure that basic audio and video can still be negotiated in legacy cases. When acting as answerer, if there is no bundle group in the offer, the implementation will reject all but the first "m=" section of each type.
- max-compat: All "m=" sections will contain transport parameters; none will be marked as bundle-only. This policy will allow all streams to be received by non-bundle-aware endpoints but will require separate candidates to be gathered for each media stream.
- max-bundle: Only the first "m=" section will contain transport parameters; all streams other than the first will be marked as bundle-only. This policy aims to minimize candidate gathering and maximize multiplexing, at the cost of less compatibility with legacy endpoints. When acting as answerer, the implementation will reject any "m=" sections other than the first "m=" section, unless they are in the same bundle group as that "m=" section.
As it provides the best trade-off between performance and compatibility with legacy endpoints, the default bundle policy MUST be set to "balanced".
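The offerer-side effect of the three bundle policies can be sketched as follows. This is a simplified model, not an SDP generator: plan_bundle_offer and transports_to_gather are hypothetical helper names, and each "m=" section is represented only by its media type.

```python
def plan_bundle_offer(media_types, policy="balanced"):
    """Return one flag per "m=" section: True if it is marked bundle-only."""
    if policy == "max-compat":
        # Every section carries its own transport parameters.
        return [False] * len(media_types)
    if policy == "max-bundle":
        # Only the first section carries transport parameters.
        return [index > 0 for index in range(len(media_types))]
    if policy == "balanced":
        # The first section of each type carries transport parameters;
        # later sections of the same type are bundle-only.
        seen = set()
        flags = []
        for media_type in media_types:
            flags.append(media_type in seen)
            seen.add(media_type)
        return flags
    raise ValueError(f"unknown bundle policy: {policy!r}")


def transports_to_gather(media_types, policy="balanced"):
    """Count the sections for which candidates would be gathered."""
    return plan_bundle_offer(media_types, policy).count(False)


sections = ["audio", "video", "video", "application"]
balanced_flags = plan_bundle_offer(sections, "balanced")
```

For the four sections in this example, the model gathers candidates for three media streams under "balanced", four under "max-compat", and one under "max-bundle".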
The application can specify its preferred policy regarding use of RTP/RTCP multiplexing [RFC 5761] using one of the following policies:
- negotiate: The JSEP implementation will gather both RTP and RTCP candidates but also will offer "a=rtcp-mux", thus allowing for compatibility with either multiplexing or non-multiplexing endpoints.
- require: The JSEP implementation will only gather RTP candidates and will insert an "a=rtcp-mux-only" indication into any new "m=" sections in offers it generates. This halves the number of candidates that the offerer needs to gather. Applying a description with an "m=" section that does not contain an "a=rtcp-mux" attribute will cause an error to be returned.
The default multiplexing policy MUST be set to "require". Implementations MAY choose to reject attempts by the application to set the multiplexing policy to "negotiate".
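A minimal sketch of the two multiplexing policies follows. The function names are hypothetical, and an "m=" section is modeled as a plain list of attribute lines, which is an assumption made for brevity rather than a description of how any implementation stores SDP.

```python
def components_to_gather(policy="require"):
    """Return the ICE components for which candidates are gathered."""
    if policy == "negotiate":
        return ("RTP", "RTCP")
    if policy == "require":
        # Gathering RTP only halves the number of candidates needed.
        return ("RTP",)
    raise ValueError(f"unknown RTCP-multiplexing policy: {policy!r}")


def check_rtcp_mux(m_section_attributes, policy="require"):
    """Raise ValueError if a section lacks "a=rtcp-mux" under "require"."""
    if policy == "require" and "a=rtcp-mux" not in m_section_attributes:
        raise ValueError('"m=" section does not contain "a=rtcp-mux"')


muxed_section = ["a=mid:0", "a=rtcp-mux", "a=sendrecv"]
legacy_section = ["a=mid:0", "a=sendrecv"]
check_rtcp_mux(muxed_section, "require")     # accepted
check_rtcp_mux(legacy_section, "negotiate")  # accepted under "negotiate"
```

Applying legacy_section under the "require" policy would raise an error in this model, mirroring the error described for the "require" policy.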
The addTrack method adds a MediaStreamTrack to the PeerConnection, using the MediaStream argument to associate the track with other tracks in the same MediaStream, so that they can be added to the same "LS" (Lip Synchronization) group when creating an offer or answer. Adding tracks to the same "LS" group indicates that the playback of these tracks should be synchronized for proper lip sync, as described in RFC 5888, Section 7.
addTrack attempts to minimize the number of transceivers as follows: if the PeerConnection is in the "have-remote-offer" state, the track will be attached to the first compatible transceiver that was created by the most recent call to setRemoteDescription and does not have a local track. Otherwise, a new transceiver will be created, as described in Section 4.1.4.
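The transceiver-reuse rule for addTrack can be sketched as shown below. The Transceiver class and its fields are hypothetical, and "compatible" is assumed here to mean "of the same media kind"; the actual compatibility test is left to the implementation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transceiver:
    """Hypothetical, heavily simplified transceiver record."""
    kind: str                                  # "audio" or "video"
    sender_track: Optional[str] = None         # ID of the local track, if any
    created_by_last_remote_offer: bool = False


def add_track(transceivers: List[Transceiver], signaling_state: str,
              track_id: str, kind: str) -> Transceiver:
    """Attach a track, reusing a transceiver only in "have-remote-offer"."""
    if signaling_state == "have-remote-offer":
        for transceiver in transceivers:
            if (transceiver.created_by_last_remote_offer
                    and transceiver.kind == kind        # assumed compatibility test
                    and transceiver.sender_track is None):
                transceiver.sender_track = track_id
                return transceiver
    # No reusable transceiver: create a new one.
    created = Transceiver(kind=kind, sender_track=track_id)
    transceivers.append(created)
    return created


pc_transceivers = [
    Transceiver("audio", created_by_last_remote_offer=True),
    Transceiver("video", created_by_last_remote_offer=True),
]
reused = add_track(pc_transceivers, "have-remote-offer", "camera", "video")
```

In this example, the "camera" track is attached to the existing video transceiver, so the answer would reuse the offerer's "m=" section instead of adding a new one.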
The removeTrack method removes a MediaStreamTrack from the PeerConnection, using the RtpSender argument to indicate which sender should have its track removed. The sender's track is cleared, and the sender stops sending. Future calls to createOffer will mark the "m=" section associated with the sender as recvonly (if transceiver.direction is sendrecv) or as inactive (if transceiver.direction is sendonly).
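The direction change that follows removeTrack can be written as a small lookup. The function name is hypothetical. The text specifies only the sendrecv and sendonly cases; leaving other directions unchanged is an assumption of this sketch.

```python
def offered_direction_after_remove_track(direction: str) -> str:
    """Return the direction used in future offers once the sender is cleared."""
    without_send = {
        "sendrecv": "recvonly",
        "sendonly": "inactive",
    }
    # Assumption: directions with no send component are left as they are.
    return without_send.get(direction, direction)


next_offer_direction = offered_direction_after_remove_track("sendrecv")
```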
The addTransceiver method adds a new RtpTransceiver to the PeerConnection. If a MediaStreamTrack argument is provided, then the transceiver will be configured with that media type and the track will be attached to the transceiver. Otherwise, the application MUST explicitly specify the type; this mode is useful for creating recvonly transceivers as well as for creating transceivers to which a track can be attached at some later point.
At the time of creation, the application can also specify a transceiver direction attribute, a set of MediaStreams that the transceiver is associated with (allowing "LS" group assignments), and a set of encodings for the media (used for simulcast as described in Section 3.7).
The onaddtrack event is dispatched to the application when a new remote track has been signaled as a result of a setRemoteDescription call. The new track is supplied as a MediaStreamTrack object in the event, along with the MediaStream(s) the track is part of.
The createDataChannel method creates a new data channel and attaches it to the PeerConnection. If no data channel currently exists for this PeerConnection, then a new offer/answer exchange is required. All data channels on a given PeerConnection share the same SCTP/DTLS association ("SCTP" stands for "Stream Control Transmission Protocol") and therefore the same "m=" section, so subsequent creation of data channels does not have any impact on the JSEP state.
The createDataChannel method also includes a number of arguments that are used by the PeerConnection (e.g., maxPacketLifetime) but are not reflected in the SDP and do not affect the JSEP state.
The ondatachannel event is dispatched to the application when a new data channel has been negotiated by the remote side, which can occur at any time after the underlying SCTP/DTLS association has been established. The new data channel object is supplied in the event.
The createOffer method generates a blob of SDP that contains an offer per [RFC 3264] with the supported configurations for the session, including descriptions of the media added to this PeerConnection, the codec, RTP, and RTCP options supported by this implementation, and any candidates that have been gathered by the ICE agent. An options parameter may be supplied to provide additional control over the generated offer. This options parameter allows an application to trigger an ICE restart, for the purpose of reestablishing connectivity.
In the initial offer, the generated SDP will contain all desired functionality for the session (functionality that is supported but not desired by default may be omitted); for each SDP line, the generation of the SDP will follow the process defined for generating an initial offer from the specification that defines the given SDP line. The exact handling of initial offer generation is detailed in Section 5.2.1 below.
In the event createOffer is called after the session is established, createOffer will generate an offer to modify the current session based on any changes that have been made to the session, e.g., adding or stopping RtpTransceivers, or requesting an ICE restart. For each existing stream, the generation of each SDP line MUST follow the process defined for generating an updated offer from the RFC that specifies the given SDP line. For each new stream, the generation of the SDP MUST follow the process of generating an initial offer, as mentioned above. If no changes have been made, or for SDP lines that are unaffected by the requested changes, the offer will only contain the parameters negotiated by the last offer/answer exchange. The exact handling of subsequent offer generation is detailed in Section 5.2.2 below.
Session descriptions generated by createOffer MUST be immediately usable by setLocalDescription; if a system has limited resources (e.g., a finite number of decoders), createOffer SHOULD return an offer that reflects the current state of the system, so that setLocalDescription will succeed when it attempts to acquire those resources.
Calling this method may do things such as generating new ICE credentials, but it does not change the PeerConnection state, trigger candidate gathering, or cause media to start or stop flowing. Specifically, the offer is not applied, and does not become the pending local description, until setLocalDescription is called.
The createAnswer method generates a blob of SDP that contains an SDP answer per [RFC 3264] with the supported configuration for the session that is compatible with the parameters supplied in the most recent call to setRemoteDescription; setRemoteDescription MUST have been called prior to calling createAnswer. Like createOffer, the returned blob contains descriptions of the media added to this PeerConnection, the codec/RTP/RTCP options negotiated for this session, and any candidates that have been gathered by the ICE agent. An options parameter may be supplied to provide additional control over the generated answer.
As an answer, the generated SDP will contain a specific configuration that specifies how the media plane should be established; for each SDP line, the generation of the SDP MUST follow the process defined for generating an answer from the specification that defines the given SDP line. The exact handling of answer generation is detailed in Section 5.3 below.
Session descriptions generated by createAnswer MUST be immediately usable by setLocalDescription; like createOffer, the returned description SHOULD reflect the current state of the system.
Calling this method may do things such as generating new ICE credentials, but it does not change the PeerConnection state, trigger candidate gathering, or cause a media state change. Specifically, the answer is not applied, and does not become the current local description, until setLocalDescription is called.
Session description objects (RTCSessionDescription) may be of type "offer", "pranswer", "answer", or "rollback". These types provide information as to how the description parameter should be parsed and how the media state should be changed.
"offer" indicates that a description MUST be parsed as an offer; said description may include many possible media configurations. A description used as an "offer" may be applied any time the PeerConnection is in a "stable" state or applied as an update to a previously supplied but unanswered "offer".
"pranswer" indicates that a description MUST be parsed as an answer, but not a final answer, and so MUST NOT result in the freeing of allocated resources. It may result in the start of media transmission, if the answer does not specify an inactive media direction. A description used as a "pranswer" may be applied as a response to an "offer" or as an update to a previously sent "pranswer".
"answer" indicates that a description MUST be parsed as an answer, the offer/answer exchange MUST be considered complete, and any resources (decoders, candidates) that are no longer needed SHOULD be released. A description used as an "answer" may be applied as a response to an "offer" or as an update to a previously sent "pranswer".
The only difference between a provisional and final answer is that the final answer results in the freeing of any unused resources that were allocated as a result of the offer. As such, the application can use some discretion on whether an answer should be applied as provisional or final and can change the type of the session description as needed. For example, in a serial forking scenario, an application may receive multiple "final" answers, one from each remote endpoint. The application could choose to accept the initial answers as provisional answers and only apply an answer as final when it receives one that meets its criteria (e.g., a live user instead of voicemail).
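The serial forking example above can be sketched as a loop that retypes incoming answers. The function and the string placeholders for answers are hypothetical; an application would apply each result with setRemoteDescription, which is not modeled here.

```python
def retype_forked_answers(answers, meets_criteria):
    """Pair each received answer with the type it would be applied as.

    Answers are applied as "pranswer" until one satisfies the application's
    criteria; that one is applied as the final "answer" and processing stops.
    """
    applied = []
    for answer in answers:
        if meets_criteria(answer):
            applied.append((answer, "answer"))
            break
        applied.append((answer, "pranswer"))
    return applied


# Placeholder answers; a real application would inspect signaling metadata.
received = ["voicemail", "live-user", "late-fork"]
plan = retype_forked_answers(received, lambda a: a == "live-user")
```

Because only the final answer releases unused resources, the earlier provisional answers in this model leave the offer/answer exchange open until an acceptable endpoint responds.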
"rollback" is a special session description type indicating that the state machine MUST be rolled back to the previous "stable" state, as described in Section 4.1.10.2. The contents MUST be empty.
Most applications will not need to create answers using the "pranswer" type. While it is good practice to send an immediate response to an offer, in order to warm up the session transport and prevent media clipping, the preferred handling for a JSEP application is to create and send a "sendonly" final answer with a null MediaStreamTrack immediately after receiving the offer, which will prevent media from being sent by the caller and allow media to be sent immediately upon answer by the callee. Later, when the callee actually accepts the call, the application can plug in the real MediaStreamTrack and create a new "sendrecv" offer to update the previous offer/answer pair and start bidirectional media flow. While this could also be done with a "sendonly" pranswer followed by a "sendrecv" answer, the initial pranswer leaves the offer/answer exchange open, which means that the caller cannot send an updated offer during this time.
As an example, consider a typical JSEP application that wants to set up audio and video as quickly as possible. When the callee receives an offer with audio and video MediaStreamTracks, it will send an immediate answer accepting these tracks as sendonly (meaning that the caller will not send the callee any media yet, and because the callee has not yet added its own MediaStreamTracks, the callee will not send any media either). It will then ask the user to accept the call and acquire the needed local tracks. Upon acceptance by the user, the application will plug in the tracks it has acquired, which, because ICE handshaking and DTLS handshaking have likely completed by this point, can start transmitting immediately. The application will also send a new offer to the remote side indicating call acceptance and moving the audio and video to be two-way media. A detailed example flow along these lines is shown in Section 7.3.
Of course, some applications may not be able to perform this double offer/answer exchange, particularly ones that are attempting to gateway to legacy signaling protocols. In these cases, pranswer can still provide the application with a mechanism to warm up the transport.
In certain situations, it may be desirable to "undo" a change made to setLocalDescription or setRemoteDescription. Consider a case where a call is ongoing and one side wants to change some of the session parameters; that side generates an updated offer and then calls setLocalDescription. However, the remote side, either before or after setRemoteDescription, decides it does not want to accept the new parameters and sends a reject message back to the offerer. Now, the offerer, and possibly the answerer as well, needs to return to a "stable" state and the previous local/remote description. To support this, we introduce the concept of "rollback", which discards any proposed changes to the session, returning the state machine to the "stable" state. A rollback is performed by supplying a session description of type "rollback" with empty contents to either setLocalDescription or setRemoteDescription.
The setLocalDescription method instructs the PeerConnection to apply the supplied session description as its local configuration. The type field indicates whether the description should be processed as an offer, provisional answer, final answer, or rollback; offers and answers are checked differently, using the various rules that exist for each SDP line.
This API changes the local media state; among other things, it sets up local resources for receiving and decoding media. In order to successfully handle scenarios where the application wants to offer to change from one media format to a different, incompatible format, the PeerConnection MUST be able to simultaneously support use of both the current and pending local descriptions (e.g., support the codecs that exist in either description). This dual processing begins when the PeerConnection enters the "have-local-offer" state, and it continues until setRemoteDescription is called with either (1) a final answer, at which point the PeerConnection can fully adopt the pending local description or (2) a rollback, which results in a revert to the current local description.
This API indirectly controls the candidate gathering process. When a local description is supplied and the number of transports currently in use does not match the number of transports needed by the local description, the PeerConnection will create transports as needed and begin gathering candidates for each transport, using ones from the candidate pool if available.
If (1) setRemoteDescription was previously called with an offer, (2) setLocalDescription is called with an answer (provisional or final), (3) the media directions are compatible, and (4) media is available to send, this will result in the starting of media transmission.
The setRemoteDescription method instructs the PeerConnection to apply the supplied session description as the desired remote configuration. As in setLocalDescription, the type field of the description indicates how it should be processed.
This API changes the local media state; among other things, it sets up local resources for sending and encoding media.
If (1) setLocalDescription was previously called with an offer, (2) setRemoteDescription is called with an answer (provisional or final), (3) the media directions are compatible, and (4) media is available to send, this will result in the starting of media transmission.
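The way the description type drives the state machine in setLocalDescription and setRemoteDescription can be sketched as a pair of transition tables. This is a simplified model under stated assumptions: the "have-local-pranswer" and "have-remote-pranswer" state names are not defined in this section, rollback is assumed to be accepted through either method in any state other than "stable", and the class and method names are hypothetical.

```python
LOCAL_TRANSITIONS = {
    ("stable", "offer"): "have-local-offer",
    ("have-local-offer", "offer"): "have-local-offer",
    ("have-remote-offer", "pranswer"): "have-local-pranswer",
    ("have-local-pranswer", "pranswer"): "have-local-pranswer",
    ("have-remote-offer", "answer"): "stable",
    ("have-local-pranswer", "answer"): "stable",
}

REMOTE_TRANSITIONS = {
    ("stable", "offer"): "have-remote-offer",
    ("have-remote-offer", "offer"): "have-remote-offer",
    ("have-local-offer", "pranswer"): "have-remote-pranswer",
    ("have-remote-pranswer", "pranswer"): "have-remote-pranswer",
    ("have-local-offer", "answer"): "stable",
    ("have-remote-pranswer", "answer"): "stable",
}


class SignalingModel:
    """Tracks only the signaling state; no SDP is parsed or stored."""

    def __init__(self):
        self.state = "stable"

    def _apply(self, transitions, description_type):
        if description_type == "rollback":
            if self.state == "stable":
                raise ValueError("nothing to roll back in the stable state")
            self.state = "stable"
            return
        key = (self.state, description_type)
        if key not in transitions:
            raise ValueError(f"{description_type!r} not valid in {self.state!r}")
        self.state = transitions[key]

    def set_local_description(self, description_type):
        self._apply(LOCAL_TRANSITIONS, description_type)

    def set_remote_description(self, description_type):
        self._apply(REMOTE_TRANSITIONS, description_type)


pc = SignalingModel()
pc.set_local_description("offer")       # -> have-local-offer
pc.set_remote_description("pranswer")   # -> have-remote-pranswer
pc.set_remote_description("answer")     # -> stable
```

The tables encode the rules given for each type: an offer applies in "stable" or as an update to an unanswered offer, and a provisional or final answer applies in response to an offer or as an update to an earlier provisional answer.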
The currentLocalDescription method returns a copy of the current negotiated local description -- i.e., the local description from the last successful offer/answer exchange -- in addition to any local candidates that have been generated by the ICE agent since the local description was set.
A null object will be returned if an offer/answer exchange has not yet been completed.
The pendingLocalDescription method returns a copy of the local description currently in negotiation -- i.e., a local offer set without any corresponding remote answer -- in addition to any local candidates that have been generated by the ICE agent since the local description was set.
A null object will be returned if the state of the PeerConnection is "stable" or "have-remote-offer".
The currentRemoteDescription method returns a copy of the current negotiated remote description -- i.e., the remote description from the last successful offer/answer exchange -- in addition to any remote candidates that have been supplied via addIceCandidate since the remote description was set.
A null object will be returned if an offer/answer exchange has not yet been completed.
The pendingRemoteDescription method returns a copy of the remote description currently in negotiation -- i.e., a remote offer set without any corresponding local answer -- in addition to any remote candidates that have been supplied via addIceCandidate since the remote description was set.
A null object will be returned if the state of the PeerConnection is "stable" or "have-local-offer".
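The relationship between the four description accessors can be sketched as follows. The DescriptionSlots class is hypothetical, descriptions are represented by placeholder strings, and provisional answers and candidate updates are deliberately left out of this model.

```python
class DescriptionSlots:
    """Models which descriptions are current and which are pending."""

    def __init__(self):
        self.current_local = None
        self.pending_local = None
        self.current_remote = None
        self.pending_remote = None

    def set_local_offer(self, sdp):
        self.pending_local = sdp

    def set_remote_offer(self, sdp):
        self.pending_remote = sdp

    def set_remote_answer(self, sdp):
        # A final answer completes the exchange: the pending local offer
        # and the answer become the current descriptions.
        self.current_local, self.current_remote = self.pending_local, sdp
        self.pending_local = self.pending_remote = None

    def set_local_answer(self, sdp):
        self.current_local, self.current_remote = sdp, self.pending_remote
        self.pending_local = self.pending_remote = None

    def rollback(self):
        # Proposed changes are discarded; current descriptions are kept.
        self.pending_local = self.pending_remote = None


slots = DescriptionSlots()
slots.set_local_offer("offer-1")
before_answer = (slots.current_local, slots.pending_local)
slots.set_remote_answer("answer-1")
after_answer = (slots.current_local, slots.current_remote, slots.pending_local)
```

Before the first exchange completes, both current descriptions are null in this model; once the PeerConnection is back in the "stable" state, both pending descriptions are null.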
The canTrickleIceCandidates property indicates whether the remote side supports receiving trickled candidates. There are three potential values:
- null: No SDP has been received from the other side, so it is not known if it can handle trickle. This is the initial value before setRemoteDescription is called.
- true: SDP has been received from the other side indicating that it can support trickle.
- false: SDP has been received from the other side indicating that it cannot support trickle.
As described in Section 3.5.2, JSEP implementations always provide candidates to the application individually, consistent with what is needed for Trickle ICE. However, applications can use the canTrickleIceCandidates property to determine whether their peer can actually do Trickle ICE, i.e., whether it is safe to send an initial offer or answer followed later by candidates as they are gathered. As "true" is the only value that definitively indicates remote Trickle ICE support, an application that compares canTrickleIceCandidates against "true" will by default attempt Half Trickle on initial offers and Full Trickle on subsequent interactions with a Trickle ICE-compatible agent.
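The comparison against "true" can be sketched as shown below, with Python's None standing in for null. The function name and the "half"/"full" return values are hypothetical labels for the application's chosen behavior.

```python
from typing import Optional


def choose_trickle_mode(can_trickle_ice_candidates: Optional[bool]) -> str:
    """Return "full" only when remote Trickle ICE support is confirmed."""
    # Both None (nothing received yet) and False fall back to Half Trickle.
    return "full" if can_trickle_ice_candidates is True else "half"


initial_offer_mode = choose_trickle_mode(None)
later_offer_mode = choose_trickle_mode(True)
```

In this model, an initial offer is sent with Half Trickle because no SDP has yet been received from the peer, and later descriptions switch to Full Trickle once support has been signaled.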
The setConfiguration method allows the global configuration of the PeerConnection, which was initially set by constructor parameters, to be changed during the session. The effects of calling this method depend on when it is invoked, and they will differ, depending on which specific parameters are changed:
- Any changes to the STUN/TURN servers to use affect the next gathering phase. If an ICE gathering phase has already started or completed, the 'needs-ice-restart' bit mentioned in Section 3.5.1 will be set. This will cause the next call to createOffer to generate new ICE credentials, for the purpose of forcing an ICE restart and kicking off a new gathering phase, in which the new servers will be used. If the ICE candidate pool has a nonzero size and a local description has not yet been applied, any existing candidates will be discarded, and new candidates will be gathered from the new servers.
- Any change to the ICE candidate policy affects the next gathering phase. If an ICE gathering phase has already started or completed, the 'needs-ice-restart' bit will be set. Either way, changes to the policy have no effect on the candidate pool, because pooled candidates are not made available to the application until a gathering phase occurs, and so any necessary filtering can still be done on any pooled candidates.
- The ICE candidate pool size MUST NOT be changed after applying a local description. If a local description has not yet been applied, any changes to the ICE candidate pool size take effect immediately; if increased, additional candidates are pre-gathered; if decreased, the now-superfluous candidates are discarded.
- The bundle and RTCP-multiplexing policies MUST NOT be changed after the construction of the PeerConnection.
Calling this method may result in a change to the state of the ICE agent.
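The constraints on setConfiguration can be sketched as a validation step. The Config and PeerConnectionModel classes and their field names are hypothetical, and the discarding or pre-gathering of pooled candidates is not modeled; only the permitted changes and the 'needs-ice-restart' bit are tracked.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Config:
    ice_servers: tuple = ()
    ice_candidate_policy: str = "all"
    ice_candidate_pool_size: int = 0
    bundle_policy: str = "balanced"
    rtcp_mux_policy: str = "require"


@dataclass
class PeerConnectionModel:
    config: Config
    local_description_applied: bool = False
    gathering_started: bool = False
    needs_ice_restart: bool = False

    def set_configuration(self, new: Config) -> None:
        old = self.config
        if (new.bundle_policy, new.rtcp_mux_policy) != (
                old.bundle_policy, old.rtcp_mux_policy):
            raise ValueError("bundle and RTCP-multiplexing policies are fixed")
        if (self.local_description_applied
                and new.ice_candidate_pool_size != old.ice_candidate_pool_size):
            raise ValueError("pool size is fixed once a local description is applied")
        servers_or_policy_changed = (
            new.ice_servers != old.ice_servers
            or new.ice_candidate_policy != old.ice_candidate_policy)
        if servers_or_policy_changed and self.gathering_started:
            # The next createOffer would generate new ICE credentials.
            self.needs_ice_restart = True
        self.config = new


model = PeerConnectionModel(Config(), local_description_applied=True,
                            gathering_started=True)
model.set_configuration(replace(model.config, ice_candidate_policy="relay"))
```

After the call in this example, the 'needs-ice-restart' bit is set in the model because gathering had already started when the candidate policy changed.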
The addIceCandidate method provides an update to the ICE agent via an IceCandidate object (Section 3.5.2.1). If the IceCandidate's candidate field is non-null, the IceCandidate is treated as a new remote ICE candidate, which will be added to the current and/or pending remote description according to the rules defined for Trickle ICE. Otherwise, the IceCandidate is treated as an end-of-candidates indication, as defined in RFC 8838, Section 14.
In either case, the "m=" section index, MID, and ufrag fields from the supplied IceCandidate are used to determine which "m=" section and ICE candidate generation the IceCandidate belongs to, as described in Section 3.5.2.1 above. In the case of an end-of-candidates indication, null values for the "m=" section index and MID fields are interpreted to mean that the indication applies to all "m=" sections in the specified ICE candidate generation. However, if both fields are null for a new remote candidate, this MUST be treated as an invalid condition, as specified below.
If any IceCandidate fields contain invalid values or an error occurs during the processing of the IceCandidate object, the supplied IceCandidate MUST be ignored and an error MUST be returned.
Otherwise, the new remote candidate or end-of-candidates indication is supplied to the ICE agent. In the case of a new remote candidate, connectivity checks will be sent to the new candidate, assuming setLocalDescription has already been called to initialize the ICE gathering process.
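The classification performed by addIceCandidate can be sketched as follows. The IceCandidate class and its field names are hypothetical stand-ins for the object of Section 3.5.2.1, Python's None stands in for null, and the candidate string is a placeholder rather than a validated candidate attribute.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class IceCandidate:
    candidate: Optional[str]
    mid: Optional[str] = None
    m_section_index: Optional[int] = None
    ufrag: Optional[str] = None


def classify_ice_candidate(ice: IceCandidate) -> Tuple[str, str]:
    """Return (kind, scope) or raise ValueError for an invalid object."""
    unscoped = ice.mid is None and ice.m_section_index is None
    if ice.candidate is not None:
        if unscoped:
            # A new remote candidate must identify its "m=" section.
            raise ValueError("candidate has neither MID nor m= section index")
        return ("remote-candidate", "single-section")
    # A null candidate field is an end-of-candidates indication.
    return ("end-of-candidates", "all-sections" if unscoped else "single-section")


trickled = classify_ice_candidate(
    IceCandidate("candidate:placeholder", mid="0", ufrag="abcd"))
finished = classify_ice_candidate(IceCandidate(None, ufrag="abcd"))
```

An implementation would ignore the object and return an error to the application at the point where this sketch raises ValueError.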
The onicecandidate event is dispatched to the application in two situations: (1) when the ICE agent has discovered a new allowed local ICE candidate during ICE gathering, as outlined in Section 3.5.1 and subject to the restrictions discussed in Section 3.5.3, or (2) when an ICE gathering phase completes. The event contains a single IceCandidate object, as defined in Section 3.5.2.1.
In the first case, the newly discovered candidate is reflected in the IceCandidate object, and all of its fields MUST be non-null. This candidate will also be added to the current and/or pending local description according to the rules defined for Trickle ICE.
In the second case, the event's IceCandidate object MUST have its candidate field set to null to indicate that the current gathering phase is complete, i.e., there will be no further onicecandidate events in this phase. However, the IceCandidate's ufrag field MUST be specified to indicate which ICE candidate generation is ending. The IceCandidate's "m=" section index and MID fields MAY be specified to indicate that the event applies to a specific "m=" section, or set to null to indicate it applies to all "m=" sections in the current ICE candidate generation. This event can be used by the application to generate an end-of-candidates indication, as defined in RFC 8838, Section 13.
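An application's handling of the two onicecandidate cases can be sketched as a function that turns the event's IceCandidate into a signaling message. The dictionary keys and the message shape are hypothetical, since the signaling format is chosen by the application; Python's None stands in for null.

```python
def signaling_message_for(ice: dict) -> dict:
    """Build an application-defined message from an event's IceCandidate."""
    if ice.get("ufrag") is None:
        raise ValueError("ufrag must identify the ICE candidate generation")
    if ice.get("candidate") is None:
        # Gathering phase complete: signal end-of-candidates to the peer.
        scoped = ice.get("mid") is not None or ice.get("index") is not None
        return {
            "type": "end-of-candidates",
            "ufrag": ice["ufrag"],
            "scope": "single-section" if scoped else "all-sections",
        }
    # New local candidate: every field is expected to be non-null.
    if ice.get("mid") is None or ice.get("index") is None:
        raise ValueError("a new local candidate must carry MID and index")
    return {
        "type": "candidate",
        "candidate": ice["candidate"],
        "mid": ice["mid"],
        "index": ice["index"],
        "ufrag": ice["ufrag"],
    }


new_candidate = signaling_message_for(
    {"candidate": "candidate:placeholder", "mid": "0", "index": 0, "ufrag": "abcd"})
gathering_done = signaling_message_for(
    {"candidate": None, "mid": None, "index": None, "ufrag": "abcd"})
```

The second message in this example corresponds to the end of a gathering phase for all "m=" sections in the generation identified by the ufrag.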