3.3 Format of response headers
The response header is composed of a response line, optionally followed by headers that encode the response parameters. An example of a response header could be:

      200 1203 OK

The response line starts with the response code, which is a three-digit numeric value. The code is followed by a white space and the transaction identifier. Response codes defined in packages (8xx) are followed by white space, a slash ("/") and the package name. All response codes may furthermore be followed by optional commentary preceded by a white space.

The following table describes the parameters whose presence is mandatory or optional in a response header, as a function of the command that triggered the response. The letter M stands for mandatory, O for optional and F for forbidden. Unless otherwise specified, a parameter MUST NOT be present more than once. Note that the table only reflects the default for responses that have not defined any other behavior. If a response is received with a parameter that is either not understood or marked as forbidden, the offending parameter(s) MUST simply be ignored.
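The response line layout described at the start of this section (code, transaction identifier, optional "/" package name, optional commentary) can be sketched in Python as follows. This is a minimal, lenient illustration and not a normative grammar: the names `ResponseLine` and `parse_response_line` are hypothetical, the character set accepted for the package name is an assumption, and the nine-digit bound on the transaction identifier is taken from the range given in Section 3.5.2.

```python
import re
from typing import NamedTuple, Optional


class ResponseLine(NamedTuple):
    code: int                   # three-digit response code
    transaction_id: int         # transaction identifier
    package: Optional[str]      # package name following "/", if present
    commentary: Optional[str]   # optional free-text commentary


# Assumption: package names consist of letters, digits and hyphens.
_RESPONSE_LINE = re.compile(
    r"^(?P<code>\d{3})[ \t]+(?P<txid>\d{1,9})"
    r"(?:[ \t]+/(?P<package>[A-Za-z0-9-]+))?"
    r"(?:[ \t]+(?P<commentary>.*))?$"
)


def parse_response_line(line: str) -> ResponseLine:
    """Split one response line into its fields, or raise ValueError."""
    match = _RESPONSE_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not an MGCP response line: {line!r}")
    return ResponseLine(
        code=int(match["code"]),
        transaction_id=int(match["txid"]),
        package=match["package"],
        commentary=match["commentary"] or None,
    )
```

For the example above, `parse_response_line("200 1203 OK")` yields code 200, transaction identifier 1203, no package name, and the commentary "OK".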
 ------------------------------------------------------------------
| Parameter name      | EP | CR | MD | DL | RQ | NT | AU | AU | RS |
|                     | CF | CX | CX | CX | NT | FY | EP | CX | IP |
|---------------------|----|----|----|----|----|----|----|----|----|
| BearerInformation   |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| CallId              |  F |  F |  F |  F |  F |  F |  F |  O |  F |
| Capabilities        |  F |  F |  F |  F |  F |  F |  O*|  F |  F |
| ConnectionId        |  F |  O*|  F |  F |  F |  F |  O*|  F |  F |
| ConnectionMode      |  F |  F |  F |  F |  F |  F |  F |  O |  F |
| Connection-         |  F |  F |  F |  O*|  F |  F |  F |  O |  F |
|   Parameters        |    |    |    |    |    |    |    |    |    |
| DetectEvents        |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| DigitMap            |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| EventStates         |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| LocalConnection-    |  F |  F |  F |  F |  F |  F |  F |  O |  F |
|   Options           |    |    |    |    |    |    |    |    |    |
| MaxMGCPDatagram     |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| NotifiedEntity      |  F |  F |  F |  F |  F |  F |  O |  O |  O |
| ObservedEvents      |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| QuarantineHandling  |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| PackageList         |  O*|  O*|  O*|  O*|  O*|  O*|  O |  O*|  O*|
| ReasonCode          |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| RequestIdentifier   |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| ResponseAck         |  O*|  O*|  O*|  O*|  O*|  O*|  O*|  O*|  O*|
| RestartDelay        |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| RestartMethod       |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| RequestedEvents     |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| RequestedInfo       |  F |  F |  F |  F |  F |  F |  F |  F |  F |
| SecondConnectionId  |  F |  O |  F |  F |  F |  F |  F |  F |  F |
| SecondEndpointId    |  F |  O |  F |  F |  F |  F |  F |  F |  F |
| SignalRequests      |  F |  F |  F |  F |  F |  F |  O |  F |  F |
| SpecificEndpointId  |  F |  O |  F |  F |  F |  F |  O*|  F |  F |
|---------------------|----|----|----|----|----|----|----|----|----|
| LocalConnection-    |  F |  O*|  O |  F |  F |  F |  F |  O*|  F |
|   Descriptor        |    |    |    |    |    |    |    |    |    |
| RemoteConnection-   |  F |  F |  F |  F |  F |  F |  F |  O*|  F |
|   Descriptor        |    |    |    |    |    |    |    |    |    |
 ------------------------------------------------------------------

Notes (*):

* The PackageList parameter is only allowed with return code 518 (unsupported package), except for AuditEndpoint, where it may also be returned if audited.
* The ResponseAck parameter MUST NOT be used with any other responses than a final response issued after a provisional response for the transaction in question. In that case, the presence of the ResponseAck parameter SHOULD trigger a Response Acknowledgement - any ResponseAck values provided will be ignored.
* In the case of a CreateConnection message, the response line is followed by a Connection-Id parameter and a LocalConnectionDescriptor. It may also be followed by a Specific-Endpoint-Id parameter, if the creation request was sent to a wildcarded Endpoint-Id. The Connection-Id and LocalConnectionDescriptor parameters are marked as optional in the table. In fact, they are mandatory with all positive responses, when a connection was created, and forbidden when the response is negative and no connection was created.
* A LocalConnectionDescriptor MUST be transmitted with a positive response (code 200) to a CreateConnection. It MUST also be transmitted in response to a ModifyConnection command, if the modification resulted in a modification of the session parameters. The LocalConnectionDescriptor is encoded as a "session description", as defined in section 3.4. It is separated from the response header by an empty line.
* Connection-Parameters are only valid in a response to a non-wildcarded DeleteConnection command sent by the Call Agent.
* Multiple ConnectionId, SpecificEndpointId, and Capabilities parameters may be present in the response to an AuditEndpoint command.
* When several session descriptors are encoded in the same response, they are encoded one after another, separated by an empty line. This is the case, for example, when the response to an audit connection request carries both a local session description and a remote session description, as in:
      200 1203 OK
      C: A3C47F21456789F0
      N: [128.96.41.12]
      L: p:10, a:PCMU;G726-32
      M: sendrecv
      P: PS=1245, OS=62345, PR=780, OR=45123, PL=10, JI=27, LA=48

      v=0
      o=- 25678 753849 IN IP4 128.96.41.1
      s=-
      c=IN IP4 128.96.41.1
      t=0 0
      m=audio 1296 RTP/AVP 0

      v=0
      o=- 33343 346463 IN IP4 128.96.63.25
      s=-
      c=IN IP4 128.96.63.25
      t=0 0
      m=audio 1296 RTP/AVP 0 96
      a=rtpmap:96 G726-32/8000

In this example, according to the SDP syntax, each description starts with a "version" line (v=...). The local description is always transmitted before the remote description. If a connection descriptor is requested, but it does not exist for the connection audited, that connection descriptor will appear with the SDP protocol version field only.

The response parameters are described for each of the commands in the following.

3.3.1 CreateConnection Response
In the case of a CreateConnection message, the response line is followed by a Connection-Id parameter with a successful response (code 200). A LocalConnectionDescriptor is furthermore transmitted with a positive response. The LocalConnectionDescriptor is encoded as a "session description", as defined by SDP (RFC 2327). It is separated from the response header by an empty line, e.g.:
      200 1204 OK
      I: FDE234C8

      v=0
      o=- 25678 753849 IN IP4 128.96.41.1
      s=-
      c=IN IP4 128.96.41.1
      t=0 0
      m=audio 3456 RTP/AVP 96
      a=rtpmap:96 G726-32/8000

When a provisional response has been issued previously, the final response SHOULD furthermore contain the Response Acknowledgement parameter (final responses issued by entities adhering to this specification will include the parameter, but older RFC 2705 implementations MAY not):

      200 1204 OK
      K:
      I: FDE234C8

      v=0
      o=- 25678 753849 IN IP4 128.96.41.1
      s=-
      c=IN IP4 128.96.41.1
      t=0 0
      m=audio 3456 RTP/AVP 96
      a=rtpmap:96 G726-32/8000

The final response SHOULD then be acknowledged by a Response Acknowledgement:

      000 1204

3.3.2 ModifyConnection Response
In the case of a successful ModifyConnection message, the response line is followed by a LocalConnectionDescriptor, if the modification resulted in a modification of the session parameters (e.g., changing only the mode of a connection does not alter the session parameters). The LocalConnectionDescriptor is encoded as a "session description", as defined by SDP. It is separated from the response header by an empty line.
      200 1207 OK

      v=0
      o=- 25678 753849 IN IP4 128.96.41.1
      s=-
      c=IN IP4 128.96.41.1
      t=0 0
      m=audio 3456 RTP/AVP 0

When a provisional response has been issued previously, the final response SHOULD furthermore contain the Response Acknowledgement parameter as in:

      200 1207 OK
      K:

The final response SHOULD then be acknowledged by a Response Acknowledgement:

      000 1207 OK

3.3.3 DeleteConnection Response
Depending on the variant of the DeleteConnection message, the response line may be followed by a Connection Parameters parameter line, as defined in Section 3.2.2.7.

      250 1210 OK
      P: PS=1245, OS=62345, PR=780, OR=45123, PL=10, JI=27, LA=48

3.3.4 NotificationRequest Response
A successful NotificationRequest response does not include any additional response parameters.

3.3.5 Notify Response
A successful Notify response does not include any additional response parameters.

3.3.6 AuditEndpoint Response
In the case of a successful AuditEndpoint, the response line may be followed by information for each of the parameters requested - each parameter will appear on a separate line. Parameters for which no value currently exists, e.g., digit map, will still be provided but with an empty value. Each local endpoint name "expanded" by a wildcard character will appear on a separate line using the "SpecificEndpointId" parameter code, e.g.:

      200 1200 OK
      Z: aaln/1@rgw.whatever.net
      Z: aaln/2@rgw.whatever.net

When connection identifiers are audited and multiple connections exist on the endpoint, a comma-separated list of connection identifiers SHOULD be returned as in:

      200 1200 OK
      I: FDE234C8, DFE233D1

Alternatively, multiple connection id parameter lines may be returned - the two forms should not be mixed although doing so does not constitute an error.

When capabilities are audited, the response may include multiple capabilities parameter lines as in:

      200 1200 OK
      A: a:PCMU;G728, p:10-100, e:on, s:off, t:1, v:L,
         m:sendonly;recvonly;sendrecv;inactive
      A: a:G729, p:30-90, e:on, s:on, t:1, v:L,
         m:sendonly;recvonly;sendrecv;inactive;confrnce

Note: The carriage return for Capabilities shown above is present for formatting reasons only. It is not permissible in a real command encoding.

3.3.7 AuditConnection Response
In the case of a successful AuditConnection, the response may be followed by information for each of the parameters requested. Parameters for which no value currently exists will still be provided. Connection descriptors will always appear last and each will be preceded by an empty line, as for example:
      200 1203 OK
      C: A3C47F21456789F0
      N: [128.96.41.12]
      L: p:10, a:PCMU;G728
      M: sendrecv
      P: PS=622, OS=31172, PR=390, OR=22561, PL=5, JI=29, LA=50

      v=0
      o=- 4723891 7428910 IN IP4 128.96.63.25
      s=-
      c=IN IP4 128.96.63.25
      t=0 0
      m=audio 1296 RTP/AVP 96
      a=rtpmap:96 G726-32/8000

If both a local and a remote connection descriptor are provided, the local connection descriptor will be the first of the two. If a connection descriptor is requested, but it does not exist for the connection audited, that connection descriptor will appear with the SDP protocol version field only ("v=0"), as for example:

      200 1203 OK

      v=0

3.3.8 RestartInProgress Response
A successful RestartInProgress response may include a NotifiedEntity parameter, but otherwise does not include any additional response parameters.

Also, a 521 response to a RestartInProgress MUST include a NotifiedEntity parameter with the name of another Call Agent to contact when the first Call Agent redirects the endpoint to another Call Agent as in:

      521 1204 Redirect
      N: CA-1@whatever.net

3.4 Encoding of the Session Description (SDP)
The session description (SDP) is encoded in conformance with the session description protocol, SDP. MGCP implementations are REQUIRED to be fully capable of parsing any conformant SDP message, and MUST send session descriptions that strictly conform to the SDP standard.
The general description and explanation of SDP parameters can be found in RFC 2327 (or its successor). In particular, it should be noted that the

* Origin ("o="),
* Session Name ("s="), and
* Time active ("t=")

are all mandatory in RFC 2327. While they are of little use to MGCP, they MUST be provided in conformance with RFC 2327 nevertheless. The following suggests values to be used for each of the fields; however, the reader is encouraged to consult RFC 2327 (or its successor) for details:

Origin

      o = <username> <session id> <version> <network type> <address type> <address>

* The username SHOULD be set to hyphen ("-").
* The session id is RECOMMENDED to be an NTP timestamp as suggested in RFC 2327.
* The version is a version number that MUST increment with each change to the SDP. A counter initialized to zero or an NTP timestamp as suggested in RFC 2327 is RECOMMENDED.
* The network type defines the type of network. For RTP sessions the network type SHOULD be "IN".
* The address type defines the type of address. For RTP sessions the address type SHOULD be "IP4" (or "IP6").
* The address SHOULD be the same address as provided in the connection information ("c=") field.

Session Name

      s = <session name>

The session name should be hyphen ("-").

Time active

      t = <start time> <stop time>

* The start time may be set to zero.
* The stop time should be set to zero.

Each of the three fields can be ignored upon reception.

To further accommodate the extensibility principles of MGCP, implementations are ENCOURAGED to support the PINT "a=require" attribute - please refer to RFC 2848 for further details.

The usage of SDP actually depends on the type of session that is being established. Below we describe usage of SDP for an audio service using the RTP/AVP profile [4], or the LOCAL interconnect defined in this document. In case of any conflicts between what is described below and SDP (RFC 2327 or its successor), the SDP specification takes precedence.

3.4.1 Usage of SDP for an Audio Service
In a telephony gateway, we only have to describe sessions that use exactly one medium, audio. The usage of SDP for this is straightforward and described in detail in RFC 2327.

The following is an example of an RFC 2327 conformant session description for an audio connection:

      v=0
      o=- A7453949499 0 IN IP4 128.96.41.1
      s=-
      c=IN IP4 128.96.41.1
      t=0 0
      m=audio 3456 RTP/AVP 0 96
      a=rtpmap:96 G726-32/8000

3.4.2 Usage of SDP for LOCAL Connections
When MGCP is used to set up internal connections within a single gateway, the SDP format is used to encode the parameters of that connection. The connection and media parameters will be used as follows:

* The connection parameter (c=) will specify that the connection is local, using the keyword "LOCAL" as network type, the keyword "EPN" (endpoint name) as address type, and the local name of the endpoint as the connection-address.
* The "m=audio" parameter will specify a port number, which will always be set to 0, the type of protocol, always set to the keyword LOCAL, and the type of encoding, using the same conventions used for the RTP AVP profile (RTP payload numbers). The type of encoding should normally be set to 0 (PCMU).

A session-level attribute identifying the connection MAY furthermore be present. This enables endpoints to support multiple LOCAL connections. Use of this attribute is OPTIONAL and indeed unnecessary for endpoints that only support a single LOCAL connection. The attribute is defined as follows:

      a=MGCPlocalcx:<ConnectionID>

The MGCP Local Connection attribute is a session level only case-insensitive attribute that identifies the MGCP LOCAL connection, on the endpoint identified in the connection information, to which the SDP applies. The ConnectionId is a hexadecimal string containing at most 32 characters. The ConnectionId itself is case-insensitive. The MGCP Local Connection attribute is not subject to the charset attribute.

An example of a LOCAL session description could be:

      v=0
      o=- A7453949499 0 LOCAL EPN X35V3+A4/13
      s=-
      c=LOCAL EPN X35V3+A4/13
      t=0 0
      a=MGCPlocalcx:FDE234C8
      m=audio 0 LOCAL 0

Note that the MGCP Local Connection attribute is specified at the session level and that it could have been omitted in case only a single LOCAL connection per endpoint is supported.

3.5 Transmission over UDP
MGCP messages are transmitted over UDP. Commands are sent to one of the IP addresses defined in the DNS for the specified endpoint. The responses are sent back to the source address (i.e., IP address and UDP port number) of the commands - the response may or may not arrive from the same address as the command was sent to.
When no port is specified for the endpoint, the commands MUST by default be sent:

* by the Call Agents, to the default MGCP port for gateways, 2427.
* by the Gateways, to the default MGCP port for Call Agents, 2727.

3.5.1 Providing the At-Most-Once Functionality
MGCP messages, being carried over UDP, may be subject to losses. In the absence of a timely response, commands are retransmitted. Most MGCP commands are not idempotent. The state of the gateway would become unpredictable if, for example, CreateConnection commands were executed several times. The transmission procedures MUST thus provide an "at-most-once" functionality.

MGCP entities are expected to keep in memory a list of the responses that they sent to recent transactions, and a list of the transactions that are currently being executed. The transaction identifiers of incoming commands are compared to the transaction identifiers of the recent responses. If a match is found, the MGCP entity does not execute the transaction again, but simply resends the response. The remaining commands will be compared to the list of current transactions, i.e., transactions received previously which have not yet finished executing. If a match is found, the MGCP entity does not execute the transaction again, but a provisional response (Section 3.5.6) SHOULD be issued to acknowledge receipt of the command.

The procedure uses a long timer value, noted T-HIST in the following. The timer MUST be set larger than the maximum duration of a transaction, which MUST take into account the maximum number of repetitions, the maximum value of the repetition timer and the maximum propagation delay of a packet in the network. A suggested value is 30 seconds.

The copy of the responses MAY be destroyed either T-HIST seconds after the response is issued, or when the gateway (or the Call Agent) receives a confirmation that the response has been received, through the "Response Acknowledgement".
For transactions that are acknowledged through this attribute, the gateway SHALL keep a copy of the transaction-id (as opposed to the entire transaction response) for T-HIST seconds after the response is issued, in order to detect and ignore duplicate copies of the transaction request that could be produced by the network.
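
The bookkeeping described in this section can be sketched as a small table keyed by transaction identifier. This is only an illustration under stated assumptions: the class name `AtMostOnce`, its method names, and the action strings it returns are hypothetical, and `T_HIST` uses the 30-second value suggested above. An acknowledged response is reduced to its transaction-id, so later duplicates are ignored rather than answered.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple

T_HIST = 30.0  # seconds; the value suggested in the text


class AtMostOnce:
    """Hypothetical duplicate-detection table for incoming commands."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        # txid -> (time the response was issued, response text or None if acked)
        self._responses: Dict[int, Tuple[float, Optional[str]]] = {}
        self._executing: Set[int] = set()

    def _expire(self) -> None:
        # Forget entries whose response was issued T-HIST or more seconds ago.
        now = self._clock()
        stale = [t for t, (at, _) in self._responses.items() if now - at >= T_HIST]
        for txid in stale:
            del self._responses[txid]

    def on_command(self, txid: int) -> Tuple[str, Optional[str]]:
        """Return (action, stored response) for an incoming command."""
        self._expire()
        if txid in self._responses:
            response = self._responses[txid][1]
            if response is None:
                return ("ignore", None)        # acknowledged: drop the duplicate
            return ("resend", response)        # repeat the earlier response
        if txid in self._executing:
            return ("provisional", None)       # still running: acknowledge receipt
        self._executing.add(txid)
        return ("execute", None)

    def on_complete(self, txid: int, response: str) -> None:
        """Record the final response once the transaction has executed."""
        self._executing.discard(txid)
        self._responses[txid] = (self._clock(), response)

    def on_acknowledged(self, txid: int) -> None:
        """Drop the response copy but keep the transaction-id until T-HIST."""
        if txid in self._responses:
            issued_at = self._responses[txid][0]
            self._responses[txid] = (issued_at, None)
```

Injecting the clock keeps the sketch testable without waiting for real time to pass; a production implementation would also need to bound the size of the table.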
3.5.2 Transaction Identifiers and Three Ways Handshake
Transaction identifiers are integer numbers in the range from 1 to 999,999,999 (both included). Call Agents may decide to use a specific number space for each of the gateways that they manage, or to use the same number space for all gateways that belong to some arbitrary group. Call Agents may decide to share the load of managing a large gateway between several independent processes. These processes MUST then share the transaction number space. There are multiple possible implementations of this sharing, such as having a centralized allocation of transaction identifiers, or pre-allocating non-overlapping ranges of identifiers to different processes. The implementations MUST guarantee that unique transaction identifiers are allocated to all transactions that originate from a logical Call Agent, as defined in Section 4. Gateways can simply detect duplicate transactions by looking at the transaction identifier only.

The Response Acknowledgement Attribute can be found in any command. It carries a set of "confirmed transaction-id ranges" for final responses received - provisional responses MUST NOT be confirmed. A given response SHOULD NOT be confirmed in two separate messages.

MGCP entities MAY choose to delete the copies of the responses (but not the transaction-id) to transactions whose id is included in "confirmed transaction-id ranges" received in the Response Confirmation messages (command or response). They SHOULD then silently discard further commands from that entity when the transaction-id falls within these ranges, and the response was issued less than T-HIST seconds ago.

Entities MUST exercise due caution when acknowledging responses. In particular, a response SHOULD only be acknowledged if the response acknowledgement is sent to the same entity as the corresponding command (i.e., the command whose response is being acknowledged) was sent to. Likewise, entities SHOULD NOT blindly accept a response acknowledgement for a given response.
However, it is considered safe to accept a response acknowledgement for a given response, when that response acknowledgement is sent by the same entity as the command that generated that response.

It should be noted that use of response acknowledgements in commands (as opposed to the Response Acknowledgement response following a provisional response) is OPTIONAL. The benefit of using it is that it reduces overall memory consumption. However, in order to avoid large messages, implementations SHOULD NOT generate large response acknowledgement lists. One strategy is to manage responses to commands on a per endpoint basis. A command for an endpoint can confirm a response to an older command for that same endpoint. Responses to commands with wildcarded endpoint names can be confirmed selectively with due consideration to message sizes, or alternatively simply not be acknowledged (unless the response explicitly required a Response Acknowledgement). Care must be taken to not confirm the same response twice or a response that is more than T-HIST seconds old.

The "confirmed transaction-id ranges" values SHALL NOT be used if more than T-HIST seconds have elapsed since the entity issued its last response to the other entity, or when an entity resumes operation. In this situation, commands MUST be accepted and processed, without any test on the transaction-id.

Commands that carry the "Response Acknowledgement attribute" may be transmitted in disorder. The union of the "confirmed transaction-id ranges" received in recent messages SHALL be retained.

3.5.3 Computing Retransmission Timers
It is the responsibility of the requesting entity to provide suitable time outs for all outstanding commands, and to retry commands when time outs have been exceeded. Furthermore, when repeated commands fail to be acknowledged, it is the responsibility of the requesting entity to seek redundant services and/or clear existing or pending associations.

The specification purposely avoids specifying any value for the retransmission timers. These values are typically network dependent. Implementations SHOULD normally estimate the retransmission timer by measuring the time spent between the sending of a command and the return of the first response to the command. At a minimum, a retransmission strategy involving exponential backoff MUST be implemented. One possibility is to use the algorithm implemented in TCP/IP, which uses two variables:

* the average acknowledgement delay, AAD, estimated through an exponentially smoothed average of the observed delays,
* the average deviation, ADEV, estimated through an exponentially smoothed average of the absolute value of the difference between the observed delay and the current average.
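
A minimal sketch of the two estimators listed above, together with the bounded, randomized backoff that this section specifies for retransmissions, might look as follows. Several values are assumptions that the text does not fix: the constant N is taken as 4 and the smoothing gains as 1/8 and 1/4 (the values conventionally used for TCP), ADEV is seeded with half of the initial delay, and the class and method names are hypothetical. `RTO_MAX` uses the 4-second maximum suggested in this section.

```python
import random
from typing import Callable, Tuple

RTO_MAX = 4.0  # seconds; the suggested maximum value for RTO


class RetransmissionTimer:
    """Hypothetical AAD/ADEV estimator with bounded, randomized backoff."""

    def __init__(self, initial_delay: float, n: float = 4.0,
                 aad_gain: float = 0.125, adev_gain: float = 0.25) -> None:
        self.aad = initial_delay          # average acknowledgement delay
        self.adev = initial_delay / 2.0   # average deviation (assumed seed)
        self.n = n                        # the constant N (assumed to be 4)
        self.aad_gain = aad_gain          # smoothing gain for AAD (assumed)
        self.adev_gain = adev_gain        # smoothing gain for ADEV (assumed)

    def observe(self, delay: float) -> None:
        """Update both estimators from a transaction with no retransmission."""
        error = delay - self.aad
        self.aad += self.aad_gain * error
        self.adev += self.adev_gain * (abs(error) - self.adev)

    def rto(self) -> float:
        """Average delay plus N times the average deviation, capped at RTO_MAX."""
        return min(self.aad + self.n * self.adev, RTO_MAX)

    def backoff(self, t_delay: float,
                uniform: Callable[[float, float], float] = random.uniform
                ) -> Tuple[float, float]:
        """Return (doubled T-DELAY, new RTO) after a retransmission."""
        t_delay *= 2.0
        jittered = uniform(0.5 * t_delay, t_delay)
        return t_delay, min(jittered + self.n * self.adev, RTO_MAX)
```

A caller would invoke `observe()` only for transactions that completed without retransmission, and `backoff()` once per retransmission, carrying the returned T-DELAY into the next call.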
The retransmission timer, RTO, in TCP, is set to the sum of the average delay plus N times the average deviation, where N is a constant. In MGCP, the maximum value of the timer SHOULD however be bounded, in order to guarantee that no repeated packet will be received by the gateways after T-HIST seconds. A suggested maximum value for RTO (RTO-MAX) is 4 seconds. Implementers SHOULD consider bounding the minimum value of this timer as well [19].

After any retransmission, the MGCP entity SHOULD do the following:

* It should double the estimated value of the acknowledgement delay for this transaction, T-DELAY.
* It should compute a random value, uniformly distributed between 0.5 T-DELAY and T-DELAY.
* It should set the retransmission timer (RTO) to the minimum of:
  - the sum of that random value and N times the average deviation,
  - RTO-MAX.

This procedure has two effects. Because it includes an exponentially increasing component, it will automatically slow down the stream of messages in case of congestion. Because it includes a random component, it will break the potential synchronization between notifications triggered by the same external event.

Note that the estimators AAD and ADEV SHOULD NOT be updated for transactions that involve retransmissions. Also, the first new transmission following a successful retransmission SHOULD use the RTO for that last retransmission. If this transmission succeeds without any retransmissions, the AAD and ADEV estimators are updated and RTO is determined as usual again. See, e.g., [18] for further details.

3.5.4 Maximum Datagram Size, Fragmentation and Reassembly
MGCP messages being transmitted over UDP rely on IP for fragmentation and reassembly of large datagrams. The maximum theoretical size of an IP datagram is 65535 bytes. With a 20-byte IP header and an 8-byte UDP header, this leaves us with a maximum theoretical MGCP message size of 65507 bytes when using UDP.

However, IP does not require a host to receive IP datagrams larger than 576 bytes [21], which would provide an unacceptably small MGCP message size. Consequently, MGCP mandates that implementations MUST support MGCP datagrams up to at least 4000 bytes, which requires the corresponding IP fragmentation and reassembly to be supported. Note that the 4000 byte limit applies to the MGCP level. Lower layer overhead will require support for IP datagrams that are larger than this: UDP and IP overhead will be at least 28 bytes, and, e.g., use of IPSec will add additional overhead.

It should be noted that the above applies to both Call Agents and endpoints. Call Agents can audit endpoints to determine if they support larger MGCP datagrams than specified above. Endpoints currently do not have a similar capability to determine if a Call Agent supports larger MGCP datagram sizes.

3.5.5 Piggybacking
There are cases when a Call Agent will want to send several messages at the same time to the same gateways, and vice versa. When several MGCP messages have to be sent in the same datagram, they MUST be separated by a line of text that contains a single dot, as in for example:

      200 2005 OK
      .
      DLCX 1244 card23/21@tgw-7.example.net MGCP 1.0
      C: A3C47F21456789F0
      I: FDE234C8

The piggybacked messages MUST be processed exactly as if they had been received one at a time in several separate datagrams. Each message in the datagram MUST be processed to completion and in order starting with the first message, and each command MUST be responded to. Errors encountered in a message that was piggybacked MUST NOT affect any of the other messages received in that datagram - each message is processed on its own.

Piggybacking can be used to achieve two things:

* Guaranteed in-order delivery and processing of messages.
* Fate sharing of message delivery.

When piggybacking is used to guarantee in-order delivery of messages, entities MUST ensure that this in-order delivery property is retained on retransmissions of the individual messages. An example of this is when multiple Notify commands are sent using piggybacking (as described in Section 4.4.1).
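
The separation and processing rules for piggybacked messages can be sketched as below. This is an illustrative outline only: the function names `split_piggybacked` and `process_datagram` and the `handle` callback are hypothetical, and the sketch assumes the datagram has already been decoded to text. Messages are handled strictly in order, and a failure in one message is recorded without affecting the others.

```python
from typing import Callable, List, Union


def split_piggybacked(datagram: str) -> List[str]:
    """Split a datagram into messages at lines holding a single dot."""
    messages: List[str] = []
    current: List[str] = []
    for line in datagram.splitlines():
        if line.strip() == ".":
            messages.append("\n".join(current))
            current = []
        else:
            current.append(line)
    messages.append("\n".join(current))
    # Drop empty fragments, e.g. from a trailing separator line.
    return [m for m in messages if m.strip()]


def process_datagram(
    datagram: str, handle: Callable[[str], str]
) -> List[Union[str, Exception]]:
    """Process each message to completion, in order, isolating errors."""
    results: List[Union[str, Exception]] = []
    for message in split_piggybacked(datagram):
        try:
            results.append(handle(message))
        except Exception as exc:  # one bad message must not affect the rest
            results.append(exc)
    return results
```

Applied to the example datagram above, `split_piggybacked` returns two messages: the "200 2005 OK" response and the DLCX command with its two parameter lines.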
Fate sharing of message delivery ensures that either all the messages are delivered, or none of them are delivered. When piggybacking is used to guarantee this fate-sharing, entities MUST also ensure that this property is retained upon retransmission. For example, upon receiving a Notify from an endpoint operating in lockstep mode, the Call Agent may wish to send the response and a new NotificationRequest command in a single datagram to ensure message delivery fate-sharing of the two.

3.5.6 Provisional Responses
Executing some transactions may require a long time. Long execution times may interact with the timer based retransmission procedure. This may result either in an inordinate number of retransmissions, or in timer values that become too long to be efficient.

Gateways (and Call Agents) that can predict that a transaction will require a long execution time SHOULD send a provisional response with response code 100. As a guideline, a transaction that requires external communication to complete, e.g., network resource reservation, SHOULD issue a provisional response. Furthermore, entities SHOULD send a provisional response if they receive a repetition of a transaction that has not yet finished executing.

Gateways (or Call Agents) that start building up queues of transactions to be executed may send a provisional response with response code 101 to indicate this (see Section 4.4.8 for further details).

Pure transactional semantics would imply that provisional responses SHOULD NOT return any other information than the fact that the transaction is currently executing; however, an optimistic approach allowing some information to be returned enables a reduction in the delay that would otherwise be incurred in the system.

In order to reduce the delay in the system, it is RECOMMENDED to include a connection identifier and session description in a 100 provisional response to the CreateConnection command. If a session description would be returned by the ModifyConnection command, the session description SHOULD be included in the provisional response here as well. If the transaction completes successfully, the information returned in the provisional response MUST be repeated in the final response. It is considered a protocol error not to repeat this information or to change any of the previously supplied information in a successful response. If the transaction fails, an error code is returned - the information returned previously is no longer valid.
A currently executing CreateConnection or ModifyConnection transaction MUST be cancelled if a DeleteConnection command for the endpoint is received. In that case, a final response for the cancelled transaction SHOULD still be returned automatically (error code 407 - transaction aborted, is RECOMMENDED), and a final response for the cancelled transaction MUST be returned if a retransmission of the cancelled transaction is detected (see also Section 4.4.4).

MGCP entities that receive a provisional response SHALL switch to a longer repetition timer (LONGTRAN-TIMER) for that transaction. The purpose of this timer is primarily to detect processing failures. The default value of LONGTRAN-TIMER is 5 seconds; however, the provisioning process may alter this. Note that retransmissions MUST still satisfy the timing requirements specified in Sections 3.5.1 and 3.5.3. Consequently LONGTRAN-TIMER MUST be smaller than T-HIST (it should in fact be considerably smaller). Also, entities MUST NOT let a transaction run forever. A transaction that is timed out by the entity SHOULD return error code 406 (transaction time-out). Per the definition of T-HIST (Section 3.5.1), the maximum transaction execution time is smaller than T-HIST (in a network with low delay, it can reasonably safely be approximated as T-HIST minus T-MAX), and a final response should be received no more than T-HIST seconds after the command was sent initially. Nevertheless, entities SHOULD wait for 2*T-HIST seconds before giving up on receiving a final response. Retransmission of the command MUST still cease after T-MAX seconds though. If a response is not received, the outcome of the transaction is not known. If the entity sending the command was a gateway, it now becomes "disconnected" and SHALL initiate the "disconnected" procedure (see Section 4.4.7).

When the transaction finishes execution, the final response is sent and the by now obsolete provisional response is deleted.
In order to ensure rapid detection of a lost final response, final responses issued after provisional responses for a transaction SHOULD be acknowledged (unfortunately older RFC 2705 implementations may not do this, which is the only reason it is not an absolute requirement). The endpoint SHOULD therefore include an empty "ResponseAck" parameter in those, and only those, final responses. The presence of the "ResponseAck" parameter in the final response SHOULD trigger a "Response Acknowledgement" response to be sent back to the endpoint. The "Response Acknowledgement" response will then include the transaction-id of the response it acknowledges in the response header. Note that, for backwards compatibility, entities cannot depend on receiving such a "Response Acknowledgement"; however, it is strongly RECOMMENDED to support this behavior, as excessive delays in case of packet loss as well as excessive retransmissions may occur otherwise.
Receipt of a "Response Acknowledgement" response is subject to the same time-out and retransmission strategies and procedures as responses to commands, i.e., the sender of the final response will retransmit it if a "Response Acknowledgement" is not received in time. For backwards compatibility, failure to receive a "Response Acknowledgement" SHOULD NOT affect the roundtrip time estimates for subsequent commands, and furthermore MUST NOT lead to the endpoint becoming "disconnected". The "Response Acknowledgement" response is never acknowledged.

4. States, Failover and Race Conditions
In order to implement proper call signaling, the Call Agent must keep track of the state of the endpoint, and the gateway must make sure that events are properly notified to the Call Agent. Special conditions exist when the gateway or the Call Agent is restarted: the gateway must be redirected to a new Call Agent during "failover" procedures, and the Call Agent must take special action when the gateway is taken offline or restarted.

4.1 Failover Assumptions and Highlights
The following protocol highlights are important to understanding Call Agent fail-over mechanisms:

* Call Agents are identified by their domain name (and optional port), not their network addresses, and several addresses can be associated with a domain name.

* An endpoint has one and only one Call Agent associated with it at any given point in time. The Call Agent associated with an endpoint is the current value of the "notified entity". The "notified entity" determines where the gateway will send its commands. If the "notified entity" does not include a port number, the default Call Agent port number (2727) is assumed.

* NotifiedEntity is a parameter sent by the Call Agent to the gateway to set the "notified entity" for the endpoint.

* The "notified entity" for an endpoint is the last value of the NotifiedEntity parameter received for this endpoint. If no explicit NotifiedEntity parameter has ever been received, the "notified entity" defaults to a provisioned value. If no value was provisioned or an empty NotifiedEntity parameter was provided (both strongly discouraged), thereby making the "notified entity" empty, the "notified entity" is set to the source address of the last non-audit command for the endpoint. Thus auditing will not change the "notified entity".
* Responses to commands are sent to the source address of the command, regardless of the current "notified entity". When a Notify message needs to be piggybacked with the response, the datagram is still sent to the source address of the new command received, regardless of the current "notified entity".

The ability for the "notified entity" to resolve to multiple network addresses allows a "notified entity" to represent a Call Agent with multiple physical interfaces on it and/or a logical Call Agent made up of multiple physical systems. The order of network addresses when a DNS name resolves to multiple addresses is non-deterministic, so Call Agent fail-over schemes MUST NOT depend on any order (e.g., a gateway MUST be able to send a "Notify" to any of the resolved network addresses). On the other hand, the system is likely to be most efficient if the gateway sends commands to the interface with which it already has a current association. It is RECOMMENDED that gateways use the following algorithm to achieve that goal:

* If the "notified entity" resolves to multiple network addresses, and the source address of the request is one of those addresses, that network address is the preferred destination address for commands.

* If, on the other hand, the source address of the request is not one of the resolved addresses, the gateway must choose one of the resolved addresses for commands.

* If the gateway fails to contact the network address chosen, it MUST try the alternatives in the resolved list as described in Section 4.3.

If an entire Call Agent becomes unavailable, the endpoints managed by that Call Agent will eventually become "disconnected". The only way for these endpoints to become connected again is either for the failed Call Agent to become available, or for a backup Call Agent to contact the affected endpoints with a new "notified entity".
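The recommended destination-selection algorithm above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names choose_destination and alternates are hypothetical, addresses are plain strings, and DNS resolution is assumed to have happened already. Picking the first resolved address when the request source is not in the list is an arbitrary choice made for the sketch; the text only requires that one of the resolved addresses be chosen.

```python
from typing import List, Optional, Sequence


def choose_destination(resolved: Sequence[str],
                       request_source: Optional[str]) -> str:
    """Pick the address a gateway sends commands to (hypothetical helper).

    `resolved` holds the addresses the "notified entity" resolved to;
    `request_source` is the source address of the request, if any.
    """
    if not resolved:
        raise ValueError("notified entity did not resolve to any address")
    # Prefer the interface with which an association already exists.
    if request_source is not None and request_source in resolved:
        return request_source
    # Otherwise any resolved address is acceptable; the first one is an
    # arbitrary pick, and no fail-over scheme may rely on this order.
    return resolved[0]


def alternates(resolved: Sequence[str], failed: str) -> List[str]:
    """Addresses still to be tried after `failed` could not be contacted."""
    return [addr for addr in resolved if addr != failed]
```

For example, with resolved addresses 192.0.2.1 and 192.0.2.2 and a request received from 192.0.2.2, the sketch selects 192.0.2.2 and keeps 192.0.2.1 as the alternate.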
When a backup Call Agent has taken over control of a group of endpoints, it is assumed that the failed Call Agent will communicate and synchronize with the backup Call Agent in order to transfer control of the affected endpoints back to the original Call Agent. Alternatively, the failed Call Agent could simply become the backup Call Agent.
We should note that handover conflict resolution between separate Call Agents is not in place; we are relying strictly on the Call Agents knowing what they are doing and communicating with each other (although AuditEndpoint can be used to learn about the current "notified entity"). If this is not the case, unexpected behavior may occur.

Note that, as mentioned earlier, the default "notified entity" is provisioned and may include both domain name and port. For small gateways, provisioning may be done on a per endpoint basis. For much larger gateways, a single provisioning element may be provided for multiple endpoints or even for the entire gateway itself. In either case, once the gateway powers up, each endpoint MUST have its own "notified entity", so provisioned values for an aggregation of endpoints MUST be copied to the "notified entity" for each endpoint in the aggregation before operation proceeds. Where possible, the RestartInProgress command on restart SHOULD be sent to the provisioned "notified entity" based on an aggregation that allows the "all of" wild-card to be used. This will reduce the number of RestartInProgress messages.

Another way of viewing the use of "notified entity" is in terms of associations between gateways and Call Agents. The "notified entity" is a means to set up that association, and governs where the gateway will send commands. Commands received by the gateway, however, may come from any source. The association is initially provisioned with a provisioned "notified entity", so that on power up RestartInProgress and persistent events that occur prior to the first NotificationRequest from Call Agents will be sent to the provisioned Call Agent. Once a Call Agent makes a request, however, it may include the NotifiedEntity parameter and set up a new association. Since the "notified entity" persists across calls, the association remains intact until a new "notified entity" is provided.

4.2 Communicating with Gateways
Endpoint names in gateways include a local name indicating the specific endpoint and a domain name indicating the host/gateway where the endpoint resides. Gateways may have several interfaces for redundancy. In gateways that have routing capability, the domain name may resolve to a single network address with internal routing to that address from any of the gateway's interfaces. In others, the domain name may resolve to multiple network addresses, one for each interface. In the latter case, if a Call Agent fails to contact the gateway on one of the addresses, it MUST try the alternates.
4.3 Retransmission, and Detection of Lost Associations:
The media gateway control protocol is organized as a set of transactions, each of which is composed of a command and a response, commonly referred to as an acknowledgement. The MGCP messages, being carried over UDP, may be subject to losses. In the absence of a timely response, commands are retransmitted. MGCP entities MUST keep in memory a list of the responses that they sent to recent transactions, i.e., a list of all the responses they sent over the last T-HIST seconds, and a list of the transactions that have not yet finished executing.

The transaction identifiers of incoming commands are compared to the transaction identifiers of the recent responses. If a match is found, the MGCP entity does not execute the transaction, but simply repeats the response. If a match to a previously responded to transaction is not found, the transaction identifier of the incoming command is compared to the list of transactions that have not yet finished executing. If a match is found, the MGCP entity does not execute the transaction again, but SHOULD simply send a provisional response - a final response will be provided when the execution of the command is complete (see Section 3.5.6 for further detail).

The repetition mechanism is used to guard against four types of possible errors:

* transmission errors, when for example a packet is lost due to noise on a line or congestion in a queue,

* component failure, when for example an interface to a Call Agent becomes unavailable,

* Call Agent failure, when for example an entire Call Agent becomes unavailable,

* failover, when a new Call Agent is "taking over" transparently.

The elements should be able to derive from the past history an estimate of the packet loss rate due to transmission errors. In a properly configured system, this loss rate should be very low, typically less than 1%.
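The duplicate-detection procedure described above can be sketched as follows. This is a simplified illustration, not a definitive implementation: the class name TransactionTable and its method names are hypothetical, responses are plain strings, the table is keyed on the transaction identifier alone (a real entity would presumably also distinguish the sender), and the T-HIST value of 30 seconds is an assumed example.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple


class TransactionTable:
    """Tracks recent responses and still-executing transactions (sketch)."""

    def __init__(self, t_hist: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._t_hist = t_hist            # assumed example value, in seconds
        self._clock = clock
        self._responses: Dict[int, Tuple[float, str]] = {}
        self._in_progress: Set[int] = set()

    def _expire(self) -> None:
        # Forget responses that were sent more than T-HIST seconds ago.
        cutoff = self._clock() - self._t_hist
        self._responses = {txid: entry
                           for txid, entry in self._responses.items()
                           if entry[0] >= cutoff}

    def on_command(self, txid: int) -> Tuple[str, Optional[str]]:
        """Decide how to treat an incoming command with identifier `txid`."""
        self._expire()
        if txid in self._responses:
            # Already answered: repeat the response, do not execute again.
            return "repeat", self._responses[txid][1]
        if txid in self._in_progress:
            # Still executing: a provisional response SHOULD be sent.
            return "provisional", None
        self._in_progress.add(txid)
        return "execute", None

    def on_complete(self, txid: int, response: str) -> None:
        """Record the final response once execution has finished."""
        self._in_progress.discard(txid)
        self._responses[txid] = (self._clock(), response)
```

Injecting the clock keeps the sketch easy to exercise without waiting for real time to pass.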
If a Call Agent or a gateway has to repeat a message more than a few times, it is legitimate to assume that something other than a transmission error is occurring. For example, given a loss rate of 1% and independent losses, the probability that 5 consecutive transmission attempts fail is 1 in 10 billion (0.01^5 = 10^-10), an event that should occur less than once every 10 days for a Call Agent that processes 1,000 transactions per second. (Indeed, the number of retransmissions that is considered excessive should be a function of
the prevailing packet loss rate.) We should note that the "suspicion
threshold", which we will call "Max1", is normally lower than the
"disconnection threshold", which we will call "Max2". Max2 MUST be
set to a larger value than Max1.
The MGCP retransmission algorithm is illustrated in the Figure below
and explained further in the following:
   Command issued: N=0, T=0
          |
          |  +------------ retransmission: N++ <--------------+
          |  |                                                |
          |  |       if T <= T-Max then                       |
          |  |       transmission                             |
          |  |  +--- to new address, <-+<---------------------|--+
          |  |  |    N=0               |                      |  |
          V  V  V                      |                      |  |
       +-----------+                   |                      |  |
   +-->| awaiting  |- new Call Agent ->+  +------------+      |  |
   |   | response  |--- timer elapsed --->| T > T-Max ?|      |  |
   |   +-----------+                      +------------+      ^  ^
   |         |                               |     |          |  |
   |         v               +-----(yes)-----+   (no)         |  |
   |     (response           |                     |          |  |
   |      received)          |            +------------+      |  |
   |         |               |            | N >= Max1 ?|-(no)>+  |
   |         v               |            +------------+      ^  |
   |      +--------+         |                  |             |  |
   +<(no)-| final ?|         |                (yes)           |  |
   ^      +--------+         |                  |             |  |
   |          |              |    (if first address & N=Max1, |  |
   |          v              |     or last address & N=Max2   |  |
   |        (yes)            |     check DNS)                 |  |
   |          |              |                  |             |  |
   |          v              V         +---------------+      |  |
   |        (end)            |         |more addresses?|(yes)-|->+
   |                         |         +---------------+      |
   |                         |                  |             ^
   |                         |                (no)            |
   |                         |                  |             |
   |                         |            +------------+      |
   |                         |            | N >= Max2 ?|(no)--+
   |                         |            +------------+
   |                         |                  |
   |                         |                (yes)
   |                         |                  |
   |                         |           +----------------+
   |                         +---------->| T >= 2*T-HIST ?|
   |                                     +----------------+
   |                                        |          |
   |                                      (no)       (yes)
   +---------------<------------------------+          |
                                                       v
                                                (disconnected)
A classic retransmission algorithm would simply count the number of successive repetitions, and conclude that the association is broken after re-transmitting the packet an excessive number of times (typically between 7 and 11 times). In order to account for the possibility of an undetected or in-progress "failover", we modify the classic algorithm as follows:

* We require that the gateway always checks for the presence of a new Call Agent. It can be noticed either by:

  - receiving a command where the NotifiedEntity points to the new Call Agent, or

  - receiving a redirection response pointing to a new Call Agent.

  If a new Call Agent is detected, the gateway MUST start retransmitting outstanding commands for the endpoint(s) redirected to that new Call Agent. Responses to new or old commands are still transmitted to the source address of the command.

* Prior to any retransmission, it is checked that the time elapsed since the sending of the initial datagram is no greater than T-MAX. If more than T-MAX time has elapsed, then retransmissions MUST cease. If more than 2*T-HIST has elapsed, then the endpoint becomes disconnected.

* If the number of repetitions for this Call Agent is equal to "Max1", and its domain name was not resolved recently (e.g., within the last 5 seconds or otherwise provisioned), and it is not in the process of being resolved, then the gateway MAY actively query the domain name server in order to detect the possible change of the Call Agent interfaces. Note that the first repetition is the second transmission.

* The gateway may have learned several IP addresses for the Call Agent. If the number of repetitions for this IP address is greater than or equal to "Max1" and lower than "Max2", and there are more addresses that have not been tried, then the gateway MUST direct the retransmissions to alternate addresses. Also, receipt of explicit network notifications such as, e.g., ICMP network, host, protocol, or port unreachable SHOULD lead the gateway to try alternate addresses (with due consideration to possible security issues).
* If there are no more interfaces to try, and the number of repetitions for this address is Max2, then the gateway SHOULD contact the DNS one more time to see if any other interfaces have become available, unless the domain name was resolved recently (e.g., within the last 5 seconds or otherwise provisioned), or it is already in the process of being resolved. If there still are no more interfaces to try, the gateway is then disconnected and MUST initiate the "disconnected" procedure (see Section 4.4.7).

In order to automatically adapt to network load, MGCP specifies exponentially increasing timers. If the initial timer is set to 200 milliseconds, the loss of a fifth retransmission will be detected after about 6 seconds. This is probably an acceptable waiting delay to detect a failover. The repetitions should continue after that delay not only in order to perhaps overcome a transient connectivity problem, but also in order to allow some more time for the execution of a failover - waiting a total delay of 30 seconds is probably acceptable.

It is however important that the maximum delay of retransmissions be bounded. Prior to any retransmission, it is checked that the time (T) elapsed since the sending of the initial datagram is no greater than T-MAX. If more than T-MAX time has elapsed, retransmissions MUST cease. If more than 2*T-HIST time has elapsed, the endpoint becomes disconnected. The value T-MAX is related to the T-HIST value: the T-HIST value MUST be greater than or equal to T-MAX plus the maximum propagation delay in the network. The default value for T-MAX is 20 seconds. Thus, if the assumed maximum propagation delay is 10 seconds, then responses to old transactions would have to be kept for a period of at least 30 seconds. The importance of having the sender and receiver agree on these values cannot be overstated.

The default value for Max1 is 5 retransmissions and the default value for Max2 is 7 retransmissions. Both of these values may be altered by the provisioning process. The provisioning process MUST be able to disable one or both of the Max1 and Max2 DNS queries.
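The decision taken each time the retransmission timer elapses, together with the exponential timer schedule, can be sketched as follows. This is an illustration under stated assumptions rather than a normative algorithm: the function names backoff_schedule and on_timer_elapsed are hypothetical, the timer is assumed to double on every repetition with no randomization and no upper cap, DNS re-queries and new Call Agent detection are left out, the T-HIST value of 30 seconds is an assumed example, and the branch order follows the figure in this section.

```python
from typing import List


def backoff_schedule(initial: float = 0.2, t_max: float = 20.0) -> List[float]:
    """Send times in seconds since the first transmission.

    Assumes pure doubling of the timer; sending stops once T-MAX is passed.
    """
    times = [0.0]
    timer = initial
    elapsed = 0.0
    while True:
        elapsed += timer
        if elapsed > t_max:
            break
        times.append(elapsed)
        timer *= 2
    return times


def on_timer_elapsed(elapsed: float, n: int, more_addresses: bool,
                     t_max: float = 20.0, t_hist: float = 30.0,
                     max1: int = 5, max2: int = 7) -> str:
    """Return the next action for a command whose timer has elapsed.

    `n` is the number of repetitions sent to the current address and
    `more_addresses` tells whether untried addresses remain.
    """
    if elapsed > t_max:
        # Retransmissions MUST cease; only the 2*T-HIST check remains.
        return "disconnected" if elapsed >= 2 * t_hist else "wait"
    if n < max1:
        return "retransmit"              # N is incremented by the caller
    if more_addresses:
        return "next-address"            # caller resets N to 0
    if n < max2:
        return "retransmit"
    return "disconnected" if elapsed >= 2 * t_hist else "wait"
```

Under these assumptions and a 200 millisecond initial timer, the fifth retransmission goes out roughly 6.2 seconds after the first transmission, and the last one that fits within the 20 second T-MAX goes out at roughly 12.6 seconds.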