5. ZRTP Messages
All ZRTP messages use the message format defined in Figure 2. All word lengths referenced in this specification are 32 bits, or 4 octets. All integer fields are carried in network byte order, that is, most-significant byte (octet) first, commonly known as big- endian.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 0 0 1|Not Used (set to zero) | Sequence Number | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Magic Cookie 'ZRTP' (0x5a525450) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Identifier | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | ZRTP Message (length depends on Message Type) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | CRC (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 2: ZRTP Packet Format The Sequence Number is a count that is incremented for each ZRTP packet sent. The count is initialized to a random value. This is useful in estimating ZRTP packet loss and also detecting when ZRTP packets arrive out of sequence. The ZRTP Magic Cookie is a 32-bit string that uniquely identifies a ZRTP packet and has the value 0x5a525450. Source Identifier is the SSRC number of the RTP stream to which this ZRTP packet relates. For cases of forking or forwarding, RTP, and hence ZRTP, may arrive at the same port from several different sources -- each of these sources will have a different SSRC and may initiate an independent ZRTP protocol session. SSRC collisions would be disruptive to ZRTP. SSRC collision handling procedures are described in Section 4.1. This format is clearly identifiable as non-RTP due to the first two bits being zero, which looks like RTP version 0, which is not a valid RTP version number. It is clearly distinguishable from STUN since the Magic Cookies are different. The 12 unused bits are set to zero and MUST be ignored when received. In early versions of this spec, ZRTP messages were encapsulated in RTP header extensions, which made ZRTP an eponymous variant of RTP. In later versions, the packet format changed to make it syntactically distinguishable from RTP. The ZRTP messages are defined in Figures 3 to 17 and are of variable length.
The ZRTP protocol uses a 32-bit Cyclic Redundancy Check (CRC) as defined in RFC 4960, Appendix B [RFC4960], in each ZRTP packet to detect transmission errors. ZRTP packets are typically transported by UDP, which carries its own built-in 16-bit checksum for integrity, but ZRTP does not rely on it. This is because of the effect of an undetected transmission error in a ZRTP message. For example, an undetected error in the DH exchange could appear to be an active man- in-the-middle attack. A false announcement of this by ZRTP clients can be psychologically distressing. The probability of such a false alarm hinges on a mere 16-bit checksum that usually protects UDP packets, so more error detection is needed. For these reasons, this belt-and-suspenders approach is used to minimize the chance of a transmission error affecting the ZRTP key agreement. The CRC is calculated across the entire ZRTP packet shown in Figure 2, including the ZRTP header and the ZRTP message, but not including the CRC field. If a ZRTP message fails the CRC check, it is silently discarded.5.1. ZRTP Message Formats
ZRTP messages are designed to simplify endpoint parsing requirements and to reduce the opportunities for buffer overflow attacks (a good goal of any security extension should be to not introduce new attack vectors). ZRTP uses a block of 8 octets (2 words) to encode the Message Type. 4-octet (1 word) blocks are used to encode Hash Type, Cipher Type, Key Agreement Type, and Authentication Tag Type. The values in the blocks are ASCII strings that are extended with spaces (0x20) to make them the desired length. Currently defined block values are listed in Tables 1-6. Additional block values may be defined and used. ZRTP uses this ASCII encoding to simplify debugging and make it "Wireshark (Ethereal) friendly".5.1.1. Message Type Block
Currently, 16 Message Type Blocks are defined -- they represent the set of ZRTP message primitives. ZRTP endpoints MUST support the Hello, HelloACK, Commit, DHPart1, DHPart2, Confirm1, Confirm2, Conf2ACK, SASrelay, RelayACK, Error, ErrorACK, and PingACK message types. ZRTP endpoints MAY support the GoClear, ClearACK, and Ping messages. In order to generate a PingACK message, it is necessary to parse a Ping message. Additional messages may be defined in extensions to ZRTP.
Message Type Block | Meaning --------------------------------------------------- "Hello " | Hello Message --------------------------------------------------- "HelloACK" | HelloACK Message --------------------------------------------------- "Commit " | Commit Message --------------------------------------------------- "DHPart1 " | DHPart1 Message --------------------------------------------------- "DHPart2 " | DHPart2 Message --------------------------------------------------- "Confirm1" | Confirm1 Message --------------------------------------------------- "Confirm2" | Confirm2 Message --------------------------------------------------- "Conf2ACK" | Conf2ACK Message --------------------------------------------------- "Error " | Error Message --------------------------------------------------- "ErrorACK" | ErrorACK Message --------------------------------------------------- "GoClear " | GoClear Message --------------------------------------------------- "ClearACK" | ClearACK Message --------------------------------------------------- "SASrelay" | SASrelay Message --------------------------------------------------- "RelayACK" | RelayACK Message --------------------------------------------------- "Ping " | Ping Message --------------------------------------------------- "PingACK " | PingACK Message --------------------------------------------------- Table 1. Message Type Block Values5.1.2. Hash Type Block
The hash algorithm and its related MAC algorithm are negotiated via the Hash Type Block found in the Hello message (Section 5.2) and the Commit message (Section 5.4). All ZRTP endpoints MUST support a Hash Type of SHA-256 [FIPS-180-3]. SHA-384 SHOULD be supported and MUST be supported if ECDH-384 is used. Additional Hash Types MAY be used, such as the NIST SHA-3 hash [SHA-3] when it becomes available. Note that the Hash Type refers to
the hash algorithm that will be used throughout the ZRTP key exchange, not the hash algorithm to be used in the SRTP Authentication Tag. The choice of the negotiated Hash Type is coupled to the Key Agreement Type, as explained in Section 5.1.5. Hash Type Block | Meaning ---------------------------------------------------------- "S256" | SHA-256 Hash defined in FIPS 180-3 ---------------------------------------------------------- "S384" | SHA-384 Hash defined in FIPS 180-3 ---------------------------------------------------------- "N256" | NIST SHA-3 256-bit hash (when published) ---------------------------------------------------------- "N384" | NIST SHA-3 384-bit hash (when published) ---------------------------------------------------------- Table 2. Hash Type Block Values At the time of this writing, the NIST SHA-3 hashes [SHA-3] are not yet available. NIST is expected to publish SHA-3 in 2012, as a successor to the SHA-2 hashes in [FIPS-180-3].5.1.2.1. Negotiated Hash and MAC Algorithm
ZRTP makes use of message authentication codes (MACs) that are keyed hashes based on the negotiated Hash Type. For the SHA-2 and SHA-3 hashes, the negotiated MAC is the HMAC based on the negotiated hash. This MAC function is also used in the ZRTP key derivation function (Section 4.5.1). The HMAC function is defined in [FIPS-198-1]. A discussion of the general security of the HMAC construction may be found in [RFC2104]. Test vectors for HMAC-SHA-256 and HMAC-SHA-384 may be found in [RFC4231]. The negotiated Hash Type does not apply to the hash used in the digital signature defined in Section 7.2. For example, even if the negotiated Hash Type is SHA-256, the digital signature may use SHA- 384 if an Elliptic Curve Digital Signature Algorithm (ECDSA) P-384 signature key is used. Digital signatures are optional in ZRTP. Except for the aforementioned digital signatures, and the special cases noted in Section 5.1.2.2, all the other hashes and MACs used throughout the ZRTP protocol will use the negotiated Hash Type.
A future hash may include its own built-in MAC, not based on the HMAC construct, for example, the Skein hash function [Skein]. If NIST chooses such a hash as the SHA-3 winner, Hash Types "N256", and "N384" will still use the related HMAC as the negotiated MAC. If an implementer wishes to use Skein and its built-in MAC as the negotiated MAC, new Hash Types must be used.5.1.2.2. Implicit Hash and MAC Algorithm
While most of the hash and MAC usage in ZRTP is defined by the negotiated Hash Type (Section 5.1.2), some hashes and MACs must be precomputed prior to negotiations, and thus cannot have their algorithms negotiated during the ZRTP exchange. They are implicitly predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256. These are the hashes and MACs that MUST use the Implicit hash and MAC algorithm: The hash chain H0-H3 defined in Section 9. The MACs that are keyed by this hash chain, as defined in Section 8.1.1. The Hello Hash in the a=zrtp-hash attribute defined in Section 8.1. ZRTP defines a method for negotiating different ZRTP protocol versions (Section 4.1.1). SHA-256 is the Implicit Hash and HMAC-SHA- 256 is the Implicit MAC for ZRTP protocol version 1.10. Future ZRTP protocol versions may, if appropriate, use another hash algorithm as the Implicit Hash, such as the NIST SHA-3 hash [SHA-3], when it becomes available. For example, a future SIP packet may list two a=zrtp-hash SDP attributes, one based on SHA-256 for ZRTP version 1.10, and another based on SHA-3 for ZRTP version 2.00.5.1.3. Cipher Type Block
The block cipher algorithm is negotiated via the Cipher Type Block found in the Hello message (Section 5.2) and the Commit message (Section 5.4). All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES- 192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced Encryption Standard is defined in [FIPS-197].
The use of AES-128 in SRTP is defined by [RFC3711]. The use of AES- 192 and AES-256 in SRTP is defined by [RFC6188]. The choice of the AES key length is coupled to the Key Agreement Type, as explained in Section 5.1.5. Other block ciphers may be supported that have the same block size and key sizes as AES. If implemented, they may be used anywhere in ZRTP or SRTP in place of the AES, in the same modes of operation and key size. Notably, in counter mode to replace AES-CM in [RFC3711] and [RFC6188], as well as in CFB mode to encrypt a portion of the Confirm message (Figure 10) and SASrelay message (Figure 16). ZRTP endpoints MAY support the TwoFish [TwoFish] block cipher. Cipher Type Block | Meaning ------------------------------------------------- "AES1" | AES with 128-bit keys ------------------------------------------------- "AES2" | AES with 192-bit keys ------------------------------------------------- "AES3" | AES with 256-bit keys ------------------------------------------------- "2FS1" | TwoFish with 128-bit keys ------------------------------------------------- "2FS2" | TwoFish with 192-bit keys ------------------------------------------------- "2FS3" | TwoFish with 256-bit keys ------------------------------------------------- Table 3. Cipher Type Block Values5.1.4. Auth Tag Type Block
All ZRTP endpoints MUST support HMAC-SHA1 authentication tags for SRTP, with both 32-bit and 80-bit length tags as defined in [RFC3711]. ZRTP endpoints MAY support 32-bit and 64-bit SRTP authentication tags based on the Skein hash function [Skein]. The Skein-512-MAC key length is fixed at 256 bits for this application, and the output length is adjustable. The Skein MAC is defined in Sections 2.6 and 4.3 of [Skein] and is not based on the HMAC construct. Reference implementations for Skein may be found at [Skein1]. A Skein-based MAC is significantly more efficient than HMAC-SHA1, especially for short SRTP payloads. The Skein MAC key is computed by the SRTP key derivation function, which is also referred to as the AES-CM PRF, or pseudorandom function. This is defined either in [RFC3711] or in [RFC6188],
depending on the selected SRTP AES key length. To compute a Skein MAC key, the SRTP PRF output for the authentication key is left untruncated at 256 bits, instead of the usual truncated length of 160 bits (the key length used by HMAC-SHA1). Auth Tag Type Block | Meaning ---------------------------------------------------------- "HS32" | 32-bit authentication tag based on | HMAC-SHA1 as defined in RFC 3711. ---------------------------------------------------------- "HS80" | 80-bit authentication tag based on | HMAC-SHA1 as defined in RFC 3711. ---------------------------------------------------------- "SK32" | 32-bit authentication tag based on | Skein-512-MAC as defined in [Skein], | with 256-bit key, 32-bit MAC length. ---------------------------------------------------------- "SK64" | 64-bit authentication tag based on | Skein-512-MAC as defined in [Skein], | with 256-bit key, 64-bit MAC length. ---------------------------------------------------------- Table 4. Auth Tag Type Values Implementers should be aware that AES-GCM and AES-CCM for SRTP are expected to become available when [SRTP-AES-GCM] is published as an RFC. If an implementer wishes to use these modes when they become available, new Auth Tag Types must be added.5.1.5. Key Agreement Type Block
All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and MAY support EC25, EC38, and DH2k. If a ZRTP endpoint supports multiple concurrent media streams, such as audio and video, it MUST support Multistream (Section 4.4.3) mode. Also, if a ZRTP endpoint supports the GoClear message (Section 4.7.2), it SHOULD support Multistream, to be used if the two parties choose to return to the secure state after going Clear (as explained in Section 4.7.2.1). For Finite Field Diffie-Hellman, ZRTP endpoints MUST use the DH parameters defined in [RFC3526], as follows. DH3k uses the 3072-bit modular exponentiation group (MODP). DH2k uses the 2048-bit MODP group. The DH generator g is 2. The random Diffie-Hellman secret exponent SHOULD be twice as long as the AES key length. If AES-128 is used, the DH secret value SHOULD be 256 bits long. If AES-256 is used, the secret value SHOULD be 512 bits long.
If Elliptic Curve DH is used, the ECDH algorithm and key generation is from [NIST-SP800-56A]. The curves used are from [NSA-Suite-B], which uses the same curves as ECDSA defined by [FIPS-186-3], and can also be found in RFC 5114, Sections 2.6 through 2.8 [RFC5114]. ECDH test vectors may be found in RFC 5114, appendices A.6 through A.8 [RFC5114]. The validation procedures are from [NIST-SP800-56A], Section 5.6.2.6, method 3, Elliptic Curve Cryptography (ECC) Partial Validation. Both the X and Y coordinates of the point on the curve are sent, in the first and second half of the ECDH public value, respectively. The ECDH result returns only the X coordinate, as specified in SP 800-56A. Useful strategies for implementing ECC may be found in [RFC6090]. The choice of the negotiated hash algorithm (Section 5.1.2) is coupled to the choice of Key Agreement Type. If ECDH-384 (EC38) is chosen as the key agreement, the negotiated hash algorithm MUST be either SHA-384 or the corresponding SHA-3 successor. The choice of AES key length is coupled to the choice of Key Agreement Type. If EC38 is chosen as the key agreement, AES-256 (AES3) SHOULD be used but AES-192 MAY be used. If DH3k or EC25 is chosen, any AES key size MAY be used. Note that SRTP as defined in [RFC3711] only supports AES-128. DH2k is intended to provide acceptable security for low power applications, or for applications that require faster key negotiations. NIST asserts in Table 4 of [NIST-SP800-131A] that DH- 2048 is safe to use through 2013. The security of DH2k can be augmented by implementing ZRTP's key continuity features (Section 15.1). DH2k SHOULD use AES-128. If an implementor must use slow hardware, DH2k should precede DH3k in the Hello message. ECDH-521 SHOULD NOT be used, due to disruptive computational delays. These delays may lead to exhaustion of the retransmission schedule, unless both endpoints have very fast hardware. Note that ECDH-521 is not part of NSA Suite B. ZRTP also defines two non-DH modes, Multistream and Preshared, in which the SRTP key is derived from a shared secret and some nonce material. The table below lists the pv length in words and DHPart1 and DHPart2 message length in words for each Key Agreement Type Block.
Key Agreement | pv | message | Meaning Type Block | words | words | ----------------------------------------------------------- "DH3k" | 96 | 117 | DH mode with p=3072 bit prime | | | per RFC 3526, Section 4. ----------------------------------------------------------- "DH2k" | 64 | 85 | DH mode with p=2048 bit prime | | | per RFC 3526, Section 3. ----------------------------------------------------------- "EC25" | 16 | 37 | Elliptic Curve DH, P-256 | | | per RFC 5114, Section 2.6 ----------------------------------------------------------- "EC38" | 24 | 45 | Elliptic Curve DH, P-384 | | | per RFC 5114, Section 2.7 ----------------------------------------------------------- "EC52" | 33 | 54 | Elliptic Curve DH, P-521 | | | per RFC 5114, Section 2.8 | | | (deprecated - do not use) ----------------------------------------------------------- "Prsh" | - | - | Preshared Non-DH mode ----------------------------------------------------------- "Mult" | - | - | Multistream Non-DH mode ----------------------------------------------------------- Table 5. Key Agreement Type Block Values5.1.6. SAS Type Block
The SAS Type determines how the SAS is rendered to the user so that the user may verbally compare it with his partner over the voice channel. This allows detection of a MiTM attack. All ZRTP endpoints MUST support the base32 and MAY support the base256 rendering schemes for the Short Authentication String, and other SAS rendering schemes. See Section 4.5.2 for how the sasvalue is computed and Section 7 for how the SAS is used. SAS Type Block | Meaning --------------------------------------------------- "B32 " | Short Authentication String using | base32 encoding --------------------------------------------------- "B256" | Short Authentication String using | base256 encoding (PGP Word List) --------------------------------------------------- Table 6. SAS Type Block Values
For the SAS Type of "B256", the most-significant (leftmost) 16 bits of the 32-bit sasvalue are rendered in network byte order using the PGP Word List [pgpwordlist] [Juola1][Juola2]. For the SAS Type of "B32 ", the most-significant (leftmost) 20 bits of the 32-bit sasvalue are rendered as a form of base32 encoding. The leftmost 20 bits of the sasvalue results in four base32 characters that are rendered, most-significant quintet first, to both ZRTP endpoints. Here is a normative pseudocode implementation of the base32 function: char[4] base32(uint32 bits) { int i, n, shift; char result[4]; for (i=0,shift=27; i!=4; ++i,shift-=5) { n = (bits>>shift) & 31; result[i] = "ybndrfg8ejkmcpqxot1uwisza345h769"[n]; } return result; } This base32 encoding scheme differs from RFC 4648, and was designed (by Bryce Wilcox-O'Hearn) to represent bit sequences in a form that is convenient for human users to manipulate with minimal ambiguity. The unusually permuted character ordering was designed for other applications that use bit sequences that do not end on quintet boundaries.5.1.7. Signature Type Block
The Signature Type Block specifies what signature algorithm is used to sign the SAS as discussed in Section 7.2. The 4-octet Signature Type Block, along with the accompanying signature block, are OPTIONAL and may be present in the Confirm message (Figure 10) or the SASrelay message (Figure 16). The signature types are given in the table below. Signature | Meaning Type Block | ------------------------------------------------ "PGP " | OpenPGP Signature, per RFC 4880 | ------------------------------------------------ "X509" | ECDSA, with X.509v3 cert | per RFC 5759 and FIPS-186-3 ------------------------------------------------ Table 7. Signature Type Block Values
Additional details on the signature and signing key format may be found in Section 7.2. OpenPGP signatures (Signature Type "PGP ") are discussed in Section 7.2.1. The ECDSA curves are over prime fields only, drawn from Appendix D.1.2 of [FIPS-186-3]. X.509v3 ECDSA Signatures (Signature Type "X509") are discussed in Section 7.2.2.5.2. Hello Message
The Hello message has the format shown in Figure 3. All ZRTP messages begin with the preamble value 0x505a, then a 16-bit length in 32-bit words. This length includes only the ZRTP message (including the preamble and the length) but not the ZRTP packet header or CRC. The 8-octet Message Type follows the length field. Next, there is a 4-character string containing the version (ver) of the ZRTP protocol, which is "1.10" for this specification. Next, there is the Client Identifier string (cid), which is 4 words long and identifies the vendor and release of the ZRTP software. The 256- bit hash image H3 is defined in Section 9. The next parameter is the ZID, the 96-bit-long unique identifier for the ZRTP endpoint, defined in Section 4.9. The next four bits include three flag bits: o The Signature-capable flag (S) indicates this Hello message is sent from a ZRTP endpoint which is able to parse and verify digital signatures, as described in Section 7.2. If signatures are not supported, the (S) flag MUST be set to zero. o The MiTM flag (M) is a Boolean that is set to true if and only if this Hello message is sent from a device, usually a PBX, that has the capability to send an SASrelay message (Section 5.13). o The Passive flag (P) is a Boolean normally set to false, and is set to true if and only if this Hello message is sent from a device that is configured to never send a Commit message (Section 5.4). This would mean it cannot initiate secure sessions, but may act as a responder. The next 8 bits are unused and SHOULD be set to zero when sent and MUST be ignored on receipt. Next is a list of supported Hash algorithms, Cipher algorithms, SRTP Auth Tag Types, Key Agreement Types, and SAS Types. The number of listed algorithms are listed for each type: hc=hash count, cc=cipher count, ac=auth tag count, kc=key agreement count, and sc=sas count. The values for these algorithms are defined in Tables 2, 3, 4, 5, and
6. A count of zero means that only the mandatory-to-implement algorithms are supported. Mandatory algorithms MAY be included in the list. The order of the list indicates the preferences of the endpoint. If a mandatory algorithm is not included in the list, it is implicitly added to the end of the list for preference. The 64-bit MAC at the end of the message is computed across the whole message, not including the MAC, using the MAC algorithm defined in Section 5.1.2.2. The MAC key is the sender's H2 (defined in Section 9), and thus the MAC cannot be checked by the receiving party until the sender's H2 value is known to the receiving party later in the protocol.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Hello " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | version="1.10" (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Client Identifier (4 words) | | | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H3 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | ZID (3 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0|S|M|P| unused (zeros)| hc | cc | ac | kc | sc | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | hash algorithms (0 to 7 values) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cipher algorithms (0 to 7 values) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auth tag types (0 to 7 values) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Key Agreement Types (0 to 7 values) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SAS Types (0 to 7 values) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 3: Hello Message Format
5.3. HelloACK Message
The HelloACK message is used to stop retransmissions of a Hello message. A HelloACK is sent regardless if the version number in the Hello is supported or the algorithm list supported. The receipt of a HelloACK stops retransmission of the Hello message. The format is shown in the figure below. A Commit message may be sent in place of a HelloACK by an Initiator, if a Commit message is ready to be sent promptly. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="HelloACK" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 4: HelloACK Message Format5.4. Commit Message
The Commit message is sent to initiate the key agreement process after both sides have received a Hello message, which means it can only be sent after receiving both a Hello message and a HelloACK message. There are three subtypes of Commit messages, whose formats are shown in Figures 5, 6, and 7. The Commit message contains the Message Type Block, then the 256-bit hash image H2, which is defined in Section 9. The next parameter is the initiator's ZID, the 96-bit-long unique identifier for the ZRTP endpoint, which MUST have the same value as was used in the Hello message. Next, there is a list of algorithms selected by the initiator (hash, cipher, auth tag type, key agreement, sas type). For a DH Commit, the hash value hvi is a hash of the DHPart2 of the Initiator and the Responder's Hello message, as explained in Section 4.4.1.1. The 64-bit MAC at the end of the message is computed across the whole message, not including the MAC, using the MAC algorithm defined in Section 5.1.2.2. The MAC key is the sender's H1 (defined in Section 9), and thus the MAC cannot be checked by the receiving party until the sender's H1 value is known to the receiving party later in the protocol.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=29 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Commit " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H2 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | ZID (3 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | hash algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cipher algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auth tag type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Key Agreement Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SAS Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | hvi (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 5: DH Commit Message Format
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=25 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Commit " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H2 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | ZID (3 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | hash algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cipher algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auth tag type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Key Agreement Type = "Mult" | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SAS Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | nonce (4 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 6: Multistream Commit Message Format
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=27 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Commit " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H2 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | ZID (3 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | hash algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cipher algorithm | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auth tag type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Key Agreement Type = "Prsh" | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | SAS Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | nonce (4 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | keyID (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 7: Preshared Commit Message Format
5.5. DHPart1 Message
The DHPart1 message shown in Figure 8 begins the DH exchange. It is sent by the Responder if a valid Commit message is received from the Initiator. The length of the pvr value and the length of the DHPart1 message depends on the Key Agreement Type chosen. This information is contained in the table in Section 5.1.5. Note that for both Multistream and Preshared modes, no DHPart1 or DHPart2 message will be sent. The 256-bit hash image H1 is defined in Section 9. The next four parameters are non-invertible hashes (computed in Section 4.3.1) of potential shared secrets used in generating the ZRTP secret s0. The first two, rs1IDr and rs2IDr, are the hashes of the responder's two retained shared secrets, truncated to 64 bits. Next, there is auxsecretIDr, a hash of the responder's auxsecret (defined in Section 4.3), truncated to 64 bits. The last parameter is a hash of the trusted MiTM PBX shared secret pbxsecret, defined in Section 7.3.1. The 64-bit MAC at the end of the message is computed across the whole message, not including the MAC, using the MAC algorithm defined in Section 5.1.2.2. The MAC key is the sender's H0 (defined in Section 9), and thus the MAC cannot be checked by the receiving party until the sender's H0 value is known to the receiving party later in the protocol.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="DHPart1 " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H1 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | rs1IDr (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | rs2IDr (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auxsecretIDr (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | pbxsecretIDr (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | pvr (length depends on KA Type) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 8: DHPart1 Message Format
5.6. DHPart2 Message
The DHPart2 message, shown in Figure 9, completes the DH exchange. It is sent by the Initiator if a valid DHPart1 message is received from the Responder. The length of the pvi value and the length of the DHPart2 message depends on the Key Agreement Type chosen. This information is contained in the table in Section 5.1.5. Note that for both Multistream and Preshared modes, no DHPart1 or DHPart2 message will be sent. The 256-bit hash image H1 is defined in Section 9. The next four parameters are non-invertible hashes (computed in Section 4.3.1) of potential shared secrets used in generating the ZRTP secret s0. The first two, rs1IDi and rs2IDi, are the hashes of the initiator's two retained shared secrets, truncated to 64 bits. Next, there is auxsecretIDi, a hash of the initiator's auxsecret (defined in Section 4.3), truncated to 64 bits. The last parameter is a hash of the trusted MiTM PBX shared secret pbxsecret, defined in Section 7.3.1. The 64-bit MAC at the end of the message is computed across the whole message, not including the MAC, using the MAC algorithm defined in Section 5.1.2.2. The MAC key is the sender's H0 (defined in Section 9), and thus the MAC cannot be checked by the receiving party until the sender's H0 value is known to the receiving party later in the protocol.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="DHPart2 " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Hash image H1 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | rs1IDi (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | rs2IDi (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | auxsecretIDi (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | pbxsecretIDi (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | pvi (length depends on KA Type) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 9: DHPart2 Message Format5.7. Confirm1 and Confirm2 Messages
The Confirm1 message is sent by the Responder in response to a valid DHPart2 message after the SRTP session key and parameters have been negotiated. The Confirm2 message is sent by the Initiator in response to a Confirm1 message. The format is shown in Figure 10. The message contains the Message Type Block "Confirm1" or "Confirm2". Next, there is the confirm_mac, a MAC computed over the encrypted part of the message (shown enclosed by "====" in Figure 10). This confirm_mac is keyed and computed according to Section 4.6. The next 16 octets contain the CFB Initialization Vector. The rest of the message is encrypted using CFB and protected by the confirm_mac.
The first field inside the encrypted region is the hash preimage H0, which is defined in detail in Section 9. The next 15 bits are not used and SHOULD be set to zero when sent and MUST be ignored when received in Confirm1 or Confirm2 messages. The next 9 bits contain the signature length. If no SAS signature (described in Section 7.2) is present, all bits are set to zero. The signature length is in words and includes the signature type block. If the calculated signature octet count is not a multiple of 4, zeros are added to pad it out to a word boundary. If no signature is present, the overall length of the Confirm1 or Confirm2 message will be set to 19 words. The next 8 bits are used for flags. Undefined flags are set to zero and ignored. Four flags are currently defined. The PBX Enrollment flag (E) is a Boolean bit defined in Section 7.3.1. The SAS Verified flag (V) is a Boolean bit defined in Section 7.1. The Allow Clear flag (A) is a Boolean bit defined in Section 4.7.2. The Disclosure Flag (D) is a Boolean bit defined in Section 11. The cache expiration interval is defined in Section 4.9. If the signature length (in words) is non-zero, a signature type block will be present along with a signature block. Next, there is the signature block. The signature block includes the signature and the key (or a link to the key) used to generate the signature (Section 7.2). CFB mode [NIST-SP800-38A] is applied with a feedback length of 128 bits, a full cipher block, and the final block is truncated to match the exact length of the encrypted data. The CFB Initialization Vector is a 128-bit random nonce. The block cipher algorithm and the key size are the same as the negotiated block cipher (Section 5.1.3) for media encryption. CFB is used to encrypt the part of the Confirm1 message beginning after the CFB IV to the end of the message (the encrypted region is enclosed by "====" in Figure 10). The responder uses the zrtpkeyr to encrypt the Confirm1 message. The initiator uses the zrtpkeyi to encrypt the Confirm2 message.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Confirm1" or "Confirm2" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | confirm_mac (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | CFB Initialization Vector (4 words) | | | | | +===============================================================+ | | | Hash preimage H0 (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|E|V|A|D| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | cache expiration interval (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | optional signature type block (1 word if present) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | optional signature block (variable length) | | . . . | | | | | +===============================================================+ Figure 10: Confirm1 and Confirm2 Message Format
5.8. Conf2ACK Message
The Conf2ACK message is sent by the Responder in response to a valid Confirm2 message. The message format for the Conf2ACK is shown in the figure below. The receipt of a Conf2ACK stops retransmission of the Confirm2 message. Note that the first SRTP media (with a valid SRTP auth tag) from the responder also stops retransmission of the Confirm2 message. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Conf2ACK" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 11: Conf2ACK Message Format5.9. Error Message
The Error message is sent to terminate an in-process ZRTP key agreement exchange due to an error. The format is shown in the figure below. The use of the Error message is described in Section 4.7.1. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=4 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Error " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Integer Error Code (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 12: Error Message Format Defined hexadecimal values for the Error Code are listed in the table below.
Error Code | Meaning ----------------------------------------------------------- 0x10 | Malformed packet (CRC OK, but wrong structure) ----------------------------------------------------------- 0x20 | Critical software error ----------------------------------------------------------- 0x30 | Unsupported ZRTP version ----------------------------------------------------------- 0x40 | Hello components mismatch ----------------------------------------------------------- 0x51 | Hash Type not supported ----------------------------------------------------------- 0x52 | Cipher Type not supported ----------------------------------------------------------- 0x53 | Public key exchange not supported ----------------------------------------------------------- 0x54 | SRTP auth tag not supported ----------------------------------------------------------- 0x55 | SAS rendering scheme not supported ----------------------------------------------------------- 0x56 | No shared secret available, DH mode required ----------------------------------------------------------- 0x61 | DH Error: bad pvi or pvr ( == 1, 0, or p-1) ----------------------------------------------------------- 0x62 | DH Error: hvi != hashed data ----------------------------------------------------------- 0x63 | Received relayed SAS from untrusted MiTM ----------------------------------------------------------- 0x70 | Auth Error: Bad Confirm pkt MAC ----------------------------------------------------------- 0x80 | Nonce reuse ----------------------------------------------------------- 0x90 | Equal ZIDs in Hello ----------------------------------------------------------- 0x91 | SSRC collision ----------------------------------------------------------- 0xA0 | Service unavailable ----------------------------------------------------------- 0xB0 | Protocol timeout error ----------------------------------------------------------- 0x100 | GoClear message received, but not allowed ----------------------------------------------------------- Table 8. ZRTP Error Codes
5.10. ErrorACK Message
The ErrorACK message is sent in response to an Error message. The receipt of an ErrorACK stops retransmission of the Error message. The format is shown in the figure below. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="ErrorACK" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 13: ErrorACK Message Format5.11. GoClear Message
Support for the GoClear message is OPTIONAL in the protocol, and it is sent to switch from SRTP to RTP. The format is shown in the figure below. The clear_mac is used to authenticate the GoClear message so that bogus GoClear messages introduced by an attacker can be detected and discarded. The use of GoClear is described in Section 4.7.2. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=5 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="GoClear " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | clear_mac (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 14: GoClear Message Format
5.12. ClearACK Message
Support for the ClearACK message is OPTIONAL in the protocol, and it is sent to acknowledge receipt of a GoClear. A ClearACK is only sent if the clear_mac from the GoClear message is authenticated. Otherwise, no response is returned. The format is shown in the figure below. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="ClearACK" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 15: ClearACK Message Format5.13. SASrelay Message
The SASrelay message is sent by a trusted MiTM, most often a PBX. It is not sent as a response to a packet, but is sent as a self- initiated packet by the trusted MiTM (Section 7.3). It can only be sent after the rest of the ZRTP key negotiations have completed, after the Confirm messages and their ACKs. It can only be sent after the trusted MiTM has finished key negotiations with the other party, because it is the other party's SAS that is being relayed. It is sent with retry logic until a RelayACK message (Section 5.14) is received or the retry schedule has been exhausted. If a device, usually a PBX, sends an SASrelay message, it MUST have previously declared itself as a MiTM device by setting the MiTM (M) flag in the Hello message (Section 5.2). If the receiver of the SASrelay message did not previously receive a Hello message with the MiTM (M) flag set, the Relayed SAS SHOULD NOT be rendered. A RelayACK is still sent, but no Error message is sent. The SASrelay message format is shown in Figure 16. The message contains the Message Type Block "SASrelay". Next, there is a MAC computed over the encrypted part of the message (shown enclosed by "====" in Figure 16). This MAC is keyed the same way as the confirm_mac in the Confirm messages (see Section 4.6). The next 16 octets contain the CFB Initialization Vector. The rest of the message is encrypted using CFB and protected by the MAC. The next 15 bits are not used and SHOULD be set to zero when sent, and they MUST be ignored when received in SASrelay messages.
The next 9 bits contain the signature length. The trusted MiTM MAY compute a digital signature on the SAS hash, as described in Section 7.2, using a persistent signing key owned by the trusted MiTM. If no SAS signature is present, all bits are set to zero. The signature length is in words and includes the signature type block. If the calculated signature octet count is not a multiple of 4, zeros are added to pad it out to a word boundary. If no signature block is present, the overall length of the SASrelay message will be set to 19 words. The next 8 bits are used for flags. Undefined flags are set to zero and ignored. Three flags are currently defined. The Disclosure Flag (D) is a Boolean bit defined in Section 11. The Allow Clear flag (A) is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V) is a Boolean bit defined in Section 7.1. These flags are updated values to the same flags provided earlier in the Confirm message, but they are updated to reflect the new flag information relayed by the PBX from the other party. The next 32-bit word contains the SAS rendering scheme for the relayed sashash, which will be the same rendering scheme used by the other party on the other side of the trusted MiTM. Section 7.3 describes how the PBX determines whether the ZRTP client regards the PBX as a trusted MiTM. If the PBX determines that the ZRTP client trusts the PBX, the next 8 words contain the sashash relayed from the other party. The first 32-bit word of the sashash contains the sasvalue, which may be rendered to the user using the specified SAS rendering scheme. If this SASrelay message is being sent to a ZRTP client that does not trust this MiTM, the sashash will be ignored by the recipient and should be set to zeros by the PBX. If the signature length (in words) is non-zero, a signature type block will be present along with a signature block. Next, there is the signature block. The signature block includes the signature and the key (or a link to the key) used to generate the signature (Section 7.2). CFB mode [NIST-SP800-38A] is applied with a feedback length of 128 bits, a full cipher block, and the final block is truncated to match the exact length of the encrypted data. The CFB Initialization Vector is a 128-bit random nonce. The block cipher algorithm and the key size is same as the negotiated block cipher (Section 5.1.3) for media encryption. CFB is used to encrypt the part of the SASrelay message beginning after the CFB IV to the end of the message (the encrypted region is enclosed by "====" in Figure 16).
Depending on whether the trusted MiTM had taken the role of the initiator or the responder during the ZRTP key negotiation, the SASrelay message is encrypted with zrtpkeyi or zrtpkeyr. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="SASrelay" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | MAC (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | CFB Initialization Vector (4 words) | | | | | +===============================================================+ | Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|0|V|A|D| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | rendering scheme of relayed SAS (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | Trusted MiTM relayed sashash (8 words) | | . . . | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | optional signature type block (1 word if present) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | | | optional signature block (variable length) | | . . . | | | | | +===============================================================+ Figure 16: SASrelay Message Format
5.14. RelayACK Message
The RelayACK message is sent in response to a valid SASrelay message. The message format for the RelayACK is shown in the figure below. The receipt of a RelayACK stops retransmission of the SASrelay message. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="RelayACK" (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 17: RelayACK Message Format5.15. Ping Message
The Ping and PingACK messages are unrelated to the rest of the ZRTP protocol. No ZRTP endpoint is required to generate a Ping message, but every ZRTP endpoint MUST respond to a Ping message with a PingACK message. Although Ping and PingACK messages have no effect on the rest of the ZRTP protocol, their inclusion in this specification simplifies the design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably, [Zfone]). It enables proxies to be designed that do not rely on assistance from the signaling layer to map out the associations between media streams and ZRTP endpoints. Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping message as a means to sort out which RTP media streams are connected to particular ZRTP endpoints. Ping messages are generated only by ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages will be encountered. Ping retransmission behavior is discussed in Section 6. The Ping message (Figure 18) contains an "EndpointHash", defined in Section 5.16. The Ping message contains a version number that defines what version of PingACK is requested. If that version number is supported by the Ping responder, a PingACK with a format that matches that version will be received. Otherwise, a PingACK with a lower version number may be received.
0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=6 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="Ping " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | version="1.10" (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | EndpointHash (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 18: Ping Message Format5.16. PingACK Message
A PingACK message is sent only in response to a Ping. A ZRTP endpoint MUST respond to a Ping with a PingACK message. The version of PingACK requested is contained in the Ping message. If that version number is supported, a PingACK with a format that matches that version MUST be sent. Otherwise, if the version number of the Ping is not supported, a PingACK SHOULD be sent in the format of the highest supported version known to the Ping responder. Only version "1.10" is supported in this specification. The PingACK message carries its own 64-bit EndpointHash, distinct from the EndpointHash of the other party's Ping message. It is REQUIRED that it be highly improbable for two participants in a call to have the same EndpointHash and that an EndpointHash maintains a persistent value between calls. For a normal ZRTP endpoint, such as a ZRTP-enabled VoIP client, the EndpointHash can be just the truncated ZID. For a ZRTP endpoint such as a PBX that has multiple endpoints behind it, the EndpointHash must be a distinct value for each endpoint behind it. It is recommended that the EndpointHash be a truncated hash of the ZID of the ZRTP endpoint concatenated with something unique about the actual endpoint or phone behind the PBX. This may be the SIP URI of the phone, the PBX extension number, or the local IP address of the phone, whichever is more readily available in the application environment: EndpointHash = hash(ZID || SIP URI of the endpoint) EndpointHash = hash(ZID || PBX extension number of the endpoint) EndpointHash = hash(ZID || local IP address of the endpoint)
Any of these formulae confer uniqueness for the simple case of terminating the ZRTP connection at the VoIP client, or the more complex case of a PBX terminating the ZRTP connection for multiple VoIP phones in a conference call, all sharing the PBX's ZID, but with separate IP addresses behind the PBX. There is no requirement for the same hash function to be used by both parties. The PingACK message contains the EndpointHash of the sender of the PingACK as well as the EndpointHash of the sender of the Ping. The Source Identifier (SSRC) received in the ZRTP header from the Ping packet (Figure 2) is copied into the PingACK message body (Figure 19). This SSRC is not the SSRC of the sender of the PingACK. 0 1 2 3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=9 words | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Message Type Block="PingACK " (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | version="1.10" (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | EndpointHash of PingACK Sender (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | EndpointHash of Received Ping (2 words) | | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Source Identifier (SSRC) of Received Ping (1 word) | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ Figure 19: PingACK Message Format