4.2. Buffering of Sample Descriptions
The buffering of sample descriptions is a matter of the client's timed text codec implementation. In order to work properly, this payload format requires that:

o  Static sample descriptions MUST be buffered at the client, at least, for the duration of the session.

o  If dynamic sample descriptions are used, their buffering and update of the SIDX values MUST follow the mechanism described in the next section.

4.2.1. Dynamic SIDX Wraparound Mechanism
The use of dynamic sample descriptions by senders is OPTIONAL. However, if they are used, senders MUST implement this mechanism. Receivers MUST always implement it.

Dynamic SIDX values remain active either during the entire duration of the session (if used just once) or in different intervals of it (if used once or more).

Note: In the following, SIDX means dynamic SIDX.

For choosing the wraparound mechanism, the following rationale was used: There are 128 dynamic SIDX values possible, [0..127]. If one chooses to allow a maximum of 127 to be used as dynamic SIDXs, then any reordered packet with a new sample description would make the mechanism fail. For example, if the last packet received is SIDX=5, then all 127 values except SIDX=6 would be "active". Now, if a reordered packet arrives with a new description, SIDX=9, it will be mistakenly discarded, because the SIDX=9 is, at that moment, marked as "active" and active sample descriptions shall not be re-written.

Therefore, a "guard interval" is introduced. This guard interval reduces the number of active SIDXs at any point in time to 64. Although most timed text applications will probably need less than 64 sample descriptions during a session (in total), a wraparound mechanism to handle the need for more is described here.

Thereby, a sliding window of 64 active SIDX values is used. Values within the window are "active"; all others are marked "inactive". An SIDX value becomes active if at least one sample description identified by that SIDX has been received. Since sample descriptions MAY be sent redundantly, it is possible that a client receives a given SIDX several times. However, active sample descriptions SHALL NOT be overwritten: The receiver SHALL ignore redundant sample descriptions and it MUST use the already cached copy.

The "guard interval" of (64) inactive values ensures that the correct association SIDX <-> sample description is always used.
Informative note: As for the "guard interval" value itself, 64 as 128/2 was considered simple enough while still meeting the expected maximum number of sample descriptions. Besides that, there's no other motivation for choosing 64 or a different value.
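The sliding window described above, together with the buffering algorithm that follows, can be sketched in a few lines of Python. This is a minimal, non-normative sketch: the names is_inactive and DynamicSidxCache are hypothetical, and the initialization for stored contents (reading the "stsc" box) is left out.

```python
SIDX_RANGE = 128   # dynamic SIDX values are 0..127
GUARD = 64         # size of the guard interval (inactive values)


def is_inactive(x, z):
    """Return True if z lies in [x+1, x+64] modulo 128 (the guard interval)."""
    return 1 <= (z - x) % SIDX_RANGE <= GUARD


class DynamicSidxCache:
    """Hypothetical receiver-side store for dynamic sample descriptions."""

    def __init__(self):
        self.x = None      # last SIDX that moved the active window
        self.store = {}    # SIDX -> sample description bytes

    def receive(self, z, description):
        """Process one sample description; return True if it was stored."""
        if self.x is None or is_inactive(self.x, z):
            # Move the window, then delete every stored description
            # that now falls inside the new guard interval.
            self.x = z
            self.store = {k: v for k, v in self.store.items()
                          if not is_inactive(z, k)}
            self.store[z] = description
            return True
        if z not in self.store:
            # Active SIDX for which nothing is cached yet.
            self.store[z] = description
            return True
        # Redundant copy of an active description: keep the cached one.
        return False


cache = DynamicSidxCache()
cache.receive(4, b"desc-4")    # X=4, inactive interval is [5,68]
cache.receive(6, b"desc-6")    # X=6, inactive interval is [7,70]
duplicate_stored = cache.receive(6, b"other")   # redundant copy, ignored
```

The modulo subtraction keeps the test for the guard interval free of special cases at the 127/0 wraparound.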
The following algorithm is used to buffer dynamic sample descriptions and to maintain the dynamic SIDX values:

Let X be the last SIDX received that updated the range of active sample descriptions. Let Y be a value within the allowed range for dynamic SIDX: [0,127], and different from X. Let Z be the SIDX of the last received sample description. Then:

1.  Initialize all dynamic SIDX values as inactive. For stored contents, read the sample description index in the Sample to Chunk box ("stsc") for that sample. For live streaming, the first value MAY be zero or any other value in the interval above. Go to step 2.

2.  First, in-band sample description with SIDX=Z is received and stored; set X=Z. Go to step 3.

3.  Any SIDX within the interval [X+1 modulo(128), X+64 modulo(128)] is marked as inactive, and any corresponding sample description is deleted. Any SIDX within the interval [X+65 modulo(128), X] is set active. Go to step 4 (wait state).

4.  Wait for next sample description. Once the client is initialized, the interval of active SIDX values MUST change whenever a sample description with an SIDX value in the inactive set is received. That is, upon reception of a sample description with SIDX=Z, do the following:

    a.  If Z is in the (closed) interval [X+1 modulo(128), X+64 modulo(128)] then set X=Z, store the sample description, and go to step 3.

    b.  Else, Z must be in the interval [X+65 modulo(128), X], thus:

        i.   If SIDX=Z is not stored, then store the sample description. Go to beginning of step 4 (wait state).

        ii.  Else, go to the beginning of step 4 (wait state).

Informative note: It is allowed that any value of SIDX=X be sent in the interval [0,127]. For example, if [64..127] is the current active set and SIDX=0 is sent, a new sample description is defined (0) and an old one deleted (64); thus [65..127] and [0] are active. Similarly, one could now send SIDX=64, thus inverting the active and inactive sets.

Example: If X=4, any SIDX in the interval [5,68] is inactive. Active SIDX values are in the complementary interval [69,127] plus [0,4]. For example, if the client receives a SIDX=6, then the active interval is now different: [0,6] plus [71,127]. If the received SIDX is in the current active interval, no change SHALL be applied.

4.3. Finding Payload Header Values in 3GP Files
For the purpose of streaming timed text contents, some values in the boxes contained in a 3GP file are mapped to fields of this payload header. This section explains where to find those values. Additionally, for the duration and sample description indexes, extension mechanisms are provided. All senders MUST implement the extension mechanisms described herein.

If the contents are streamed out of a 3GP file, the following guidelines SHALL be followed.

Note: All fields in the objects (boxes) of a 3GP file are found in network byte order.

Information obtained from the Sample Table Box (stbl):

o  Sample Descriptions and Sample Description length: The Sample Description box (stsd, inside the stbl) contains the sample descriptions. For timed text media, each element of stsd is a timed text sample entry (type "tx3g"). The (unsigned) 32 bits of the "size" field in the stsd box represent the length (in bytes) of the sample description, as carried in TYPE 5 units. On the other hand, the LEN field of TYPE 5 units is restricted to 16 bits. Therefore, if the value of "size" is greater than (2^16-1-3)[bytes], then the sample description SHALL NOT be streamed with this payload format. There is no extension mechanism defined in this case, since fragmentation of sample descriptions is not defined (sample descriptions are typically up to some 200 bytes in size). Note: The three (3) accounts for the TYPE 5 header fields included in the LEN value.

o  SDUR from the Decoding Time to Sample Box (stts). The (unsigned) 32 bits of the "sample delta" field are used for calculating SDUR. However, since the SDUR field is only 3 bytes long, text samples with duration values larger than (2^24-1)/(timestamp clockrate)[seconds] cannot be streamed directly. The solution is simple: Copies of the corresponding text sample SHALL be sent. Thereby, the timestamp and duration values SHALL be adjusted so that a continuous display is guaranteed, as if just one sample had been sent. That is, a sample with timestamp TS and duration SDUR can be sent as two samples having timestamps TS1 and TS2 and durations SDUR1 and SDUR2, such that TS1=TS, TS2=TS1+SDUR1, and SDUR=SDUR1+SDUR2.

o  Text sample length from the Sample Size Box (stsz). The (unsigned) 32 bits of the "sample size" or "entry size" (one of them, depending on whether the sample size is fixed or variable) indicate the length (in bytes) of the 3GP text sample. For obtaining the length of the (actual) streamed text sample, the lengths of the text string byte count (2 bytes) and, in case of UTF-16 strings, the length of the BOM (also 2 bytes) SHALL be deducted. This is illustrated in Figure 9.

           Text Sample according to 3GPP TS 26.245

                    TEXT SAMPLE (length=stsz)
       .--------------------------------------------------.
      /                                                    \

                  TEXT STRING (length=TBC)
          .------------------------------------.
         /                                      \
      TBC BOM                                     MODIFIERS
     +---+---+----------------------------------+-----------+
          ||
          ||     TBC BOM  -> TLEN field
          ||    +---+---+    U bit
          ||
          \/

        Text Sample according to this Payload Format

           TEXT SAMPLE (length=SLEN w/o TBC,BOM)
      .--------------------------------------------.
     /                                              \

          TEXT STRING (length=TLEN)
      .--------------------------------.
     /                                  \
             TEXT STRING                  MODIFIERS
     +----------------------------------+-----------+

   KEY:
   TBC = Text string Byte Count
   BOM = Byte Order Mark

                Figure 9. Text sample composition
   Moreover, since the LEN field in TYPE 1 unit header is 16 bits long, larger text sample sizes than (2^16-1-8) [bytes] SHALL NOT be streamed. Also, in this case, no extension mechanism is defined. This is because this maximum is considered enough for the targeted streaming applications. (Note: The eight (8) accounts for the TYPE 1 header fields included in the LEN value).

o  SIDX from the Sample to Chunk Box (stsc): The stsc Box is used to find samples and their corresponding sample descriptions. These are referenced by the "sample description index", a 32-bit (unsigned) integer. If possible, these indices may be directly mapped to the SIDX field. However, there are several cases where this may not be possible:

   a)  The total number of indices used is greater than the number of indices available, i.e., if the static sample descriptions are more than 127 or the dynamic ones are more than 64.

   b)  The original SIDX value ranges do not fit in the allowed ranges for static (129-254) or dynamic (0-127) values.

   Therefore, when assigning SIDX values to the sample descriptions, the following guidelines are provided:

   o  Static sample descriptions can simply be assigned consecutive values within the range 129-254 (closed interval). This range should be well enough for static sample descriptions.

   o  As for dynamic sample descriptions:

      a)  Streams that use less than 64 dynamic sample descriptions SHOULD use consecutive values for SIDX anywhere in the range 0-127 (closed interval).

      b)  For streams with more than 64 sample descriptions, the SIDX values MUST be assigned in usage order, and if any sample description shall be used after it has been set inactive, it will need to be re-sent and assigned a new SIDX value (according to the algorithm in Section 4.2.1).
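The duration extension mechanism and the two size limits given above for the Sample Table Box values can be sketched as follows. This is an illustrative sketch only: the function and constant names are hypothetical, durations are expressed in timestamp clock ticks, and the wrap of the timestamp at 2^32 is an assumption based on the RTP timestamp being a 32-bit field.

```python
MAX_SDUR = 2**24 - 1               # SDUR is a 3-byte field
MAX_TYPE1_SAMPLE = 2**16 - 1 - 8   # largest streamable text sample (TYPE 1)
MAX_TYPE5_DESC = 2**16 - 1 - 3     # largest streamable sample description


def split_duration(ts, sample_delta):
    """Split one long sample into (timestamp, SDUR) pairs for its copies.

    Each copy carries the same text; the pairs satisfy TS1=TS,
    TS(n+1)=TS(n)+SDUR(n), and the SDUR values add up to sample_delta.
    """
    copies = []
    while sample_delta > MAX_SDUR:
        copies.append((ts, MAX_SDUR))
        ts = (ts + MAX_SDUR) % 2**32   # assumed 32-bit RTP timestamp wrap
        sample_delta -= MAX_SDUR
    copies.append((ts, sample_delta))
    return copies


def can_stream(sample_len, description_len):
    """Check both 16-bit LEN limits for a text sample and its description."""
    return (sample_len <= MAX_TYPE1_SAMPLE
            and description_len <= MAX_TYPE5_DESC)


copies = split_duration(1000, 2**24 + 9)   # needs two copies
```

A sample that already fits in SDUR yields a single pair, so the same function can be applied to every sample unconditionally.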
Information obtained from the Media Data Box:

o  Text strings, TLEN, U bit, and modifiers from the Media Data Box (mdat). Text strings, 16-bit text string byte count, Byte Order Mark (BOM, indicating UTF encoding), and modifier boxes can be found here. For TYPE 1 units, the value of TLEN is extracted from the text string byte count that precedes the text string in the text sample, as stored in the 3GP file. If UTF-16 encoding is used, two (2) more bytes have to be deducted from this byte count beforehand, in order to exclude the BOM. See Figure 9.

4.4. Fragmentation of Timed Text Samples
This section explains why text samples may have to be fragmented and discusses some of the possible approaches to doing it. A solution is proposed together with rules and recommendations for fragmenting and transporting text samples.

3GPP Timed Text applications are expected to operate at low bitrates. This fact, added to the small size of timed text samples (typically one or two hundred bytes) makes fragmentation of text samples a rare event. Samples should usually fit into the MTU size of the used network path.

Nevertheless, some text strings (e.g., ending roll in a movie) and some modifier boxes (i.e., for hyperlinks, for karaoke, or for styles) may become large. This may also apply for future modifier boxes. In such cases, the first option to consider is whether it is possible to adjust the encoding (e.g., the size of sample) in such a way that fragmentation is avoided. If it is, this is preferred to fragmentation and SHOULD be done.

Otherwise, if this is not possible or other constraints prevent it, fragmentation MAY be used, and the basic guidelines given in this document MUST be followed:

o  It is RECOMMENDED that text samples be fragmented as seldom as possible, i.e., the least possible number of fragments is created out of a text sample.

o  If there is some bitrate and free space in the payload available, sample descriptions (if at hand) SHOULD be aggregated.

o  Text strings MUST split at character boundaries; see TYPE 2 header. Otherwise, it is not possible to display the text contents of a fragment if a previous fragment was lost. As a consequence, text string fragmentation requires knowledge of the UTF-8/UTF-16 encoding formats to determine character boundaries.

o  Unlike text strings, the modifier boxes are NOT REQUIRED to be split at meaningful boundaries. However, it is RECOMMENDED that this be done whenever possible. This decreases the effects of packet loss. This payload format does not ensure that partially received modifiers are applied to text strings. If only part of the modifiers is received, it is an application issue how to deal with these, i.e., whether or not to use them.

   Informative note: Ensuring that partially received modifiers can be applied to text strings in all cases (for all modifier types and for all fragment loss constellations) would place additional requirements on the payload format. In particular, this would require that: a) senders understand the semantics of the modifier boxes and b) specific fragment headers for each of the modifier boxes are defined, in addition to the payload formats defined below. Understanding the modifiers semantics means knowing, e.g., where each modifier starts and ends, which text fragments are affected, which modifiers may or may not be split, or what the fields indicate. This is necessary to be able to split the modifiers in such a way that each fragment can be applied independently of previous packet losses. This would require a more intelligent fragmentation entity and more complex headers. Given the low probability of fragmentation and the desire to keep the requirements low, it does not seem reasonable to specify such modifier box specific headers.

o  Modifier and text string fragments SHOULD be protected against packet losses, i.e., using FEC [7], retransmission [11], repetition (Section 5), or an equivalent technique. This minimizes the effects of packet loss.

o  An additional requirement when fragmenting text samples is that the start of the modifiers MUST be indicated using the payload header defined for that purpose, i.e., a TYPE 3 unit MUST be used (see Section 4.1.4). This enables a receiver to detect the start of the modifiers as long as there are not two or more consecutive packet losses.

o  Finally, sample descriptions SHALL NOT be fragmented because they contain important information that may affect several text samples.
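The requirement that text strings split at character boundaries can be illustrated for UTF-8 as follows. This is a sketch under stated assumptions: the function name split_utf8 is hypothetical, the input is assumed to be valid UTF-8, and UTF-16 strings (where surrogate pairs have to be kept together) are not handled.

```python
def split_utf8(data: bytes, max_len: int):
    """Split UTF-8 bytes into fragments of at most max_len bytes each,
    cutting only at character boundaries."""
    if max_len < 4:
        # A UTF-8 character can be up to 4 bytes long.
        raise ValueError("max_len must be at least 4")
    fragments = []
    start = 0
    while start < len(data):
        end = min(start + max_len, len(data))
        # Bytes of the form 10xxxxxx are continuation bytes. If the cut
        # would land on one, move it back to the start of that character.
        while end < len(data) and (data[end] & 0xC0) == 0x80:
            end -= 1
        fragments.append(data[start:end])
        start = end
    return fragments


text = "a\u00e9\u20ac\U0001F600".encode("utf-8")   # 1-, 2-, 3-, 4-byte characters
parts = split_utf8(text, 4)
```

Because every fragment then holds whole characters, each one can be decoded and displayed even if a previous fragment was lost.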
4.5. Reassembling Text Samples at the Receiver
The payload headers defined in this document allow reassembling fragmented text samples. For this purpose, the standard RTP timestamp, the duration field (SDUR), and the fields TOTAL/THIS in the payload headers are used.

Units that belong to the same text sample MUST have the same timestamp. TYPE 5 units do not comply with this rule since they are not part of any particular text sample.

The process for collecting the different fragments (units) of a text sample is as follows:

1.  Search for units having the same timestamp value, i.e., units that belong to the same text sample or sample descriptions that shall become available at that time instant. If several units of the same sample are repeated, only one of them SHALL be used. Repeated units are those that have the same timestamp and the same values for TOTAL/THIS.

    Note that, as mentioned in Section 4.1.1, the receiver SHALL ignore units with unrecognized TYPE value. However, the RTP header fields and the rest of the units (if any) in the payload are still useful.

2.  Check within this set whether any of the units from the text sample is missing. This is done using the TOTAL and THIS fields; the TOTAL field indicates how many fragments were created out of the text sample, and the THIS field indicates the position of this fragment in the text sample. As result of this operation, two outcomes are possible:

    a.  No fragment is missing. Then, the THIS field SHALL be used to order the fragments and reassemble the text sample before forwarding it to the decoding application. Special care SHALL be taken when reassembling the text string as indicated in step 4 below.

    b.  One or more fragments are missing: Check whether this fragment belongs to the text string or to the modifiers. TYPE 2 units identify text string fragments, and TYPE 3 and 4 identify modifier fragments:

        i.    If the fragment or fragments missing belong to the text string and the modifiers were received complete, then the received text characters may, at least, be displayed as plain text. Some modifiers may only be applied as long as it is possible to identify the character numbers, e.g., if only the last text string fragment is lost. This is the case for modifiers defining specific font styles ('styl'), highlighted characters ('hlit'), karaoke feature ('krok'), and blinking characters ('blnk'). Other modifiers such as 'dlay' or 'tbox' can be applied without the knowledge of the character number. It is an application issue to decide whether or not to apply the modifiers.

        ii.   If the fragment missing belongs to the modifiers and the text strings were received complete, then the incomplete modifiers may be used. The text string SHOULD at least be displayed as plain text. As mentioned in Section 4.4, modifiers may split without observing meaningful boundaries. Hence, it may not always be possible to make use of partially received modifiers. However, to avoid this, it is RECOMMENDED that the modifiers do split at meaningful boundaries.

        iii.  A third possibility is that it is not possible to discern whether modifiers or text strings were received complete. For example, if the TYPE 3 unit of a sample plus the following or preceding packet is lost, there is no way for the RTP receiver to know if one or both packets lost belong to the modifiers or if there are also some missing text strings. Repetition, FEC, retransmission, or other protection mechanisms as per Section 4.6 are RECOMMENDED to avoid this situation.

        iv.   Finally, if it is sure that neither text strings nor modifiers were received complete, then the text strings and the modifiers may be rendered partially or may be discarded. This is an application choice.

3.  Sample descriptions can be directly associated with the reassembled text samples, via the sample description index (SIDX).

4.  Reassembling of text strings: Since the text strings transported in RTP packets MUST NOT include any byte order mark (BOM), the receiver MUST prepend it to the reassembled UTF-16 string before handing it to the timed text decoder (see Figure 9). The value of the BOM is 0xFEFF because only big endian serialization of UTF-16 strings is supported by this payload format.
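The case in which no fragment is missing can be sketched as follows. This is a non-normative sketch: the Fragment tuple and the reassemble function are hypothetical, the fragments passed in are assumed to share one timestamp already, THIS is assumed to count from 1 as in the examples of Section 4.7, and the utf16 flag stands for the U bit of the unit headers.

```python
from collections import namedtuple

# Hypothetical in-memory form of a received TYPE 2, 3 or 4 unit.
Fragment = namedtuple("Fragment", ["type", "total", "this", "data"])

BOM_UTF16_BE = b"\xfe\xff"   # 0xFEFF, big endian serialization


def reassemble(fragments, utf16):
    """Return (text_string, modifiers), or None if a fragment is missing."""
    by_position = {}
    for frag in fragments:
        # Repeated units have the same TOTAL/THIS; only one is used.
        by_position.setdefault(frag.this, frag)
    if not by_position:
        return None
    total = next(iter(by_position.values())).total
    if sorted(by_position) != list(range(1, total + 1)):
        return None   # partial rendering is left to the application
    ordered = [by_position[i] for i in range(1, total + 1)]
    text = b"".join(f.data for f in ordered if f.type == 2)
    modifiers = b"".join(f.data for f in ordered if f.type in (3, 4))
    if utf16:
        # The BOM is not sent on the wire; restore it for the decoder.
        text = BOM_UTF16_BE + text
    return text, modifiers


received = [
    Fragment(3, 4, 3, b"mod-a"),
    Fragment(2, 4, 1, b"Hel"),
    Fragment(2, 4, 2, b"lo"),
    Fragment(2, 4, 2, b"lo"),      # repeated unit
    Fragment(4, 4, 4, b"mod-b"),
]
sample = reassemble(received, utf16=False)
```

Returning None for an incomplete set keeps the loss handling of cases i through iv outside this helper, since those are application choices.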
4.6. On Aggregate Payloads
Units SHOULD be aggregated to avoid overhead, whenever possible. The aggregate payloads MUST comply with one of the following ordered configurations:

1.  Zero or more sample descriptions (TYPE 5) followed by zero or more whole text samples (TYPE 1 units). At least one unit of either type MUST be present.

2.  Zero or more sample descriptions followed by zero or one modifier fragment, either TYPE 3 or TYPE 4. At least one unit MUST be present.

3.  Zero or more sample descriptions, followed by zero or one text string fragment (TYPE 2), followed by zero or one TYPE 3 unit. If a TYPE 2 unit and a TYPE 3 unit are present, then they MUST belong to the same text sample. At least one unit MUST be present.

Some observations:

o  Different aggregates than the ones listed above SHALL NOT be used.

o  Sample descriptions MUST be placed in the aggregate payload before the occurrence of any non-TYPE 5 units.

o  Correct reception of TYPE 5 units is important since their contents may be referenced by several other units in the stream. Receivers are unable to use text samples until their corresponding sample descriptions are received. Accordingly, a sender SHOULD send multiple copies of a sample description to ensure reliability (see Section 5). Receivers MAY use payload-specific feedback messages [21] to tell a sender that they have received a particular sample description.

o  Regarding timestamp calculation: In general, the rules for calculating the timestamp of units in an aggregate payload depend on the type of unit. Based on the possible constellations for aggregate payloads, as above, we have:

   o  Sample descriptions MUST receive the RTP timestamp of the packet in which they are included. Note that for TYPE 5 units, the timestamp actually does not represent the instant when they are played out, but instead the instant at which they become available for use.

   o  For the first configuration: The first TYPE 1 unit receives the RTP timestamp. The timestamp of any subsequent TYPE 1 unit MUST be obtained by adding sample duration and timestamp, both of the preceding TYPE 1 unit.

   o  For the second and third configuration, all units, TYPE 2, 3, and 4, MUST receive the RTP timestamp.

   Refer to detailed examples on the timestamp calculation below.

o  As per configuration 3 above, a payload MAY contain several fragments of one (and only one) text sample. If it does, then exactly one TYPE 2 unit followed by exactly one TYPE 3 unit is allowed in the same payload. This is in line with RFC 3640 [12], Section 2.4, which explicitly disallows combining fragments of different samples in the same RTP payload. Note that, in this special case, no timestamp calculation is needed. That is, the RTP timestamp of both units is equal to the timestamp in the packet's RTP header.

o  Finally, note that the use of empty text samples allows for aggregating non-consecutive TYPE 1 units in the same payload. Two text samples, with timestamps TS1 and TS3 and durations SDUR1 and SDUR3, are not consecutive if it holds TS1+SDUR1 < TS3. A solution for this is to include an empty TYPE 1 unit with duration SDUR2 between them, such that TS2+SDUR2 = TS1+SDUR1+SDUR2 = TS3.

Some examples of aggregate payloads are illustrated in Figure 10. (Note: The figure is not scaled.)

     N/A    TS1   TS2    TS3
   +------+-----+------+-----+
   |TYPE5 |TYPE1|TYPE1 |TYPE1|
   +------+-----+------+-----+
     N/A   sdur1 sdur2  sdur3

                                 N/A    TS4
                               +-----+-------+
                               |TYPE5| TYPE 1|     a)
                               +-----+-------+
                                 N/A   sdur4

                                     TS4          TS4     TS4
                               +--------------+ +--------------+
                               |    TYPE2     | |TYPE2 |TYPE 3 |  b)
                               +--------------+ +--------------+
                                    sdur4        sdur4   sdur4

                                     TS4              TS4
                               +--------------+ +--------------+
                               | TYPE2| TYPE 3| |    TYPE4     |  c)
                               +--------------+ +--------------+
                                sdur4   sdur4        sdur4

   |----------PAYLOAD 1------| |--PAYLOAD 2---| |--PAYLOAD 3---|
             rtpts1                 rtpts2           rtpts3

   KEY:
   TSx    = Text Sample x
   rtptsy = the standard RTP timestamp for PAYLOAD y
   sdurx  = the duration of Text Sample x
   N/A    = not applicable

                Figure 10. Example aggregate payloads

In Figure 10, four text samples (TS1 through TS4) are sent using three RTP packets. These configurations have been chosen to show how the 5 TYPE headers are used. Additionally, three different possibilities for the last text sample, TS4, are depicted: a), b), and c).

In Figure 11, option b) from Figure 10 is chosen to illustrate how the timestamp for each unit is found.

     N/A    TS1   TS2    TS3         TS4          TS4     TS4
   +------+-----+------+-----+ +--------------+ +--------------+
   |TYPE5 |TYPE1|TYPE1 |TYPE1| |    TYPE2     | |TYPE2 |TYPE 3 |
   +------+-----+------+-----+ +--------------+ +--------------+
     N/A   sdur1 sdur2  sdur3       sdur4        sdur4   sdur4
     (#1)  (#2)  (#3)   (#4)         (#5)        (#6)    (#7)

   |----------PAYLOAD 1------| |--PAYLOAD 2---| |--PAYLOAD 3---|
             rtpts1                 rtpts2           rtpts3

             Figure 11. Selected payloads from Figure 10

Assuming TSx means Text Sample x, rtptsy represents the standard RTP timestamp for PAYLOAD y and sdurx, the duration of Text Sample x, the timestamp for unit #z, ts(#z), can be found as the sum of rtptsy and the cumulative sum of the durations of preceding units in that payload (except in the case of PAYLOAD 3, as per configuration 3 above). Thus, we have:

1.  for the units in the first aggregate payload, PAYLOAD 1:

       ts(#1) = rtpts1
       ts(#2) = rtpts1
       ts(#3) = rtpts1 + sdur1
       ts(#4) = rtpts1 + sdur1 + sdur2

    Note that the TYPE 5 and the first TYPE 1 unit both have the RTP timestamp.

2.  for PAYLOAD 2:

       ts(#5) = rtpts2

3.  for PAYLOAD 3:

       ts(#6) = ts(#7) = rtpts2 = rtpts3

    According to configuration 3 above, the TYPE 2 and the TYPE 3 units shall belong to the same sample. Hence, rtpts3 must be equal to rtpts2. For the same reason, the value of SDUR is not used to calculate the timestamp of the next unit.
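The timestamp rules worked through above can be expressed compactly. The sketch below is illustrative only: unit_timestamps is a hypothetical helper, units are given as (TYPE, SDUR) pairs in payload order, and the wrap at 2^32 is an assumption based on the 32-bit RTP timestamp.

```python
def unit_timestamps(rtp_ts, units):
    """Return the timestamp of each unit of one aggregate payload.

    units is a list of (unit_type, sdur) pairs in payload order;
    sdur is ignored for TYPE 5 units, which carry no duration.
    """
    timestamps = []
    next_type1_ts = rtp_ts
    for unit_type, sdur in units:
        if unit_type == 1:
            # The first TYPE 1 unit gets the RTP timestamp; each later
            # one adds the duration of the preceding TYPE 1 unit.
            timestamps.append(next_type1_ts)
            next_type1_ts = (next_type1_ts + sdur) % 2**32
        else:
            # TYPE 5 units and all fragments (TYPE 2, 3, 4) receive
            # the RTP timestamp of the packet unchanged.
            timestamps.append(rtp_ts)
    return timestamps


# PAYLOAD 1 and PAYLOAD 3 of Figure 11, with made-up numeric values.
payload1 = unit_timestamps(1000, [(5, None), (1, 10), (1, 20), (1, 30)])
payload3 = unit_timestamps(2000, [(2, 40), (3, 40)])
```

With rtpts1=1000, sdur1=10 and sdur2=20, the first call gives 1000, 1000, 1010 and 1030 for units #1 through #4, matching ts(#1) through ts(#4).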
4.7. Payload Examples
Some examples of payloads using the defined headers are shown below:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     TLEN      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TLEN      |                                               |
   +---------------+                                               |
   |                  text string (no.bytes=TLEN)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              modifiers (no.bytes=LEN - 8 - TLEN)              |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     TLEN      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TLEN      |                                               |
   +---------------+                                               |
   |                  text string (no.bytes=TLEN)                  |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |              modifiers (no.bytes=LEN - 8 - TLEN)              |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

           Figure 12. A payload carrying two TYPE 1 units

In Figure 12, an RTP packet carrying two TYPE 1 units is depicted. It can be seen how the length fields LEN and TLEN can be used to find the start of the next unit (LEN), the start of the modifiers (TLEN), and the length of the modifiers (LEN - 8 - TLEN).
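How LEN and TLEN locate the text string, the modifiers, and the next unit can be sketched for a payload that contains only TYPE 1 units, as in Figure 12. The sketch rests on an assumption inferred from the constant 8 used above: LEN counts the LEN field itself and everything after it, so a unit occupies 1 + LEN bytes. The helper names are hypothetical, and the authoritative field definitions are those of Section 4.1.

```python
import struct


def build_type1(sidx, sdur, text, modifiers=b"", utf16=False):
    """Serialize one TYPE 1 unit (helper for the example below)."""
    first = (0x80 if utf16 else 0x00) | 1          # U bit, R=0, TYPE=1
    length = 8 + len(text) + len(modifiers)        # LEN field value
    return (bytes([first]) + struct.pack("!H", length) + bytes([sidx])
            + sdur.to_bytes(3, "big") + struct.pack("!H", len(text))
            + text + modifiers)


def parse_type1_units(payload):
    """Walk a payload made of TYPE 1 units and return them as dicts."""
    units = []
    pos = 0
    while pos < len(payload):
        if len(payload) - pos < 9:
            raise ValueError("truncated TYPE 1 header")
        first = payload[pos]
        if first & 0x07 != 1:
            raise ValueError("only TYPE 1 units are handled in this sketch")
        (length,) = struct.unpack_from("!H", payload, pos + 1)
        end = pos + 1 + length                     # start of the next unit
        (tlen,) = struct.unpack_from("!H", payload, pos + 7)
        if length < 8 + tlen or end > len(payload):
            raise ValueError("inconsistent LEN/TLEN")
        units.append({
            "utf16": bool(first & 0x80),
            "sidx": payload[pos + 3],
            "sdur": int.from_bytes(payload[pos + 4:pos + 7], "big"),
            "text": payload[pos + 9:pos + 9 + tlen],
            "modifiers": payload[pos + 9 + tlen:end],  # LEN - 8 - TLEN bytes
        })
        pos = end
    return units


payload = build_type1(5, 100, b"hi", b"\x01\x02\x03") + build_type1(5, 50, b"bye")
units = parse_type1_units(payload)
```

The RTP header is assumed to have been removed already; payload holds only the units.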

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE5|        LEN (always >3)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |             sample description (no.bytes=LEN - 3)             |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE1|       LEN (always >=8)        |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     TLEN      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     TLEN      |                                               |
   +-+-+-+-+-+-+-+-+                                               |
   |             text string fragment (no.bytes=TLEN)              |
   |                                                               |
   |                                                               |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

     Figure 13. An RTP packet carrying a TYPE 5 and a TYPE 1 unit

In Figure 13, a sample description and a TYPE 1 unit are aggregated. The TYPE 1 unit happens to contain only text strings and is small, so an additional TYPE 5 unit is included to take advantage of the available bits in the packet.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=1 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             SLEN              |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |            text string fragment (no.bytes=LEN - 9)            |
   |                                                               |
   :                                                               :
   :                                                               :
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 14. Payload with first text string fragment of a sample

In Figures 14, 15, and 16, a text sample is split into three RTP packets. In Figure 14, the text string is big and takes the whole packet length. In Figure 15, the only possibility for carrying two fragments of the same text sample is represented (see configuration 3 in Section 4.6). The last packet, shown in Figure 16, carries the last modifier fragment, a TYPE 4.

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE2|        LEN (always >9)        |TOTAL=4|THIS=2 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |     SIDX      |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             SLEN              |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |            text string fragment (no.bytes=LEN - 9)            |
   |                                                               |
   |                                                               |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE3|        LEN (always >6)        |TOTAL=4|THIS=3 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |                 modifiers (no.bytes=LEN - 6)                  |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 15. An RTP packet carrying a TYPE 2 unit and a TYPE 3 unit
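The fragment headers depicted in Figure 15 could be serialized along the following lines. This is a sketch, not a reference encoder: pack_type2 and pack_modifier_fragment are hypothetical names, the field order is read off the figure (TOTAL in the upper and THIS in the lower four bits of one byte), and LEN is assumed to count from the LEN field onward, which matches the constants 9 and 6 in the figure.

```python
import struct


def pack_type2(total, this, sdur, sidx, slen, fragment, utf16=False):
    """Serialize a TYPE 2 unit (text string fragment)."""
    first = (0x80 if utf16 else 0x00) | 2            # U bit, R=0, TYPE=2
    length = 9 + len(fragment)                       # LEN field value
    return (bytes([first]) + struct.pack("!H", length)
            + bytes([(total << 4) | this])           # TOTAL | THIS
            + sdur.to_bytes(3, "big") + bytes([sidx])
            + struct.pack("!H", slen) + fragment)


def pack_modifier_fragment(unit_type, total, this, sdur, fragment,
                           utf16=False):
    """Serialize a TYPE 3 (first) or TYPE 4 (other) modifier fragment."""
    if unit_type not in (3, 4):
        raise ValueError("unit_type must be 3 or 4")
    first = (0x80 if utf16 else 0x00) | unit_type
    length = 6 + len(fragment)                       # LEN field value
    return (bytes([first]) + struct.pack("!H", length)
            + bytes([(total << 4) | this])
            + sdur.to_bytes(3, "big") + fragment)


# Second packet of the example: fragments 2 and 3 of 4, same sample.
payload = (pack_type2(4, 2, 3000, 1, 120, b"end of text")
           + pack_modifier_fragment(3, 4, 3, 3000, b"first modifiers"))
```

Both units carry the same SDUR because they belong to the same text sample; the value 120 for SLEN and the other numbers are made up for the example.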

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                           timestamp                           |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           synchronization source (SSRC) identifier            |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |U|   R   |TYPE4|        LEN (always >6)        |TOTAL=4|THIS=4 |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     SDUR                      |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               |
   |                                                               |
   |                 modifiers (no.bytes=LEN - 6)                  |
   |                                               +-+-+-+-+-+-+-+-+
   |                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

  Figure 16. An RTP packet carrying last modifiers fragment (TYPE 4)

4.8. Relation to RFC 3640
RFC 3640 [12] defines a payload format for the transport of any non-multiplexed MPEG-4 elementary stream. One of the various MPEG-4 elementary stream types is MPEG-4 timed text streams, specified in MPEG-4 part 17 [26], also known as ISO/IEC 14496-17. MPEG-4 timed text streams are capable of carrying 3GPP timed text data, as specified in 3GPP TS 26.245 [1].

MPEG-4 timed text streams are intentionally constructed so as to guarantee interoperability between RFC 3640 and this payload format. This means that the construction of the RTP packets carrying timed text is the same. That is, the MPEG-4 timed text elementary stream as per ISO/IEC 14496-17 is identical to the (aggregate) payloads constructed using this payload format.

Figure 17 illustrates the process of constructing an RTP packet containing timed text. As can be seen in the partition block, the (transport) units used in this payload format are identical to the Timed Text Units (TTUs) defined in ISO/IEC 14496-17. Likewise, the rules for payload aggregation as per Section 4.6 are identical to those defined in ISO/IEC 14496-17 and are compliant with RFC 3640. As a result, an RTP packet that uses this payload format is identical to an RTP packet using RFC 3640 conveying TTUs according to ISO/IEC 14496-17. In particular, MPEG-4 Part 17 specifies that when using RFC 3640 for transporting timed text streams, the "streamType" parameter value is set to 0x0D, and the value of the "objectTypeIndication" in "config" takes the value 0x08.

                      +--------------------------------------+
        Text samples  | +--------------+    +--------------+ |
        as per 3GPP   | |Text Sample 1 |    |Text Sample N | |
        TS 26.245     | +--------------+    +--------------+ |
                      +--------------------------------------+
                                        \/
   +-------------------------------------------------------------------+
   | Partition Text Samples into units. TTU[i]= TYPE i units.          |
   |                                                                   |
   |[U R TYPE LEN][{TOTAL,THIS}SIDX{SDUR}{TLEN}{SLEN}][SampleContents] |
   |{..} means present if applicable, [..] means always present        |
   +-------------------------------------------------------------------+
                    \/                               \/
   +-------------------------------------------------------------------+
   |                     Aggregation (if possible)                     |
   +-------------------------------------------------------------------+
                    \/                               \/
   +-------------------------------------------------------------------+
   | RTP Entity adds and fills RTP header and Sends RTP packet, where  |
   | RTP packets according to this Payload Format =                    |
   | RTP packets carrying MPEG-4 Timed Text ES over RFC 3640           |
   +-------------------------------------------------------------------+

                   Figure 17. Relation to RFC 3640

Note: The use of RFC 3640 for transport of ISO/IEC 14496-17 data does not require any new SDP parameters or any new mode definition.

4.9. Relation to RFC 2793
RFC 2793 [22] and its revision, RFC 4103 [23], specify a protocol for enabling text conversation. Typical applications of that protocol are text communication terminals and text conferencing tools. Text session contents are specified in ITU-T Recommendation T.140 [24]. T.140 text is UTF-8 coded as specified in T.140 [24] with no extra framing. The T140block contains one or more T.140 code elements as specified in T.140. Code elements are control sequences such as "New Line", "Interrupt", "String Terminator", or "Start of String". Most T.140 code elements are single ISO 10646 [25] characters, but some are multiple character sequences. Each character is UTF-8 encoded [18] into one or more octets.

This payload format may also be used for conversational applications (even for instant messaging). However, this is not its main target. The differentiating feature of 3GPP Timed Text media format is that it allows text decoration. This is especially useful in multimedia presentations, karaoke, commercial banners, news tickers, clickable text strings, and captions. T.140 text contents used in RFC 2793 do not allow the use of text decoration. Furthermore, the conversational text RTP payload format recommends a method to include redundant text from already transmitted packets in order to reduce the risk of text loss caused by packet loss. Thereby payloads would include a redundant copy of the last payload sent. This payload format does not describe such a method, but this is also applicable here. As explained in Section 5, packet redundancy SHOULD be used, whenever possible. The aggregation guidelines in Section 4.6 allow redundant payloads.