Three classes of numbers are of interest: unsigned integers (uint), signed integers (two's complement, sint), and IEEE 754 binary floating point numbers (which are always signed). For each of these classes, there are multiple representation lengths in active use:
| Length ll | uint | sint | float |
|---|---|---|---|
| 0 | uint8 | sint8 | binary16 |
| 1 | uint16 | sint16 | binary32 |
| 2 | uint32 | sint32 | binary64 |
| 3 | uint64 | sint64 | binary128 |

Table 1: Length Values
Here, sintN stands for a signed integer of exactly N bits (for instance, sint16), and uintN stands for an unsigned integer of exactly N bits (for instance, uint32). The name binaryN stands for the number form of the same name defined in IEEE 754 [IEEE754].
Since one objective of these tags is to be able to directly ship the ArrayBuffers underlying the Typed Arrays without re-encoding them, and these may be either in big-endian (network byte order) or in little-endian form, we need to define tags for both variants.
In total, this leads to 24 variants. In the tag, we need to express the choice between integer and floating point, the signedness (for integers), the endianness, and one of the four length values.
In order to simplify implementation, a range of tags is being allocated that allows retrieving all this information from the bits of the tag: tag values from 64 to 87.
The value is split up into five bit fields: 0b010, f, s, e, and ll, as detailed in Table 2.
| Field | Use |
|---|---|
| 0b010 | the constant bits 0, 1, 0 |
| f | 0 for integer, 1 for float |
| s | 0 for float or unsigned integer, 1 for signed integer |
| e | 0 for big endian, 1 for little endian |
| ll | A number for the length (Table 1). |

Table 2: Bit Fields in the Low 8 Bits of the Tag
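As an illustration of the layout in Table 2, the following minimal Python sketch splits a tag number into its fields. The names `TagFields` and `split_tag` are hypothetical and chosen only for this example; the bit positions follow from reading the fields in the order 0b010, f, s, e, ll from the most significant bit of the byte downwards.

```python
from typing import NamedTuple


class TagFields(NamedTuple):
    """Hypothetical container for the fields of Table 2."""
    f: int   # 0 for integer, 1 for float
    s: int   # 0 for float or unsigned integer, 1 for signed integer
    e: int   # 0 for big endian, 1 for little endian
    ll: int  # length value from Table 1


def split_tag(tag: int) -> TagFields:
    """Split a tag in the allocated range 64..87 into f, s, e, and ll."""
    if not 64 <= tag <= 87:
        raise ValueError(f"tag {tag} is outside the range 64..87")
    return TagFields(
        f=(tag >> 4) & 1,   # bit 4
        s=(tag >> 3) & 1,   # bit 3
        e=(tag >> 2) & 1,   # bit 2
        ll=tag & 0b11,      # bits 1 and 0
    )
```

For example, `split_tag(77)` yields `f=0, s=1, e=1, ll=1`, which Table 1 and Table 2 together identify as little-endian sint16.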
The number of bytes in each array element can then be calculated by 2**(f + ll) (or 1 << (f + ll) in a typical programming language). (Notice that 0f, that is, the constant 0 bit followed by f, and ll are the two least significant bits of the upper and the lower 4-bit nibble of the byte, respectively.)
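The formula can be sketched in Python as follows. The function name `element_size` is an assumption made for this example; the field extraction relies on the bit layout of Table 2.

```python
def element_size(tag: int) -> int:
    """Return the number of bytes per array element for a tag in 64..87."""
    f = (tag >> 4) & 1   # 0 for integer, 1 for float
    ll = tag & 0b11      # length value from Table 1
    return 1 << (f + ll)
```

With this sketch, tag 64 (uint8) gives 1 byte, tag 70 (little-endian uint32) gives 4 bytes, and tag 87 (little-endian binary128) gives 16 bytes, matching Table 1.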
In the CBOR representation, the total number of elements in the array is not expressed explicitly but is implied from the length of the byte string and the length of each representation. It can be computed from the length, in bytes, of the byte string comprising the representation of the array by inverting the previous formula: bytelength >> (f + ll).
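A minimal Python sketch of this computation is shown below. The name `element_count` is hypothetical, and the check that rejects byte strings whose length is not a multiple of the element size is an assumption of this example rather than something stated in the text above.

```python
def element_count(tag: int, payload: bytes) -> int:
    """Return the number of array elements implied by the byte string length."""
    f = (tag >> 4) & 1
    ll = tag & 0b11
    shift = f + ll
    # Assumption of this sketch: a length that is not a whole number of
    # elements is treated as an error instead of being silently truncated.
    if len(payload) & ((1 << shift) - 1):
        raise ValueError("byte string length is not a multiple of the element size")
    return len(payload) >> shift
```

For instance, a 6-byte string under tag 69 (little-endian uint16) holds 3 elements, and a 16-byte string under tag 86 (little-endian binary64) holds 2.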
For the uint8/sint8 values, the endianness is redundant. Only the tag for the big-endian variant is used and assigned as such. The tag that would signify the little-endian variant of sint8 MUST NOT be used; its tag number is marked as reserved. As a special case, the tag that would signify the little-endian variant of uint8 is instead assigned to signify that the numbers in the array are using clamped conversion from integers, as described in more detail in Section 7.1.11 of the ES10 JavaScript specification (ToUint8Clamp) [ECMA-ES10]; the assumption here is that a program-internal representation of this array after decoding would be marked this way for further processing providing "roundtripping" of JavaScript-typed arrays through CBOR.
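The four 8-bit cases can be sketched in Python as follows. The constant names and the function `describe_8bit_tag` are hypothetical; the tag numbers are derived from the bit layout of Table 2 with ll = 0 and f = 0.

```python
# Bit pattern: 0b010, f, s, e, ll (underscores separate the fields).
UINT8 = 0b010_0_0_0_00              # 64: uint8
UINT8_CLAMPED = 0b010_0_0_1_00      # 68: would be little-endian uint8
SINT8 = 0b010_0_1_0_00              # 72: sint8
SINT8_LE_RESERVED = 0b010_0_1_1_00  # 76: would be little-endian sint8


def describe_8bit_tag(tag: int) -> str:
    """Name the 8-bit integer array kind for a tag, rejecting the reserved one."""
    if tag == SINT8_LE_RESERVED:
        raise ValueError("tag 76 is reserved and MUST NOT be used")
    names = {
        UINT8: "uint8",
        UINT8_CLAMPED: "uint8, clamped conversion",
        SINT8: "sint8",
    }
    if tag not in names:
        raise ValueError(f"tag {tag} is not an 8-bit integer array tag")
    return names[tag]
```

A decoder built along these lines could use the "clamped" result to pick a clamped internal array type, which is what allows the roundtripping described above.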
IEEE 754 binary floating-point numbers are always signed. Therefore, for the float variants (f == 1), there is no need to distinguish between signed and unsigned variants; the s bit is always zero. The tag numbers where s would be one (which would have tag values 88 to 95) remain free to use by other specifications.
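The two ranges can be checked with a short Python sketch that enumerates the tag values for f == 1. The variable names are assumptions made for this example; the bit positions follow Table 2.

```python
# Float tags: constant bits 0b010, f = 1, s = 0, then e and ll.
float_tags = [
    0b0101_0000 | (e << 2) | ll
    for e in (0, 1)
    for ll in range(4)
]

# The same patterns with s = 1, which are not assigned to typed arrays.
free_tags = [tag | (1 << 3) for tag in float_tags]

print(float_tags)  # [80, 81, 82, 83, 84, 85, 86, 87]
print(free_tags)   # [88, 89, 90, 91, 92, 93, 94, 95]
```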