10. References
10.1. Normative References
[ECMA262] European Computer Manufacturers Association, "ECMAScript Language Specification 5.1 Edition", ECMA Standard ECMA-262, June 2011, <http://www.ecma-international.org/publications/files/ecma-st/ECMA-262.pdf>.
[RFC2045] Freed, N. and N. Borenstein, "Multipurpose Internet Mail Extensions (MIME) Part One: Format of Internet Message Bodies", RFC 2045, November 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3339] Klyne, G., Ed. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R., and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC4287] Nottingham, M., Ed. and R. Sayre, Ed., "The Atom Syndication Format", RFC 4287, December 2005.
[RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data Encodings", RFC 4648, October 2006.
[RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008.
[TIME_T] The Open Group Base Specifications, "Vol. 1: Base Definitions, Issue 7", Section 4.15 'Seconds Since the Epoch', IEEE Std 1003.1, 2013 Edition, 2013, <http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap04.html#tag_04_15>.
10.2. Informative References
[ASN.1] International Telecommunication Union, "Information Technology -- ASN.1 encoding rules: Specification of Basic Encoding Rules (BER), Canonical Encoding Rules (CER) and Distinguished Encoding Rules (DER)", ITU-T Recommendation X.690, 1994.
[BSON] Various, "BSON - Binary JSON", 2013, <http://bsonspec.org/>.
[CNN-TERMS] Bormann, C., Ersue, M., and A. Keranen, "Terminology for Constrained Node Networks", Work in Progress, July 2013.
[MessagePack] Furuhashi, S., "MessagePack", 2013, <http://msgpack.org/>.
[RFC0713] Haverty, J., "MSDTP-Message Services Data Transmission Protocol", RFC 713, April 1976.
[RFC4627] Crockford, D., "The application/json Media Type for JavaScript Object Notation (JSON)", RFC 4627, July 2006.
[RFC6838] Freed, N., Klensin, J., and T. Hansen, "Media Type Specifications and Registration Procedures", BCP 13, RFC 6838, January 2013.
[UBJSON] The Buzz Media, "Universal Binary JSON Specification", 2013, <http://ubjson.org/>.
[YAML] Ben-Kiki, O., Evans, C., and I. Net, "YAML Ain't Markup Language (YAML[TM]) Version 1.2", 3rd Edition, October 2009, <http://www.yaml.org/spec/1.2/spec.html>.
Appendix A. Examples
The following table provides some CBOR-encoded values in hexadecimal (right column), together with diagnostic notation for these values (left column). Note that the string "\u00fc" is one form of diagnostic notation for a UTF-8 string containing the single Unicode character U+00FC, LATIN SMALL LETTER U WITH DIAERESIS (u umlaut). Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, often representing "water"), and "\ud800\udd51" is a UTF-8 string in diagnostic notation with a single character U+10151 (GREEK ACROPHONIC ATTIC FIFTY STATERS). (Note that all these single-character strings could also be represented in native UTF-8 in diagnostic notation, just not in an ASCII-only specification like the present one.)
In the diagnostic notation provided for bignums, their intended numeric value is shown as a decimal number (such as 18446744073709551616) instead of showing a tagged byte string (such as 2(h'010000000000000000')).
+------------------------------+------------------------------------+
| Diagnostic | Encoded |
+------------------------------+------------------------------------+
| 0 | 0x00 |
| | |
| 1 | 0x01 |
| | |
| 10 | 0x0a |
| | |
| 23 | 0x17 |
| | |
| 24 | 0x1818 |
| | |
| 25 | 0x1819 |
| | |
| 100 | 0x1864 |
| | |
| 1000 | 0x1903e8 |
| | |
| 1000000 | 0x1a000f4240 |
| | |
| 1000000000000 | 0x1b000000e8d4a51000 |
| | |
| 18446744073709551615 | 0x1bffffffffffffffff |
| | |
| 18446744073709551616 | 0xc249010000000000000000 |
| | |
| -18446744073709551616 | 0x3bffffffffffffffff |
| | |
| -18446744073709551617 | 0xc349010000000000000000 |
| | |
| -1 | 0x20 |
| | |
| -10 | 0x29 |
| | |
| -100 | 0x3863 |
| | |
| -1000 | 0x3903e7 |
| | |
| 0.0 | 0xf90000 |
| | |
| -0.0 | 0xf98000 |
| | |
| 1.0 | 0xf93c00 |
| | |
| 1.1 | 0xfb3ff199999999999a |
| | |
| 1.5 | 0xf93e00 |
| | |
| 65504.0 | 0xf97bff |
| | |
| 100000.0 | 0xfa47c35000 |
| | |
| 3.4028234663852886e+38 | 0xfa7f7fffff |
| | |
| 1.0e+300 | 0xfb7e37e43c8800759c |
| | |
| 5.960464477539063e-8 | 0xf90001 |
| | |
| 0.00006103515625 | 0xf90400 |
| | |
| -4.0 | 0xf9c400 |
| | |
| -4.1 | 0xfbc010666666666666 |
| | |
| Infinity | 0xf97c00 |
| | |
| NaN | 0xf97e00 |
| | |
| -Infinity | 0xf9fc00 |
| | |
| Infinity | 0xfa7f800000 |
| | |
| NaN | 0xfa7fc00000 |
| | |
| -Infinity | 0xfaff800000 |
| | |
| Infinity | 0xfb7ff0000000000000 |
| | |
| NaN | 0xfb7ff8000000000000 |
| | |
| -Infinity | 0xfbfff0000000000000 |
| | |
| false | 0xf4 |
| | |
| true | 0xf5 |
| | |
| null | 0xf6 |
| | |
| undefined | 0xf7 |
| | |
| simple(16) | 0xf0 |
| | |
| simple(24) | 0xf818 |
| | |
| simple(255) | 0xf8ff |
| | |
| 0("2013-03-21T20:04:00Z") | 0xc074323031332d30332d32315432303a |
| | 30343a30305a |
| | |
| 1(1363896240) | 0xc11a514b67b0 |
| | |
| 1(1363896240.5) | 0xc1fb41d452d9ec200000 |
| | |
| 23(h'01020304') | 0xd74401020304 |
| | |
| 24(h'6449455446') | 0xd818456449455446 |
| | |
| 32("http://www.example.com") | 0xd82076687474703a2f2f7777772e6578 |
| | 616d706c652e636f6d |
| | |
| h'' | 0x40 |
| | |
| h'01020304' | 0x4401020304 |
| | |
| "" | 0x60 |
| | |
| "a" | 0x6161 |
| | |
| "IETF" | 0x6449455446 |
| | |
| "\"\\" | 0x62225c |
| | |
| "\u00fc" | 0x62c3bc |
| | |
| "\u6c34" | 0x63e6b0b4 |
| | |
| "\ud800\udd51" | 0x64f0908591 |
| | |
| [] | 0x80 |
| | |
| [1, 2, 3] | 0x83010203 |
| | |
| [1, [2, 3], [4, 5]] | 0x8301820203820405 |
| | |
| [1, 2, 3, 4, 5, 6, 7, 8, 9, | 0x98190102030405060708090a0b0c0d0e |
| 10, 11, 12, 13, 14, 15, 16, | 0f101112131415161718181819 |
| 17, 18, 19, 20, 21, 22, 23, | |
| 24, 25] | |
| | |
| {} | 0xa0 |
| | |
| {1: 2, 3: 4} | 0xa201020304 |
| | |
| {"a": 1, "b": [2, 3]} | 0xa26161016162820203 |
| | |
| ["a", {"b": "c"}] | 0x826161a161626163 |
| | |
| {"a": "A", "b": "B", "c": | 0xa5616161416162614261636143616461 |
| "C", "d": "D", "e": "E"} | 4461656145 |
| | |
| (_ h'0102', h'030405') | 0x5f42010243030405ff |
| | |
| (_ "strea", "ming") | 0x7f657374726561646d696e67ff |
| | |
| [_ ] | 0x9fff |
| | |
| [_ 1, [2, 3], [_ 4, 5]] | 0x9f018202039f0405ffff |
| | |
| [_ 1, [2, 3], [4, 5]] | 0x9f01820203820405ff |
| | |
| [1, [2, 3], [_ 4, 5]] | 0x83018202039f0405ff |
| | |
| [1, [_ 2, 3], [4, 5]] | 0x83019f0203ff820405 |
| | |
| [_ 1, 2, 3, 4, 5, 6, 7, 8, | 0x9f0102030405060708090a0b0c0d0e0f |
| 9, 10, 11, 12, 13, 14, 15, | 101112131415161718181819ff |
| 16, 17, 18, 19, 20, 21, 22, | |
| 23, 24, 25] | |
| | |
| {_ "a": 1, "b": [_ 2, 3]} | 0xbf61610161629f0203ffff |
| | |
| ["a", {_ "b": "c"}] | 0x826161bf61626163ff |
| | |
| {_ "Fun": true, "Amt": -2} | 0xbf6346756ef563416d7421ff |
+------------------------------+------------------------------------+
Table 4: Examples of Encoded CBOR Data Items
Appendix B. Jump Table
For brevity, this jump table does not show initial bytes that are reserved for future extension. It also only shows a selection of the initial bytes that can be used for optional features. (All unsigned integers are in network byte order.)
+-----------------+-------------------------------------------------+
| Byte | Structure/Semantics |
+-----------------+-------------------------------------------------+
| 0x00..0x17 | Integer 0x00..0x17 (0..23) |
| | |
| 0x18 | Unsigned integer (one-byte uint8_t follows) |
| | |
| 0x19 | Unsigned integer (two-byte uint16_t follows) |
| | |
| 0x1a | Unsigned integer (four-byte uint32_t follows) |
| | |
| 0x1b | Unsigned integer (eight-byte uint64_t follows) |
| | |
| 0x20..0x37 | Negative integer -1-0x00..-1-0x17 (-1..-24) |
| | |
| 0x38 | Negative integer -1-n (one-byte uint8_t for n |
| | follows) |
| | |
| 0x39 | Negative integer -1-n (two-byte uint16_t for n |
| | follows) |
| | |
| 0x3a | Negative integer -1-n (four-byte uint32_t for n |
| | follows) |
| | |
| 0x3b | Negative integer -1-n (eight-byte uint64_t for |
| | n follows) |
| | |
| 0x40..0x57 | byte string (0x00..0x17 bytes follow) |
| | |
| 0x58 | byte string (one-byte uint8_t for n, and then n |
| | bytes follow) |
| | |
| 0x59 | byte string (two-byte uint16_t for n, and then |
| | n bytes follow) |
| | |
| 0x5a | byte string (four-byte uint32_t for n, and then |
| | n bytes follow) |
| | |
| 0x5b | byte string (eight-byte uint64_t for n, and |
| | then n bytes follow) |
| | |
| 0x5f | byte string, byte strings follow, terminated by |
| | "break" |
| | |
| 0x60..0x77 | UTF-8 string (0x00..0x17 bytes follow) |
| | |
| 0x78 | UTF-8 string (one-byte uint8_t for n, and then |
| | n bytes follow) |
| | |
| 0x79 | UTF-8 string (two-byte uint16_t for n, and then |
| | n bytes follow) |
| | |
| 0x7a | UTF-8 string (four-byte uint32_t for n, and |
| | then n bytes follow) |
| | |
| 0x7b | UTF-8 string (eight-byte uint64_t for n, and |
| | then n bytes follow) |
| | |
| 0x7f | UTF-8 string, UTF-8 strings follow, terminated |
| | by "break" |
| | |
| 0x80..0x97 | array (0x00..0x17 data items follow) |
| | |
| 0x98 | array (one-byte uint8_t for n, and then n data |
| | items follow) |
| | |
| 0x99 | array (two-byte uint16_t for n, and then n data |
| | items follow) |
| | |
| 0x9a | array (four-byte uint32_t for n, and then n |
| | data items follow) |
| | |
| 0x9b | array (eight-byte uint64_t for n, and then n |
| | data items follow) |
| | |
| 0x9f | array, data items follow, terminated by "break" |
| | |
| 0xa0..0xb7 | map (0x00..0x17 pairs of data items follow) |
| | |
| 0xb8 | map (one-byte uint8_t for n, and then n pairs |
| | of data items follow) |
| | |
| 0xb9 | map (two-byte uint16_t for n, and then n pairs |
| | of data items follow) |
| | |
| 0xba | map (four-byte uint32_t for n, and then n pairs |
| | of data items follow) |
| | |
| 0xbb | map (eight-byte uint64_t for n, and then n |
| | pairs of data items follow) |
| | |
| 0xbf | map, pairs of data items follow, terminated by |
| | "break" |
| | |
| 0xc0 | Text-based date/time (data item follows; see |
| | Section 2.4.1) |
| | |
| 0xc1 | Epoch-based date/time (data item follows; see |
| | Section 2.4.1) |
| | |
| 0xc2 | Positive bignum (data item "byte string" |
| | follows) |
| | |
| 0xc3 | Negative bignum (data item "byte string" |
| | follows) |
| | |
| 0xc4 | Decimal Fraction (data item "array" follows; |
| | see Section 2.4.3) |
| | |
| 0xc5 | Bigfloat (data item "array" follows; see |
| | Section 2.4.3) |
| | |
| 0xc6..0xd4 | (tagged item) |
| | |
| 0xd5..0xd7 | Expected Conversion (data item follows; see |
| | Section 2.4.4.2) |
| | |
| 0xd8..0xdb | (more tagged items, 1/2/4/8 bytes and then a |
| | data item follow) |
| | |
| 0xe0..0xf3 | (simple value) |
| | |
| 0xf4 | False |
| | |
| 0xf5 | True |
| | |
| 0xf6 | Null |
| | |
| 0xf7 | Undefined |
| | |
| 0xf8 | (simple value, one byte follows) |
| | |
| 0xf9 | Half-Precision Float (two-byte IEEE 754) |
| | |
| 0xfa | Single-Precision Float (four-byte IEEE 754) |
| | |
| 0xfb | Double-Precision Float (eight-byte IEEE 754) |
| | |
| 0xff | "break" stop code |
+-----------------+-------------------------------------------------+
Table 5: Jump Table for Initial Byte
Appendix C. Pseudocode
The well-formedness of a CBOR item can be checked by the pseudocode in Figure 1. The data is well-formed if and only if:
o the pseudocode does not "fail";
o after execution of the pseudocode, no bytes are left in the input (except in streaming applications).
The pseudocode has the following prerequisites:
o take(n) reads n bytes from the input data and returns them as a byte string. If n bytes are no longer available, take(n) fails.
o uint() converts a byte string into an unsigned integer by interpreting the byte string in network byte order.
o Arithmetic works as in C.
o All variables are unsigned integers of sufficient range.

   well_formed (breakable = false) {
     // process initial bytes
     ib = uint(take(1));
     mt = ib >> 5;
     val = ai = ib & 0x1f;
     switch (ai) {
       case 24: val = uint(take(1)); break;
       case 25: val = uint(take(2)); break;
       case 26: val = uint(take(4)); break;
       case 27: val = uint(take(8)); break;
       case 28: case 29: case 30: fail();
       case 31:
         return well_formed_indefinite(mt, breakable);
     }
     // process content
     switch (mt) {
       // case 0, 1, 7 do not have content; just use val
       case 2: case 3: take(val); break; // bytes/UTF-8
       case 4: for (i = 0; i < val; i++) well_formed(); break;
       case 5: for (i = 0; i < val*2; i++) well_formed(); break;
       case 6: well_formed(); break;     // 1 embedded data item
     }
     return mt;                    // finite data item
   }

   well_formed_indefinite(mt, breakable) {
     switch (mt) {
       case 2: case 3:
         while ((it = well_formed(true)) != -1)
           if (it != mt)           // need finite embedded
             fail();               //    of same type
         break;
       case 4: while (well_formed(true) != -1); break;
       case 5: while (well_formed(true) != -1) well_formed(); break;
       case 7:
         if (breakable)
           return -1;              // signal break out
         else fail();              // no enclosing indefinite
       default: fail();            // wrong mt
     }
     return 0;                     // no break out
   }

Figure 1: Pseudocode for Well-Formedness Check

Note that the remaining complexity of a complete CBOR decoder is about presenting data that has been parsed to the application in an appropriate form.
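As an illustration only, the checks in Figure 1 could be transcribed into runnable Python roughly as follows. The names Reader, NotWellFormed, BREAK, and is_well_formed are assumptions made for this sketch and do not appear in this specification. take() is modeled as a method that raises an exception where the pseudocode "fails", uint() is replaced by int.from_bytes, and trailing bytes are rejected (that is, the non-streaming case is assumed).

```python
class NotWellFormed(Exception):
    """Raised wherever the pseudocode in Figure 1 calls fail()."""


BREAK = -1  # returned when a "break" stop code ends an indefinite-length item


class Reader:
    """Hypothetical input wrapper providing take(n) over a bytes object."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise NotWellFormed("input exhausted")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk


def well_formed(r: Reader, breakable: bool = False) -> int:
    # Process the initial byte: major type and additional information.
    ib = r.take(1)[0]
    mt, ai = ib >> 5, ib & 0x1F
    val = ai
    if 24 <= ai <= 27:
        # 1, 2, 4, or 8 following bytes in network byte order.
        val = int.from_bytes(r.take(1 << (ai - 24)), "big")
    elif 28 <= ai <= 30:
        raise NotWellFormed("reserved additional information value")
    elif ai == 31:
        return well_formed_indefinite(r, mt, breakable)
    # Process content; major types 0, 1, and 7 have none.
    if mt in (2, 3):
        r.take(val)                      # byte string or UTF-8 string
    elif mt == 4:
        for _ in range(val):             # array: val data items
            well_formed(r)
    elif mt == 5:
        for _ in range(val * 2):         # map: val pairs of data items
            well_formed(r)
    elif mt == 6:
        well_formed(r)                   # tag: one embedded data item
    return mt                            # finite data item


def well_formed_indefinite(r: Reader, mt: int, breakable: bool) -> int:
    if mt in (2, 3):
        while True:
            it = well_formed(r, True)
            if it == BREAK:
                break
            if it != mt:                 # chunks must be finite, same type
                raise NotWellFormed("bad chunk in indefinite-length string")
    elif mt == 4:
        while well_formed(r, True) != BREAK:
            pass
    elif mt == 5:
        while well_formed(r, True) != BREAK:
            well_formed(r)               # the value cannot be a "break"
    elif mt == 7:
        if breakable:
            return BREAK                 # signal break out
        raise NotWellFormed("break without enclosing indefinite-length item")
    else:
        raise NotWellFormed("indefinite length not allowed for major type")
    return 0                             # no break out


def is_well_formed(data: bytes) -> bool:
    """True if data is exactly one well-formed item with no bytes left over."""
    r = Reader(data)
    try:
        well_formed(r)
    except NotWellFormed:
        return False
    return r.pos == len(data)
```

For example, is_well_formed(bytes.fromhex("9f018202039f0405ffff")) accepts the encoding of [_ 1, [2, 3], [_ 4, 5]] from Appendix A, while a lone 0xff byte is rejected because no indefinite-length item encloses the "break".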
Major types 0 and 1 are designed in such a way that they can be encoded in C from a signed integer without actually doing an if-then-else for positive/negative (Figure 2). This uses the fact that (-1-n), the transformation for major type 1, is the same as ~n (bitwise complement) in C unsigned arithmetic; ~n can then be expressed as (-1)^n for the negative case, while 0^n leaves n unchanged for non-negative. The sign of a number can be converted to -1 for negative and 0 for non-negative (0 or positive) by arithmetic-shifting the number by one bit less than the bit length of the number (for example, by 63 for 64-bit numbers).

   void encode_sint(int64_t n) {
     uint64_t ui = n >> 63;    // extend sign to whole length
     mt = ui & 0x20;           // extract major type
     ui ^= n;                  // complement negatives
     if (ui < 24)
       *p++ = mt + ui;
     else if (ui < 256) {
       *p++ = mt + 24;
       *p++ = ui;
     } else
          ...

Figure 2: Pseudocode for Encoding a Signed Integer
Appendix D. Half-Precision
As half-precision floating-point numbers were only added to IEEE 754 in 2008, today's programming platforms often still only have limited support for them. It is very easy to include at least decoding support for them even without such support. An example of a small decoder for half-precision floating-point numbers in the C language is shown in Figure 3. A similar program for Python is in Figure 4; this code assumes that the 2-byte value has already been decoded as an (unsigned short) integer in network byte order (as would be done by the pseudocode in Appendix C).
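To make the bit layout concrete, the following Python sketch splits the 16-bit value into its sign (1 bit), exponent (5 bits), and mantissa (10 bits) fields, using the same case analysis as the C decoder in Figure 3. The function names decode_half_fields and decode_half_native are assumptions of this sketch. On Python 3.6 and later, the struct module's "e" format decodes half-precision values natively, which is used here only as an independent cross-check.

```python
import struct
from math import inf, isnan, ldexp, nan


def decode_half_fields(half: int) -> float:
    """Decode a 16-bit half-precision value given as an unsigned integer."""
    exp = (half >> 10) & 0x1F        # 5-bit biased exponent
    mant = half & 0x3FF              # 10-bit mantissa
    if exp == 0:
        val = ldexp(mant, -24)       # zero or subnormal
    elif exp != 31:
        val = ldexp(mant + 1024, exp - 25)  # normal: implicit leading 1 bit
    else:
        val = inf if mant == 0 else nan
    return -val if half & 0x8000 else val


def decode_half_native(half: int) -> float:
    """Cross-check using struct's "e" format (Python 3.6 or later)."""
    return struct.unpack("!e", struct.pack("!H", half))[0]


# Values taken from the table in Appendix A.
for encoded in (0x3C00, 0x3E00, 0x7BFF, 0x0001, 0x0400, 0xC400):
    print(hex(encoded), decode_half_fields(encoded))
```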

   #include <math.h>

   double decode_half(unsigned char *halfp) {
     int half = (halfp[0] << 8) + halfp[1];
     int exp = (half >> 10) & 0x1f;
     int mant = half & 0x3ff;
     double val;
     if (exp == 0) val = ldexp(mant, -24);
     else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
     else val = mant == 0 ? INFINITY : NAN;
     return half & 0x8000 ? -val : val;
   }

Figure 3: C Code for a Half-Precision Decoder

   import struct
   from math import ldexp

   def decode_single(single):
       return struct.unpack("!f", struct.pack("!I", single))[0]

   def decode_half(half):
       valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
       if ((half & 0x7c00) != 0x7c00):
           return ldexp(decode_single(valu), 112)
       return decode_single(valu | 0x7f800000)

Figure 4: Python Code for a Half-Precision Decoder
Appendix E. Comparison of Other Binary Formats to CBOR's Design Objectives
The proposal for CBOR follows a history of binary formats that is as long as the history of computers themselves. Different formats have had different objectives. In most cases, the objectives of the format were never stated, although they can sometimes be implied by the context where the format was first used. Some formats were meant to be universally usable, although history has proven that no binary format meets the needs of all protocols and applications. CBOR differs from many of these formats in that it starts with a set of objectives and attempts to meet just those. This section compares a few of the dozens of formats with CBOR's objectives in order to help the reader decide if they want to use CBOR or a different format for a particular protocol or application.
Note that the discussion here is not meant to be a criticism of any format: to the best of our knowledge, no format before CBOR was meant to cover CBOR's objectives in the priority we have assigned them. A brief recap of the objectives from Section 1.1 is:
1. unambiguous encoding of most common data formats from Internet standards
2. code compactness for encoder or decoder
3. no schema description needed
4. reasonably compact serialization
5. applicability to constrained and unconstrained applications
6. good JSON conversion
7. extensibility
E.1. ASN.1 DER, BER, and PER
[ASN.1] has many serializations. In the IETF, DER and BER are the most common. The serialized output is not particularly compact for many items, and the code needed to decode numeric items can be complex on a constrained device. Few (if any) IETF protocols have adopted one of the several variants of Packed Encoding Rules (PER). There could be many reasons for this, but one that is commonly stated is that PER makes use of the schema even for parsing the surface structure of the data stream, requiring significant tool support. There are different versions of the ASN.1 schema language in use, which has also hampered adoption.
E.2. MessagePack
[MessagePack] is a concise, widely implemented counted binary serialization format, similar in many properties to CBOR, although somewhat less regular. While the data model can be used to represent JSON data, MessagePack has also been used in many remote procedure call (RPC) applications and for long-term storage of data. MessagePack has been essentially stable since it was first published around 2011; it has not yet had a transition. The evolution of MessagePack is impeded by an imperative to maintain complete backwards compatibility with existing stored data, while only a few bytecodes are still available for extension. Repeated requests over the years from the MessagePack user community to separate out binary and text strings in the encoding have recently led to an extension proposal that would leave MessagePack's "raw" data ambiguous between its usages for binary and text data. The extension mechanism for MessagePack remains unclear.
E.3. BSON
[BSON] is a data format that was developed for the storage of JSON-like maps (JSON objects) in the MongoDB database. Its major distinguishing feature is the capability for in-place update, forgoing a compact representation. BSON uses a counted representation except for map keys, which are null-byte terminated. While BSON can be used for the representation of JSON-like objects on the wire, its specification is dominated by the requirements of the database application and has become somewhat baroque. The status of how BSON extensions will be implemented remains unclear.
E.4. UBJSON
[UBJSON] has a design goal to make JSON faster and somewhat smaller, using a binary format that is limited to exactly the data model JSON uses. Thus, there is expressly no intention to support, for example, binary data; however, there is a "high-precision number", expressed as a character string in JSON syntax. UBJSON is not optimized for code compactness, and its type byte coding is optimized for human recognition and not for compact representation of native types such as small integers. Although UBJSON is mostly counted, it provides a reserved "unknown-length" value to support streaming of arrays and maps (JSON objects). Within these containers, UBJSON also has a "Noop" type for padding.
E.5. MSDTP: RFC 713
Message Services Data Transmission Protocol (MSDTP) is a very early example of a compact message format; it is described in [RFC0713], written in 1976. It is included here for its historical value, not because it was ever widely used.
E.6. Conciseness on the Wire
While CBOR's design objective of code compactness for encoders and decoders is a higher priority than its objective of conciseness on the wire, many people focus on the wire size. Table 6 shows some encoding examples for the simple nested array [1, [2, 3]]; where some form of indefinite-length encoding is supported by the encoding, [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.
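The CBOR entries for this example can be reproduced with a very small encoder. The following Python sketch handles only unsigned integers (major type 0) and arrays (major type 4), which is all the example needs; encode_head, encode, and the indefinite_outer parameter are names chosen here for illustration and are not defined by this specification.

```python
def encode_head(major_type: int, value: int) -> bytes:
    """Return the initial byte plus any following bytes that carry value."""
    if value < 24:
        # Small values fit directly into the additional information bits.
        return bytes([(major_type << 5) | value])
    # Additional information 24..27 announces a 1-, 2-, 4-, or 8-byte value.
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major_type << 5) | ai]) + value.to_bytes(size, "big")
    raise ValueError("value does not fit in 64 bits")


def encode(item, indefinite_outer: bool = False) -> bytes:
    """Encode a non-negative int or a (possibly nested) list of them."""
    if isinstance(item, int) and not isinstance(item, bool) and item >= 0:
        return encode_head(0, item)              # major type 0
    if isinstance(item, list):
        body = b"".join(encode(element) for element in item)
        if indefinite_outer:
            # 0x9f opens an indefinite-length array; 0xff is the "break".
            return b"\x9f" + body + b"\xff"
        return encode_head(4, len(item)) + body  # major type 4
    raise TypeError("this sketch only handles unsigned integers and lists")


print(encode([1, [2, 3]]).hex())                         # 8201820203
print(encode([1, [2, 3]], indefinite_outer=True).hex())  # 9f01820203ff
```

Only the outermost array is made indefinite-length here, matching the [_ 1, [2, 3]] form of the example.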
+---------------+-------------------------+-------------------------+
| Format        | [1, [2, 3]]             | [_ 1, [2, 3]]           |
+---------------+-------------------------+-------------------------+
| RFC 713       | c2 05 81 c2 02 82 83    |                         |
|               |                         |                         |
| ASN.1 BER     | 30 0b 02 01 01 30 06 02 | 30 80 02 01 01 30 06 02 |
|               | 01 02 02 01 03          | 01 02 02 01 03 00 00    |
|               |                         |                         |
| MessagePack   | 92 01 92 02 03          |                         |
|               |                         |                         |
| BSON          | 22 00 00 00 10 30 00 01 |                         |
|               | 00 00 00 04 31 00 13 00 |                         |
|               | 00 00 10 30 00 02 00 00 |                         |
|               | 00 10 31 00 03 00 00 00 |                         |
|               | 00 00                   |                         |
|               |                         |                         |
| UBJSON        | 61 02 42 01 61 02 42 02 | 61 ff 42 01 61 02 42 02 |
|               | 42 03                   | 42 03 45                |
|               |                         |                         |
| CBOR          | 82 01 82 02 03          | 9f 01 82 02 03 ff       |
+---------------+-------------------------+-------------------------+
Table 6: Examples for Different Levels of Conciseness
Authors' Addresses
Carsten Bormann
Universitaet Bremen TZI
Postfach 330440
D-28359 Bremen
Germany
Phone: +49-421-218-63921
EMail: cabo@tzi.org

Paul Hoffman
VPN Consortium
EMail: paul.hoffman@vpnc.org