As noted in the previous section, hash functions can be used for a variety of purposes. Some of these purposes require that a hash function be cryptographically strong. These include direct and indirect signatures -- that is, using the hash as part of the signature or using the hash as part of the body to be signed. Other uses of hash functions may not require the same level of strength.
This document contains some hash functions that are not designed to be used for cryptographic operations. An application that is using a hash function needs to carefully evaluate exactly what hash properties are needed and which hash functions are going to provide them. Applications should also make sure that the ability to change hash functions is part of the base design, as cryptographic advances are sure to reduce the strength of any given hash function [BCP201].
A hash function is a map from one, normally large, bit string to a second, usually smaller, bit string. As the number of possible input values is far greater than the number of possible output values, it is inevitable that there are going to be collisions. The trick is to make sure that it is difficult to find two values that are going to map to the same output value. A "Collision Attack" is one where an attacker can find two different messages that have the same hash value. A hash function that is susceptible to practical collision attacks SHOULD NOT be used for a cryptographic purpose. The discovery of theoretical collision attacks against a given hash function SHOULD trigger protocol maintainers and users to review the continued suitability of the algorithm if alternatives are available and migration is viable. The only acceptable reason to use such a hash function is that there is absolutely no other choice (e.g., a Hardware Security Module (HSM) that cannot be replaced), and even then only after the possible security issues have been examined. Cryptographic purposes include the creation of signatures and the use of hashes for indirect signatures. These functions may still be usable for noncryptographic purposes.
An example of a noncryptographic use of a hash is filtering a collection of values to find a set of possible candidates; the candidates can then be checked to see if they can successfully be used. A simple example of this is the classic fingerprint of a certificate. If the fingerprint is used to verify that a certificate is the correct one, then that usage is a cryptographic one and is subject to the warning above about collision attacks. If, however, the fingerprint is used to sort through a collection of certificates to find those that might be used for the purpose of verifying a signature, a simple filter capability is sufficient. In this case, one still needs to confirm that the public key validates the signature (and that the certificate is trusted), and all certificates that don't contain a key that validates the signature can be discarded as false positives.
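The two-step certificate lookup described above can be sketched as follows. This is a minimal illustration, not a normative procedure: it assumes a 64-bit fingerprint formed by taking the leftmost 64 bits of SHA-256 (the construction assumed here for SHA-256/64), and `verifies` is a hypothetical callback standing in for real signature validation and trust checking, which the standard library does not provide.

```python
import hashlib
from typing import Callable, Iterable, List


def fingerprint64(cert_der: bytes) -> bytes:
    """Leftmost 64 bits of SHA-256 (assumed here to match SHA-256/64)."""
    return hashlib.sha256(cert_der).digest()[:8]


def find_signing_certs(
    certs: Iterable[bytes],
    wanted_fp: bytes,
    verifies: Callable[[bytes], bool],
) -> List[bytes]:
    # Step 1 (filter only): keep certificates whose short fingerprint matches.
    # Unrelated certificates may also match; that is acceptable at this stage.
    candidates = [c for c in certs if fingerprint64(c) == wanted_fp]
    # Step 2 (the cryptographic check): `verifies` is a hypothetical callback
    # standing in for signature validation plus trust checks. Candidates that
    # fail it are discarded as false positives.
    return [c for c in candidates if verifies(c)]


certs = [b"cert-A", b"cert-B", b"cert-C"]
wanted = fingerprint64(b"cert-B")
print(find_signing_certs(certs, wanted, lambda c: c == b"cert-B"))
```

The fingerprint only narrows the search; the security of the result rests entirely on the second step, which is why a weak or truncated hash is tolerable in the first.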
To distinguish between these two cases, a new value in the Recommended column of the "COSE Algorithms" registry has been added. "Filter Only" indicates that the only purpose of a hash function should be to filter results; it is not intended for applications that require a cryptographically strong algorithm.
[COSE] did not provide a default structure for holding a hash value both because no separate hash algorithms were defined and because the way the structure is set up is frequently application specific. There are four fields that are often included as part of a hash structure:
- The hash algorithm identifier.
- The hash value.
- A pointer to the value that was hashed. This could be a pointer to a file, an object that can be obtained from the network, a pointer to someplace in the message, or something very application specific.
- Additional data. This can be something as simple as a random value (i.e., salt) to make finding hash collisions slightly harder (because the payload handed to the application could have been selected to have a collision), or as complicated as a set of processing instructions that is used with the object that is pointed to. The additional data can be handled in a number of ways, such as prepending or appending it to the content, but it is strongly suggested that either the additional data have a fixed, known size or the lengths of the pieces being hashed be included, so that the resulting byte string has a unique interpretation as additional data plus content. (Encoding as a CBOR array accomplishes this requirement.)
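Why the length (or a fixed size) matters can be shown with a short sketch. Because the standard library has no CBOR encoder, this example assumes a simple stand-in framing, an 8-byte big-endian length before each piece, rather than the CBOR array encoding mentioned above; the function names are hypothetical.

```python
import hashlib
import struct


def naive_hash(additional: bytes, content: bytes) -> bytes:
    # Plain concatenation: the boundary between the two pieces is lost.
    return hashlib.sha256(additional + content).digest()


def framed_hash(additional: bytes, content: bytes) -> bytes:
    # Prefix each piece with its length (8-byte big-endian) so the hashed
    # byte string can be split back into (additional, content) only one way.
    h = hashlib.sha256()
    for piece in (additional, content):
        h.update(struct.pack(">Q", len(piece)))
        h.update(piece)
    return h.digest()


# Two different (additional, content) pairs that concatenate to b"saltdata":
assert naive_hash(b"salt", b"data") == naive_hash(b"sal", b"tdata")
assert framed_hash(b"salt", b"data") != framed_hash(b"sal", b"tdata")
```

With naive concatenation, an attacker who controls part of the input can move bytes between the additional data and the content without changing the hash; the framing removes that freedom.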
An example of a structure that permits all of the above fields to exist would look like the following:
    COSE_Hash_V = (
        1 : int / tstr, # Algorithm identifier
        2 : bstr, # Hash value
        ? 3 : tstr, # Location of object that was hashed
        ? 4 : any # object containing other details and things
    )
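A sketch of filling in and checking a COSE_Hash_V-shaped value follows, using a Python dict with the integer keys from the structure above. It stops before CBOR encoding (which would need a third-party CBOR library), handles only integer algorithm identifiers even though the structure also allows text strings, and assumes the registry values -16 for SHA-256 and -44 for SHA-512. The location URL in the usage line is hypothetical.

```python
import hashlib
import hmac
from typing import Any, Dict, Optional

# Algorithm identifiers assumed from the "COSE Algorithms" registry.
HASH_FUNCS = {
    -16: hashlib.sha256,  # SHA-256
    -44: hashlib.sha512,  # SHA-512
}


def make_hash_v(alg: int, data: bytes, location: Optional[str] = None,
                details: Any = None) -> Dict[int, Any]:
    """Build a dict mirroring COSE_Hash_V; optional keys 3 and 4 are omitted if unused."""
    hv: Dict[int, Any] = {1: alg, 2: HASH_FUNCS[alg](data).digest()}
    if location is not None:
        hv[3] = location
    if details is not None:
        hv[4] = details
    return hv


def check_hash_v(hv: Dict[int, Any], data: bytes) -> bool:
    """Recompute the hash named by key 1 and compare it with key 2."""
    func = HASH_FUNCS.get(hv[1])
    if func is None:
        return False  # unknown or unsupported algorithm: cannot check
    return hmac.compare_digest(func(data).digest(), hv[2])


hv = make_hash_v(-16, b"payload", location="https://example.com/payload.bin")
print(check_hash_v(hv, b"payload"), check_hash_v(hv, b"tampered"))
```

Keeping the algorithm identifier next to the value is what lets an application switch hash functions later without changing the structure.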
Below is an alternative structure that could be used in situations where one is searching a group of objects for a matching hash value. In this case, the location would not be needed, and adding extra data to the hash would be counterproductive. This results in a structure that looks like this:
    COSE_Hash_Find = [
        hashAlg : int / tstr,
        hashValue : bstr
    ]
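Searching a group of objects with a COSE_Hash_Find value might look like the sketch below. It models the array as a two-element Python list, supports only the integer identifiers assumed here (-16 for SHA-256, -44 for SHA-512), and mixes no additional data into the hash, so each object's hash can be compared directly against hashValue. The helper name is hypothetical.

```python
import hashlib
from typing import Iterable, List, Sequence

# Algorithm identifiers assumed from the "COSE Algorithms" registry.
HASH_FUNCS = {-16: hashlib.sha256, -44: hashlib.sha512}


def find_matches(hash_find: Sequence, objects: Iterable[bytes]) -> List[bytes]:
    """Return every object whose hash equals hashValue in [hashAlg, hashValue]."""
    hash_alg, hash_value = hash_find
    func = HASH_FUNCS[hash_alg]  # raises KeyError for an unsupported algorithm
    return [obj for obj in objects if func(obj).digest() == hash_value]


objects = [b"alpha", b"beta", b"gamma"]
target = [-16, hashlib.sha256(b"beta").digest()]
print(find_matches(target, objects))
```

If the chosen algorithm is marked "Filter Only", any match found this way is still only a candidate and needs a separate check before it is relied upon.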