Internet Engineering Task Force (IETF)                         T. Hansen
Request for Comments: 5863                             AT&T Laboratories
Category: Informational                                        E. Siegel
ISSN: 2070-1721                                               Consultant
                                                         P. Hallam-Baker
                                             Default Deny Security, Inc.
                                                              D. Crocker
                                             Brandenburg InternetWorking
                                                                May 2010

DomainKeys Identified Mail (DKIM) Development, Deployment, and Operations

Abstract

DomainKeys Identified Mail (DKIM) allows an organization to claim responsibility for transmitting a message, in a way that can be validated by a recipient. The organization can be the author's, the originating sending site, an intermediary, or one of their agents. A message can contain multiple signatures, from the same or different organizations involved with the message. DKIM defines a domain-level digital signature authentication framework for email, using public key cryptography and using the domain name service as its key server technology. This permits verification of a responsible organization, as well as the integrity of the message content. DKIM will also provide a mechanism that permits potential email signers to publish information about their email signing practices; this will permit email receivers to make additional assessments about messages. DKIM's authentication of email identity can assist in the global control of "spam" and "phishing". This document provides implementation, deployment, operational, and migration considerations for DKIM.

Status of This Memo

This document is not an Internet Standards Track specification; it is published for informational purposes.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Not all documents approved by the IESG are a candidate for any level of Internet Standard; see Section 2 of RFC 5741.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5863.

Copyright Notice

Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

This document may contain material from IETF Documents or IETF Contributions published or made publicly available before November 10, 2008. The person(s) controlling the copyright in some of this material may not have granted the IETF Trust the right to allow modifications of such material outside the IETF Standards Process. Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English.

Table of Contents

   1. Introduction
   2. Using DKIM as Part of Trust Assessment
      2.1. A Systems View of Email Trust Assessment
      2.2. Choosing a DKIM Tag for the Assessment Identifier
      2.3. Choosing the Signing Domain Name
      2.4. Recipient-Based Assessments
      2.5. Filtering
   3. DKIM Key Generation, Storage, and Management
      3.1. Private Key Management: Deployment and Ongoing Operations
      3.2. Storing Public Keys: DNS Server Software Considerations
      3.3. Per-User Signing Key Management Issues
      3.4. Third-Party Signer Key Management and Selector Administration
      3.5. Key Pair / Selector Life Cycle Management
   4. Signing
      4.1. DNS Records
      4.2. Signing Module
      4.3. Signing Policies and Practices
   5. Verifying
      5.1. Intended Scope of Use
      5.2. Signature Scope
      5.3. Design Scope of Use
      5.4. Inbound Mail Filtering
      5.5. Messages Sent through Mailing Lists and Other Intermediaries
      5.6. Generation, Transmission, and Use of Results Headers
   6. Taxonomy of Signatures
      6.1. Single Domain Signature
      6.2. Parent Domain Signature
      6.3. Third-Party Signature
      6.4. Using Trusted Third-Party Senders
      6.5. Multiple Signatures
   7. Example Usage Scenarios
      7.1. Author's Organization - Simple
      7.2. Author's Organization - Differentiated Types of Mail
      7.3. Author Domain Signing Practices
      7.4. Delegated Signing
      7.5. Independent Third-Party Service Providers
      7.6. Mail Streams Based on Behavioral Assessment
      7.7. Agent or Mediator Signatures
   8. Usage Considerations
      8.1. Non-Standard Submission and Delivery Scenarios
      8.2. Protection of Internal Mail
      8.3. Signature Granularity
      8.4. Email Infrastructure Agents
      8.5. Mail User Agent
   9. Security Considerations
   10. Acknowledgements
   11. References
      11.1. Normative References
      11.2. Informative References
   Appendix A. Migration Strategies
      A.1. Migrating from DomainKeys
      A.2. Migrating Hash Algorithms
      A.3. Migrating Signing Algorithms
   Appendix B. General Coding Criteria for Cryptographic Applications

1. Introduction
DomainKeys Identified Mail (DKIM) allows an organization to claim responsibility for transmitting a message, in a way that can be validated by a recipient. This document provides practical tips for those who are developing DKIM software, mailing list managers, filtering strategies based on the output from DKIM verification, and DNS servers; those who are deploying DKIM software, keys, mailing list software, and migrating from DomainKeys [RFC4870]; and those who are responsible for the ongoing operations of an email infrastructure that has deployed DKIM.

The reader is encouraged to read the DKIM Service Overview document [RFC5585] before this document. More detailed guidance about DKIM and Author Domain Signing Practices (ADSP) can also be found in the protocol specifications [RFC4871], [RFC5617], and [RFC5672].

The document is organized around the key concepts related to DKIM. Within each section, additional considerations specific to development, deployment, or ongoing operations are highlighted where appropriate. The possibility of the use of DKIM results as input to a local reputation database is also discussed.

2. Using DKIM as Part of Trust Assessment

2.1. A Systems View of Email Trust Assessment
DKIM participates in a trust-oriented enhancement to the Internet's email service, to facilitate message handling decisions, such as for delivery and for content display. Trust-oriented message handling has substantial differences from the more established approaches that consider messages in terms of risk and abuse. With trust, there is a collaborative exchange between a willing participant along the sending path and a willing participant at a recipient site. In contrast, the risk model entails independent, unilateral action by the recipient site, in the face of a potentially unknown, hostile, and deceptive sender. This translates into a very basic technical difference: in the face of unilateral action by the recipient and even antagonistic efforts by the sender, risk-oriented mechanisms are based on heuristics, that is, on guessing. Guessing produces statistical results with some false negatives and some false positives. For trust-based exchanges, the goal is the deterministic exchange of information. For DKIM, that information is the one identifier that represents a stream of mail for which an independent assessment is sought (by the signer).
A trust-based service is built upon a validated Responsible Identifier that labels a stream of mail and is controlled by an identity (role, person, or organization). The identity is acknowledging some degree of responsibility for the message stream. Given a basis for believing that an identifier is being used in an authorized manner, the recipient site can make and use an assessment of the associated identity. An identity can use different identifiers, on the assumption that the different streams might produce different assessments. For example, even the best-run marketing campaigns will tend to produce some complaints that can affect the reputation of the associated identifier, whereas a stream of transactional messages is likely to have a more pristine reputation. Determining that the identifier's use is valid is quite different from determining that the content of a message is valid. The former means only that the identifier for the responsible role, person, or organization has been legitimately associated with a message. The latter means that the content of the message can be believed and, typically, that the claimed author of the content is correct. DKIM validates only the presence of the identifier used to sign the message. Even when this identifier is validated, DKIM carries no implication that any of the message content, including the RFC5322.From field [RFC5322], is valid. Surprisingly, this limit to the semantics of a DKIM signature applies even when the validated signing identifier is the same domain name as is used in the RFC5322.From field! DKIM's only claim about message content is that the content cited in the DKIM-Signature: field's h= tag has been delivered without modification. That is, it asserts message content integrity -- between signing and verifying -- not message content validity. 
As shown in Figure 1, this enhancement is a communication between a responsible role, person, or organization that signs the message and a recipient organization that assesses its trust in the signer. The recipient then makes handling decisions based on a collection of assessments, of which the DKIM mechanism is only a part. In this model, as shown in Figure 1, validation is an intermediary step, having the sole task of passing a validated Responsible Identifier to the Identity Assessor. The communication is of a single Responsible Identifier that the Responsible Identity wishes to have used by the Identity Assessor. The Identifier is the sole, formal input and output value of DKIM signing. The Identity Assessor uses this single, provided Identifier for consulting whatever assessment databases are deemed appropriate by the assessing entity. In turn, output from the Identity Assessor is fed into a Handling Filter
engine that considers a range of factors, along with this single output value. The range of factors can include ancillary information from the DKIM validation.

Identity Assessment covers a range of possible functions. It can be as simple as determining whether the identifier is a member of some list, such as authorized operators or participants in a group that might be of interest for recipient assessment. Equally, it can indicate a degree of trust (reputation) that is to be afforded the actor using that identifier. The extent to which the assessment affects the handling of the message is, of course, determined later, by the Handling Filter.

  +------+------+                            +------+------+
  |   Author    |                            |  Recipient  |
  +------+------+                            +------+------+
         |                                          ^
         |                                          |
         |                                   +------+------+
         |                                -->|  Handling   |<--
         |                                -->|   Filter    |<--
         |                                   +-------------+
         |                                          ^
         V                  Responsible             |
  +-------------+           Identifier       +------+------+
  | Responsible |. .       . . . . . . . . .>|  Identity   |
  |  Identity   |  .       .                 |  Assessor   |
  +------+------+  .       .                 +-------------+
         |         V       .                       ^ ^
         V         .       .                       | |
+------------------.-------.--------------------+  | |
| +------+------+  . . . > .   +-------------+  |  | |  +-----------+
| | Identifier  |              | Identifier  +--|--+ +--+ Assessment|
| |   Signer    +------------->| Validator   |  |       | Databases |
| +-------------+              +-------------+  |       +-----------+
|                 DKIM Service                  |
+-----------------------------------------------+

          Figure 1: Actors in a Trust Sequence Using DKIM

2.2. Choosing a DKIM Tag for the Assessment Identifier

The signer of a message needs to be able to provide precise data and know what that data will mean upon delivery to the Assessor. If there is ambiguity in the choice that will be made on the recipient side, then the sender cannot know what basis for assessment will be used. DKIM has three values that specify identification information and it is easy to confuse their use, although only one defines the
formal input and output of DKIM, with the other two being used for internal protocol functioning and adjunct purposes, such as auditing and debugging.

The salient values include the s=, d= and i= parameters in the DKIM-Signature: header field. In order to achieve the end-to-end determinism needed for this collaborative exchange from the signer to the assessor, the core model needs to specify what the signer is required to provide to the assessor. The update to RFC 4871 [RFC5672] specifies:

      DKIM's primary task is to communicate from the Signer to a recipient-side Identity Assessor a single Signing Domain Identifier (SDID) that refers to a responsible identity. DKIM MAY optionally provide a single responsible Agent or User Identifier (AUID)... A receive-side DKIM verifier MUST communicate the Signing Domain Identifier (d=) to a consuming Identity Assessor module and MAY communicate the Agent or User Identifier (i=) if present.... To the extent that a receiver attempts to intuit any structured semantics for either of the identifiers, this is a heuristic function that is outside the scope of DKIM's specification and semantics.

The single, mandatory value that DKIM supplies as its output is:

   d=  This specifies the "domain of the signing entity". It is a domain name and is combined with the selector to form a DNS query. A receive-side DKIM verifier needs to communicate the Signing Domain Identifier (d=) to a consuming Identity Assessor module and can also communicate the Agent or User Identifier (i=) if present.

The adjunct values are:

   s=  This tag specifies the selector. It is used to discriminate among different keys that can be used for the same d= domain name. As discussed in Section 4.3 of [RFC5585], "If verifiers were to employ the selector as part of an assessment mechanism, then there would be no remaining mechanism for making a transition from an old, or compromised, key to a new one". Consequently, the selector is not appropriate for use as part or all of the identifier used to make assessments.

   i=  This tag is optional and provides "[t]he Agent or User Identifier (AUID) on behalf of which the SDID is taking responsibility" [RFC5672]. The identity can be in the syntax of an entire email address or only a domain name. The domain name can be the same as for d= or it can be a sub-name of the d= name.

       NOTE: Although the i= identity has the syntax of an email address, it is not required to have those semantics. That is, "the identity of the user" need not be the same as the user's mailbox. For example, the signer might wish to use i= to encode user-related audit information, such as how they were accessing the service at the time of message posting. Therefore, it is not possible to conclude anything from the i= string's (dis)similarity to email addresses elsewhere in the header.

So, i= can have any of these properties:

   *  Be a valid domain when it is the same as d=

   *  Appear to be a subdomain of d= but might not even exist

   *  Look like a mailbox address but might have different semantics and therefore not function as a valid email address

   *  Be unique for each message, such as indicating access details of the user for the specific posting

This underscores why the tag needs to be treated as being opaque, since it can represent any semantics, known only to the signer.

Hence, i= serves well as a token that is usable like a Web cookie, for return to the signing Administrative Management Domain (ADMD) -- such as for auditing and debugging. Of course, in some scenarios the i= string might provide a useful adjunct value for additional (heuristic) processing by the Handling Filter.

2.3. Choosing the Signing Domain Name

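As an illustration of the tag roles described in Section 2.2, the following Python sketch separates the three values found in a DKIM-Signature: field. It is a minimal sketch and not a verifier: the sample field value, the function names, and the AssessorInput type are all hypothetical, no signature is checked, and whitespace handling is simplified. It shows d= (plus the optional, opaque i=) being handed onward to the Identity Assessor, while s= is used only to build the DNS query name under which the public key is published.

```python
from typing import Dict, NamedTuple, Optional

# Hypothetical field value for illustration; bh= and b= are dummy base64.
SAMPLE = (
    "v=1; a=rsa-sha256; d=newsletter.example.com; s=jun2010; "
    "i=@eng.newsletter.example.com; h=from:to:subject; "
    "bh=dGVzdA==; b=dGVzdA=="
)


class AssessorInput(NamedTuple):
    sdid: str            # d= value: the one mandatory DKIM output
    auid: Optional[str]  # i= value: optional, opaque to the receiver


def parse_tag_list(value: str) -> Dict[str, str]:
    """Split a DKIM-Signature field value into tag=value pairs."""
    tags: Dict[str, str] = {}
    for part in value.split(";"):
        if "=" not in part:
            continue  # skip empty segments such as a trailing ";"
        # Split on the first "=" only, since base64 values may end in "=".
        name, _, val = part.partition("=")
        # Simplification: drop any folding whitespace inside the value.
        tags[name.strip()] = "".join(val.split())
    return tags


def key_query_name(tags: Dict[str, str]) -> str:
    """DNS name of the public key record: <s=>._domainkey.<d=>."""
    return f"{tags['s']}._domainkey.{tags['d']}"


def assessor_input(tags: Dict[str, str]) -> AssessorInput:
    """Pass d= (and i= if present) onward; the selector is left out."""
    return AssessorInput(sdid=tags["d"].lower(), auid=tags.get("i"))
```

Keeping the selector out of AssessorInput reflects the point quoted from [RFC5585]: if assessments were keyed on s=, rotating to a new key would discard the accumulated assessment.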
A DKIM signing entity can serve different roles, such as being the author of content, the operator of the mail service, or the operator of a reputation service that also provides signing services on behalf of its customers. In these different roles, the basis for distinguishing among portions of email traffic can vary.

For an entity creating DKIM signatures, it is likely that different portions of its mail will warrant different levels of trust. For example:

   *  Mail is sent for different purposes, such as marketing versus transactional, and recipients demonstrate different patterns of acceptance between these.

   *  For an operator of an email service, there often are distinct sub-populations of users warranting different levels of trust or privilege, such as paid versus free users, or users engaged in direct correspondence versus users sending bulk mail.

   *  Mail originating outside an operator's system, such as when it is redistributed by a mailing-list service run by the operator, will warrant a different reputation from mail submitted by users authenticated with the operator.

It is therefore likely to be useful for a signer to use different d= subdomain names, for different message traffic streams, so that receivers can make differential assessments. However, too much differentiation -- that is, too fine a granularity of signing domains -- makes it difficult for the receiver to discern a sufficiently stable pattern of traffic for developing an accurate and reliable assessment. So the differentiation needs to achieve a balance.

Generally, in a trust system, legitimate signers have an incentive to pick a small stable set of identities, so that recipients and others can attribute reputations to them. The set of these identities a receiver trusts is likely to be quite a bit smaller than the set it views as risky.

The challenge in using additional layers of subdomains is whether the extra granularity will be useful for the Assessor. In fact, excessive levels invite ambiguity: if the Assessor does not take advantage of the added granularity in the entire domain name that is provided, they might unilaterally decide to use only some rightmost part of the identifier. The signer cannot know what portion will be used. That ambiguity would move the use of DKIM back to the realm of heuristics, rather than the deterministic processing that is its goal.

Hence, the challenge is to determine a useful scheme for labeling different traffic streams. The most obvious choices are among different types of content and/or different types of authors. Although stability is essential, it is likely that the choices will change, over time, so the scheme needs to be flexible.
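The labeling tradeoff can be sketched in Python. Everything named here is hypothetical: the stream names, the STREAM_TO_SDID table, and both tally functions. The two-label truncation is only a stand-in for whatever rightmost portion an Assessor might unilaterally choose; it is not a recommended practice. The sketch shows a signer drawing from a small, easily revised table of d= names, and how an Assessor that truncates the identifier erases the distinction the signer intended.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical signer-side table: a small, stable set of signing domains
# that can be revised in one place if the labeling scheme changes.
STREAM_TO_SDID: Dict[str, str] = {
    "transactional": "transaction.example.com",
    "marketing": "marketing.example.com",
}
DEFAULT_SDID = "example.com"


def sdid_for(stream: str) -> str:
    """Choose the d= value for a message stream."""
    return STREAM_TO_SDID.get(stream, DEFAULT_SDID)


def tally_exact(sdids: Iterable[str]) -> Counter:
    """Assessor keyed on the entire d= value: streams stay distinct."""
    return Counter(s.lower() for s in sdids)


def tally_truncated(sdids: Iterable[str], labels: int = 2) -> Counter:
    """Assessor that keeps only the rightmost labels (a heuristic)."""
    return Counter(".".join(s.lower().split(".")[-labels:]) for s in sdids)


observed = [sdid_for(s) for s in ("transactional", "marketing", "marketing")]
exact = tally_exact(observed)          # two separate streams
truncated = tally_truncated(observed)  # both merged under example.com
```

With exact matching, complaints about the marketing stream do not touch the transactional stream; with truncation, both share one assessment, and the signer has no way to know which behavior a given receiver applies.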
For those originating message content, the most likely choice of subdomain naming scheme will be based upon type of content, which can use content-oriented labels or service-oriented labels. For example:

      transaction.example.com
      newsletter.example.com
      bugreport.example.com
      support.example.com
      sales.example.com
      marketing.example.com

where the choices are best dictated by whether they provide the Identity Assessor with the ability to discriminate usefully among streams of mail that demonstrate significantly different degrees of recipient acceptance or safety. Again, the danger in providing too fine a granularity is that related message streams that are labeled separately will not benefit from an aggregate reputation.

For those operating messaging services on behalf of a variety of customers, an obvious scheme to use has a different subdomain label for each customer. For example:

      widgetco.example.net
      moviestudio.example.net
      bigbank.example.net

However, it can also be appropriate to label by the class of service or class of customer, such as:

      premier.example.net
      free.example.net
      certified.example.net

Prior to using domain names for distinguishing among sources of data, IP Addresses have been the basis for distinction. Service operators typically have done this by dedicating specific outbound IP Addresses to specific mail streams -- typically to specific customers. For example, a university might want to distinguish mail from the administration, versus mail from the student dorms. In order to make the adoption of a DKIM-based service easier, it can be reasonable to translate the same partitioning of traffic, using domain names in place of the different IP Addresses.

2.4. Recipient-Based Assessments

DKIM gives the recipient site's Identity Assessor a verifiable identifier to use for analysis. Although the mechanism does not make claims that the signer is a Good Actor or a Bad Actor, it does make
it possible to know that use of the identifier is valid. This is in marked contrast with schemes that do not have authentication. Without verification, it is not possible to know whether the identifier -- whether taken from the RFC5322.From field, the RFC5321.MailFrom command, or the like -- is being used by an authorized agent. DKIM solves this problem. Hence, with DKIM, the Assessor can know that two messages with the same DKIM d= identifier are, in fact, signed by the same person or organization. This permits a far more stable and accurate assessment of mail traffic using that identifier. DKIM is distinctive, in that it provides an identifier that is not necessarily related to any other identifier in the message. Hence, the signer might be the author's ADMD, one of the operators along the transit path, or a reputation service being used by one of those handling services. In fact, a message can have multiple signatures, possibly by any number of these actors. As discussed above, the choice of identifiers needs to be based on differences that the signer thinks will be useful for the recipient Assessor. Over time, industry practices establish norms for these choices. Absent such norms, it is best for signers to distinguish among streams that have significant differences, while consuming the smallest number of identifiers possible. This will limit the burden on recipient Assessors. A common view about a DKIM signature is that it carries a degree of assurance about some or all of the message contents, and in particular, that the RFC5322.From field is likely to be valid. In fact, DKIM makes assurances only about the integrity of the data and not about its validity. Still, presumptions of the RFC5322.From field validity remain a concern. 
Hence, a signer using a domain name that is unrelated to the domain name in the RFC5322.From field can reasonably expect that the disparity will warrant some curiosity, at least until signing by independent operators has produced some established practice among recipient Assessors.

With the identifier(s) supplied by DKIM, the Assessor can consult an independent assessment service about the entity associated with the identifier(s). Another possibility is that the Assessor can develop its own reputation rating for the identifier(s). That is, over time, the Assessor can observe the stream of messages associated with the identifier(s) and develop a reaction to the associated content. For example, if there is a high percentage of user complaints regarding signed mail with a d= value of "widgetco.example.net", the Assessor might include that fact in the vector of data it provides to the Handling Filter. This is also discussed briefly in Section 5.4.

2.5. Filtering

The assessment of the signing identifier is given to a Handling Filter that is defined by local policies, according to a potentially wide range of different factors and weightings. This section discusses some of the kinds of choices and weightings that are plausible and the differential actions that might be performed. Because authenticated domain names represent a collaborative sequence between signer and Assessor, actions can sometimes reasonably include contacting the signer. The discussion focuses on variations in Organizational Trust versus Message Stream Risk, that is, the degree of positive assessment of a DKIM-signing organization, and the potential danger present in the message stream signed by that organization. While it might seem that higher trust automatically means lower risk, the experience with real-world operations provides examples of every combination of the two factors, as shown in Figure 2. For each axis, only three levels of granularity are listed, in order to keep discussion manageable. In real-world filtering engines, finer-grained distinctions are typically needed, and there typically are more axes. For example, there are different types of risk, so that an engine might distinguish between spam risk versus virus risk and take different actions based on which type of problematic content is present. For spam, the potential damage from a false negative is small, whereas the damage from a false positive is high. For a virus, the potential danger from a false negative is extremely high, while the likelihood of a false positive when using modern detection tools is extremely low. However, for the discussion here, "risk" is taken as a single construct. The DKIM d= identifier is independent of any other identifier in a message and can be a subdomain of the name owned by the signer. 
This permits the use of fine-grained and stable distinctions between different types of message streams, such as between transactional messages and marketing messages from the same organization. Hence, the use of DKIM might permit a richer filtering model than has typically been possible for mail-receiving engines. Note that the realities of today's public Internet Mail environment necessitate having a baseline handling model that is quite suspicious. Hence, "strong" filtering rules really are the starting point, as indicated for the UNKNOWN cell.
The table indicates differential handling for each combination, such as how aggressive or broad-based the filtering could be. Aggressiveness affects the types of incorrect assessments that are likely. So, the table distinguishes various characteristics, including: 1) whether an organization is unknown, known to be good actors, or known to be bad actors; and 2) the assessment of messages. It includes advice about the degree of filtering that might be done, and other message disposition. Perhaps unexpectedly, it also lists a case in which the receiving site might wish to deliver problematic mail, rather than redirecting or deleting it. The site might also wish to contact the signing organization and seek resolution of the problem.

+-------------+-----------------------------------------------+
| S T R E A M *   O R G A N I Z A T I O N A L   T R U S T     |
| R I S K     *     Low             Medium          High      |
|             +***************+***************+***************+
| Low         * BENIGN:       | DILIGENT:     | PRISTINE:     |
|             *    Moderate   |    Mild       |    Accept     |
|             *    filter     |    filter     |               |
|             +---------------+---------------+---------------+
| Medium      * UNKNOWN:      | TYPICAL:      | PROTECTED:    |
|             *    Strong     |    Targeted   |    Accept &   |
|             *    filter     |    filter     |    Contact    |
|             +---------------+---------------+---------------+
| High        * MALICIOUS:    | NEGLIGENT:    | COMPROMISED:  |
|             *    Block &    |    Block      |    Block &    |
|             *    Counter    |               |    Contact    |
+-------------+---------------+---------------+---------------+

        Figure 2: Trust versus Risk Handling Tradeoffs Example

[LEGEND]

AXES

   Stream Risk: This is a measure of the recent history of a message stream and the severity of problems it has presented.

   Organizational Trust: This combines longer-term history about possible stream problems from that organization, and its responsiveness to problem handling.

CELLS (indicating reasonable responses)

   Labels for the cells are meant as a general assessment of an organization producing that type of mail stream under that circumstance.

   Benign: There is some history of sending good messages, with very few harmful messages having been received. This stream warrants filtering that does not search for problems very aggressively, in order to reduce the likelihood of false positives.

   Diligent: The stream has had a limited degree of problems, and the organization is consistently successful at controlling its abuse issues in a timely manner.

   Pristine: There is a history of a clean message stream with no problems, from an organization with an excellent reputation. So, the filter primarily needs to ensure that messages are delivered; catching stray problem messages is a lesser concern. In other words, the paramount concern, here, is false positives.

   -----

   Unknown: There is no history with the organization. Apply an aggressive level of "naive" filtering, given the nature of the public email environment.

   Typical: The stream suffers significant abuse issues and the organization has demonstrated a record of having difficulties resolving them in a timely manner, in spite of legitimate efforts. Unfortunately, this is the typical case for service providers with an easy and open subscription policy.

   Protected: An organization with a good history and/or providing an important message stream for the receiving site is subject to a local policy that messages are not allowed to be blocked, but the organization is currently producing a problematic stream. The receiver delivers messages, but works quickly with the organization to resolve the matter.

   -----

   Malicious: A persistently problematic message stream is coming from an organization that appears to contribute to the problem. The stream will be blocked, but the organization's role is sufficiently troubling to warrant following up with others in the anti-abuse or legal communities, to constrain or end their impact.

   Negligent: A persistently problematic message stream is coming from an organization that does not appear to be contributing to the problem, but also does not appear to be working to eliminate it. At the least, the stream needs to be blocked.

   Compromised: An organization with a good history has a stream that changes and becomes too problematic to be delivered. The receiver blocks the stream and works quickly with the organization to resolve the matter.
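
The matrix in Figure 2 can be expressed as a simple lookup, sketched here in Python. The cell labels and handling strings are taken from the figure; everything else is an assumption made for illustration, including the Level type, the function names, and especially the complaint-rate thresholds used to bucket Stream Risk, which are arbitrary placeholders. As noted above, real filtering engines typically use finer-grained distinctions and more axes.

```python
from enum import Enum
from typing import Dict, Tuple


class Level(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# (stream risk, organizational trust) -> (cell label, handling), per Figure 2.
HANDLING: Dict[Tuple[Level, Level], Tuple[str, str]] = {
    (Level.LOW, Level.LOW): ("BENIGN", "moderate filter"),
    (Level.LOW, Level.MEDIUM): ("DILIGENT", "mild filter"),
    (Level.LOW, Level.HIGH): ("PRISTINE", "accept"),
    (Level.MEDIUM, Level.LOW): ("UNKNOWN", "strong filter"),
    (Level.MEDIUM, Level.MEDIUM): ("TYPICAL", "targeted filter"),
    (Level.MEDIUM, Level.HIGH): ("PROTECTED", "accept and contact"),
    (Level.HIGH, Level.LOW): ("MALICIOUS", "block and counter"),
    (Level.HIGH, Level.MEDIUM): ("NEGLIGENT", "block"),
    (Level.HIGH, Level.HIGH): ("COMPROMISED", "block and contact"),
}


def stream_risk(complaints: int, delivered: int) -> Level:
    """Bucket the recent complaint rate for one d= stream.

    The thresholds are hypothetical placeholders, not recommendations.
    """
    if delivered == 0:
        return Level.MEDIUM  # no recent history: the suspicious baseline
    rate = complaints / delivered
    if rate < 0.001:
        return Level.LOW
    if rate < 0.01:
        return Level.MEDIUM
    return Level.HIGH


def handling(risk: Level, trust: Level) -> Tuple[str, str]:
    """Look up the Figure 2 cell for a risk/trust combination."""
    return HANDLING[(risk, trust)]
```

For example, a stream with no recent history from a low-trust organization lands in the UNKNOWN cell, which matches the "strong" filtering baseline described earlier, while a highly trusted organization whose stream suddenly draws a 5% complaint rate lands in COMPROMISED.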