The robustness principle can be highly effective in safeguarding against flaws in the implementation of a protocol by peers. When a specification remains unchanged for an extended period, the incentive to tolerate errors accumulates. Indeed, when faced with divergent interpretations of an immutable specification, the only way for an implementation to remain interoperable is to be tolerant of differences in interpretation and implementation errors. However, when official specifications are not updated, deployed implementations, including their quirks, often become a substitute standard.
Tolerating unexpected inputs from another implementation might seem logical, even necessary. However, that conclusion relies on an assumption that existing specifications and implementations cannot change. Applying the robustness principle in this way disproportionately values short-term gains over the negative effects on future implementations and the protocol as a whole.
For a protocol to have sustained viability, it is necessary for both specifications and implementations to be responsive to changes, in addition to handling new and old problems that might arise over time. For example, when an implementer discovers a scenario where a specification defines some input as faulty but does not define how to handle that input, the implementer can provide significant value to the ecosystem by reporting the issue and helping to evolve the specification.
When a discrepancy is found between a specification and its implementation, a maintenance discussion inside the standards process allows reaching consensus on how best to evolve the specification. Subsequently, updating implementations to match evolved specifications ensures that implementations are consistently interoperable and removes needless barriers for new implementations. Maintenance also enables continued improvement of the protocol. New use cases are an indicator that the protocol could be successful [RFC 5218].
Protocol designers are strongly encouraged to continue to maintain and evolve protocol specifications beyond their initial inception and definition. This might require the development of revised specifications, extensions, or other supporting material that evolves in concert with implementations. Involvement of those who implement and deploy the protocol is a critical part of this process, as they provide input on their experience with how the protocol is used.
Most interoperability problems do not require revision of protocols or protocol specifications, as software defects can happen even when the specification is unambiguous. For instance, the most effective means of dealing with a defective implementation in a peer could be to contact the developer responsible. It is far more efficient in the long term to fix one isolated bug than it is to deal with the consequences of workarounds.
Early implementations of protocols have a stronger obligation to closely follow specifications, as their behavior will affect all subsequent implementations. In addition to specifications, later implementations will be guided by what existing deployments accept. Tolerance of errors in early deployments is most likely to result in problems. Protocol specifications might need more frequent revision during early deployments to capture feedback from early rounds of deployment.
Neglect can quickly produce the negative consequences this document describes. Restoring the protocol to a state where it can be maintained involves first discovering the properties of the protocol as it is deployed rather than the protocol as it was originally documented. This can be difficult and time-consuming, particularly if the protocol has a diverse set of implementations. Such a process was undertaken for HTTP [HTTP] after a period of minimal maintenance. Restoring HTTP specifications to relevance took significant effort.
Maintenance is most effective if it is responsive, which is greatly affected by how rapidly protocol changes can be deployed. For protocol deployments that operate on longer time scales, temporary workarounds following the spirit of the robustness principle might be necessary. For this, improvements in software update mechanisms ensure that the cost of reacting to changes is much lower than it was in the past. Alternatively, if specifications can be updated more readily than deployments, details of the workaround can be documented, including the desired form of the protocols once the need for workarounds no longer exists and plans for removing the workaround.
A well-specified protocol includes rules for consistent handling of aberrant conditions. This increases the chances that implementations will have consistent and interoperable handling of unusual conditions.
Choosing to generate fatal errors for unspecified conditions instead of attempting error recovery can ensure that faults receive attention. This intolerance can be harnessed to reduce occurrences of aberrant implementations.
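As an illustration of this approach, the following is a minimal sketch of a strict parser for a hypothetical frame format. The 1-byte type and 2-byte length header, the frame types, the error codes, and the names `ProtocolError` and `parse_frame` are all assumptions made for this example and are not taken from any real protocol. It also assumes the hypothetical specification defines no extension point for unknown frame types.

```python
import struct


class ProtocolError(Exception):
    """Fatal error: the caller is expected to close the connection."""

    def __init__(self, code: str, detail: str) -> None:
        super().__init__(f"{code}: {detail}")
        self.code = code
        self.detail = detail


# Hypothetical frame types for an illustrative protocol.
KNOWN_TYPES = {0x01: "DATA", 0x02: "PING"}
HEADER = struct.Struct("!BH")  # 1-byte type, 2-byte big-endian length


def parse_frame(buf: bytes) -> tuple[str, bytes]:
    """Parse exactly one frame; raise ProtocolError on any unspecified input."""
    if len(buf) < HEADER.size:
        raise ProtocolError(
            "TRUNCATED", f"need {HEADER.size} header bytes, got {len(buf)}"
        )
    frame_type, length = HEADER.unpack_from(buf)
    if frame_type not in KNOWN_TYPES:
        raise ProtocolError(
            "UNKNOWN_TYPE", f"frame type 0x{frame_type:02x} is not defined"
        )
    payload = buf[HEADER.size:]
    if len(payload) != length:
        # A tolerant parser might truncate or pad here; this one refuses.
        raise ProtocolError(
            "LENGTH_MISMATCH", f"declared {length} bytes, received {len(payload)}"
        )
    return KNOWN_TYPES[frame_type], payload
```

A tolerant variant would guess at the sender's intent when the declared and actual lengths disagree. Raising instead means that the defect is visible the first time the faulty sender is tested against this peer.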
Intolerance toward violations of specification improves feedback for new implementations in particular. When a new implementation encounters a peer that is intolerant of an error, it receives strong feedback that allows the problem to be discovered quickly.
To be effective, intolerant implementations need to be sufficiently widely deployed so that they are encountered by new implementations with high probability. This could depend on multiple implementations deploying strict checks.
Interoperability problems also need to be made known to those in a position to address them. In particular, systems with human operators, such as user-facing clients, are ideally suited to surfacing errors. Other systems might need to use less direct means of making errors known.
This does not mean that intolerance of errors in early deployments of protocols has the effect of preventing interoperability. On the contrary, when existing implementations follow clearly specified error handling, new implementations or features can be introduced more readily, as the effect on existing implementations can be easily predicted; see also Section 2.2.
Any intolerance also needs to be strongly supported by specifications; otherwise, it encourages fracturing of the protocol community or proliferation of workarounds. See Section 5.2.
Intolerance can be used to motivate compliance with any protocol requirement. For instance, the INADEQUATE_SECURITY error code and associated requirements in HTTP/2 [HTTP/2] resulted in improvements in the security of the deployed base.
A notification of a fatal error is best sent as an explicit error message to the entity that made the error. Error messages benefit from being able to carry arbitrary information that might help the implementer of the sender of the faulty input understand and fix the issue in their software. QUIC error frames [QUIC] are an example of a fatal error mechanism that helped implementers improve software quality throughout the protocol lifecycle. Similarly, the use of Extended DNS Errors [EDE] has been effective in providing better descriptions of DNS resolution errors to clients.
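The idea of an error message that carries free-form diagnostic information can be sketched as follows. The wire layout (a 2-byte code, a 2-byte reason length, and UTF-8 reason text), the error code value, and the `FatalError` name are hypothetical and chosen only for illustration; this is not the encoding used by QUIC or by Extended DNS Errors.

```python
import struct
from dataclasses import dataclass

_PREFIX = struct.Struct("!HH")  # hypothetical: error code, reason length


@dataclass(frozen=True)
class FatalError:
    code: int    # hypothetical numeric error code
    reason: str  # diagnostic text aimed at the peer's developer

    def encode(self) -> bytes:
        reason_bytes = self.reason.encode("utf-8")
        if len(reason_bytes) > 0xFFFF:
            raise ValueError("reason does not fit in a 2-byte length field")
        return _PREFIX.pack(self.code, len(reason_bytes)) + reason_bytes

    @classmethod
    def decode(cls, data: bytes) -> "FatalError":
        if len(data) < _PREFIX.size:
            raise ValueError("error message shorter than its fixed prefix")
        code, length = _PREFIX.unpack_from(data)
        if len(data) != _PREFIX.size + length:
            raise ValueError("reason length does not match message size")
        return cls(code, data[_PREFIX.size:].decode("utf-8"))


# Example: tell the sender exactly which field was wrong and why.
wire = FatalError(0x0007, "frame 12: declared length 5, received 2").encode()
received = FatalError.decode(wire)
```

The numeric code lets software react consistently, while the reason text is what helps a human locate the defect in the sending implementation.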
Stateless protocol endpoints might generate denial-of-service attacks if they send an error message in response to every message that is received from an unauthenticated sender. These implementations might need to silently discard these messages.
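A minimal sketch of this behavior is shown below, assuming a hypothetical datagram format in which the sender appends an HMAC-SHA-256 tag computed with a pre-shared key. The message contents, the `handle_datagram` name, and the use of a shared key for authentication are illustrative assumptions, not requirements of any particular protocol.

```python
import hashlib
import hmac
from typing import Optional

TAG_LEN = 32  # size of an HMAC-SHA-256 tag in bytes


def handle_datagram(key: bytes, datagram: bytes) -> Optional[bytes]:
    """Return the reply to send, or None to silently discard the datagram."""
    if len(datagram) < TAG_LEN:
        return None  # too short to carry a tag: discard without replying
    body, tag = datagram[:-TAG_LEN], datagram[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        # Unauthenticated sender: sending an error here could be abused
        # to direct traffic at a spoofed source address.
        return None
    if body != b"PING":
        # Authenticated sender: an explicit error is safe and useful.
        return b"ERROR unsupported message"
    return b"PONG"
```

Only senders that prove knowledge of the key receive explicit error messages; everything else is dropped, so the endpoint cannot be used to generate traffic toward a third party.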
Any protocol participant that is affected by changes arising from maintenance might be excluded if they are unwilling or unable to implement or deploy changes that are made to the protocol.
Deliberate exclusion of problematic implementations is an important tool that can ensure that the interoperability of a protocol remains viable. While backward-compatible changes are always preferable to incompatible ones, it is not always possible to produce a design that protects the ability of all current and future protocol participants to interoperate.
Accidentally excluding unexpected participants is not usually a good outcome. When developing and deploying changes, it is best to first understand the extent to which the change affects existing deployments. This ensures that any exclusion that occurs is intentional.
In some cases, existing deployments might need to change in order to avoid being excluded. Though it might be preferable to avoid forcing deployments to change, this might be considered necessary. To avoid unnecessarily excluding deployments that might take time to change, developing a migration plan can be prudent.
Exclusion is a direct goal when choosing to be intolerant of errors (see Section 5.1). Exclusionary actions are employed with the deliberate intent of protecting future interoperability.
Excluding implementations or deployments can lead to a fracturing of the protocol system that could be more harmful than any divergence that might arise from tolerating the unexpected. The IAB document "Uncoordinated Protocol Development Considered Harmful" [RFC 5704] describes how conflict or competition in the maintenance of protocols can lead to similar problems.