11. System Process
As each new sample (theta, delta, epsilon, jitter, t) is produced by the clock filter algorithm, all peer processes are scanned by the mitigation algorithms consisting of the selection, cluster, combine, and clock discipline algorithms in the system process. The selection algorithm scans all associations and casts off the falsetickers, which have demonstrably incorrect time, leaving the truechimers as result. In a series of rounds, the cluster algorithm discards the association statistically furthest from the centroid until a specified minimum number of survivors remain. The combine algorithm produces the best and final statistics on a weighted average basis. The final offset is passed to the clock discipline algorithm to steer the system clock to the correct time. The cluster algorithm selects one of the survivors as the system peer. The associated statistics (theta, delta, epsilon, jitter, t) are used to construct the system variables inherited by dependent servers and clients and made available to other applications running on the same machine.
11.1. System Process Variables
Figure 23 summarizes the common names, formula names, and a short description of each system variable. Unless noted otherwise, all variables have assumed prefix s. +-----------+------------+------------------------+ | Name | Formula | Description | +-----------+------------+------------------------+ | t | t | update time | | p | p | system peer identifier | | leap | leap | leap indicator | | stratum | stratum | stratum | | precision | rho | precision | | offset | THETA | combined offset | | jitter | PSI | combined jitter | | rootdelay | DELTA | root delay | | rootdisp | EPSILON | root dispersion | | v | v | survivor list | | refid | refid | reference ID | | reftime | reftime | reference time | | NMIN | 3 | minimum survivors | | CMIN | 1 | minimum candidates | +-----------+------------+------------------------+ Figure 23: System Process Variables Except for the t, p, offset, and jitter variables and the NMIN and CMIN constants, the variables have the same format and interpretation as the peer variables of the same name. The NMIN and CMIN parameters are used by the selection and cluster algorithms described in the next section. The t variable is the seconds counter at the time of the last update. An example is shown by the clock_update() routine in Appendix A.5.5.4. The p variable is the system peer identifier determined by the cluster() routine in Section 11.2.2. The precision variable has the same format as the packet variable of the same name. The precision is defined as the larger of the resolution and time to read the clock, in log2 units. For instance, the precision of a mains-frequency clock incrementing at 60 Hz is 16 ms, even when the system clock hardware representation is to the nanosecond. The offset and jitter variables are determined by the combine algorithm in Section 11.2.3. These values represent the best and final offset and jitter used to discipline the system clock.
Initially, all variables are cleared to zero, then the leap is set to 3 (unsynchronized) and stratum is set to MAXSTRAT (16). Remember that MAXSTRAT is mapped to zero in the transmitted packet.11.2. System Process Operations
Figure 24 summarizes the system process operations performed by the clock select routine. The selection algorithm described in Section 11.2.1 produces a majority clique of presumed correct candidates (truechimers) based on agreement principles. The cluster algorithm described in Section 11.2.2 discards outliers to produce the most accurate survivors. The combine algorithm described in Section 11.2.3 provides the best and final offset for the clock discipline algorithm. An example is described in Appendix A.5.5.6. If the selection algorithm cannot produce a majority clique, or if it cannot produce at least CMIN survivors, the system process exits without disciplining the system clock. If successful, the cluster algorithm selects the statistically best candidate as the system peer and its variables are inherited as the system variables.
+-----------------+ | clock_select() | +-----------------+ ................................|........... . V . . yes +---------+ +-----------------+ . . +--| accept? | | scan candidates | . . | +---------+ | | . . V no | | | . . +---------+ | | | . . | add peer| | | | . . +---------- | | | . . | V | | . . +---------->-->| | . . | | . . Selection Algorithm +-----------------+ . .................................|.......... V no +-------------------+ +-------------| survivors? | | +-------------------+ | | yes | V | +-------------------+ | | Cluster Algorithm | | +-------------------+ | | | V V yes +-------------------+ |<------------| n < CMIN? | | +-------------------+ V | +-----------------+ V no | s.p = NULL | +-------------------+ +-----------------+ | s.p = v_0.p | | +-------------------+ V | +-----------------+ V | return (UNSYNC) | +-------------------+ +-----------------+ | return (SYNC) | +-------------------+ Figure 24: Clock Select Routine
11.2.1. Selection Algorithm
Note that the selection and cluster algorithms are described separately, but combined in the code skeleton. The selection algorithm operates to find an intersection interval containing a majority clique of truechimers using Byzantine agreement principles originally proposed by Marzullo [ref6], but modified to improve accuracy. An overview of the algorithm is given below and described in the first half of the clock_select() routine in Appendix A.5.5.1. First, those servers that are unusable according to the rules of the protocol are detected and discarded as shown by the accept() routine in Appendix A.5.5.3. Next, a set of tuples (p, type, edge) is generated for the remaining candidates. Here, p is the association identifier and type identifies the upper (+1), middle (0), and lower (-1) endpoints of a correctness interval centered on theta for that candidate. This results in three tuples, lowpoint (p, -1, theta - lambda), midpoint (p, 0, theta), and highpoint (p, +1, theta + lambda), where lambda is the root synchronization distance. An example of this calculation is shown by the rootdist() routine in Appendix A.5.1.1. The steps of the algorithm are: 1. For each of m associations, place three tuples as defined above on the candidate list. 2. Sort the tuples on the list by the edge component. Order the lowpoint, midpoint, and highpoint of these intervals from lowest to highest. Set the number of falsetickers f = 0. 3. Set the number of midpoints d = 0. Set c = 0. Scan from lowest endpoint to highest. Add one to c for every lowpoint, subtract one for every highpoint, add one to d for every midpoint. If c >= m - f, stop; set l = current lowpoint. 4. Set c = 0. Scan from highest endpoint to lowest. Add one to c for every highpoint, subtract one for every lowpoint, add one to d for every midpoint. If c >= m - f, stop; set u = current highpoint. 5. Is d = f and l < u? If yes, then follow step 5A; else, follow step 5B. 5A. Success: the intersection interval is [l, u]. 5B. Add one to f. Is f < (m / 2)? If yes, then go to step 3 again. If no, then go to step 6. 6. Failure; a majority clique could not be found. There are no suitable candidates to discipline the system clock.
The algorithm is described in detail in Appendix A.5.5.1. Note that it starts with the assumption that there are no falsetickers (f = 0) and attempts to find a non-empty intersection interval containing the midpoints of all correct servers, i.e., truechimers. If a non-empty interval cannot be found, it increases the number of assumed falsetickers by one and tries again. If a non-empty interval is found and the number of falsetickers is less than the number of truechimers, a majority clique has been found and the midpoint of each truechimer (theta) represents the candidates available to the cluster algorithm. If a majority clique is not found, or if the number of truechimers is less than CMIN, there are insufficient candidates to discipline the system clock. CMIN defines the minimum number of servers consistent with the correctness requirements. Suspicious operators would set CMIN to ensure multiple redundant servers are available for the algorithms to mitigate properly. However, for historic reasons the default value for CMIN is one.11.2.2. Cluster Algorithm
The candidates of the majority clique are placed on the survivor list v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an association identifier, theta_p, psi_p, and stratum_p the current offset, jitter and stratum of association p, respectively, and lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda, where lambda is the root synchronization distance for association p. The list is processed by the cluster algorithm below. An example is shown by the second half of the clock_select() algorithm in Appendix A.5.5.1. 1. Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate. 2. Sort the candidates by increasing lambda_p. Let n be the number of candidates and NMIN the minimum required number of survivors. 3. For each candidate, compute the selection jitter psi_s: +----- -----+^1/2 | n-1 | | --- | | 1 \ 2 | psi_s = | ---- * / (theta_s - theta_j) | | n-1 --- | | j=1 | +----- -----+ 4. Select psi_max as the candidate with maximum psi_s.
5. Select psi_min as the candidate with minimum psi_p. 6. Is psi_max < psi_min or n <= NMIN? If yes, follow step 6A; otherwise, follow step 6B. 6A. Done. The remaining candidates on the survivor list are ranked in the order of preference. The first entry on the list represents the system peer; its variables are used later to update the system variables. 6B. Delete the outlier candidate with psi_max; reduce n by one and go back to step 3. The algorithm operates in a series of rounds where each round discards the statistical outlier with maximum selection jitter psi_s. However, if psi_s is less than the minimum peer jitter psi_p, no improvement is possible by discarding outliers. This and the minimum number of survivors represent the terminating conditions of the algorithm. Upon termination, the final value of psi_max is saved as the system selection jitter PSI_s for use later.11.2.3. Combine Algorithm
The clock combine route processes the remaining survivors to produce the best and final data for the clock discipline algorithm. The routine processes peer offset and jitter statistics to produce the combined system offset THETA and system peer jitter PSI_p, where each server statistic is weighted by the reciprocal of the root synchronization distance and the result normalized. An example is shown by the clock_combine() routine in Appendix A.5.5.5 The combined THETA is passed to the clock update routine. The first candidate on the survivor list is nominated as the system peer with identifier p. The system peer jitter PSI_p is a component of the system jitter PSI. It is used along with the selection jitter PSI_s to produce the system jitter: PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2 Each time an update is received from the system peer, the clock update routine is called. By rule, an update is discarded if its time of arrival p.t is not strictly later than the last update used s.t. The labels IGNOR, PANIC, ADJ, and STEP refer to return codes from the local clock routine described in the next section. IGNORE means the update has been ignored as an outlier. PANIC means the offset is greater than the panic threshold PANICT (1000 s) and SHOULD cause the program to exit with a diagnostic message to the
system log. STEP means the offset is less than the panic threshold, but greater than the step threshold STEPT (125 ms). In this case, the clock is stepped to the correct offset, but since this means all peer data have been invalidated, all associations MUST be reset and the client begins as at initial start. ADJ means the offset is less than the step threshold and thus a valid update. In this case, the system variables are updated from the peer variables as shown in Figure 25. +-------------------------------------------+ | System Variable <-- System Peer Variable | | +-------------------------------------------+ | s.leap <-- p.leap | | s.stratum <-- p.stratum + 1 | | s.offset <-- THETA | | s.jitter <-- PSI | | s.rootdelay <-- p.delta_r + delta | | s.rootdisp <-- p.epsilon_r + p.epsilon + | | p.psi + PHI * (s.t - p.t) | | + |THETA| | | s.refid <-- p.refid | | s.reftime <-- p.reftime | | s.t <-- p.t | +-------------------------------------------+ Figure 25: System Variables Update There is an important detail not shown. The dispersion increment (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from below by MINDISP. In subnets with very fast processors and networks and very small delay and dispersion this forces a monotone-definite increase in s.rootdisp (EPSILON), which avoids loops between peers operating at the same stratum. The system variables are available to dependent application programs as nominal performance statistics. The system offset THETA is the clock offset relative to the available synchronization sources. The system jitter PSI is an estimate of the error in determining this value, elsewhere called the expected error. The root delay DELTA is the total round-trip delay relative to the primary server. The root dispersion EPSILON is the dispersion accumulated over the network from the primary server. Finally, the root synchronization distance is defined as:
LAMBDA = EPSILON + DELTA / 2, which represents the maximum error due all causes and is designated the root synchronization distance. An example of the clock update routine is provided in Appendix A.5.5.4.11.3. Clock Discipline Algorithm
The NTPv4 clock discipline algorithm, shortened to discipline in the following, functions as a combination of two quite philosophically different feedback control systems. In a phase-locked loop (PLL) design, periodic phase updates at update intervals mu seconds are used directly to minimize the time error and indirectly the frequency error. In a frequency-locked loop (FLL) design, periodic frequency updates at intervals mu are used directly to minimize the frequency error and indirectly the time error. As shown in [ref7], a PLL usually works better when network jitter dominates, while an FLL works better when oscillator wander dominates. This section contains an outline of how the NTPv4 design works. An in-depth discussion of the design principles is provided in [ref7], which also includes a performance analysis. The discipline is implemented as the feedback control system shown in Figure 26. The variable theta_r represents the combine algorithm offset (reference phase) and theta_c the VFO offset (control phase). Each update produces a signal V_d representing the instantaneous phase difference theta_r - theta_c. The clock filter for each server functions as a tapped delay line, with the output taken at the tap selected by the clock filter algorithm. The selection, cluster, and combine algorithms combine the data from multiple filters to produce the signal V_s. The loop filter, with impulse response F(t), produces the signal V_c, which controls the VFO frequency omega_c and thus the integral of the phase theta_c which closes the loop. The V_c signal is generated by the clock-adjust process in Section 12. The detailed equations that implement these functions are best presented in the routines of Appendices A.5.5.6 and A.5.6.1.
theta_r + +---------\ +----------------+ NTP --------->| Phase \ V_d | | V_s theta_c - | Detector ------>| Clock Filter |----+ +-------->| / | | | | +---------/ +----------------+ | | | ----- | / \ | | VFO | | \ / | ----- ....................................... | ^ . Loop Filter . | | . +---------+ x +-------------+ . | | V_c . | |<-----| | . | +------.-| Clock | y | Phase/Freq |<---------+ . | Adjust |<-----| Prediction | . . | | | | . . +---------+ +-------------+ . ....................................... Figure 26: Clock Discipline Feedback Loop Ordinarily, the pseudo-linear feedback loop described above operates to discipline the system clock. However, there are cases where a non-linear algorithm offers considerable improvement. One case is when the discipline starts without knowledge of the intrinsic clock frequency. The pseudo-linear loop takes several hours to develop an accurate measurement and during most of that time the poll interval cannot be increased. The non-linear loop described below does this in 15 minutes. Another case is when occasional bursts of large jitter are present due to congested network links. The state machine described below resists error bursts lasting less than 15 minutes. Figure 27 contains a summary of the variables and parameters including the variable (lowercase) or parameter (uppercase) name, formula name, and short description. Unless noted otherwise, all variables have assumed prefix c. The variables t, tc, state, hyster, and count are integers; the remaining variables are floating doubles. The function of each will be explained in the algorithm descriptions below.
+--------+------------+--------------------------+ | Name | Formula | Description | +--------+------------+--------------------------+ | t | timer | seconds counter | | offset | theta | combined offset | | resid | theta_r | residual offset | | freq | phi | clock frequency | | jitter | psi | clock offset jitter | | wander | omega | clock frequency wander | | tc | tau | time constant (log2) | | state | state | state | | adj | adj | frequency adjustment | | hyster | hyster | hysteresis counter | | STEPT | 125 | step threshold (.125 s) | | WATCH | 900 | stepout thresh(s) | | PANICT | 1000 | panic threshold (1000 s) | | LIMIT | 30 | hysteresis limit | | PGATE | 4 | hysteresis gate | | TC | 16 | time constant scale | | AVG | 8 | averaging constant | +--------+------------+--------------------------+ Figure 27: Clock Discipline Variables and Parameters The process terminates immediately if the offset is greater than the panic threshold PANICT (1000 s). The state transition function is described by the rstclock() function in Appendix A.5.5.7. Figure 28 shows the state transition function used by this routine. It has four columns showing, respectively, the state name, predicate and action if the offset theta is less than the step threshold, the predicate and actions otherwise, and finally some comments.
+-------+---------------------+-------------------+--------------+ | State | theta < STEP | theta > STEP | Comments | +-------+---------------------+-------------------+--------------+ | NSET | ->FREQ | ->FREQ | no frequency | | | adjust time | step time | file | +-------+---------------------+-------------------+--------------+ | FSET | ->SYNC | ->SYNC | frequency | | | adjust time | step time | file | +-------+---------------------+-------------------+--------------+ | SPIK | ->SYNC | if < 900 s ->SPIK | outlier | | | adjust freq | else ->SYNC | detected | | | adjust time | step freq | | | | | step time | | +-------+---------------------+-------------------+--------------+ | FREQ | if < 900 s ->FREQ | if < 900 s ->FREQ | initial | | | else ->SYNC | else ->SYNC | frequency | | | step freq | step freq | | | | adjust time | adjust time | | +-------+---------------------+-------------------+--------------+ | SYNC | ->SYNC | if < 900 s ->SPIK | normal | | | adjust freq | else ->SYNC | operation | | | adjust time | step freq | | | | | step time | | +-------+---------------------+-------------------+--------------+ Figure 28: State Transition Function In the table entries, the next state is identified by the arrow -> with the actions listed below. Actions such as adjust time and adjust frequency are implemented by the PLL/FLL feedback loop in the local_clock() routine. A step clock action is implemented by setting the clock directly, but this is done only after the stepout threshold WATCH (900 s) when the offset is more than the step threshold STEPT (.125 s). This resists clock steps under conditions of extreme network congestion. The jitter (psi) and wander (omega) statistics are computed using an exponential average with weight factor AVG. The time constant exponent (tau) is determined by comparing psi with the magnitude of the current offset theta. If the offset is greater than PGATE (4) times the clock jitter, the hysteresis counter hyster is reduced by two; otherwise, it is increased by one. If hyster increases to the upper limit LIMIT (30), tau is increased by one; if it decreases to the lower limit -LIMIT (-30), tau is decreased by one. Normally, tau hovers near MAXPOLL, but quickly decreases if a temperature spike causes a frequency surge.
12. Clock-Adjust Process
The actual clock-adjust process runs at one-second intervals to add the frequency correction and a fixed percentage of the residual offset theta_r. The theta_r is, in effect, the exponential decay of the theta value produced by the loop filter at each update. The TC parameter scales the time constant to match the poll interval for convenience. Note that the dispersion EPSILON increases by PHI at each second. The clock-adjust process includes a timer interrupt facility driving the seconds counter c.t. It begins at zero when the service starts and increments once each second. At each interrupt, the clock_adjust() routine is called to incorporate the clock discipline time and frequency adjustments, then the associations are scanned to determine if the seconds counter equals or exceeds the p.next state variable defined in the next section. If so, the poll process is called to send a packet and compute the next p.next value. An example of the clock-adjust process is shown by the clock_adjust() routine in Appendix A.5.6.1.13. Poll Process
Each association supports a poll process that runs at regular intervals to construct and send packets in symmetric, client, and broadcast server associations. It runs continuously, whether or not servers are reachable in order to manage the clock filter and reach register.13.1. Poll Process Variables
Figure 29 summarizes the common names, formula names, and a short description of the poll process variables (lowercase) and parameters (uppercase). Unless noted otherwise, all variables have assumed prefix p.
+---------+---------+--------------------+ | Name | Formula | Description | +---------+---------+--------------------+ | hpoll | hpoll | host poll exponent | | last | last | last poll time | | next | next | next poll time | | reach | reach | reach register | | unreach | unreach | unreach counter | | UNREACH | 24 | unreach limit | | BCOUNT | 8 | burst count | | BURST | flag | burst enable | | IBURST | flag | iburst enable | +---------+---------+--------------------+ Figure 29: Poll Process Variables and Parameters The poll process variables are allocated in the association data structure along with the peer process variables. The following is a detailed description of the variables. The parameters will be called out in the following text. hpoll: signed integer representing the poll exponent, in log2 seconds last: integer representing the seconds counter when the most recent packet was sent next: integer representing the seconds counter when the next packet is to be sent reach: 8-bit integer shift register shared by the peer and poll processes unreach: integer representing the number of seconds the server has been unreachable13.2. Poll Process Operations
As described previously, once each second the clock-adjust process is called. This routine calls the poll routine for each association in turn. If the time for the next poll message is greater than the seconds counter, the routine returns immediately. Symmetric (modes 1, 2), client (mode 3), and broadcast server (mode 5) associations routinely send packets. A broadcast client (mode 6) association runs the routine to update the reach and unreach variables, but does not send packets. The poll process calls the transmit process to send a packet. If in a burst (burst > 0), nothing further is done except call the poll update routine to set the next poll interval.
If not in a burst, the reach variable is shifted left by one bit, with zero replacing the rightmost bit. If the server has not been heard for the last three poll intervals, the clock filter routine is called to increase the dispersion. An example is shown in Appendix A.5.7.3. If the BURST flag is lit and the server is reachable and a valid source of synchronization is available, the client sends a burst of BCOUNT (8) packets at each poll interval. The interval between packets in the burst is two seconds. This is useful to accurately measure jitter with long poll intervals. If the IBURST flag is lit and this is the first packet sent when the server has been unreachable, the client sends a burst. This is useful to quickly reduce the synchronization distance below the distance threshold and synchronize the clock. If the P_MANY flag is lit in the p.flags word of the association, this is a manycast client association. Manycast client associations send client mode packets to designated multicast group addresses at MINPOLL intervals. The association starts out with a TTL of 1. If by the time of the next poll there are fewer than MINCLOCK servers have been mobilized, the ttl is increased by one. If the ttl reaches the limit TTLMAX, without finding MINCLOCK servers, the poll interval increases until reaching BEACON, when it starts over from the beginning. The poll() routine includes a feature that backs off the poll interval if the server becomes unreachable. If reach is nonzero, the server is reachable and unreach is set to zero; otherwise, unreach is incremented by one for each poll to the maximum UNREACH. Thereafter for each poll hpoll is increased by one, which doubles the poll interval up to the maximum MAXPOLL determined by the poll_update() routine. When the server again becomes reachable, unreach is set to zero, hpoll is reset to the tc system variable, and operation resumes normally. A packet is sent by the transmit process. Some header values are copied from the peer variables left by a previous packet and others from the system variables. Figure 30 shows which values are copied to each header field. In those implementations, using floating double data types for root delay and root dispersion, these must be converted to NTP short format. All other fields are either copied intact from peer and system variables or struck as a timestamp from the system clock.
+-----------------------------------+ | Packet Variable <-- Variable | +-----------------------------------+ | x.leap <-- s.leap | | x.version <-- s.version | | x.mode <-- s.mode | | x.stratum <-- s.stratum | | x.poll <-- s.poll | | x.precision <-- s.precision | | x.rootdelay <-- s.rootdelay | | x.rootdisp <-- s.rootdisp | | x.refid <-- s.refid | | x.reftime <-- s.reftime | | x.org <-- p.xmt | | x.rec <-- p.dst | | x.xmt <-- clock | | x.keyid <-- p.keyid | | x.digest <-- md5 digest | +-----------------------------------+ Figure 30: xmit_packet Packet Header The poll update routine is called when a valid packet is received and immediately after a poll message has been sent. If in a burst, the poll interval is fixed at 2 s; otherwise, the host poll exponent hpoll is set to the minimum of ppoll from the last packet received and hpoll from the poll routine, but not less than MINPOLL or greater than MAXPOLL. Thus, the clock discipline can be oversampled but not undersampled. This is necessary to preserve subnet dynamic behavior and protect against protocol errors. The poll exponent is converted to an interval, which, when added to the last poll time variable, determines the value of the next poll time variable. Finally, the last poll time variable is set to the current seconds counter.14. Simple Network Time Protocol (SNTP)
Primary servers and clients complying with a subset of NTP, called the Simple Network Time Protocol (SNTPv4) [RFC4330], do not need to implement the mitigation algorithms described in Section 9 and following sections. SNTP is intended for primary servers equipped with a single reference clock, as well as for clients with a single upstream server and no dependent clients. The fully developed NTPv4 implementation is intended for secondary servers with multiple upstream servers and multiple downstream servers or clients. Other than these considerations, NTP and SNTP servers and clients are completely interoperable and can be intermixed in NTP subnets.
An SNTP primary server implementing the on-wire protocol described in Section 8 has no upstream servers except a single reference clock. In principle, it is indistinguishable from an NTP primary server that has the mitigation algorithms and therefore capable of mitigating between multiple reference clocks. Upon receiving a client request, an SNTP primary server constructs and sends the reply packet as described in Figure 31. Note that the dispersion field in the packet header must be updated as described in Section 5. +-----------------------------------+ | Packet Variable <-- Variable | +-----------------------------------+ | x.leap <-- s.leap | | x.version <-- r.version | | x.mode <-- 4 | | x.stratum <-- s.stratum | | x.poll <-- r.poll | | x.precision <-- s.precision | | x.rootdelay <-- s.rootdelay | | x.rootdisp <-- s.rootdisp | | x.refid <-- s.refid | | x.reftime <-- s.reftime | | x.org <-- r.xmt | | x.rec <-- r.dst | | x.xmt <-- clock | | x.keyid <-- r.keyid | | x.digest <-- md5 digest | +-----------------------------------+ Figure 31: fast_xmit Packet Header An SNTP client implementing the on-wire protocol has a single server and no dependent clients. It can operate with any subset of the NTP on-wire protocol, the simplest approach using only the transmit timestamp of the server packet and ignoring all other fields. However, the additional complexity to implement the full on-wire protocol is minimal so that a full implementation is encouraged.15. Security Considerations
NTP security requirements are even more stringent than most other distributed services. First, the operation of the authentication mechanism and the time synchronization mechanism are inextricably intertwined. Reliable time synchronization requires cryptographic keys that are valid only over a designated time interval; but, time intervals can be enforced only when participating servers and clients
are reliably synchronized to UTC. In addition, the NTP subnet is hierarchical by nature, so time and trust flow from the primary servers at the root through secondary servers to the clients at the leaves. An NTP client can claim to have authentic time to dependent applications only if all servers on the path to the primary servers are authenticated. In NTP each server authenticates the next lower stratum servers and authenticates by induction the lowest stratum (primary) servers. It is important to note that authentication in the context of NTP does not necessarily imply the time is correct. An NTP client mobilizes a number of concurrent associations with different servers and uses a crafted agreement algorithm to pluck truechimers from the population possibly including falsetickers. The NTP specification assumes that the goal of the intruder is to inject false time values, disrupt the protocol, or clog the network, servers, or clients with spurious packets that exhaust resources and deny service to legitimate applications. There are a number of defense mechanisms already built in the NTP architecture, protocol, and algorithms. The on-wire timestamp exchange scheme is inherently resistant to spoofing, packet-loss, and replay attacks. The engineered clock filter, selection and clustering algorithms are designed to defend against evil cliques of Byzantine traitors. While not necessarily designed to defeat determined intruders, these algorithms and accompanying sanity checks have functioned well over the years to deflect improperly operating but presumably friendly scenarios. However, these mechanisms do not securely identify and authenticate servers to clients. Without specific further protection, an intruder can inject any or all of the following attacks: 1. An intruder can intercept and archive packets forever, as well as all the public values ever generated and transmitted over the net. 2. An intruder can generate packets faster than the server, network or client can process them, especially if they require expensive cryptographic computations. 3. In a wiretap attack, the intruder can intercept, modify, and replay a packet. However, it cannot permanently prevent onward transmission of the original packet; that is, it cannot break the wire, only tell lies and congest it. Generally, the modified packet cannot arrive at the victim before the original packet, nor does it have the server private keys or identity parameters.
4. In a middleman or masquerade attack, the intruder is positioned between the server and client, so it can intercept, modify and replay a packet and prevent onward transmission of the original packet. However, the middleman does not have the server private keys. The NTP security model assumes the following possible limitations: 1. The running times for public key algorithms are relatively long and highly variable. In general, the performance of the time synchronization function is badly degraded if these algorithms must be used for every NTP packet. 2. In some modes of operation, it is not feasible for a server to retain state variables for every client. It is however feasible to regenerated them for a client upon arrival of a packet from that client. 3. The lifetime of cryptographic values must be enforced, which requires a reliable system clock. However, the sources that synchronize the system clock must be trusted. This circular interdependence of the timekeeping and authentication functions requires special handling. 4. Client security functions must involve only public values transmitted over the net. Private values must never be disclosed beyond the machine on which they were created, except in the case of a special trusted agent (TA) assigned for this purpose. Unlike the Secure Shell (SSH) security model, where the client must be securely authenticated to the server, in NTP the server must be securely authenticated to the client. In SSH, each different interface address can be bound to a different name, as returned by a reverse-DNS query. In this design, separate public/private key pairs may be required for each interface address with a distinct name. A perceived advantage of this design is that the security compartment can be different for each interface. This allows a firewall, for instance, to require some interfaces to authenticate the client and others not. In the case of NTP as specified herein, NTP broadcast clients are vulnerable to disruption by misbehaving or hostile SNTP or NTP broadcast servers elsewhere in the Internet. Such disruption can be minimized by several approaches. Filtering can be employed to limit the access of NTP clients to known or trusted NTP broadcast servers. Such filtering will prevent malicious traffic from reaching the NTP clients. Cryptographic authentication at the client will only allow
timing information from properly signed NTP messages to be utilized in synchronizing its clock. Higher levels of authentication may be gained by the use of the Autokey mechanism [RFC5906]. Section 8 describes a potential security concern with the replay of client requests. Following the recommendations in that section provides protection against such attacks. It should be noted that this specification is describing an existing implementation. While the security shortfalls of the MD5 algorithm are well-known, its use in the NTP specification is consistent with widescale deployment in the Internet community.16. IANA Considerations
UDP/TCP Port 123 was previously assigned by IANA for this protocol. The IANA has assigned the IPv4 multicast group address 224.0.1.1 and the IPv6 multicast address ending :101 for NTP. This document introduces NTP extension fields allowing for the development of future extensions to the protocol, where a particular extension is to be identified by the Field Type sub-field within the extension field. IANA has established and will maintain a registry for Extension Field Types associated with this protocol, populating this registry with no initial entries. As future needs arise, new Extension Field Types may be defined. Following the policies outlined in [RFC5226], new values are to be defined by IETF Review. The IANA has created a new registry for NTP Reference Identifier codes. This includes the current codes defined in Section 7.3, and may be extended on a First-Come-First-Serve (FCFS) basis. The format of the registry is: +------+----------------------------------------------------------+ | ID | Clock Source | +------+----------------------------------------------------------+ | GOES | Geosynchronous Orbit Environment Satellite | | GPS | Global Position System | | ... | ... | +------+----------------------------------------------------------+ Figure 32: Reference Identifier Codes The IANA has created a new registry for NTP Kiss-o'-Death codes. This includes the current codes defined in Section 7.4, and may be extended on a FCFS basis. The format of the registry is:
+------+------------------------------------------------------------+ | Code | Meaning | +------+------------------------------------------------------------+ | ACST | The association belongs to a unicast server. | | AUTH | Server authentication failed. | | ... | ... | +------+------------------------------------------------------------+ Figure 33: Kiss Codes For both Reference Identifiers and Kiss-o'-Death codes, IANA is requested to never assign a code beginning with the character "X", as this is reserved for experimentation and development.17. Acknowledgements
The editors would like to thank Karen O'Donoghue, Brian Haberman, Greg Dowd, Mark Elliot, Harlan Stenn, Yaakov Stein, Stewart Bryant, and Danny Mayer for technical reviews and specific text contributions to this document.18. References
18.1. Normative References
[RFC0768] Postel, J., "User Datagram Protocol", STD 6, RFC 768, August 1980. [RFC0791] Postel, J., "Internet Protocol", STD 5, RFC 791, September 1981. [RFC0793] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, September 1981. [RFC1321] Rivest, R., "The MD5 Message-Digest Algorithm", RFC 1321, April 1992. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.18.2. Informative References
[CGPM] Bureau International des Poids et Mesures, "Comptes Rendus de la 15e CGPM", 1976. [ITU-R_TF.460] International Telecommunications Union, "ITU-R TF.460 Standard-frequency and time-signal emissions", February 2002.
[RFC1305] Mills, D., "Network Time Protocol (Version 3) Specification, Implementation and Analysis", RFC 1305, March 1992. [RFC1345] Simonsen, K., "Character Mnemonics and Character Sets", RFC 1345, June 1992. [RFC4330] Mills, D., "Simple Network Time Protocol (SNTP) Version 4 for IPv4, IPv6 and OSI", RFC 4330, January 2006. [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 5226, May 2008. [RFC5906] Haberman, B., Ed. and D. Mills, "Network Time Protocol Version 4: Autokey Specification", RFC 5906, June 2010. [ref6] Marzullo and S. Owicki, "Maintaining the time in a distributed system", ACM Operating Systems Review 19, July 1985. [ref7] Mills, D.L., "Computer Network Time Synchronization - the Network Time Protocol", CRC Press, 304 pp, 2006. [ref9] Mills, D.L., Electrical and Computer Engineering Technical Report 06-6-1, NDSS, June 2006, "Network Time Protocol Version 4 Reference and Implementation Guide", 2006.