accuracies achievable depend upon the statistical properties of the outbound and inbound data paths. Further analysis and experimental results bearing on this issue can be found in [MIL90] and in Appendix H. Test 4 requires that the calculated delay be within <169>reasonable<170> bounds: <$Etest4~<<-~(| delta |~<<~roman NTP.MAXDISPERSE~bold and~epsilon~<<~roman NTP.MAXDISPERSE)>; /* test 4 */ Test 5 is implemented only if the authentication mechanism described in Appendix C is implemented. It requires either that authentication be explicitly disabled or that the authenticator be present and correct as determined by the decrypt procedure. #ifdef (authentication implemented) /* test 5 */ <$Etest5~<<-~( roman {(peer.config~=~1~bold and~peer.authenable~=~0)~bold or~ peer.authentic~=~1})>; #endef Test 6 requires the peer clock be synchronized and the interval since the peer clock was last updated be positive and less than NTP.MAXAGE. Test 7 insures that the host will not synchronize on a peer with greater stratum. Test 8 requires that the header contains <169>reasonable<170> values for the pkt.rootdelay and pkt.rootdispersion fields. <$Etest6~<<-~( roman pkt.leap~!=~11 sub 2> and /* test 6 */ <$Eroman {pkt.reftime~<<=~pkt.xmt~<<~pkt.reftime~+~NTP.MAXAGE}>) <$Etest7~<<-~roman {pkt.stratum ~<<=~sys.stratum}> and /* test 7 */ <$Eroman {pkt.stratum ~<<~NTP.MAXSTRATUM}>; <$Etest8~<<-~( roman {| pkt.rootdelay |~<<~NTP.MAXDISPERSE}> and /* test 8 */ <$Eroman {pkt.rootdispersion~<<~NTP.MAXDISPERSE})>; With respect to further processing, the packet includes valid (synchronized) data if tests one through four succeed <$E(test1~&~test2~&~test3~&~test4~=~1)>, regardless of the remaining tests. Only packets with valid data can be used to calculate offset, delay and dispersion values. The packet includes a valid header if tests five through eight succeed <$E(test5~&~test6~&~test7~&~test8~=~1)>, regardless of the remaining tests. Only packets with valid headers can be used to determine whether a peer can be selected for synchronization. Note that <$Etest1> and <$Etest2> are not used in broadcast mode (forced to true), since the originate and receive timestamps are undefined. The clock-filter procedure is called to produce the delay (peer.delay), offset (peer.offset) and dispersion (peer.dispersion) for the peer. Specification of the clock-filter algorithm is not an integral part of
the NTP specification, since there may be other algorithms that work well in practice. However, one found to work well in the Internet environment is described in Section 4 and its use is recommended. if (not valid header) exit; <$Eroman peer.leap~<<-~roman pkt.leap>; /* copy packet variables */ <$Eroman peer.stratum~<<-~roman pkt.stratum>; <$Eroman peer.precision~<<-~roman pkt.precision>; <$Eroman peer.rootdelay~<<-~roman pkt.rootdelay>; <$Eroman peer.rootdispersion~<<-~roman pkt.rootdispersion>; <$Eroman peer.refid~<<-~roman pkt.refid>; <$Eroman peer.reftime~<<-~roman pkt.reftime>; if (valid data) call clock-filter(<$Etheta ,~delta ,~epsilon>); /* process sample */ end packet procedure; Clock-Update Procedure The clock-update procedure is called from the receive procedure when valid clock offset, delay and dispersion data have been determined by the clock-filter procedure for the current peer. The result of the clock-selection and clock-combining procedures is the final clock correction <$ETHETA>, which is used by the local-clock procedure to update the local clock. If no candidates survive these procedures, the clock-update procedure exits without doing anything further. begin clock-update procedure call clock-select; /* select clock source */ if (<$Eroman sys.peer~!=~peer>) exit; It may happen that the local clock may be reset, rather than slewed to its final value. In this case the clear procedure is called for every peer to purge the clock filter, reset the poll interval and reselect the synchronization source, if necessary. Note that the local-clock procedure sets the leap bits sys.leap to <169>unsynchronized<170> 112 in this case, so that no other peer will attempt to synchronize to the host until the host once again selects a peer for synchronization. The distance procedure calculates the root delay <$EDELTA>, root dispersion <$EEPSILON> and root synchronization distance <$ELAMBDA> via the peer to the root of the synchronization subnet. The host will not synchronize to the selected peer if the distance is greater than NTP.MAXDISTANCE. The reason for the minimum clamp at NTP.MINDISPERSE is to discourage subnet route flaps that can happen with Bellman-Ford algorithms and small roundtrip delays. <$ELAMBDA~<M=O> <~>an distance (peer)>; /* update system variables */
<B> if (<$ELAMBDA~>>=~roman NTP.MAXDISTANCE>) exit; <$Eroman sys.leap~<<-~roman peer.leap>; <$Eroman sys.stratum~<<-~roman peer.stratum~+~1>; <$Eroman sys.refid~<<-~roman peer.peeraddr>; call local-clock; if (local clock reset) begin /* if reset, clear state variables */ <$Eroman sys.leap~<<-~11 sub 2>; for (all peers) call clear; endif else begin <$Eroman sys.peer~<<-~peer>; /* if not, adjust local clock */ <$Eroman sys.rootdelay~<<-~DELTA>; <$Eroman sys.rootdispersion~<<-~EPSILON~+~max ( epsilon sub xi~+~| THETA |,~roman NTP.MINDISPERSE)>; endif <$Eroman sys.reftime~<<-~roman sys.clock>; end clock-update procedure; In some system configurations a precise source of timing information is available in the form of a train of timing pulses spaced at one-second intervals. Usually, this is in addition to a source of timecode information, such as a radio clock or even NTP itself, to number the seconds, minutes, hours and days. In these configurations the system variables are set to refer to the source from which the pulses are derived. For those configurations which support a primary reference source, such as a radio clock or calibrated atomic clock, the stratum is set at one as long as this is the actual synchronization source and whether or not the primary-clock procedure is used. Specification of the clock-selection and local-clock algorithms is not an integral part of the NTP specification, since there may be other algorithms which provide equivalent performance. However, a clock- selection algorithm found to work well in the Internet environment is described in Section 4, while a local-clock algorithm is described in Section 5 and their use is recommended. The clock-selection algorithm described in Section 4 usually picks the peer at the lowest stratum and minimum synchronization distance among all those available, unless that peer appears to be a falseticker. The result is that the algorithms all work to build a minimum-weight spanning tree relative to the primary reference time servers and thus a hierarchical-master-slave synchronization subnet. Primary-Clock Procedure When a primary reference source such as a radio clock is connected to the host, it is convenient to incorporate its information into the data base as if the clock were represented as an ordinary peer. In the primary-clock procedure the clock is polled once a minute or so and the
returned timecode used to produce a new update for the local clock. When peer.timer decrements to zero for a primary clock peer, the transmit procedure is not called; rather, the radio clock is polled, usually using an ASCII string specified for this purpose. When a valid timecode is received from the radio clock, it is converted to NTP timestamp format and the peer variables updated. The value of peer.leap is set depending on the status of the leap-warning bit in the timecode, if available, or manually by the operator. The value for peer.peeraddr, which will become the value of sys.refid when the clock-update procedure is called, is set to an ASCII string describing the clock type (see Appendix A). begin primary-clock-update procedure <$Eroman peer.leap~<<-~"from"~radio~or~operator>; /* copy variables */ <$Eroman peer.peeraddr~<<-~ASCII~identifier>; <$Eroman peer.rec~<<-~radio~timestamp>; <$Eroman peer.reach~<<-~1>; call clock-filter(<$Eroman {sys.clock~-~peer.rec,~0,~1~<< <<~peer.precision}>); /* process sample */ call clock-update; /* update local clock */ end primary-clock-update procedure; Initialization Procedures The initialization procedures are used to set up and initialize the system, its peers and associations. Initialization Procedure The initialization procedure is called upon reboot or restart of the NTP daemon. The local clock is presumably undefined at reboot; however, in some equipment an estimate is available from the reboot environment, such as a battery-backed clock/calendar. The precision variable is determined by the intrinsic architecture of the local hardware clock. The authentication variables are used only if the authentication mechanism described in Appendix C is implemented. The values of these variables are determined using procedures beyond the scope of NTP itself. begin initialization procedure #ifdef (authentication implemented) / * see Appendix C */ <$Eroman sys.keys~<<-~as~required>; #endef; <$Eroman sys.leap~<<-~11 sub 2>; /* copy variables */ <$Eroman sys.stratum~<<-~0~(undefined)>; <$Eroman sys.precision~<<-~host~precision>; <$Eroman sys.rootdelay~<<-~0~(undefined)>;
<$Eroman sys.rootdispersion~<<-~0~(undefined)>; <$Eroman sys.refid~<<-~0~(undefined)>; <$Eroman sys.reftime~<<-~0~(undefined)>; <$Eroman sys.clock~<<-~external~reference>; <$Eroman sys.peer~<<-~roman NULL>; <$Eroman sys.poll~<<-~roman NTP.MINPOLL>; for (all configured peers) /* create configured associations */ call initialization-instantiation procedure; end initialization procedure; Initialization-Instantiation Procedure This implementation-specific procedure is called from the initialization procedure to define an association. The addresses and modes of the peers are determined using information read during the reboot procedure or as the result of operator commands. The authentication variables are used only if the authentication mechanism described in Appendix C is implemented. The values of these variables are determined using procedures beyond the scope of NTP itself. With the authentication bits set as suggested, only properly authenticated peers can become the synchronization source. begin initialization-instantiation procedure <$Eroman peer.config~<<-~1>; #ifdef (authentication implemented) /* see Appendix C */ <$Eroman peer.authenable~<<-~1~(suggested)>; <$Eroman peer.authentic~<<-~0>; <$Eroman peer.hostkeyid~<<-~as~required>; <$Eroman peer.peerkeyid~<<-~0>; #endef; <$Eroman peer.peeraddr~<<-~peer~IP~address>; /* copy variables */ <$Eroman peer.peerport~<<-~roman NTP.PORT>; <$Eroman peer.hostaddr~<<-~host~IP~address>; <$Eroman peer.hostport~<<-~roman NTP.PORT>; <$Eroman peer.mode~<<-~host~mode>; <$Eroman peer.peerpoll~<<-~0~(undefined)>; <$Eroman peer.timer~<<-~0>; <$Eroman peer.delay~<<-~0~(undefined)>; <$Eroman peer.offset~<<-~0~(undefined)>; call clear; /* initialize association */ end initialization-instantiation procedure; Receive-Instantiation Procedure The receive-instantiation procedure is called from the receive procedure when a new peer is discovered. It initializes the peer variables and mobilizes the association. If the message is from a peer operating in
client mode (3), the host mode is set to server mode (4); otherwise, it is set to symmetric passive mode (2). The authentication variables are used only if the authentication mechanism described in Appendix C is implemented. If implemented, only properly authenticated non-configured peers can become the synchronization source. begin receive-instantiation procedure #ifdef (authentication implemented) /* see Appendix C */ <$Eroman peer.authenable~<<-~0>; <$Eroman peer.authentic~<<-~0>; <$Eroman peer.hostkeyid~<<-~as~required>; <$Eroman peer.peerkeyid~<<-~0>; #endef <$Eroman peer.config~<<-~0>; /* copy variables */ <$Eroman peer.peeraddr~<<-~roman pkt.peeraddr>; <$Eroman peer.peerport~<<-~roman pkt.peerport>; <$Eroman peer.hostaddr~<<-~roman pkt.hostaddr>; <$Eroman peer.hostport~<<-~roman pkt.hostport>; if (pkt.mode = 3) /* determine mode */ <$Eroman peer.mode~<<-~4>; else <$Eroman peer.mode~<<-~2>; <$Eroman peer.peerpoll~<<-~0~(undefined)>; <$Eroman peer.timer~<<-~0>; <$Eroman peer.delay~<<-~0~(undefined)>; <$Eroman peer.offset~<<-~0~(undefined)>; call clear; /* initialize association */ end receive-instantiation procedure; Primary Clock-Instantiation Procedure This procedure is called from the initialization procedure in order to set up the state variables for the primary clock. The value for peer.precision is determined from the radio clock specification and hardware interface. The value for peer.rootdispersion is nominally ten times the inherent maximum error of the radio clock; for instance, <$E10~mu s> for a calibrated atomic clock, 10 ms for a WWVB or GOES radio clock and 100 ms for a less accurate WWV radio clock. begin clock-instantiation procedure <$Eroman peer.config~<<-~1>; /* copy variables */ <$Eroman peer.peeraddr~<<-~0~undefined>; <$Eroman peer.peerport~<<-~0~(not~used)>; <$Eroman peer.hostaddr~<<-~0~(not~used)>; <$Eroman peer.hostport~<<-~0~(not~used)>; <$Eroman peer.leap~<<-~11 sub 2>;
<$Eroman peer.mode~<<-~0~(not~used)>; <$Eroman peer.stratum~<<-~0>; <$Eroman peer.peerpoll~<<-~0~(undefined)>; <$Eroman peer.precision~<<-~clock~precision>; <$Eroman peer.rootdelay~<<-~0>; <$Eroman peer.rootdispersion~<<-~clock~dispersion>; <$Eroman peer.refid~<<-~0~(not~used)>; <$Eroman peer.reftime~<<-~0~(undefined)>; <$Eroman peer.timer~<<-~0>; <$Eroman peer.delay~<<-~0~(undefined)>; <$Eroman peer.offset~<<-~0~(undefined)>; call clear; /* initialize association */ end clock-instantiation procedure; In some configurations involving a calibrated atomic clock or LORAN-C receiver, the primary reference source may provide only a seconds pulse, but lack a complete timecode from which the numbering of the seconds, etc., can be derived. In these configurations seconds numbering can be derived from other sources, such as a radio clock or even other NTP peers. In these configurations the primary clock variables should reflect the primary reference source, not the seconds-numbering source; however, if the seconds-numbering source fails or is known to be operating incorrectly, updates from the primary reference source should be suppressed as if it had failed. Clear Procedure The clear procedure is called when some event occurs that results in a significant change in reachability state or potential disruption of the local clock. begin clear procedure <$Eroman peer.org~<<-~0~(undefined)>; /* mark timestamps undefined */ <$Eroman peer.rec~<<-~0~(undefined)>; <$Eroman peer.xmt~<<-~0~(undefined)>; <$Eroman peer.reach~<<-~0>; /* reset state variables */ <$Eroman peer.filter~<<-~[0,~,0,~roman NTP.MAXDISPERSE]>; /* all stages */ <$Eroman peer.valid~<<-~0>; <$Eroman peer.dispersion~<<-~roman NTP.MAXDISPERSE>; <$Eroman {peer.hostpoll~<<-~NTP.MINPOLL}>; /* reset poll interval */ call poll-update; call clock-select; /* select clock source */ end clear procedure; Poll-Update Procedure
The poll-update procedure is called when a significant event occurs that may result in a change of the poll interval or peer timer. It checks the values of the host poll interval (peer.hostpoll) and peer poll interval (peer.peerpoll) and clamps each within the valid range. If the peer is selected for synchronization, the value is further clamped as a function of the computed compliance (see Section 5). begin poll-update procedure <$Etemp~<<-~roman peer.hostpoll>; /* determine host poll interval */ if (<$Epeer~=~roman sys.peer>) <$Etemp~<<-~min (temp,~roman {sys.poll,~NTP.MAXPOLL)}>; else <$Etemp~<<-~min (temp,~roman NTP.MAXPOLL)>; <$Eroman peer.hostpoll~<<-~max (temp,~roman NTP.MINPOLL)>; <$Etemp~<<-~1~<< << ~min ( roman {peer.hostpoll,~max (peer.peerpoll,~NTP.MINPOLL)})>; If the poll interval is unchanged and the peer timer is zero, the timer is simply reset. If the poll interval is changed and the new timer value is greater than the present value, no additional action is necessary; otherwise, the peer timer must be reduced. When the peer timer must be reduced it is important to discourage tendencies to synchronize transmissions between the peers. A prudent precaution is to randomize the first transmission after the timer is reduced, for instance by the sneaky technique illustrated. if (peer.timer = 0) /* reset peer timer */ <$Eroman peer.timer~<<-~temp>; else if (<$Eroman peer.timer~>>~temp>) <$Eroman peer.timer~<<-~( roman sys.clock~&~(temp~- ~1))~+~1>; end poll-update procedure; Synchronization Distance Procedure The distance procedure calculates the synchronization distance from the peer variables for the peer peer. begin distance(peer) procedure; <$EDELTA~<<-~roman {peer.rootdelay~+~|peer.delay|}>; <$EEPSILON~<<-~roman {peer.rootdispersion~+~peer.dispersion~+~phi (sys.clock~-~peer.update) }>; <$ELAMBDA~<<-~EPSILON~+~{| DELTA |} over 2> ; end distance procedure; Note that, while <$EDELTA> may be negative in some cases, both
<$EEPSILON> and <$ELAMBDA> are always positive. Access Control Issues The NTP design is such that accidental or malicious data modification (tampering) or destruction (jamming) at a time server should not in general result in timekeeping errors elsewhere in the synchronization subnet. However, the success of this approach depends on redundant time servers and diverse network paths, together with the assumption that tampering or jamming will not occur at many time servers throughout the synchronization subnet at the same time. In principle, the subnet vulnerability can be engineered through the selection of time servers known to be trusted and allowing only those time servers to become the synchronization source. The authentication procedures described in Appendix C represent one mechanism to enforce this; however, the encryption algorithms can be quite CPU-intensive and can seriously degrade accuracy, unless precautions such as mentioned in the description of the transmit procedure are taken. While not a required feature of NTP itself, some implementations may include an access-control feature that prevents unauthorized access and controls which peers are allowed to update the local clock. For this purpose it is useful to distinguish between three categories of access: those that are preauthorized as trusted, preauthorized as friendly and all other (non-preauthorized) accesses. Presumably, preauthorization is accomplished by entries in the configuration file or some kind of ticket-management system such as Kerberos [STE88]. In this model only trusted accesses can result in the peer becoming the synchronization source. While friendly accesses cannot result in the peer becoming the synchronization source, NTP messages and timestamps are returned as specified. It does not seem useful to maintain a secret clock, as would result from restricting non-preauthorized accesses, unless the intent is to hide the existence of the time server itself. Well-behaved Internet hosts are expected to return an ICMP service-unavailable error message if a service is not implemented or resources are not available; however, in the case of NTP the resources required are minimal, so there is little need to restrict requests intended only to read the clock. A simple but effective access-control mechanism is then to consider all associations preconfigured in a symmetric mode or client mode (modes 1, 2 and 3) as trusted and all other associations, preconfigured or not, as friendly. If a more comprehensive trust model is required, the design can be based on an access-control list with each entry consisting of a 32-bit Internet address, 32-bit mask and three-bit mode. If the logical AND of the source address (pkt.peeraddr) and the mask in an entry matches the corresponding address in the entry and the mode (pkt.mode) matches the mode in the entry, the access is allowed; otherwise an ICMP error message is returned to the requestor. Through appropriate choice of
mask, it is possible to restrict requests by mode to individual addresses, a particular subnet or net addresses, or have no restriction at all. The access-control list would then serve as a filter controlling which peers could create associations. Filtering and Selection Algorithms A most important factor affecting the accuracy and reliability of time distribution is the complex of algorithms used to reduce the effect of statistical errors and falsetickers due to failure of various subnet components, reference sources or propagation media. The algorithms suggested in this section were developed and refined over several years of operation in the Internet under widely varying topologies, speeds and traffic regimes. While these algorithms are believed the best available at the present time, they are not an integral part of the NTP specification, since other algorithms with similar or superior performance may be devised in future. However, it is important to observe that not all time servers or clients in an NTP synchronization subnet must implement these algorithms. For instance, simple workstations may dispense with one or both of them in the interests of simplicity if accuracy and reliability requirements justify. Nevertheless, it would be expected that an NTP server providing synchronization to a sizable community, such as a university campus or research laboratory, would be expected to implement these algorithms or others proved to have equivalent functionality. A comprehensive discussion of the design principles and performance is given in [MIL91a]. In order for the NTP filter and selection algorithms to operate effectively, it is useful to have a measure of recent sample variance recorded for each peer. The measure adopted is based on first-order differences, which are easy to compute and effective for the purposes intended. There are two measures, one called the filter dispersion <$Eepsilon sub sigma> and the other the select dispersion <$Eepsilon sub xi>. Both are computed as the weighted sum of the clock offsets in a temporary list sorted by synchronization distance. If <$Etheta sub i ~(0~<<=~i~<<~n)> is the offset of the ith entry, then the sample difference <$Eepsilon sub ij> of the ith entry relative to the jth entry is defined <$Eepsilon sub ij~<~>=~| theta sub i~-~theta sub j |> . The dispersion relative to the jth entry is defined <$Eepsilon sub j> and computed as the weighted sum <$Eepsilon sub j~=~sum from {i~=~0} to {n~-~1}~epsilon sub ij~w~sup {i+1}> , where w is a weighting factor chosen to control the influence of synchronization distance in the dispersion budget. In the NTP algorithms w is chosen less than <$E1 / 2>: <$Ew~=~roman NTP.FILTER> for filter dispersion and <$Ew~=~roman NTP.SELECT> for select dispersion. The
(absolute) dispersion <$Eepsilon sub sigma> and <$Eepsilon sub xi> as used in the NTP algorithms are defined relative to the 0th entry <$Eepsilon sub 0>. There are two procedures described in the following, the clock-filter procedure, which is used to select the best offset samples from a given clock, and the clock-selection procedure, which is used to select the best clock among a hierarchical set of clocks. Clock-Filter Procedure The clock-filter procedure is executed upon arrival of an NTP message or other event that results in new data samples. It takes arguments of the form (<$Etheta ,~delta ,~epsilon>), where <$Etheta> is a sample clock offset measurement and <$Edelta> and <$Eepsilon> are the associated roundtrip delay and dispersion. It determines the filtered clock offset (peer.offset), roundtrip delay (peer.delay) and dispersion (peer.dispersion). It also updates the dispersion of samples already recorded and saves the current time (peer.update). The basis of the clock-filter procedure is the filter shift register (peer.filter), which consists of NTP.SHIFT stages, each stage containing a 3-tuple <$E[ theta sub i ,~delta sub i ,~epsilon sub i ]>, with indices numbered from zero on the left. The filter is initialized with the value <$E[0,~0,~roman NTP.MAXDISPERSE]> in all stages by the clear procedure. New data samples are shifted into the filter at the left end, causing first NULLs then old samples to fall off the right end. The packet procedure provides samples of the form (<$Etheta ,~delta ,~epsilon>) as new updates arrive, while the transmit procedure provides samples of the form <$E[0,~0,~roman NTP.MAXDISPERSE]> when two poll intervals elapse without a fresh update. While the same symbols (<$Etheta ,~delta ,~epsilon>) are used here for the arguments, clock- filter contents and peer variables, the meaning will be clear from context. The following pseudo-code describes this procedure. begin clock-filter procedure (<$Etheta ,~delta ,~epsilon>) The dispersion <$Eepsilon sub i> for all valid samples in the filter register must be updated to account for the skew-error accumulation since the last update. These samples are also inserted on a temporary list with entry format <$E[distance,index]>. The samples in the register are shifted right one stage, with the overflow sample discarded and the new sample inserted at the leftmost stage. The temporary list is then sorted by increasing distance. If no samples remain in the list, the procedure exits without updating the peer variables. for (i from NTP.SIZE <196> 1 to 1) begin /* update dispersion */ <$E[ theta sub i ,~delta sub i ,~epsilon sub i ]~<<-~[ theta sub {i-1} ,~delta sub {i-1} ,~epsilon sub {i-1} ]>;
/* shift stage right */ <$Eepsilon sub i~=~epsilon sub i~+~phi tau>; add <$E[ lambda sub i~==~epsilon sub i~+~{| delta sub i |} over 2 ,~i]> to temporary list; endfor; <$E[ theta sub 0 ,~delta sub 0 ,~epsilon sub 0 ]~<<-~[ theta ,~delta ,~epsilon ]>; /* insert new sample */ add <$E[ lambda~==~epsilon~+~{| delta |} over 2 ,~0]> to temporary list; <$Eroman peer.update~<<-~roman sys.clock>; /* reset base time */ sort temporary list by increasing <$E[distance~||index]>; where <$E[distance~||index]> represents the concatenation of the distance and index fields and distance is the high-order field. The filter dispersion <$Eepsilon sub sigma> is computed and included in the peer dispersion. Note that for this purpose the temporary list is already sorted. <$Eepsilon sub sigma~<<-~0>; for (i from NTP.SHIFT<196>1 to 0) /* compute filter dispersion */ if (<$Eroman peer.dispersion sub index[i]~>>=~roman NTP.MAXDISPERSE> or <$E| theta sub i~-~theta sub 0 |~>>~roman NTP.MAXDISPERSE>) <$Eepsilon sub sigma~<~><<-~( epsilon sub sigma~+~roman NTP.MAXDISPERSE)~times~roman NTP.FILTER>; else <$Eepsilon sub sigma~<~><<-~( epsilon sub sigma~+~| theta sub i~-~theta sub 0 |)~times~roman NTP.FILTER>; The peer offset <$Etheta sub 0>, delay <$Edelta sub 0> and dispersion <$Eepsilon sub 0> are chosen as the values corresponding to the minimum- distance sample; in other words, the sample corresponding to the first entry on the temporary list, here represented as the 0th subscript. <$Eroman peer.offset~<<-~theta sub 0>; /* update peer variables */ <$Eroman peer.delay~<<-~delta sub 0>; <$Eroman peer.dispersion~<<-~min ( epsilon sub 0~+~epsilon sub sigma ,~roman NTP.MAXDISPERSE)>; end clock-filter procedure The peer.offset and peer.delay variables represent the clock offset and roundtrip delay of the local clock relative to the peer clock. Both of these are precision quantities and can usually be averaged over long intervals in order to improve accuracy and stability without bias accumulation (see Appendix H). The peer.dispersion variable represents the maximum error due to measurement error, skew-error accumulation and
sample variance. All three variables are used in the clock-selection and clock-combining procedures to select the peer clock(s) used for synchronization and to maximize the accuracy and stability of the indications. Clock-Selection Procedure The clock-selection procedure uses the peer variables <$ETHETA>, <$EDELTA>, <$EEPSILON> and <$Etau> and is called when these variables change or when the reachability status changes. It consists of two algorithms, the intersection algorithm and the clustering algorithm. The intersection algorithm constructs a list of candidate peers eligible to become the synchronization source, computes a confidence interval for each and casts out falsetickers using a technique adapted from Marzullo and Owicki [MAR85]. The clustering algorithm sorts the list of surviving candidates in order of stratum and synchronization distance and repeatedly casts out outlyers on the basis of select dispersion until only the most accurate, precise and stable survivors are left. A bit is set for each survivor to indicate the outcome of the selection process. The system variable sys.peer is set as a pointer to the most likely survivor, if there is one, or to the NULL value if not. Intersection Algorithm begin clock-selection procedure Each peer is examined in turn and added to an endpoint list only if it passes several sanity checks designed to avoid loops and use of exceptionally noisy data. If no peers survive the sanity checks, the procedure exits without finding a synchronization source. For each of m survivors three entries of the form <$E[endpoint,~type]> are added to the endpoint list: <$E[ THETA~-~LAMBDA ,~-~1]>, <$E[ THETA ,~0]> and <$E[ THETA~+~LAMBDA ,~1]>. There will be <$E3 m> entries on the list, which is then sorted by increasing endpoint. <$Em~<<-~0>; for (each peer) /* calling all peers */ if (<$Eroman {peer.reach~!=~0~bold and~peer.dispersion~<<~NTP.MAXDISPERSE}> and not (peer.stratum >> 1 and peer.refid = peer.hostaddr)) begin <$ELAMBDA~<MO> <~>an distance (peer)>; /* make list entry */ add <$E[ THETA~-~LAMBDA ,~-1]> to endpoint list; add <$E[ THETA ,~0]> to endpoint list; add <$E[ THETA~+~LAMBDA ,~1]> to endpoint list; <$Em~<<-~m~+~1>; <B>endif endfor if (<$Em~=~0>) begin /* skedaddle if
no candidates */ <$Eroman sys.peer~<<-~roman NULL>; <$Eroman sys.stratum~<<-~0~(undefined)>; exit; endif sort endpoint list by increasing endpoint||type; The following algorithm is adapted from DTS [DEC89] and is designed to produce the largest single intersection containing only truechimers. The algorithm begins by initializing a value f and counters i and c to zero. Then, starting from the lowest endpoint of the sorted endpoint list, for each entry <$E[endpoint,~type]> the value of type is subtracted from the counter i, which is the number of intersections. If type is zero, increment the value of c, which is the number of falsetickers (see Appendix H). If <$Ei~>>=~m~-~f> for some entry, endpoint of that entry becomes the lower endpoint of the intersection; otherwise, f is increased by one and the above procedure is repeated. Without resetting f or c, a similar procedure is used to find the upper endpoint, except that the value of type is added to the counter.. If after both endpoints have been determined <$Ec~<<=~f>, the procedure continues having found <$Em~-~f> truechimers; otherwise, f is increased by one and the entire procedure is repeated. for (f from 0 to <$Ef~>>=~m over 2>) begin /* calling all truechimers */ <$Ec~<<-~0>; <$Ei~<<-~0>; for (each [endpoint, type] from lowest) begin /* find low endpoint */ <$Ei~<<-~i~-~type>; <$Elow~<<-~endpoint>; if (<$Ei~>>=~m~-~f>) break; if (<$Etype~=~0>) <$Ec~<<-~c~+~1>; endfor; <$Ei~<<-~0>; for (each [endpoint, type] from highest) begin /* find high endpoint */ <$Ei~<<-~i~+~type>; <$Ehigh~<<-~endpoint>; if (<$Ei~>>=~m~-~f>) break; if (<$Etype~=~0>) <$Ec~<<-~c~+~1>; endfor; if (<$Ec~<<=~f>) break; /* continue until all falsetickers found */ endfor; if (<$Elow~>>~high>) begin /* quit if no intersection found */ <$Eroman sys.peer~<<-~roman NULL>; exit;
endif; Note that processing continues past this point only if there are more than <$Em over 2> intersections. However, it is possible, but not highly likely, that there may be fewer than <$Em over 2> truechimers remaining in the intersection. Clustering Algorithm In the original DTS algorithm the clock-selection procedure exits at this point with the presumed correct time set midway in the computed intersection <$E[low,~high]>. However, this can lead to a considerable loss in accuracy and stability, since the individual peer statistics are lost. Therefore, in NTP the candidates that survived the preceding steps are processed further. The candidate list is rebuilt with entries of the form <$E[distance,~index]>, where distance is computed from the (scaled) peer stratum and synchronization distance <$ELAMBDA>. The scaling factor provides a mechanism to weight the combination of stratum and distance. Ordinarily, the stratum will dominate, unless one or more of the survivors has an exceptionally high distance. The list is then sorted by increasing distance. <$Em~<<-~0>; for (each peer) begin /* calling all peers */ if (<$Elow~<<=~theta~<<=~high>) begin <$ELAMBDA~<<-~roman distance (peer)>; /* make list entry */ <$Edist~<<-~roman {peer.stratum~times~NTP.MAXDISPERSE~+~LAMBDA }> add <$E[ dist ,~peer]> to candidate list; <$Em~<<-~m~+~1>; endif; endfor; sort candidate list by increasing dist; The next steps are designed to cast out outlyers which exhibit significant dispersions relative to the other members of the candidate list while minimizing wander, especially on high-speed LANs with many time servers. Wander causes needless network overhead, since the poll interval is clamped at sys.poll as each new peer is selected for synchronization and only slowly increases when the peer is no longer selected. It has been the practical experience that the number of candidates surviving to this point can become quite large and can result in significant processor cycles without materially enhancing stability and accuracy. Accordingly, the candidate list is truncated at NTP.MAXCLOCK entries. Note <$Eepsilon sub {xi i}> is the select (sample) dispersion relative to the ith peer represented on the candidate list, which can be calculated in a manner similar to the filter dispersion described
previously. The <$EEPSILON sub j> is the dispersion of the jth peer represented on the list and includes components due to measurement error, skew-error accumulation and filter dispersion. If the maximum <$Eepsilon sub {xi i}> is greater than the minimum <$EEPSILON sub j> and the number of survivors is greater than NTP.MINCLOCK, the ith peer is discarded from the list and the procedure is repeated. If the current synchronization source is one of the survivors and there is no other survivor of lower stratum, then the procedure exits without doing anything further. Otherwise, the synchronization source is set to the first survivor on the candidate list. In the following i, j, k, l are peer indices, with k the index of the current synchronization source (NULL if none) and l the index of the first survivor on the candidate list. while begin for (each survivor <$E[distance,~index]>) begin /* compute dispersions */ find index i for max <$Eepsilon sub {xi i}>; find index j for min <$EEPSILON sub j>; endfor if (<$Eepsilon sub {xi i}~<<=~EPSILON sub j> or <$Em~<<=~roman NTP.MINCLOCK>) break; <$Eroman peer.survivor [i]~<<-~0> ; /* discard ith peer */ if (<$Ei~=~k>) <$Eroman sys.peer~<<-~roman NULL>; delete the ith peer from the candidate list; <$Em~<<-~m~-~1>; endwhile if (<$Eroman peer.survivor [k]~=~0> or <$Eroman peer.stratum [k]~>>~roman peer.stratum [l]>) begin <$Eroman sys.peer~<<-~l>; /* new clock source */ call poll-update; endif end clock-select procedure; The algorithm is designed to favor those peers near the head of the candidate list, which are at the lowest stratum and distance and presumably can provide the most accurate and stable time. With proper selection of weight factor v (also called NTP.SELECT), entries will be trimmed from the tail of the list, unless a few outlyers disagree significantly with respect to the remaining entries, in which case the outlyers are discarded first. The termination condition is designed to avoid needless switching between synchronization sources when not statistically justified, yet maintain a bias toward the low-stratum, low-distance peers. Local Clocks In order to implement a precise and accurate local clock, the host must
be equipped with a hardware clock consisting of an oscillator and interface and capable of the required precision and stability. A logical clock is then constructed using these components plus software components that adjust the apparent time and frequency in response to periodic updates computed by NTP or some other time-synchronization protocol such as Hellospeak [MIL83b] or the Unix 4.3bsd TSP [GUS85a]. This section describes the Fuzzball local-clock model and implementation, which includes provisions for precise time and frequency adjustment and can maintain time to within 15 ns and frequency to within 0.3 ms per day. The model is suitable for use with both compensated and uncompensated quartz oscillators and can be adapted to power-frequency oscillators. A summary of the characteristics of these and other types of oscillators can be found in Appendix E, while a comprehensive mathematical analysis of the NTP local-clock model can be found in Appendix G. It is important to note that the particular implementation described is only one of possibly many implementations that provide equivalent functionality. However, it is equally important to note that the clock model described in Appendix G and which is the basis of the implementation involves a particular kind of control-feedback loop that is potentially unstable if the design rules are broken. The model and parameter described in Appendix G are designed to provide accurate and stable time under typical operating conditions using conventional hardware and in the face of disruptions in hardware or network connectivity. The parameters have been engineered for reliable operation in a multi-level hierarchical subnet where unstable operation at one level can disrupt possibly many other levels. Fuzzball Implementation The Fuzzball local clock consists of a collection of hardware and software registers, together with a set of algorithms, which implement a logical clock that functions as a disciplined oscillator and synchronizes to an external source. Following is a description of its components and manner of operation. Note that all arithmetic is two's complement integer and all shifts <169><<<<<170> and <169>>>>><170> are arithmetic (sign-fill for right shifts and zero-fill for left shifts). Also note that <$Ex~<< <<~n> is equivalent to <$Ex~>> >>~-~n>. The principal components of the local clock are shown in Figure 3,<$&fig3> in which the fraction points shown are relative to whole milliseconds. The 48-bit Clock register and 32-bit Prescaler function as a disciplined oscillator which increments in milliseconds relative to midnight at the fraction point. The 32-bit Clock-Adjust register is used to adjust the oscillator phase in gradual steps to avoid discontinuities in the indicated timescale. Its contents are designated x in the following. The 32-bit Skew-Compensation register is used to trim the oscillator frequency by adding small phase increments at periodic adjustment intervals and can compensate for frequency errors as much as
.01% or <F128M>?<F255D>100 ppm. Its contents are designated y in the following. The 16-bit Watchdog counter and 32-bit Compliance register are used to determine validity, as well as establish the PLL bandwidth and poll interval (see Appendix G). The contents of the Compliance register are designated z in the following. The 32-bit PPS-Adjust register is used to hold a precision time adjustment when a source of 1- pps pulses is available, while the 8-bit PPS counter is used to verify presence of these pulses. The two-bit Flags register contains the two leap bits described elsewhere (leap). All registers except the Prescaler register are ordinarily implemented in memory. In typical clock interface designs such as the DEC KWV11-C, the Prescaler register is implemented as a 16-bit buffered counter driven by a quartz-controlled oscillator at some multiple of 1000 Hz. A counter overflow is signalled by an interrupt, which results in an increment of the Clock register at the bit corresponding to the overflow. The time of day is determined by reading the Prescaler register, which does not disturb the counting process, and adding its value to that of the Clock register with fraction points aligned as shown and with unimplemented low-order bits set to zero. In other interface designs, such as the LSI-11 event-line mechanism, each tick of the clock is signalled by an interrupt at intervals of 16-2/3 ms or 20 ms, depending on interface and mains frequency. When this occurs the appropriate increment in fractional milliseconds is added to the Clock register. The various parameters used are summarized in Table 6, in which certain parameters have been rescaled from those given in Appendix G due to the units here being in milliseconds.<$&tab6> When the system is initialized, all registers and counters are cleared and the leap bits set to 112 (unsynchronized). At adjustment intervals of CLOCK.ADJ seconds CLOCK.ADJ is subtracted from the PPS counter, but only if the previous contents of the PPS counter are greater than zero. Also, CLOCK.ADJ is added to the Watchdog counter, but the latter is clamped not to exceed NTP.MAXAGE divided by CLOCK.ADJ (one full day). In addition, if the Watchdog counter reaches this value, the leap bits are set to 112 (unsynchronized). In some system configurations a precise source of timing information is available in the form of a train of timing pulses spaced at one-second intervals. Usually, this is in addition to a source of timecode information, such as a radio clock or even NTP itself, to number the seconds, minutes, hours and days. In typical clock interface designs such as the DEC KWV11-C, a special input is provided which can trigger an interrupt as each pulse is received. When this happens the PPS counter is set to CLOCK.PPS and the current time offset is determined in the usual way. Then, the PPS-Adjust register is set to the time offset scaled to milliseconds. Finally, if the PPS-Adjust register is greater than or equal to 500, 1000 is subtracted from its contents. As described below, the PPS-Adjust register and PPS counters can be used in
conjunction with an ordinary timecode to produce an extremely accurate local clock. Gradual Phase Adjustments Left uncorrected, the local clock runs at the offset and frequency resulting from its last update. An update is produced by an event that results in a valid clock selection. It consists of a signed 48-bit integer in whole milliseconds and fraction, with fraction point to the left of bit 32. If the magnitude is greater than the maximum aperture CLOCK.MAX, a step adjustment is required, in which case proceed as described later. Otherwise, a gradual phase adjustment is performed. Normally, the update is computed by the NTP algorithms described previously; however, if the PPS counter is greater than zero, the value of the PPS-Adjust register is used instead. Let u be a 32-bit quantity with bits 0-31 set as bits 16-47 of the update. If some of the low-order bits of the update are unimplemented, they are set as the value of the sign bit. These operations move the fraction point of u to the left of bit 16 and minimize the effects of truncation and roundoff errors. Let b be the number of leading zeros of the absolute value of the Compliance register and let c be the number of leading zeros of the Watchdog counter, both of which are easily computed by compact loops. Then, set b to <$Eb~=~b~-~16~+~roman CLOCK.COMP> and clamp it to be not less than zero. This represents the logarithm of the loop time constant. Then, set c to <$Ec~=~10~-~c> and clamp it to be not greater than NTP.MAXPOLL <196> NTP.MINPOLL. This represents the logarithm of the integration interval since the last update. The clamps insure stable operation under typical conditions encountered in the Internet. Then, compute new values for the Clock- Adjust and Skew-Compensation registers <$Ex~=~u~>> >>~b> , <$Ey~=~y~+~(u~>> >>~(b~+~b~-~c))> . Finally, compute the exponential average <$Ez~=~z~+~(u~<< <<~(b~+~ roman CLOCK.MULT)~-~z)~>> >>~ roman CLOCK.WEIGHT> , where the left shift realigns the fraction point for greater precision and ease of computation. At each adjustment interval the final clock correction consisting of two components is determined. The first (phase) component consists of the quantity
<$Ex~>> >>~ roman CLOCK.PHASE> , which is then subtracted from the previous contents of the Clock-Adjust register to form the new contents of that register. The second (frequency) component consists of the quantity <$Ey~>> >>~ roman CLOCK.FREQ> . The sum of the phase and frequency components is the final clock correction, which is then added to the Clock register. FInally, the Watchdog counter is set to zero. Operation continues in this way until a new correction is introduced. The value of b computed above can be used to update the poll interval system variable (sys.poll). This functions as an adaptive parameter that provides a very valuable feature which reduces the polling overhead, especially if the clock-combining algorithm described in Appendix F is used: <$Eroman sys.poll~<<-~b~+~roman NTP.MINPOLL> . Under conditions when update noise is high or the hardware oscillator frequency is changing relatively rapidly due to environmental conditions, the magnitude of the compliance increases. With the parameters specified, this causes the loop bandwidth (reciprocal of time constant) to increase and the poll interval to decrease, eventually to NTP.MINPOLL seconds. When noise is low and the hardware oscillator very stable, the compliance decreases, which causes the loop bandwidth to decrease and the poll interval to increase, eventually to NTP.MAXPOLL seconds. The parameters in Table 6 have been selected so that, under good conditions with updates in the order of a few milliseconds, a precision of a millisecond per day (about .01 ppm or 10-8), can be achieved. Care is required in the implementation to insure monotonicity of the Clock register and to preserve the highest precision while minimizing the propagation of roundoff errors. Since all of the multiply/divide operations (except those involved with the 1-pps pulses) computed in real time can be approximated by bitwise-shift operations, it is not necessary to implement a full multiply/divide capability in hardware or software. In the various implementations of NTP for many Unix-based systems it has been the common experience that the single most important factor affecting local-clock stability is the matching of the phase and frequency coefficients to the particular kernel implementation. It is vital that these coefficients be engineered according to the model values, for otherwise the PLL can fail to track normal oscillator variations and can even become unstable.
Step Phase Adjustments When the magnitude of a correction exceeds the maximum aperture CLOCK.MAX, the possibility exists that the clock is so far out of synchronization with the reference source that the best action is an immediate and wholesale replacement of Clock register contents, rather than in gradual adjustments as described above. However, in cases where the sample variance is extremely high, it is prudent to disbelieve a step change, unless a significant interval has elapsed since the last gradual adjustment. Therefore, if a step change is indicated and the Watchdog counter is less than the preconfigured value CLOCK.MINSTEP, the update is ignored and the local-clock procedure exits. These safeguards are especially useful in those system configurations using a calibrated atomic clock or LORAN-C receiver in conjunction with a separate source of seconds-numbering information, such as a radio clock or NTP peer. If a step change is indicated the update is added directly to the Clock register and the Clock-Adjust register and Watchdog counter both set to zero, but the other registers are left undisturbed. Since a step change invalidates data currently in the clock filters, the leap bits are set to 112 (unsynchronized) and, as described elsewhere, the clear procedure is called to purge the clock filters and state variables for all peers. In practice, the necessity to perform a step change is rare and usually occurs when the local host or reference source is rebooted, for example. This is fortunate, since step changes can result in the local clock apparently running backward, as well as incorrect delay and offset measurements of the synchronization mechanism itself. Considerable experience with the Internet environment suggests the values of CLOCK.MAX tabulated in Table 6 as appropriate. In practice, these values are exceeded with a single time-server source only under conditions of the most extreme congestion or when multiple failures of nodes or links have occurred. The most common case when the maximum is exceeded is when the time-server source is changed and the time indicated by the new and old sources exceeds the maximum due to systematic errors in the primary reference source or large differences in path delays. It is recommended that implementations include provisions to tailor CLOCK.MAX for specific situations. The amount that CLOCK.MAX can be increased without violating the monotonicity requirement depends on the Clock register increment. For an increment of 10 ms, as used in many workstations, the value shown in Table 6 can be increased by a factor of five. Implementation Issues The basic NTP robustness model is that a host has no other means to verify time other than NTP itself. In some equipment a battery-backed clock/calendar is available for a sanity check. If such a device is available, it should be used only to confirm sanity of the timekeeping
system, not as the source of system time. In the common assumption (not always justified) that the clock/calendar is more reliable, but less accurate, than the NTP synchronization subnet, the recommended approach at initialization is to set the Clock register as determined from the clock/calendar and the other registers, counters and flags as described above. On subsequent updates if the time offset is greater than a configuration parameter (e.g., 1000 seconds), then the update should be discarded and an error condition reported. Some implementations periodically record the contents of the Skew-Compensation register in stable storage such as a system file or NVRAM and retrieve this value at initialization. This can significantly reduce the time to converge to the nominal stability and accuracy regime. Conversion from NTP format to the common date and time formats used by application programs is simplified if the internal local-clock format uses separate date and time variables. The time variable is designed to roll over at 24 hours, give or take a leap second as determined by the leap-indicator bits, with its overflows (underflows) incrementing (decrementing) the date variable. The date and time variables then indicate the number of days and seconds since some previous reference time, but uncorrected for intervening leap seconds. On the day prior to the insertion of a leap second the leap bits (sys.leap) are set at the primary servers, presumably by manual means. Subsequently, these bits show up at the local host and are passed to the local-clock procedure. This causes the modulus of the time variable, which is the length of the current day, to be increased or decreased by one second as appropriate. Immediately following insertion the leap bits are reset. Additional discussion on this issue can be found in Appendix E. Lack of a comprehensive mechanism to administer the leap bits in the primary servers is presently an awkward problem better suited to a comprehensive network-management mechanism yet to be developed. As a practical matter and unless specific provisions have been made otherwise, currently manufactured radio clocks have no provisions for leap seconds, either automatic or manual. Thus, when a leap actually occurs, the radio must resynchronize to the broadcast timecode, which may take from a few minutes to some hours. Unless special provisions are made, a primary server might leap to the new timescale, only to be yanked back to the previous timescale when it next synchronizes to the radio. Subsequently, the server will be yanked forward again when the radio itself resynchronizes to the broadcast timecode. This problem can not be reliably avoided using any selection algorithm, since there will always exist an interval of at least a couple of minutes and possibly as much as some hours when some or all radios will be out of synchronization with the broadcast timecode and only after the majority of them have resynchronized will the subnet settle down. The CLOCK.MINSTEP delay is designed to cope with this problem by forcing a
minimum interval since the last gradual adjustment was made before allowing a step change to occur. Therefore, until the radio resynchronizes, it will continue on the old timescale, which is one second off the local clock after the leap and outside the maximum aperture CLOCK.MAX permitted for gradual phase adjustments. When the radio eventually resynchronizes, it will almost certainly come up within the aperture and again become the reference source. Thus, even in the unlikely case when the local clock incorrectly leaps, the server will go no longer than CLOCK.MINSTEP seconds before resynchronizing. Acknowledgments Many people contributed to the contents of this document, which was thoroughly debated by electronic mail and debugged using two different prototype implementations for the Unix 4.3bsd operating system, one written by Louis Mamakos and Michael Petry of the University of Maryland and the other by Dennis Ferguson of the University of Toronto. Another implementation for the Fuzzball operating system [MIL88b] was written by the author. Many individuals to numerous to mention meticulously tested the several beta-test prototype versions and ruthlessly smoked out the bugs, both in the code and the specification. Especially useful were comments from Dennis Ferguson and Bill Sommerfeld, as well as discussions with Joe Comuzzi and others at Digital Equipment Corporation. References [ABA89] Abate, et al. AT&T's new approach to the synchronization of telecommunication networks. IEEE Communications Magazine (April 1989), 35-45. [ALL74a] Allan, D.W., J.H. Shoaf and D. Halford. Statistics of time and frequency data analysis. In: Blair, B.E. (Ed.). Time and Frequency Theory and Fundamentals. National Bureau of Standards Monograph 140, U.S. Department of Commerce, 1974, 151-204. [ALL74b] Allan, D.W., J.E. Gray and H.E. Machlan. The National Bureau of Standards atomic time scale: generation, stability, accuracy and accessibility. In: Blair, B.E. (Ed.). Time and Frequency Theory and Fundamentals. National Bureau of Standards Monograph 140, U.S. Department of Commerce, 1974, 205-231. [BEL86]
Bell Communications Research. Digital Synchronization Network Plan. Technical Advisory TA-NPL-000436, 1 November 1986. [BER87] Bertsekas, D., and R. Gallager. Data Networks. Prentice-Hall, Englewood Cliffs, NJ, 1987. [BLA74] Blair, B.E. Time and frequency dissemination: an overview of principles and techniques. In: Blair, B.E. (Ed.). Time and Frequency Theory and Fundamentals. National Bureau of Standards Monograph 140, U.S. Department of Commerce, 1974, 233-314. [BRA80] Braun, W.B. Short term frequency effects in networks of coupled oscillators. IEEE Trans. Communications COM-28, 8 (August 1980), 1269- 1275. [COL88] Cole, R., and C. Foxcroft. An experiment in clock synchronisation. The Computer Journal 31, 6 (1988), 496-502. [DAR81a] Defense Advanced Research Projects Agency. Internet Protocol. DARPA Network Working Group Report RFC-791, USC Information Sciences Institute, September 1981. [DAR81b] Defense Advanced Research Projects Agency. Internet Control Message Protocol. DARPA Network Working Group Report RFC-792, USC Information Sciences Institute, September 1981. [DEC89] Digital Time Service Functional Specification Version T.1.0.5. Digital Equipment Corporation, 1989. [DER90] Dershowitz, N., and E.M. Reingold. Calendrical Calculations. Software Practice and Experience 20, 9 (September 1990), 899-928. [FRA82]
Frank, R.L. History of LORAN-C. Navigation 29, 1 (Spring 1982). [GUS84] Gusella, R., and S. Zatti. TEMPO - A network time controller for a distributed Berkeley UNIX system. IEEE Distributed Processing Technical Committee Newsletter 6, NoSI-2 (June 1984), 7-15. Also in: Proc. Summer USENIX Conference (June 1984, Salt Lake City). [GUS85a] Gusella, R., and S. Zatti. The Berkeley UNIX 4.3BSD time synchronization protocol: protocol specification. Technical Report UCB/CSD 85/250, University of California, Berkeley, June 1985. [GUS85b] Gusella, R., and S. Zatti. An election algorithm for a distributed clock synchronization program. Technical Report UCB/CSD 86/275, University of California, Berkeley, December 1985. [HAL84] Halpern, J.Y., B. Simons, R. Strong and D. Dolly. Fault-tolerant clock synchronization. Proc. Third Annual ACM Symposium on Principles of Distributed Computing (August 1984), 89-102. [JOR85] Jordan, E.C. (Ed). Reference Data for Engineers, Seventh Edition. H.W. Sams & Co., New York, 1985. [KOP87] Kopetz, H., and W. Ochsenreiter. Clock synchronization in distributed real-time systems. IEEE Trans. Computers C-36, 8 (August 1987), 933-939. [LAM78] Lamport, L., Time, clocks and the ordering of events in a distributed system. Comm. ACM 21, 7 (July 1978), 558-565. [LAM85] Lamport, L., and P.M. Melliar-Smith. Synchronizing clocks in the presence of faults. J. ACM 32, 1 (January 1985), 52-78. [LIN80] Lindsay, W.C., and A.V. Kantak. Network synchronization of random
signals. IEEE Trans. Communications COM-28, 8 (August 1980), 1260-1266. [LUN84] Lundelius, J., and N.A. Lynch. A new fault-tolerant algorithm for clock synchronization. Proc. Third Annual ACM Symposium on Principles of Distributed Computing (August 1984), 75-88. [MAR85] Marzullo, K., and S. Owicki. Maintaining the time in a distributed system. ACM Operating Systems Review 19, 3 (July 1985), 44-54. [MIL81a] Mills, D.L. Time Synchronization in DCNET Hosts. DARPA Internet Project Report IEN-173, COMSAT Laboratories, February 1981. [MIL81b] Mills, D.L. DCNET Internet Clock Service. DARPA Network Working Group Report RFC-778, COMSAT Laboratories, April 1981. [MIL83a] Mills, D.L. Internet Delay Experiments. DARPA Network Working Group Report RFC-889, M/A-COM Linkabit, December 1983. [MIL83b] Mills, D.L. DCN local-network protocols. DARPA Network Working Group Report RFC-891, M/A-COM Linkabit, December 1983. [MIL85a] Mills, D.L. Algorithms for synchronizing network clocks. DARPA Network Working Group Report RFC-956, M/A-COM Linkabit, September 1985. [MIL85b] Mills, D.L. Experiments in network clock synchronization. DARPA Network Working Group Report RFC-957, M/A-COM Linkabit, September 1985. [MIL85c] Mills, D.L. Network Time Protocol (NTP). DARPA Network Working Group Report RFC-958, M/A-COM Linkabit, September 1985. [MIL88a]
Mills, D.L. Network Time Protocol (version 1) - specification and implementation. DARPA Network Working Group Report RFC-1059, University of Delaware, July 1988. [MIL88b] Mills, D.L. The Fuzzball. Proc. ACM SIGCOMM 88 Symposium (Palo Alto, CA, August 1988), 115-122. [MIL89] Mills, D.L. Network Time Protocol (version 2) - specification and implementation. DARPA Network Working Group Report RFC-1119, University of Delaware, September 1989. [MIL90] Mills, D.L. Measured performance of the Network Time Protocol in the Internet system. ACM Computer Communication Review 20, 1 (January 1990), 65-75. [MIL91a] Mills, D.L. Internet time synchronization: the Network Time Protocol. IEEE Trans. Communications 39, 10 (October 1991), 1482-1493. [MIL91b] Mills, D.L. On the chronology and metrology of computer network timescales and their application to the Network Time Protocol. ACM Computer Communications Review 21, 5 (October 1991), 8-17. [MIT80] Mitra, D. Network synchronization: analysis of a hybrid of master-slave and mutual synchronization. IEEE Trans. Communications COM-28, 8 (August 1980), 1245-1259. [NBS77] Data Encryption Standard. Federal Information Processing Standards Publication 46. National Bureau of Standards, U.S. Department of Commerce, 1977. [NBS79] Time and Frequency Dissemination Services. NBS Special Publication 432, U.S. Department of Commerce, 1979. [NBS80]
DES Modes of Operation. Federal Information Processing Standards Publication 81. National Bureau of Standards, U.S. Department of Commerce, December 1980. [POS80] Postel, J. User Datagram Protocol. DARPA Network Working Group Report RFC-768, USC Information Sciences Institute, August 1980. [POS83a] Postel, J. Daytime protocol. DARPA Network Working Group Report RFC-867, USC Information Sciences Institute, May 1983. [POS83b] Postel, J. Time protocol. DARPA Network Working Group Report RFC-868, USC Information Sciences Institute, May 1983. [RIC88] Rickert, N.W. Non Byzantine clock synchronization - a programming experiment. ACM Operating Systems Review 22, 1 (January 1988), 73-78. [SCH86] Schneider, F.B. A paradigm for reliable clock synchronization. Department of Computer Science Technical Report TR 86-735, Cornell University, February 1986. [SMI86] Smith, J. Modern Communications Circuits. McGraw-Hill, New York, NY, 1986. [SRI87] Srikanth, T.K., and S. Toueg. Optimal clock synchronization. J. ACM 34, 3 (July 1987), 626-645. [STE88] Steiner, J.G., C. Neuman, and J.I. Schiller. Kerberos: an authentication service for open network systems. Proc. Winter USENIX Conference (February 1988). [SU81] Su, Z. A specification of the Internet protocol (IP) timestamp option.
DARPA Network Working Group Report RFC-781. SRI International, May 1981. [TRI86] Tripathi, S.K., and S.H. Chang. ETempo: a clock synchronization algorithm for hierarchical LANs - implementation and measurements. Systems Research Center Technical Report TR-86-48, University of Maryland, 1986. [VAN84] Van Dierendonck, A.J., and W.C. Melton. Applications of time transfer using NAVSTAR GPS. In: Global Positioning System, Papers Published in Navigation, Vol. II, Institute of Navigation, Washington, DC, 1984. [VAS78] Vass, E.R. OMEGA navigation system: present status and plans 1977-1980. Navigation 25, 1 (Spring 1978). Appendix A. NTP Data Format - Version 3 The format of the NTP Message data area, which immediately follows the UDP header, is shown in Figure 4<$&fig4>. Following is a description of its fields. Leap Indicator (LI): This is a two-bit code warning of an impending leap second to be inserted/deleted in the last minute of the current day, with bit 0 and bit 1, respectively, coded as follows: @Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000), ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT) @Z_TBL_BODY = TABLE TEXT, TABLE TEXT 00, no warning 01, last minute has 61 seconds 10, last minute has 59 seconds) 11, alarm condition (clock not synchronized) @Z_TBL_END = Version Number (VN): This is a three-bit integer indicating the NTP version number, currently three (3). Mode: This is a three-bit integer indicating the mode, with values defined as follows:
@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000), ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT) @Z_TBL_BODY = TABLE TEXT, TABLE TEXT 0, reserved 1, symmetric active 2, symmetric passive 3, client 4, server 5, broadcast 6, reserved for NTP control message (see Appendix B) 7, reserved for private use @Z_TBL_END = Stratum: This is a eight-bit integer indicating the stratum level of the local clock, with values defined as follows: @Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000), ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT) @Z_TBL_BODY = TABLE TEXT, TABLE TEXT 0, unspecified 1, primary reference (e.g.,, radio clock) 2-255, secondary reference (via NTP) @Z_TBL_END = The values that can appear in this field range from zero to NTP.INFIN inclusive. Poll Interval: This is an eight-bit signed integer indicating the maximum interval between successive messages, in seconds to the nearest power of two. The values that can appear in this field range from NTP.MINPOLL to NTP.MAXPOLL inclusive. Precision: This is an eight-bit signed integer indicating the precision of the local clock, in seconds to the nearest power of two.
Root Delay: This is a 32-bit signed fixed-point number indicating the total roundtrip delay to the primary reference source, in seconds with fraction point between bits 15 and 16. Note that this variable can take on both positive and negative values, depending on clock precision and skew. Root Dispersion: This is a 32-bit signed fixed-point number indicating the maximum error relative to the primary reference source, in seconds with fraction point between bits 15 and 16. Only positive values greater than zero are possible. Reference Clock Identifier: This is a 32-bit code identifying the particular reference clock. In the case of stratum 0 (unspecified) or stratum 1 (primary reference), this is a four-octet, left-justified, zero-padded ASCII string. While not enumerated as part of the NTP specification, the following are suggested ASCII identifiers: @Z_TBL_BEG = COLUMNS(3), DIMENSION(IN), COLWIDTHS(E2,E2,E5), WIDTH(4.6700), ABOVE(.1670), BELOW(.0830), HGUTTER(.3330), BOX(Z_SINGLE), KEEP(ON), ALIGN(CT), L1(R1C0..R1C3) @Z_TBL_BODY = TABLE HEADER, TABLE HEADER, TABLE HEADER Stratum, Code, Meaning @Z_TBL_BODY = TABLE TEXT, TABLE TEXT, TABLE TEXT 0, DCN, DCN routing protocol 0, NIST, NIST public modem 0, TSP, TSP time protocol 0, DTS, Digital Time Service 1, ATOM, Atomic clock (calibrated) 1, VLF, VLF radio (OMEGA,, etc.) 1, callsign, Generic radio 1, LORC, LORAN-C radionavigation 1, GOES, GOES UHF environment satellite @Z_TBL_BODY = TABLE HEADER, TABLE HEADER, TABLE HEADER 1, GPS, GPS UHF satellite positioning @Z_TBL_END =
In the case of stratum 2 and greater (secondary reference) this is the four-octet Internet address of the primary reference host. Reference Timestamp: This is the local time at which the local clock was last set or corrected, in 64-bit timestamp format. Originate Timestamp: This is the local time at which the request departed the client host for the service host, in 64-bit timestamp format. Receive Timestamp: This is the local time at which the request arrived at the service host, in 64-bit timestamp format. Transmit Timestamp: This is the local time at which the reply departed the service host for the client host, in 64-bit timestamp format. Authenticator (optional): When the NTP authentication mechanism is implemented, this contains the authenticator information defined in Appendix C. Appendix B. NTP Control Messages In a comprehensive network-management environment, facilities are presumed available to perform routine NTP control and monitoring functions, such as setting the leap-indicator bits at the primary servers, adjusting the various system parameters and monitoring regular operations. Ordinarily, these functions can be implemented using a network-management protocol such as SNMP and suitable extensions to the MIB database. However, in those cases where such facilities are not available, these functions can be implemented using special NTP control messages described herein. These messages are intended for use only in systems where no other management facilities are available or appropriate, such as in dedicated-function bus peripherals. Support for these messages is not required in order to conform to this specification. The NTP Control Message has the value 6 specified in the mode field of the first octet of the NTP header and is formatted as shown below. The format of the data field is specific to each command or response; however, in most cases the format is designed to be constructed and viewed by humans and so is coded in free-form ASCII. This facilitates the specification and implementation of simple management tools in the absence of fully evolved network-management facilities. As in ordinary NTP messages, the authenticator field follows the data field. If the authenticator is used the data field is zero-padded to a 32-bit boundary, but the padding bits are not considered part of the data field and are not included in the field count. IP hosts are not required to reassemble datagrams larger than 576 octets; however, some commands or responses may involve more data than
will fit into a single datagram. Accordingly, a simple reassembly feature is included in which each octet of the message data is numbered starting with zero. As each fragment is transmitted the number of its first octet is inserted in the offset field and the number of octets is inserted in the count field. The more-data (M) bit is set in all fragments except the last. Most control functions involve sending a command and receiving a response, perhaps involving several fragments. The sender chooses a distinct, nonzero sequence number and sets the status field and R and E bits to zero. The responder interprets the opcode and additional information in the data field, updates the status field, sets the R bit to one and returns the three 32-bit words of the header along with additional information in the data field. In case of invalid message format or contents the responder inserts a code in the status field, sets the R and E bits to one and, optionally, inserts a diagnostic message in the data field. Some commands read or write system variables and peer variables for an association identified in the command. Others read or write variables associated with a radio clock or other device directly connected to a source of primary synchronization information. To identify which type of variable and association a 16-bit association identifier is used. System variables are indicated by the identifier zero. As each association is mobilized a unique, nonzero identifier is created for it. These identifiers are used in a cyclic fashion, so that the chance of using an old identifier which matches a newly created association is remote. A management entity can request a list of current identifiers and subsequently use them to read and write variables for each association. An attempt to use an expired identifier results in an exception response, following which the list can be requested again. Some exception events, such as when a peer becomes reachable or unreachable, occur spontaneously and are not necessarily associated with a command. An implementation may elect to save the event information for later retrieval or to send an asynchronous response (called a trap) or both. In case of a trap the IP address and port number is determined by a previous command and the sequence field is set as described below. Current status and summary information for the latest exception event is returned in all normal responses. Bits in the status field indicate whether an exception has occurred since the last response and whether more than one exception has occurred. Commands need not necessarily be sent by an NTP peer, so ordinary access-control procedures may not apply; however, the optional mask/match mechanism suggested elsewhere in this document provides the capability to control access by mode number, so this could be used to limit access for control messages (mode 6) to selected address ranges. NTP Control Message Format
The format of the NTP Control Message header, which immediately follows the UDP header, is shown in Figure 5<$&fig5>. Following is a description of its fields. Bit positions marked as zero are reserved and should always be transmitted as zero. Version Number (VN): This is a three-bit integer indicating the NTP version number, currently three (3). Mode: This is a three-bit integer indicating the mode. It must have the value 6, indicating an NTP control message. Response Bit (R): Set to zero for commands, one for responses. Error Bit (E): Set to zero for normal response, one for error response. More Bit (M): Set to zero for last fragment, one for all others. Operation Code (Op): This is a five-bit integer specifying the command function. Values currently defined include the following: @Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000), ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT) @Z_TBL_BODY = TABLE TEXT, TABLE TEXT 0, reserved 1, read status command/response 2, read variables command/response 3, write variables command/response 4, read clock variables command/response 5, write clock variables command/response 6, set trap address/port command/response 7, trap response 8-31, reserved @Z_TBL_END = Sequence: This is a 16-bit integer indicating the sequence number of the command or response. Status: This is a 16-bit code indicating the current status of the
system, peer or clock, with values coded as described in following sections. Association ID: This is a 16-bit integer identifying a valid association. Offset: This is a 16-bit integer indicating the offset, in octets, of the first octet in the data area. Count: This is a 16-bit integer indicating the length of the data field, in octets. Data: This contains the message data for the command or response. The maximum number of data octets is 468. Authenticator (optional): When the NTP authentication mechanism is implemented, this contains the authenticator information defined in Appendix C. Status Words Status words indicate the present status of the system, associations and clock. They are designed to be interpreted by network-monitoring programs and are in one of four 16-bit formats shown in Figure 6<$&fig6> and described in this section. System and peer status words are associated with responses for all commands except the read clock variables, write clock variables and set trap address/port commands. The association identifier zero specifies the system status word, while a nonzero identifier specifies a particular peer association. The status word returned in response to read clock variables and write clock variables commands indicates the state of the clock hardware and decoding software. A special error status word is used to report malformed command fields or invalid values. System Status Word The system status word appears in the status field of the response to a read status or read variables command with a zero association identifier. The format of the system status word is as follows: Leap Indicator (LI): This is a two-bit code warning of an impending leap second to be inserted/deleted in the last minute of the current day, with bit 0 and bit 1, respectively, coded as follows: @Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000), ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT) @Z_TBL_BODY = TABLE TEXT, TABLE TEXT 00, no warning