RFC 1305

Network Time Protocol (Version 3) Specification, Implementation and Analysis

Pages: 109
Obsoletes: 0958 1059 1119
Obsoleted by: 5905

Part 2 of 4 – Pages 35 to 69

noToC RFC1305 - Page 35 prevText

accuracies achievable depend upon the statistical properties of the
outbound and inbound data paths. Further analysis and experimental
results bearing on this issue can be found in [MIL90] and in Appendix H.
Test 4 requires that the calculated delay be within <169>reasonable<170>
bounds:

        <$Etest4~<<-~(| delta |~<<~roman NTP.MAXDISPERSE~bold
and~epsilon~<<~roman NTP.MAXDISPERSE)>;  /* test 4 */

Test 5 is implemented only if the authentication mechanism described in
Appendix C is implemented. It requires either that authentication be
explicitly disabled or that the authenticator be present and correct as
determined by the decrypt procedure.

        #ifdef (authentication implemented)     /* test 5 */
                <$Etest5~<<-~( roman {(peer.config~=~1~bold
and~peer.authenable~=~0)~bold or~ peer.authentic~=~1})>;
                #endef

Test 6 requires the peer clock be synchronized and the interval since
the peer clock was last updated be positive and less than NTP.MAXAGE.
Test 7 insures that the host will not synchronize on a peer with greater
stratum. Test 8 requires that the header contains <169>reasonable<170>
values for the pkt.rootdelay and pkt.rootdispersion fields.

        <$Etest6~<<-~( roman pkt.leap~!=~11 sub 2> and          /* test
6 */
                <$Eroman
{pkt.reftime~<<=~pkt.xmt~<<~pkt.reftime~+~NTP.MAXAGE}>)
        <$Etest7~<<-~roman {pkt.stratum ~<<=~sys.stratum}> and  /* test
7 */
                 <$Eroman {pkt.stratum ~<<~NTP.MAXSTRATUM}>;
        <$Etest8~<<-~( roman {| pkt.rootdelay |~<<~NTP.MAXDISPERSE}>
and     /* test 8 */
                <$Eroman {pkt.rootdispersion~<<~NTP.MAXDISPERSE})>;

With respect to further processing, the packet includes valid
(synchronized) data if tests one through four succeed
<$E(test1~&~test2~&~test3~&~test4~=~1)>, regardless of the remaining
tests. Only packets with valid data can be used to calculate offset,
delay and dispersion values. The packet includes a valid header if tests
five through eight succeed <$E(test5~&~test6~&~test7~&~test8~=~1)>,
regardless of the remaining tests. Only packets with valid headers can
be used to determine whether a peer can be selected for synchronization.
Note that <$Etest1> and <$Etest2> are not used in broadcast mode (forced
to true), since the originate and receive timestamps are undefined.

The clock-filter procedure is called to produce the delay (peer.delay),
offset (peer.offset) and dispersion (peer.dispersion) for the peer.
Specification of the clock-filter algorithm is not an integral part of

noToC RFC1305 - Page 36

the NTP specification, since there may be other algorithms that work
well in practice. However, one found to work well in the Internet
environment is described in Section 4 and its use is recommended.

        if (not valid header) exit;
        <$Eroman peer.leap~<<-~roman pkt.leap>;                 /* copy
packet variables */
        <$Eroman peer.stratum~<<-~roman pkt.stratum>;
        <$Eroman peer.precision~<<-~roman pkt.precision>;
        <$Eroman peer.rootdelay~<<-~roman pkt.rootdelay>;
        <$Eroman peer.rootdispersion~<<-~roman pkt.rootdispersion>;
        <$Eroman peer.refid~<<-~roman pkt.refid>;
        <$Eroman peer.reftime~<<-~roman pkt.reftime>;
        if (valid data) call clock-filter(<$Etheta ,~delta ,~epsilon>); 
/* process sample */
        end packet procedure;

Clock-Update Procedure
The clock-update procedure is called from the receive procedure when
valid clock offset, delay and dispersion data have been determined by
the clock-filter procedure for the current peer. The result of the
clock-selection and clock-combining procedures is the final clock
correction <$ETHETA>, which is used by the local-clock procedure to
update the local clock. If no candidates survive these procedures, the
clock-update procedure exits without doing anything further.

begin clock-update procedure
        call clock-select;                              /* select clock
source */
        if (<$Eroman sys.peer~!=~peer>) exit;

It may happen that the local clock may be reset, rather than slewed to
its final value. In this case the clear procedure is called for every
peer to purge the clock filter, reset the poll interval and reselect the
synchronization source, if necessary. Note that the local-clock
procedure sets the leap bits sys.leap to <169>unsynchronized<170> 112 in
this case, so that no other peer will attempt to synchronize to the host
until the host once again selects a peer for synchronization.

The distance procedure calculates the root delay <$EDELTA>, root
dispersion <$EEPSILON> and root synchronization distance <$ELAMBDA> via
the peer to the root of the synchronization subnet. The host will not
synchronize to the selected peer if the distance is greater than
NTP.MAXDISTANCE. The reason for the minimum clamp at NTP.MINDISPERSE is
to discourage subnet route flaps that can happen with Bellman-Ford
algorithms and small roundtrip delays.

        <$ELAMBDA~<M=O>
<~>an distance (peer)>;                         /* update system
variables */

noToC RFC1305 - Page 37

 <B>    if (<$ELAMBDA~>>=~roman NTP.MAXDISTANCE>) exit;
        <$Eroman sys.leap~<<-~roman peer.leap>;
        <$Eroman sys.stratum~<<-~roman peer.stratum~+~1>;
        <$Eroman sys.refid~<<-~roman peer.peeraddr>;
        call local-clock;
        if (local clock reset) begin                    /* if reset,
clear state variables */
                <$Eroman sys.leap~<<-~11 sub 2>;
                for (all peers) call clear;
                endif
        else begin
                <$Eroman sys.peer~<<-~peer>;                    /* if
not, adjust local clock */
                <$Eroman sys.rootdelay~<<-~DELTA>;
                <$Eroman sys.rootdispersion~<<-~EPSILON~+~max ( epsilon
sub xi~+~| THETA |,~roman NTP.MINDISPERSE)>;
                endif
        <$Eroman sys.reftime~<<-~roman sys.clock>;
        end clock-update procedure;

In some system configurations a precise source of timing information is
available in the form of a train of timing pulses spaced at one-second
intervals. Usually, this is in addition to a source of timecode
information, such as a radio clock or even NTP itself, to number the
seconds, minutes, hours and days. In these configurations the system
variables are set to refer to the source from which the pulses are
derived. For those configurations which support a primary reference
source, such as a radio clock or calibrated atomic clock, the stratum is
set at one as long as this is the actual synchronization source and
whether or not the primary-clock procedure is used.

Specification of the clock-selection and local-clock algorithms is not
an integral part of the NTP specification, since there may be other
algorithms which provide equivalent performance. However, a clock-
selection algorithm found to work well in the Internet environment is
described in Section 4, while a local-clock algorithm is described in
Section 5 and their use is recommended. The clock-selection algorithm
described in Section 4 usually picks the peer at the lowest stratum and
minimum synchronization distance among all those available, unless that
peer appears to be a falseticker. The result is that the algorithms all
work to build a minimum-weight spanning tree relative to the primary
reference time servers and thus a hierarchical-master-slave
synchronization subnet.

Primary-Clock Procedure

When a primary reference source such as a radio clock is connected to
the host, it is convenient to incorporate its information into the data
base as if the clock were represented as an ordinary peer. In the
primary-clock procedure the clock is polled once a minute or so and the

noToC RFC1305 - Page 38

returned timecode used to produce a new update for the local clock. When
peer.timer decrements to zero for a primary clock peer, the transmit
procedure is not called; rather, the radio clock is polled, usually
using an ASCII string specified for this purpose. When a valid timecode
is received from the radio clock, it is converted to NTP timestamp
format and the peer variables updated. The value of peer.leap is set
depending on the status of the leap-warning bit in the timecode, if
available, or manually by the operator. The value for peer.peeraddr,
which will become the value of sys.refid when the clock-update procedure
is called, is set to an ASCII string describing the clock type (see
Appendix A).

begin primary-clock-update procedure
        <$Eroman peer.leap~<<-~"from"~radio~or~operator>;       /* copy
variables */
        <$Eroman peer.peeraddr~<<-~ASCII~identifier>;
        <$Eroman peer.rec~<<-~radio~timestamp>;
        <$Eroman peer.reach~<<-~1>;
        call clock-filter(<$Eroman {sys.clock~-~peer.rec,~0,~1~<<
<<~peer.precision}>);   /* process sample */
        call clock-update;                              /* update local
clock */
        end primary-clock-update procedure;

Initialization Procedures

The initialization procedures are used to set up and initialize the
system, its peers and associations.

 Initialization Procedure

The initialization procedure is called upon reboot or restart of the NTP
daemon. The local clock is presumably undefined at reboot; however, in
some equipment an estimate is available from the reboot environment,
such as a battery-backed clock/calendar. The precision variable is
determined by the intrinsic architecture of the local hardware clock.
The authentication variables are used only if the authentication
mechanism described in Appendix C is implemented. The values of these
variables are determined using procedures beyond the scope of NTP
itself.

begin initialization procedure
        #ifdef (authentication implemented)     / * see Appendix C */
                <$Eroman sys.keys~<<-~as~required>;
                #endef;
        <$Eroman sys.leap~<<-~11 sub 2>;                                
/* copy variables */
        <$Eroman sys.stratum~<<-~0~(undefined)>;
        <$Eroman sys.precision~<<-~host~precision>;
        <$Eroman sys.rootdelay~<<-~0~(undefined)>;

noToC RFC1305 - Page 39

        <$Eroman sys.rootdispersion~<<-~0~(undefined)>;
        <$Eroman sys.refid~<<-~0~(undefined)>;
        <$Eroman sys.reftime~<<-~0~(undefined)>;
        <$Eroman sys.clock~<<-~external~reference>;
        <$Eroman sys.peer~<<-~roman NULL>;
        <$Eroman sys.poll~<<-~roman NTP.MINPOLL>;
        for (all configured peers)                      /* create
configured associations */
                call initialization-instantiation procedure;
        end initialization procedure;

 Initialization-Instantiation Procedure

This implementation-specific procedure is called from the initialization
procedure to define an association. The addresses and modes of the peers
are determined using information read during the reboot procedure or as
the result of operator commands. The authentication variables are used
only if the authentication mechanism described in Appendix C is
implemented. The values of these variables are determined using
procedures beyond the scope of NTP itself. With the authentication bits
set as suggested, only properly authenticated peers can become the
synchronization source.

begin initialization-instantiation procedure
        <$Eroman peer.config~<<-~1>;
        #ifdef (authentication implemented)     /* see Appendix C */
                <$Eroman peer.authenable~<<-~1~(suggested)>;
                <$Eroman peer.authentic~<<-~0>;
                <$Eroman peer.hostkeyid~<<-~as~required>;
                <$Eroman peer.peerkeyid~<<-~0>;
                #endef;
        <$Eroman peer.peeraddr~<<-~peer~IP~address>;    /* copy
variables */
        <$Eroman peer.peerport~<<-~roman NTP.PORT>;
        <$Eroman peer.hostaddr~<<-~host~IP~address>;
        <$Eroman peer.hostport~<<-~roman NTP.PORT>;
        <$Eroman peer.mode~<<-~host~mode>;
        <$Eroman peer.peerpoll~<<-~0~(undefined)>;
        <$Eroman peer.timer~<<-~0>;
        <$Eroman peer.delay~<<-~0~(undefined)>;
        <$Eroman peer.offset~<<-~0~(undefined)>;
        call clear;                                     /* initialize
association */
        end initialization-instantiation procedure;

 Receive-Instantiation Procedure

The receive-instantiation procedure is called from the receive procedure
when a new peer is discovered. It initializes the peer variables and
mobilizes the association. If the message is from a peer operating in

noToC RFC1305 - Page 40

client mode (3), the host mode is set to server mode (4); otherwise, it
is set to symmetric passive mode (2). The authentication variables are
used only if the authentication mechanism described in Appendix C is
implemented. If implemented, only properly authenticated non-configured
peers can become the synchronization source.

begin receive-instantiation procedure
        #ifdef (authentication implemented)     /* see Appendix C */
                <$Eroman peer.authenable~<<-~0>;
                <$Eroman peer.authentic~<<-~0>;
                <$Eroman peer.hostkeyid~<<-~as~required>;
                <$Eroman peer.peerkeyid~<<-~0>;
                #endef
        <$Eroman peer.config~<<-~0>;                            /* copy
variables */
        <$Eroman peer.peeraddr~<<-~roman pkt.peeraddr>;
        <$Eroman peer.peerport~<<-~roman pkt.peerport>;
        <$Eroman peer.hostaddr~<<-~roman pkt.hostaddr>;
        <$Eroman peer.hostport~<<-~roman pkt.hostport>;
        if (pkt.mode = 3)                               /* determine
mode */
                <$Eroman peer.mode~<<-~4>;
                else
                <$Eroman peer.mode~<<-~2>;
        <$Eroman peer.peerpoll~<<-~0~(undefined)>;
        <$Eroman peer.timer~<<-~0>;
        <$Eroman peer.delay~<<-~0~(undefined)>;
        <$Eroman peer.offset~<<-~0~(undefined)>;
        call clear;                                     /* initialize
association */
        end receive-instantiation procedure;

 Primary Clock-Instantiation Procedure

This procedure is called from the initialization procedure in order to
set up the state variables for the primary clock. The value for
peer.precision is determined from the radio clock specification and
hardware interface. The value for peer.rootdispersion is nominally ten
times the inherent maximum error of the radio clock; for instance,
<$E10~mu s> for a calibrated atomic clock, 10 ms for a WWVB or GOES
radio clock and 100 ms for a less accurate WWV radio clock.

begin clock-instantiation procedure
        <$Eroman peer.config~<<-~1>;                            /* copy
variables */
        <$Eroman peer.peeraddr~<<-~0~undefined>;
        <$Eroman peer.peerport~<<-~0~(not~used)>;
        <$Eroman peer.hostaddr~<<-~0~(not~used)>;
        <$Eroman peer.hostport~<<-~0~(not~used)>;
        <$Eroman peer.leap~<<-~11 sub 2>;

noToC RFC1305 - Page 41

        <$Eroman peer.mode~<<-~0~(not~used)>;
        <$Eroman peer.stratum~<<-~0>;
        <$Eroman peer.peerpoll~<<-~0~(undefined)>;
        <$Eroman peer.precision~<<-~clock~precision>;
        <$Eroman peer.rootdelay~<<-~0>;
        <$Eroman peer.rootdispersion~<<-~clock~dispersion>;
        <$Eroman peer.refid~<<-~0~(not~used)>;
        <$Eroman peer.reftime~<<-~0~(undefined)>;
        <$Eroman peer.timer~<<-~0>;
        <$Eroman peer.delay~<<-~0~(undefined)>;
        <$Eroman peer.offset~<<-~0~(undefined)>;
        call clear;                                     /* initialize
association */
        end clock-instantiation procedure;

In some configurations involving a calibrated atomic clock or LORAN-C
receiver, the primary reference source may provide only a seconds pulse,
but lack a complete timecode from which the numbering of the seconds,
etc., can be derived. In these configurations seconds numbering can be
derived from other sources, such as a radio clock or even other NTP
peers. In these configurations the primary clock variables should
reflect the primary reference source, not the seconds-numbering source;
however, if the seconds-numbering source fails or is known to be
operating incorrectly, updates from the primary reference source should
be suppressed as if it had failed.

Clear Procedure

The clear procedure is called when some event occurs that results in a
significant change in reachability state or potential disruption of the
local clock.
begin clear procedure
        <$Eroman peer.org~<<-~0~(undefined)>;                   /* mark
timestamps undefined */
        <$Eroman peer.rec~<<-~0~(undefined)>;
        <$Eroman peer.xmt~<<-~0~(undefined)>;
        <$Eroman peer.reach~<<-~0>;                             /* reset
state variables */
        <$Eroman peer.filter~<<-~[0,~,0,~roman NTP.MAXDISPERSE]>;   /*
all stages */
        <$Eroman peer.valid~<<-~0>;
        <$Eroman peer.dispersion~<<-~roman NTP.MAXDISPERSE>;
        <$Eroman {peer.hostpoll~<<-~NTP.MINPOLL}>;              /* reset
poll interval */
        call poll-update;
        call clock-select;                              /* select clock
source */
        end clear procedure;

Poll-Update Procedure

noToC RFC1305 - Page 42

The poll-update procedure is called when a significant event occurs that
may result in a change of the poll interval or peer timer. It checks the
values of the host poll interval (peer.hostpoll) and peer poll interval
(peer.peerpoll) and clamps each within the valid range. If the peer is
selected for synchronization, the value is further clamped as a function
of the computed compliance (see Section 5).

begin poll-update procedure
        <$Etemp~<<-~roman peer.hostpoll>;                       /*
determine host poll interval */
        if (<$Epeer~=~roman sys.peer>)
                <$Etemp~<<-~min (temp,~roman {sys.poll,~NTP.MAXPOLL)}>;
        else
                <$Etemp~<<-~min (temp,~roman NTP.MAXPOLL)>;
        <$Eroman peer.hostpoll~<<-~max (temp,~roman NTP.MINPOLL)>;
        <$Etemp~<<-~1~<< << ~min ( roman {peer.hostpoll,~max
(peer.peerpoll,~NTP.MINPOLL)})>;

If the poll interval is unchanged and the peer timer is zero, the timer
is simply reset. If the poll interval is changed and the new timer value
is greater than the present value, no additional action is necessary;
otherwise, the peer timer must be reduced. When the peer timer must be
reduced it is important to discourage tendencies to synchronize
transmissions between the peers. A prudent precaution is to randomize
the first transmission after the timer is reduced, for instance by the
sneaky technique illustrated.

        if (peer.timer = 0)                             /* reset peer
timer */
                <$Eroman peer.timer~<<-~temp>;
        else if (<$Eroman peer.timer~>>~temp>)
                <$Eroman peer.timer~<<-~( roman sys.clock~&~(temp~-
~1))~+~1>;
        end poll-update procedure;

Synchronization Distance Procedure

The distance procedure calculates the synchronization distance from the
peer variables for the peer peer.

begin distance(peer) procedure;
        <$EDELTA~<<-~roman {peer.rootdelay~+~|peer.delay|}>;
        <$EEPSILON~<<-~roman
{peer.rootdispersion~+~peer.dispersion~+~phi (sys.clock~-~peer.update)
}>;
        <$ELAMBDA~<<-~EPSILON~+~{| DELTA |} over 2> ;
        end distance procedure;

Note that, while <$EDELTA> may be negative in some cases, both

noToC RFC1305 - Page 43

<$EEPSILON> and <$ELAMBDA> are always positive.

Access Control Issues

The NTP design is such that accidental or malicious data modification
(tampering) or destruction (jamming) at a time server should not in
general result in timekeeping errors elsewhere in the synchronization
subnet. However, the success of this approach depends on redundant time
servers and diverse network paths, together with the assumption that
tampering or jamming will not occur at many time servers throughout the
synchronization subnet at the same time. In principle, the subnet
vulnerability can be engineered through the selection of time servers
known to be trusted and allowing only those time servers to become the
synchronization source. The authentication procedures described in
Appendix C represent one mechanism to enforce this; however, the
encryption algorithms can be quite CPU-intensive and can seriously
degrade accuracy, unless precautions such as mentioned in the
description of the transmit procedure are taken.

While not a required feature of NTP itself, some implementations may
include an access-control feature that prevents unauthorized access and
controls which peers are allowed to update the local clock. For this
purpose it is useful to distinguish between three categories of access:
those that are preauthorized as trusted, preauthorized as friendly and
all other (non-preauthorized) accesses. Presumably, preauthorization is
accomplished by entries in the configuration file or some kind of
ticket-management system such as Kerberos [STE88]. In this model only
trusted accesses can result in the peer becoming the synchronization
source. While friendly accesses cannot result in the peer becoming the
synchronization source, NTP messages and timestamps are returned as
specified.

It does not seem useful to maintain a secret clock, as would result from
restricting non-preauthorized accesses, unless the intent is to hide the
existence of the time server itself. Well-behaved Internet hosts are
expected to return an ICMP service-unavailable error message if a
service is not implemented or resources are not available; however, in
the case of NTP the resources required are minimal, so there is little
need to restrict requests intended only to read the clock. A simple but
effective access-control mechanism is then to consider all associations
preconfigured in a symmetric mode or client mode (modes 1, 2 and 3) as
trusted and all other associations, preconfigured or not, as friendly.

If a more comprehensive trust model is required, the design can be based
on an access-control list with each entry consisting of a 32-bit
Internet address, 32-bit mask and three-bit mode. If the logical AND of
the source address (pkt.peeraddr) and the mask in an entry matches the
corresponding address in the entry and the mode (pkt.mode) matches the
mode in the entry, the access is allowed; otherwise an ICMP error
message is returned to the requestor. Through appropriate choice of

noToC RFC1305 - Page 44

mask, it is possible to restrict requests by mode to individual
addresses, a particular subnet or net addresses, or have no restriction
at all. The access-control list would then serve as a filter controlling
which peers could create associations.

Filtering and Selection Algorithms

A most important factor affecting the accuracy and reliability of time
distribution is the complex of algorithms used to reduce the effect of
statistical errors and falsetickers due to failure of various subnet
components, reference sources or propagation media. The algorithms
suggested in this section were developed and refined over several years
of operation in the Internet under widely varying topologies, speeds and
traffic regimes. While these algorithms are believed the best available
at the present time, they are not an integral part of the NTP
specification, since other algorithms with similar or superior
performance may be devised in future.

However, it is important to observe that not all time servers or clients
in an NTP synchronization subnet must implement these algorithms. For
instance, simple workstations may dispense with one or both of them in
the interests of simplicity if accuracy and reliability requirements
justify. Nevertheless, it would be expected that an NTP server providing
synchronization to a sizable community, such as a university campus or
research laboratory, would be expected to implement these algorithms or
others proved to have equivalent functionality. A comprehensive
discussion of the design principles and performance is given in
[MIL91a].

In order for the NTP filter and selection algorithms to operate
effectively, it is useful to have a measure of recent sample variance
recorded for each peer. The measure adopted is based on first-order
differences, which are easy to compute and effective for the purposes
intended. There are two measures, one called the filter dispersion
<$Eepsilon sub sigma> and the other the select dispersion <$Eepsilon sub
xi>. Both are computed as the weighted sum of the clock offsets in a
temporary list sorted by synchronization distance. If <$Etheta sub i
~(0~<<=~i~<<~n)> is the offset of the ith entry, then the sample
difference <$Eepsilon sub ij> of the ith entry relative to the jth entry
is defined <$Eepsilon sub ij~<~>=~| theta sub i~-~theta sub j |> . The
dispersion relative to the jth entry is defined <$Eepsilon sub j> and
computed as the weighted sum

<$Eepsilon sub j~=~sum from {i~=~0} to {n~-~1}~epsilon sub ij~w~sup
{i+1}> ,

where w is a weighting factor chosen to control the influence of
synchronization distance in the dispersion budget. In the NTP algorithms
w is chosen less than <$E1 / 2>: <$Ew~=~roman NTP.FILTER> for filter
dispersion and <$Ew~=~roman NTP.SELECT> for select dispersion. The

noToC RFC1305 - Page 45

(absolute) dispersion <$Eepsilon sub sigma> and <$Eepsilon sub xi> as
used in the NTP algorithms are defined relative to the 0th entry
<$Eepsilon sub 0>.

There are two procedures described in the following, the clock-filter
procedure, which is used to select the best offset samples from a given
clock, and the clock-selection procedure, which is used to select the
best clock among a hierarchical set of clocks.

Clock-Filter Procedure

The clock-filter procedure is executed upon arrival of an NTP message or
other event that results in new data samples. It takes arguments of the
form (<$Etheta ,~delta ,~epsilon>), where <$Etheta> is a sample clock
offset measurement and <$Edelta> and <$Eepsilon> are the associated
roundtrip delay and dispersion. It determines the filtered clock offset
(peer.offset), roundtrip delay (peer.delay) and dispersion
(peer.dispersion). It also updates the dispersion of samples already
recorded and saves the current time (peer.update).

The basis of the clock-filter procedure is the filter shift register
(peer.filter), which consists of NTP.SHIFT stages, each stage containing
a 3-tuple <$E[ theta sub i ,~delta sub i ,~epsilon sub i ]>, with
indices numbered from zero on the left. The filter is initialized with
the value <$E[0,~0,~roman NTP.MAXDISPERSE]> in all stages by the clear
procedure. New data samples are shifted into the filter at the left end,
causing first NULLs then old samples to fall off the right end. The
packet procedure provides samples of the form (<$Etheta ,~delta
,~epsilon>) as new updates arrive, while the transmit procedure provides
samples of the form <$E[0,~0,~roman NTP.MAXDISPERSE]> when two poll
intervals elapse without a fresh update. While the same symbols
(<$Etheta ,~delta ,~epsilon>) are used here for the arguments, clock-
filter contents and peer variables, the meaning will be clear from
context. The following pseudo-code describes this procedure.

begin clock-filter procedure (<$Etheta ,~delta ,~epsilon>)

The dispersion <$Eepsilon sub i> for all valid samples in the filter
register must be updated to account for the skew-error accumulation
since the last update. These samples are also inserted on a temporary
list with entry format <$E[distance,index]>. The samples in the register
are shifted right one stage, with the overflow sample discarded and the
new sample inserted at the leftmost stage. The temporary list is then
sorted by increasing distance. If no samples remain in the list, the
procedure exits without updating the peer variables.

        for (i from NTP.SIZE <196> 1 to 1) begin        /* update
dispersion */
                <$E[ theta sub i ,~delta sub i ,~epsilon sub i ]~<<-~[
theta sub {i-1} ,~delta sub {i-1} ,~epsilon sub {i-1} ]>;

noToC RFC1305 - Page 46

/* shift stage right */
                <$Eepsilon sub i~=~epsilon sub i~+~phi tau>;
                add <$E[ lambda sub i~==~epsilon sub i~+~{| delta  sub i
|} over 2 ,~i]> to temporary list;
                endfor;
        <$E[ theta sub 0 ,~delta sub 0 ,~epsilon sub 0 ]~<<-~[ theta
,~delta ,~epsilon ]>;                   /* insert new sample */
        add <$E[ lambda~==~epsilon~+~{| delta |} over 2 ,~0]> to
temporary list;
        <$Eroman peer.update~<<-~roman sys.clock>;                      
/* reset base time */
        sort temporary list by increasing <$E[distance~||index]>;

where <$E[distance~||index]> represents the concatenation of the
distance and index fields and distance is the high-order field. The
filter dispersion <$Eepsilon sub sigma> is computed and included in the
peer dispersion. Note that for this purpose the temporary list is
already sorted.

        <$Eepsilon sub sigma~<<-~0>;
        for (i from NTP.SHIFT<196>1 to 0)               /* compute
filter dispersion */
                if (<$Eroman peer.dispersion sub index[i]~>>=~roman
NTP.MAXDISPERSE> or
                        <$E| theta sub i~-~theta sub 0 |~>>~roman
NTP.MAXDISPERSE>)
                        <$Eepsilon sub sigma~<~><<-~( epsilon sub
sigma~+~roman NTP.MAXDISPERSE)~times~roman NTP.FILTER>;
                else
                        <$Eepsilon sub sigma~<~><<-~( epsilon sub
sigma~+~| theta sub i~-~theta sub 0 |)~times~roman NTP.FILTER>;

The peer offset <$Etheta sub 0>, delay <$Edelta sub 0> and dispersion
<$Eepsilon sub 0> are chosen as the values corresponding to the minimum-
distance sample; in other words, the sample corresponding to the first
entry on the temporary list, here represented as the 0th subscript.

        <$Eroman peer.offset~<<-~theta sub 0>;                          
/* update peer variables */
        <$Eroman peer.delay~<<-~delta sub 0>;
        <$Eroman peer.dispersion~<<-~min ( epsilon sub 0~+~epsilon sub
sigma ,~roman NTP.MAXDISPERSE)>;
        end clock-filter procedure

The peer.offset and peer.delay variables represent the clock offset and
roundtrip delay of the local clock relative to the peer clock. Both of
these are precision quantities and can usually be averaged over long
intervals in order to improve accuracy and stability without bias
accumulation (see Appendix H). The peer.dispersion variable represents
the maximum error due to measurement error, skew-error accumulation and

noToC RFC1305 - Page 47

sample variance. All three variables are used in the clock-selection and
clock-combining procedures to select the peer clock(s) used for
synchronization and to maximize the accuracy and stability of the
indications.

Clock-Selection Procedure

The clock-selection procedure uses the peer variables <$ETHETA>,
<$EDELTA>, <$EEPSILON> and <$Etau> and is called when these variables
change or when the reachability status changes. It consists of two
algorithms, the intersection algorithm and the clustering algorithm. The
intersection algorithm constructs a list of candidate peers eligible to
become the synchronization source, computes a confidence interval for
each and casts out falsetickers using a technique adapted from Marzullo
and Owicki [MAR85]. The clustering algorithm sorts the list of surviving
candidates in order of stratum and synchronization distance and
repeatedly casts out outlyers on the basis of select dispersion until
only the most accurate, precise and stable survivors are left. A bit is
set for each survivor to indicate the outcome of the selection process.
The system variable sys.peer is set as a pointer to the most likely
survivor, if there is one, or to the NULL value if not.

Intersection Algorithm

        begin clock-selection procedure

Each peer is examined in turn and added to an endpoint list only if it
passes several sanity checks designed to avoid loops and use of
exceptionally noisy data. If no peers survive the sanity checks, the
procedure exits without finding a synchronization source. For each of m
survivors three entries of the form <$E[endpoint,~type]> are added to
the endpoint list: <$E[ THETA~-~LAMBDA ,~-~1]>, <$E[ THETA ,~0]> and
<$E[ THETA~+~LAMBDA ,~1]>. There will be <$E3 m> entries on the list,
which is then sorted by increasing endpoint.

        <$Em~<<-~0>;
        for (each peer)                         /* calling all peers */
                if (<$Eroman {peer.reach~!=~0~bold
and~peer.dispersion~<<~NTP.MAXDISPERSE}> and
                        not (peer.stratum >> 1 and peer.refid =
peer.hostaddr)) begin
                        <$ELAMBDA~<MO>
<~>an distance (peer)>;                 /* make list entry */
                        add <$E[ THETA~-~LAMBDA ,~-1]> to endpoint list;
                        add <$E[ THETA ,~0]> to endpoint list;
                        add <$E[ THETA~+~LAMBDA ,~1]> to endpoint list;
                        <$Em~<<-~m~+~1>;
                        <B>endif
                endfor
        if (<$Em~=~0>) begin                            /* skedaddle if

noToC RFC1305 - Page 48

no candidates */
                <$Eroman sys.peer~<<-~roman NULL>;
                <$Eroman sys.stratum~<<-~0~(undefined)>;
                exit;
                endif
        sort endpoint list by increasing endpoint||type;

The following algorithm is adapted from DTS [DEC89] and is designed to
produce the largest single intersection containing only truechimers. The
algorithm begins by initializing a value f and counters i and c to zero.
Then, starting from the lowest endpoint of the sorted endpoint list, for
each entry <$E[endpoint,~type]> the value of type is subtracted from the
counter i, which is the number of intersections. If type is zero,
increment the value of c, which is the number of falsetickers (see
Appendix H). If <$Ei~>>=~m~-~f> for some entry, endpoint of that entry
becomes the lower endpoint of the intersection; otherwise, f is
increased by one and the above procedure is repeated. Without resetting
f or c, a similar procedure is used to find the upper endpoint, except
that the value of type is added to the counter.. If after both endpoints
have been determined <$Ec~<<=~f>, the procedure continues having found
<$Em~-~f> truechimers; otherwise, f is increased by one and the entire
procedure is repeated.

        for (f from 0 to <$Ef~>>=~m over 2>) begin              /*
calling all truechimers */
                <$Ec~<<-~0>;
                <$Ei~<<-~0>;
                for (each [endpoint, type] from lowest) begin   /* find
low endpoint */
                        <$Ei~<<-~i~-~type>;
                        <$Elow~<<-~endpoint>;
                        if (<$Ei~>>=~m~-~f>) break;
                        if (<$Etype~=~0>) <$Ec~<<-~c~+~1>;
                        endfor;
                <$Ei~<<-~0>;

                for (each [endpoint, type] from highest) begin  /* find
high endpoint */
                        <$Ei~<<-~i~+~type>;
                        <$Ehigh~<<-~endpoint>;
                        if (<$Ei~>>=~m~-~f>) break;
                        if (<$Etype~=~0>) <$Ec~<<-~c~+~1>;
                        endfor;
                if (<$Ec~<<=~f>) break;                 /* continue
until all falsetickers found */
                endfor;
        if (<$Elow~>>~high>) begin                              /* quit
if no intersection found */
                <$Eroman sys.peer~<<-~roman NULL>;
                exit;

noToC RFC1305 - Page 49

                endif;

Note that processing continues past this point only if there are more
than <$Em over 2> intersections. However, it is possible, but not highly
likely, that there may be fewer than <$Em over 2> truechimers remaining
in the intersection.

Clustering Algorithm

In the original DTS algorithm the clock-selection procedure exits at
this point with the presumed correct time set midway in the computed
intersection <$E[low,~high]>. However, this can lead to a considerable
loss in accuracy and stability, since the individual peer statistics are
lost. Therefore, in NTP the candidates that survived the preceding steps
are processed further. The candidate list is rebuilt with entries of the
form <$E[distance,~index]>, where distance is computed from the (scaled)
peer stratum and synchronization distance <$ELAMBDA>. The scaling factor
provides a mechanism to weight the combination of stratum and distance.
Ordinarily, the stratum will dominate, unless one or more of the
survivors has an exceptionally high distance. The list is then sorted by
increasing distance.

        <$Em~<<-~0>;
        for (each peer) begin                   /* calling all peers */
                if (<$Elow~<<=~theta~<<=~high>) begin
                        <$ELAMBDA~<<-~roman distance (peer)>;           
/* make list entry */
                        <$Edist~<<-~roman
{peer.stratum~times~NTP.MAXDISPERSE~+~LAMBDA }>
                        add <$E[ dist ,~peer]> to candidate list;
                        <$Em~<<-~m~+~1>;
                        endif;
                endfor;
        sort candidate list by increasing dist;

The next steps are designed to cast out outlyers which exhibit
significant dispersions relative to the other members of the candidate
list while minimizing wander, especially on high-speed LANs with many
time servers. Wander causes needless network overhead, since the poll
interval is clamped at sys.poll as each new peer is selected for
synchronization and only slowly increases when the peer is no longer
selected. It has been the practical experience that the number of
candidates surviving to this point can become quite large and can result
in significant processor cycles without materially enhancing stability
and accuracy. Accordingly, the candidate list is truncated at
NTP.MAXCLOCK entries.

Note <$Eepsilon sub {xi i}> is the select (sample) dispersion relative
to the ith peer represented on the candidate list, which can be
calculated in a manner similar to the filter dispersion described

noToC RFC1305 - Page 50

previously. The <$EEPSILON sub j> is the dispersion of the jth peer
represented on the list and includes components due to measurement
error, skew-error accumulation and filter dispersion. If the maximum
<$Eepsilon sub {xi i}> is greater than the minimum <$EEPSILON sub j> and
the number of survivors is greater than NTP.MINCLOCK, the ith peer is
discarded from the list and the procedure is repeated. If the current
synchronization source is one of the survivors and there is no other
survivor of lower stratum, then the procedure exits without doing
anything further. Otherwise, the synchronization source is set to the
first survivor on the candidate list. In the following i, j, k, l are
peer indices, with k the index of the current synchronization source
(NULL if none) and l the index of the first survivor on the candidate
list.

        while begin
                for (each survivor <$E[distance,~index]>) begin /*
compute dispersions */
                        find index i for max <$Eepsilon sub {xi i}>;
                        find index j for min <$EEPSILON sub j>;
                        endfor
                if (<$Eepsilon sub {xi i}~<<=~EPSILON sub j> or
<$Em~<<=~roman NTP.MINCLOCK>) break;
                <$Eroman peer.survivor [i]~<<-~0> ;             /*
discard ith peer */
                if (<$Ei~=~k>) <$Eroman sys.peer~<<-~roman NULL>;
                delete the ith peer from the candidate list;
                <$Em~<<-~m~-~1>;
                endwhile
        if (<$Eroman peer.survivor [k]~=~0> or <$Eroman peer.stratum
[k]~>>~roman peer.stratum [l]>) begin
                <$Eroman sys.peer~<<-~l>;                               
/* new clock source */
                call poll-update;
                endif
        end clock-select procedure;

The algorithm is designed to favor those peers near the head of the
candidate list, which are at the lowest stratum and distance and
presumably can provide the most accurate and stable time. With proper
selection of weight factor v (also called NTP.SELECT), entries will be
trimmed from the tail of the list, unless a few outlyers disagree
significantly with respect to the remaining entries, in which case the
outlyers are discarded first. The termination condition is designed to
avoid needless switching between synchronization sources when not
statistically justified, yet maintain a bias toward the low-stratum,
low-distance peers.

Local Clocks

In order to implement a precise and accurate local clock, the host must

noToC RFC1305 - Page 51

be equipped with a hardware clock consisting of an oscillator and
interface and capable of the required precision and stability. A logical
clock is then constructed using these components plus software
components that adjust the apparent time and frequency in response to
periodic updates computed by NTP or some other time-synchronization
protocol such as Hellospeak [MIL83b] or the Unix 4.3bsd TSP [GUS85a].
This section describes the Fuzzball local-clock model and
implementation, which includes provisions for precise time and frequency
adjustment and can maintain time to within 15 ns and frequency to within
0.3 ms per day. The model is suitable for use with both compensated and
uncompensated quartz oscillators and can be adapted to power-frequency
oscillators. A summary of the characteristics of these and other types
of oscillators can be found in Appendix E, while a comprehensive
mathematical analysis of the NTP local-clock model can be found in
Appendix G.

It is important to note that the particular implementation described is
only one of possibly many implementations that provide equivalent
functionality. However, it is equally important to note that the clock
model described in Appendix G and which is the basis of the
implementation involves a particular kind of control-feedback loop that
is potentially unstable if the design rules are broken. The model and
parameter described in Appendix G are designed to provide accurate and
stable time under typical operating conditions using conventional
hardware and in the face of disruptions in hardware or network
connectivity. The parameters have been engineered for reliable operation
in a multi-level hierarchical subnet where unstable operation at one
level can disrupt possibly many other levels.

Fuzzball Implementation

The Fuzzball local clock consists of a collection of hardware and
software registers, together with a set of algorithms, which implement a
logical clock that functions as a disciplined oscillator and
synchronizes to an external source. Following is a description of its
components and manner of operation. Note that all arithmetic is two's
complement integer and all shifts <169><<<<<170> and <169>>>>><170> are
arithmetic (sign-fill for right shifts and zero-fill for left shifts).
Also note that <$Ex~<< <<~n> is equivalent to <$Ex~>> >>~-~n>.

The principal components of the local clock are shown in Figure
3,<$&fig3> in which the fraction points shown are relative to whole
milliseconds. The 48-bit Clock register and 32-bit Prescaler function as
a disciplined oscillator which increments in milliseconds relative to
midnight at the fraction point. The 32-bit Clock-Adjust register is used
to adjust the oscillator phase in gradual steps to avoid discontinuities
in the indicated timescale. Its contents are designated x in the
following. The 32-bit Skew-Compensation register is used to trim the
oscillator frequency by adding small phase increments at periodic
adjustment intervals and can compensate for frequency errors as much as

noToC RFC1305 - Page 52

.01% or <F128M>?<F255D>100 ppm. Its contents are designated y in the
following. The 16-bit Watchdog counter and 32-bit Compliance register
are used to determine validity, as well as establish the PLL bandwidth
and poll interval (see Appendix G). The contents of the Compliance
register are designated z in the following. The 32-bit PPS-Adjust
register is used to hold a precision time adjustment when a source of 1-
pps pulses is available, while the 8-bit PPS counter is used to verify
presence of these pulses. The two-bit Flags register contains the two
leap bits described elsewhere (leap).

All registers except the Prescaler register are ordinarily implemented
in memory. In typical clock interface designs such as the DEC KWV11-C,
the Prescaler register is implemented as a 16-bit buffered counter
driven by a quartz-controlled oscillator at some multiple of 1000 Hz. A
counter overflow is signalled by an interrupt, which results in an
increment of the Clock register at the bit corresponding to the
overflow. The time of day is determined by reading the Prescaler
register, which does not disturb the counting process, and adding its
value to that of the Clock register with fraction points aligned as
shown and with unimplemented low-order bits set to zero. In other
interface designs, such as the LSI-11 event-line mechanism, each tick of
the clock is signalled by an interrupt at intervals of 16-2/3 ms or 20
ms, depending on interface and mains frequency. When this occurs the
appropriate increment in fractional milliseconds is added to the Clock
register.

The various parameters used are summarized in Table 6, in which certain
parameters have been rescaled from those given in Appendix G due to the
units here being in milliseconds.<$&tab6> When the system is
initialized, all registers and counters are cleared and the leap bits
set to 112 (unsynchronized). At adjustment intervals of CLOCK.ADJ
seconds CLOCK.ADJ is subtracted from the PPS counter, but only if the
previous contents of the PPS counter are greater than zero. Also,
CLOCK.ADJ is added to the Watchdog counter, but the latter is clamped
not to exceed NTP.MAXAGE divided by CLOCK.ADJ (one full day). In
addition, if the Watchdog counter reaches this value, the leap bits are
set to 112 (unsynchronized).

In some system configurations a precise source of timing information is
available in the form of a train of timing pulses spaced at one-second
intervals. Usually, this is in addition to a source of timecode
information, such as a radio clock or even NTP itself, to number the
seconds, minutes, hours and days. In typical clock interface designs
such as the DEC KWV11-C, a special input is provided which can trigger
an interrupt as each pulse is received. When this happens the PPS
counter is set to CLOCK.PPS and the current time offset is determined in
the usual way. Then, the PPS-Adjust register is set to the time offset
scaled to milliseconds. Finally, if the PPS-Adjust register is greater
than or equal to 500, 1000 is subtracted from its contents. As described
below, the PPS-Adjust register and PPS counters can be used in

noToC RFC1305 - Page 53

conjunction with an ordinary timecode to produce an extremely accurate
local clock.

Gradual Phase Adjustments

Left uncorrected, the local clock runs at the offset and frequency
resulting from its last update. An update is produced by an event that
results in a valid clock selection. It consists of a signed 48-bit
integer in whole milliseconds and fraction, with fraction point to the
left of bit 32. If the magnitude is greater than the maximum aperture
CLOCK.MAX, a step adjustment is required, in which case proceed as
described later. Otherwise, a gradual phase adjustment is performed.
Normally, the update is computed by the NTP algorithms described
previously; however, if the PPS counter is greater than zero, the value
of the PPS-Adjust register is used instead. Let u be a 32-bit quantity
with bits 0-31 set as bits 16-47 of the update. If some of the low-order
bits of the update are unimplemented, they are set as the value of the
sign bit. These operations move the fraction point of u to the left of
bit 16 and minimize the effects of truncation and roundoff errors. Let b
be the number of leading zeros of the absolute value of the Compliance
register and let c be the number of leading zeros of the Watchdog
counter, both of which are easily computed by compact loops. Then, set b
to
<$Eb~=~b~-~16~+~roman CLOCK.COMP>

and clamp it to be not less than zero. This represents the logarithm of
the loop time constant. Then, set c to

<$Ec~=~10~-~c>

and clamp it to be not greater than NTP.MAXPOLL <196> NTP.MINPOLL. This
represents the logarithm of the integration interval since the last
update. The clamps insure stable operation under typical conditions
encountered in the Internet. Then, compute new values for the Clock-
Adjust and Skew-Compensation registers

<$Ex~=~u~>> >>~b> ,
<$Ey~=~y~+~(u~>> >>~(b~+~b~-~c))> .

Finally, compute the exponential average

<$Ez~=~z~+~(u~<< <<~(b~+~ roman CLOCK.MULT)~-~z)~>> >>~ roman
CLOCK.WEIGHT> ,

where the left shift realigns the fraction point for greater precision
and ease of computation.

At each adjustment interval the final clock correction consisting of two
components is determined. The first (phase) component consists of the
quantity

noToC RFC1305 - Page 54

<$Ex~>> >>~ roman CLOCK.PHASE> ,

which is then subtracted from the previous contents of the Clock-Adjust
register to form the new contents of that register. The second
(frequency) component consists of the quantity

<$Ey~>> >>~ roman CLOCK.FREQ> .

The sum of the phase and frequency components is the final clock
correction, which is then added to the Clock register. FInally, the
Watchdog counter is set to zero. Operation continues in this way until a
new correction is introduced.

The value of b computed above can be used to update the poll interval
system variable (sys.poll). This functions as an adaptive parameter that
provides a very valuable feature which reduces the polling overhead,
especially if the clock-combining algorithm described in Appendix F is
used:

<$Eroman sys.poll~<<-~b~+~roman NTP.MINPOLL> .

Under conditions when update noise is high or the hardware oscillator
frequency is changing relatively rapidly due to environmental
conditions, the magnitude of the compliance increases. With the
parameters specified, this causes the loop bandwidth (reciprocal of time
constant) to increase and the poll interval to decrease, eventually to
NTP.MINPOLL seconds. When noise is low and the hardware oscillator very
stable, the compliance decreases, which causes the loop bandwidth to
decrease and the poll interval to increase, eventually to NTP.MAXPOLL
seconds.

The parameters in Table 6 have been selected so that, under good
conditions with updates in the order of a few milliseconds, a precision
of a millisecond per day (about .01 ppm or 10-8), can be achieved. Care
is required in the implementation to insure monotonicity of the Clock
register and to preserve the highest precision while minimizing the
propagation of roundoff errors. Since all of the multiply/divide
operations (except those involved with the 1-pps pulses) computed in
real time can be approximated by bitwise-shift operations, it is not
necessary to implement a full multiply/divide capability in hardware or
software.

In the various implementations of NTP for many Unix-based systems it has
been the common experience that the single most important factor
affecting local-clock stability is the matching of the phase and
frequency coefficients to the particular kernel implementation. It is
vital that these coefficients be engineered according to the model
values, for otherwise the PLL can fail to track normal oscillator
variations and can even become unstable.

noToC RFC1305 - Page 55

Step Phase Adjustments

When the magnitude of a correction exceeds the maximum aperture
CLOCK.MAX, the possibility exists that the clock is so far out of
synchronization with the reference source that the best action is an
immediate and wholesale replacement of Clock register contents, rather
than in gradual adjustments as described above. However, in cases where
the sample variance is extremely high, it is prudent to disbelieve a
step change, unless a significant interval has elapsed since the last
gradual adjustment. Therefore, if a step change is indicated and the
Watchdog counter is less than the preconfigured value CLOCK.MINSTEP, the
update is ignored and the local-clock procedure exits. These safeguards
are especially useful in those system configurations using a calibrated
atomic clock or LORAN-C receiver in conjunction with a separate source
of seconds-numbering information, such as a radio clock or NTP peer.

If a step change is indicated the update is added directly to the Clock
register and the Clock-Adjust register and Watchdog counter both set to
zero, but the other registers are left undisturbed. Since a step change
invalidates data currently in the clock filters, the leap bits are set
to 112 (unsynchronized) and, as described elsewhere, the clear procedure
is called to purge the clock filters and state variables for all peers.
In practice, the necessity to perform a step change is rare and usually
occurs when the local host or reference source is rebooted, for example.
This is fortunate, since step changes can result in the local clock
apparently running backward, as well as incorrect delay and offset
measurements of the synchronization mechanism itself.

Considerable experience with the Internet environment suggests the
values of CLOCK.MAX tabulated in Table 6 as appropriate. In practice,
these values are exceeded with a single time-server source only under
conditions of the most extreme congestion or when multiple failures of
nodes or links have occurred. The most common case when the maximum is
exceeded is when the time-server source is changed and the time
indicated by the new and old sources exceeds the maximum due to
systematic errors in the primary reference source or large differences
in path delays. It is recommended that implementations include
provisions to tailor CLOCK.MAX for specific situations. The amount that
CLOCK.MAX can be increased without violating the monotonicity
requirement depends on the Clock register increment. For an increment of
10 ms, as used in many workstations, the value shown in Table 6 can be
increased by a factor of five.

Implementation Issues

The basic NTP robustness model is that a host has no other means to
verify time other than NTP itself. In some equipment a battery-backed
clock/calendar is available for a sanity check. If such a device is
available, it should be used only to confirm sanity of the timekeeping

noToC RFC1305 - Page 56

system, not as the source of system time. In the common assumption (not
always justified) that the clock/calendar is more reliable, but less
accurate, than the NTP synchronization subnet, the recommended approach
at initialization is to set the Clock register as determined from the
clock/calendar and the other registers, counters and flags as described
above. On subsequent updates if the time offset is greater than a
configuration parameter (e.g., 1000 seconds), then the update should be
discarded and an error condition reported. Some implementations
periodically record the contents of the Skew-Compensation register in
stable storage such as a system file or NVRAM and retrieve this value at
initialization. This can significantly reduce the time to converge to
the nominal stability and accuracy regime.

Conversion from NTP format to the common date and time formats used by
application programs is simplified if the internal local-clock format
uses separate date and time variables. The time variable is designed to
roll over at 24 hours, give or take a leap second as determined by the
leap-indicator bits, with its overflows (underflows) incrementing
(decrementing) the date variable. The date and time variables then
indicate the number of days and seconds since some previous reference
time, but uncorrected for intervening leap seconds.

On the day prior to the insertion of a leap second the leap bits
(sys.leap) are set at the primary servers, presumably by manual means.
Subsequently, these bits show up at the local host and are passed to the
local-clock procedure. This causes the modulus of the time variable,
which is the length of the current day, to be increased or decreased by
one second as appropriate. Immediately following insertion the leap bits
are reset. Additional discussion on this issue can be found in Appendix
E.

Lack of a comprehensive mechanism to administer the leap bits in the
primary servers is presently an awkward problem better suited to a
comprehensive network-management mechanism yet to be developed. As a
practical matter and unless specific provisions have been made
otherwise, currently manufactured radio clocks have no provisions for
leap seconds, either automatic or manual. Thus, when a leap actually
occurs, the radio must resynchronize to the broadcast timecode, which
may take from a few minutes to some hours. Unless special provisions are
made, a primary server might leap to the new timescale, only to be
yanked back to the previous timescale when it next synchronizes to the
radio. Subsequently, the server will be yanked forward again when the
radio itself resynchronizes to the broadcast timecode.

This problem can not be reliably avoided using any selection algorithm,
since there will always exist an interval of at least a couple of
minutes and possibly as much as some hours when some or all radios will
be out of synchronization with the broadcast timecode and only after the
majority of them have resynchronized will the subnet settle down. The
CLOCK.MINSTEP delay is designed to cope with this problem by forcing a

noToC RFC1305 - Page 57

minimum interval since the last gradual adjustment was made before
allowing a step change to occur. Therefore, until the radio
resynchronizes, it will continue on the old timescale, which is one
second off the local clock after the leap and outside the maximum
aperture CLOCK.MAX permitted for gradual phase adjustments. When the
radio eventually resynchronizes, it will almost certainly come up within
the aperture and again become the reference source. Thus, even in the
unlikely case when the local clock incorrectly leaps, the server will go
no longer than CLOCK.MINSTEP seconds before resynchronizing.

Acknowledgments

Many people contributed to the contents of this document, which was
thoroughly debated by electronic mail and debugged using two different
prototype implementations for the Unix 4.3bsd operating system, one
written by Louis Mamakos and Michael Petry of the University of Maryland
and the other by Dennis Ferguson of the University of Toronto. Another
implementation for the Fuzzball operating system [MIL88b] was written by
the author. Many individuals to numerous to mention meticulously tested
the several beta-test prototype versions and ruthlessly smoked out the
bugs, both in the code and the specification. Especially useful were
comments from Dennis Ferguson and Bill Sommerfeld, as well as
discussions with Joe Comuzzi and others at Digital Equipment
Corporation.

References

[ABA89]

Abate, et al. AT&T's new approach to the synchronization of
telecommunication networks. IEEE Communications Magazine (April 1989),
35-45.

[ALL74a]

Allan, D.W., J.H. Shoaf and D. Halford. Statistics of time and frequency
data analysis. In: Blair, B.E. (Ed.). Time and Frequency Theory and
Fundamentals. National Bureau of Standards Monograph 140, U.S.
Department of Commerce, 1974, 151-204.

[ALL74b]

Allan, D.W., J.E. Gray and H.E. Machlan. The National Bureau of
Standards atomic time scale: generation, stability, accuracy and
accessibility. In: Blair, B.E. (Ed.). Time and Frequency Theory and
Fundamentals. National Bureau of Standards Monograph 140, U.S.
Department of Commerce, 1974, 205-231.

[BEL86]

noToC RFC1305 - Page 58

Bell Communications Research. Digital Synchronization Network Plan.
Technical Advisory TA-NPL-000436, 1 November 1986.

[BER87]

Bertsekas, D., and R. Gallager. Data Networks. Prentice-Hall, Englewood
Cliffs, NJ, 1987.

[BLA74]

Blair, B.E. Time and frequency dissemination: an overview of principles
and techniques. In: Blair, B.E. (Ed.). Time and Frequency Theory and
Fundamentals. National Bureau of Standards Monograph 140, U.S.
Department of Commerce, 1974, 233-314.

[BRA80]

Braun, W.B. Short term frequency effects in networks of coupled
oscillators. IEEE Trans. Communications COM-28, 8 (August 1980), 1269-
1275.

[COL88]

Cole, R., and C. Foxcroft. An experiment in clock synchronisation. The
Computer Journal 31, 6 (1988), 496-502.

[DAR81a]

Defense Advanced Research Projects Agency. Internet Protocol. DARPA
Network Working Group Report RFC-791, USC Information Sciences
Institute, September 1981.

[DAR81b]

Defense Advanced Research Projects Agency. Internet Control Message
Protocol. DARPA Network Working Group Report RFC-792, USC Information
Sciences Institute, September 1981.

[DEC89]

Digital Time Service Functional Specification Version T.1.0.5. Digital
Equipment Corporation, 1989.

[DER90]

Dershowitz, N., and E.M. Reingold. Calendrical Calculations. Software
Practice and Experience 20, 9 (September 1990), 899-928.

[FRA82]

noToC RFC1305 - Page 59

Frank, R.L. History of LORAN-C. Navigation 29, 1 (Spring 1982).

[GUS84]

Gusella, R., and S. Zatti. TEMPO - A network time controller for a
distributed Berkeley UNIX system. IEEE Distributed Processing Technical
Committee Newsletter 6, NoSI-2 (June 1984), 7-15. Also in: Proc. Summer
USENIX Conference (June 1984, Salt Lake City).

[GUS85a]

Gusella, R., and S. Zatti. The Berkeley UNIX 4.3BSD time synchronization
protocol: protocol specification. Technical Report UCB/CSD 85/250,
University of California, Berkeley, June 1985.

[GUS85b]

Gusella, R., and S. Zatti. An election algorithm for a distributed clock
synchronization program. Technical Report UCB/CSD 86/275, University of
California, Berkeley, December 1985.

[HAL84]

Halpern, J.Y., B. Simons, R. Strong and D. Dolly. Fault-tolerant clock
synchronization. Proc. Third Annual ACM Symposium on Principles of
Distributed Computing (August 1984), 89-102.

[JOR85]

Jordan, E.C. (Ed). Reference Data for Engineers, Seventh Edition. H.W.
Sams & Co., New York, 1985.

[KOP87]

Kopetz, H., and W. Ochsenreiter. Clock synchronization in distributed
real-time systems. IEEE Trans. Computers C-36, 8 (August 1987), 933-939.

[LAM78]

Lamport, L., Time, clocks and the ordering of events in a distributed
system. Comm. ACM 21, 7 (July 1978), 558-565.

[LAM85]

Lamport, L., and P.M. Melliar-Smith. Synchronizing clocks in the
presence of faults. J. ACM 32, 1 (January 1985), 52-78.

[LIN80]

Lindsay, W.C., and A.V. Kantak. Network synchronization of random

noToC RFC1305 - Page 60

signals. IEEE Trans. Communications COM-28, 8 (August 1980), 1260-1266.

[LUN84]

Lundelius, J., and N.A. Lynch. A new fault-tolerant algorithm for clock
synchronization. Proc. Third Annual ACM Symposium on Principles of
Distributed Computing (August 1984), 75-88.

[MAR85]

Marzullo, K., and S. Owicki. Maintaining the time in a distributed
system. ACM Operating Systems Review 19, 3 (July 1985), 44-54.

[MIL81a]

Mills, D.L. Time Synchronization in DCNET Hosts. DARPA Internet Project
Report IEN-173, COMSAT Laboratories, February 1981.

[MIL81b]

Mills, D.L. DCNET Internet Clock Service. DARPA Network Working Group
Report RFC-778, COMSAT Laboratories, April 1981.

[MIL83a]

Mills, D.L. Internet Delay Experiments. DARPA Network Working Group
Report RFC-889, M/A-COM Linkabit, December 1983.

[MIL83b]

Mills, D.L. DCN local-network protocols. DARPA Network Working Group
Report RFC-891, M/A-COM Linkabit, December 1983.

[MIL85a]

Mills, D.L. Algorithms for synchronizing network clocks. DARPA Network
Working Group Report RFC-956, M/A-COM Linkabit, September 1985.

[MIL85b]

Mills, D.L. Experiments in network clock synchronization. DARPA Network
Working Group Report RFC-957, M/A-COM Linkabit, September 1985.

[MIL85c]

Mills, D.L. Network Time Protocol (NTP). DARPA Network Working Group
Report RFC-958, M/A-COM Linkabit, September 1985.

[MIL88a]

noToC RFC1305 - Page 61

Mills, D.L. Network Time Protocol (version 1) - specification and
implementation. DARPA Network Working Group Report RFC-1059, University
of Delaware, July 1988.

[MIL88b]

Mills, D.L. The Fuzzball. Proc. ACM SIGCOMM 88 Symposium (Palo Alto, CA,
August 1988), 115-122.

[MIL89]

Mills, D.L. Network Time Protocol (version 2) - specification and
implementation. DARPA Network Working Group Report RFC-1119, University
of Delaware, September 1989.

[MIL90]

Mills, D.L. Measured performance of the Network Time Protocol in the
Internet system. ACM Computer Communication Review 20, 1 (January 1990),
65-75.

[MIL91a]

Mills, D.L. Internet time synchronization: the Network Time Protocol.
IEEE Trans. Communications 39, 10 (October 1991), 1482-1493.

[MIL91b]

Mills, D.L. On the chronology and metrology of computer network
timescales and their application to the Network Time Protocol. ACM
Computer Communications Review 21, 5 (October 1991), 8-17.

[MIT80]

Mitra, D. Network synchronization: analysis of a hybrid of master-slave
and mutual synchronization. IEEE Trans. Communications COM-28, 8 (August
1980), 1245-1259.

[NBS77]

Data Encryption Standard. Federal Information Processing Standards
Publication 46. National Bureau of Standards, U.S. Department of
Commerce, 1977.

[NBS79]

Time and Frequency Dissemination Services. NBS Special Publication 432,
U.S. Department of Commerce, 1979.

[NBS80]

noToC RFC1305 - Page 62

DES Modes of Operation. Federal Information Processing Standards
Publication 81. National Bureau of Standards, U.S. Department of
Commerce, December 1980.

[POS80]

Postel, J. User Datagram Protocol. DARPA Network Working Group Report
RFC-768, USC Information Sciences Institute, August 1980.

[POS83a]

Postel, J. Daytime protocol. DARPA Network Working Group Report RFC-867,
USC Information Sciences Institute, May 1983.

[POS83b]

Postel, J. Time protocol. DARPA Network Working Group Report RFC-868,
USC Information Sciences Institute, May 1983.

[RIC88]

Rickert, N.W. Non Byzantine clock synchronization - a programming
experiment. ACM Operating Systems Review 22, 1 (January 1988), 73-78.

[SCH86]

Schneider, F.B. A paradigm for reliable clock synchronization.
Department of Computer Science Technical Report TR 86-735, Cornell
University, February 1986.

[SMI86]

Smith, J. Modern Communications Circuits. McGraw-Hill, New York, NY,
1986.

[SRI87]

Srikanth, T.K., and S. Toueg. Optimal clock synchronization. J. ACM 34,
3 (July 1987), 626-645.

[STE88]

Steiner, J.G., C. Neuman, and J.I. Schiller. Kerberos: an authentication
service for open network systems. Proc. Winter USENIX Conference
(February 1988).

[SU81]

Su, Z. A specification of the Internet protocol (IP) timestamp option.

noToC RFC1305 - Page 63

DARPA Network Working Group Report RFC-781. SRI International, May 1981.

[TRI86]

Tripathi, S.K., and S.H. Chang. ETempo: a clock synchronization
algorithm for hierarchical LANs - implementation and measurements.
Systems Research Center Technical Report TR-86-48, University of
Maryland, 1986.

[VAN84]

Van Dierendonck, A.J., and W.C. Melton. Applications of time transfer
using NAVSTAR GPS. In: Global Positioning System, Papers Published in
Navigation, Vol. II, Institute of Navigation, Washington, DC, 1984.

[VAS78]

Vass, E.R. OMEGA navigation system: present status and plans 1977-1980.
Navigation 25, 1 (Spring 1978).

Appendix A. NTP Data Format - Version 3

The format of the NTP Message data area, which immediately follows the
UDP header, is shown in Figure 4<$&fig4>. Following is a description of
its fields.

Leap Indicator (LI): This is a two-bit code warning of an impending leap
second to be inserted/deleted in the last minute of the current day,
with bit 0 and bit 1, respectively, coded as follows:

@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000),
ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT)

@Z_TBL_BODY = TABLE TEXT, TABLE TEXT

00, no warning

01, last minute has 61 seconds

10, last minute has 59 seconds)

11, alarm condition (clock not synchronized)

@Z_TBL_END =

Version Number (VN): This is a three-bit integer indicating the NTP
version number, currently three (3).

Mode: This is a three-bit integer indicating the mode, with values
defined as follows:

noToC RFC1305 - Page 64

@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000),
ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT)
@Z_TBL_BODY = TABLE TEXT, TABLE TEXT

0, reserved

1, symmetric active

2, symmetric passive

3, client

4, server

5, broadcast

6, reserved for NTP control message (see Appendix B)

7, reserved for private use

@Z_TBL_END =

Stratum: This is a eight-bit integer indicating the stratum level of the
local clock, with values defined as follows:

@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000),
ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT)

@Z_TBL_BODY = TABLE TEXT, TABLE TEXT

0, unspecified

1, primary reference (e.g.,, radio clock)

2-255, secondary reference (via NTP)

@Z_TBL_END =

The values that can appear in this field range from zero to NTP.INFIN
inclusive.

Poll Interval: This is an eight-bit signed integer indicating the
maximum interval between successive messages, in seconds to the nearest
power of two. The values that can appear in this field range from
NTP.MINPOLL to NTP.MAXPOLL inclusive.

Precision: This is an eight-bit signed integer indicating the precision
of the local clock, in seconds to the nearest power of two.

noToC RFC1305 - Page 65

Root Delay: This is a 32-bit signed fixed-point number indicating the
total roundtrip delay to the primary reference source, in seconds with
fraction point between bits 15 and 16. Note that this variable can take
on both positive and negative values, depending on clock precision and
skew.

Root Dispersion: This is a 32-bit signed fixed-point number indicating
the maximum error relative to the primary reference source, in seconds
with fraction point between bits 15 and 16. Only positive values greater
than zero are possible.

Reference Clock Identifier: This is a 32-bit code identifying the
particular reference clock. In the case of stratum 0 (unspecified) or
stratum 1 (primary reference), this is a four-octet, left-justified,
zero-padded ASCII string. While not enumerated as part of the NTP
specification, the following are suggested ASCII identifiers:
@Z_TBL_BEG = COLUMNS(3), DIMENSION(IN), COLWIDTHS(E2,E2,E5),
WIDTH(4.6700), ABOVE(.1670), BELOW(.0830), HGUTTER(.3330),
BOX(Z_SINGLE), KEEP(ON), ALIGN(CT), L1(R1C0..R1C3)

@Z_TBL_BODY = TABLE HEADER, TABLE HEADER, TABLE HEADER

Stratum, Code, Meaning

@Z_TBL_BODY = TABLE TEXT, TABLE TEXT, TABLE TEXT

0, DCN, DCN routing protocol

0, NIST, NIST public modem

0, TSP, TSP time protocol

0, DTS, Digital Time Service

1, ATOM, Atomic clock (calibrated)

1, VLF, VLF radio (OMEGA,, etc.)

1, callsign, Generic radio

1, LORC, LORAN-C radionavigation

1, GOES, GOES UHF environment satellite

@Z_TBL_BODY = TABLE HEADER, TABLE HEADER, TABLE HEADER

1, GPS, GPS UHF satellite positioning

@Z_TBL_END =

noToC RFC1305 - Page 66

In the case of stratum 2 and greater (secondary reference) this is the
four-octet Internet address of the primary reference host.

Reference Timestamp: This is the local time at which the local clock was
last set or corrected, in 64-bit timestamp format.

Originate Timestamp: This is the local time at which the request
departed the client host for the service host, in 64-bit timestamp
format.

Receive Timestamp: This is the local time at which the request arrived
at the service host, in 64-bit timestamp format.

Transmit Timestamp: This is the local time at which the reply departed
the service host for the client host, in 64-bit timestamp format.

Authenticator (optional): When the NTP authentication mechanism is
implemented, this contains the authenticator information defined in
Appendix C.

Appendix B. NTP Control Messages

In a comprehensive network-management environment, facilities are
presumed available to perform routine NTP control and monitoring
functions, such as setting the leap-indicator bits at the primary
servers, adjusting the various system parameters and monitoring regular
operations. Ordinarily, these functions can be implemented using a
network-management protocol such as SNMP and suitable extensions to the
MIB database. However, in those cases where such facilities are not
available, these functions can be implemented using special NTP control
messages described herein. These messages are intended for use only in
systems where no other management facilities are available or
appropriate, such as in dedicated-function bus peripherals. Support for
these messages is not required in order to conform to this
specification.

The NTP Control Message has the value 6 specified in the mode field of
the first octet of the NTP header and is formatted as shown below. The
format of the data field is specific to each command or response;
however, in most cases the format is designed to be constructed and
viewed by humans and so is coded in free-form ASCII. This facilitates
the specification and implementation of simple management tools in the
absence of fully evolved network-management facilities. As in ordinary
NTP messages, the authenticator field follows the data field. If the
authenticator is used the data field is zero-padded to a 32-bit
boundary, but the padding bits are not considered part of the data field
and are not included in the field count.

IP hosts are not required to reassemble datagrams larger than 576
octets; however, some commands or responses may involve more data than

noToC RFC1305 - Page 67

will fit into a single datagram. Accordingly, a simple reassembly
feature is included in which each octet of the message data is numbered
starting with zero. As each fragment is transmitted the number of its
first octet is inserted in the offset field and the number of octets is
inserted in the count field. The more-data (M) bit is set in all
fragments except the last.

Most control functions involve sending a command and receiving a
response, perhaps involving several fragments. The sender chooses a
distinct, nonzero sequence number and sets the status field and R and E
bits to zero. The responder interprets the opcode and additional
information in the data field, updates the status field, sets the R bit
to one and returns the three 32-bit words of the header along with
additional information in the data field. In case of invalid message
format or contents the responder inserts a code in the status field,
sets the R and E bits to one and, optionally, inserts a diagnostic
message in the data field.

Some commands read or write system variables and peer variables for an
association identified in the command. Others read or write variables
associated with a radio clock or other device directly connected to a
source of primary synchronization information. To identify which type of
variable and association a 16-bit association identifier is used. System
variables are indicated by the identifier zero. As each association is
mobilized a unique, nonzero identifier is created for it. These
identifiers are used in a cyclic fashion, so that the chance of using an
old identifier which matches a newly created association is remote. A
management entity can request a list of current identifiers and
subsequently use them to read and write variables for each association.
An attempt to use an expired identifier results in an exception
response, following which the list can be requested again.

Some exception events, such as when a peer becomes reachable or
unreachable, occur spontaneously and are not necessarily associated with
a command. An implementation may elect to save the event information for
later retrieval or to send an asynchronous response (called a trap) or
both. In case of a trap the IP address and port number is determined by
a previous command and the sequence field is set as described below.
Current status and summary information for the latest exception event is
returned in all normal responses. Bits in the status field indicate
whether an exception has occurred since the last response and whether
more than one exception has occurred.

Commands need not necessarily be sent by an NTP peer, so ordinary
access-control procedures may not apply; however, the optional
mask/match mechanism suggested elsewhere in this document provides the
capability to control access by mode number, so this could be used to
limit access for control messages (mode 6) to selected address ranges.

NTP Control Message Format

noToC RFC1305 - Page 68

The format of the NTP Control Message header, which immediately follows
the UDP header, is shown in Figure 5<$&fig5>. Following is a description
of its fields. Bit positions marked as zero are reserved and should
always be transmitted as zero.

Version Number (VN): This is a three-bit integer indicating the NTP
version number, currently three (3).

Mode: This is a three-bit integer indicating the mode. It must have the
value 6, indicating an NTP control message.

Response Bit (R): Set to zero for commands, one for responses.

Error Bit (E): Set to zero for normal response, one for error response.

More Bit (M): Set to zero for last fragment, one for all others.

Operation Code (Op): This is a five-bit integer specifying the command
function. Values currently defined include the following:

@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000),
ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT)

@Z_TBL_BODY = TABLE TEXT, TABLE TEXT

0, reserved

1, read status command/response

2, read variables command/response

3, write variables command/response

4, read clock variables command/response

5, write clock variables command/response

6, set trap address/port command/response

7, trap response

8-31, reserved

@Z_TBL_END =

Sequence: This is a 16-bit integer indicating the sequence number of the
command or response.

Status: This is a 16-bit code indicating the current status of the

noToC RFC1305 - Page 69

system, peer or clock, with values coded as described in following
sections.

Association ID: This is a 16-bit integer identifying a valid
association.

Offset: This is a 16-bit integer indicating the offset, in octets, of
the first octet in the data area.

Count: This is a 16-bit integer indicating the length of the data field,
in octets.
Data: This contains the message data for the command or response. The
maximum number of data octets is 468.

Authenticator (optional): When the NTP authentication mechanism is
implemented, this contains the authenticator information defined in
Appendix C.

Status Words

Status words indicate the present status of the system, associations and
clock. They are designed to be interpreted by network-monitoring
programs and are in one of four 16-bit formats shown in Figure 6<$&fig6>
and described in this section. System and peer status words are
associated with responses for all commands except the read clock
variables, write clock variables and set trap address/port commands. The
association identifier zero specifies the system status word, while a
nonzero identifier specifies a particular peer association. The status
word returned in response to read clock variables and write clock
variables commands indicates the state of the clock hardware and
decoding software. A special error status word is used to report
malformed command fields or invalid values.

System Status Word

The system status word appears in the status field of the response to a
read status or read variables command with a zero association
identifier. The format of the system status word is as follows:

Leap Indicator (LI): This is a two-bit code warning of an impending leap
second to be inserted/deleted in the last minute of the current day,
with bit 0 and bit 1, respectively, coded as follows:

@Z_TBL_BEG = COLUMNS(2), DIMENSION(IN), COLWIDTHS(E1,E8), WIDTH(5.0000),
ABOVE(.0830), BELOW(.0830), HGUTTER(.0560), KEEP(OFF), ALIGN(CT)

@Z_TBL_BODY = TABLE TEXT, TABLE TEXT

00, no warning

(next page on part 3)