RFC 5905

Network Time Protocol Version 4: Protocol and Algorithms Specification

Pages: 110
Proposed Standard
→ Errata
Obsoletes: 1305 4330
Updated by: 7822 8573 9109

Part 3 of 5 – Pages 39 to 60

RFC5905 - Page 39 prevText

11.  System Process

   As each new sample (theta, delta, epsilon, jitter, t) is produced by
   the clock filter algorithm, all peer processes are scanned by the
   mitigation algorithms consisting of the selection, cluster, combine,
   and clock discipline algorithms in the system process.  The selection
   algorithm scans all associations and casts off the falsetickers,
   which have demonstrably incorrect time, leaving the truechimers as
   result.  In a series of rounds, the cluster algorithm discards the
   association statistically furthest from the centroid until a
   specified minimum number of survivors remain.  The combine algorithm
   produces the best and final statistics on a weighted average basis.
   The final offset is passed to the clock discipline algorithm to steer
   the system clock to the correct time.

   The cluster algorithm selects one of the survivors as the system
   peer.  The associated statistics (theta, delta, epsilon, jitter, t)
   are used to construct the system variables inherited by dependent
   servers and clients and made available to other applications running
   on the same machine.

RFC5905 - Page 40

11.1.  System Process Variables

   Figure 23 summarizes the common names, formula names, and a short
   description of each system variable.  Unless noted otherwise, all
   variables have assumed prefix s.

                +-----------+------------+------------------------+
                | Name      | Formula    | Description            |
                +-----------+------------+------------------------+
                | t         | t          | update time            |
                | p         | p          | system peer identifier |
                | leap      | leap       | leap indicator         |
                | stratum   | stratum    | stratum                |
                | precision | rho        | precision              |
                | offset    | THETA      | combined offset        |
                | jitter    | PSI        | combined jitter        |
                | rootdelay | DELTA      | root delay             |
                | rootdisp  | EPSILON    | root dispersion        |
                | v         | v          | survivor list          |
                | refid     | refid      | reference ID           |
                | reftime   | reftime    | reference time         |
                | NMIN      | 3          | minimum survivors      |
                | CMIN      | 1          | minimum candidates     |
                +-----------+------------+------------------------+

                    Figure 23: System Process Variables

   Except for the t, p, offset, and jitter variables and the NMIN and
   CMIN constants, the variables have the same format and interpretation
   as the peer variables of the same name.  The NMIN and CMIN parameters
   are used by the selection and cluster algorithms described in the
   next section.

   The t variable is the seconds counter at the time of the last update.
   An example is shown by the clock_update() routine in
   Appendix A.5.5.4.  The p variable is the system peer identifier
   determined by the cluster() routine in Section 11.2.2.  The precision
   variable has the same format as the packet variable of the same name.
   The precision is defined as the larger of the resolution and time to
   read the clock, in log2 units.  For instance, the precision of a
   mains-frequency clock incrementing at 60 Hz is 16 ms, even when the
   system clock hardware representation is to the nanosecond.

   The offset and jitter variables are determined by the combine
   algorithm in Section 11.2.3.  These values represent the best and
   final offset and jitter used to discipline the system clock.

RFC5905 - Page 41

   Initially, all variables are cleared to zero, then the leap is set to
   3 (unsynchronized) and stratum is set to MAXSTRAT (16).  Remember
   that MAXSTRAT is mapped to zero in the transmitted packet.

11.2.  System Process Operations

   Figure 24 summarizes the system process operations performed by the
   clock select routine.  The selection algorithm described in
   Section 11.2.1 produces a majority clique of presumed correct
   candidates (truechimers) based on agreement principles.  The cluster
   algorithm described in Section 11.2.2 discards outliers to produce
   the most accurate survivors.  The combine algorithm described in
   Section 11.2.3 provides the best and final offset for the clock
   discipline algorithm.  An example is described in Appendix A.5.5.6.
   If the selection algorithm cannot produce a majority clique, or if it
   cannot produce at least CMIN survivors, the system process exits
   without disciplining the system clock.  If successful, the cluster
   algorithm selects the statistically best candidate as the system peer
   and its variables are inherited as the system variables.

RFC5905 - Page 42

                          +-----------------+
                          | clock_select()  |
                          +-----------------+
   ................................|...........
   .                               V          .
   .      yes +---------+ +-----------------+ .
   .       +--| accept? | | scan candidates | .
   .       |  +---------+ |                 | .
   .       V        no |  |                 | .
   .  +---------+      |  |                 | .
   .  | add peer|      |  |                 | .
   .  +----------      |  |                 | .
   .       |           V  |                 | .
   .       +---------->-->|                 | .
   .                      |                 | .
   . Selection Algorithm  +-----------------+ .
   .................................|..........
                                    V
                       no +-------------------+
            +-------------|     survivors?    |
            |             +-------------------+
            |                       | yes
            |                       V
            |             +-------------------+
            |             | Cluster Algorithm |
            |             +-------------------+
            |                       |
            |                       V
            V         yes +-------------------+
            |<------------|     n < CMIN?     |
            |             +-------------------+
            V                       |
     +-----------------+            V no
     |   s.p = NULL    |  +-------------------+
     +-----------------+  |   s.p = v_0.p     |
            |             +-------------------+
            V                       |
     +-----------------+            V
     | return (UNSYNC) |  +-------------------+
     +-----------------+  |   return (SYNC)   |
                          +-------------------+

                      Figure 24: Clock Select Routine

RFC5905 - Page 43

11.2.1.  Selection Algorithm

   Note that the selection and cluster algorithms are described
   separately, but combined in the code skeleton.  The selection
   algorithm operates to find an intersection interval containing a
   majority clique of truechimers using Byzantine agreement principles
   originally proposed by Marzullo [ref6], but modified to improve
   accuracy.  An overview of the algorithm is given below and described
   in the first half of the clock_select() routine in Appendix A.5.5.1.

   First, those servers that are unusable according to the rules of the
   protocol are detected and discarded as shown by the accept() routine
   in Appendix A.5.5.3.  Next, a set of tuples (p, type, edge) is
   generated for the remaining candidates.  Here, p is the association
   identifier and type identifies the upper (+1), middle (0), and lower
   (-1) endpoints of a correctness interval centered on theta for that
   candidate.  This results in three tuples, lowpoint (p, -1, theta -
   lambda), midpoint (p, 0, theta), and highpoint (p, +1, theta +
   lambda), where lambda is the root synchronization distance.  An
   example of this calculation is shown by the rootdist() routine in
   Appendix A.5.1.1.  The steps of the algorithm are:

   1.  For each of m associations, place three tuples as defined above
   on the candidate list.

   2.  Sort the tuples on the list by the edge component.  Order the
   lowpoint, midpoint, and highpoint of these intervals from lowest to
   highest.  Set the number of falsetickers f = 0.

   3.  Set the number of midpoints d = 0.  Set c = 0.  Scan from lowest
   endpoint to highest.  Add one to c for every lowpoint, subtract one
   for every highpoint, add one to d for every midpoint.  If c >= m - f,
   stop; set l = current lowpoint.

   4.  Set c = 0.  Scan from highest endpoint to lowest.  Add one to c
   for every highpoint, subtract one for every lowpoint, add one to d
   for every midpoint.  If c >= m - f, stop; set u = current highpoint.

   5.  Is d = f and l < u?  If yes, then follow step 5A; else, follow
   step 5B.

   5A. Success: the intersection interval is [l, u].

   5B. Add one to f.  Is f < (m / 2)?  If yes, then go to step 3 again.
   If no, then go to step 6.

   6.  Failure; a majority clique could not be found.  There are no
   suitable candidates to discipline the system clock.

RFC5905 - Page 44

   The algorithm is described in detail in Appendix A.5.5.1.  Note that
   it starts with the assumption that there are no falsetickers (f = 0)
   and attempts to find a non-empty intersection interval containing the
   midpoints of all correct servers, i.e., truechimers.  If a non-empty
   interval cannot be found, it increases the number of assumed
   falsetickers by one and tries again.  If a non-empty interval is
   found and the number of falsetickers is less than the number of
   truechimers, a majority clique has been found and the midpoint of
   each truechimer (theta) represents the candidates available to the
   cluster algorithm.

   If a majority clique is not found, or if the number of truechimers is
   less than CMIN, there are insufficient candidates to discipline the
   system clock.  CMIN defines the minimum number of servers consistent
   with the correctness requirements.  Suspicious operators would set
   CMIN to ensure multiple redundant servers are available for the
   algorithms to mitigate properly.  However, for historic reasons the
   default value for CMIN is one.

11.2.2.  Cluster Algorithm

   The candidates of the majority clique are placed on the survivor list
   v in the form of tuples (p, theta_p, psi_p, lambda_p), where p is an
   association identifier, theta_p, psi_p, and stratum_p the current
   offset, jitter and stratum of association p, respectively, and
   lambda_p is a merit factor equal to stratum_p * MAXDIST + lambda,
   where lambda is the root synchronization distance for association p.
   The list is processed by the cluster algorithm below.  An example is
   shown by the second half of the clock_select() algorithm in
   Appendix A.5.5.1.

   1.  Let (p, theta_p, psi_p, lambda_p) represent a survivor candidate.

   2.  Sort the candidates by increasing lambda_p.  Let n be the number
   of candidates and NMIN the minimum required number of survivors.

   3.  For each candidate, compute the selection jitter psi_s:

             +-----                       -----+^1/2
             |        n-1                      |
             |        ---                      |
             |   1    \                     2  |
     psi_s = | ---- * /  (theta_s - theta_j)   |
             |  n-1   ---                      |
             |        j=1                      |
             +-----                       -----+

   4.  Select psi_max as the candidate with maximum psi_s.

RFC5905 - Page 45

   5.  Select psi_min as the candidate with minimum psi_p.

   6.  Is psi_max < psi_min or n <= NMIN?  If yes, follow step 6A;
   otherwise, follow step 6B.

   6A. Done.  The remaining candidates on the survivor list are ranked
   in the order of preference.  The first entry on the list represents
   the system peer; its variables are used later to update the system
   variables.

   6B. Delete the outlier candidate with psi_max; reduce n by one and go
   back to step 3.

   The algorithm operates in a series of rounds where each round
   discards the statistical outlier with maximum selection jitter psi_s.
   However, if psi_s is less than the minimum peer jitter psi_p, no
   improvement is possible by discarding outliers.  This and the minimum
   number of survivors represent the terminating conditions of the
   algorithm.  Upon termination, the final value of psi_max is saved as
   the system selection jitter PSI_s for use later.

11.2.3.  Combine Algorithm

   The clock combine route processes the remaining survivors to produce
   the best and final data for the clock discipline algorithm.  The
   routine processes peer offset and jitter statistics to produce the
   combined system offset THETA and system peer jitter PSI_p, where each
   server statistic is weighted by the reciprocal of the root
   synchronization distance and the result normalized.  An example is
   shown by the clock_combine() routine in Appendix A.5.5.5

   The combined THETA is passed to the clock update routine.  The first
   candidate on the survivor list is nominated as the system peer with
   identifier p.  The system peer jitter PSI_p is a component of the
   system jitter PSI.  It is used along with the selection jitter PSI_s
   to produce the system jitter:

   PSI = [(PSI_s)^2 + (PSI_p)^2]^1/2

   Each time an update is received from the system peer, the clock
   update routine is called.  By rule, an update is discarded if its
   time of arrival p.t is not strictly later than the last update used
   s.t.  The labels IGNOR, PANIC, ADJ, and STEP refer to return codes
   from the local clock routine described in the next section.

   IGNORE means the update has been ignored as an outlier.  PANIC means
   the offset is greater than the panic threshold PANICT (1000 s) and
   SHOULD cause the program to exit with a diagnostic message to the

RFC5905 - Page 46

   system log.  STEP means the offset is less than the panic threshold,
   but greater than the step threshold STEPT (125 ms).  In this case,
   the clock is stepped to the correct offset, but since this means all
   peer data have been invalidated, all associations MUST be reset and
   the client begins as at initial start.

   ADJ means the offset is less than the step threshold and thus a valid
   update.  In this case, the system variables are updated from the peer
   variables as shown in Figure 25.

                  +-------------------------------------------+
                  | System Variable <-- System Peer Variable  |        |
                  +-------------------------------------------+
                  | s.leap      <-- p.leap                    |
                  | s.stratum   <-- p.stratum + 1             |
                  | s.offset    <-- THETA                     |
                  | s.jitter    <-- PSI                       |
                  | s.rootdelay <-- p.delta_r + delta         |
                  | s.rootdisp  <-- p.epsilon_r + p.epsilon + |
                  |                 p.psi + PHI * (s.t - p.t) |
                  |                 + |THETA|                 |
                  | s.refid     <-- p.refid                   |
                  | s.reftime   <-- p.reftime                 |
                  | s.t         <-- p.t                       |
                  +-------------------------------------------+

                    Figure 25: System Variables Update

   There is an important detail not shown.  The dispersion increment
   (p.epsilon + p.psi + PHI * (s.t - p.t) + |THETA|) is bounded from
   below by MINDISP.  In subnets with very fast processors and networks
   and very small delay and dispersion this forces a monotone-definite
   increase in s.rootdisp (EPSILON), which avoids loops between peers
   operating at the same stratum.

   The system variables are available to dependent application programs
   as nominal performance statistics.  The system offset THETA is the
   clock offset relative to the available synchronization sources.  The
   system jitter PSI is an estimate of the error in determining this
   value, elsewhere called the expected error.  The root delay DELTA is
   the total round-trip delay relative to the primary server.  The root
   dispersion EPSILON is the dispersion accumulated over the network
   from the primary server.  Finally, the root synchronization distance
   is defined as:

RFC5905 - Page 47

   LAMBDA = EPSILON + DELTA / 2,

   which represents the maximum error due all causes and is designated
   the root synchronization distance.

   An example of the clock update routine is provided in
   Appendix A.5.5.4.

11.3.  Clock Discipline Algorithm

   The NTPv4 clock discipline algorithm, shortened to discipline in the
   following, functions as a combination of two quite philosophically
   different feedback control systems.  In a phase-locked loop (PLL)
   design, periodic phase updates at update intervals mu seconds are
   used directly to minimize the time error and indirectly the frequency
   error.  In a frequency-locked loop (FLL) design, periodic frequency
   updates at intervals mu are used directly to minimize the frequency
   error and indirectly the time error.  As shown in [ref7], a PLL
   usually works better when network jitter dominates, while an FLL
   works better when oscillator wander dominates.  This section contains
   an outline of how the NTPv4 design works.  An in-depth discussion of
   the design principles is provided in [ref7], which also includes a
   performance analysis.

   The discipline is implemented as the feedback control system shown in
   Figure 26.  The variable theta_r represents the combine algorithm
   offset (reference phase) and theta_c the VFO offset (control phase).
   Each update produces a signal V_d representing the instantaneous
   phase difference theta_r - theta_c.  The clock filter for each server
   functions as a tapped delay line, with the output taken at the tap
   selected by the clock filter algorithm.  The selection, cluster, and
   combine algorithms combine the data from multiple filters to produce
   the signal V_s.  The loop filter, with impulse response F(t),
   produces the signal V_c, which controls the VFO frequency omega_c and
   thus the integral of the phase theta_c which closes the loop.  The
   V_c signal is generated by the clock-adjust process in Section 12.
   The detailed equations that implement these functions are best
   presented in the routines of Appendices A.5.5.6 and A.5.6.1.

RFC5905 - Page 48

                theta_r + +---------\        +----------------+
            NTP --------->|  Phase   \  V_d  |                | V_s
                theta_c - | Detector  ------>|  Clock Filter  |----+
                +-------->|          /       |                |    |
                |         +---------/        +----------------+    |
                |                                                  |
              -----                                                |
             /     \                                               |
             | VFO |                                               |
             \     /                                               |
              -----    .......................................     |
                ^      .            Loop Filter              .     |
                |      . +---------+   x  +-------------+    .     |
                | V_c  . |         |<-----|             |    .     |
                +------.-|  Clock  |   y  | Phase/Freq  |<---------+
                       . | Adjust  |<-----| Prediction  |    .
                       . |         |      |             |    .
                       . +---------+      +-------------+    .
                       .......................................

                 Figure 26: Clock Discipline Feedback Loop

   Ordinarily, the pseudo-linear feedback loop described above operates
   to discipline the system clock.  However, there are cases where a
   non-linear algorithm offers considerable improvement.  One case is
   when the discipline starts without knowledge of the intrinsic clock
   frequency.  The pseudo-linear loop takes several hours to develop an
   accurate measurement and during most of that time the poll interval
   cannot be increased.  The non-linear loop described below does this
   in 15 minutes.  Another case is when occasional bursts of large
   jitter are present due to congested network links.  The state machine
   described below resists error bursts lasting less than 15 minutes.

   Figure 27 contains a summary of the variables and parameters
   including the variable (lowercase) or parameter (uppercase) name,
   formula name, and short description.  Unless noted otherwise, all
   variables have assumed prefix c.  The variables t, tc, state, hyster,
   and count are integers; the remaining variables are floating doubles.
   The function of each will be explained in the algorithm descriptions
   below.

RFC5905 - Page 49

                +--------+------------+--------------------------+
                | Name   | Formula    | Description              |
                +--------+------------+--------------------------+
                | t      | timer      | seconds counter          |
                | offset | theta      | combined offset          |
                | resid  | theta_r    | residual offset          |
                | freq   | phi        | clock frequency          |
                | jitter | psi        | clock offset jitter      |
                | wander | omega      | clock frequency wander   |
                | tc     | tau        | time constant (log2)     |
                | state  | state      | state                    |
                | adj    | adj        | frequency adjustment     |
                | hyster | hyster     | hysteresis counter       |
                | STEPT  | 125        | step threshold (.125 s)  |
                | WATCH  | 900        | stepout thresh(s)        |
                | PANICT | 1000       | panic threshold (1000 s) |
                | LIMIT  | 30         | hysteresis limit         |
                | PGATE  | 4          | hysteresis gate          |
                | TC     | 16         | time constant scale      |
                | AVG    | 8          | averaging constant       |
                +--------+------------+--------------------------+

           Figure 27: Clock Discipline Variables and Parameters

   The process terminates immediately if the offset is greater than the
   panic threshold PANICT (1000 s).  The state transition function is
   described by the rstclock() function in Appendix A.5.5.7.  Figure 28
   shows the state transition function used by this routine.  It has
   four columns showing, respectively, the state name, predicate and
   action if the offset theta is less than the step threshold, the
   predicate and actions otherwise, and finally some comments.

RFC5905 - Page 50

      +-------+---------------------+-------------------+--------------+
      | State | theta < STEP        | theta > STEP      | Comments     |
      +-------+---------------------+-------------------+--------------+
      | NSET  | ->FREQ              | ->FREQ            | no frequency |
      |       | adjust time         | step time         | file         |
      +-------+---------------------+-------------------+--------------+
      | FSET  | ->SYNC              | ->SYNC            | frequency    |
      |       | adjust time         | step time         | file         |
      +-------+---------------------+-------------------+--------------+
      | SPIK  | ->SYNC              | if < 900 s ->SPIK | outlier      |
      |       | adjust freq         | else ->SYNC       | detected     |
      |       | adjust time         | step freq         |              |
      |       |                     | step time         |              |
      +-------+---------------------+-------------------+--------------+
      | FREQ  | if < 900 s ->FREQ   | if < 900 s ->FREQ | initial      |
      |       | else ->SYNC         | else ->SYNC       | frequency    |
      |       | step freq           | step freq         |              |
      |       | adjust time         | adjust time       |              |
      +-------+---------------------+-------------------+--------------+
      | SYNC  | ->SYNC              | if < 900 s ->SPIK | normal       |
      |       | adjust freq         | else ->SYNC       | operation    |
      |       | adjust time         | step freq         |              |
      |       |                     | step time         |              |
      +-------+---------------------+-------------------+--------------+

                   Figure 28: State Transition Function

   In the table entries, the next state is identified by the arrow ->
   with the actions listed below.  Actions such as adjust time and
   adjust frequency are implemented by the PLL/FLL feedback loop in the
   local_clock() routine.  A step clock action is implemented by setting
   the clock directly, but this is done only after the stepout threshold
   WATCH (900 s) when the offset is more than the step threshold STEPT
   (.125 s).  This resists clock steps under conditions of extreme
   network congestion.

   The jitter (psi) and wander (omega) statistics are computed using an
   exponential average with weight factor AVG.  The time constant
   exponent (tau) is determined by comparing psi with the magnitude of
   the current offset theta.  If the offset is greater than PGATE (4)
   times the clock jitter, the hysteresis counter hyster is reduced by
   two; otherwise, it is increased by one.  If hyster increases to the
   upper limit LIMIT (30), tau is increased by one; if it decreases to
   the lower limit -LIMIT (-30), tau is decreased by one.  Normally, tau
   hovers near MAXPOLL, but quickly decreases if a temperature spike
   causes a frequency surge.

RFC5905 - Page 51

12.  Clock-Adjust Process

   The actual clock-adjust process runs at one-second intervals to add
   the frequency correction and a fixed percentage of the residual
   offset theta_r.  The theta_r is, in effect, the exponential decay of
   the theta value produced by the loop filter at each update.  The TC
   parameter scales the time constant to match the poll interval for
   convenience.  Note that the dispersion EPSILON increases by PHI at
   each second.

   The clock-adjust process includes a timer interrupt facility driving
   the seconds counter c.t.  It begins at zero when the service starts
   and increments once each second.  At each interrupt, the
   clock_adjust() routine is called to incorporate the clock discipline
   time and frequency adjustments, then the associations are scanned to
   determine if the seconds counter equals or exceeds the p.next state
   variable defined in the next section.  If so, the poll process is
   called to send a packet and compute the next p.next value.

   An example of the clock-adjust process is shown by the clock_adjust()
   routine in Appendix A.5.6.1.

13.  Poll Process

   Each association supports a poll process that runs at regular
   intervals to construct and send packets in symmetric, client, and
   broadcast server associations.  It runs continuously, whether or not
   servers are reachable in order to manage the clock filter and reach
   register.

13.1.  Poll Process Variables

   Figure 29 summarizes the common names, formula names, and a short
   description of the poll process variables (lowercase) and parameters
   (uppercase).  Unless noted otherwise, all variables have assumed
   prefix p.

RFC5905 - Page 52

                   +---------+---------+--------------------+
                   | Name    | Formula | Description        |
                   +---------+---------+--------------------+
                   | hpoll   | hpoll   | host poll exponent |
                   | last    | last    | last poll time     |
                   | next    | next    | next poll time     |
                   | reach   | reach   | reach register     |
                   | unreach | unreach | unreach counter    |
                   | UNREACH | 24      | unreach limit      |
                   | BCOUNT  | 8       | burst count        |
                   | BURST   | flag    | burst enable       |
                   | IBURST  | flag    | iburst enable      |
                   +---------+---------+--------------------+

             Figure 29: Poll Process Variables and Parameters

   The poll process variables are allocated in the association data
   structure along with the peer process variables.  The following is a
   detailed description of the variables.  The parameters will be called
   out in the following text.

   hpoll: signed integer representing the poll exponent, in log2 seconds

   last: integer representing the seconds counter when the most recent
   packet was sent

   next: integer representing the seconds counter when the next packet
   is to be sent

   reach: 8-bit integer shift register shared by the peer and poll
   processes

   unreach: integer representing the number of seconds the server has
   been unreachable

13.2.  Poll Process Operations

   As described previously, once each second the clock-adjust process is
   called.  This routine calls the poll routine for each association in
   turn.  If the time for the next poll message is greater than the
   seconds counter, the routine returns immediately.  Symmetric (modes
   1, 2), client (mode 3), and broadcast server (mode 5) associations
   routinely send packets.  A broadcast client (mode 6) association runs
   the routine to update the reach and unreach variables, but does not
   send packets.  The poll process calls the transmit process to send a
   packet.  If in a burst (burst > 0), nothing further is done except
   call the poll update routine to set the next poll interval.

RFC5905 - Page 53

   If not in a burst, the reach variable is shifted left by one bit,
   with zero replacing the rightmost bit.  If the server has not been
   heard for the last three poll intervals, the clock filter routine is
   called to increase the dispersion.  An example is shown in
   Appendix A.5.7.3.

   If the BURST flag is lit and the server is reachable and a valid
   source of synchronization is available, the client sends a burst of
   BCOUNT (8) packets at each poll interval.  The interval between
   packets in the burst is two seconds.  This is useful to accurately
   measure jitter with long poll intervals.  If the IBURST flag is lit
   and this is the first packet sent when the server has been
   unreachable, the client sends a burst.  This is useful to quickly
   reduce the synchronization distance below the distance threshold and
   synchronize the clock.

   If the P_MANY flag is lit in the p.flags word of the association,
   this is a manycast client association.  Manycast client associations
   send client mode packets to designated multicast group addresses at
   MINPOLL intervals.  The association starts out with a TTL of 1.  If
   by the time of the next poll there are fewer than MINCLOCK servers
   have been mobilized, the ttl is increased by one.  If the ttl reaches
   the limit TTLMAX, without finding MINCLOCK servers, the poll interval
   increases until reaching BEACON, when it starts over from the
   beginning.

   The poll() routine includes a feature that backs off the poll
   interval if the server becomes unreachable.  If reach is nonzero, the
   server is reachable and unreach is set to zero; otherwise, unreach is
   incremented by one for each poll to the maximum UNREACH.  Thereafter
   for each poll hpoll is increased by one, which doubles the poll
   interval up to the maximum MAXPOLL determined by the poll_update()
   routine.  When the server again becomes reachable, unreach is set to
   zero, hpoll is reset to the tc system variable, and operation resumes
   normally.

   A packet is sent by the transmit process.  Some header values are
   copied from the peer variables left by a previous packet and others
   from the system variables.  Figure 30 shows which values are copied
   to each header field.  In those implementations, using floating
   double data types for root delay and root dispersion, these must be
   converted to NTP short format.  All other fields are either copied
   intact from peer and system variables or struck as a timestamp from
   the system clock.

RFC5905 - Page 54

                   +-----------------------------------+
                   | Packet Variable <--   Variable    |
                   +-----------------------------------+
                   | x.leap        <--     s.leap      |
                   | x.version     <--     s.version   |
                   | x.mode        <--     s.mode      |
                   | x.stratum     <--     s.stratum   |
                   | x.poll        <--     s.poll      |
                   | x.precision   <--     s.precision |
                   | x.rootdelay   <--     s.rootdelay |
                   | x.rootdisp    <--     s.rootdisp  |
                   | x.refid       <--     s.refid     |
                   | x.reftime     <--     s.reftime   |
                   | x.org         <--     p.xmt       |
                   | x.rec         <--     p.dst       |
                   | x.xmt         <--     clock       |
                   | x.keyid       <--     p.keyid     |
                   | x.digest      <--     md5 digest  |
                   +-----------------------------------+

                   Figure 30: xmit_packet Packet Header

   The poll update routine is called when a valid packet is received and
   immediately after a poll message has been sent.  If in a burst, the
   poll interval is fixed at 2 s; otherwise, the host poll exponent
   hpoll is set to the minimum of ppoll from the last packet received
   and hpoll from the poll routine, but not less than MINPOLL or greater
   than MAXPOLL.  Thus, the clock discipline can be oversampled but not
   undersampled.  This is necessary to preserve subnet dynamic behavior
   and protect against protocol errors.

   The poll exponent is converted to an interval, which, when added to
   the last poll time variable, determines the value of the next poll
   time variable.  Finally, the last poll time variable is set to the
   current seconds counter.

14.  Simple Network Time Protocol (SNTP)

   Primary servers and clients complying with a subset of NTP, called
   the Simple Network Time Protocol (SNTPv4) [RFC4330], do not need to
   implement the mitigation algorithms described in Section 9 and
   following sections.  SNTP is intended for primary servers equipped
   with a single reference clock, as well as for clients with a single
   upstream server and no dependent clients.  The fully developed NTPv4
   implementation is intended for secondary servers with multiple
   upstream servers and multiple downstream servers or clients.  Other
   than these considerations, NTP and SNTP servers and clients are
   completely interoperable and can be intermixed in NTP subnets.

RFC5905 - Page 55

   An SNTP primary server implementing the on-wire protocol described in
   Section 8 has no upstream servers except a single reference clock.
   In principle, it is indistinguishable from an NTP primary server that
   has the mitigation algorithms and therefore capable of mitigating
   between multiple reference clocks.

   Upon receiving a client request, an SNTP primary server constructs
   and sends the reply packet as described in Figure 31.  Note that the
   dispersion field in the packet header must be updated as described in
   Section 5.

                   +-----------------------------------+
                   | Packet Variable <--   Variable    |
                   +-----------------------------------+
                   | x.leap        <--     s.leap      |
                   | x.version     <--     r.version   |
                   | x.mode        <--     4           |
                   | x.stratum     <--     s.stratum   |
                   | x.poll        <--     r.poll      |
                   | x.precision   <--     s.precision |
                   | x.rootdelay   <--     s.rootdelay |
                   | x.rootdisp    <--     s.rootdisp  |
                   | x.refid       <--     s.refid     |
                   | x.reftime     <--     s.reftime   |
                   | x.org         <--     r.xmt       |
                   | x.rec         <--     r.dst       |
                   | x.xmt         <--     clock       |
                   | x.keyid       <--     r.keyid     |
                   | x.digest      <--     md5 digest  |
                   +-----------------------------------+

                    Figure 31: fast_xmit Packet Header

   An SNTP client implementing the on-wire protocol has a single server
   and no dependent clients.  It can operate with any subset of the NTP
   on-wire protocol, the simplest approach using only the transmit
   timestamp of the server packet and ignoring all other fields.
   However, the additional complexity to implement the full on-wire
   protocol is minimal so that a full implementation is encouraged.

15.  Security Considerations

   NTP security requirements are even more stringent than most other
   distributed services.  First, the operation of the authentication
   mechanism and the time synchronization mechanism are inextricably
   intertwined.  Reliable time synchronization requires cryptographic
   keys that are valid only over a designated time interval; but, time
   intervals can be enforced only when participating servers and clients

RFC5905 - Page 56

   are reliably synchronized to UTC.  In addition, the NTP subnet is
   hierarchical by nature, so time and trust flow from the primary
   servers at the root through secondary servers to the clients at the
   leaves.

   An NTP client can claim to have authentic time to dependent
   applications only if all servers on the path to the primary servers
   are authenticated.  In NTP each server authenticates the next lower
   stratum servers and authenticates by induction the lowest stratum
   (primary) servers.  It is important to note that authentication in
   the context of NTP does not necessarily imply the time is correct.
   An NTP client mobilizes a number of concurrent associations with
   different servers and uses a crafted agreement algorithm to pluck
   truechimers from the population possibly including falsetickers.

   The NTP specification assumes that the goal of the intruder is to
   inject false time values, disrupt the protocol, or clog the network,
   servers, or clients with spurious packets that exhaust resources and
   deny service to legitimate applications.  There are a number of
   defense mechanisms already built in the NTP architecture, protocol,
   and algorithms.  The on-wire timestamp exchange scheme is inherently
   resistant to spoofing, packet-loss, and replay attacks.  The
   engineered clock filter, selection and clustering algorithms are
   designed to defend against evil cliques of Byzantine traitors.  While
   not necessarily designed to defeat determined intruders, these
   algorithms and accompanying sanity checks have functioned well over
   the years to deflect improperly operating but presumably friendly
   scenarios.  However, these mechanisms do not securely identify and
   authenticate servers to clients.  Without specific further
   protection, an intruder can inject any or all of the following
   attacks:

   1.  An intruder can intercept and archive packets forever, as well as
       all the public values ever generated and transmitted over the
       net.

   2.  An intruder can generate packets faster than the server, network
       or client can process them, especially if they require expensive
       cryptographic computations.

   3.  In a wiretap attack, the intruder can intercept, modify, and
       replay a packet.  However, it cannot permanently prevent onward
       transmission of the original packet; that is, it cannot break the
       wire, only tell lies and congest it.  Generally, the modified
       packet cannot arrive at the victim before the original packet,
       nor does it have the server private keys or identity parameters.

RFC5905 - Page 57

   4.  In a middleman or masquerade attack, the intruder is positioned
       between the server and client, so it can intercept, modify and
       replay a packet and prevent onward transmission of the original
       packet.  However, the middleman does not have the server private
       keys.

   The NTP security model assumes the following possible limitations:

   1.  The running times for public key algorithms are relatively long
       and highly variable.  In general, the performance of the time
       synchronization function is badly degraded if these algorithms
       must be used for every NTP packet.

   2.  In some modes of operation, it is not feasible for a server to
       retain state variables for every client.  It is however feasible
       to regenerated them for a client upon arrival of a packet from
       that client.

   3.  The lifetime of cryptographic values must be enforced, which
       requires a reliable system clock.  However, the sources that
       synchronize the system clock must be trusted.  This circular
       interdependence of the timekeeping and authentication functions
       requires special handling.

   4.  Client security functions must involve only public values
       transmitted over the net.  Private values must never be disclosed
       beyond the machine on which they were created, except in the case
       of a special trusted agent (TA) assigned for this purpose.

   Unlike the Secure Shell (SSH) security model, where the client must
   be securely authenticated to the server, in NTP the server must be
   securely authenticated to the client.  In SSH, each different
   interface address can be bound to a different name, as returned by a
   reverse-DNS query.  In this design, separate public/private key pairs
   may be required for each interface address with a distinct name.  A
   perceived advantage of this design is that the security compartment
   can be different for each interface.  This allows a firewall, for
   instance, to require some interfaces to authenticate the client and
   others not.

   In the case of NTP as specified herein, NTP broadcast clients are
   vulnerable to disruption by misbehaving or hostile SNTP or NTP
   broadcast servers elsewhere in the Internet.  Such disruption can be
   minimized by several approaches.  Filtering can be employed to limit
   the access of NTP clients to known or trusted NTP broadcast servers.
   Such filtering will prevent malicious traffic from reaching the NTP
   clients.  Cryptographic authentication at the client will only allow

RFC5905 - Page 58

   timing information from properly signed NTP messages to be utilized
   in synchronizing its clock.  Higher levels of authentication may be
   gained by the use of the Autokey mechanism [RFC5906].

   Section 8 describes a potential security concern with the replay of
   client requests.  Following the recommendations in that section
   provides protection against such attacks.

   It should be noted that this specification is describing an existing
   implementation.  While the security shortfalls of the MD5 algorithm
   are well-known, its use in the NTP specification is consistent with
   widescale deployment in the Internet community.

16.  IANA Considerations

   UDP/TCP Port 123 was previously assigned by IANA for this protocol.
   The IANA has assigned the IPv4 multicast group address 224.0.1.1 and
   the IPv6 multicast address ending :101 for NTP.  This document
   introduces NTP extension fields allowing for the development of
   future extensions to the protocol, where a particular extension is to
   be identified by the Field Type sub-field within the extension field.
   IANA has established and will maintain a registry for Extension Field
   Types associated with this protocol, populating this registry with no
   initial entries.  As future needs arise, new Extension Field Types
   may be defined.  Following the policies outlined in [RFC5226], new
   values are to be defined by IETF Review.

   The IANA has created a new registry for NTP Reference Identifier
   codes.  This includes the current codes defined in Section 7.3, and
   may be extended on a First-Come-First-Serve (FCFS) basis.  The format
   of the registry is:

     +------+----------------------------------------------------------+
     | ID   | Clock Source                                             |
     +------+----------------------------------------------------------+
     | GOES | Geosynchronous Orbit Environment Satellite               |
     | GPS  | Global Position System                                   |
     | ...  | ...                                                      |
     +------+----------------------------------------------------------+

                   Figure 32: Reference Identifier Codes

   The IANA has created a new registry for NTP Kiss-o'-Death codes.
   This includes the current codes defined in Section 7.4, and may be
   extended on a FCFS basis.  The format of the registry is:

RFC5905 - Page 59

   +------+------------------------------------------------------------+
   | Code |                           Meaning                          |
   +------+------------------------------------------------------------+
   | ACST | The association belongs to a unicast server.               |
   | AUTH | Server authentication failed.                              |
   | ...  | ...                                                        |
   +------+------------------------------------------------------------+

                           Figure 33: Kiss Codes

   For both Reference Identifiers and Kiss-o'-Death codes, IANA is
   requested to never assign a code beginning with the character "X", as
   this is reserved for experimentation and development.

17.  Acknowledgements

   The editors would like to thank Karen O'Donoghue, Brian Haberman,
   Greg Dowd, Mark Elliot, Harlan Stenn, Yaakov Stein, Stewart Bryant,
   and Danny Mayer for technical reviews and specific text contributions
   to this document.

18.  References

18.1.  Normative References

   [RFC0768]       Postel, J., "User Datagram Protocol", STD 6, RFC 768,
                   August 1980.

   [RFC0791]       Postel, J., "Internet Protocol", STD 5, RFC 791,
                   September 1981.

   [RFC0793]       Postel, J., "Transmission Control Protocol", STD 7,
                   RFC 793, September 1981.

   [RFC1321]       Rivest, R., "The MD5 Message-Digest Algorithm",
                   RFC 1321, April 1992.

   [RFC2119]       Bradner, S., "Key words for use in RFCs to Indicate
                   Requirement Levels", BCP 14, RFC 2119, March 1997.

18.2.  Informative References

   [CGPM]          Bureau International des Poids et Mesures, "Comptes
                   Rendus de la 15e CGPM", 1976.

   [ITU-R_TF.460]  International Telecommunications Union, "ITU-R TF.460
                   Standard-frequency and time-signal emissions",
                   February 2002.

RFC5905 - Page 60

   [RFC1305]       Mills, D., "Network Time Protocol (Version 3)
                   Specification, Implementation and Analysis",
                   RFC 1305, March 1992.

   [RFC1345]       Simonsen, K., "Character Mnemonics and Character
                   Sets", RFC 1345, June 1992.

   [RFC4330]       Mills, D., "Simple Network Time Protocol (SNTP)
                   Version 4 for IPv4, IPv6 and OSI", RFC 4330,
                   January 2006.

   [RFC5226]       Narten, T. and H. Alvestrand, "Guidelines for Writing
                   an IANA Considerations Section in RFCs", BCP 26,
                   RFC 5226, May 2008.

   [RFC5906]       Haberman, B., Ed. and D. Mills, "Network Time
                   Protocol Version 4: Autokey Specification", RFC 5906,
                   June 2010.

   [ref6]          Marzullo and S. Owicki, "Maintaining the time in a
                   distributed system", ACM Operating Systems Review 19,
                   July 1985.

   [ref7]          Mills, D.L., "Computer Network Time Synchronization -
                   the Network Time Protocol", CRC Press, 304 pp, 2006.

   [ref9]          Mills, D.L., Electrical and Computer Engineering
                   Technical Report 06-6-1, NDSS, June 2006, "Network
                   Time Protocol Version 4 Reference and Implementation
                   Guide", 2006.

(next page on part 4)