Listen carefully to what I say; it is very complicated.
Last update: 30-Jul-2011 19:45 UTC
This page summarizes the criteria for choosing, from among a number of potential sources, suitable contributors to the clock discipline algorithm. The criteria are very meticulous, since they have to handle many different scenarios that may be optimized for peculiar circumstances, including some scenarios designed to support planetary and deep space missions.
Recall the suite of NTP data acquisition and grooming algorithms. These algorithms proceed in five phases. Phase one discovers the available sources and mobilizes an association for each candidate found. These candidates can result from explicit configuration, broadcast discovery or the pool and manycast autonomous configuration schemes.
Phase two grooms the selectable candidates excluding those sources showing one or more of the following errors
Phase three uses the select algorithm to determine the truechimers from among the candidates, leaving behind the falsetickers. A server or peer configured with the true option is declared a truechimer independent of this algorithm.
Phase four uses the cluster algorithm to cast off statistical outliers from the truechimers until a set of survivors not less than minclock remains. The minclock value defaults to 3, but can be changed with the minclock option of the tos command.
Phase five uses a set of algorithms and mitigation rules to rank the survivors and produce combined statistics used to discipline the clock, as well as to select from among the survivors a system peer from which a set of system statistics can be inherited and passed along to a dependent client population. These algorithms and rules are the main topic of this page. The clock offset developed from these algorithms can discipline the system clock either by using the ntpd clock discipline algorithm or by enabling the kernel to discipline the system clock directly, as described on the A Kernel Model for Precision Timekeeping page.
The cluster algorithm produces a list of survivors ranked by increasing synchronization distance. The combine algorithm uses this list to produce a weighted average of both offset and jitter. Absent other considerations discussed later, the combined offset is used to discipline the clock, while the combined jitter is combined with other components to produce the system jitter statistic inherited by dependent clients.
The combine algorithm uses a weight factor for each survivor computed as the select threshold minus the synchronization distance. Since the select algorithm rejects candidates with synchronization distance greater than the select threshold, the weight factor is always positive. The system jitter is calculated as the RMS weighted differences between the offset of each survivor and the offset of the first candidate on the survivor list.
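The weighting described above can be sketched in Python as follows. This is a minimal illustration of the description on this page, not the ntpd source; the function name `combine`, the tuple layout of a survivor and the `select_threshold` parameter are assumptions made for the example.

```python
import math


def combine(survivors, select_threshold):
    """Return (combined_offset, jitter) for a survivor list.

    survivors: list of (offset, sync_distance) tuples in seconds, ordered by
    increasing synchronization distance (hypothetical layout).
    select_threshold: the select threshold in seconds (hypothetical parameter).
    """
    if not survivors:
        raise ValueError("survivor list is empty")

    # Weight factor: select threshold minus synchronization distance.
    weights = [select_threshold - dist for _, dist in survivors]
    if any(w <= 0 for w in weights):
        raise ValueError("survivor distance is not below the select threshold")
    total = sum(weights)

    # Weighted average of the survivor offsets.
    offset = sum(w * off for w, (off, _) in zip(weights, survivors)) / total

    # RMS weighted difference between each survivor offset and the offset
    # of the first survivor on the list.
    first_offset = survivors[0][0]
    squares = sum(
        w * (off - first_offset) ** 2 for w, (off, _) in zip(weights, survivors)
    )
    jitter = math.sqrt(squares / total)
    return offset, jitter
```

For example, `combine([(0.001, 0.5), (0.003, 1.0)], 1.5)` gives the nearer survivor twice the weight of the farther one.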
Generally speaking, the best candidate for the system peer is the first candidate on the survivor list, as this has the smallest synchronization distance. However, it can cause some trauma in downstream clients when the system peer is changed frequently, so some care is taken to ensure that a change does not occur unless it would materially improve quality. Frequently, when there are two or more servers available with substantially the same synchronization distance, one or another will show up at the head of the survivor list, followed closely by the others of substantially the same distance. The object of the anti-clockhop algorithm is to avoid reselecting the system peer unless it becomes stale or significantly worse than the candidate at the head of the list.
Previously, the algorithm has remembered the last selected system peer or whether there was none. In addition, the algorithm has initialized the anti-clockhop threshold with the value of the mindist statistic, by default 1 ms. To help compact this discussion, we will call the last selected system peer the old peer, and the peer at the head of the survivor list the candidate peer. If there was no old peer or the old and candidate peers are the same, the candidate peer becomes the system peer. If not, the algorithm measures the difference between the offset of the old peer and that of the candidate peer. If the difference exceeds the anti-clockhop threshold, the candidate peer becomes the system peer and the anti-clockhop threshold is restored to its original value. If not, the old peer continues as the system peer. However, at each subsequent call, the algorithm reduces the anti-clockhop threshold by half. Should operation continue in this way, the candidate peer will eventually become the system peer.
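The anti-clockhop procedure can be sketched as a small state machine. The class below is an illustration under stated assumptions, not the ntpd implementation: the class and method names are hypothetical, peers are identified by arbitrary keys, the offset difference is taken as an absolute value, and the threshold is left unchanged when the old and candidate peers are the same (the text does not say what happens to it in that case).

```python
class AntiClockhop:
    """Remember the old peer and the current anti-clockhop threshold."""

    def __init__(self, mindist=0.001):
        self.mindist = mindist      # original threshold, default 1 ms
        self.threshold = mindist    # current anti-clockhop threshold
        self.old_peer = None        # last selected system peer, if any

    def update(self, candidate, offsets):
        """Return the system peer for this call.

        candidate: key of the peer at the head of the survivor list.
        offsets: dict mapping peer key to its clock offset in seconds.
        """
        old = self.old_peer
        if old is None or old == candidate:
            self.old_peer = candidate
            return candidate

        difference = abs(offsets[old] - offsets[candidate])
        if difference > self.threshold:
            # The candidate is materially different: switch and restore
            # the threshold to its original value.
            self.old_peer = candidate
            self.threshold = self.mindist
            return candidate

        # Keep the old peer, but halve the threshold so that a persistent
        # candidate eventually takes over.
        self.threshold /= 2
        return old
```

With offsets that differ by 0.4 ms and the default 1 ms threshold, the old peer is retained for two calls (threshold 0.5 ms, then 0.25 ms) and the candidate takes over on the third.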
The anti-clockhop algorithm is most effective in cases where multiple primary servers are available on fast LANs with modern computers. Typical offset differences in such cases are less than 0.5 ms and clockhops are much less frequent than if the algorithm is not used.
The behavior of the various algorithms and mitigation rules involved depends on how the various synchronization sources are classified. This depends on whether the source is local or remote and, if local, on the type of source. The following classes are defined:
The mitigation rules are designed to provide an intelligent selection of the system peer from among the selectable sources of different types. When used with the server or peer commands, the prefer option designates one or more survivors as preferred over all others. While the rules do not forbid it, it is usually not useful to designate more than one source as preferred; however, if more than one source is so designated, they are used in the order specified in the configuration file; that is, if the first one becomes unselectable, the second one is considered and so forth. This order of priority is also applicable to multiple PPS drivers, multiple modem drivers and even multiple local drivers, although that would not normally be useful.
The cluster algorithm works on the set of truechimers produced by the select algorithm. At each round the algorithm casts off the survivor least likely to influence the choice of system peer. However, in the prefer scheme the cluster algorithm is modified so that the prefer peer is never discarded; on the contrary, its potential removal becomes a termination condition. Note that the prefer peer can still be discarded by the select algorithm as a falseticker; otherwise, the prefer peer becomes the system peer.
Ordinarily, the combining algorithm computes a weighted average of the survivor offsets to produce the final synchronization offset. However, if a prefer peer is among the survivors, the combining algorithm is not used. Instead, the offset of the prefer peer is used exclusively as the final synchronization offset. In the common case involving a radio clock and a flock of remote backup servers, and with the radio clock designated a prefer peer, the result is that the radio clock normally disciplines the system clock as long as the radio itself remains operational. However, if the radio fails or becomes a falseticker, the averaged backup sources continue to discipline the system clock.
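The interaction between the prefer option and the combining algorithm can be illustrated with a short sketch. The function name, the dictionary keys and the `select_threshold` parameter are hypothetical, and the survivor list is assumed to be given in configuration file order so that the first prefer peer found is the one with priority.

```python
def final_offset(survivors, select_threshold):
    """Return the offset used to discipline the clock.

    survivors: list of dicts with keys 'offset', 'distance' (seconds) and
    'prefer' (bool), assumed to be in configuration file order.
    """
    if not survivors:
        raise ValueError("no survivors")

    # A prefer peer among the survivors bypasses the combining algorithm.
    for peer in survivors:
        if peer["prefer"]:
            return peer["offset"]

    # Otherwise use the weighted average of the survivor offsets, with
    # weight equal to select threshold minus synchronization distance.
    weights = [select_threshold - peer["distance"] for peer in survivors]
    weighted = sum(w * peer["offset"] for w, peer in zip(weights, survivors))
    return weighted / sum(weights)
```

In the radio clock scenario, the radio's offset is returned while the radio is among the survivors; once it drops out, the same call returns the average of the backup servers.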
As the select algorithm scans the associations for selectable candidates, the modem driver and local driver are segregated for later, but only if not designated a prefer peer. If so designated, a driver is included among the candidate population. In addition, if orphan parents are found, the parent with the lowest metric is segregated for later; the others are discarded. For this purpose the metric is defined as the four-octet IPv4 address or the first four octets of the hashed IPv6 address. The resulting candidates, including any prefer peers found, are processed by the select algorithm to produce a possibly empty set of truechimers. The clustering algorithm ranks the truechimers first by stratum then by synchronization distance and temporarily designates the survivor with the lowest distance as the potential system peer.
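The orphan parent metric can be sketched as follows. The text does not name the hash applied to IPv6 addresses, so MD5 over the 16-octet address is assumed here purely for illustration, as is the big-endian interpretation of the four octets; the function names are hypothetical.

```python
import hashlib
import ipaddress


def orphan_metric(address):
    """Return a 32-bit metric for an orphan parent address string."""
    ip = ipaddress.ip_address(address)
    if ip.version == 4:
        # The four-octet IPv4 address itself.
        return int(ip)
    # First four octets of the hashed IPv6 address (MD5 is an assumption).
    digest = hashlib.md5(ip.packed).digest()
    return int.from_bytes(digest[:4], "big")


def pick_orphan_parent(addresses):
    """Keep the parent with the lowest metric; the others are discarded."""
    return min(addresses, key=orphan_metric)
```

For instance, `pick_orphan_parent(["192.168.1.5", "10.0.0.1"])` keeps `10.0.0.1`, whose address is numerically lower.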
If one or more truechimers support a pulse-per-second (PPS) signal and the PPS signal is operating correctly, it is designated a PPS driver. If more than one PPS driver is found, only the first one is used. The PPS driver is not included in the combining algorithm and is mitigated separately.
At this point we have the following contributors to the system clock discipline:
The mitigation algorithm proceeds in three steps in turn.
If none of the above is the case, the data are disregarded and the system variables remain as they are.
The minsane option of the tos command, the prefer option of the server and peer commands and the flag options of the fudge command for the PPS driver can be used with the mitigation rules to provide many useful configurations. The minsane option specifies the minimum number of survivors required to synchronize the system clock. The prefer option designates the prefer peer. The driver-dependent flag options enable the PPS driver for various conditions.
A common scenario is a GPS driver with a serial timecode and PPS signal. The PPS signal is disabled until the system clock has been set by some means, not necessarily the GPS driver. If the serial timecode is within 0.4 s of the PPS signal, the GPS driver is designated the PPS driver and the PPS signal disciplines the system clock. If no GPS satellites are in view, or if the PPS signal is disconnected, the GPS driver stops updating the system clock and so eventually becomes unreachable and replaced by other sources.
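The conditions in this scenario can be summarized in a small predicate. This is only a sketch of the rule as stated above; the function and parameter names are hypothetical, and a real driver applies additional checks that are not modeled here.

```python
PPS_TIMECODE_LIMIT = 0.4  # seconds, the limit quoted in the text


def pps_disciplines_clock(clock_is_set, pps_present, timecode_offset):
    """Return True if the PPS signal may discipline the system clock.

    clock_is_set: the system clock has been set by some means.
    pps_present: a PPS signal is connected and being received.
    timecode_offset: serial timecode time minus PPS time, in seconds.
    """
    if not clock_is_set or not pps_present:
        # PPS stays disabled until the clock is set, and a disconnected
        # PPS signal cannot discipline anything.
        return False
    return abs(timecode_offset) < PPS_TIMECODE_LIMIT
```

With the clock set and a PPS signal present, a timecode offset of 0.1 s enables PPS discipline, while an offset of 0.6 s does not.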
Whether or not the GPS driver disables the PPS signal when unreachable is at the discretion of the driver. Ordinarily, the PPS signal is disabled in this case; however, when the GPS receiver has a precision holdover oscillator, the driver may elect to continue PPS operation. In this case, minsane can be set to zero so the PPS signal continues to discipline the system clock.