The synchronization subnet is a connected network of
primary and secondary time servers, clients and interconnecting transmission paths. A primary time server is directly synchronized to
a primary reference source, usually a radio clock. A secondary time server derives synchronization, possibly via other secondary
servers, from a primary server over network paths possibly shared with other services. Under normal circumstances it is intended that
the synchronization subnet of primary and secondary servers assumes a hierarchical-master-slave configuration with the primary servers
at the root and secondary servers of decreasing accuracy at successive levels toward the leaves.
Following conventions established by the telephone
industry [BEL86], the accuracy of each server is defined by a number called the stratum, with the topmost level (primary servers)
assigned as one and each level downwards (secondary servers) in the hierarchy assigned as one greater than the preceding level. With
current technology and available radio clocks, single-sample accuracies on the order of a millisecond can be achieved at the network
interface of a primary server. Accuracies of this order require special care in the design and implementation of the operating system
and the local-clock mechanism, such as described in Section 5.
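The stratum assignment described above can be sketched as follows; the subnet topology and server names are illustrative, not taken from this specification.

```python
# Sketch (illustrative topology): a primary server has stratum 1, and
# each secondary server is assigned a stratum one greater than that of
# the server it derives synchronization from.

def stratum(server, sync_source):
    """Return the stratum of `server`, where `sync_source` maps each
    secondary server to the server it derives synchronization from."""
    s = 1
    while server in sync_source:     # follow the path toward the root
        server = sync_source[server]
        s += 1
    return s

# A small subnet: A is a primary server; B syncs to A; C syncs to B.
sync_source = {"B": "A", "C": "B"}
print(stratum("A", sync_source))  # 1 (primary)
print(stratum("C", sync_source))  # 3
```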
As the stratum increases from one, the single-sample
accuracies achievable will degrade depending on the network paths and local-clock stabilities. In order to avoid the tedious
calculations [BRA80] necessary to estimate errors in each specific configuration, it is useful to assume that the mean measurement errors
accumulate approximately in proportion to the measured delay and dispersion relative to the root of the synchronization subnet.
Appendix H contains an analysis of errors, including a derivation of maximum error as a function of delay and dispersion, where the
latter quantity depends on the precision of the timekeeping system, frequency tolerance of the local clock and various residuals.
Assuming the primary servers are synchronized to standard time within known accuracies, this analysis provides a reliable, deterministic
specification of timekeeping accuracies throughout the synchronization subnet.
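As a minimal illustration of this accumulation assumption, the sketch below sums, over each hop toward the root, the measured dispersion plus one-half the absolute delay; the hop values are hypothetical, and Appendix H should be consulted for the exact bound.

```python
# Illustrative error accumulation: sum, over each hop toward the root,
# of the dispersion plus one-half the absolute delay measured on that
# hop. Values are hypothetical, in seconds.

def accumulated_error(hops):
    """hops: (delay, dispersion) pairs from this server to the root."""
    return sum(dispersion + abs(delay) / 2 for delay, dispersion in hops)

path = [(0.010, 0.002), (0.030, 0.005)]   # two hops toward the root
print(round(accumulated_error(path), 3))  # 0.027 s
```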
Again drawing from the experience of the telephone
industry, which learned such lessons at considerable cost [ABA89], the synchronization subnet topology should be organized to produce
the highest accuracy, but must never be allowed to form a loop. An additional factor is that each increment in stratum involves a
potentially unreliable time server which introduces additional measurement errors. The selection algorithm used in NTP uses a variant
of the Bellman-Ford distributed routing algorithm to compute the minimum-weight spanning trees rooted on the primary servers. The
distance metric used by the algorithm consists of the (scaled) stratum plus the synchronization distance, which itself consists of the
dispersion plus one-half the absolute delay. Thus, the synchronization path will always take the minimum number of servers to the
root, with ties resolved on the basis of maximum error.
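The distance metric just described can be sketched as follows. The scale factor applied to the stratum is an assumption for illustration; it need only be large enough that a lower stratum always wins the comparison, with ties broken by synchronization distance.

```python
# Sketch of the selection metric: (scaled) stratum plus synchronization
# distance, the latter being dispersion plus one-half the absolute
# delay. STRATUM_SCALE is an illustrative constant, chosen large enough
# that stratum dominates the comparison.

STRATUM_SCALE = 1.0   # hypothetical scale, seconds per stratum level

def metric(stratum, delay, dispersion):
    return stratum * STRATUM_SCALE + dispersion + abs(delay) / 2

# Candidate servers: (stratum, delay, dispersion), all in seconds.
candidates = {
    "A": (2, 0.040, 0.004),   # same stratum as B, larger distance
    "B": (2, 0.010, 0.002),
    "C": (3, 0.005, 0.001),   # closer, but one stratum further down
}
best = min(candidates, key=lambda name: metric(*candidates[name]))
print(best)  # B
```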
As a result of this design, the subnet reconfigures
automatically in a hierarchical-master-slave configuration to produce the most accurate and reliable time, even when one or more
primary or secondary servers or the network paths between them fail. This includes the case where all normal primary servers (e.g.,
those synchronized to highly accurate WWVB radio clocks operating at the lowest synchronization distances) on a possibly partitioned
subnet fail, but one or more backup primary servers (e.g., those synchronized to less accurate WWV radio clocks operating at higher
synchronization distances) continue operation.
However, should all primary servers throughout the subnet fail, the remaining secondary servers will synchronize among themselves
while distances ratchet upward toward a preselected maximum ("infinity") due to the well-known count-to-infinity behavior of the Bellman-Ford algorithm. Upon
reaching the maximum on all paths, a server will drop off the subnet and free-run using its last determined time and frequency. Since
these computations are expected to be very precise, especially in frequency, even extended outage periods can result in timekeeping
errors not greater than a few milliseconds per day with appropriately stabilized oscillators (see Section 5).
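The closing claim can be checked with a back-of-envelope calculation; the frequency error used below is an assumed figure for a well-stabilized oscillator, not a value from this specification.

```python
# Back-of-envelope free-run drift: time error accumulated per day at a
# constant frequency error. The 0.05 ppm figure is an assumption for a
# well-stabilized oscillator.

FREQ_ERROR_PPM = 0.05
SECONDS_PER_DAY = 86400

drift_s = FREQ_ERROR_PPM * 1e-6 * SECONDS_PER_DAY
print(f"{drift_s * 1000:.2f} ms/day")  # 4.32 ms/day
```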
In the case of multiple primary servers, the
spanning-tree computation will usually select the server at minimum synchronization distance. However, when these servers are at
approximately the same distance, the computation may result in random selections among them as the result of normal dispersive delays.
Ordinarily, this does not degrade accuracy as long as any discrepancy between the primary servers is small compared to the
synchronization distance. If not, the filter and selection algorithms will select the best of the available servers and cast out
outliers as intended.
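The cast-out behavior can be illustrated with a deliberately simplified sketch; NTP's actual filter and selection algorithms are considerably more elaborate, and the outlier bound used here is hypothetical.

```python
# Deliberately simplified selection sketch: discard servers whose clock
# offsets disagree with the majority (here, the median offset), then
# choose the survivor at minimum synchronization distance. The 128-ms
# outlier bound is hypothetical.

def select(servers):
    """servers: name -> (offset, sync_distance), in seconds."""
    offsets = sorted(offset for offset, _ in servers.values())
    median = offsets[len(offsets) // 2]
    survivors = {name: (offset, dist)
                 for name, (offset, dist) in servers.items()
                 if abs(offset - median) < 0.128}
    return min(survivors, key=lambda name: survivors[name][1])

servers = {
    "A": (0.003, 0.020),
    "B": (0.002, 0.010),
    "C": (1.500, 0.005),   # outlier: disagrees with the majority
}
print(select(servers))  # B
```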