Computer Clock Modelling and Analysis
A computer clock includes some kind of reference oscillator, which is stabilized by a quartz crystal or some other means, such as the power grid. Usually, the clock includes a prescaler, which divides the oscillator frequency to a standard value, such as 1 MHz or 100 Hz, and a
counter, implemented in hardware, software or some combination of the two, which can be read by the processor. For systems intended to be synchronized to an external source of standard time, there must be some means to correct the phase and frequency by occasional vernier adjustments produced by
the timekeeping protocol. Special care is necessary in all timekeeping system designs to ensure that the clock indications are always monotonically increasing; that is, system time never "runs backwards."
Computer Clock Models
The simplest computer clock consists of a hardware latch which is set by overflow of a hardware counter or prescaler, and causes a processor interrupt or tick. The latch is reset when acknowledged by the processor, which then increments the value of a software clock counter. The
phase of the clock is adjusted by adding periodic corrections to the counter as necessary. The frequency of the clock can be adjusted by changing the value of the increment itself, in order to make the clock run faster or slower. The precision of this simple clock model is limited to the tick
interval, usually on the order of 10 ms, although in some systems the tick interval can be changed using a kernel variable.
This software clock model requires a processor interrupt on every tick, which can cause significant overhead if the tick interval is small, say on the order of less than 1 ms with the newer RISC processors. Thus, in order to achieve timekeeping precisions less than 1 ms, some kind of hardware assist is required. A straightforward design consists of a voltage-controlled oscillator (VCO), in which the frequency is controlled by a buffered digital/analog converter (DAC). Under the assumption that the VCO tolerance is $10^{-4}$ or 100 parts-per-million (ppm) (a reasonable value for inexpensive crystals) and the precision required is 100 µs (a reasonable goal for a RISC processor), the DAC must include at least ten bits.
A design sketch of a computer clock constructed entirely of hardware logic components is shown in Figure 10a. The clock is read by first pulsing the read signal, which latches the current value of the clock counter, then adding the contents of the clock-counter latch and a 64-bit clock-offset variable, which is maintained in processor memory. The clock phase is adjusted by adding a correction to the clock-offset variable, while the clock frequency is adjusted by loading a correction to the DAC latch. In principle, this clock model can be adapted to any precision by changing the number of bits of the prescaler or clock counter or changing the VCO frequency. However, it does not seem useful to reduce precision much below the minimum interrupt latency, which is in the low microseconds for a modern RISC processor.
If it is not possible to vary the oscillator frequency, which might be the case if the oscillator is an external frequency standard, a design such as shown in Figure 10b may be used. It includes a fixed-frequency oscillator and prescaler which includes a dual-modulus swallow counter that can be operated in either divide-by-10 or divide-by-11 modes as controlled by a pulse produced by a programmable divider (PD). The PD is loaded with a value representing the frequency offset. Each time the divider overflows, a pulse is produced which switches the swallow counter from the divide-by-10 mode to the divide-by-11 mode and then back again, which in effect "swallows" or deletes a single pulse of the prescaler pulse train.
The pulse train produced by the prescaler is controlled precisely over a small range by the contents of the PD. If programmed to emit pulses at a low rate, relatively few pulses are swallowed per second and the frequency counted is near the upper limit of its range; while, if
programmed to emit pulses at a high rate, relatively many pulses are swallowed and the frequency counted is near the lower limit. Assuming some degree of freedom in the choice of oscillator frequency and prescaler ratios, this design can compensate for a wide range of oscillator frequency
tolerances.
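The effect of the programmable divider on the counted frequency can be checked with a little arithmetic. The sketch below is a minimal model, assuming each PD pulse forces exactly one divide-by-11 cycle, so that it removes one oscillator cycle from what the counter sees. The function names, and the choice to specify the PD output directly as a pulse rate, are illustrative assumptions; how a PD load value maps to that rate depends on the hardware.

```python
def prescaler_output_hz(f_osc_hz: float, swallow_rate_hz: float) -> float:
    """Counted frequency of a divide-by-10/11 dual-modulus prescaler (sketch).

    Assumption: every PD pulse turns exactly one divide-by-10 cycle into a
    divide-by-11 cycle. Over one second:
        10 * (f_out - k) + 11 * k = f_osc  =>  f_out = (f_osc - k) / 10
    """
    # k cannot exceed f_out, i.e. at most every cycle divides by 11
    if not 0.0 <= swallow_rate_hz <= f_osc_hz / 11.0:
        raise ValueError("swallow rate outside the achievable range")
    return (f_osc_hz - swallow_rate_hz) / 10.0


def swallow_rate_for(f_osc_hz: float, f_target_hz: float) -> float:
    """PD pulse rate needed to pull the counted frequency to f_target_hz."""
    k = f_osc_hz - 10.0 * f_target_hz
    if not 0.0 <= k <= f_osc_hz / 11.0:
        raise ValueError("target must lie between f_osc/11 and f_osc/10")
    return k


# A 10 MHz oscillator running 100 ppm fast, steered to a 1 MHz counted rate
k = swallow_rate_for(10_001_000.0, 1_000_000.0)
print(k, prescaler_output_hz(10_001_000.0, k))  # 1000.0 1000000.0
```

With no PD pulses the counted frequency sits at the upper limit, f_osc/10; as the PD rate rises it falls toward f_osc/11, which is the behavior described above.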
In all of the above designs it is necessary to limit the amount of adjustment incorporated in any step to ensure that the system clock indications are always monotonically increasing. With the software clock model this is assured as long as the increment is never negative. When
the magnitude of a phase adjustment exceeds the tick interval (as corrected for the frequency adjustment), it is necessary to spread the adjustments over multiple tick intervals. This strategy amounts to a deliberate frequency offset sustained for an interval equal to the total number of ticks
required and, in fact, is a feature of the Unix clock model discussed below.
In the hardware clock models the same considerations apply; however, in these designs the tick interval amounts to a single pulse at the prescaler output, which may be on the order of 1 ms. In order to avoid decreasing the indicated time when a negative phase correction occurs, it is necessary to avoid modifying the clock-offset variable in processor memory and to confine all adjustments to the VCO or prescaler. Thus, all phase adjustments must be performed by means of programmed frequency adjustments in much the same way as with the software clock model described previously.
It is interesting to conjecture on the design of a processor assist that could provide all of the above functions in a compact, general-purpose hardware interface. The interface might consist of a multifunction timer chip such as the AMD 9513A, which includes five 16-bit counters, each with programmable load and hold registers, plus an onboard crystal oscillator, prescaler and control circuitry. A 48-bit hardware clock counter would utilize three of the 16-bit counters, while the fourth would be used as the swallow counter and the fifth as the programmable divider. With the addition of a programmable-array logic device and architecture-specific host interface, this compact design could provide all the functions necessary for a comprehensive timekeeping system.
The Fuzzball Clock Model
The Fuzzball clock model uses a combination of hardware and software to provide precision timing with a minimum of software and processor overhead. The model includes an oscillator, prescaler and hardware counter; however, the oscillator frequency remains constant and the hardware counter produces only a fraction of the total number of bits required by the clock counter. A typical design uses a 64-bit software clock counter and a 16-bit hardware counter which counts the prescaler output. A hardware-counter overflow causes the processor to increment the software counter at the bit corresponding to the frequency $2^{-N} f_p$, where $N$ is the number of bits of the hardware counter and $f_p$ is the counted frequency at the prescaler output. The processor reads the clock counter by first generating a read pulse, which latches the hardware counter, and then adding its contents, suitably aligned, to the software counter.
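The split between the software and hardware counters, and the "suitably aligned" addition on a read, can be illustrated with a toy model. This is a minimal sketch, assuming both counters are kept in units of prescaler pulses so that alignment reduces to adding the latched hardware value to a software counter whose low $N$ bits are always zero; the class and method names are hypothetical.

```python
MASK64 = (1 << 64) - 1


class FuzzballClockModel:
    """Toy model of a 64-bit software counter extended by an N-bit hardware counter.

    Hypothetical class for illustration. Both counters count prescaler pulses
    (1/f_p seconds); the software counter only ever changes at bit N, the bit
    whose rate is 2**-N * f_p.
    """

    def __init__(self, hw_bits: int = 16):
        self.hw_bits = hw_bits
        self.hw_mask = (1 << hw_bits) - 1
        self.software = 0  # maintained by the processor
        self.hardware = 0  # N-bit counter of prescaler pulses

    def advance(self, pulses: int) -> None:
        """Count prescaler pulses; each hardware overflow interrupts the processor."""
        total = self.hardware + pulses
        overflows = total >> self.hw_bits
        self.hardware = total & self.hw_mask
        # Interrupt handler: add one unit at bit N for each overflow
        self.software = (self.software + (overflows << self.hw_bits)) & MASK64

    def read(self) -> int:
        """Latch the hardware counter, then add it to the software counter."""
        latched = self.hardware  # stands in for the hardware read pulse
        return (self.software + latched) & MASK64

    def seconds(self, f_p_hz: float) -> float:
        return self.read() / f_p_hz


clock = FuzzballClockModel(hw_bits=16)
clock.advance(70_000)  # 70 s of a 1 kHz counted frequency
print(clock.software, clock.hardware, clock.seconds(1000.0))  # 65536 4464 70.0
```

A real implementation must also cope with an overflow that occurs between the latch and the software-counter read, which this model ignores.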
The Fuzzball clock can be corrected in phase by adding a (signed) adjustment to the software clock counter. In practice, this is done only when the local time is substantially different from the time indicated by the clock, since such a step may violate the monotonicity requirement. Vernier phase adjustments determined in normal system operation must be limited to no more than the period of the counted frequency, which is 1 kHz (a 1-ms period) for LSI-11 Fuzzballs. In the Fuzzball model these adjustments are performed at intervals of 4 s, called the adjustment interval, which provides a maximum frequency adjustment range of 250 ppm (1 ms every 4 s). The adjustment opportunities are created using the interval-timer facility, which is a feature of most operating systems and independent of the time-of-day clock. However, if the counted frequency is increased from 1 kHz to 1 MHz for enhanced precision, the adjustment frequency must be increased to 250 Hz, which substantially increases processor overhead. A modified design suitable for high-precision clocks is presented in the next section.
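The 250-ppm and 250-Hz figures follow from one constraint: each adjustment may move the clock by at most one period of the counted frequency. A minimal sketch of that arithmetic (function names are illustrative):

```python
def max_frequency_range_ppm(counted_hz: float, adjustment_interval_s: float) -> float:
    """Largest sustained correction when each adjustment is at most one counted period."""
    period_s = 1.0 / counted_hz
    return period_s / adjustment_interval_s * 1e6


def required_adjustment_rate_hz(counted_hz: float, range_ppm: float) -> float:
    """Adjustments per second needed to keep range_ppm with one-period steps."""
    # range_ppm * 1e-6 seconds of correction per second, in steps of 1/counted_hz
    return range_ppm * 1e-6 * counted_hz


print(max_frequency_range_ppm(1_000.0, 4.0))            # about 250 ppm (LSI-11 case)
print(required_adjustment_rate_hz(1_000_000.0, 250.0))  # about 250 adjustments per second
```

The second function makes the overhead problem explicit: the adjustment rate grows linearly with the counted frequency for a fixed correction range.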
In some applications involving the Fuzzball model, an external pulse-per-second (pps) signal is available from a reference source such as a cesium clock or GPS receiver. Such a signal generally provides much higher accuracy than the serial character string produced by a radio timecode receiver, typically in the low nanoseconds. In the Fuzzball model this signal is processed by an interface which produces a hardware interrupt coincident with the arrival of the pps pulse. The processor then reads the clock counter and computes the residual modulo 1 s of the clock counter. This represents the local-clock error relative to the pps signal.
Assuming the seconds numbering of the clock counter has been determined by a reliable source, such as a timecode receiver, the offset within the second is determined by the residual computed above. In the NTP local-clock model the timecode receiver or NTP establishes the time to within ±128 ms, called the aperture, which guarantees the seconds numbering to within the second. Then, the pps residual can be used directly to correct the oscillator, since the offset must be less than the aperture for a correctly operating timecode receiver and pps signal.
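The residual computation and aperture check can be sketched as follows. This assumes the clock reading taken at the pps interrupt is available in seconds, folds the residual into a signed offset with the convention that positive means the local clock is ahead, and rejects anything outside ±128 ms; the helper name and sign convention are assumptions.

```python
APERTURE_S = 0.128  # ±128 ms aperture, from the text


def pps_offset_s(clock_at_pps_s: float) -> float:
    """Local-clock error at a pps edge (hypothetical helper; positive = clock ahead).

    Assumes the seconds numbering is already correct to within the aperture,
    so only the fraction of a second carries information.
    """
    residual = clock_at_pps_s % 1.0  # residual modulo 1 s, in [0, 1)
    if residual >= 0.5:
        residual -= 1.0              # e.g. .997 means the clock is 3 ms behind
    if abs(residual) > APERTURE_S:
        raise ValueError("offset exceeds the aperture; seconds numbering is suspect")
    return residual


print(pps_offset_s(1_000_000.002))  # about +0.002 s
print(pps_offset_s(999_999.997))    # about -0.003 s
```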
The above technique has an inherent error equal to the latency of the interrupt system, which in modern RISC processors is in the low tens of microseconds. It is possible to improve accuracy by latching the hardware time-of-day counter directly by the pps pulse and then reading the counter in the same way as usual. This requires additional circuitry to prioritize the pps signal relative to the pulse generated by the program to latch the counter.
The Unix Clock Model
The Unix 4.3BSD clock model is based on two system calls, settimeofday and adjtime, together with two kernel variables tick and tickadj. The settimeofday call unceremoniously resets the kernel clock to the value given, while the adjtime call slews the kernel clock to a new value numerically equal to the sum of the present time of day and the (signed) argument given in the adjtime call. In order to understand the behavior of the Unix clock as controlled by the Fuzzball clock model described above, it is helpful to explore the operations of adjtime in more detail.
The Unix clock model assumes an interrupt produced by an onboard frequency source, such as the clock counter and prescaler described previously, to deliver a pulse train in the 100-Hz range. In principle, the power grid frequency can be used, although it is much less stable than
a crystal oscillator. Each interrupt causes an increment called tick to be added to the clock counter. The value of the increment is chosen so that the clock counter, plus an initial offset established by the settimeofday call, is equal to the time of day in microseconds.
The Unix clock can actually run at three different rates: one corresponding to tick, which is related to the intrinsic frequency of the particular oscillator used as the clock source, one to tick + tickadj and the third to tick − tickadj. Normally the rate corresponding to tick is used; but, if adjtime is called, the argument $\delta$ given is used to calculate an interval $\Delta t = \delta \cdot \mathrm{tick}/\mathrm{tickadj}$ during which one or the other of the two rates is used, depending on the sign of $\delta$. The effect is to slew the clock to a new value at a small, constant rate, rather than incorporate the adjustment all at once, which could cause the clock to be set backward. With common values of tick = 10 ms and tickadj = 5 µs, the maximum frequency adjustment range is $\pm\mathrm{tickadj}/\mathrm{tick} = \pm(5 \times 10^{-6})/10^{-2}$, or ±500 ppm. Even larger ranges may be required in the case of some workstations (e.g., SPARCstations) with extremely poor component tolerances.
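The slewing behavior can be simulated tick by tick. This is a simplified sketch, not the 4.3BSD kernel code: it assumes each tick adds tick ± tickadj until the requested delta is used up, with a final partial step absorbing any remainder, and the function names are hypothetical. For δ = 1000 µs it takes 200 ticks, or 2 s, matching $\Delta t = \delta \cdot \mathrm{tick}/\mathrm{tickadj}$.

```python
def adjtime_increments(delta_us: int, tick_us: int = 10_000, tickadj_us: int = 5) -> list[int]:
    """Per-tick clock increments while slewing by delta_us (simplified sketch).

    Each tick adds tick + tickadj (delta > 0) or tick - tickadj (delta < 0)
    until delta is used up; a final partial step absorbs any remainder.
    """
    if not 0 < tickadj_us < tick_us:
        raise ValueError("need 0 < tickadj < tick so every increment stays positive")
    increments = []
    remaining = delta_us
    while remaining != 0:
        step = min(tickadj_us, abs(remaining))
        if remaining < 0:
            step = -step
        increments.append(tick_us + step)
        remaining -= step
    return increments


def slew_range_ppm(tick_us: int, tickadj_us: int) -> float:
    return tickadj_us * 1_000_000 / tick_us


inc = adjtime_increments(1_000)  # slew forward by 1 ms
print(len(inc), len(inc) * 10_000 / 1e6, slew_range_ppm(10_000, 5))  # 200 2.0 500.0
```

Because every increment stays positive when tickadj < tick, the clock never runs backward even for a negative delta.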
When precisions not less than about 1 ms are required, the Fuzzball clock model can be adapted to the Unix model by software simulation, as described in Section 5 of the NTP specification, and calling adjtime at each adjustment interval. When precisions substantially better than this are required, the hardware microsecond clock provided in some workstations can be used together with certain refinements of the Fuzzball and Unix clock models. The particular design described below is appropriate for a maximum oscillator frequency tolerance of 100 ppm (0.01%), which can be obtained using a relatively inexpensive quartz crystal oscillator, but is readily scalable for other assumed tolerances.
The clock model requires the capability to slew the clock frequency over the range ±100 ppm with an intrinsic oscillator frequency error as great as ±100 ppm. Figure 11 shows the timing relationships at the extremes of the requirements envelope. Starting from an assumed offset of nominal zero and an assumed error of +100 ppm at time 0 s, the line AC shows how the uncorrected offset grows with time. Let $\sigma$ represent the adjustment interval and $a$ the interval AB, in seconds, and let $r$ be the slew, or rate at which corrections are introduced, in ppm. For an accuracy specification of 100 µs, then
The line AE represents the extreme case where the clock is to be steered −100 ppm. Since the slew must be complete at the end of the adjustment interval,
These relationships are satisfied only if $r$ > 200 ppm and $\sigma$ < 2 s. Using $r$ = 300 ppm for convenience, $\sigma$ = 1.5 s and $a$ < 0.5 s. For the Unix clock model with tick = 10 ms, this results in the value of tickadj = 3 µs.
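The tickadj value is just the slew expressed per tick: 300 ppm of a 10-ms tick is 3 µs. A minimal sketch, taking the 200-ppm lower bound on $r$ directly from the text (the helper name is illustrative):

```python
MIN_SLEW_PPM = 200  # the relationships hold only if r > 200 ppm (from the text)


def tickadj_for_slew_us(slew_ppm: float, tick_us: float) -> float:
    """Per-tick adjustment (µs) that realizes a slew of slew_ppm."""
    if slew_ppm <= MIN_SLEW_PPM:
        raise ValueError("slew must exceed 200 ppm")
    return slew_ppm * tick_us / 1_000_000


print(tickadj_for_slew_us(300, 10_000))  # 3.0 (µs, for tick = 10 ms)
```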
One of the assumptions made in the Unix clock model is that the period of adjustment computed in the adjtime call must be completed before the next call is made. If not, this results in an error message to the system log. However, in order to correct for the intrinsic frequency
offset of the clock oscillator, the NTP clock model requires adjtime to be called at regular adjustment intervals of $\sigma$ s. Using the algorithms described here and the architecture constants in the NTP specification, these adjustments will always complete.
Mathematical Model of the NTP Logical Clock
The NTP logical clock can be represented by the feedback-control model shown in Figure 12. The model consists of an adaptive-parameter phase-lock loop (PLL), which continuously adjusts the phase and frequency of an oscillator to compensate for its intrinsic jitter, wander and drift. A mathematical analysis of this model developed along the lines of [SMI86] is presented in the following sections, along with a design example useful for implementation guidance in operating-system environments such as Unix and Fuzzball. Table 9 summarizes the quantities ordinarily treated as variables in the model. By convention, $v$ is used for internal loop variables, $\theta$ for phase, $\omega$ for frequency and $\tau$ for time. Table 10 summarizes those quantities ordinarily fixed as constants in the model. Note that these are all expressed as a power of two in order to simplify the implementation.
In Figure 12 the variable $\theta_r$ represents the phase of the reference signal and $\theta_o$ the phase of the voltage-controlled oscillator (VCO). The phase detector (PD) produces a voltage $v_d$ representing the phase difference $\theta_r - \theta_o$. The clock filter functions as a tapped delay line, with the output $v_s$ taken at the tap selected by the clock-filter algorithm described in the NTP specification. The loop filter, represented by the equations given below, produces a VCO correction voltage $v_c$, which controls the oscillator frequency and thus the phase $\theta_o$.
The PLL behavior is completely determined by its open-loop Laplace transfer function $G(s)$ in the $s$ domain. Since both frequency and phase corrections are required, an appropriate design consists of a type-II PLL, which is defined by the function

$$G(s) = \frac{\omega_c^2}{\tau^2 s^2}\left(1 + \frac{\tau s}{\omega_z}\right),$$

where $\omega_c$ is the crossover frequency (also called loop gain), $\omega_z$ is the corner frequency (required for loop stability) and $\tau$ determines the PLL time constant and thus the bandwidth. While this is a first-order function and some improvement in phase noise might be gained from a higher-order function, in practice the improvement is lost due to the effects of the clock-filter delay, as described below.
The open-loop transfer function $G(s)$ is constructed by breaking the loop at point a on Figure 12 and computing the ratio of the output phase $\theta_o(s)$ to the reference phase $\theta_r(s)$. This function is the product of the individual transfer functions for the phase detector, clock filter, loop filter and VCO. The phase detector delivers a voltage $v_d(t) = \theta_r(t)$, so its transfer function is simply $F_d(s) = 1$, expressed in V/rad. The VCO delivers a frequency change $\Delta\omega = d\theta_o(t)/dt = \alpha v_c(t)$, where $\alpha$ is the VCO gain in rad/(V·s) and $\theta_o(t) = \alpha \int v_c(t)\,dt$. Its transfer function is the Laplace transform of the integral, $F_o(s) = \alpha/s$, expressed in rad/V. The clock filter contributes a stochastic delay due to the clock-filter algorithm; but, for present purposes, this delay will be assumed a constant $T$, so its transfer function is the Laplace transform of the delay, $F_s(s) = e^{-Ts}$. Let $F(s)$ be the transfer function of the loop filter, which has yet to be determined. The open-loop transfer function $G(s)$ is the product of these four individual transfer functions:

$$G(s) = F_d(s)\,F_s(s)\,F(s)\,F_o(s) = \frac{\alpha F(s)\,e^{-Ts}}{s}.$$
For the moment, assume that the product $Ts$ is small, so that $e^{-Ts} \approx 1$. Making the following substitutions,

$$\omega_c^2 = \frac{\alpha}{K_f}, \qquad \omega_z = \frac{K_g}{K_f},$$

and rearranging yields

$$F(s) = \frac{1}{K_g \tau} + \frac{1}{K_f \tau^2 s},$$

which corresponds to a constant term plus an integrating term scaled by the PLL time constant $\tau$. This form is convenient for implementation as a sampled-data system, as described later.
With the parameter values given in Table 10, the Bode plot of the open-loop transfer function $G(s)$ consists of a −12 dB/octave line which intersects the 0-dB baseline at $\omega_c = 2^{-12}$ rad/s, together with a +6 dB/octave line at the corner frequency $\omega_z = 2^{-14}$ rad/s. The damping factor $\zeta = \omega_c/(2\omega_z) = 2$ suggests the PLL will be stable and have a large phase margin together with a low overshoot. However, if the clock-filter delay $T$ is not small compared to the loop delay, which is approximately equal to $1/\omega_c$, the above analysis becomes unreliable and the loop can become unstable. With the values determined as above, $T$ is ordinarily small enough to be neglected.
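Assuming the output is taken at $v_s$, the closed-loop transfer function $H(s)$ is

$$H(s) = \frac{v_s(s)}{\theta_r(s)} = \frac{e^{-Ts}}{1 + G(s)}.$$

If only the relative response is needed and the clock-filter delay can be neglected, $H(s)$ can be written

$$H(s) = \frac{1}{1 + G(s)} = \frac{s^2}{s^2 + \dfrac{\omega_c^2}{\omega_z \tau}\,s + \dfrac{\omega_c^2}{\tau^2}}.$$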
For some input function $I(s)$ the output function $I(s)H(s)$ can be inverted to find the time response. Using a unit-step input $I(s) = 1/s$ and the values determined as above yields a PLL rise time of about 52 minutes, a maximum overshoot of about 4.8 percent in about 1.7 hours and a settling time to within one percent of the initial offset in about 8.7 hours.
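The step-response figures can be reproduced numerically. The sketch below assumes the closed-loop form $H(s) = s^2/(s^2 + (\omega_c^2/(\omega_z\tau))s + \omega_c^2/\tau^2)$, obtained from $H(s) = 1/(1 + G(s))$ with the type-II function $G(s) = (\omega_c^2/(\tau^2 s^2))(1 + \tau s/\omega_z)$. It takes $\omega_c = 2^{-12}$ and $\omega_z = 2^{-14}$ rad/s from the text, sets $\tau = 1$ (an assumption), neglects the clock-filter delay, and inverts $(1/s)H(s)$ in closed form for the real-pole case; the settling time is found by bisection.

```python
import math

OMEGA_C = 2.0 ** -12  # crossover frequency, rad/s (from the text)
OMEGA_Z = 2.0 ** -14  # corner frequency, rad/s (from the text)


def closed_loop_poles(omega_c=OMEGA_C, omega_z=OMEGA_Z, tau=1.0):
    """Real poles p1 > p2 of H(s) = s^2 / (s^2 + b s + c), delay neglected."""
    b = omega_c ** 2 / (omega_z * tau)
    c = omega_c ** 2 / tau ** 2
    disc = b * b - 4.0 * c
    if disc <= 0.0:
        raise ValueError("this sketch handles only the real-pole case")
    root = math.sqrt(disc)
    return (b + root) / 2.0, (b - root) / 2.0


def step_error(t, p1, p2):
    """Inverse Laplace transform of (1/s) H(s) = s / ((s + p1)(s + p2))."""
    return (p1 * math.exp(-p1 * t) - p2 * math.exp(-p2 * t)) / (p1 - p2)


def step_characteristics(p1, p2, tol=0.01):
    k = math.log(p1 / p2) / (p1 - p2)
    rise = k          # first zero crossing of the error
    t_peak = 2.0 * k  # derivative vanishes here: largest overshoot
    overshoot = -step_error(t_peak, p1, p2)
    # Past t_peak the error decays monotonically to zero: bracket, then bisect
    lo, hi = t_peak, 2.0 * t_peak
    while abs(step_error(hi, p1, p2)) > tol:
        lo, hi = hi, 2.0 * hi
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if abs(step_error(mid, p1, p2)) > tol:
            lo = mid
        else:
            hi = mid
    return rise, overshoot, t_peak, hi


p1, p2 = closed_loop_poles()
rise, overshoot, t_peak, settle = step_characteristics(p1, p2)
print(f"rise {rise / 60:.0f} min, overshoot {overshoot * 100:.1f}% "
      f"at {t_peak / 3600:.1f} h, settle {settle / 3600:.1f} h")
# rise 52 min, overshoot 4.8% at 1.7 h, settle 8.7 h
```

Under these assumptions the computed values agree with the rise time, overshoot and settling time quoted above.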
Parameter Management
A very important feature of the NTP PLL design is the ability to adapt its behavior to match the prevailing stability of the local oscillator and transmission conditions in the network. This is done using the $\alpha$ and $\tau$ parameters shown in Table 10.
Mechanisms for doing this are described in the following sections.
Adjusting VCO Gain
The alpha parameter is determined by the maximum frequency tolerance of the local oscillator and the maximum jitter requirements of the timekeeping system. This parameter is usually an architecture constant and fixed during system operation. In the implementation model described
below, the reciprocal of $\alpha$, called the adjustment interval $\sigma$, determines the time between corrections of the local clock, and thus the value of $\alpha$. The value of $\sigma$ can be determined by the following procedure.
The maximum frequency tolerance for board-mounted, uncompensated quartz-crystal oscillators is probably in the range of $10^{-4}$ (100 ppm). Many if not most Internet timekeeping systems can tolerate jitter to at least the order of the intrinsic local-clock resolution, called precision in the NTP specification, which is commonly in the range from 1 to 20 ms. Assuming $10^{-3}$ s peak-to-peak as the most demanding case, the interval between clock corrections must be no more than $\sigma = 10^{-3}/(2 \times 10^{-4}) = 5$ s. For the NTP reference model $\sigma$ = 4 s in order to allow for known features of the Unix operating-system kernel. However, in order to support future anticipated improvements in accuracy possible with faster workstations, it may be useful to decrease $\sigma$ to as little as one-tenth the present value.
Note that if $\sigma$ is changed, it is necessary to adjust the parameters $K_f$ and $K_g$ in order to retain the same loop bandwidth; in particular, the same $\omega_c$ and $\omega_z$. Since $\alpha$ varies as the reciprocal of $\sigma$, if $\sigma$ is changed to something other than $2^2$, as in Table 10, it is necessary to divide both $K_f$ and $K_g$ by $\sigma/4$ to obtain the new values.
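The choice of $\sigma$ and the rescaling rule can be checked together. This sketch assumes $\alpha = 1/\sigma$ together with the relations $\omega_c^2 = \alpha/K_f$ and $\omega_z = K_g/K_f$ (one consistent way to map the loop-filter constants onto $\omega_c$ and $\omega_z$); under those assumptions $\sigma = 4$ s gives $K_f = 2^{22}$ and $K_g = 2^8$. Those numbers follow from the assumption and are not quoted from Table 10. The helper names are hypothetical.

```python
def max_adjustment_interval_s(jitter_pp_s: float = 1e-3, tolerance: float = 1e-4) -> float:
    """sigma = jitter / (2 * tolerance), as in the 5 s example in the text."""
    return jitter_pp_s / (2.0 * tolerance)


def loop_constants(sigma_s: float, omega_c: float = 2.0 ** -12, omega_z: float = 2.0 ** -14):
    """K_f and K_g giving the requested omega_c and omega_z (hypothetical helper).

    Assumes alpha = 1/sigma, omega_c**2 = alpha / K_f and omega_z = K_g / K_f.
    """
    alpha = 1.0 / sigma_s
    k_f = alpha / omega_c ** 2
    k_g = omega_z * k_f
    return k_f, k_g


k_f4, k_g4 = loop_constants(4.0)
k_f, k_g = loop_constants(0.4)  # one-tenth of the present sigma
# Dividing the sigma = 4 s constants by sigma/4 = 0.1 gives the new ones
print(max_adjustment_interval_s(), k_f4 / (0.4 / 4), k_f, k_g4 / (0.4 / 4), k_g)
```

Holding $\omega_c$ and $\omega_z$ fixed while $\alpha$ grows forces $K_f$ and $K_g$ to grow by the same factor, which is exactly division by $\sigma/4$.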
Adjusting PLL Bandwidth
A key feature of the type-II PLL design is its capability to compensate for the intrinsic frequency errors of the local oscillator. This requires an initial period of adaptation in order to refine the frequency estimate (see later sections of this appendix). The $\tau$ parameter determines the PLL time constant and thus the loop bandwidth, which is approximately equal to $\omega_c/\tau$. When operated with a relatively large bandwidth (small $\tau$), as in the analysis above, the PLL adapts quickly to changes in the input reference signal, but has poor long-term stability. Thus, it is possible to accumulate substantial errors if the system is deprived of the reference signal for an extended period. When operated with a relatively small bandwidth (large $\tau$), the PLL adapts slowly to changes in the input reference signal, and may even fail to lock onto it. Assuming the frequency estimate has stabilized, it is possible for the PLL to coast for an extended period without external corrections and without accumulating significant error.
In order to achieve the best performance without requiring individual tailoring of the loop bandwidth, it is necessary to compute each value of $\tau$ based on the measured values of offset, delay and dispersion, as produced by the NTP protocol itself. The traditional way of doing this in precision timekeeping systems based on cesium clocks is to relate $\tau$ to the Allan variance, which is defined as one half the mean of the squares of the first-order differences of sequential samples measured during a specified interval $\tau$,

$$\sigma_y^2(\tau) = \frac{1}{2(N-1)} \sum_{i=1}^{N-1} \left[ y(i+1) - y(i) \right]^2,$$

where $y$ is the fractional frequency measured with respect to the local time scale and $N$ is the number of samples.
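The textbook Allan-variance estimator is short enough to show directly. This is the standard two-sample definition, not the exponentially averaged compliance that NTP actually uses; the helper that derives fractional frequency from phase-offset samples taken $\tau$ apart is an illustrative addition.

```python
def fractional_frequency(offsets_s: list[float], tau_s: float) -> list[float]:
    """Average fractional frequency over each interval tau, from phase (offset) samples."""
    return [(b - a) / tau_s for a, b in zip(offsets_s, offsets_s[1:])]


def allan_variance(y: list[float]) -> float:
    """Two-sample (Allan) variance: half the mean squared first difference."""
    if len(y) < 2:
        raise ValueError("need at least two frequency samples")
    diffs = [(b - a) ** 2 for a, b in zip(y, y[1:])]
    return sum(diffs) / (2.0 * len(diffs))


y = fractional_frequency([0.0, 1e-6, 3e-6, 4e-6], tau_s=1.0)
print(y, allan_variance(y))  # y is about [1e-6, 2e-6, 1e-6]; variance about 5e-13
```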
In the NTP local-clock model the Allan variance (called the compliance, $h$ in Table 11) is approximated on a continuous basis by exponentially averaging the first-order differences of the offset samples using an empirically determined averaging constant. Using somewhat ad hoc mapping functions determined from simulation and experience, the compliance is manipulated to produce the loop time constant and update interval.
The NTP Clock Model
The PLL behavior can also be described by a set of recurrence equations, which depend upon several variables and constants. The variables and parameters used in these equations are shown in Tables 9, 10 and 11. Note the use of powers of two, which facilitates implementation
using arithmetic shifts and avoids the requirement for a multiply/divide capability.
A capsule overview of the design may be helpful in understanding how it operates. The logical clock is continuously adjusted in small increments at fixed intervals of $\sigma$. The increments are determined while updating the variables shown in Tables 9 and 11, which are computed from received NTP messages as described in the NTP specification. Updates computed from these messages occur at discrete times as each is received. The intervals $\mu$ between updates are variable and can range up to about 17 minutes. As part of update processing the compliance $h$ is computed and used to adjust the PLL time constant $\tau$. Finally, the update interval $\rho$ for transmitted NTP messages is determined as a fixed multiple of $\tau$.
Updates are numbered from zero, with those in the neighborhood of the $i$th update shown in Figure 13. All variables are initialized at $i = 0$ to zero, except the time constant $\tau(0) = \tau$, poll interval $\mu(0) = \tau$ (from Table 10) and compliance $h(0) = K_s$. After an interval $\mu(i)$ ($i > 0$) from the previous update the $i$th update arrives at time $t(i)$ including the time offset $v_s(i)$. Then, after an interval $\mu(i+1)$ the $(i+1)$th update arrives at time $t(i+1)$ including the time offset $v_s(i+1)$. When the update $v_s(i)$ is received, the frequency error $f(i+1)$ and phase error $g(i+1)$ are computed:
Note that these computations depend on the value of the time constant $\tau(i)$ and poll interval $\mu(i)$ previously computed from the $(i-1)$th update. Then, the time constant for the next interval is computed from the current value of the compliance $h(i)$:
Next, using the new value of $\tau$, called $\tau'$ to avoid confusion, the poll interval is computed:
Finally, the compliance $h(i+1)$ is recomputed for use in the $(i+1)$th update:
The factor $\tau'$ in the above has the effect of adjusting the bandwidth of the PLL as a function of compliance. When the compliance has been low over some relatively long period, $\tau'$ is increased and the bandwidth is decreased. In this mode small timing fluctuations due to jitter in the network are suppressed and the PLL attains the most accurate frequency estimate. On the other hand, if the compliance becomes high due to greatly increased jitter or a systematic frequency offset, $\tau'$ is decreased and the bandwidth is increased. In this mode the PLL is most adaptive to transients which can occur due to reboot of the system or a major timing error. In order to maintain optimum stability, the poll interval $\rho$ is varied directly with $\tau$.
A model suitable for simulation and parameter refinement can be constructed from the above recurrence relations. It is convenient to set the temporary variable $a = g(i+1)$. At each adjustment interval $\sigma$ the quantity $a/K_g + f(i+1)/K_f$ is added to the local-clock phase and the quantity $a/K_g$ is subtracted from $a$. For convenience, let $n$ be the greatest integer in $\mu(i)/\sigma$; that is, the number of adjustments that occur in the $i$th interval. Thus, at the end of the $i$th interval just before the $(i+1)$th update, the VCO control voltage is:
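The per-adjustment procedure lends itself to direct simulation. This is a minimal sketch of the steps described above for one update interval; the function name and the constants in the usage line ($K_g = 2^8$, $K_f = 2^{22}$, $\mu = 64$ s, $\sigma = 4$ s) are hypothetical illustrative values, not quoted from Tables 9 through 11. Summing the geometric series gives a convenient check: after $n$ steps, $a = g(1 - 1/K_g)^n$ and the accumulated phase is $g[1 - (1 - 1/K_g)^n] + n f/K_f$.

```python
def run_adjustments(g: float, f: float, k_g: float, k_f: float, mu_s: float, sigma_s: float):
    """Apply the per-sigma adjustments for one update interval (simulation sketch).

    Starting from a = g(i+1), each adjustment adds a/K_g + f(i+1)/K_f to the
    local-clock phase and then removes a/K_g from a. Returns (phase, a, n).
    """
    n = int(mu_s // sigma_s)  # greatest integer in mu(i)/sigma
    a = g
    phase = 0.0
    for _ in range(n):
        phase += a / k_g + f / k_f
        a -= a / k_g
    return phase, a, n


# Hypothetical values: 0.1 s phase error, 1e-3 frequency term, 64 s interval
phase, a, n = run_adjustments(0.1, 1e-3, 2 ** 8, 2 ** 22, 64.0, 4.0)
print(n, phase, a)
```

With power-of-two constants, an integer implementation can replace each division by $K_g$ or $K_f$ with an arithmetic right shift.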
Detailed simulation of the NTP PLL with the values specified in Tables 9, 10 and 11 and the clock filter described in the NTP specification results in the following characteristics: For a 100-ms phase change the loop reaches zero error in 39 minutes, overshoots 7 ms at 54 minutes and settles to less than 1 ms in about six hours. For a 50-ppm frequency change the loop reaches 1 ppm in about 16 hours and 0.1 ppm in about 26 hours. When the magnitude of correction exceeds a few milliseconds or a few ppm for more than a few updates, the compliance begins to increase, which causes the loop time constant and update interval to decrease. When the magnitude of correction falls below about 0.1 ppm for a few hours, the compliance begins to decrease, which causes the loop time constant and update interval to increase. The effect is to provide a broad capture range exceeding 4 s per day, yet the capability to resolve oscillator skew well below 1 ms per day. These characteristics are appropriate for typical crystal-controlled oscillators with or without temperature compensation or oven control.
