2020-08-07 10:18:33 -07:00

First, a note on typical 1990s and 2000s computer crystals. They are truly pathetic. A "real" crystal oscillator (TCXO) usually has an initial set error of less than 5 ppm, and variation over time, voltage, and temperature measured in tenths of a ppm (and an OCXO can reach ±0.3 ppm stability over ten years and an 85°C temperature swing). The devices used in conventional PC motherboards and single board computers, however, often have initial set errors up to 150 ppm, and will vary 5 ppm over the course of a day-night cycle in a pseudo-air-conditioned space.

(Operating systems can sometimes exacerbate the problem. Some i686 Red Hat 7.3 systems run the clock at 512 Hz, or 1953 microseconds per tick, which gives a built-in 64 ppm error. Even the normally exemplary DEC Alpha has, when run with Linux, a truly awful calibration scheme; Linux runs it with a nominal 1024 ticks per second, which gives a tick value of 977 microseconds, a theoretical additional error of -448 ppm, and an actual observed frequency error of -443.7 ppm.)
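
The rounding arithmetic behind those two figures can be checked with a minimal sketch like the one below. The function name tick_error_ppm is made up for this illustration, and the sketch assumes the tick is simply the exact period rounded to the nearest whole microsecond. It reports only the magnitude of the error, since the sign depends on which clock you treat as the reference.

```shell
#!/bin/sh
# Rounding error caused by storing the tick as a whole number of microseconds.
# Prints: HZ, integer tick (microseconds), magnitude of the error (ppm).
# Assumption: the tick is the exact period rounded to the nearest microsecond.
tick_error_ppm() {
    awk -v hz="$1" 'BEGIN {
        ideal = 1000000 / hz              # exact tick length in microseconds
        tick  = int(ideal + 0.5)          # nearest whole microsecond
        err   = (tick - ideal) / ideal * 1000000
        if (err < 0) err = -err           # report magnitude only
        printf "%d %d %.1f\n", hz, tick, err
    }'
}

tick_error_ppm 512     # prints: 512 1953 64.0
tick_error_ppm 1024    # prints: 1024 977 448.0
```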

Still, the pattern is clear: the first and largest error of a crystal is its initial set error. It is strongly recommended to calibrate each computer, and store its frequency error in a non-volatile medium, before attempting anything else with time setting and locking. While one could do it in a few seconds using an accurate frequency counter, another way is shown below using a software-only method with ntpclient and a high quality NTP server.

To perform the activities described, you need a way to control and monitor your system's clock -- both its frequency and value. On Linux, the kernel API is described in adjtimex(2). There are at least two tools that provide shell-level access to this interface, both called adjtimex(1).

One is written by Steven Dick and Jim Van Zandt, see the adjtimex* files at http://www.ibiblio.org/pub/Linux/system/admin/time/. It uses long options, and includes some interesting functionality beyond the basic exposure of adjtimex(2).

Larry Doolittle wrote the other; it uses short options, and has no bloat^H^H^H^H^Hextra features. This version is included with ntpclient as a standalone version; it is also incorporated into BusyBox, although you may have to select it at compile time, like any other component.

Fortunately (and not coincidentally) the core functions of the two adjtimex programs can be used interchangeably, as long as you only use the short option variant of the Dick/Van Zandt adjtimex. The options discussed here are:

-f    frequency (integer kernel units)
-o    time offset in microseconds
-t    kernel tick (microseconds per jiffy)

First, set the time approximately right, as root:

ntpclient -s -h $NTPHOST

You should see a single line printed like

36765 4980.373    1341.0     39.7  956761.4    839.2  0

Get used to this line. Its columns are:

  1. day since 1900
  2. seconds since midnight
  3. elapsed time for NTP transaction (microseconds)
  4. internal server delay (microseconds)
  5. clock difference between your computer and the NTP server (microseconds)
  6. dispersion reported by server (microseconds)
  7. your computer's adjtimex frequency (ppm * 65536)

So in the example above, your computer's clock was a bit more than 0.95 seconds fast, compared to the clock on $NTPHOST.
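
If you want to read such a line without counting columns, a small helper along these lines may be useful. It is only a sketch: the name ntp_summary and the output labels are invented here, and it relies on nothing but the column meanings listed above plus the fact that 1970-01-01 is day 25567 counted from 1900-01-01.

```shell
#!/bin/sh
# Summarise ntpclient log lines read from stdin (one output line per input line).
# Column meanings are those listed in the text; the output labels are made up.
ntp_summary() {
    awk '{
        unix = ($1 - 25567) * 86400 + $2   # 25567 days from 1900-01-01 to 1970-01-01
        printf "unix_time=%.3f offset_s=%.6f freq_ppm=%.4f\n", unix, $5 / 1000000, $7 / 65536
    }'
}

echo "36765 4980.373    1341.0     39.7  956761.4    839.2  0" | ntp_summary
```

The third field of the output converts the last column from kernel units back to ppm by dividing by 65536.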

Now check that the clock setting worked.

ntpclient -c 1 -h $NTPHOST
36765 4993.512    1345.0     40.9    3615.3    839.2  0

So now the time difference is only a few milliseconds.

Now measure the frequency calibration of your system. If you're in a hurry, it's OK to only spend 20 minutes on this step.

ntpclient -i 60 -c 20 -h $NTPHOST >$(hostname).ntp.log &

Otherwise, you will learn much more about your system and its communication with the NTP server by letting the log run for 24 hours.

ntpclient -i 300 -c 288 -h $NTPHOST >$(hostname).ntp.log &

Things to watch for in the above log: If the last column (kernel frequency fine tune) ever changes, you haven't turned off other time adjustment programs. AFAIK the only programs around that would move this number are ntpclient and xntpd. On most out-of-the-box systems, that last column should start at zero and stay at zero.

Use gnuplot to plot the resulting file as follows:

plot "HOSTNAME.ntp.log" using (($1-36765)*86400+$2):5:($3+$6-$4) with yerrorbars

This shows time error (microseconds) as a function of elapsed time (seconds). The error bars show the uncertainty in the measurement. Ideally, it would be a smooth, straight line, where the slope represents the frequency error of your crystal.
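
To inspect the same three quantities without gnuplot, the plot expression can be mirrored in awk. This is a sketch only: plot_columns is a hypothetical helper name, and its argument is the reference day that the gnuplot command hard-codes as 36765.

```shell
#!/bin/sh
# Print the three values the gnuplot command plots, one line per log entry:
#   x = seconds since the start of the reference day
#   y = clock difference (microseconds)
#   e = error bar: transaction time + dispersion - server delay (microseconds)
# Usage: plot_columns REFERENCE_DAY < logfile
plot_columns() {
    awk -v day0="$1" '{
        printf "%.3f %.1f %.1f\n", ($1 - day0) * 86400 + $2, $5, $3 + $6 - $4
    }'
}

# Example with the two sample lines shown earlier in this document.
printf '%s\n' \
    "36765 4980.373    1341.0     39.7  956761.4    839.2  0" \
    "36765 4993.512    1345.0     40.9    3615.3    839.2  0" | plot_columns 36765
```

A point whose third value is much larger than its neighbours is one whose measurement deserves less trust.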

If an occasional point is both off-center and has a large error bar, it shows a transaction got delayed somewhere in the process, either inside the server, or one of the two UDP packet propagation steps. This is normal, and ntpclient can deal with those quite well. If points are not evenly spaced on the horizontal axis, packets were actually lost; this is less common, but still OK.

If the error bar becomes suddenly large, and takes a few minutes to slowly recover, your NTP host (presumably xntpd) had problems communicating with its server, and reported that problem to you by increasing its "dispersion" (this is a hack, required by xntpd's core incorrect assumption that errors in network delays have Gaussian statistics; ntpclient does not have this flaw).

If there are sudden large, persistent steps in error, some other program is making step changes to time. Check for, e.g., ntpdate run as a cron job. If your client machine is OK, check for problems on the host machine.

Assuming the graph above is clean, and has non-garbled data for the first and last points, you can run it through the enclosed awk script (rate.awk) to determine the appropriate frequency value.

$ awk -f rate.awk <test.dat
delta-t 119400 seconds
delta-o -142308 useconds
slope -1.19186 ppm
old frequency -1240000 ( -18.9209 ppm)
new frequency -1318109 ( -20.1127 ppm)
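
The arithmetic behind this output can be sketched as follows. This is not the enclosed rate.awk, just a minimal reconstruction that assumes the slope is taken from the first and last log lines only, and that the new frequency is the old one plus the slope times 65536, truncated toward zero (which reproduces the numbers above). The name rate_sketch, the output labels, and the two sample lines are invented for the illustration.

```shell
#!/bin/sh
# Estimate a new adjtimex frequency from an ntpclient log on stdin.
# Assumption: only the first and last lines are used, as a two-point slope.
rate_sketch() {
    awk '
    NR == 1 { d0 = $1; s0 = $2; o0 = $5 }            # first point
            { d1 = $1; s1 = $2; o1 = $5; f = $7 }    # ends up holding the last point
    END {
        dt = (d1 - d0) * 86400 + (s1 - s0)           # elapsed seconds
        if (NR < 2 || dt <= 0) { print "need at least two points"; exit 1 }
        doff  = o1 - o0                              # change in offset, microseconds
        slope = doff / dt                            # microseconds per second = ppm
        newf  = int(f + slope * 65536)               # kernel units, truncated
        printf "delta_t_s=%d delta_o_us=%d slope_ppm=%.5f new_freq=%d\n", dt, doff, slope, newf
    }'
}

# Two made-up log lines chosen to match the deltas in the output above.
printf '%s\n' \
    "36765     0.0 1000.0 40.0       0.0 800.0 -1240000" \
    "36766 33000.0 1000.0 40.0 -142308.0 800.0 -1240000" | rate_sketch
```

A two-point slope is sensitive to a bad first or last sample, which is why the graph should be checked before trusting the result.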

For now, you should plug in the new frequency value

adjtimex -f -1318109

Then reset the clock

ntpclient -s -h $NTPHOST

and ponder how it makes sense in your (possibly embedded) environment to have the number -1318109 applied via adjtimex every time your machine boots. Or, simpler still, combine these two steps using a post-2005 version:

ntpclient -f -1318109 -s -h $NTPHOST

If the absolute value of the frequency offset is greater than about 230 ppm (15073280), you have a problem. You may be able to fix it with the -t option to adjtimex, or you need to hack phaselock.c, which has a maximum adjustment extent of +/- 250 ppm built in (change the #define MAX_CORRECT and rebuild ntpclient). I'd like to suggest that you replace the defective crystal instead, but I understand that is rarely practical.
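
As a rough illustration of the -t route, the sketch below splits a large correction into a coarse tick change plus a small residual for -f. Everything in it is an assumption made for the example: the helper name split_correction, the default nominal tick of 10000 microseconds (a 100 Hz system, where one tick unit is worth 100 ppm), and the convention that the input is the total correction you want, in ppm, with the same sign you would give to -f. It only prints a command; it does not run adjtimex.

```shell
#!/bin/sh
# Split a total correction (ppm) into a tick value and a residual frequency.
# Usage: split_correction PPM [NOMINAL_TICK_US]
# Assumption: nominal tick defaults to 10000 us, so one tick step is 100 ppm.
split_correction() {
    awk -v ppm="$1" -v nominal="${2:-10000}" 'BEGIN {
        step  = 1000000 / nominal                    # ppm per unit of tick
        n     = ppm / step
        steps = (n < 0) ? -int(-n + 0.5) : int(n + 0.5)   # round to nearest
        resid = ppm - steps * step                   # what is left for -f, in ppm
        printf "adjtimex -t %d -f %d\n", nominal + steps, resid * 65536
    }'
}

split_correction 312.5     # prints: adjtimex -t 10003 -f 819200
```

Check the nominal tick on your own system (adjtimex with no options prints the current values) before trusting the default used here.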

On to ntpclient -l. This is actually easy, if you performed and understood the previous steps. Run

ntpclient -l -h $NTPHOST

in the background. It will make small (probably less than 3 ppm) adjustments to the system frequency to keep the clocks locked. Typical performance over Ethernet (even through a few routers) is a worst case error of +/- 10 ms.

I won't try to tell you where to put the boot time commands. They should boil down to:

ntpclient -s -i 15 -g 10000 -h $NTPHOST
adjtimex -f $NONVOLATILE_MEMORY_VALUE
ntpclient -l -h $NTPHOST >some_log_file

The first line makes explicit the retries that may be required for this UDP-based time protocol. If the first time request takes longer than 10000 microseconds to resolve, or the packets get lost, it instructs ntpclient to try again 15 seconds later (the minimum retry period mandated by RFC 4330); ntpclient won't exit until it gets a suitable response.

As of 2006, ntpclient can in theory combine the three lines above into one:

ntpclient -f $NONVOLATILE_MEMORY_VALUE -s -l -i 600 -g 10000 -h $NTPHOST >some_log_file

This can streamline the startup process, since you may be able to avoid a layer of shell scripting. On the other hand, it is less tested, and there is no (current) means to independently set the packet interval for the set and lock phases.
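
One way to feed the stored calibration into that command is sketched below, as a dry run that only prints the command line it would use. The file path /etc/ntpclient.freq, the placeholder host ntp.example.org, and the function name boot_command are all hypothetical. The sketch also makes one design choice of its own: if the stored value is missing or is not a plain integer, it leaves out -f rather than guessing.

```shell
#!/bin/sh
# Dry run: print the boot-time ntpclient command, using a stored frequency.
# FREQ_FILE and the default NTPHOST are placeholders for this illustration.
FREQ_FILE=${FREQ_FILE:-/etc/ntpclient.freq}
NTPHOST=${NTPHOST:-ntp.example.org}

boot_command() {
    freq=$(cat "$FREQ_FILE" 2>/dev/null)
    case "$freq" in
        ''|-|*[!0-9-]*|?*-*) fopt="" ;;          # missing or malformed: skip -f
        *)                   fopt="-f $freq " ;; # plain integer, optional leading minus
    esac
    echo "ntpclient ${fopt}-s -l -i 600 -g 10000 -h $NTPHOST"
}

boot_command
```

Once the printed line looks right, the echo can be replaced by the real invocation, with output redirected to a log file as in the command above.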

It's an interesting question how sensitive the boot process should be to the time set process. If you have a battery backed hardware clock, there's not much problem running for a while without a network-accurate system clock. In that case you could put both ntpclient commands into a background script, and the only possible issue is the sudden (but probably small) warp of the clock at the indefinite time in the boot sequence when ntpclient gets its acceptable answer. On the other hand, some embedded computers have no clue what time it is until the network responds. Any files created will be marked Jan 1 1970, and other application-dependent issues may arise if there is a nonsense time on the system during later parts of the boot sequence. Then you may well want to enforce completion of the first ntpclient before starting your application. If this is too drastic for you, and you want a fallback mode when the time server is dead, add a -c 5 switch to the end of that ntpclient command, giving at most 5 retries, if something goes wrong with the time set. For that approach to be useful, consider patching the source to lower the minimum packet send interval from the RFC 4330-mandated 15 seconds.