HowTo
=====


First, a note on typical 1990s and 2000s computer crystals: they are
truly pathetic. A "real" crystal oscillator (TCXO) usually has an
initial set error of less than 5 ppm, and variation over time, voltage,
and temperature measured in tenths of a ppm (an OCXO can even reach
±0.3 ppm stability over ten years and an 85°C temperature swing). The
crystals used on conventional PC motherboards and single-board
computers, however, often have initial set errors of up to 150 ppm, and
will vary by 5 ppm over the course of a day-night cycle in a
pseudo-air-conditioned space.

(Operating systems can sometimes exacerbate the problem. On some i686
Red Hat 7.3 systems, running the clock at 512 Hz, or 1953 microseconds
per tick, gives a built-in 64 ppm error. Even the normally exemplary DEC
Alpha has a truly awful calibration scheme when run with Linux: Linux
runs it with a nominal 1024 ticks per second, which gives a tick value
of 977 microseconds and a theoretical additional error of -448 ppm; the
frequency actually observed was -443.7 ppm.)

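The tick arithmetic in the parenthetical above can be checked with a few
lines of Python. This is a sketch assuming the kernel advances the clock
by exactly `tick_us` microseconds on each of `hz` timer interrupts per
second; the function names are made up for illustration. Note the sign
convention: the sketch reports how fast the clock runs, so the Alpha
comes out as +448 ppm, while the text quotes -448 ppm, presumably as the
frequency correction needed to cancel it.

```python
# Sketch: the rate error built into an integer kernel tick.
# Assumption: the kernel adds exactly tick_us microseconds per tick,
# and there are hz ticks per second.

def tick_error_ppm(hz: int, tick_us: int) -> int:
    """Clock rate error in ppm implied by hz and tick_us.

    Positive means the clock gains time (runs fast). Because one second
    is 1,000,000 microseconds, the surplus or deficit in microseconds
    per second is already a value in ppm.
    """
    return hz * tick_us - 1_000_000


def nominal_tick_us(hz: int) -> float:
    """The ideal (generally non-integer) tick length for a given hz."""
    return 1_000_000 / hz


if __name__ == "__main__":
    for hz, tick in ((512, 1953), (1024, 977)):
        print(f"HZ={hz}: ideal tick {nominal_tick_us(hz)} us, "
              f"tick {tick} us -> {tick_error_ppm(hz, tick):+d} ppm")
```

The error appears because the ideal tick (1953.125 or 976.5625
microseconds) cannot be represented as a whole number of microseconds.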

Still, the pattern is clear: the first and largest error of a crystal is
its initial set error. It is strongly recommended to calibrate each
computer, and store its frequency error in a non-volatile medium, before
attempting anything else with time setting and locking. One could do
this in a few seconds with an accurate frequency counter; the method
shown below is software-only instead, using ntpclient and a
high-quality NTP server.


To perform the activities described, you need a way to control and
monitor your system's clock, both its frequency and its value. On Linux,
the kernel API is described in `adjtimex(2)`. There are at least two
tools that provide shell-level access to this interface, both called
`adjtimex(1)`.

One is written by Steven Dick and Jim Van Zandt; see the `adjtimex*`
files at http://www.ibiblio.org/pub/Linux/system/admin/time/. It uses
long options, and includes some interesting functionality beyond the
basic exposure of `adjtimex(2)`.

Larry Doolittle wrote the other; it uses short options, and has no
bloat^H^H^H^H^Hextra features. This version is included with ntpclient
as a standalone program; it is also incorporated into [BusyBox][],
although you may have to select it at compile time, like any other
component.

Fortunately (and not coincidentally), the core functions of the two
adjtimex programs can be used interchangeably, as long as you only use
the short-option variant of the Dick/Van Zandt adjtimex. The options
discussed here are:


* `-f` frequency (integer kernel units, ppm × 65536)
* `-o` time offset in microseconds
* `-t` kernel tick (microseconds per jiffy)

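Since `-f` takes integer kernel units rather than ppm, a small
conversion helper is handy. This sketch assumes the scale of 65536 units
per ppm that ntpclient's output columns use; the helper names are
hypothetical.

```python
# Sketch: converting between ppm and the integer kernel frequency units
# taken by `adjtimex -f`, assuming 65536 units per ppm.

SCALE = 65536  # kernel frequency units per ppm


def ppm_to_kernel(ppm: float) -> int:
    """Convert a frequency offset in ppm to integer kernel units,
    rounding to the nearest unit."""
    return round(ppm * SCALE)


def kernel_to_ppm(units: int) -> float:
    """Convert integer kernel units back to ppm."""
    return units / SCALE


if __name__ == "__main__":
    print(kernel_to_ppm(-1318109))  # about -20.1127 ppm
    print(ppm_to_kernel(230))       # the 230 ppm limit in kernel units
```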

First, set the time approximately right, as root:

```
ntpclient -s -h $NTPHOST
```

You should see a single line printed, like

```
36765 4980.373 1341.0 39.7 956761.4 839.2 0
```


Get used to this line; its columns are:

1. days since 1900
2. seconds since midnight
3. elapsed time for the NTP transaction (microseconds)
4. internal server delay (microseconds)
5. clock difference between your computer and the NTP server (microseconds)
6. dispersion reported by the server (microseconds)
7. your computer's adjtimex frequency (ppm × 65536)

So in the example above, your computer's clock was a bit more than
0.95 seconds fast compared to the clock on $NTPHOST.

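If you want to post-process these lines yourself, a parser only needs to
split on whitespace and name the seven columns. This is a sketch; the
`Sample` class and its field names are made up for illustration and
follow the column list above.

```python
# Sketch: parsing one line of ntpclient output into named fields.
# Field names are hypothetical; meanings follow the column list above.
from dataclasses import dataclass


@dataclass
class Sample:
    day: int                # days since 1900
    seconds: float          # seconds since midnight
    elapsed_us: float       # elapsed time for the NTP transaction
    server_delay_us: float  # internal server delay
    offset_us: float        # clock difference, your computer vs. server
    dispersion_us: float    # dispersion reported by the server
    freq: int               # adjtimex frequency, ppm * 65536

    @property
    def freq_ppm(self) -> float:
        return self.freq / 65536


def parse_line(line: str) -> Sample:
    f = line.split()
    if len(f) != 7:
        raise ValueError(f"expected 7 columns, got {len(f)}: {line!r}")
    return Sample(int(f[0]), float(f[1]), float(f[2]), float(f[3]),
                  float(f[4]), float(f[5]), int(f[6]))


if __name__ == "__main__":
    s = parse_line("36765 4980.373 1341.0 39.7 956761.4 839.2 0")
    # In the text's example, a positive offset meant the computer was fast.
    print(f"offset {s.offset_us / 1e6:.3f} s, freq {s.freq_ppm} ppm")
```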

Now check that the clock setting worked:

```
ntpclient -c 1 -h $NTPHOST
36765 4993.512 1345.0 40.9 3615.3 839.2 0
```

So now the time difference is only a few milliseconds (3.6 ms here).


Next, measure the frequency calibration of your system. If you're in a
hurry, it's OK to spend only 20 minutes on this step (20 samples, 60
seconds apart):

```
ntpclient -i 60 -c 20 -h $NTPHOST >$(hostname).ntp.log &
```

Otherwise, you will learn much more about your system and its
communication with the NTP server by letting the log run for 24 hours
(288 samples, 300 seconds apart):

```
ntpclient -i 300 -c 288 -h $NTPHOST >$(hostname).ntp.log &
```

Things to watch for in the above log: if the last column (kernel
frequency fine tune) ever changes, you haven't turned off other time
adjustment programs. AFAIK the only programs around that would move
this number are ntpclient and xntpd. On most out-of-the-box systems,
that last column should start at zero and stay zero.

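Checking the last column by eye gets tedious for a 288-line log. The
following sketch, assuming the seven-column format described above,
reports every line where the frequency column changes; the function name
is hypothetical.

```python
# Sketch: find changes in column 7 (kernel frequency) of a ntpclient log.
# A change suggests another program is adjusting the clock.


def frequency_changes(lines):
    """Yield (line_number, old_freq, new_freq) whenever column 7 changes."""
    prev = None
    for n, line in enumerate(lines, start=1):
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        freq = int(fields[6])
        if prev is not None and freq != prev:
            yield n, prev, freq
        prev = freq


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:  # e.g. the $(hostname).ntp.log file
        with open(path) as log:
            for n, old, new in frequency_changes(log):
                print(f"{path}:{n}: frequency changed from {old} to {new}")
```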

Use gnuplot to plot the resulting file as follows (substitute your
host's name for HOSTNAME, and the day number from the first line of your
log for 36765):

```
plot "HOSTNAME.ntp.log" using (($1-36765)*86400+$2):5:($3+$6-$4) with yerrorbars
```

This shows time error (microseconds) as a function of elapsed time
(seconds). The error bars show the uncertainty in the measurement.
Ideally, the points form a smooth, straight line whose slope represents
the frequency error of your crystal.

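For analysis outside gnuplot, the same (x, y, error bar) triples can be
computed directly. This sketch mirrors the `using` expression column by
column; unlike the gnuplot line, it takes the day origin from the first
sample instead of hard-coding 36765. The function name is hypothetical.

```python
# Sketch: compute the (elapsed_s, offset_us, error_us) points that the
# gnuplot command plots, from lines of a ntpclient log.


def plot_points(lines):
    """Return a list of (elapsed_s, offset_us, error_us) tuples."""
    points = []
    day0 = None
    for line in lines:
        f = line.split()
        if len(f) != 7:
            continue  # skip blank or malformed lines
        day, sec = int(f[0]), float(f[1])
        elapsed_us, delay_us = float(f[2]), float(f[3])
        offset_us, disp_us = float(f[4]), float(f[5])
        if day0 is None:
            day0 = day
        x = (day - day0) * 86400 + sec         # gnuplot: ($1-36765)*86400+$2
        err = elapsed_us + disp_us - delay_us  # gnuplot: $3+$6-$4
        points.append((x, offset_us, err))
    return points
```

The resulting tuples can be handed to any plotting tool, for example
matplotlib's `errorbar(x, y, yerr=...)`.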

If an occasional point is both off-center and has a large error bar, a
transaction got delayed somewhere in the process, either inside the
server or in one of the two UDP packet propagation steps. This is
normal, and ntpclient deals with such points quite well. If points are
not evenly spaced on the horizontal axis, packets were actually lost;
this is less common, but still OK.

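The two benign anomalies just described can also be flagged
automatically. This is a sketch only: it takes (elapsed seconds, offset,
error bar) tuples such as the ones the gnuplot command plots, and its
thresholds (three times the median error bar, 1.5 times the nominal
sample interval) are arbitrary illustrative choices, not anything
ntpclient itself uses.

```python
# Sketch: flag delayed transactions (unusually large error bars) and
# lost packets (gaps in the time axis). Thresholds are arbitrary.
from statistics import median


def anomalies(points, interval_s):
    """points: non-empty list of (elapsed_s, offset_us, error_us), time-ordered.

    Returns (delayed, gaps): indices of points with an unusually large
    error bar, and indices after which samples appear to be missing.
    """
    typical = median(err for _, _, err in points)
    delayed = [i for i, (_, _, err) in enumerate(points) if err > 3 * typical]
    gaps = [i for i in range(len(points) - 1)
            if points[i + 1][0] - points[i][0] > 1.5 * interval_s]
    return delayed, gaps
```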

If the error bar suddenly becomes large and takes a few minutes to
recover, your NTP host (presumably [xntpd][]) had problems communicating
with _its_ server, and reported that problem to you by increasing its
"dispersion". (This is a hack, required by xntpd's incorrect core
assumption that errors in network delays have Gaussian statistics;
ntpclient does not have this flaw.)

If there are sudden, large, persistent steps in the error, some other
program is making step changes to the time. Check for, e.g., ntpdate run
as a cron job. If your client machine is OK, check for problems on the
_host_ machine.

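Distinguishing a persistent step from a single delayed transaction comes
down to whether the jump is undone by the next sample. The sketch below
flags a jump only when the offset changes are small on both sides of it.
`step_us` is a hypothetical threshold: it must be well above both your
normal error bars and the drift accumulated between samples (1 ppm over
a 300-second interval is already 300 microseconds). Two steps in a row,
or a step next to a delayed point, will be missed.

```python
# Sketch: detect sudden, persistent steps in the offset column.


def persistent_steps(offsets_us, step_us=50_000.0):
    """Indices i where the offset jumps by more than step_us between
    samples i-1 and i, with sub-threshold changes on both sides."""
    steps = []
    for i in range(2, len(offsets_us) - 1):
        before = offsets_us[i - 1] - offsets_us[i - 2]
        jump = offsets_us[i] - offsets_us[i - 1]
        after = offsets_us[i + 1] - offsets_us[i]
        if abs(jump) > step_us and abs(before) <= step_us and abs(after) <= step_us:
            steps.append(i)
    return steps
```

A lone spike fails the `after` test at the spike and the `before` test
at the sample that follows it, so it is not reported.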

Assuming the graph above is clean, and the log has non-garbled data for
the first and last points, you can run the log through the enclosed awk
script (rate.awk) to determine the appropriate frequency value:

```
$ awk -f rate.awk <test.dat
delta-t 119400 seconds
delta-o -142308 useconds
slope -1.19186 ppm
old frequency -1240000 ( -18.9209 ppm)
new frequency -1318109 ( -20.1127 ppm)
$
```

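The numbers in that output fit together simply: the slope is delta-o
divided by delta-t, and the new frequency is the old one plus the slope
in kernel units. The sketch below reproduces that arithmetic from the
printed values; it is not a transcription of rate.awk's source, and the
function name is hypothetical. Like rate.awk, it uses only the first and
last samples, which is why those two must be clean; a least-squares fit
over all points would be less sensitive to a bad endpoint, but that is
not what this sketch does.

```python
# Sketch: the endpoint rate calculation behind the example output.

SCALE = 65536  # kernel frequency units per ppm


def new_frequency(t0_s, o0_us, t1_s, o1_us, old_freq):
    """Return (slope_ppm, new_freq) from two (time, offset) samples.

    An offset drift of 1 microsecond per second is 1 ppm, so the slope
    of offset against time is already in ppm. int() truncates toward
    zero, which matches the example output.
    """
    slope_ppm = (o1_us - o0_us) / (t1_s - t0_s)
    return slope_ppm, int(old_freq + slope_ppm * SCALE)


if __name__ == "__main__":
    # delta-t 119400 s and delta-o -142308 us, as in the example output
    slope, freq = new_frequency(0, 0, 119400, -142308, -1240000)
    print(f"slope {slope:.5f} ppm, new frequency {freq} "
          f"({freq / SCALE:.4f} ppm)")
```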

For now, plug in the new frequency value:

```
adjtimex -f -1318109
```

Then reset the clock:

```
ntpclient -s -h $NTPHOST
```

Now ponder how it makes sense, in _your_ (possibly embedded)
environment, to have the number -1318109 applied via adjtimex every time
your machine boots. Or, simpler still, combine these two steps using a
post-2005 version of ntpclient:

```
ntpclient -f -1318109 -s -h $NTPHOST
```


If the absolute value of the frequency offset is greater than about
230 ppm (15073280 kernel units), you have a problem. You may be able to
fix it with the `-t` option to adjtimex; otherwise you need to hack
phaselock.c, which has a maximum adjustment extent of +/- 250 ppm built
in (change the `#define MAX_CORRECT` and rebuild ntpclient). I'd like to
suggest that you replace the defective crystal instead, but I understand
that is rarely practical.

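The `-t` route works by moving the bulk of the correction into the tick
and leaving only a small remainder for `-f`. The sketch below shows one
way to split a correction, under stated assumptions: a 1 microsecond
change in tick shifts the clock rate by `hz` ppm, and both a longer tick
and a more positive frequency make the clock run faster. On current
Linux kernels the tick value is, to my understanding, interpreted
relative to USER_HZ (typically 100, so nominal tick 10000); older
kernels, like those in the examples above, used the kernel HZ. Check
`adjtimex(2)` for your system. The function name is hypothetical.

```python
# Sketch: split a frequency correction (ppm) between the tick
# (adjtimex -t) and the fine frequency (adjtimex -f).
# Assumption: 1 us of tick change = hz ppm; nominal_tick_us = 1e6 / hz.

SCALE = 65536  # kernel frequency units per ppm


def split_correction(ppm, hz=100, nominal_tick_us=10000):
    """Return (tick_us, freq_units) that together apply about `ppm`."""
    tick_steps = round(ppm / hz)          # whole-microsecond tick changes
    residual_ppm = ppm - tick_steps * hz  # at most hz/2 in magnitude
    return nominal_tick_us + tick_steps, round(residual_ppm * SCALE)
```

For example, a -443.7 ppm correction with `hz=100` becomes a tick of
9996 plus about -43.7 ppm of fine frequency, comfortably inside the
phaselock.c limit.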

On to `ntpclient -l`. This is actually easy, if you performed and
understood the previous steps. Run the following in the background:

```
ntpclient -l -h $NTPHOST
```

It will make small (probably less than 3 ppm) adjustments to the system
frequency to keep the clocks locked. Typical performance over Ethernet
(even through a few routers) is a worst-case error of +/- 10 ms.


I won't try to tell you _where_ to put the boot-time commands. They
should boil down to:

```
adjtimex -f $NONVOLATILE_MEMORY_VALUE
ntpclient -s -i 15 -g 10000 -h $NTPHOST
ntpclient -l -h $NTPHOST >some_log_file
```

The second line makes explicit the retries that may be required for this
UDP-based time protocol. If the first time request takes longer than
10000 microseconds to resolve, or the packets get lost, it instructs
ntpclient to try again 15 seconds later (the minimum retry period
mandated by [RFC 4330][]), and it won't exit until it gets a suitable
response.

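The retry policy of `-s -i 15 -g 10000` can be summarized as a loop.
This is a behavioural sketch, not ntpclient's code: `query` is a
hypothetical stand-in for one NTP exchange that returns
`(elapsed_us, offset_us)`, or `None` if the packet was lost, and `sleep`
is injectable so the loop can be tested without waiting. `max_tries=None`
mimics the default of retrying forever; a number mimics adding a `-c`
count as a fallback.

```python
# Sketch of the set-time retry policy: accept the first reply whose
# transaction time is under the limit, otherwise wait and try again.
import time


def set_time(query, max_elapsed_us=10000, interval_s=15, max_tries=None,
             sleep=time.sleep):
    """Return the offset of the first acceptable reply, or None."""
    tries = 0
    while max_tries is None or tries < max_tries:
        tries += 1
        reply = query()
        if reply is not None:
            elapsed_us, offset_us = reply
            if elapsed_us < max_elapsed_us:
                return offset_us  # ntpclient would set the clock here
        if max_tries is None or tries < max_tries:
            sleep(interval_s)  # no sleep after the final attempt
    return None
```

Note that with a fixed limit of five attempts, the four waits of 15
seconds each can hold up the boot sequence for about a minute.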

As of 2006, ntpclient can in theory combine the three lines above into
one:

```
ntpclient -f $NONVOLATILE_MEMORY_VALUE -s -l -i 600 -g 10000 -h $NTPHOST >some_log_file
```

This can streamline the startup process, since you may be able to avoid
a layer of shell scripting. On the other hand, it is less tested, and
there is (currently) no means to set the packet interval independently
for the set and lock phases.


It's an interesting question how sensitive the boot process should be to
the time-setting process. If you have a battery-backed hardware clock,
there's not much problem running for a while without a network-accurate
system clock. In that case you could put both ntpclient commands into a
background script, and the only possible issue is the sudden (but
probably small) warp of the clock at the indefinite point in the boot
sequence when ntpclient gets its acceptable answer.

On the other hand, some embedded computers have no clue what time it is
until the network responds. Any files created will be marked Jan 1 1970,
and other application-dependent issues may arise if there is a nonsense
time on the system during later parts of the boot sequence. Then you may
well want to enforce completion of the first ntpclient before starting
your application. If this is too drastic for you, and you want a
fallback mode when the time server is dead, add a `-c 5` switch to the
end of that ntpclient command, giving at most five attempts if something
goes wrong with the time set. Since those attempts are at least 15
seconds apart, consider patching the source to lower the minimum packet
send interval from the [RFC 4330][]-mandated 15 seconds for that
approach to be useful.


[xntpd]: http://www.eecis.udel.edu/~mills/ntp/
[BusyBox]: http://www.busybox.net
[RFC 4330]: http://tools.ietf.org/html/rfc4330