0day_dev/ntpclient-2018_244/docs/HOWTO.md

HowTo
=====

First, a note on typical 1990's and 2000's computer crystals.  They are
truly pathetic.  A "real" crystal oscillator (TCXO) usually has an
initial set error of less than 5 ppm, and variation over time, voltage,
and temperature measured in tenths of a ppm (and an OCXO can reach ±0.3
ppm stability over ten years and 85°C temperature swing).  The devices
used in conventional PC motherboards and single board computers,
however, often have initial set errors up to 150 ppm, and will vary 5
ppm over the course of a day-night cycle in a pseudo-air-conditioned
space.

(Operating systems can sometimes exacerbate the problem.  On some i686
Red Hat 7.3 systems with the clock at 512 Hz, or 953 microseconds per
tick, gives a built in 64 ppm error.  Even the normally exemplary DEC
Alpha has, when run with Linux, a truly awful calibration scheme; Linux
runs it with a nominal ticks per second of 1024, which gives a tick
value of 977, theoretical additional error -448 ppm, actual frequency
observed -443.7 ppm.)

Still, the pattern is clear: the first and largest error of a crystal is
its initial set error.  It is strongly recommended to calibrate each
computer, and store its frequency error in a non-volatile medium, before
attempting anything else with time setting and locking.  While one could
do it in a few seconds using an accurate frequency counter, another way
is shown below using a software-only method with ntpclient and a high
quality NTP server.

To perform the activities described, you need a way to control and
monitor your system's clock -- both its frequency and value.  On Linux,
the kernel API is described in `adjtimex(2)`.  There are at least two
tools that provide shell-level access to this interface, both called
`adjtimex(1)`.

One is written by Steven Dick and Jim Van Zandt, see the adjtimex* files
at http://www.ibiblio.org/pub/Linux/system/admin/time/. It uses long
options, and includes some interesting functionality beyond the basic
exposure of `adjtimex(2)`.

Larry Doolittle wrote the other; it uses short options, and has no
bloat^H^H^H^H^Hextra features.  This version is included with ntpclient
as a standalone version; it is also incorporated into [BusyBox][],
although you may have to select it at compile time, like any other
component.

Fortunately (and not coincidentally) the core functions of the two
adjtimex programs can be used interchangeably, as long as you only use
the short option variant of the Dick/Van Zandt adjtimex.  The options
discussed here are:

    -f    frequency (integer kernel units)
    -o    time offset in microseconds
    -t    kernel tick (microseconds per jiffy)

First, set the time approximately right, as root:

    ntpclient -s -h $NTPHOST

You should see a single line printed like

    36765 4980.373    1341.0     39.7  956761.4    839.2  0

Get used to this line: column headers are:

1. day since 1900
2. seconds since midnight
3. elapsed time for NTP transaction (microseconds)
4. internal server delay (microseconds)
5. clock difference between your computer and the NTP server (microseconds)
6. dispersion reported by server (microseconds)
7. your computer's adjtimex frequency (ppm * 65536)

So in the example above, your computer's clock was a bit more than
0.95 seconds fast, compared to the clock on $NTPHOST.

Now check that the clock setting worked.

    ntpclient -c 1 -h $NTPHOST
    36765 4993.512    1345.0     40.9    3615.3    839.2  0

So now the time difference is only a few milliseconds.

On to measure the frequency calibration for your system.  If you're in a
hurry, it's OK to only spend 20 minutes on this step.

    ntpclient -i 60 -c 20 -h $NTPHOST >$(hostname).ntp.log &

Otherwise, you will learn much more about your system and its
communication with the NTP server by letting the log run for 24 hours.

    ntpclient -i 300 -c 288 -h $NTPHOST >$(hostname).ntp.log &

Things to watch for in the above log: If the last column (kernel
frequency fine tune) ever changes, you haven't turned off other time
adjustment programs.  AFAIK the only programs around that would move
this number are ntpclient and xntpd.  On most out-of-the-box systems,
that last column should start zero and stay zero.

Use gnuplot to plot the resulting file as follows:

    plot "HOSTNAME.ntp.log" using (($1-36765)*86400+$2):5:($3+$6-$4) with yerrorbars

This shows time error (microseconds) as a function of elapsed time
(seconds).  The error bars show the uncertainty in the measurement.
Ideally, it would be a smooth, straight line, where the slope represents
the frequency error of your crystal.

If an occasional point is both off-center and has a large error bar, it
shows a transaction got delayed somewhere in the process, either inside
the server, or one of the two UDP packet propagation steps.  This is
normal, and ntpclient can deal with those quite well.  If points are not
evenly spaced on the horizontal axis, packets were actually lost; this
is less common, but still OK.

If the error bar becomes suddenly large, and takes a few minutes to
slowly recover, your NTP host (presumably [xntpd][]) had problems
communicating with _its_ server, and reported that problem to you by
increasing its "dispersion" (this is a hack, required by xntpd's core
incorrect assumption that errors in network delays have Gaussian
statistics; ntpclient does not have this flaw).

If there are sudden large, persistent steps in error, some other program
is making step changes to time.  Check for, e.g., ntpdate run as a cron
job.  If your client machine is OK, check for problems on the _host_
machine.

Assuming the graph above is clean, and has non-garbled data for the
first and last points, you can run it through the enclosed awk script
(rate.awk) to determine the appropriate frequency value.

    $ awk -f rate.awk <test.dat
    delta-t 119400 seconds
    delta-o -142308 useconds
    slope -1.19186 ppm
    old frequency -1240000 ( -18.9209 ppm)
    new frequency -1318109 ( -20.1127 ppm)
    $

For now, you should plug in the new frequency value

    adjtimex -f -1318109

Then reset the clock

    ntpclient -s -h $NTPHOST

and ponder how it makes sense in _your_ (possibly embedded) environment
to have the number -1318109 applied via adjtimex every time your machine
boots.  Or, simpler still, combine these two steps using a post-2005
version:

    ntpclient -f -1318109 -s -h $NTPHOST

If the frequency offset (absolute value) is greater than about 230 ppm
(15073280), you have a problem: you may be able to fix it with the -t
option to adjtimex, or you need to hack phaselock.c, that has a maximum
adjustment extent of +/- 250 ppm built in (change the `#define MAX_CORRECT`
and rebuild ntpclient).  I'd like to suggest that you replace the defective
crystal instead, but I understand that is rarely practical.

On to `ntpclient -l`.  This is actually easy, if you performed and understood
the previous steps.  Run

    ntpclient -l -h $NTPHOST

in the background.  It will make small (probably less than 3 ppm)
adjustments to the system frequency to keep the clocks locked.  Typical
performance over Ethernet (even through a few routers) is a worst case
error of +/- 10 ms.

I won't try to tell you _where_ to put the boot time commands.  They should
boil down to:

    adjtimex -f $NONVOLATILE_MEMORY_VALUE
    ntpclient -s -i 15 -g 10000 -h $NTPHOST
    ntpclient -l -h $NTPHOST >some_log_file

The second line makes explicit the retries that may be required for this
UDP-based time protocol.  If the first time request takes longer than 10000
microseconds to resolve, or the packets get lost, it instructs ntpclient to
try again 15 seconds later (the minimum retry period mandated by [RFC 4330][]),
and it won't exit until it gets such a suitable response.

As of 2006, ntpclient can in theory combine the three lines above into one:

    ntpclient -f $NONVOLATILE_MEMORY_VALUE -s -l -i 600 -g 10000 -h $NTPHOST >some_log_file

This can streamline the startup process, since you may be able to avoid
a layer of shell scripting.  On the other hand, it is less tested, and
there is no (current) means to independently set the packet interval for
the set and lock phases.

It's an interesting question how sensitive the boot process should be to
the time set process.  If you have a battery backed hardware clock,
there's not much problem running for a while without a network-accurate
system clock.  In that case you could put both ntpclient commands into a
background script, and the only possible issue is the sudden (but
probably small) warp of the clock at the indefinite time in the boot
sequence when ntpclient gets its acceptable answer.  On the other hand,
some embedded computers have no clue what time it is until the network
responds.  Any files created will be marked Jan 1 1970, and other
application-dependent issues may arise if there is a nonsense time on
the system during later parts of the boot sequence.  Then you may well
want to enforce completion of the first ntpclient before starting your
application.  If this is too drastic for you, and you want a fallback
mode when the time server is dead, add a `-c 5` switch to the end of
that ntpclient command, giving at most 5 retries, if something goes
wrong with the time set.  For that approach to be useful, consider
patching the source to lower the minimum packet send interval from the
[RFC 4330][]-mandated 15 seconds.


[xntpd]: http://www.eecis.udel.edu/~mills/ntp/
[BusyBox]: http://www.busybox.net
[RFC 4330]: http://tools.ietf.org/html/rfc4330