Linux - Server: This forum is for the discussion of Linux software used in a server-related context.
My first message here!
I am dealing with an NTP issue. I have searched the web for a couple of days, got a basic knowledge of how the NTP protocol works, but still I am a bit puzzled and I have a few questions.
Everything started when some of our servers could not keep time, ending up with awful OFFSET values and drifting.
I understand that NTP is not a simple synchronisation task that steps the time every time it's run: it's an algorithm that polls the time from a number of servers and assesses the accuracy of the system clock, coming up with a way to slew it so the user will never see the time changing.
Only when the time is wildly out will NTP step the time as a "one off" to restore it.
I also understand that NTP will step the time when the offset is over 128ms and that it will refuse to operate when the clock is more than 1000s off. Also, that NTP has a slew limit of - if memory serves - about 43s per day (500 ppm).
I have set my NTP on my server using 4 external NTP servers, stratum 1, 2 and 3. It looks like the offset varies from 0 to something as big as 600, and I do not know why.
Yesterday I first set my clock 'one off' manually (using NTPDATE, with NTPD off), then I amended my configuration file and started the NTP daemon.
After a while the NTP.DRIFT file was populated.
I have been monitoring the NTP using NTPQ and I could not find anything obvious (to me).
My problem is that the OFFSET value randomly jumps from 0-ish to 300/500 and I am not sure that that behaviour is normal? I will keep monitoring with my last configuration (previous time I was only using one NTP server). My other servers eventually drift to 30000/50000 until the NTP comes up with a "frequency error".
Here is my current configuration (while I was writing, the offset drifted to 1000!)
From the RV command, I can see that my clock IS adjusted continuously, so why is the OFFSET getting bigger and bigger?
I understand that this could be caused by a hardware clock that drifts too much, but I am still puzzled.
Also, over 128ms the NTP daemon should step the time, so why does that not happen?
What I did notice is that the value in the DRIFT file is changing. It was 49, then -5.99, now it's 24.032.
My logs do not show anything strange:
Code:
7 May 18:27:32 ntpd[4093]: synchronized to 130.88.200.4, stratum 2
7 May 18:30:53 ntpd[4093]: ntpd exiting on signal 15
7 May 18:31:12 ntpd[13566]: synchronized to 130.88.200.4, stratum 2
7 May 21:01:54 ntpd[13566]: ntpd exiting on signal 15
7 May 21:03:16 ntpd[10666]: invalid flags (9088) in file /tmp/ntpDMGX5S
7 May 21:03:33 ntpd[7313]: synchronized to 130.88.200.4, stratum 2
7 May 21:41:44 ntpd[7313]: ntpd exiting on signal 15
7 May 21:43:00 ntpd[13653]: synchronized to 81.168.77.149, stratum 3
7 May 21:57:49 ntpd[13653]: ntpd exiting on signal 15
7 May 21:58:08 ntpd[19317]: synchronized to 81.168.77.149, stratum 3
7 May 22:00:14 ntpd[19317]: ntpd exiting on signal 15
7 May 22:00:33 ntpd[25420]: synchronized to 81.168.77.149, stratum 3
That is, besides an "invalid flags" error, which I have never seen before.
Can anybody help me? Please let me know if you need further details.
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541
Rep:
You're correct that the offset values look pretty much like a problem and that problem is network related (almost always).
You don't say where in the world you are (I'll assume North America); here's a suggestion that you try the pool servers plus the "local clock" as here:
Code:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
#server pool.ntp.org
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
With these settings, on HughesNet satellite service (which has delays for the round trip to the satellite) a "normal" display should look like this:
Code:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  15h   64    0    0.000    0.000   0.000
*50.116.55.161   192.5.41.40      2 u  730 1024  377  1280.03  -22.906  32.752
+65.23.154.62    149.20.64.28     2 u  325 1024  377  1341.86  -67.265  73.929
+66.162.15.65    64.236.96.53     2 u  274 1024  377  1414.38  -93.636  95.378
You only need three and you should not be using a stratum 1 server unless you've asked for permission to do so (it's considered impolite, more or less).
The inclusion of
Code:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
is so that NTP will fall back on the system clock when no external source is available; like when your network connection goes away for some reason.
What NTP does is evaluate the time servers to select the "best" one of them to synchronize to. That will change from time to time -- it will replace servers that are slow, noisy, or just plain gone out of service periodically which is why it's a good idea to use the pool servers rather than specifying specific addresses.
The drift file, generally /etc/ntp/drift, will change over time as NTP evaluates your system clock versus a time standard. That value should not swing wildly, but it will change for a while then settle down over time -- it can take a few days for that to happen.
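To put drift-file numbers in perspective, here is a minimal sketch (not from the thread) that converts a frequency error in parts per million into seconds gained or lost per day. The function name ppm_to_sec_per_day is made up for illustration; the only fact it relies on is that a day has 86,400 seconds.

```shell
#!/bin/sh
# ppm_to_sec_per_day: hypothetical helper, not part of any NTP package.
# 1 ppm means the clock gains or loses 1 microsecond per second,
# so over a day that is ppm * 86400 / 1000000 seconds.
ppm_to_sec_per_day() {
    awk -v ppm="$1" 'BEGIN { printf "%.3f\n", ppm * 86400 / 1000000 }'
}

# Drift values quoted earlier in this thread:
ppm_to_sec_per_day 49       # 4.234
ppm_to_sec_per_day 24.032   # 2.076
# The 500 ppm frequency tolerance that ntpd works within:
ppm_to_sec_per_day 500      # 43.200
```

Read this way, a drift value of a few tens of ppm is a clock that would wander a few seconds a day if left alone, which NTP should absorb comfortably; a value stuck near 500 ppm is more likely a sign of trouble.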
When you first implement NTP your system clock may be in never-never land somewhere and it's a good idea to initially set the system clock with ntpdate. Once NTP synchronizes, though, you should not need to do that. The system clock is a "software clock," run by the kernel via interrupts. On boot, it is initially set from the hardware clock, then NTP keeps it on-time once synchronized to an external time source. My systems are currently synchronized to
Code:
*50.116.55.161 192.5.41.40 2 u 730 1024 377 1280.03 -22.906 32.752
the one that has the asterisk is the one you're synchronized to, the others are candidates for synchronization if that one goes away.
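If you want to pick that line out in a script, here is a minimal sketch. The helper name selected_peer is made up; it assumes the usual ten-column "ntpq -pn" layout shown above, where the offset (in milliseconds) is the ninth column.

```shell
#!/bin/sh
# selected_peer: hypothetical helper that reads "ntpq -pn" output on stdin
# and prints the address and offset (ms) of the peer marked with "*".
# Columns: remote refid st t when poll reach delay offset jitter
selected_peer() {
    awk '/^\*/ { print substr($1, 2), $9 }'
}

# Demonstration on the listing shown earlier in this post; on a live
# system you would run:  ntpq -pn | selected_peer
selected_peer <<'EOF'
remote refid st t when poll reach delay offset jitter
==============================================================================
127.127.1.0 .LOCL. 10 l 15h 64 0 0.000 0.000 0.000
*50.116.55.161 192.5.41.40 2 u 730 1024 377 1280.03 -22.906 32.752
+65.23.154.62 149.20.64.28 2 u 325 1024 377 1341.86 -67.265 73.929
+66.162.15.65 64.236.96.53 2 u 274 1024 377 1414.38 -93.636 95.378
EOF
```

If nothing is printed, no peer carries the asterisk, meaning the daemon has not (or not yet) selected a source.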
The routine that starts the NTP daemon should look a lot like this:
In particular, note CMDLINE="/usr/sbin/ntpd -g": the -g option allows the first adjustment to be big (it lets ntpd correct an offset beyond the 1000 s panic limit once, at start-up); after that, once the daemon synchronizes, it will slew the system clock (within the limits you've discovered).
So, how does the system time get set? At boot, it should be set from the hardware clock; looks something like this:
Code:
# Set the system time from the hardware clock using hwclock --hctosys.
if [ -x /sbin/hwclock ]; then
# Check for a broken motherboard RTC clock (where ioports for rtc are
# unknown) to prevent hwclock causing a hang:
if ! grep -q -w rtc /proc/ioports ; then
CLOCK_OPT="--directisa"
fi
if grep -wq "^UTC" /etc/hardwareclock ; then
echo -n "Setting system time from the hardware clock (UTC): "
/sbin/hwclock $CLOCK_OPT --utc --hctosys
else
echo -n "Setting system time from the hardware clock (localtime): "
/sbin/hwclock $CLOCK_OPT --localtime --hctosys
fi
date
fi
So, there's a hint -- the hardware clock is run by a battery on the motherboard and is usually accurate (not as good as your wristwatch, but pretty good). Depending on how you specified your hardware clock -- either local time or UTC -- the above reads it and initially sets the system clock.
On shutdown, the opposite happens:
Code:
# Save the system time to the hardware clock using hwclock --systohc.
if [ -x /sbin/hwclock ]; then
# Check for a broken motherboard RTC clock (where ioports for rtc are
# unknown) to prevent hwclock causing a hang:
if ! grep -q -w rtc /proc/ioports ; then
CLOCK_OPT="--directisa"
fi
if grep -q "^UTC" /etc/hardwareclock 2> /dev/null ; then
echo "Saving system time to the hardware clock (UTC)."
/sbin/hwclock $CLOCK_OPT --utc --systohc
else
echo "Saving system time to the hardware clock (localtime)."
/sbin/hwclock $CLOCK_OPT --localtime --systohc
fi
fi
You're running NTP, the system clock is synchronized to an external time source, when you shut down, that sets the hardware clock to the correct time. Pretty neat, huh?
So, bottom line here -- your ntpq display looks like you've hard-defined time sources that may not be worthwhile and you might want to try the pool servers (for a day or two) and see if you get better results. That "for a day or two" is meaningful -- NTP takes time to settle down, it does adjustments over time, so let it run for a few days and see.
I would remove the multiple "prefer" and "iburst" options from your configuration (you really don't need them and multiple "prefer," well, not good -- see the comment in the ntp.conf file below).
Just in case it helps, here's a long-term, known-good ntp.conf you may find interesting; the stuff that's commented-out just is not used:
Code:
cat /etc/ntp.conf
# Sample /etc/ntp.conf: Configuration file for ntpd.
#
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available. The
# default stratum is usually 3, but in this case we elect to use stratum
# 10. Since the server line does not have the prefer keyword, this driver
# is never used for synchronization, unless no other
# synchronization source is available. In case the local host is
# controlled by some external source, such as an external oscillator or
# another protocol, the prefer keyword would cause the local host to
# disregard all other synchronization sources, unless the kernel
# modifications are in use and declare an unsynchronized condition.
#
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
#server pool.ntp.org
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
#
# Drift file.
# Put this in a directory which the daemon can write to.
# No symbolic links allowed, either, since the daemon updates the file
# by creating a temporary in the same directory and then rename()'ing
# it to the file.
#
driftfile /etc/ntp/drift
#
# Log file
#
#logconfig=allclock +allpeer +allsys +allsync
#logfile /var/log/ntp.log
#
# Statistics stuff
#
# statsdir /var/log/ntpstats/ # directory for statistics files
# filegen peerstats file peerstats type day enable
# filegen loopstats file loopstats type day enable
# filegen clockstats file clockstats type day enable
multicastclient 224.0.1.1
broadcastdelay 0.008
#
# Keys file. If you want to diddle your server at run time, make a
# keys file (mode 600 for sure) and define the key number to be
# used for making requests.
# PLEASE DO NOT USE THE DEFAULT VALUES HERE. Pick your own, or remote
# systems might be able to reset your clock at will.
#
#keys /etc/ntp/keys
#trustedkey 65535
#requestkey 65535
#controlkey 65535
# Don't serve time or stats to anyone else by default (more secure)
restrict default noquery nomodify
# Trust ourselves. :-)
restrict 127.0.0.1
Thanks for the very detailed reply, it is really helpful. I do appreciate the time you've taken to write the post.
I have only a couple of problems here. It looks like I do not have DNS on this server - I cannot ping the pool servers, or any www sites, so I am assuming DNS is not on. It's a special purpose server and the configuration is done by the manufacturer. I have root access and I can change the NTP configuration file (which is going to be replaced by the 'factory' one on reboot), but so far I haven't got DNS working. I tried the pool servers before and it would not work.
Is there a way to use the pool servers without DNS? Can I just find out the IP address of 0.uk.pool.ntp.org? (And BTW I am in the UK!)
Also, what puzzles me is that everything seems ok but there is nothing to tell you that the NTP is not actually working? Is there a sort of debug mode or log where I can see what is actually happening? Not that I do not trust you, but it seems strange to me that everything looks fine and there is not a way to find out what is wrong.
Finally, the manufacturer actually suggested that I change the polling time, reducing the maximum from 1024s to 256s (maxpoll 8). After what I found, I feel that this is not a solution. I think it wrongly assumes that when NTP polls a server, it syncs the clock to it, while the algorithm is actually evaluating the system clock constantly. My opinion is that reducing the poll time is not going to improve things here. Your opinion?
I will try your configuration file (maybe using static NTP servers for the time being) and I'll post the results shortly.
You can, of course, ping the pool servers (on a machine with DNS). Downside: they do change from time to time (which is, of course, why you use the pool servers in the first place).
Have you tried adding a DNS server to /etc/resolv.conf? Of the form
Code:
search com
# Google Free DNS Servers
nameserver 8.8.8.8
nameserver 8.8.4.4
(obviously, use public DNS servers -- just two -- available in the UK) and see if that helps. Watch that DHCP (if you're using that) doesn't wipe it out, though (there's a configuration to stop DHCP from overwriting /etc/resolv.conf).
If NTP is not working (like it died), ntpq won't show you a peer list at all; if you see something like
Code:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  20h   64    0    0.000    0.000   0.000
+50.116.55.161   192.5.41.40      2 u  839 1024  377  1405.66  -31.217  83.114
*65.23.154.62    149.20.64.28     2 u  369 1024  377  1311.92  -20.765  67.435
+66.162.15.65    64.236.96.53     2 u  282 1024  377  1348.51  -24.707  46.598
it is working.
If you must use addresses, ping them like this
Code:
ping -c 5 65.23.154.62
PING 65.23.154.62 (65.23.154.62) 56(84) bytes of data.
64 bytes from 65.23.154.62: icmp_req=1 ttl=45 time=863 ms
64 bytes from 65.23.154.62: icmp_req=2 ttl=45 time=1101 ms
64 bytes from 65.23.154.62: icmp_req=3 ttl=45 time=1200 ms
64 bytes from 65.23.154.62: icmp_req=4 ttl=45 time=776 ms
64 bytes from 65.23.154.62: icmp_req=5 ttl=45 time=1094 ms
--- 65.23.154.62 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3999ms
rtt min/avg/max/mdev = 776.025/1007.216/1200.003/159.899 ms, pipe 2
(that's the server I'm synchronized with); you're looking for the shortest time value and, of course, no dropped packets. Too, keep in mind that my times are longer than yours probably will be because of the satellite delay (22,500 miles up, 22,500 miles down, find the site, then back up and back down again -- takes a while). Pick the best three, put them in your /etc/ntp.conf file as servers and do not use the prefer or iburst directives. Keep in mind that simpler is better.
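To compare several candidates without reading each ping report by eye, here is a minimal sketch. The helper name avg_rtt is made up; it assumes the summary line has the "min/avg/max/..." form shown above.

```shell
#!/bin/sh
# avg_rtt: hypothetical helper that reads ping output on stdin and prints
# the average round-trip time in ms from the summary line, e.g.
#   rtt min/avg/max/mdev = 776.025/1007.216/1200.003/159.899 ms, pipe 2
# Splitting that line on "/" puts the average in field 5.
avg_rtt() {
    awk -F'/' '/min\/avg\/max/ { print $5 }'
}

# Demonstration on the summary line from the ping above:
echo 'rtt min/avg/max/mdev = 776.025/1007.216/1200.003/159.899 ms, pipe 2' | avg_rtt

# On a live system, something like this would rank candidates, best first
# (substitute your own candidate addresses):
#   for ip in 158.43.128.66 81.168.77.149 130.88.200.4; do
#       printf '%s %s\n' "$(ping -c 5 "$ip" | avg_rtt)" "$ip"
#   done | sort -n
```

The average is only a rough guide; a server with a slightly longer but steadier round trip may still serve time better than one with a short but erratic one.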
I'd leave the poll interval at the default value unless somebody could prove to me that changing it is actually beneficial. I can't imagine any good reason to do that unless the manufacturer has some magical mystical thing going that would be reasonable. Once you're synchronized, you're synchronized and NTP is managing your system clock; end of story.
Are you using this gadget as your internal network time server?
No, it's a multimedia server. It streams audio and picture to another device. I don't need it to be massively precise, but it may be a problem if after 6 months the clocks are +/- 5 minutes!
I'd rather use IPs for the time being. I selected the ones you saw by pinging them, as you suggested. I will scrap the stratum 1 servers - I did notify them though! - and add everything else you mentioned.
Yes, I do understand that the NTP daemon is working, but I'm still puzzled that there is not a diagnostic tool that tells us what is actually wrong. The offset is going up, we *assume* it's due to the network. It would be nice to have a set of tools that could show what is exactly going wrong and why.
(Just seen the addendum, thanks, I'll give it a try!)
Quote:
Originally Posted by tony359
Yes, I do understand that the NTP daemon is working, but I'm still puzzled that there is not a diagnostic tool that tells us what is actually wrong. The offset is going up, we *assume* it's due to the network. It would be nice to have a set of tools that could show what is exactly going wrong and why.
It's not an assumption; the offset value shows the difference between the reference time and the system clock (in milliseconds).
For a tiny offset ntpd will adjust the local clock as usual; for small and larger offsets, ntpd will reject the reference time for a while. In the latter case the operating system's clock will continue with the last corrections effective while the new reference time is being rejected. After some time, small offsets (significantly less than a second) will be slewed (adjusted slowly), while larger offsets will cause the clock to be stepped (set anew). Huge offsets are rejected, and ntpd will terminate itself, believing something very strange must have happened. Do you see ntpd terminated in your log?
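The regimes described above can be sketched as a small classifier. This is only an illustration under stated assumptions: the function names are made up, the thresholds are ntpd's default values (128 ms step threshold, 1000 s panic threshold, 500 ppm maximum slew rate), and the sketch ignores the waiting period before a step as well as any tinker or -x settings.

```shell
#!/bin/sh
# Hypothetical helpers; the thresholds are ntpd defaults and are assumed
# not to have been changed with "tinker" or command-line options.

# classify_offset MS: which regime an offset in milliseconds falls into.
classify_offset() {
    awk -v ms="$1" 'BEGIN {
        ms = ms + 0
        if (ms < 0) ms = -ms
        if (ms > 1000000)  print "panic"   # ntpd gives up and exits (-g allows one exception at start-up)
        else if (ms > 128) print "step"    # clock is set anew, once the offset persists
        else               print "slew"    # clock is adjusted gradually
    }'
}

# min_slew_seconds MS: shortest time needed to slew away an offset at
# 500 ppm, that is, at 0.5 ms of correction per second.
min_slew_seconds() {
    awk -v ms="$1" 'BEGIN { ms = ms + 0; if (ms < 0) ms = -ms; printf "%.0f\n", ms / 0.5 }'
}

classify_offset -22.906   # slew
classify_offset 600       # step
classify_offset 1500000   # panic
min_slew_seconds 128      # 256
```

The last figure is a lower bound: even at the fastest possible slew, an offset just under the step threshold takes a few minutes to remove, and in practice the loop corrects more gently than that.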
ntpq -pn displays the offsets for each reachable server in milliseconds (ntpdc -p uses seconds instead).
ntpdc -c loopinfo displays the combined offset in seconds, as seen at the last poll. If supported, ntpdc -c kerninfo will display the current remaining correction, just as ntptime does.
For example,
Code:
ntpdc -c loopinfo
offset: 0.050242 s
frequency: -9.689 ppm
poll adjust: 30
watchdog timer: 2497 s
and
Code:
ntpdc -c kerninfo
pll offset: 0.0261205 s
pll frequency: -9.689 ppm
maximum error: 2.83619 s
estimated error: 0.037673 s
status: 2001 pll nano
pll time constant: 10
precision: 1e-09 s
frequency tolerance: 500 ppm
and
Code:
ntptime
ntp_gettime() returns code 0 (OK)
time d5351340.42643ca8 Wed, May 8 2013 14:09:04.259, (.259342745),
maximum error 2875694 us, estimated error 37673 us, TAI offset 0
ntp_adjtime() returns code 0 (OK)
modes 0x0 (),
offset 25621.470 us, frequency -9.689 ppm, interval 1 s,
maximum error 2875694 us, estimated error 37673 us,
status 0x2001 (PLL,NANO),
time constant 10, precision 0.001 us, tolerance 500 ppm,
Note that yours may not display the above (ntptime) if the kernel does not support it.
Bottom line? Make sure you've got electrically close servers defined (meaning shortest ping times) or use the pool servers (meaning you most likely have to put entries in /etc/resolv.conf), start 'er up and let it run for at least three days -- it takes that long for things to really settle down.
edit: in the meantime...
Monitoring the log as you suggested, it came up with this:
Quote:
8 May 17:52:30 ntpd[26069]: logging to file /tmp/ntp.log
8 May 17:52:30 ntpd[26069]: precision = 1000.000 usec
8 May 17:52:30 ntpd[26069]: unable to bind to wildcard socket address 0.0.0.0 - another process may be running - EXITING
It seems to work, as predicted.
I think I read somewhere that there is a way to graph the offset value - or just to log it, I'll then process the file.
Any thoughts on how I can do it? I'll come back with more details later.
edit: would that work? I can't see any files in the /var/log/ntp folder, are they only created at the end of the day?
Code:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server 158.43.128.66
server 81.168.77.149
server 130.88.200.4
driftfile /etc/ntp/drift
logconfig=allclock +allpeer +allsys +allsync
logfile /var/log/ntp.log
multicastclient 224.0.1.1
broadcastdelay 0.008
restrict 127.0.0.1
statistics loopstats
statsdir /var/log/ntp/
filegen peerstats file peers type day link enable
filegen loopstats file loops type day link enable
So, NTP has walked your clock into synchronization? OK, that tells you that everything is working as it should (and you can probably stop fiddling with it and just ignore it).
Note that there are at least two ways to log what NTP is doing. One is the simple log that gets started by
This is the NTP daemon start at boot time. That log is quite useful. Note that it logs into the /tmp directory, which can simply be changed to /var/log/ntp/ntp.log.
Another way is here: If you've defined your logging to go into /var/log/ntp, you need a definition of that in /etc/ntp.conf; e.g.,
Code:
# Statistics stuff
#
statsdir /var/log/ntpstats/ # directory for statistics files
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
In either or both cases, you need to manually create the directory(ies):
Code:
su -
<root password>
mkdir -p /var/log/ntp
mkdir -p /var/log/ntpstats
Be aware that the statistics files get... well, big and they need to be dealt with on a weekly basis (and, frankly, they don't show you a heck of a lot unless you've got a big server farm and you're using the box you've got to serve time to everybody else on your intranet). If you choose that route, you probably want to get a radio or GPS clock with an Ethernet connection on it and use that to serve time to your entire network (and those things ain't cheap). One server can serve time to all without a lot of difficulty (been there, did that, it works) -- NTP does not place a great load on your intranet. You do not need to mess with keys on a private network but you need to decide that for yourself.
There is a great deal of information about monitoring at file:///usr/doc/ntp-4.2.6p5/html/monopt.html (that's the NTP manual, which should be installed in /usr/doc/ntp-whatever/html/monopt.html on your system); if it's not in /usr/doc, look around for it or go to http://www.ntp.org and see the documentation pages.
Now, in most installations there are examples found in /usr/doc/ntp-whatever/scripts. In particular, /usr/doc/ntp-whatever/scripts/monitoring, where you will find a README file and a group of utilities for dealing with the statistics data (and some other stuff).
Read the README. Pay attention to the warnings. Really, pay attention to the warnings.
There are examples that feed data to gnuplot (you should have that on your system, it's kind of a standard -- if not, go get and install it from your distribution software archive). That's what you use to make pretty graphs.
Be aware that the sample utilities will most likely require a little editing for the path to the log files you've created -- the paths and file names I suggested above and elsewhere may not be what's in those example files and you'll need to twiddle some things to get them going.
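If all you need is the offset over time, a much smaller tool than the bundled utilities may do. Here is a minimal sketch, not taken from the NTP distribution: the helper name loopstats_to_plot is made up, and it assumes the loopstats record layout given in the NTP monitoring documentation (modified Julian day, seconds past midnight, offset in seconds, then frequency and further fields).

```shell
#!/bin/sh
# loopstats_to_plot: hypothetical helper that reads loopstats records on
# stdin and prints "elapsed_seconds offset_ms" pairs.
# Assumed record layout:
#   MJD  seconds-past-midnight  offset(s)  frequency(ppm)  jitter  wander  poll
loopstats_to_plot() {
    awk 'NF >= 3 {
        if (!seen) { day0 = $1; sec0 = $2; seen = 1 }
        printf "%.3f %.3f\n", ($1 - day0) * 86400 + ($2 - sec0), $3 * 1000
    }'
}

# Two made-up records, for illustration only (not real measurements):
loopstats_to_plot <<'EOF'
56428 29850.123 0.001500000 -9.689 0.000300000 0.010000 6
56428 29914.123 -0.000250000 -9.690 0.000200000 0.010000 6
EOF
```

On a live system you would feed it the real file, for example loopstats_to_plot < /var/log/ntpstats/loopstats > offset.dat, and then plot the result in gnuplot with: plot "offset.dat" using 1:2 with lines. Adjust the path to wherever your statsdir points.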
At one time, some years ago, I did use the statistics to look at some 250 machines using a central NTP time server (25 Solaris boxes, 4 Linux servers [I did say this was a long time ago] and the rest desktop winders things). To be honest, it was more information than I needed and I turned it off after a month or so -- NTP does work and keeps on working if you get it configured and then just leave it alone and let it do its thing.
One final thing you'll want to do (if you're logging) is rotate the logs (they can grow quite large).
Somewhere, hopefully in /etc/logrotate.d, you will want to put this:
That will rotate the log weekly, compressing the log that was rotated and keeping 10 weeks' worth (the oldest falls off the end). Notice that this example is from my system and, depending upon which logging you choose (from the top of this post), you'll need to edit the location of the log files and make sure that an empty file gets created (when I'm logging, I do that in /etc/rc.d/rc.ntpd with
Code:
# Empty the log file
>/var/log/ntp.log
which is executed when the daemon gets started or restarted).
It'll take some fiddling and twiddling and reviewing what you get until you're happy with the information (and not, you know, overwhelmed by it). Take it slow, try one thing at a time and you'll get there.
Thanks again, you're a mine of clear and precious information.
I cannot "leave" it unfortunately! I'll explain why: as mentioned, these are multimedia servers. The NTP.conf is created during boot by the software that runs on the server. When configuring this software, you are allowed to enter only ONE NTP server, which is then embedded in their configuration, which is similar to the one I quoted in my first post - but only with one server.
Now, I have found that NTP does not work on these servers, and apparently we now know why. I am working with the manufacturer, which came up with the suggestion I mentioned. I need to show them that my configuration works. To do so, I need to graph the offset, to confirm to them that it eventually settles around zero. I will graph another machine with the configuration the manufacturer suggested and then I'll report to them.
Bottom line, I need the manufacturer to fix the configuration in their configuration files! And I need to show them that there is a problem with the configurations they provide.
I have amended the configuration with the statistics you suggested. Should I expect to see something happening in /var/log/ntpstats/ in the short term, or will the file be created after a day?
You have init.d, so that's what you want to use to start, stop and restart the daemon. It's just a file, you can look at it (it's similar to the /etc/rc.d/rc.ntpd on my machines). There should be a start, stop, restart in it and you should be able to
Code:
/etc/init.d/ntp restart
If you make any changes to /etc/ntp.conf you will have to restart the daemon (and be sure to have only one instance of the daemon running; use ps -ef | grep ntpd, you should see only one). If there is more than one, you can manually kill all of them by PID with
Code:
kill -9 PID
where PID is the number shown by ps -ef | grep ntpd; e.g.,
(don't bother killing the "grep" one, it won't exist).
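A way to list the daemon's PIDs without the grep process getting in the way is to match on the command name only. Here is a minimal sketch; the helper name ntpd_pids is made up, and the sample process list is invented for the demonstration.

```shell
#!/bin/sh
# ntpd_pids: hypothetical helper that reads "PID COMMAND" pairs on stdin
# (as printed by: ps -e -o pid= -o comm=) and prints the PIDs whose
# command name is exactly "ntpd".
ntpd_pids() {
    awk '$2 == "ntpd" { print $1 }'
}

# Demonstration on an invented process list; on a live system you would
# run:  ps -e -o pid= -o comm= | ntpd_pids
printf '%s\n' '    1 init' ' 4093 ntpd' ' 4100 grep' '13566 ntpd' | ntpd_pids
```

More than one PID in the output means more than one daemon is running. A plain kill (SIGTERM) is usually worth trying before kill -9, since it gives ntpd the chance to exit cleanly.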
Then start the daemon with
Code:
/etc/init.d/ntp start
The status stuff doesn't get generated too often so you won't see content for a while (I don't remember how often, but it's not minute-by-minute, more like hours-by-hours, maybe lots of hours).
I just love systems that vendors lock in so that you can't do anything with them (it's the Microsoft Click-'n'-Drool school taken from the sublime to the ridiculous, I think). They figure you're too stupid to read and follow directions so they do it to you like it or not. Kind of guys I won't buy anything from, them.
Question 1: after a day or more still no sign of files in /var/log/ntpstats.
Question 2: today I logged in to find that all the NTP servers are marked as "reject"; the reach field said "7", then moved to "17" I think, then "3", which I understand is not a good thing. I have monitored the first server and it looks like it's not responding? Please see below.
I can try and restart the NTP, but I would like to know the root cause first!
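For reading those reach values: the field is an 8-bit shift register printed in octal, where each poll shifts the register left and a received reply sets the lowest bit. Here is a minimal sketch that expands a reach value into its bits (most recent poll on the right); the helper name reach_to_bits is made up.

```shell
#!/bin/sh
# reach_to_bits: hypothetical helper that prints the 8 bits behind an
# octal reach value, oldest poll on the left, most recent on the right.
# The body runs in a subshell so its variables do not leak.
reach_to_bits() (
    value=$(printf '%d' "0$1")   # the leading 0 makes printf read the value as octal
    bits=""
    i=7
    while [ "$i" -ge 0 ]; do
        bits="$bits$(( ($value >> $i) & 1 ))"
        i=$((i - 1))
    done
    printf '%s\n' "$bits"
)

reach_to_bits 377   # 11111111
reach_to_bits 7     # 00000111
reach_to_bits 17    # 00001111
reach_to_bits 3     # 00000011
```

So 377 means the last eight polls were all answered, while 3, 7 and 17 mean that only the most recent two to four polls were, which is what you would expect shortly after a start or after an outage.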
This is my current configuration
Code:
server 127.127.1.0
fudge 127.127.1.0 stratum 10
server 158.43.128.66
server 81.168.77.149
server 130.88.200.4
driftfile /etc/ntp/drift
logconfig=allclock +allpeer +allsys +allsync
logfile /var/log/ntp.log
multicastclient 224.0.1.1
broadcastdelay 0.008
restrict 127.0.0.1
statsdir /var/log/ntpstats/
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
My ntp.log does not show anything since the 13th, I must have done something wrong. The only thing I have added was that "logconfig", is that ok?
cat /var/log/ntp.log
16 May 08:13:04 ntpd[32453]: proto: precision = 2.334 usec
16 May 08:13:04 ntpd[32453]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
16 May 08:13:04 ntpd[32453]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
16 May 08:13:04 ntpd[32453]: Listen and drop on 1 v6wildcard :: UDP 123
16 May 08:13:04 ntpd[32453]: Listen normally on 2 lo 127.0.0.1 UDP 123
16 May 08:13:04 ntpd[32453]: Listen normally on 3 eth0 192.168.1.10 UDP 123
16 May 08:13:04 ntpd[32453]: Listen normally on 4 eth0 fe80::210:18ff:fe8a:82c1 UDP 123
16 May 08:13:04 ntpd[32453]: Listen normally on 5 lo ::1 UDP 123
16 May 08:13:04 ntpd[32453]: peers refreshed
16 May 08:13:04 ntpd[32453]: Listening on routing socket on fd #22 for interface updates
16 May 08:13:04 ntpd[32453]: Listen normally on 6 multicast 224.0.1.1 UDP 123
16 May 08:13:04 ntpd[32453]: Joined 224.0.1.1 socket to multicast group 224.0.1.1
The statistics (in /etc/ntp.conf):
Code:
# Statistics stuff
#
statsdir /var/log/ntpstats/ # directory for statistics files
filegen peerstats file peerstats type day enable
filegen loopstats file loopstats type day enable
filegen clockstats file clockstats type day enable
The content of /var/log/ntpstats:
Code:
ls -l /var/log/ntpstats
total 16
-rw-r--r-- 2 root root 295 May 16 08:17 loopstats
-rw-r--r-- 2 root root 295 May 16 08:17 loopstats.20130516
-rw-r--r-- 2 root root 989 May 16 08:18 peerstats
-rw-r--r-- 2 root root 989 May 16 08:18 peerstats.20130516
Each of those files has content; e.g., loopstats:
At this writing, NTP has not synchronized (takes a few minutes -- it's still looking at localhost) and
Code:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*127.127.1.0     .LOCL.          10 l   55   64  177    0.000    0.000   0.002
 69.164.217.193  128.59.59.177    3 u   43   64  177  1270.55   25.188  28.224
 50.116.55.161   192.5.41.40      2 u   45   64  177  1328.45  -35.175  42.605
 128.113.28.67   18.26.4.105      2 u   43   64  177  1313.59  -16.841  56.794
OK, it synchronized:
Code:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   65   64  376    0.000    0.000   0.002
+69.164.217.193  128.59.59.177    3 u   43   64  377  1066.87  -99.831 128.517
+50.116.55.161   192.5.41.40      2 u   46   64  377  1381.67  -22.610  90.407
*128.113.28.67   18.26.4.105      2 u   46   64  377  1325.91   26.532  36.315
It'll take some time for it to settle down and, maybe, reject one or both of the "+" addresses, but the offset to the "*" is 26.532 milliseconds so it's a happy camper at the moment. There is no indication in the log file that it synchronized (and won't be for a while) and there haven't been any "throw out one of these and get another time source" messages (there will be, but not for some time and not too often -- it'll only change if a better source of time is found when it analyzes periodically).
Note that I did not turn on logging in /etc/ntp.conf, I used the -l /var/log/ntp.log in the start up command (the one in /etc/init.d for you, as above). I'll let everything settle down for a while then change that for the internal logging and see what happens (actually, I know that it'll work, I just don't find that logging interesting and prefer the "command line" option). Actually, I've shut logging and statistics off a year or two ago on all servers and don't really remember all that much about either anymore. Sigh.
OK, things have settled down:
Code:
ntpq -pn
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l  920   64    0    0.000    0.000   0.000
+69.164.217.193  128.59.59.177    3 u   39  128  377  1312.52   38.031  35.231
*50.116.55.161   192.5.41.40      2 u  119  128  377  1316.28   45.418  20.799
+128.113.28.67   18.26.4.105      2 u   43  128  377  1403.99   47.553  51.225