Linux - Server This forum is for the discussion of Linux Software used in a server related context. |
05-19-2011, 03:37 AM
|
#1
|
LQ Newbie
Registered: Mar 2011
Posts: 10
Rep:
|
ntp Offset value too high
Hi guys,
I have a problem where more than 20 servers in my network regularly show high NTP offset values. The configured alert threshold is 55, and whenever the offset exceeds 55 I get alerts. My questions are:
1) How do I find out what is causing the offset to be high?
2) I have a workaround in mind: a cron job that restarts ntpd daily or hourly. Is that an acceptable practice?
My main concern is finding out what causes the frequent offset spikes. Please advise.
Real examples from my servers:
-bash-2.05b$ /usr/sbin/ntpq -np -c assoc
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.211.0.246    57.7.62.97       4 u  111  128  377    0.679   -0.689   0.159
+10.211.0.254    57.7.62.97       4 u  109  128  377  288.260  144.072 143.251
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 47484  9614   yes   yes  none  sys.peer   reachable  1
  2 47485  9414   yes   yes  none  candidat   reachable  1
root@server:~# /usr/sbin/ntpq -np -c assoc
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.211.0.246    57.7.62.97       4 u   22   64    1    0.439    0.044   0.004
 10.211.0.254    57.7.62.97       4 u   23   64    1  1245.50  623.659   0.004
ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 40284  9014   yes   yes  none    reject   reachable  1
  2 40285  9014   yes   yes  none    reject   reachable  1
|
|
|
05-19-2011, 08:26 AM
|
#2
|
Senior Member
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541
|
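The delay column is the round-trip time to and from the server, and the offset column is ntpd's estimate of how far your clock is from the server's. The two are related: the offset calculation assumes the trip out and the trip back take equally long, so a slow or lopsided path can throw the estimate off by up to half the delay. Note that 10.211.0.254 shows a delay of 288.260 against an offset of 144.072, almost exactly half. So I believe the network path is the place to look; i.e., if you were to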
Code:
ping -c 5 10.211.0.246
and see what you get versus
Code:
ping -c 5 10.211.0.254
you'd probably see a significant difference in the time= value returned by ping.
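For reference, here is a minimal sketch of the standard NTP on-wire arithmetic, to show why a slow or lopsided path shows up in the offset column. Only the two formulas are standard; the function name and the sample numbers are made up for illustration, and nothing here claims to be what is actually happening on your network.

```python
def ntp_offset_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation.

    t1: client transmit time (read from the client clock)
    t2: server receive time  (read from the server clock)
    t3: server transmit time (read from the server clock)
    t4: client receive time  (read from the client clock)
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical case: both clocks agree perfectly (the true offset is zero),
# but the request takes 287 ms to arrive while the reply takes only 1 ms.
out_ms, back_ms = 287.0, 1.0
t1 = 0.0
t2 = t1 + out_ms   # server receives the request
t3 = t2            # server replies immediately
t4 = t3 + back_ms  # client receives the reply

offset, delay = ntp_offset_delay(t1, t2, t3, t4)
print(f"delay={delay:.1f} ms  offset={offset:.1f} ms")
```

With perfectly matched clocks the computed offset still comes out at half the difference between the two one-way trips, which is the same "offset is about half the delay" pattern as the 10.211.0.254 rows.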
My systems are set up
Code:
...
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
#server pool.ntp.org
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
...
which, at this time, returns
Code:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   8h   64    0    0.000    0.000   0.000
+69.65.40.29     192.43.244.18    2 u  751 1024  177  1328.12  114.156 133.471
+97.107.134.28   128.4.40.12      3 u  524 1024  377  1331.00   -0.803  40.868
*149.20.68.17    204.123.2.5      2 u  705 1024  377  1127.83   44.828  50.905
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 38317  80e3   yes    no  none    reject unreachable 14
  2 38318  941a   yes   yes  none candidate    sys_peer  1
  3 38319  9414   yes   yes  none candidate   reachable  1
  4 38320  961a   yes   yes  none  sys.peer    sys_peer  1
And pinging
Code:
ping -c 5 69.65.40.29
PING 69.65.40.29 (69.65.40.29) 56(84) bytes of data.
64 bytes from 69.65.40.29: icmp_req=1 ttl=48 time=786 ms
64 bytes from 69.65.40.29: icmp_req=2 ttl=48 time=954 ms
64 bytes from 69.65.40.29: icmp_req=3 ttl=48 time=1097 ms
64 bytes from 69.65.40.29: icmp_req=4 ttl=48 time=1294 ms
64 bytes from 69.65.40.29: icmp_req=5 ttl=48 time=880 ms
--- 69.65.40.29 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 786.314/1002.781/1294.949/178.120 ms, pipe 2
ping -c 5 97.107.134.28
PING 97.107.134.28 (97.107.134.28) 56(84) bytes of data.
64 bytes from 97.107.134.28: icmp_req=2 ttl=47 time=980 ms
64 bytes from 97.107.134.28: icmp_req=3 ttl=47 time=1000 ms
64 bytes from 97.107.134.28: icmp_req=4 ttl=47 time=1327 ms
64 bytes from 97.107.134.28: icmp_req=5 ttl=47 time=690 ms
--- 97.107.134.28 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 3999ms
rtt min/avg/max/mdev = 690.951/999.724/1327.038/225.202 ms, pipe 2
ping -c 5 149.20.68.17
PING 149.20.68.17 (149.20.68.17) 56(84) bytes of data.
64 bytes from 149.20.68.17: icmp_req=1 ttl=50 time=701 ms
64 bytes from 149.20.68.17: icmp_req=2 ttl=50 time=1028 ms
64 bytes from 149.20.68.17: icmp_req=3 ttl=50 time=964 ms
64 bytes from 149.20.68.17: icmp_req=4 ttl=50 time=1322 ms
64 bytes from 149.20.68.17: icmp_req=5 ttl=50 time=681 ms
--- 149.20.68.17 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3998ms
rtt min/avg/max/mdev = 681.071/939.626/1322.892/236.216 ms, pipe 2
tells the tale (I'm on a satellite link and the turn-around isn't always the best -- there is always a delay -- plus it's raining today). The packet loss value is the real tell -- that's something you don't ever want to see.
Basically, your intranet or internet connection causes this kind of stuff. Looking at the two addresses in your output: 10.211.0.246 is, I suspect, on your private network, and 57.7.62.97 (the refid, meaning the upstream source your time servers are themselves synchronized to) is in France? Have you tried using three pool servers (as above, but rather than "0.us.pool" using "0.fr.pool" or perhaps "0.europe.pool" or whatever is electrically close to you)? Also, you may try adding the
Code:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
entries for fall-back when or if the internet goes away.
NTP does a really good job of synchronizing with pool servers, dropping "bad" ones and adding new ones as time goes on (if you check your NTP log, you'll see that happen periodically). If you have a large intranet, it may be worthwhile to buy a GPS receiver (read: expensive) and use one of your servers as the time server for your intranet. Consider too whether your intranet may be a problem: if your servers are far-flung, do you have highly reliable communications between them? What kind of ping times do you get, how much traffic is there, and is that bogging you down? Things like that.
So, to answer your two questions, the high offset values are network related -- try using three pool servers (not specific IP addresses) and see what results. Restarting the NTP daemon hourly or daily will probably not be too useful -- it's most likely a communications problem.
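Rather than restarting the daemon on a timer, it may be more useful to record which peer is over the threshold and what its delay looks like at that moment. Here is a minimal sketch that picks those rows out of `ntpq -np` output. The function name is made up, the sample text is copied from the first post, and it assumes numeric (`-n`) output and that the alert threshold of 55 is in milliseconds like the ntpq offset column.

```python
TALLY_CODES = "*+-#xo.~"  # selection markers ntpq may put in front of a peer address

SAMPLE = """\
remote refid st t when poll reach delay offset jitter
==============================================================================
*10.211.0.246 57.7.62.97 4 u 111 128 377 0.679 -0.689 0.159
+10.211.0.254 57.7.62.97 4 u 109 128 377 288.260 144.072 143.251
ind assID status conf reach auth condition last_event cnt
===========================================================
1 47484 9614 yes yes none sys.peer reachable 1
2 47485 9414 yes yes none candidat reachable 1
"""


def high_offset_peers(ntpq_text, threshold_ms=55.0):
    """Return (peer, offset_ms, delay_ms) for peers whose |offset| exceeds the threshold."""
    flagged = []
    for line in ntpq_text.splitlines():
        fields = line.split()
        if len(fields) != 10:
            continue  # separator lines and rows of the "assoc" table
        try:
            delay_ms = float(fields[7])
            offset_ms = float(fields[8])
        except ValueError:
            continue  # the column-header row lands here
        peer = fields[0]
        if peer[0] in TALLY_CODES:  # safe only for numeric addresses (-n)
            peer = peer[1:]
        if abs(offset_ms) > threshold_ms:
            flagged.append((peer, offset_ms, delay_ms))
    return flagged


for peer, offset_ms, delay_ms in high_offset_peers(SAMPLE):
    print(f"{peer}: offset {offset_ms} ms, delay {delay_ms} ms")
```

Feeding it the output of `/usr/sbin/ntpq -np` from cron and appending the result to a file would give a history of which peer spikes and when, which is more to go on for a root cause than a restart.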
If you take a look at the lists of public NTP servers at http://www.ntp.org/ you could identify three servers that are electrically close to you (the best ping times) and use those (although using the pool servers would most likely be the best bet rather than specific IP addresses). You should look for stratum 2 servers; stratum 1 servers should only be used with permission (and using stratum 1 servers is discouraged in any event).
Hope this helps some.
|
|
|
05-20-2011, 04:53 AM
|
#3
|
LQ Newbie
Registered: Mar 2011
Posts: 10
Original Poster
Rep:
|
Hi tronayne,
Thanks for the reply. I can't really rule out network latency in the private network where these servers are located. However, the ping results do look good, and I have pasted the tcpdump output captured on UDP port 123; I have no clue what that output says. The offset values at times spike up to 400-600 and return to normal after a restart of the NTP daemon. I'm confused and stressed out, as I am performing an RCA (root cause analysis) for this...
Code:
root@server:/# ntpstat
synchronised to NTP server (10.211.0.246) at stratum 5
time correct to within 51 ms
polling server every 1024 s
root@server:/# more /etc/ntp.conf
server 10.211.0.246
server 10.211.0.254
authenticate no
driftfile /var/lib/ntp/drift
root@server:/# more /var/lib/ntp/drift
-20.844
root@server:/# ping -c 5 10.211.0.246
PING 10.211.0.246 (10.211.0.246) 56(84) bytes of data.
64 bytes from 10.211.0.246: icmp_seq=0 ttl=250 time=0.517 ms
64 bytes from 10.211.0.246: icmp_seq=1 ttl=250 time=0.510 ms
64 bytes from 10.211.0.246: icmp_seq=2 ttl=250 time=0.492 ms
64 bytes from 10.211.0.246: icmp_seq=3 ttl=250 time=0.471 ms
64 bytes from 10.211.0.246: icmp_seq=4 ttl=250 time=0.542 ms
--- 10.211.0.246 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 0.471/0.506/0.542/0.031 ms, pipe 2
root@server:/#
root@server:/#
root@server:/# ping -c 5 10.211.0.254
PING 10.211.0.254 (10.211.0.254) 56(84) bytes of data.
64 bytes from 10.211.0.254: icmp_seq=0 ttl=250 time=73.7 ms
64 bytes from 10.211.0.254: icmp_seq=1 ttl=250 time=0.769 ms
64 bytes from 10.211.0.254: icmp_seq=2 ttl=250 time=0.737 ms
64 bytes from 10.211.0.254: icmp_seq=3 ttl=250 time=0.776 ms
64 bytes from 10.211.0.254: icmp_seq=4 ttl=250 time=0.945 ms
--- 10.211.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 0.737/15.394/73.745/29.175 ms, pipe 2
root@server:/#
root@server:/# ping -c 5 10.211.0.254
PING 10.211.0.254 (10.211.0.254) 56(84) bytes of data.
64 bytes from 10.211.0.254: icmp_seq=0 ttl=250 time=0.891 ms
64 bytes from 10.211.0.254: icmp_seq=1 ttl=250 time=25.4 ms
64 bytes from 10.211.0.254: icmp_seq=2 ttl=250 time=4.04 ms
64 bytes from 10.211.0.254: icmp_seq=3 ttl=250 time=26.6 ms
64 bytes from 10.211.0.254: icmp_seq=4 ttl=250 time=45.5 ms
--- 10.211.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 0.891/20.533/45.596/16.418 ms, pipe 2
07:01:54.579380 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
Client, Leap indicator: (0), Stratum 5, poll 9s, precision -20
Root Delay: 0.276870, Root dispersion: 0.033279, Reference-ID: ntpserver
Reference Timestamp: 3514769981.448965013 (2011/05/19 06:59:41)
Originator Timestamp: 3514769600.050399731 (2011/05/19 06:53:20)
Receive Timestamp: 3514769600.060997001 (2011/05/19 06:53:20)
Transmit Timestamp: 3514770114.579110026 (2011/05/19 07:01:54)
Originator - Receive Timestamp: +0.010597269
Originator - Transmit Timestamp: +514.528710246
07:10:29.092099 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
Client, Leap indicator: (0), Stratum 5, poll 10s, precision -20
Root Delay: 0.276870, Root dispersion: 0.040985, Reference-ID: ntpserver
Reference Timestamp: 3514769981.448965013 (2011/05/19 06:59:41)
Originator Timestamp: 3514770114.563525557 (2011/05/19 07:01:54)
Receive Timestamp: 3514770114.580267012 (2011/05/19 07:01:54)
Transmit Timestamp: 3514770629.092008002 (2011/05/19 07:10:29)
Originator - Receive Timestamp: +0.016741460
Originator - Transmit Timestamp: +514.528482437
07:27:34.122123 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
Client, Leap indicator: (0), Stratum 5, poll 10s, precision -20
Root Delay: 0.277374, Root dispersion: 0.040328, Reference-ID: ntpserver
Reference Timestamp: 3514771009.478163987 (2011/05/19 07:16:49)
Originator Timestamp: 3514770629.084662914 (2011/05/19 07:10:29)
Receive Timestamp: 3514770629.093208000 (2011/05/19 07:10:29)
Transmit Timestamp: 3514771654.122050002 (2011/05/19 07:27:34)
Originator - Receive Timestamp: +0.008545082
Originator - Transmit Timestamp: +1025.037387084
07:44:40.151972 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
Client, Leap indicator: (0), Stratum 5, poll 10s, precision -20
Root Delay: 0.276748, Root dispersion: 0.051666, Reference-ID: ntpserver
Reference Timestamp: 3514772036.506492018 (2011/05/19 07:33:56)
Originator Timestamp: 3514771654.112666256 (2011/05/19 07:27:34)
Receive Timestamp: 3514771654.123115003 (2011/05/19 07:27:34)
Transmit Timestamp: 3514772680.151896998 (2011/05/19 07:44:40)
Originator - Receive Timestamp: +0.010448744
Originator - Transmit Timestamp: +1026.039230745
08:01:46.188465 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
Client, Leap indicator: (0), Stratum 5, poll 10s, precision -20
Root Delay: 0.277664, Root dispersion: 0.055419, Reference-ID: ntpserver
Reference Timestamp: 3514773062.539682984 (2011/05/19 07:51:02)
Originator Timestamp: 3514772680.148708224 (2011/05/19 07:44:40)
Receive Timestamp: 3514772680.152939006 (2011/05/19 07:44:40)
Transmit Timestamp: 3514773706.188389003 (2011/05/19 08:01:46)
Originator - Receive Timestamp: +0.004230779
Originator - Transmit Timestamp: +1026.039680778
08:18:50.217918 IP (tos 0x10, ttl 64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
--More--(65%)
|
|
|
05-20-2011, 08:53 AM
|
#4
|
Senior Member
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541
|
Well, I haven't a clue about what the tcpdump output says either; however, your ping output looks pretty good -- I sure wouldn't complain about 0.xxx ms. Frankly, I wouldn't be too concerned about a 55 ms offset either. Even though your ping times on 10.211.0.254 do bounce around quite a bit, that's your secondary server and your primary looks rock solid.
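A few of the fields in that capture can be decoded by hand, though. A minimal sketch, assuming only that NTP timestamps count seconds from 1900-01-01 and that the poll field is a power-of-two exponent; the helper names are made up and the values are copied from the first two packets in the capture.

```python
from datetime import datetime, timezone

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_TO_UNIX = 2208988800


def ntp_to_utc(ntp_seconds):
    """Convert an NTP timestamp (seconds since 1900-01-01) to a UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_TO_UNIX, tz=timezone.utc)


def poll_interval(poll_exponent):
    """The NTP poll field is log2 of the polling interval in seconds."""
    return 2 ** poll_exponent


# "Reference Timestamp" of the first packet.
reference = ntp_to_utc(3514769981.448965013)
print(reference.strftime("%Y-%m-%d %H:%M:%S"), "UTC")

# Gap between the "Transmit Timestamp" values of the first two packets.
gap = 3514770629.092008002 - 3514770114.579110026
print(f"gap between requests: {gap:.1f} s, poll 9 -> {poll_interval(9)} s")
print(f"poll 10 -> {poll_interval(10)} s")
```

The "poll 9s" and "poll 10s" labels look like tcpdump printing the raw exponent: 512 s and 1024 s line up with the roughly 514 s and 1025 s gaps between packets and with ntpstat's "polling server every 1024 s". The capture prints 06:59:41 for the reference timestamp that decodes to 04:59:41 UTC, which would be consistent with local time at UTC+2, but that part is a guess. None of this points at a fault by itself; it only shows the client polling at the expected intervals.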
Have you tried specifying a log file; e.g., defining the logfile in /etc/ntp.conf
Code:
logfile /var/log/ntpd.log
Or, in the system start-up that launches the daemon; e.g.,
Code:
/usr/sbin/ntpd -g -p /var/run/ntpd.pid -l /tmp/ntp.log
After ntpd runs for a while and synchronizes, typical log entries will look something like
Code:
18 May 12:17:08 ntpd[1869]: proto: precision = 1.789 usec
18 May 12:17:08 ntpd[1869]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
18 May 12:17:08 ntpd[1869]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
18 May 12:17:08 ntpd[1869]: Listen and drop on 1 v6wildcard :: UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 2 lo 127.0.0.1 UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 3 eth0 192.168.1.10 UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 4 lo ::1 UDP 123
18 May 12:17:08 ntpd[1869]: peers refreshed
18 May 12:17:08 ntpd[1869]: Listen normally on 5 multicast 224.0.1.1 UDP 123
18 May 12:17:08 ntpd[1869]: Joined 224.0.1.1 socket to multicast group 224.0.1.1
18 May 12:17:29 ntpd[1869]: Listen normally on 6 eth0 fe80::210:18ff:fe8a:82c1 UDP 123
18 May 12:17:29 ntpd[1869]: peers refreshed
18 May 12:17:29 ntpd[1869]: new interface(s) found: waking up resolver
Another thing you may want to look into is the statistics functions provided in the documentation directory that should be on your system (mine is found in /usr/doc/ntp-4.2.6p3/scripts/stats). A study of the README files found in that directory may make it worth a little time to implement the statistics functions -- you may need to fiddle with them to adapt to your system but they can be quite useful in tracking down problems.
This is just a thought -- your servers appear to be synchronizing with your time server (by the way, where are 10.211.0.246 and 10.211.0.254 getting their time synchronization from -- a reference clock, a stratum 2 server, a pool of outside servers, and what do their logs look like?).
If you're synchronized and you stay synchronized and you're not concerned with atomic-clock accuracy, is there really a problem?
And, one last thing, just for the sake of "maybe this'll work for you," this is what my /etc/ntp.conf files look like on all servers (using external pool servers rather than a local time server just because the boxes tend to get taken in and out of service periodically):
Code:
# Sample /etc/ntp.conf: Configuration file for ntpd.
#
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available. The
# default stratum is usually 3, but in this case we elect to use stratum
# 10 (see the fudge line below). Since the server line does not have the
# prefer keyword, this driver is never used for synchronization, unless no other
# synchronization source is available. In case the local host is
# controlled by some external source, such as an external oscillator or
# another protocol, the prefer keyword would cause the local host to
# disregard all other synchronization sources, unless the kernel
# modifications are in use and declare an unsynchronized condition.
#
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
#server pool.ntp.org
server 0.us.pool.ntp.org
server 1.us.pool.ntp.org
server 2.us.pool.ntp.org
#
# Drift file. Put this in a directory which the daemon can write to.
# No symbolic links allowed, either, since the daemon updates the file
# by creating a temporary in the same directory and then rename()'ing
# it to the file.
#
driftfile /etc/ntp/drift
multicastclient 224.0.1.1
broadcastdelay 0.008
#
# Keys file. If you want to diddle your server at run time, make a
# keys file (mode 600 for sure) and define the key number to be
# used for making requests.
# PLEASE DO NOT USE THE DEFAULT VALUES HERE. Pick your own, or remote
# systems might be able to reset your clock at will.
#
#keys /etc/ntp/keys
#trustedkey 65535
#requestkey 65535
#controlkey 65535
# Don't serve time or stats to anyone else by default (more secure)
restrict default noquery nomodify
# Trust ourselves. :-)
restrict 127.0.0.1
The above is bare-bones (it's the file that comes with the system) and, well, works just fine; I've been using pretty much the same thing for years now (the only thing I do is add the three pool servers).
Other stuff that may be worth a look is your network, particularly routers and switches and, you know, cables and plugs too. Bad connections or bad router channels can lead to a lot of grief.
Bottom line: enable statistics gathering, edit the functions in the stats directory (above) and use them to analyze and correct problems; bear in mind that statistics are gathered periodically and you'll need to let things run for a day or two to get meaningful information.
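As a starting point for that analysis, here is a minimal sketch that pulls the worst offset per peer out of peerstats records. It assumes peerstats has been enabled in ntp.conf and that each record has the layout described in the ntpd monitoring documentation (MJD, seconds past midnight, peer address, status word, offset, delay, dispersion, jitter, with offset in seconds); check that against your version. The function name is made up, and the sample records are invented for illustration, not taken from any real log.

```python
from collections import defaultdict


def worst_offsets(peerstats_lines):
    """Map each peer address to its largest absolute offset, in milliseconds."""
    worst = defaultdict(float)
    for line in peerstats_lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # blank or truncated record
        peer = fields[2]
        offset_s = float(fields[4])
        worst[peer] = max(worst[peer], abs(offset_s) * 1000.0)
    return dict(worst)


# Invented records for illustration only; real ones come from the peerstats
# file under whatever statsdir is configured.
SAMPLE = [
    "55700 25314.579 10.211.0.246 9614 -0.000689000 0.000679000 0.001953125 0.000159000",
    "55700 25316.102 10.211.0.254 9414 0.144072000 0.288260000 0.002441406 0.143251000",
    "55700 26340.122 10.211.0.254 9414 0.623659000 1.245500000 0.003906250 0.000004000",
]

result = worst_offsets(SAMPLE)
for peer, offset_ms in sorted(result.items()):
    print(f"{peer}: worst offset {offset_ms:.3f} ms")
```

Run over a day or two of real records, a summary like this shows whether the spikes belong to one peer or to all of them, which narrows down where in the network to look.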
Hope this helps some.
|
|
|
All times are GMT -5.