LinuxQuestions.org
05-19-2011, 03:37 AM   #1
newcranium
LQ Newbie
 
Registered: Mar 2011
Posts: 10

ntp Offset value too high


Hi guys,

I have a problem here: more than 20 servers in my network regularly report high offset values. The configured alert threshold is 55, and whenever the offset exceeds it, I get alerts. My questions are:

1) How do I find out what is causing the offset value to be high?
2) I have a workaround in mind, which is to create a cron job to restart ntpd daily or hourly. Is this workaround an acceptable practice?


My main concern is to find out what is causing the frequent offset spikes, so please advise on this.


Real example from my servers:

-bash-2.05b$ /usr/sbin/ntpq -np -c assoc
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.211.0.246    57.7.62.97       4 u  111  128  377    0.679   -0.689   0.159
+10.211.0.254    57.7.62.97       4 u  109  128  377  288.260  144.072 143.251

ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 47484  9614   yes   yes  none  sys.peer   reachable  1
  2 47485  9414   yes   yes  none  candidat   reachable  1


root@server:~# /usr/sbin/ntpq -np -c assoc
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 10.211.0.246    57.7.62.97       4 u   22   64    1    0.439    0.044   0.004
 10.211.0.254    57.7.62.97       4 u   23   64    1  1245.50  623.659   0.004

ind assID status  conf reach auth condition  last_event cnt
===========================================================
  1 40284  9014   yes   yes  none    reject   reachable  1
  2 40285  9014   yes   yes  none    reject   reachable  1
 
05-19-2011, 08:26 AM   #2
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

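I believe the high offset is tied to how long it takes to get to and from the server. Strictly, the delay column is the round-trip time and the offset column is the estimated difference between your clock and the server's, but the offset estimate gets worse as the delay grows or becomes uneven; i.e., if you were to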
Code:
ping -c 5 10.211.0.246
and see what you get versus
Code:
ping -c 5 10.211.0.254
you'd probably see a significant difference in the time= value returned by ping.
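For reference, the offset and delay columns that ntpq prints are two different quantities derived from the same four timestamps. The sketch below uses the standard on-wire formulas from the NTP specification (RFC 5905); the timestamps are made-up values chosen for illustration, not measurements from the servers above. It shows how a path that is slow in only one direction can make a perfectly synchronized clock look off by about half the round-trip delay, which is roughly the pattern visible for 10.211.0.254 in the first post (delay 288.260 with offset 144.072, and delay 1245.50 with offset 623.659).

```python
def ntp_offset_and_delay(t1, t2, t3, t4):
    """Standard NTP on-wire calculation (RFC 5905).

    t1: client transmit time (client clock)
    t2: server receive time  (server clock)
    t3: server transmit time (server clock)
    t4: client receive time  (client clock)
    Returns (offset, delay) in the same unit as the inputs.
    """
    offset = ((t2 - t1) + (t3 - t4)) / 2.0
    delay = (t4 - t1) - (t3 - t2)
    return offset, delay


# Hypothetical example, in milliseconds. Both clocks are assumed to be
# perfectly in sync, so the true offset is 0.
t1 = 0.0          # client sends the request
t2 = t1 + 288.0   # request needs 288 ms to reach the server (slow direction)
t3 = t2 + 0.0     # server replies immediately
t4 = t3 + 0.5     # reply needs only 0.5 ms to come back (fast direction)

offset, delay = ntp_offset_and_delay(t1, t2, t3, t4)
print(f"delay  = {delay:.3f} ms")
print(f"offset = {offset:.3f} ms")
```

With these made-up numbers the script reports a delay of 288.500 ms and an offset of 143.750 ms even though the two clocks agree exactly, so a large offset on one peer is not by itself proof that the local clock has drifted.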

My systems are set up
Code:
...
server  127.127.1.0     # local clock
fudge   127.127.1.0 stratum 10
#server  pool.ntp.org
server  0.us.pool.ntp.org
server  1.us.pool.ntp.org
server  2.us.pool.ntp.org
...
which, at this time, returns
Code:
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
 127.127.1.0     .LOCL.          10 l   8h   64    0    0.000    0.000   0.000
+69.65.40.29     192.43.244.18    2 u  751 1024  177  1328.12  114.156 133.471
+97.107.134.28   128.4.40.12      3 u  524 1024  377  1331.00   -0.803  40.868
*149.20.68.17    204.123.2.5      2 u  705 1024  377  1127.83   44.828  50.905

ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 38317  80e3   yes    no  none    reject unreachable 14
  2 38318  941a   yes   yes  none candidate    sys_peer  1
  3 38319  9414   yes   yes  none candidate   reachable  1
  4 38320  961a   yes   yes  none  sys.peer    sys_peer  1
And pinging
Code:
ping -c 5 69.65.40.29
PING 69.65.40.29 (69.65.40.29) 56(84) bytes of data.
64 bytes from 69.65.40.29: icmp_req=1 ttl=48 time=786 ms
64 bytes from 69.65.40.29: icmp_req=2 ttl=48 time=954 ms
64 bytes from 69.65.40.29: icmp_req=3 ttl=48 time=1097 ms
64 bytes from 69.65.40.29: icmp_req=4 ttl=48 time=1294 ms
64 bytes from 69.65.40.29: icmp_req=5 ttl=48 time=880 ms

--- 69.65.40.29 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 786.314/1002.781/1294.949/178.120 ms, pipe 2

ping -c 5 97.107.134.28
PING 97.107.134.28 (97.107.134.28) 56(84) bytes of data.
64 bytes from 97.107.134.28: icmp_req=2 ttl=47 time=980 ms
64 bytes from 97.107.134.28: icmp_req=3 ttl=47 time=1000 ms
64 bytes from 97.107.134.28: icmp_req=4 ttl=47 time=1327 ms
64 bytes from 97.107.134.28: icmp_req=5 ttl=47 time=690 ms

--- 97.107.134.28 ping statistics ---
5 packets transmitted, 4 received, 20% packet loss, time 3999ms
rtt min/avg/max/mdev = 690.951/999.724/1327.038/225.202 ms, pipe 2

ping -c 5 149.20.68.17
PING 149.20.68.17 (149.20.68.17) 56(84) bytes of data.
64 bytes from 149.20.68.17: icmp_req=1 ttl=50 time=701 ms
64 bytes from 149.20.68.17: icmp_req=2 ttl=50 time=1028 ms
64 bytes from 149.20.68.17: icmp_req=3 ttl=50 time=964 ms
64 bytes from 149.20.68.17: icmp_req=4 ttl=50 time=1322 ms
64 bytes from 149.20.68.17: icmp_req=5 ttl=50 time=681 ms

--- 149.20.68.17 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 3998ms
rtt min/avg/max/mdev = 681.071/939.626/1322.892/236.216 ms, pipe 2
tells the tale (I'm on a satellite and the turn-around isn't always the best -- there is always a delay -- plus it's raining today). The packet loss value tells a real tale -- that you don't ever want to see.

Basically, your intranet or internet connection causes this kind of stuff. Looking at the two addresses, 10.211.0.246 is (I suspect) on your private network and 57.7.62.97 is in France? Have you tried using three pool servers (as above, but rather than "0.us.pool" using "0.fr.pool" or perhaps "0.europe.pool" or whatever is electrically close to you)? Also, you may try adding the
Code:
server  127.127.1.0     # local clock
fudge   127.127.1.0 stratum 10
entries for fall-back when or if the internet goes away.

NTP does a real good job of synchronizing with pool servers, dropping "bad" ones and adding new ones as time goes on (if you check your NTP log, you'll see that happen periodically). If you have a large intranet, it may be worthwhile to buy a GPS receiver (read: expensive) and use one of your servers as the time server for your intranet. Consider too whether your intranet may be a problem -- if your servers are far-flung, do you have highly reliable communications between them; i.e., what kind of ping times do you get, how much traffic is there and is that bogging you down, things like that.

So, to answer your two questions, the high offset values are network related -- try using three pool servers (not specific IP addresses) and see what results. Restarting the NTP daemon hourly or daily will probably not be too useful -- it's most likely a communications problem.
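It may also be worth checking which line of the ntpq output the alert is actually reading. In the first post the selected peer (marked `*`) has an offset of -0.689, while the 144.072 belongs to the candidate (marked `+`). Below is a minimal sketch, assuming the monitoring could be changed to look only at the system peer and that the threshold of 55 is in milliseconds like the ntpq offset column. The function names `parse_peers` and `sys_peer_offset` are made up for illustration, and the sample text is the output from post #1.

```python
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*10.211.0.246    57.7.62.97       4 u  111  128  377    0.679   -0.689   0.159
+10.211.0.254    57.7.62.97       4 u  109  128  377  288.260  144.072 143.251
"""

THRESHOLD_MS = 55.0  # assumption: the alert threshold from post #1, in ms


def parse_peers(text):
    """Parse the peer table printed by `ntpq -pn` (numeric addresses assumed)."""
    peers = []
    for line in text.splitlines():
        words = line.split()
        if len(words) != 10 or words[0] == "remote":
            continue  # header, separator, blank, or association line
        tally = " "
        if words[0][0] in "*+-#xo.":
            # The tally code is glued to the front of the remote address.
            tally, words[0] = words[0][0], words[0][1:]
        peers.append({
            "tally": tally,
            "remote": words[0],
            "delay": float(words[7]),
            "offset": float(words[8]),
            "jitter": float(words[9]),
        })
    return peers


def sys_peer_offset(peers):
    """Return the offset (ms) of the peer marked '*', or None if none is selected."""
    for peer in peers:
        if peer["tally"] == "*":
            return peer["offset"]
    return None


offset = sys_peer_offset(parse_peers(SAMPLE))
if offset is None:
    print("no system peer selected yet")
elif abs(offset) > THRESHOLD_MS:
    print(f"ALERT: system peer offset {offset} ms")
else:
    print(f"OK: system peer offset {offset} ms")
```

On the sample this reports the system peer as within the threshold, whereas a check that looks at every row would fire on the candidate's 144.072.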

If you take a look at the lists of public NTP servers at http://www.ntp.org/ you could identify three servers that are electrically close to you (the best ping times) and use those (although using the pool servers would most likely be a better bet than specific IP addresses). You should look for stratum 2 servers; stratum 1 servers should only be used with permission (and using stratum 1 servers is discouraged in any event).

Hope this helps some.
 
05-20-2011, 04:53 AM   #3
newcranium
LQ Newbie
 
Registered: Mar 2011
Posts: 10

Original Poster
Hi tronayne,

Thanks for the reply. I don't think I can rule out network latency in the private network where these servers are located. However, the ping results do look good, and I have pasted the tcpdump output captured on port udp/123. I have no clue what the output says. The offset values at times spike up to 400-600 and return to normal after a restart of the NTP daemon. I am really kind of confused and stressed out, as I am performing an RCA for this...

Code:
root@server:/# ntpstat
synchronised to NTP server (10.211.0.246) at stratum 5
   time correct to within 51 ms
   polling server every 1024 s

root@server:/# more /etc/ntp.conf
server 10.211.0.246
server 10.211.0.254
authenticate no
driftfile /var/lib/ntp/drift

root@server:/# more /var/lib/ntp/drift
-20.844



root@server:/# ping -c 5 10.211.0.246
PING 10.211.0.246 (10.211.0.246) 56(84) bytes of data.
64 bytes from 10.211.0.246: icmp_seq=0 ttl=250 time=0.517 ms
64 bytes from 10.211.0.246: icmp_seq=1 ttl=250 time=0.510 ms
64 bytes from 10.211.0.246: icmp_seq=2 ttl=250 time=0.492 ms
64 bytes from 10.211.0.246: icmp_seq=3 ttl=250 time=0.471 ms
64 bytes from 10.211.0.246: icmp_seq=4 ttl=250 time=0.542 ms

--- 10.211.0.246 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4002ms
rtt min/avg/max/mdev = 0.471/0.506/0.542/0.031 ms, pipe 2
root@server:/#
root@server:/#
root@server:/# ping -c 5 10.211.0.254
PING 10.211.0.254 (10.211.0.254) 56(84) bytes of data.
64 bytes from 10.211.0.254: icmp_seq=0 ttl=250 time=73.7 ms
64 bytes from 10.211.0.254: icmp_seq=1 ttl=250 time=0.769 ms
64 bytes from 10.211.0.254: icmp_seq=2 ttl=250 time=0.737 ms
64 bytes from 10.211.0.254: icmp_seq=3 ttl=250 time=0.776 ms
64 bytes from 10.211.0.254: icmp_seq=4 ttl=250 time=0.945 ms

--- 10.211.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4005ms
rtt min/avg/max/mdev = 0.737/15.394/73.745/29.175 ms, pipe 2
root@server:/#
root@server:/# ping -c 5 10.211.0.254
PING 10.211.0.254 (10.211.0.254) 56(84) bytes of data.
64 bytes from 10.211.0.254: icmp_seq=0 ttl=250 time=0.891 ms
64 bytes from 10.211.0.254: icmp_seq=1 ttl=250 time=25.4 ms
64 bytes from 10.211.0.254: icmp_seq=2 ttl=250 time=4.04 ms
64 bytes from 10.211.0.254: icmp_seq=3 ttl=250 time=26.6 ms
64 bytes from 10.211.0.254: icmp_seq=4 ttl=250 time=45.5 ms

--- 10.211.0.254 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4006ms
rtt min/avg/max/mdev = 0.891/20.533/45.596/16.418 ms, pipe 2




07:01:54.579380 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 5, poll 9s, precision -20
        Root Delay: 0.276870, Root dispersion: 0.033279, Reference-ID: ntpserver
          Reference Timestamp:  3514769981.448965013 (2011/05/19 06:59:41)
          Originator Timestamp: 3514769600.050399731 (2011/05/19 06:53:20)
          Receive Timestamp:    3514769600.060997001 (2011/05/19 06:53:20)
          Transmit Timestamp:   3514770114.579110026 (2011/05/19 07:01:54)
            Originator - Receive Timestamp:  +0.010597269
            Originator - Transmit Timestamp: +514.528710246

07:10:29.092099 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 5, poll 10s, precision -20
        Root Delay: 0.276870, Root dispersion: 0.040985, Reference-ID: ntpserver
          Reference Timestamp:  3514769981.448965013 (2011/05/19 06:59:41)
          Originator Timestamp: 3514770114.563525557 (2011/05/19 07:01:54)
          Receive Timestamp:    3514770114.580267012 (2011/05/19 07:01:54)
          Transmit Timestamp:   3514770629.092008002 (2011/05/19 07:10:29)
            Originator - Receive Timestamp:  +0.016741460
            Originator - Transmit Timestamp: +514.528482437

07:27:34.122123 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 5, poll 10s, precision -20
        Root Delay: 0.277374, Root dispersion: 0.040328, Reference-ID: ntpserver
          Reference Timestamp:  3514771009.478163987 (2011/05/19 07:16:49)
          Originator Timestamp: 3514770629.084662914 (2011/05/19 07:10:29)
          Receive Timestamp:    3514770629.093208000 (2011/05/19 07:10:29)
          Transmit Timestamp:   3514771654.122050002 (2011/05/19 07:27:34)
            Originator - Receive Timestamp:  +0.008545082
            Originator - Transmit Timestamp: +1025.037387084

07:44:40.151972 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 5, poll 10s, precision -20
        Root Delay: 0.276748, Root dispersion: 0.051666, Reference-ID: ntpserver
          Reference Timestamp:  3514772036.506492018 (2011/05/19 07:33:56)
          Originator Timestamp: 3514771654.112666256 (2011/05/19 07:27:34)
          Receive Timestamp:    3514771654.123115003 (2011/05/19 07:27:34)
          Transmit Timestamp:   3514772680.151896998 (2011/05/19 07:44:40)
            Originator - Receive Timestamp:  +0.010448744
            Originator - Transmit Timestamp: +1026.039230745
08:01:46.188465 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
        Client, Leap indicator:  (0), Stratum 5, poll 10s, precision -20
        Root Delay: 0.277664, Root dispersion: 0.055419, Reference-ID: ntpserver
          Reference Timestamp:  3514773062.539682984 (2011/05/19 07:51:02)
          Originator Timestamp: 3514772680.148708224 (2011/05/19 07:44:40)
          Receive Timestamp:    3514772680.152939006 (2011/05/19 07:44:40)
          Transmit Timestamp:   3514773706.188389003 (2011/05/19 08:01:46)
            Originator - Receive Timestamp:  +0.004230779
            Originator - Transmit Timestamp: +1026.039680778
08:18:50.217918 IP (tos 0x10, ttl  64, id 0, offset 0, flags [DF], proto 17, length: 76) server.ntp > pgx045.ntpserver.com.ntp: NTPv4, length 48
 
05-20-2011, 08:53 AM   #4
tronayne
Senior Member
 
Registered: Oct 2003
Location: Northeastern Michigan, where Carhartt is a Designer Label
Distribution: Slackware 32- & 64-bit Stable
Posts: 3,541

Well, I haven't a clue about what the tcpdump output says either; however, your ping output looks pretty good -- I sure wouldn't complain about 0.xxx ms. Frankly, I wouldn't be too concerned about a 55 ms offset either. Even though your ping times on 10.211.0.254 do bounce around quite a bit, that's your secondary server and your primary looks rock solid.
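Some of the tcpdump fields can be worked out from the NTP packet format, though. NTP timestamps count seconds since 1 January 1900, so subtracting 2208988800 gives Unix time, and the poll field in the protocol is a power-of-two exponent (tcpdump shows it here as "poll 9s"). In a client request, the originator timestamp is the server's transmit time from the previous reply and the receive timestamp is the client's clock when that reply arrived. The sketch below uses the first packet from the capture; the helper name `ntp_to_utc` is made up for illustration.

```python
from datetime import datetime, timezone

NTP_TO_UNIX = 2208988800  # seconds between 1900-01-01 and 1970-01-01


def ntp_to_utc(ntp_seconds):
    """Convert an NTP timestamp (seconds since 1900) to a UTC datetime."""
    return datetime.fromtimestamp(ntp_seconds - NTP_TO_UNIX, tz=timezone.utc)


# Values copied from the first packet in the capture in post #3.
originator = 3514769600.050399731  # server's transmit time in the previous reply
receive = 3514769600.060997001     # client's clock when that reply arrived
transmit = 3514770114.579110026    # client's clock when this request was sent
poll_exponent = 9                  # shown by tcpdump as "poll 9s"

print("transmit (UTC):", ntp_to_utc(transmit).strftime("%Y-%m-%d %H:%M:%S"))
print("poll interval:", 2 ** poll_exponent, "s")
print("since previous exchange: %.1f s" % (transmit - originator))
print("receive - originator: %.1f ms" % ((receive - originator) * 1000))
```

The transmit time comes out as 2011-05-19 05:01:54 UTC, two hours behind the 07:01:54 that tcpdump printed, which only reflects the capture host's local time zone. The "Originator - Transmit" figure of about 514.5 s is close to the 2^9 = 512 s poll interval (and about 1025 s in the later packets, where the poll exponent is 10), so it looks like the time since the previous exchange rather than a clock error. The "Originator - Receive" figure of roughly 10.6 ms is one-way travel time plus whatever the two clocks disagree by, so on its own it does not point to anything alarming.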

Have you tried specifying a log file; e.g., defining the logfile in /etc/ntp.conf
Code:
logfile /var/log/ntpd.log
Or, in the system start-up that launches the daemon; e.g.,
Code:
/usr/sbin/ntpd -g -p /var/run/ntpd.pid -l /tmp/ntp.log
After ntpd runs for a while and synchronizes, typical log entries will look something like
Code:
18 May 12:17:08 ntpd[1869]: proto: precision = 1.789 usec
18 May 12:17:08 ntpd[1869]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
18 May 12:17:08 ntpd[1869]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
18 May 12:17:08 ntpd[1869]: Listen and drop on 1 v6wildcard :: UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 2 lo 127.0.0.1 UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 3 eth0 192.168.1.10 UDP 123
18 May 12:17:08 ntpd[1869]: Listen normally on 4 lo ::1 UDP 123
18 May 12:17:08 ntpd[1869]: peers refreshed
18 May 12:17:08 ntpd[1869]: Listen normally on 5 multicast 224.0.1.1 UDP 123
18 May 12:17:08 ntpd[1869]: Joined 224.0.1.1 socket to multicast group 224.0.1.1
18 May 12:17:29 ntpd[1869]: Listen normally on 6 eth0 fe80::210:18ff:fe8a:82c1 UDP 123
18 May 12:17:29 ntpd[1869]: peers refreshed
18 May 12:17:29 ntpd[1869]: new interface(s) found: waking up resolver
Another thing you may want to look into is the statistics functions provided in the documentation directory that should be on your system (mine is found in /usr/doc/ntp-4.2.6p3/scripts/stats). The README files in that directory are worth studying, and it may be worth a little time to implement the statistics functions -- you may need to fiddle with them to adapt them to your system, but they can be quite useful in tracking down problems.
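As a rough illustration of what the statistics files can give you: when ntpd is configured to write loopstats (via the `statsdir` and `filegen loopstats` directives in ntp.conf), each line records the day (MJD), seconds past midnight, clock offset in seconds, frequency offset in PPM, and a few further fields. Below is a minimal sketch, assuming that layout. The sample lines are invented for illustration, the function name `large_offsets` is hypothetical, and the 55 ms threshold is taken from the first post.

```python
# Invented loopstats lines (not from the poster's servers):
# MJD, seconds past midnight, offset (s), frequency (PPM), jitter, wander, poll
SAMPLE_LOOPSTATS = """\
55700 25314.579 0.000689000 -20.844 0.000159000 0.012000 10
55700 26340.122 0.144072000 -20.851 0.143251000 0.015000 10
55700 27366.151 0.000044000 -20.840 0.000004000 0.011000 6
"""


def large_offsets(text, threshold_ms=55.0):
    """Yield (mjd, seconds, offset_ms) for lines whose |offset| exceeds threshold_ms."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        mjd = int(fields[0])
        seconds = float(fields[1])
        offset_ms = float(fields[2]) * 1000.0  # loopstats stores seconds
        if abs(offset_ms) > threshold_ms:
            yield mjd, seconds, offset_ms


for mjd, seconds, offset_ms in large_offsets(SAMPLE_LOOPSTATS):
    print(f"MJD {mjd} at {seconds:.0f} s past midnight: offset {offset_ms:+.3f} ms")
```

Run against a real loopstats file collected over a day or two, a listing like this would show when the spikes happen, which can then be compared against network events or the logs on the time servers.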

This is just a thought -- your servers appear to be synchronizing with your time server (by the way, where are 10.211.0.246 and 10.211.0.254 getting their time synchronization from -- a reference clock, a stratum 2 server, a pool of outside servers, and what do their logs look like?).

If you're synchronized and you stay synchronized and you're not concerned with atomic-clock accuracy, is there really a problem?

And, one last thing, just for the sake of "maybe this'll work for you," this is what my /etc/ntp.conf files look like on all servers (using external pool servers rather than a local time server just because the boxes tend to get taken in and out of service periodically):
Code:
# Sample /etc/ntp.conf:  Configuration file for ntpd.
#
# Undisciplined Local Clock. This is a fake driver intended for backup
# and when no outside source of synchronized time is available. The
# default stratum is usually 3, but in this case we elect to use stratum
# 10. Since the server line does not have the prefer keyword, this driver
# is never used for synchronization, unless no other
# synchronization source is available. In case the local host is
# controlled by some external source, such as an external oscillator or
# another protocol, the prefer keyword would cause the local host to
# disregard all other synchronization sources, unless the kernel
# modifications are in use and declare an unsynchronized condition.
#
server  127.127.1.0     # local clock
fudge   127.127.1.0 stratum 10
#server  pool.ntp.org
server  0.us.pool.ntp.org
server  1.us.pool.ntp.org
server  2.us.pool.ntp.org

#
# Drift file.  Put this in a directory which the daemon can write to.
# No symbolic links allowed, either, since the daemon updates the file
# by creating a temporary in the same directory and then rename()'ing
# it to the file.
#
driftfile /etc/ntp/drift
multicastclient 224.0.1.1
broadcastdelay  0.008

#
# Keys file.  If you want to diddle your server at run time, make a
# keys file (mode 600 for sure) and define the key number to be
# used for making requests.
# PLEASE DO NOT USE THE DEFAULT VALUES HERE. Pick your own, or remote
# systems might be able to reset your clock at will.
#
#keys           /etc/ntp/keys
#trustedkey     65535
#requestkey     65535
#controlkey     65535

# Don't serve time or stats to anyone else by default (more secure)
restrict default noquery nomodify
# Trust ourselves.  :-)
restrict 127.0.0.1
The above is bare-bones (it's the file that comes with the system) and, well, works just fine; I've been using pretty much the same thing for years now (the only thing I do is add the three pool servers).

Other stuff that may be worth a look is your network, particularly routers and switches and, you know, cables and plugs too. Bad connections or bad router channels can lead to a lot of grief.

Bottom line, enable statistics gathering, edit the functions in the stats directory (above) and use them to analyze and correct problems; bear in mind that statistics are gathered periodically and you'll need to let things run for a day or two to get meaningful information.

Hope this helps some.
 
  

