NTP redesign

This post is about a bug that affected NTP (Network Time Protocol) and how we redesigned the environment to bypass the issue.

In this environment the core Cisco 7604 IOS routers were the NTP Stratum 2 servers (addressed as x.x.x.123, a nod to NTP's UDP port 123). The IP was an HSRP standby IP. Downstream there were several Linux NTP servers, plus Windows Domain Controllers serving NTP to Windows clients. As unsupported Linux servers died, their IPs were simply added to servers that were still alive. Eventually this got messy.

After the 7604 routers were replaced with a pair of ASR1006X routers we ran into some interesting issues. Windows users were no longer able to log in. It turned out the Domain Controllers were falling out of sync, and Active Directory logins rely on Kerberos, which rejects authentication once clocks drift too far apart (five minutes by default). My Infoblox DDI servers also showed stale time. Users were eventually able to log in to the domain again, either before or after the Windows team changed their NTP config so that the sys admins were syncing with one of the Stratum 3 Linux servers. Knowing that the only thing that had changed in our environment was the ASR routers, I knew this wouldn't be the end of the issue.

I opened a ticket with Cisco TAC to troubleshoot the ASRs. TAC thought the cause might be our use of a standby IP, but I couldn't get resources to help test, so we got nowhere. Eventually a bug ID, CSCsq31723, was created, which I think is related. Fast forward six months and Windows users couldn't log in again. This time we decided to go nuclear and redesign the whole NTP layout.

The new design removed all servers running on unsupported hardware and OSes. It also made use of our Infoblox DDI grid, which is a purpose-built tool for DNS, DHCP, IPAM, NTP, and file distribution.

We decided on this:

  • 3 internet Stratum 1 servers
  • 3 Stratum 2 servers: the Infoblox DDI Grid Master at HQ, the Grid Master Candidate at our DR location, and a Linux server to diversify technologies.
  • Place the Stratum 2 servers in a mesh.
  • Have the Infoblox DDI Grid Master and Grid Master Candidate feed the 2 HA pairs of Grid Members (4 servers with 2 VIPs), which act as the Stratum 3 servers.
  • Create Access Lists on the Stratum 2 servers so only the Stratum 3 servers can sync with them.
  • Have all clients sync with the HA pairs of Infoblox DDI Grid Members.
  • Set up NTP Authentication (https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service).
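
To confirm each tier ended up where the design says it should, a small SNTP query can read the stratum a server advertises. The following is a minimal Python sketch, not something from our actual deployment. The function names are made up for illustration, it sends an unauthenticated client request, and it assumes the target permits queries from the host running it (the access lists on the Stratum 2 servers would block it from an ordinary client).

```python
import socket
import struct

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2_208_988_800


def build_request() -> bytes:
    """Build a 48-byte client request: LI=0, VN=4, Mode=3, everything else zero."""
    return bytes([(0 << 6) | (4 << 3) | 3]) + bytes(47)


def parse_reply(data: bytes) -> dict:
    """Pull the header fields and transmit timestamp out of an NTP reply."""
    if len(data) < 48:
        raise ValueError("short NTP packet")
    first, stratum = data[0], data[1]
    # Transmit timestamp: 32-bit seconds and 32-bit fraction at bytes 40..47.
    tx_sec, tx_frac = struct.unpack("!II", data[40:48])
    return {
        "leap": first >> 6,
        "version": (first >> 3) & 0x7,
        "mode": first & 0x7,
        "stratum": stratum,
        "unix_time": tx_sec - NTP_UNIX_DELTA + tx_frac / 2**32,
    }


def query(server: str, timeout: float = 2.0) -> dict:
    """Send one request to UDP port 123 on `server` and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_reply(data)
```

Running `query()` against each Grid Member VIP and comparing the reported `stratum` and `unix_time` with the local clock is a quick sanity check. If the hierarchy is wired up as intended, I would expect the Grid Members to report Stratum 3.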

Both VIPs of the HA-paired Grid Members had a user-friendly DNS record, and some systems accept a hostname for the NTP config. That would be handy if we ever needed to change the IPs again. For extra flexibility we could use F5 BIG-IP GTM (DNS load balancing). But network devices like routers and switches don't support using DNS for NTP, meaning there would be two sets of configs: one with NTP configs using DNS and another hard-set to static IPs. We wanted a global config, so we went with static IPs everywhere.

The IPs were given out and people were told to migrate. We set up a SPAN (port mirroring) session and periodically checked who was still pointed at the old servers before finally retiring the old IPs and servers.
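
The straggler check can be scripted once the mirrored traffic is summarized. Below is a rough Python sketch under some loud assumptions: the capture has already been exported to text lines of the form `source_ip destination_ip destination_port` (a hypothetical format, not what our tooling produced), and the retired server addresses are placeholders from the documentation range rather than the real ones.

```python
from collections import Counter
from typing import Iterable, Tuple

# Placeholder addresses for the retired servers (assumption, not the real IPs).
OLD_NTP_SERVERS = {"192.0.2.123", "192.0.2.124"}

# (source IP, destination IP, destination UDP port)
Flow = Tuple[str, str, int]


def parse_line(line: str) -> Flow:
    """Parse one 'src dst dport' line from the hypothetical capture export."""
    src, dst, dport = line.split()
    return src, dst, int(dport)


def stragglers(flows: Iterable[Flow], old_servers=OLD_NTP_SERVERS) -> Counter:
    """Count NTP requests per client that still target a retired server."""
    hits = Counter()
    for src, dst, dport in flows:
        if dport == 123 and dst in old_servers:
            hits[src] += 1
    return hits
```

Feeding each exported line through `parse_line()` and passing the results to `stragglers()` gives a per-client count. Calling `.most_common()` on the result puts the noisiest holdouts at the top of the list to chase.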

It wasn’t perfect, but it was a big improvement. Long term, I’d want to install our own GPS antennas as a Stratum 0 time source instead of using internet-hosted servers. For a home project I’m thinking of using a Raspberry Pi to build one, using this link as a guide (https://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html).
