NAT’s interaction with DNS answers

Recently I was troubleshooting some odd DNS results between two customers that have a B2B connection. The DNS record in question existed in the wild on the internet and resolved to 113.129.255.98 (all IPs have been randomized using https://onlinerandomtools.com/generate-random-ip for anonymity). On their end of the link, Customer A resolved the name to 192.168.20.5. On the other end, where the server lived, Customer B resolved it to 172.16.20.57, which was the correct address. DNS admins were brought in on both sides. Customer A confirmed that they had Conditional Forwarders configured to query Customer B’s Name Servers for this Zone.

 

To the Packets! Captures were taken nearest each Name Server and nearest each end of the B2B connection. The change was happening on Customer A’s side of the link, behind a router doing NAT. We had brought up NAT a couple of times but thought “nah, that’s not what NAT does”. By the title of this post you can probably guess what happened when we said “ok, let’s remove the NAT”: problem solved. We were flabbergasted.

 

I was familiar with DNS spoofing/poisoning/hijacking but had never thought that NAT could be leveraged in this way. Research brought me to DNS Doctoring, which is the non-malicious way to manipulate DNS, for good rather than for evil. But this was different. Eventually I stumbled across RFC 2694 “DNS extensions to Network Address Translators (DNS_ALG)” https://tools.ietf.org/html/rfc2694 and Application Level Gateways.

 

These are a couple of the pages that I found on the subject.

https://blog.webernetz.net/cisco-router-disable-dns-rewrite-alg-for-static-nats/

https://www.cisco.com/c/en/us/td/docs/security/asa/asa95/configuration/firewall/asa-95-firewall-config/nat-reference.pdf 

 

It took longer than it should have to find the answer because I was using search terms that weren’t used much. I’ve included my verbiage so the next person will find this faster.

 

On a Cisco IOS device, the Application Level Gateway (ALG) for NAT is enabled by default. This means that with a one-to-one NAT (not overloaded/PAT), an address in the DNS answer of a packet passing through the router is rewritten to its NAT IP.

 

So with this config:

ip nat outside source static <outside global> <outside local>

ip nat inside source static 172.16.20.57 192.168.20.5

 

When you do an nslookup from the receiving side of the NAT (in this case Customer A), the name resolves to the outside local address (192.168.20.5) instead of the outside global address (172.16.20.57).

 

You can disable ALG with:

no ip nat service alg udp dns

no ip nat service alg tcp dns

 

Or add the no-payload keyword to the static NAT entry so that it doesn’t alter the payload of packets that contain DNS answers, or the payload of any other packets that match the translation. Who knows what other “features” lie in wait:

ip nat inside source static <outside global> <outside local> no-payload

ip nat inside source static 172.16.20.57 192.168.20.5 no-payload
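To make the rewrite concrete, here is a minimal Python sketch of what the ALG appears to do to the A records in a DNS answer. It is only a model of the behavior described in this post, not Cisco's implementation. The function name translate_dns_answers and the alg_enabled flag are made up for illustration, and the mapping is copied from the example config above.

```python
import ipaddress

# Static one-to-one mapping copied from the example config above:
# address in the DNS answer -> address the resolver behind the NAT ends up seeing.
STATIC_NAT = {
    ipaddress.ip_address("172.16.20.57"): ipaddress.ip_address("192.168.20.5"),
}


def translate_dns_answers(answers, nat_table, alg_enabled=True):
    """Return the A-record addresses as seen on the far side of the NAT.

    answers: list of IPv4 address strings from a DNS answer section.
    nat_table: dict mapping pre-NAT addresses to post-NAT addresses.
    alg_enabled: False models disabling the DNS ALG or using no-payload
                 (an assumption of this sketch: both leave the payload alone).
    """
    if not alg_enabled:
        return list(answers)  # payload passes through untouched
    translated = []
    for text in answers:
        addr = ipaddress.ip_address(text)
        # Only addresses with a matching static entry are rewritten.
        translated.append(str(nat_table.get(addr, addr)))
    return translated


if __name__ == "__main__":
    answer = ["172.16.20.57"]
    print(translate_dns_answers(answer, STATIC_NAT))         # ALG on
    print(translate_dns_answers(answer, STATIC_NAT, False))  # ALG off
```

With the ALG on, the answer comes back as the NAT address. With it off, the resolver sees the address the Name Server actually returned, which matches what we saw once the NAT was out of the picture.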

 

One more reason to hate NAT, which would also have been an acceptable post name.

NTP redesign

This post is about a bug that affected NTP (Network Time Protocol) and our redesign of the environment to bypass the issue.

In this environment the core Cisco 7604 IOS routers were the NTP stratum 2 servers (x.x.x.123 because fun with port numbers). The IP was an HSRP standby IP. There were several downstream Linux NTP servers and Windows Domain Controllers serving NTP to Windows clients. As unsupported Linux servers died, their IPs were just added to servers that were still alive. Eventually this got messy.

After the 7604 routers were replaced with a pair of ASR1006X routers, we ran into some interesting issues. Windows users were no longer able to log in. It turned out the Domain Controllers were falling out of sync. My Infoblox DDI servers also showed stale time. Users were eventually able to log into the Domain again, either before or after the Windows team changed their NTP config. The sys admins were now syncing with one of the Stratum 3 Linux servers. Knowing that the only thing that had changed in our environment was the ASR routers, I knew this wouldn’t be the end of the issue.
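One way to spot a server that is handing out stale time is to send it a plain SNTP client request and look at the leap indicator, stratum, and transmit timestamp in the reply. The Python sketch below shows the idea. It is not the tooling used at the time, the function names are invented for illustration, and the health check is deliberately crude because it ignores network delay.

```python
import socket
import struct
from datetime import datetime, timezone

NTP_UNIX_DELTA = 2208988800  # seconds from 1900-01-01 to 1970-01-01


def build_client_request():
    """Return a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    first_byte = (0 << 6) | (3 << 3) | 3
    return bytes([first_byte]) + bytes(47)


def parse_response(packet):
    """Extract leap indicator, stratum, and transmit time from an NTP reply."""
    if len(packet) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    unix_time = seconds - NTP_UNIX_DELTA + fraction / 2**32
    return {
        "leap": packet[0] >> 6,   # 3 means the server clock is unsynchronized
        "stratum": packet[1],     # 0 or 16 also marks an unusable source
        "transmit": datetime.fromtimestamp(unix_time, tz=timezone.utc),
    }


def looks_healthy(reply, now, max_offset_seconds=1.0):
    """Crude check: synchronized, sane stratum, and close to our own clock."""
    offset = abs((reply["transmit"] - now).total_seconds())
    return (
        reply["leap"] != 3
        and 1 <= reply["stratum"] <= 15
        and offset <= max_offset_seconds
    )


def query(server, timeout=2.0):
    """Send one request to UDP port 123 on `server` and parse the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_client_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
    return parse_response(packet)
```

Usage would look like looks_healthy(query("ntp.example.com"), datetime.now(timezone.utc)), where the hostname is a placeholder. The result is only as trustworthy as the clock of the machine running the check.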

I opened a ticket with Cisco TAC to troubleshoot the ASRs. TAC thought maybe it was because we were using a standby IP. But I couldn’t get resources to help test, so we got nowhere. Eventually a bug ID, CSCsq31723, was filed, which I think is related. Fast forward 6 months and Windows users couldn’t log in again. This time we decided to go nuclear and redesign the whole NTP layout.

The new design removed all servers running on unsupported hardware and OSes. It also made use of our Infoblox DDI grid, which is a purpose-built tool for DNS, DHCP, IPAM, NTP, and File Distribution.

We decided on this:

  • 3 internet Stratum 1 servers
  • 3 Stratum 2 servers. Infoblox DDI Grid Master at HQ, the Grid Master Candidate at our DR location, and a Linux server to diversify technologies.
  • Place the Stratum 2 servers in a mesh.
  • The Infoblox DDI Grid Master and Grid Master Candidate fed the 2 HA pairs (4 servers with 2 VIPs) of Grid Members.
  • Create Access Lists on the Stratum 2 servers so only the Stratum 3 servers can sync with them.
  • All clients would then sync with the HA pairs of Infoblox DDI Grid Members.
  • Set up NTP Authentication (https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service).
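The tiers in the list above can be sketched as a small Python model that works out each node's stratum and checks who is allowed to sync with whom. The node names are hypothetical, the mesh between the Stratum 2 servers is left out, and the access-list rule is a simplification of whatever the real ACLs looked like.

```python
# Node names are hypothetical; only the tier layout comes from the design list.
STRATUM_1 = {"inet-1", "inet-2", "inet-3"}

# Each node lists the servers it syncs from.
UPSTREAMS = {
    "gm-hq": sorted(STRATUM_1),      # Infoblox Grid Master at HQ
    "gmc-dr": sorted(STRATUM_1),     # Grid Master Candidate at DR
    "linux-s2": sorted(STRATUM_1),   # Linux server for diversity
    "members-ha-1": ["gm-hq", "gmc-dr"],
    "members-ha-2": ["gm-hq", "gmc-dr"],
    "client": ["members-ha-1", "members-ha-2"],
}


def stratum_of(node):
    """A node sits one stratum below its best upstream."""
    if node in STRATUM_1:
        return 1
    return 1 + min(stratum_of(up) for up in UPSTREAMS[node])


def tier(level):
    """Return the set of modeled nodes at a given stratum."""
    return {name for name in UPSTREAMS if stratum_of(name) == level}


def acl_permits(server, client):
    """Stratum 2 servers answer only Stratum 3 servers; others answer anyone."""
    if stratum_of(server) == 2:
        return client in tier(3)
    return True


if __name__ == "__main__":
    for name in UPSTREAMS:
        print(name, "is stratum", stratum_of(name))
```

In this model the Grid Members come out at Stratum 3 and ordinary clients at Stratum 4, and a client pointed straight at a Stratum 2 server is refused.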

Both VIPs of the HA-paired Grid Members had a user-friendly DNS record, and some systems accept a hostname for the NTP config. That would be handy if we ever needed to change the IPs again. For extra flexibility we could use F5 BIG-IP GTM (DNS load balancing). But network devices like routers and switches don’t support using DNS for NTP, meaning there would be two sets of configs: one using DNS names and another hard set to static IPs. We wanted a global config, so we went with static IPs everywhere.

The IPs were given out and people were told to migrate. We set up a SPAN session and periodically checked to see who was still pointed at the old servers before finally retiring the old IPs/servers.
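The straggler check can be sketched in a few lines of Python, assuming the capture has already been exported as (source IP, destination IP, destination port) tuples. The old server addresses below are placeholders from a documentation range, and the function name is made up for illustration.

```python
from collections import Counter

# Placeholder addresses standing in for the retired NTP servers.
OLD_NTP_SERVERS = {"192.0.2.123", "192.0.2.124"}
NTP_PORT = 123


def stragglers(flows, old_servers=OLD_NTP_SERVERS):
    """Count NTP packets per client that still target an old server.

    flows: iterable of (src_ip, dst_ip, dst_port) tuples from a capture.
    Returns a list of (src_ip, packet_count), busiest client first.
    """
    hits = Counter()
    for src, dst, dport in flows:
        if dst in old_servers and dport == NTP_PORT:
            hits[src] += 1
    return hits.most_common()


if __name__ == "__main__":
    sample = [
        ("10.0.0.5", "192.0.2.123", 123),
        ("10.0.0.9", "192.0.2.124", 123),
        ("10.0.0.5", "192.0.2.123", 123),
        ("10.0.0.7", "198.51.100.10", 123),  # already migrated
    ]
    for client, count in stragglers(sample):
        print(client, count)
```

Running it against each periodic capture gives a shrinking list of clients to chase. An empty list is the signal that the old IPs can be retired.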

It wasn’t perfect but it was a big improvement. Long term I’d want to install our own Stratum 0 GPS antennas instead of using internet hosted servers. For a home project I’m thinking of using a Raspberry Pi to make one using this link as a guide (https://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html).