NAT’s interaction with DNS answers

Recently I was troubleshooting some odd DNS results between two customers that have a B2B connection. The DNS record in question existed in the wild on the internet and resolved to 113.129.255.98 (all IP’s have been randomized using https://onlinerandomtools.com/generate-random-ip for anonymity). Customer A resolved the name to 192.168.20.5 on their end of the link, while Customer B resolved it to 172.16.20.57 on the other end, where the server lived; that was the correct answer. DNS admins were brought in on both sides. Customer A confirmed that they had Conditional Forwarders configured to query Customer B’s Name Servers for this Zone.


To the Packets! Captures were taken nearest each Name Server and nearest each end of the B2B connection. The change was happening on Customer A’s side of the link, behind a router doing NAT. We had brought up NAT a couple of times but thought “nah, that’s not what NAT does”. Guess what happened when we said “ok, let’s remove the NAT”. By the title of this post you can probably guess…Problem solved. We were flabbergasted.


I was familiar with DNS spoofing/poisoning/hijacking but had never thought that NAT could be leveraged in this way. Research brought me to DNS Doctoring, which is the non-malicious way to manipulate DNS for good rather than for evil. But this was different. Eventually I stumbled across RFC 2694 “DNS extensions to Network Address Translators (DNS_ALG)” https://tools.ietf.org/html/rfc2694 and Application Level Gateways.


These are a couple of the pages that I found on the subject.

https://blog.webernetz.net/cisco-router-disable-dns-rewrite-alg-for-static-nats/

https://www.cisco.com/c/en/us/td/docs/security/asa/asa95/configuration/firewall/asa-95-firewall-config/nat-reference.pdf 


It took longer than it should have to find the answer because I was using search terms that weren’t used much. I’ve included my verbiage so the next person will find this faster.


On a Cisco IOS device the Application Level Gateway for NAT is enabled by default, which means that in a one-to-one NAT (not overloaded/PAT) the DNS answer inside received packets is rewritten to the NAT IP.


So with this config:

ip nat outside source static <outside global> <outside local>

ip nat inside source static 172.16.20.57 192.168.20.5


When you do an nslookup from the receiving side of the NAT (in this case Customer A), you’ll resolve to the outside local IP instead of the outside global one.
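
To make this concrete, here’s roughly what the two lookups looked like. The hostname is made up for this post and the IP’s are the randomized ones from above. From Customer B’s side, next to the server:

C:\> nslookup server.customer-b.example
Name:    server.customer-b.example
Address:  172.16.20.57

And the same lookup from Customer A’s side, behind the one-to-one NAT with the DNS ALG doing its thing:

C:\> nslookup server.customer-b.example
Name:    server.customer-b.example
Address:  192.168.20.5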


You can disable ALG with:

no ip nat service alg udp dns

no ip nat service alg tcp dns


Or change the NAT so that it doesn’t alter the payload of packets that contain DNS answers, or the payload of any packets; who knows what other “features” lie in wait:

ip nat outside source static <outside global> <outside local> no-payload

ip nat inside source static 172.16.20.57 192.168.20.5 no-payload


One more reason to hate NAT, which would also have been an acceptable post name.

Securing the wired network with 802.1X

This post covers an innovation project I did to secure the wired network at a shared conference center with 802.1X.

Every few months we had to disable the wired network in order to prevent non-employees from being able to get online. This was not scalable, was prone to human error, and caused scheduling confusion. I planned to automate the process by enabling 802.1X, aka dot1x, on the switches using our Windows AD via Cisco ACS.

Any Domain-joined devices that plugged in would get access to our corp VLAN, and unknown devices would go into a dead VLAN. Long term I planned to enable a wired guest VLAN as well. I had it labbed out for the case where the guest VLAN exists locally on the switch you’re connected to, but didn’t get around to labbing the centrally switched version that tunnels guest traffic to the controller over CAPWAP.

Wired Guest Access using Cisco WLAN Controllers Configuration Example:

http://www.cisco.com/c/en/us/support/docs/wireless-mobility/wireless-lan-wlan/99470-config-wiredguest-00.html

Phones would end up in the VoIP VLAN but they weren’t equipped for dot1x authentication. That left two options: manually add each MAC address to a list in the ACS, which is neither scalable nor supportable, or remove the user VLAN from the ports with phones connected. I went with the latter. The voice VLAN itself was locked down with a strict ACL that only allowed communication with the VoIP server (a rough sketch of that ACL is below).
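
The ACL itself was nothing fancy. Here’s a rough sketch of the idea, where the ACL name, voice subnet, and VoIP server address are placeholders rather than the real values (a real deployment would also need to permit whatever else the phones use to boot, like DHCP or TFTP):

ip access-list extended voice-vlan-in
 permit ip <voice-subnet> <voice-wildcard> host <voip-server>
 deny ip any any log
!
!Applied inbound on whatever L3 interface is the voice VLAN's gateway.
interface Vlan<voice-vlan-id>
 ip access-group voice-vlan-in in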

Client configs:

In this environment there were only Windows clients. Windows needs its supplicant enabled and configured for dot1x, and anything running Windows can and should be controlled by Group Policy. I researched the needed settings and how to set them via GPO, then worked with the Windows team to roll out the GPO to a pilot group and finally deploy it globally.

Configuring 802.1X Wired Authentication on a Windows 7 Client:

https://documentation.meraki.com/MS/Access_Control/Configuring_802.1X_Wired_Authentication_on_a_Windows_7_Client

You can do the same thing with other versions of Windows; this just happened to be the one I worked with.

Windows AD GPO guide:

https://msdn.microsoft.com/en-us/library/dd759237.aspx

When these Win 7 machines were upgraded to Win 10 the GPO still worked.
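
If you just want to kick the tires on one machine before building the GPO, the same settings can be poked at locally. This is only a sketch; the folder, profile file name, and interface name are placeholders:

sc config dot3svc start= auto
net start dot3svc
netsh lan show interfaces
netsh lan export profile folder="C:\temp" interface="Ethernet"
netsh lan add profile filename="C:\temp\Ethernet.xml" interface="Ethernet"

The first two lines turn on the Wired AutoConfig service, which is the wired 802.1X supplicant and is off by default. The netsh lan commands let you export a working profile from a reference machine and import it on another; the GPO ends up doing the equivalent centrally.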


Switch configs:

!debug dot1x all

!debug radius

conf t

aaa new-model

aaa authentication dot1x default group radius

aaa authorization network default group radius

aaa accounting network default start-stop group radius

dot1x system-auth-control

dot1x guest-vlan supplicant

radius server acs1.foo.com

 address ipv4 10.1.181.2 auth-port 1812 acct-port 1813

 key 0 This-IsTheSharedSecret123

exit

radius server acs2.foo.com

 address  ipv4 10.2.181.2 auth-port 1812 acct-port 1813

 key 0 This-IsTheSharedSecret123

exit

!User ports

interface <interface>

 authentication port-control auto

 authentication host-mode multi-domain

 dot1x pae authenticator

 authentication event no-response action authorize vlan 15

 authentication event fail action authorize vlan 15

!Phone ports

interface <interface>

no switchport access vlan 1010

end

copy run start

ACS Configs:

I already had the ACS configured to do dot1x auth for wifi clients so it was simple to just add the new switches to the rule set.


It was a success and opened the door to securing all wired networks.

NTP redesign

This post is about a bug that affected NTP (Network Time Protocol) and our redesign of the environment to bypass the issue.

In this environment the core Cisco 7604 IOS routers were the NTP Stratum 2 servers (x.x.x.123, because fun with port numbers). The IP was an HSRP standby IP. There were several downstream Linux NTP servers and Windows Domain Controllers serving NTP to Windows clients. As unsupported Linux servers died, their IP’s were just added to servers that were still alive. Eventually this got messy.

After the 7604 routers were replaced with a pair of ASR1006X routers we ran into some interesting issues. Windows users were no longer able to log in. It turned out the Domain Controllers were falling out of sync. My Infoblox DDI servers also showed stale time. Users were eventually able to log into the Domain again around the time the Windows team changed their NTP config; the sys admins were now syncing with one of the Stratum 3 Linux servers instead. Knowing that the only thing that had changed in our environment was the ASR routers, I knew this wouldn’t be the end of the issue.

I opened a ticket with Cisco TAC to troubleshoot the ASR’s. TAC thought maybe it was because we were using a standby IP, but I couldn’t get resources to help test so we got nowhere. Eventually bug ID CSCsq31723 was created, which I think is related. Fast forward 6 months and Windows users couldn’t log in again. This time we decided to go nuclear and redesign the whole NTP layout.

The new design removed all servers running on unsupported hardware and OS’s. It also made use of our Infoblox DDI grid, which is a purpose-built tool for DNS, DHCP, IPAM, NTP, and File Distribution.

We decided on this:

  • Sync with 3 internet Stratum 1 servers.
  • Run 3 Stratum 2 servers: the Infoblox DDI Grid Master at HQ, the Grid Master Candidate at our DR location, and a Linux server to diversify technologies.
  • Place the Stratum 2 servers in a mesh.
  • Have the Infoblox DDI Grid Master and Grid Master Candidate feed the HA-paired Grid Members (2 HA pairs, 4 servers with 2 VIP’s).
  • Create Access Lists on the Stratum 2 servers so only the Stratum 3 servers can sync with them.
  • Have all clients sync with the HA pairs of Infoblox DDI Grid Members.
  • Set up NTP Authentication (https://www.nist.gov/pml/time-and-frequency-division/time-services/nist-authenticated-ntp-service).

Both VIP’s of the HA-paired Grid Members had a user-friendly DNS record, and some systems accept a hostname for the NTP config. That would be handy if we ever needed to change the IP’s again, and for extra flexibility we could use F5 BIG-IP GTM (DNS load balancing). But network devices like routers and switches don’t support using DNS for NTP, meaning there would be two sets of configs: one with NTP configs using DNS and another hard-set to static IP’s. We wanted a global config, so we went with static IP’s everywhere.
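
For the network devices the static-IP config is short. A rough sketch of the IOS side with NTP authentication turned on (the key number, key string, and VIP placeholders are not our real values):

ntp authentication-key 1 md5 <key-string>
ntp authenticate
ntp trusted-key 1
ntp server <grid-member-vip-1> key 1 prefer
ntp server <grid-member-vip-2> key 1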

The IP’s were given out and people were told to migrate. We set up a SPAN and periodically checked to see who was still pointed at the old servers before finally retiring the old IP’s/servers.

It wasn’t perfect but it was a big improvement. Long term I’d want to install our own Stratum 0 GPS antennas instead of using internet hosted servers. For a home project I’m thinking of using a Raspberry Pi to make one using this link as a guide (https://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html).

Guest wifi and branch backup VPN redo

This post is about a situation I ran into a while ago. It records my configs and testing for converting from a PBR setup to VRF’s on a Cisco 881 router, with a diagram at the end.

Through a combination of configs involving PBR (Policy Based Routing), AKA Source Routing (as opposed to standard Destination Routing), Proxy Server exceptions, and a Default Route that existed in some places and was missing in others, it was impossible to get to internet-facing apps/sites over guest wifi or the branch backup VPN.

I knew I could use VRF’s (Virtual Routing and Forwarding) to separate the traffic and solve the issue, but I had to prove it to my team as they weren’t familiar with VRF’s. A Cisco router without any VRF’s configured only has the “global routing table”. VRF’s create separate instances of the routing table, one for each VRF, while leaving the global table in place.

IOS-XE comes with a mgmt-intf VRF by default for a separate management network. Carriers use VRF’s, or contexts on some non-Cisco hardware, to keep customers’ traffic separate and allow for overlapping network schemes. If needed you can “leak routes” between VRF’s and/or the global routing table. This would be done if, for example, the carrier has something like a network monitoring server that needs to access customer devices.
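
The leaking itself is just a couple of static routes. A made-up example, where the VRF name, prefixes, and interface are hypothetical and not from any config in this post:

!Let hosts in VRF cust-a reach a monitoring server that lives in the global routing table.
ip route vrf cust-a 192.168.100.10 255.255.255.255 192.168.100.1 global
!
!Let the global table reach one host inside the VRF by pointing at an interface that belongs to it.
ip route 10.50.1.25 255.255.255.255 GigabitEthernet0/1 10.50.1.1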

I used a post by Jeremy Stretch at packetlife.net as a guide and to show that someone smarter than me confirmed the design. http://packetlife.net/blog/2012/sep/4/simultaneous-tunneled-and-native-internet-access/

Jeremy’s post goes into the details of building everything from the base up.


Testing:

Test from a computer on guest wifi:

C:\Users\>tracert google.com

Tracing route to google.com [172.217.5.78]

over a maximum of 30 hops:

  1    81 ms   139 ms   251 ms  172.17.1.1
  2   152 ms    35 ms    29 ms  [10.1.180.1]
  3   520 ms   173 ms   652 ms  [10.254.254.161]
  4     4 ms     6 ms     7 ms  [****]
  5     8 ms     5 ms    15 ms  [****]
  6    12 ms    10 ms     6 ms  144.228.109.65
  7    34 ms   166 ms   219 ms  sl-mpe50-sea-.sprintlink.net [144.232.3.126]
  8   662 ms   638 ms   849 ms  72.14.242.31
  9   572 ms    45 ms     9 ms  108.170.245.115
 10   578 ms   995 ms   370 ms  66.249.94.201
 11   955 ms   618 ms   507 ms  209.85.240.228
 12   268 ms   457 ms   447 ms  216.239.54.158
 13   358 ms   342 ms   638 ms  216.239.51.124
 14    61 ms    71 ms    51 ms  108.170.247.193
 15    56 ms    88 ms    68 ms  108.170.237.113
 16    55 ms    32 ms    44 ms  172.217.5.78

Trace complete.

Test from a computer on the regular corp wifi or LAN:

C:\Users\>tracert google.com

Tracing route to google.com [172.217.5.78]

over a maximum of 30 hops:
  1    12 ms     5 ms    11 ms  192.168.2.1
  2     5 ms     5 ms     3 ms  10.3.254.1 < --- Headend Tunnel interface
  3    10 ms     4 ms     5 ms  [10.254.254.161]
  4    17 ms     2 ms     9 ms  [***]
  5     6 ms     4 ms     4 ms  [****]
  6    10 ms     3 ms     7 ms  144.228.109.65
  7     6 ms     2 ms     2 ms  sl-mpe50-sea-.sprintlink.net [144.232.3.126]
  8     5 ms     4 ms     5 ms  72.14.242.31
  9     7 ms     4 ms     3 ms  108.170.245.115
 10     9 ms    12 ms     9 ms  66.249.94.201
 11    10 ms    29 ms    26 ms  209.85.240.228
 12    34 ms    54 ms    59 ms  216.239.54.158
 13    32 ms    32 ms    32 ms  216.239.51.124
 14    30 ms    31 ms    30 ms  108.170.247.193
 15    33 ms    30 ms    29 ms  108.170.237.113
 16    38 ms    31 ms    40 ms  172.217.5.78

Trace complete.


Config differences:

ADD config:

First you have to create a named VRF. Some people use all caps for VRF names to make them stand out. I don’t because it’s a pain when you want to ping, trace, or use any VRF specific commands.

!The VRF name doesn’t matter, but fdoor relates to the Front Door VRF concept sometimes used with DMVPN. It separates the base routing from the “overlay routing” used by the VPN. You could also put all interfaces in VRF’s and not use the global routing table, but I didn’t.

!Create the VRF.
!
ip vrf fdoor
!
!Required to support DHCP when using VRF’s.
no ip dhcp use vrf connected
!
!Place the internet interface in the VRF
interface <internet-interface>
ip vrf forwarding fdoor
!
!Place the guest interface in the VRF
interface vlan15
ip vrf forwarding fdoor
!
!Tell the tunnel to use the VRF for its source interface.
interface <tunnel-number>
tunnel vrf fdoor

CHANGE config:

!The NAT needs to be told about the VRF.
!Change this:
ip nat inside source list 130 interface <internet-interface> overload
!
!To this:
ip nat inside source list 130 interface <internet-interface> vrf fdoor match-in-vrf fdoor overload
!The default route needs to be moved to the VRF.
!Change from this:
ip route 0.0.0.0 0.0.0.0 <internet-interface> <internet-gw>
!
!To this:
ip route vrf fdoor 0.0.0.0 0.0.0.0 <internet-interface> <internet-gw>

REMOVE config:

!
interface Vlan10
no ip policy route-map wifi-mgmt-route-map 
!
!
interface Vlan30
no ip policy route-map wifi-guest-route-map
!
!
interface Vlan36
no ip policy route-map LAN-route-map
!
!
no route-map wifi-mgmt-route-map permit 10
 !match ip address CAPWAP-traffic
 !set ip precedence flash-override
 !set ip next-hop <branch-LAN-gw>
!
no route-map LAN-route-map permit 10
 !match ip address all-except-localwifi
 !set interface Tunnel1999
!
no route-map wifi-guest-route-map permit 10
 !match ip address 130
!
!These routes are not needed as we have dynamic routes for the tunnel and a static default route for internet access in the fdoor VRF.
no ip route 0.0.0.0 0.0.0.0 vlan33 <branch-LAN-gw> 254
no ip route <DNS1> 255.255.255.255 <internet-interface> <internet-gw>
no ip route <DNS2> 255.255.255.255 <internet-interface> <internet-gw>
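
Once the VRF is in place you can sanity check it from the router itself. This is also where the lowercase VRF name pays off, because you end up typing it constantly. Using the google.com IP from the traces above as a target:

ping vrf fdoor 172.217.5.78
traceroute vrf fdoor 172.217.5.78
show ip route vrf fdoor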

As you can see you end up with less config and better routing. I was able to convert a few hundred of these setups over a few weeks with old school copy pasta. But if I had to do this again I’d spend time on a Python script that would convert, test, and document the results.