Back-to-Back vPC with HSRP

Configure a full mesh of links with vPC

General

A Back-to-Back vPC consists of two pairs of Nexus switches, each pair running its own vPC domain. One of the most important forwarding rules for vPC is that a frame that enters a vPC peer switch over the Peer-Link cannot leave that switch through a vPC member port.


These 4 Nexus Switches are all connected together in the same Port-Channel so the result is a full mesh of links.

A use-case for this topology would be to connect two Access Switches to two Aggregate or Core Switches. Logically you have two switches instead of four.

Back-to-Back vPC

  1. Create the vPC domains
  2. Create the peer-keepalive
  3. Create the Peer-Link
  4. Create the Member Ports
  5. Create the back-to-back vPC

Create the vPC Domain

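The vPC domain logically binds two switches into one, so the domain ID has to match on both switches of a pair. In a back-to-back vPC the two pairs must use different domain IDs, because the domain ID is used to derive the vPC system MAC that LACP presents to the other pair.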


SW1 and SW2

feature vpc
feature lacp
feature interface-vlan #optional if you want to test pings with l3 vlan
vpc domain 1

SW3 and SW4

feature vpc
feature lacp
feature interface-vlan
vpc domain 2

Create the Peer-Keepalive

The Peer-Keepalive runs over the mgmt0 link in the management VRF. The switches periodically send UDP keepalive messages (port 3200) to track whether the other peer is still reachable.


SW1

int mgmt0
ip add 192.168.0.51/24
sh ip adjacency vrf management #to confirm peer IP is reachable

vpc domain 1
peer-keepalive destination 192.168.0.52

SW2

int mgmt0
ip add 192.168.0.52/24
sh ip adjacency vrf management

vpc domain 1
peer-keepalive destination 192.168.0.51

For SW3 I used IP 192.168.0.53 and SW4 192.168.0.54.
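
For completeness, the keepalive configuration of the second pair could look like the following sketch. It only mirrors the SW1/SW2 commands above, using the addresses and the domain ID 2 mentioned in this article; the final show command is an added verification step.

```text
! SW3 (sketch, mirrors the SW1 configuration)
interface mgmt0
  ip address 192.168.0.53/24

vpc domain 2
  peer-keepalive destination 192.168.0.54

! SW4
interface mgmt0
  ip address 192.168.0.54/24

vpc domain 2
  peer-keepalive destination 192.168.0.53

! on any switch: check that the keepalive peer is reported as alive
show vpc peer-keepalive
```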

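Create the Peer-Link

To synchronize control plane state such as the CAM table between the peers, a separate Port-Channel called the Peer-Link is created. It normally carries no unicast data traffic between vPC-attached devices; data crosses it mainly when a member port fails, or for orphan ports and flooded traffic. If the Peer-Link fails, the peers can no longer synchronize this state and the secondary peer suspends its vPC member ports (see VPC failure scenarios).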


SW1,SW2,SW3,SW4

int po1
switchport mode trunk
vpc peer-link

int e1/1-2
switchport mode trunk
channel-group 1 mode active

sh port-channel sum #port-channel has to be in SU mode
sh ru int po 1 memb #verify port-config on all member-ports

Create the Member Ports

The member ports are the ports that connect the servers to the switches. STP does not block any of these ports because the attached device sees the vPC as a single Port-Channel. Loops are prevented by the vPC forwarding rule instead: a frame that arrived over the Peer-Link is not allowed to leave through a vPC member port.

When you connect a Windows Server to the member ports you have to enable NIC teaming and choose LACP as teaming mode.

In this case I used another switch with a layer 3 interface on it to test reachability with ping.


SW1,SW2,SW3,SW4

vlan 10

int po11
switchport access vlan 10
vpc 11

int e1/5
switchport access vlan 10
channel-group 11 mode active

vlan 20

int po12
switchport access vlan 20
vpc 12

int e1/6
switchport access vlan 20
channel-group 12 mode active

sh ru int po12 memb
sh vpc #checks if all Port-channels have formed.
sh vpc consistency-parameters vpc 12 #check if configs are the same on both switches

SRV1,SRV3

vlan 10
int vlan 10
ip add 10.0.0.10 255.255.255.0 #use another ip on SRV3
no sh

int po11
switchport access vlan 10

int range g0/0-1
switchport access vlan 10
channel-group 11 mode active
no shut

SRV2,SRV4

vlan 20
int vlan 20
ip add 20.0.0.10 255.255.255.0 #use another ip on SRV4
no sh

int po12
switchport access vlan 20

int range g0/0-1
switchport access vlan 20
channel-group 12 mode active
no shut

Create the back-to-back vPC

Now let's connect vPC domain 1 to vPC domain 2 by creating one LACP Port-Channel across all links.


SW1,SW2,SW3,SW4

int po34
switchport mode trunk
vpc 34

int e1/3-4
switchport mode trunk
channel-group 34 mode active

We can issue a continuous ping from SRV3 to SRV1 with ping 10.0.0.10 repeat 9999999 and shut any single link; the ping should still go through.

SRV3#ping 10.0.0.10 repeat 9999999
Type escape sequence to abort.
Sending 9999999, 100-byte ICMP Echos to 10.0.0.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Since both links on the SRV are members of the same Port-Channel, the server uses its own load-balancing (hashing) algorithm to distribute the traffic across the ports.

Peer-Switch

Even though a vPC pair looks like one switch to its neighbors, by default only one of the two peers is the STP root bridge. When that peer goes down, the downstream switches receive BPDUs with a new root Bridge ID and STP has to reconverge, which briefly interrupts traffic. To avoid this we can issue the command peer-switch, which makes both peers present themselves as a single root bridge.

vpc domain 1
peer-switch

Now if we issue the command sh spanning-tree vlan 10 | i root on both switches we will see that both are now the Root Bridge.
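
Cisco's guidance for peer-switch is to give both peers the same spanning-tree priority so that they advertise one identical bridge ID. A minimal sketch for this lab follows; the VLAN list and the priority value 4096 are assumptions, not taken from the original setup.

```text
! SW1 and SW2: identical STP priority on both vPC peers (4096 is an assumed value)
spanning-tree vlan 10,20 priority 4096

vpc domain 1
  peer-switch

! verify on both peers
show spanning-tree vlan 10 | include root
```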

STP Bridge Assurance

When we create the Peer-Link, its ports automatically become STP port type network, which activates STP Bridge Assurance. Normally BPDUs are only sent out of designated ports, without expecting an answer. With Bridge Assurance, BPDUs are sent on all network ports, so the communication becomes two-way: if a port stops receiving BPDUs from its neighbor, it goes into the BA-Inconsistent state and blocks traffic.

Creating the Peer-Link also activates VLAN pruning between the peers: if SW1 has VLAN 10 and SW2 does not, SW1 does not forward VLAN 10 frames over the Peer-Link. This avoids VLAN inconsistencies.

vPC with HSRP

We can configure HSRP on both switches and get an active/active gateway on the data plane. On the control plane we still have an active/standby gateway, so only the active gateway answers ARP requests for the virtual IP and the standby gateway just relays them over the Peer-Link.


SW1

feature hsrp
int vlan 10
ip add 10.0.0.51/24
hsrp version 2
hsrp 10
ip 10.0.0.254
no shut

SW2

feature hsrp
int vlan 10
ip add 10.0.0.52/24
hsrp version 2
hsrp 10
ip 10.0.0.254 
no shut

When we display the CAM table we can see that the G bit is set for the HSRP vMAC 0000.0c9f.f00a and for the local MAC. That implies that both MACs are Gateways and can both route traffic so the result is an active/active HSRP Gateway.

SW2# sh mac address-table vlan 10
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G   10     0000.0c9f.f00a   static   -         F      F    sup-eth1(R)
*   10     5000.0001.0007   static   -         F      F    vPC Peer-Link(R)
G   10     5000.0002.0007   static   -         F      F    sup-eth1(R)

We can confirm that HSRP is working by checking the ARP table on one of the servers.

SRV1#sh ip arp
Protocol  Address          Age (min)  Hardware Addr   Type   Interface
Internet  10.0.0.1                -   5000.0005.800a  ARPA   Vlan10
Internet  10.0.0.2               98   5000.0007.800a  ARPA   Vlan10
Internet  10.0.0.51             143   5000.0001.0007  ARPA   Vlan10
Internet  10.0.0.52             103   5000.0002.0007  ARPA   Vlan10
Internet  10.0.0.254            108   0000.0c9f.f00a  ARPA   Vlan10

It shows the VIP 10.0.0.254 of the HSRP group with the vMAC 0000.0c9f.f00a. With HSRP version 2 the vMAC has the format 0000.0c9f.fXXX, where the last three hexadecimal digits are the HSRP group number, not the VLAN. In this case 00a is group 10, which was chosen to match VLAN 10.
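
On the switches themselves, the group number, virtual IP and active/standby state can be cross-checked with the HSRP show commands. A minimal sketch; the output is omitted because it depends on the lab.

```text
! SW1 or SW2
show hsrp brief
show hsrp group 10
```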

Peer-Gateway

When we use HSRP we should also use the peer-gateway command. With this command a switch also routes packets that are destined to the local MAC of the other peer, instead of bridging them over the Peer-Link. This matters for systems that do not rely on ARP for the gateway and instead reply to the source MAC of the frame they received, as some storage and load-balancing devices do.


SW1,SW2

vpc domain 1
peer-gateway

We can confirm the configuration by displaying the CAM table. Now we also have the G bit on the MAC of the peer switch.

SW1# sh mac address-table vlan 10
Legend: 
        * - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
        age - seconds since last seen,+ - primary entry using vPC Peer-Link,
        (T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
   VLAN     MAC Address      Type      age     Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G   10     0000.0c9f.f00a   static   -         F      F    vPC Peer-Link(R)
G   10     5000.0001.0007   static   -         F      F    sup-eth1(R)
G   10     5000.0002.0007   static   -         F      F    vPC Peer-Link(R)

When you run an IGP like OSPF over the vPC, you should exclude the VLANs whose SVIs participate in the IGP to avoid LSA inconsistency. Issue the command peer-gateway exclude-vlan VLAN-NUMBER under the vPC domain.
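
As a sketch of what that looks like in the configuration, assuming a hypothetical transit VLAN 100 that carries the IGP adjacency (VLAN 100 is not part of the lab above). The exclude-vlan option is not available on every Nexus platform and release, so check the command reference for your platform first.

```text
! SW1 and SW2 (VLAN 100 is a hypothetical transit VLAN)
vpc domain 1
  peer-gateway exclude-vlan 100
```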

Manually choose vPC role

By default both peers have the same role priority, so the switch with the lowest system MAC gets the vPC primary role. The role is also not preemptive: if the secondary switch takes over the primary role, it keeps it even when the original primary returns, which prevents control plane disruptions caused by unnecessary role changes.

Only the switch with the secondary vPC role disables its member ports when a vPC inconsistency is detected, so you may want to choose manually which switch gets the primary role. You can give each peer a role priority in the vPC domain; the peer with the lower value gets the primary role.

sh vpc role
vpc domain 1
role priority 4000
vpc role preempt
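
A sketch of how the priorities could be split across the pair so that SW1 wins the election. The value 8000 for SW2 is an assumption; any value higher than the one on SW1 works. Because the role is not preemptive, a changed priority only takes effect at the next role election.

```text
! SW1: lower value, preferred primary
vpc domain 1
  role priority 4000

! SW2: higher value, preferred secondary (8000 is an assumed value)
vpc domain 1
  role priority 8000

! verify the resulting roles on either peer
show vpc role
```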

VPC failure scenarios

The following describes what happens when the Keep-Alive and the Peer-Link go down:

  • Keep-Alive = vPC is still working, nothing happens
  • Peer-Link = All member ports on the secondary vPC peer are suspended
  • Peer-Link fails and then Keep-Alive = All member ports on the secondary vPC peer stay suspended. Bring the Keep-Alive up first, then the Peer-Link.
  • Keep-Alive fails and then Peer-Link = Split brain. The secondary vPC peer no longer receives anything over the Keep-Alive, and when the Peer-Link dies too it assumes the primary is down, elects itself primary and starts forwarding. Now both switches forward traffic independently. To recover, take all member ports on the secondary vPC peer down, then bring up the Keep-Alive link and after it the Peer-Link.

The last scenario is the reason why you should never run the Keep-Alive and the Peer-Link over the same link: a single failure of that link would then very likely cause a split brain.

If you don't want to use the management VRF, you can also put the Keep-Alive link in its own VRF.


SW1,SW2

vrf context KEEPALIVE

int e1/7-8
  channel-group 100 mode active

int po100
  no switchport
  vrf member KEEPALIVE
  ip add 20.0.0.51/24 #on SW2: 20.0.0.52/24
  no shut

vpc domain 1
  peer-switch
  peer-keepalive destination 20.0.0.52 vrf KEEPALIVE source 20.0.0.51 #swap IPs on SW2
  peer-gateway

Helpful Commands

Some commands that helped me during troubleshooting.

ip arp synchronize #synchronizes the ARP tables when the peer-link comes up. 
delay restore 360 #delay the restoration of vPC ports to avoid traffic blackholing after reboot
show vpc orphan-ports
show port-channel sum
show vpc
show vpc peer-keepalive 
show vpc consistency-parameters #check stp inconsistencies and other mismatches
show vpc consistency-parameters global #for checking stp mode
show lacp interface
show lacp neigh
show spanning-tree
show mac address-table
sh run all | sec vpc
vpc orphan-port suspend #to also shut down the port when vpc goes down
show port-channel traffic #shows utilization of the physical ports
show port-channel load-balance #shows load-balance method
show port-channel usage #shows which port-channel numbers are in use
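
Two entries in the list above, ip arp synchronize and vpc orphan-port suspend, are configuration commands rather than show commands. As a sketch of where they are entered (interface e1/9 is a hypothetical orphan port, not part of the lab above):

```text
! SW1 and SW2: ARP synchronization is configured under the vPC domain
vpc domain 1
  ip arp synchronize

! e1/9 is a hypothetical port with a single-attached device (orphan port)
interface ethernet 1/9
  vpc orphan-port suspend

! list the orphan ports the switch has detected
show vpc orphan-ports
```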

vPC Delay Restore

Delay restore keeps the vPC member ports of a reloaded peer down for a configurable time, which allows Layer 3 routing protocols to converge before any traffic is allowed on the vPC legs. The default restoration timer is 30 seconds.

vpc domain 1 
 delay restore 360 #allows router to converge routing protocols before using the vpc peer. default timer 30sec.

Type-1 vs Type-2

vPC distinguishes between two types of mismatches, which we can display with the command show vpc consistency-parameters global.

Type-1 mismatches, such as an STP mode mismatch, are serious and should be resolved immediately. Type-2 mismatches are less severe and do not bring the vPC down. With graceful consistency-check, which is enabled by default, a Type-1 mismatch suspends the vPC links only on the secondary vPC switch until the inconsistency is resolved. This minimizes disruption but also increases the possibility of split brain, since the vPC links stay up on the primary peer. If we want to be on the safe side, we can use the no form of the command so that the primary vPC peer also suspends its links in case of a Type-1 mismatch.

vpc domain 1
 graceful consistency-check #default; on a Type-1 mismatch only the secondary peer suspends its vPC links
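
The no form mentioned above could be applied as in the following sketch. It uses domain 1 from this lab and makes both peers suspend their vPC links on a Type-1 mismatch.

```text
! SW1 and SW2: both peers suspend their vPC links on a Type-1 mismatch
vpc domain 1
  no graceful consistency-check

! verify the setting and look for mismatches
show vpc brief
show vpc consistency-parameters global
```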

Thanks for reading my article. If you have any questions or recommendations you can message me via arvednetblog@gmail.com.