Back-to-Back vPC with HSRP
Configure a full mesh of links with vPC
General
A Back-to-Back vPC consists of two pairs of Nexus switches, each pair running its own vPC domain. One of the most important vPC forwarding rules is that a frame that enters a vPC peer switch from the peer-link cannot leave that switch through a vPC member port.
The four Nexus switches are interconnected through one port-channel (one vPC on each pair), so the result is a full mesh of links between the two pairs.
A use case for this topology is connecting two access switches to two aggregation or core switches. Logically you have two switches instead of four.
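The forwarding rule above can be illustrated with a small Python sketch. It is a toy model, not NX-OS code: the port-role names ("peer-link", "vpc-member", "orphan") are hypothetical labels, and the exception for a failed member port on the peer is included as an assumption about the commonly documented behavior, not something configured in this lab.

```python
# Toy model of the vPC loop-avoidance rule; this is not NX-OS code.
# The port-role labels are hypothetical: "peer-link", "vpc-member", "orphan".

def may_egress(ingress_role: str, egress_role: str,
               peer_member_up: bool = True) -> bool:
    """Return True if a frame that entered on ingress_role may leave on egress_role.

    peer_member_up describes the peer switch's own leg of the same vPC.
    """
    if ingress_role == "peer-link" and egress_role == "vpc-member":
        # The peer already delivered the frame on its leg of this vPC,
        # unless that leg is down, in which case the local leg is the only path.
        return not peer_member_up
    return True


# SRV1 broadcasts into SW1, which floods the frame down its own vPC legs and
# over the peer-link; SW2 must not send it out of its vPC member ports again.
print(may_egress("vpc-member", "peer-link"))                        # True
print(may_egress("peer-link", "vpc-member"))                        # False
print(may_egress("peer-link", "orphan"))                            # True
print(may_egress("peer-link", "vpc-member", peer_member_up=False))  # True
```

The rule only blocks one combination (peer-link in, vPC member port out), which is why STP does not need to block any of the redundant links.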
Back-to-Back vPC
- 1. Create the vPC domains
- 2. Create the peer-keepalive
- 3. Create the Peer-Link
- 4. Create the Member Ports
- 5. Create the back-to-back vPC
Create the vPC Domain
The vPC domain logically binds two switches into one, so the domain ID has to match on both switches of a pair. The two pairs of a back-to-back vPC use different domain IDs.
SW1 and SW2
feature vpc
feature lacp
feature interface-vlan #optional if you want to test pings with l3 vlan
vpc domain 1
SW3 and SW4
feature vpc
feature lacp
feature interface-vlan
vpc domain 2
Create the Peer-Keepalive
In this lab the Peer-Keepalive link goes over the mgmt0 interface (management VRF). The switches send UDP heartbeats to each other to track each other's reachability.
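What "tracking reachability" means can be sketched in Python, assuming a fixed timeout: every heartbeat refreshes a timestamp, and the peer counts as unreachable once nothing has arrived for longer than the timeout. The class name and the 5-second value are illustrative assumptions, not the NX-OS implementation. Keep in mind that vPC only acts on a lost keepalive in combination with the state of the Peer-Link, as the failure scenarios further down show.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class KeepaliveTracker:
    """Toy peer-keepalive tracker; timeout_s is an illustrative value."""
    timeout_s: float = 5.0
    last_heard: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        # Record when the last UDP heartbeat from the peer arrived.
        self.last_heard = now

    def peer_alive(self, now: float) -> bool:
        # The peer counts as alive while its last heartbeat is within the timeout.
        return self.last_heard is not None and now - self.last_heard <= self.timeout_s


tracker = KeepaliveTracker()
tracker.heartbeat(now=0.0)
print(tracker.peer_alive(now=3.0))  # True
print(tracker.peer_alive(now=7.0))  # False: nothing heard for 7 seconds
```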
SW1
int mgmt0
ip add 192.168.0.51/24
sh ip adjacency vrf management #to confirm peer IP is reachable
vpc domain 1
peer-keepalive destination 192.168.0.52
SW2
int mgmt0
ip add 192.168.0.52/24
sh ip adjacency vrf management
vpc domain 1
peer-keepalive destination 192.168.0.51
For SW3 and SW4 (vPC domain 2) I used 192.168.0.53 and 192.168.0.54.
Create the Peer-Link
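To synchronize the control plane between the peers (MAC address table, and the ARP table if ip arp synchronize is configured), a separate port-channel called the Peer-Link is created. Routing tables are not synchronized; each peer runs its own routing protocols. Normally the Peer-Link carries little data-plane traffic: mostly flooded broadcast, multicast and unknown-unicast frames, traffic to orphan ports, and traffic for a vPC whose local member port has failed. What happens when the Peer-Link itself fails is described in the vPC failure scenarios below.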
SW1,SW2,SW3,SW4
int po1
switchport mode trunk
vpc peer-link
int e1/1-2
switchport mode trunk
channel-group 1 mode active
sh port-channel sum #port-channel has to be in SU mode
sh ru int po 1 memb #verify port-config on all member-ports
Create the Member Ports
The member ports are the ports that connect the servers to the switches. None of these ports gets blocked by STP because the server sees both switches as one logical switch behind a single port-channel, and vPC prevents loops with its own rule: a packet that came in on one member port and crossed the peer-link is not allowed to leave through another member port.
When you connect a Windows Server to the member ports, you have to enable NIC teaming and choose LACP as the teaming mode.
In this lab the "servers" are switches with a layer 3 SVI, which I used to test reachability with ping.
SW1,SW2,SW3,SW4
vlan 10
int po11
switchport access vlan 10
vpc 11
int e1/5
switchport access vlan 10
channel-group 11 mode active
vlan 20
int po12
switchport access vlan 20
vpc 12
int e1/6
switchport access vlan 20
channel-group 12 mode active
sh ru int po12 memb
sh vpc #checks if all Port-channels have formed.
sh vpc consistency-parameters vpc 12 #check if configs are the same on both switches
SRV1,SRV3
vlan 10
int vlan 10
ip add 10.0.0.1 255.255.255.0 #use 10.0.0.2 on SRV3
no sh
int po11
switchport access vlan 10
int range g0/0-1
switchport access vlan 10
channel-group 11 mode active
no shut
SRV2,SRV4
vlan 20
int vlan 20
ip add 20.0.0.10 255.255.255.0 #use another ip on SRV4
no sh
int po12
switchport access vlan 20
int range g0/0-1
switchport access vlan 20
channel-group 12 mode active
no shut
Create the back-to-back vPC
Now let's connect vPC domain 1 to vPC domain 2 by bundling all links between them into one LACP port-channel.
SW1,SW2,SW3,SW4
int po34
switchport mode trunk
vpc 34
int e1/3-4
switchport mode trunk
channel-group 34 mode active
We can run a continuous ping from SRV3 to SRV1 with ping 10.0.0.1 repeat 9999999 and shut any link; the ping should still go through.
SRV3#ping 10.0.0.1 repeat 9999999
Type escape sequence to abort.
Sending 9999999, 100-byte ICMP Echos to 10.0.0.1, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
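Both links of the server's port-channel are equal members, so the server uses its port-channel load-balancing (hash) algorithm to distribute traffic across them. The hash works per flow, so a single ping stream uses one link and is moved to the remaining link when its link is shut.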
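Per-flow load balancing can be sketched in Python. This is a toy hash (CRC32 over source and destination IP) chosen for illustration; real switches use a hardware hash over configurable header fields. The interface names and the SRV3/SRV1 addresses (10.0.0.2 and 10.0.0.1, as in the ARP output further down) are assumptions.

```python
import zlib
from collections import Counter


def pick_member(src_ip: str, dst_ip: str, members: list[str]) -> str:
    """Toy per-flow hash: every packet of one src/dst pair uses the same link."""
    key = f"{src_ip}->{dst_ip}".encode()
    return members[zlib.crc32(key) % len(members)]


links = ["Gi0/0", "Gi0/1"]

# One ping flow (SRV3 -> SRV1) always maps to the same link, so packets stay in order.
flow_link = pick_member("10.0.0.2", "10.0.0.1", links)
assert all(pick_member("10.0.0.2", "10.0.0.1", links) == flow_link for _ in range(5))

# Many different flows are spread over both links.
flows = [(f"10.0.0.{i}", "10.0.0.1") for i in range(2, 50)]
print(Counter(pick_member(s, d, links) for s, d in flows))

# When a link is shut, the flow is rehashed over the remaining member.
print(pick_member("10.0.0.2", "10.0.0.1", ["Gi0/1"]))  # Gi0/1
```

Hashing per flow rather than per packet is the design choice that keeps packets of one conversation in order, at the cost of one flow never using more than one link.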
Peer-Switch
Even though a vPC pair is logically one switch, STP still runs on each peer and only one of them is the root bridge. When the root bridge goes down, the downstream switches receive BPDUs with a new root bridge ID and Spanning Tree has to reconverge, which can briefly block traffic. To avoid this, we can issue the command peer-switch, so that both peers send BPDUs with the same bridge ID (based on the vPC system MAC) and appear as one root bridge.
vpc domain 1
peer-switch
Now, if we issue the command sh spanning-tree vlan 10 | i root on both switches, we see that both are the root bridge. For this to work, both peers need the same spanning-tree priority for the VLAN.
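The effect of peer-switch can be made visible with a toy Python model of root election, where the lowest bridge ID (priority, then MAC) wins. The priorities, SW3's MAC and the vPC system MAC value are hypothetical; SW1 and SW2 reuse the MACs from the MAC tables later in the article only for readability, and both peers are given the same priority.

```python
from typing import NamedTuple


class BridgeId(NamedTuple):
    priority: int
    mac: str  # same dotted notation everywhere, so string order matches numeric order


def root_bridge(bridges: dict[str, BridgeId]) -> BridgeId:
    # STP elects the lowest bridge ID: lowest priority first, then lowest MAC.
    return min(bridges.values())


SW3 = BridgeId(32768, "5000.0003.0007")   # hypothetical downstream switch

# Without peer-switch: each peer uses its own MAC in the bridge ID.
no_peer_switch = {"SW1": BridgeId(4096, "5000.0001.0007"),
                  "SW2": BridgeId(4096, "5000.0002.0007"),
                  "SW3": SW3}
before = root_bridge(no_peer_switch)
del no_peer_switch["SW1"]                       # SW1 fails
print(before != root_bridge(no_peer_switch))    # True: root ID changes, STP reconverges

# With peer-switch: both peers advertise one bridge ID built from the vPC system MAC.
VPC_SYSTEM_MAC = "0023.04ee.be01"               # example value; check 'show vpc role'
peer_switch = {"SW1": BridgeId(4096, VPC_SYSTEM_MAC),
               "SW2": BridgeId(4096, VPC_SYSTEM_MAC),
               "SW3": SW3}
before = root_bridge(peer_switch)
del peer_switch["SW1"]
print(before == root_bridge(peer_switch))       # True: downstream switches see the same root
```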
STP Bridge Assurance
When we create the Peer-Link, its ports automatically become STP port type network, which activates STP Bridge Assurance. Normally BPDUs only flow away from the root bridge and no answer is expected. With Bridge Assurance every network port sends BPDUs, so BPDUs flow in both directions; if a port stops receiving BPDUs, it goes into the BA-inconsistent state and blocks traffic.
By creating the Peer-Link, VLAN pruning is also activated between the peers to avoid VLAN inconsistencies: if SW1 has VLAN 10 but its peer does not, SW1 does not forward VLAN 10 frames over the Peer-Link.
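A toy Python model of Bridge Assurance on one network port such as the Peer-Link, assuming per-VLAN BPDUs (Rapid PVST+): a VLAN stays usable only while BPDUs for it keep coming back from the neighbor, which also produces the VLAN pruning effect described above. The function and its state labels are illustrative and ignore all other STP port states.

```python
def ba_state(local_vlans: set[int], peer_vlans: set[int],
             bpdus_from_peer: bool) -> dict[int, str]:
    """Toy Bridge Assurance check per VLAN on one STP network port."""
    states = {}
    for vlan in sorted(local_vlans):
        # A BPDU for this VLAN only comes back if the peer is alive and has the VLAN.
        returned = bpdus_from_peer and vlan in peer_vlans
        states[vlan] = "forwarding" if returned else "BA-inconsistent"
    return states


print(ba_state({10, 20}, {10}, bpdus_from_peer=True))
# {10: 'forwarding', 20: 'BA-inconsistent'}
print(ba_state({10, 20}, {10, 20}, bpdus_from_peer=False))
# {10: 'BA-inconsistent', 20: 'BA-inconsistent'}
```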
vPC with HSRP
We can configure HSRP on both switches and get an active/active gateway on the data plane: both peers route traffic sent to the HSRP virtual MAC. On the control plane HSRP is still active/standby, so only the active gateway answers ARP requests, and the standby just relays the ARP requests it receives over the Peer-Link.
SW1
feature hsrp
int vlan 10
ip add 10.0.0.51/24
hsrp version 2
hsrp 10
ip 10.0.0.254
no shut
SW2
feature hsrp
int vlan 10
ip add 10.0.0.52/24
hsrp version 2
hsrp 10
ip 10.0.0.254
no shut
When we display the MAC address table (CAM table), we can see that the G flag is set for both the HSRP vMAC 0000.0c9f.f00a and the local MAC. This means both are gateway MACs that this switch routes itself, so the result is an active/active HSRP gateway.
SW2# sh mac address-table vlan 10
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G 10 0000.0c9f.f00a static - F F sup-eth1(R)
* 10 5000.0001.0007 static - F F vPC Peer-Link(R)
G 10 5000.0002.0007 static - F F sup-eth1(R)
We can confirm that HSRP is working by checking the ARP table on one of the servers.
SRV1#sh ip arp
Protocol Address Age (min) Hardware Addr Type Interface
Internet 10.0.0.1 - 5000.0005.800a ARPA Vlan10
Internet 10.0.0.2 98 5000.0007.800a ARPA Vlan10
Internet 10.0.0.51 143 5000.0001.0007 ARPA Vlan10
Internet 10.0.0.52 103 5000.0002.0007 ARPA Vlan10
Internet 10.0.0.254 108 0000.0c9f.f00a ARPA Vlan10
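It shows the HSRP virtual IP 10.0.0.254 with the vMAC 0000.0c9f.f00a. HSRP version 2 uses vMACs of the form 0000.0c9f.fXXX, where the last three hexadecimal digits are the HSRP group number: 00a is group 10. It matches VLAN 10 here only because the group was numbered after the VLAN (hsrp 10).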
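The vMAC can be derived from the group number with a short Python function. It builds the well-known HSRP virtual MACs, 0000.0c07.acXX for version 1 (groups 0-255) and 0000.0c9f.fXXX for version 2 (groups 0-4095); the function name is hypothetical.

```python
def hsrp_vmac(group: int, version: int = 2) -> str:
    """Return the HSRP virtual MAC for a group, in Cisco dotted notation."""
    if version == 1:
        if not 0 <= group <= 255:
            raise ValueError("HSRPv1 groups are 0-255")
        raw = 0x00000C07AC00 + group      # 0000.0c07.acXX
    elif version == 2:
        if not 0 <= group <= 4095:
            raise ValueError("HSRPv2 groups are 0-4095")
        raw = 0x00000C9FF000 + group      # 0000.0c9f.fXXX
    else:
        raise ValueError("version must be 1 or 2")
    digits = f"{raw:012x}"
    return ".".join(digits[i:i + 4] for i in range(0, 12, 4))


print(hsrp_vmac(10))             # 0000.0c9f.f00a
print(hsrp_vmac(10, version=1))  # 0000.0c07.ac0a
print(hsrp_vmac(20))             # 0000.0c9f.f014
```

So hsrp 20 configured under interface vlan 10 would produce 0000.0c9f.f014: the digits follow the group, not the VLAN.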
Peer-Gateway
When we use HSRP, we should also enable peer-gateway. With this command a switch also routes packets that are destined to the router MAC of its peer switch, instead of sending them across the Peer-Link, where the loop-avoidance rule would drop them on the way out of a vPC member port. This is important for some systems that do not rely on ARP and simply reply to the source MAC of the frames they receive.
SW1,SW2
vpc domain 1
peer-gateway
We can confirm the configuration in the MAC address table: now the MAC of the peer switch also has the G flag.
SW1# sh mac address-table vlan 10
Legend:
* - primary entry, G - Gateway MAC, (R) - Routed MAC, O - Overlay MAC
age - seconds since last seen,+ - primary entry using vPC Peer-Link,
(T) - True, (F) - False, C - ControlPlane MAC, ~ - vsan
VLAN MAC Address Type age Secure NTFY Ports
---------+-----------------+--------+---------+------+----+------------------
G 10 0000.0c9f.f00a static - F F vPC Peer-Link(R)
G 10 5000.0001.0007 static - F F sup-eth1(R)
G 10 5000.0002.0007 static - F F vPC Peer-Link(R)
When you run an IGP such as OSPF over SVIs, you should exclude the VLANs of the SVIs that participate in the IGP from peer-gateway to avoid LSA inconsistencies. Use the command peer-gateway exclude-vlan VLAN-NUMBER.
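The difference peer-gateway makes can be sketched as a toy Python check of which destination MACs a peer routes itself (the G flag). The MACs are taken from the MAC tables above; the host that replies to SW2's MAC instead of the vMAC is a hypothetical example. Without peer-gateway, SW1 bridges such a frame over the Peer-Link, SW2 routes it, and if the egress is a vPC member port the loop-avoidance rule drops it.

```python
SW1_MAC = "5000.0001.0007"   # SVI MACs from the MAC tables above
SW2_MAC = "5000.0002.0007"
HSRP_VMAC = "0000.0c9f.f00a"


def routes_locally(dst_mac: str, own_mac: str, peer_mac: str, peer_gateway: bool) -> bool:
    """Toy check: does this vPC peer route the frame itself (G flag set)?"""
    gateway_macs = {own_mac, HSRP_VMAC}    # with vPC both peers route for the vMAC
    if peer_gateway:
        gateway_macs.add(peer_mac)         # peer-gateway adds the peer's router MAC
    return dst_mac in gateway_macs


# A host replies to SW2's MAC (not the vMAC), but its port-channel hashes the frame to SW1.
print(routes_locally(SW2_MAC, SW1_MAC, SW2_MAC, peer_gateway=False))  # False: bridged over the Peer-Link
print(routes_locally(SW2_MAC, SW1_MAC, SW2_MAC, peer_gateway=True))   # True: SW1 routes it directly
```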
Manually choose vPC role
Without a configured role priority, the switch with the lowest system MAC gets the vPC primary role. The role is also not preemptive: if the secondary switch takes over the primary role, it keeps it when the other switch comes back, to prevent control-plane disruptions caused by unnecessary role changes.
Only the switch with the secondary vPC role disables its member ports when a vPC inconsistency is detected, so you might want to choose manually which switch gets the primary role. Set a role priority under the vPC domain on each switch; the lower value gets the primary role.
sh vpc role
vpc domain 1
role priority 4000
vpc role preempt
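The election and the non-preemptive behavior can be sketched in Python. The system MACs are hypothetical, 32667 is used as the NX-OS default role priority, and the value 4000 mirrors the config above; the function names are illustrative.

```python
from typing import NamedTuple


class Peer(NamedTuple):
    name: str
    role_priority: int  # lower value wins; 32667 is the NX-OS default
    system_mac: str     # same notation on both peers, so string order = numeric order


def elect_primary(a: Peer, b: Peer) -> Peer:
    # Lowest role priority wins; a tie is broken by the lowest system MAC.
    return min(a, b, key=lambda p: (p.role_priority, p.system_mac))


def role_after_peer_returns(current_primary: Peer, returning: Peer) -> Peer:
    # Non-preemptive: the returning switch does not take the primary role back,
    # whatever its priority.
    return current_primary


SW1 = Peer("SW1", 32667, "5000.0001.0007")   # hypothetical system MACs
SW2 = Peer("SW2", 32667, "5000.0002.0007")

print(elect_primary(SW1, SW2).name)                               # SW1 (lower MAC)
print(elect_primary(SW1, SW2._replace(role_priority=4000)).name)  # SW2 (lower priority)
print(role_after_peer_returns(SW2, SW1).name)                     # SW2 stays primary
```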
VPC failure scenarios
The following describes what happens when the Keep-Alive and/or the Peer-Link go down:
- Keep-Alive = vPC is still working, nothing happens
- Peer-Link = All member ports on the secondary vPC switch are suspended
- Peer-Link fails and then Keep-Alive = All member ports on the secondary vPC switch stay suspended. When restoring, bring the Keep-Alive up first, then the Peer-Link.
- Keep-Alive fails and then Peer-Link = Split brain. The secondary vPC switch no longer gets any signal over the Keep-Alive and the Peer-Link dies too, so it assumes the primary is down, elects itself primary and starts forwarding. Now both switches forward traffic. Shut all member ports on the secondary, bring the Keep-Alive link up first and then the Peer-Link.
The last failure is the reason why the Keep-Alive and the Peer-Link should never go over the same link: a single failure of that link takes both down at once, which makes a split brain very likely.
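The four scenarios can be written as a small lookup in Python, which makes it explicit that the order of the failures matters. It only encodes the list above, not how NX-OS implements it, and the event names are hypothetical.

```python
def vpc_outcome(failures: list[str]) -> str:
    """Map an ordered list of failed links ("keepalive", "peer-link") to the outcome."""
    failed = list(dict.fromkeys(failures))   # keep the order, drop repeats
    if not failed:
        return "normal operation"
    if failed == ["keepalive"]:
        return "vPC keeps working"
    if failed[0] == "peer-link" and set(failed) <= {"peer-link", "keepalive"}:
        # Peer-Link first: the secondary suspends its member ports and,
        # if the Keep-Alive fails afterwards, they stay suspended.
        return "secondary suspends its vPC member ports"
    if failed == ["keepalive", "peer-link"]:
        return "split brain: both peers act as primary"
    raise ValueError(f"unknown failure sequence: {failures}")


print(vpc_outcome(["peer-link", "keepalive"]))  # secondary suspends its vPC member ports
print(vpc_outcome(["keepalive", "peer-link"]))  # split brain: both peers act as primary
```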
If you don't want to use the mgmt VRF, you can also put the Keep-Alive link in its own VRF.
SW1,SW2
vrf context KEEPALIVE
int e1/7-8
channel-group 100 mode active
int po100
no switchport
vrf member KEEPALIVE
ip add 20.0.0.51/24 #on SW2: 20.0.0.52/24
no shut
vpc domain 1
peer-switch
peer-keepalive destination 20.0.0.52 vrf KEEPALIVE source 20.0.0.51 #swap IPs on SW2
peer-gateway
Helpful Commands
Some commands that helped me during troubleshooting.
ip arp synchronize #synchronizes the ARP tables when the peer-link comes up.
delay restore 360 #delay the restoration of vPC ports to avoid traffic blackholing after reboot
show vpc orphan-ports
show port-channel sum
show vpc
show vpc peer-keepalive
show vpc consistency-parameters #check stp inconsistencies and other mismatches
show vpc consistency-parameters global #for checking stp mode
show lacp interface
show lacp neigh
show spanning-tree
show mac address-table
sh run all | sec vpc
vpc orphan-port suspend #to also shut down the port when vpc goes down
show port-channel traffic #shows utilization of the physical ports
show port-channel load-balance #shows load-balance method
show port-channel usage #shows which port-channel numbers are in use
vPC Delay Restore
vPC delay restore keeps the vPC legs down after a reboot so that Layer 3 routing protocols can converge before any traffic is allowed on them. The default restore timer is 30 seconds.
vpc domain 1
delay restore 360 #allows router to converge routing protocols before using the vpc peer. default timer 30sec.
Type-1 vs Type-2
vPC distinguishes between two types of mismatches, which we can display with the command show vpc consistency-parameters global.
Type-1 mismatches are serious, for example an STP mode mismatch, and should be solved immediately. By default, when a Type-1 mismatch occurs, only the secondary vPC switch suspends its vPC links until the inconsistency is solved, while the primary keeps forwarding. This minimizes disruption time but, because the vPC links stay up on the primary, it also increases the possibility of split brain. To be on the safe side, we can use the no form of the command below, so that the primary vPC peer also disables its links in case of a Type-1 mismatch. Type-2 mismatches only generate a warning and do not suspend any links.
vpc domain 1
graceful consistency-check #default: only the secondary suspends its vpc links when a type-1 mismatch occurs
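How each peer reacts to a mismatch can be summarized in a small Python function, assuming the behavior described above (graceful consistency check on by default); the function and argument names are hypothetical.

```python
def suspends_vpc_legs(role: str, mismatch: str, graceful: bool = True) -> bool:
    """Toy model: does this vPC peer suspend its vPC links for a given mismatch?"""
    if mismatch == "type-2":
        return False                   # type-2: warning only, vPC stays up
    if mismatch != "type-1":
        raise ValueError("mismatch must be 'type-1' or 'type-2'")
    if graceful:
        return role == "secondary"     # default: only the secondary suspends
    return True                        # 'no graceful consistency-check': both suspend


for graceful in (True, False):
    print(graceful, [suspends_vpc_legs(r, "type-1", graceful) for r in ("primary", "secondary")])
# True [False, True]
# False [True, True]
```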
Thanks for reading my article. If you have any questions or recommendations you can message me via arvednetblog@gmail.com.