VXLAN BGP EVPN with L3Out and VPC

Configure VXLAN BGP EVPN with L3Out and VPC


Theory

With VXLAN you can create Layer 2 adjacencies over a Layer 3 network. This increases flexibility in network design since you can span your network across multiple datacenters without paying too much attention to routing and segmentation.


The easiest approach is the flood-and-learn scenario, but it still relies on data-plane learning, so it's not our first choice.

VXLAN BGP EVPN is the most popular option. Here the local leaf learns a new MAC address and advertises it over BGP EVPN as a Layer 2 route. All the other leaves participating in BGP EVPN receive the route advertisement, so the packet doesn't have to be flooded out of all ports. You no longer rely on broadcast for endpoint learning; PIM is only used to build multicast trees for BUM traffic in your VLANs. Instead of using ARP as the endpoint learning protocol, VXLAN encapsulates every packet and uses the EVPN control plane, which is basically a routing table for MAC and IP addresses, to find the destination.


Finally, when both leaves have the endpoint in their EVPN routing table, Leaf A encapsulates the packet by assigning the VNI (VXLAN ID) and sends it to Leaf B over a new interface called the NVE interface.


We will first configure VXLAN BGP EVPN with one spine and two leaves, and after that we will configure L3Out and VPC. Every configuration in this article was done with Nexus version nxos.9.2.4.bin.



VXLAN BGP EVPN


Underlay routing with OSPF

You can use any IGP for the underlay routing; here we use OSPF.


Leaf1

feature ospf 

router ospf UNDERLAY
router-id 10.0.0.1

int lo0
desc BGP_EVPN_PEERING
ip add 10.0.0.1/32
ip router ospf UNDERLAY area 0 

int eth1/1
desc TO_SPINE1
no switchport
ip add 192.168.1.1/30
ip router ospf UNDERLAY area 0
ip ospf network point-to-point
mtu 9216
no shut

Leaf2

feature ospf 

router ospf UNDERLAY
router-id 10.0.0.2

int lo0
desc BGP_EVPN_PEERING
ip add 10.0.0.2/32
ip router ospf UNDERLAY area 0 

int eth1/1
desc TO_SPINE1
no switchport
ip add 192.168.2.1/30
ip router ospf UNDERLAY area 0
ip ospf network point-to-point
mtu 9216
no shut

Spine

feature ospf 

router ospf UNDERLAY
router-id 10.0.0.3

int lo0
desc BGP_EVPN_PEERING
ip add 10.0.0.3/32
ip router ospf UNDERLAY area 0

int eth1/1
desc TO_LEAF1
no switchport
ip add 192.168.1.2/30
ip router ospf UNDERLAY area 0
ip ospf network point-to-point
mtu 9216
no shut

int eth1/2
desc TO_LEAF2
no switchport
ip add 192.168.2.2/30
ip router ospf UNDERLAY area 0
ip ospf network point-to-point
mtu 9216
no shut

Verify OSPF adjacencies with show ip ospf neighbor.



Multicast with PIM

We use PIM with a rendezvous point (RP), which will be the spine. The leaf switches meet there to form the multicast trees.

We will configure the RP address manually since we only have one spine. Normally you would use a phantom RP when more than one spine participates as rendezvous point. With a phantom RP, the RP is chosen based on the route with the longest (most specific) prefix toward the RP address.


Here is an example of a Phantom RP:

rp-address = 11.11.11.2
Spine1 = 11.11.11.1/29
Spine2 = 11.11.11.3/28

Both spines have the rp-address in their subnet, but Spine1 has a more specific subnet (/29) than Spine2 (/28), so Spine1 will be our RP. When Spine1 goes down, its more specific route disappears and Spine2 automatically becomes the new RP.
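A minimal sketch of what this phantom RP setup could look like on the two spines (assuming loopback 1 and the same group range we use below):

#Spine1
int lo1
 ip add 11.11.11.1/29 #more specific route, Spine1 becomes the RP
 ip router ospf UNDERLAY area 0
 ip pim sparse-mode

#Spine2
int lo1
 ip add 11.11.11.3/28 #less specific route, Spine2 is the backup RP
 ip router ospf UNDERLAY area 0
 ip pim sparse-mode

#on all switches
ip pim rp-address 11.11.11.2 group-list 239.0.0.0/24 #11.11.11.2 itself is configured nowhere, hence "phantom"

Let's move on with our config.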


Spine

feature pim
ip pim rp-address 11.11.11.2 group-list 239.0.0.0/24 

int lo1
desc RP
ip add 11.11.11.2/29 #could also be another IP like 11.11.11.1/29
ip router ospf UNDERLAY area 0
ip pim sparse-mode

int eth1/1-2
ip pim sparse-mode

Leaf1 and Leaf2

feature pim
ip pim rp-address 11.11.11.2 group-list 239.0.0.0/24 
int eth1/1
ip pim sparse-mode

Verify PIM adjacencies with show ip pim neighbor.
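Later, once the NVE interfaces are up and have joined their groups, you can also inspect the multicast trees with a standard command:

show ip mroute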



BGP EVPN

We will now configure the EVPN address family to create the EVPN control plane. Instead of repeating every command for every neighbor, I will use peer templates for the shared commands.


Spine

feature bgp
feature nv overlay
nv overlay evpn
feature fabric forwarding

router bgp 65100
log-neighbor-changes
address-family ipv4 unicast
address-family l2vpn evpn

template peer TO_LEAFS
remote-as 65100
update-source lo0
address-family ipv4 unicast
send-community
send-community extended
route-reflector-client
address-family l2vpn evpn
send-community
send-community extended
route-reflector-client

neighbor 10.0.0.1
inherit peer TO_LEAFS
neighbor 10.0.0.2
inherit peer TO_LEAFS

Leaf1 and Leaf2

feature bgp
feature nv overlay
nv overlay evpn

router bgp 65100
log-neighbor-changes
address-family ipv4 unicast
address-family l2vpn evpn 

template peer TO_SPINE
remote-as 65100
update-source lo0
address-family ipv4 unicast 
send-community
send-community extended
address-family l2vpn evpn 
send-community
send-community extended

neighbor 10.0.0.3
inherit peer TO_SPINE

Verify basic IPv4 peering with show ip bgp summary.

Verify BGP EVPN peering with show bgp l2vpn evpn summary.
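You can also inspect a single peering in detail, e.g. on the spine:

show bgp l2vpn evpn neighbors 10.0.0.1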



VXLAN

Finally, we configure the VXLAN commands. Since the spine doesn't participate in VXLAN and just reflects the routes, we don't have to configure anything VXLAN-related on it.


The config below is the same for Leaf1 and Leaf2; on Leaf2 you only have to use a different IP for the loopback 1 interface, e.g. 10.1.0.2/32.

It doesn't matter whether the gateway sits on Leaf1 or Leaf2, since everything is exactly two hops away in a spine-leaf topology. That means we configure the SVI as gateway on every leaf and use an anycast gateway MAC.


For inter-VLAN traffic (L3 traffic) we will configure an L3VNI, which is basically a separate VLAN that the switch uses when it needs to forward traffic between VLANs.

It is important to configure suppress-arp so the local leaf answers ARP requests from its EVPN table instead of flooding them across the fabric.
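Once it is configured, you can check the locally answered entries with a standard Nexus 9000 command:

show ip arp suppression-cache detail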


For inter-VLAN traffic in the VRF section and intra-VLAN traffic in the evpn section you have to configure route targets, which work like access lists that tell the leaf which routes it is allowed to import and export. You can always use route-target both auto when all switches are in the same AS. I commented the L3VNI parts out; if you also want to do inter-VLAN routing, configure the L3VNI part as well.
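NX-OS derives the auto route target as ASN:VNI, so in AS 65100 with VNI 10100 the auto configuration is equivalent to this manual version (shown only as an illustration):

evpn
vni 10100 l2
route-target import 65100:10100
route-target export 65100:10100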


Leaf1 and Leaf2

feature vn-segment-vlan-based
feature fabric forwarding
feature interface-vlan

fabric forwarding anycast-gateway-mac 0000.1111.2222 #every leaf uses the same gateway MAC, and each VLAN SVI uses the same gateway IP on every switch

vlan 100
  vn-segment  10100

#vlan 200
#  vn-segment  10200

#vlan 999
#  name L3VNI
#  vn-segment 10999


vrf context TNT1
#  vni 10999
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn


interface Vlan100
  no shutdown
  vrf member TNT1
  no ip redirects
  ip address 172.16.100.1/24
  fabric forwarding mode anycast-gateway

#interface Vlan200
#  no shutdown
#  vrf member TNT1
#  no ip redirects
#  ip address 172.16.200.1/24
#  fabric forwarding mode anycast-gateway

#int vlan 999
#vrf member TNT1
#ip forward
#mtu 9216
#no shut


evpn
vni 10100 l2
rd auto
route-target import auto 
route-target export auto
#vni 10200 l2
#rd auto
#route-target import auto 
#route-target export auto


int lo1
ip add 10.1.0.1/32
ip router ospf UNDERLAY area 0
ip pim sparse-mode
ex


#you need the two tcam commands to carve the TCAM table. We need to take space away from the routing ACL region and give it to the arp-ether region so ARP suppression will work. Save the config and reload the switch.
hardware access-list tcam region racl 512 
hardware access-list tcam region arp-ether 256 double-wide 
#show hardware access-list tcam region | grep racl
#show hardware access-list tcam region | grep arp

int nve1 #the VTEP interface; packets are sent from the source interface to the destination IP, which is either the mcast-group IP for BUM traffic or the remote leaf's NVE IP for known unicast traffic
no shut
source-interface lo1
host-reachability protocol bgp 
  member vni 10100
    mcast-group 239.0.0.10
    suppress-arp
#  member vni 10200
#    mcast-group 239.0.0.20
#    suppress-arp
#member vni 10999 associate-vrf


int e1/2
switchport access vlan 100

Now create two endpoints and ping between them so the leaves fill their EVPN tables. Verify the BGP EVPN table with show bgp l2vpn evpn and show nve peers. Also verify with show l2route evpn mac-ip all that the PCs are in the endpoint table. With show ip route vrf TNT1 you can see the internal routes between the leaves.
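The verification commands at a glance:

show bgp l2vpn evpn
show nve peers
show l2route evpn mac-ip all
show ip route vrf TNT1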




L3Out


In order to reach external networks that aren't participating in your VXLAN network, you have to select a border leaf and use a route-map to redistribute your VXLAN routes to the external network. Note that the external routes are distributed between the leaves as EVPN type-5 routes over the L3VNI, so the L3VNI parts that were commented out in the previous section are needed here.

I use OSPF to peer with the external device.


Leaf1

interface Ethernet1/4 #to peer with external network device
  no switchport
  mtu 9216
  vrf member TNT1
  ip address 10.14.15.14/24
  ip ospf network point-to-point
  ip router ospf 1 area 0.0.0.0
  no shutdown

route-map permit-bgp-ospf permit 10 #permit-all route-maps: without a match statement everything is permitted
route-map permit-ospf-bgp permit 10

router bgp 65100
  vrf TNT1
    address-family ipv4 unicast
      #advertise l2vpn evpn
      redistribute ospf 1 route-map permit-ospf-bgp #import ospf 1 routes from the WAN device into bgp 65100 so that Leaf2 has the route in its EVPN control plane and can reach 100.100.100.100

router ospf 1
  router-id 10.14.15.14
  vrf TNT1
    redistribute direct route-map permit-bgp-ospf #export the directly connected 172.16.100.0/24 (vlan 100) into ospf so the external device can reach vlan 100
    redistribute bgp 65100 route-map permit-bgp-ospf

External device

feature ospf

interface Ethernet1/1
  description to Leaf-1
  no switchport
  mtu 9216
  ip address 10.14.15.15/24
  ip ospf network point-to-point
  no ip ospf passive-interface
  ip router ospf 1 area 0.0.0.0
  no shutdown

interface loopback 0 #only used to test ping
  ip address 100.100.100.100/32
  ip router ospf 1 area 0

router ospf 1
  router-id 10.14.15.15

On the external device use show ip route ospf to check that the EVPN routes are reachable through OSPF, and on your leaves use show bgp l2vpn evpn to check that the 100.100.100.100 route was learned through EVPN.
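For example, on Leaf2 you can verify that the external loopback arrived in the tenant VRF:

show ip route 100.100.100.100 vrf TNT1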




VPC

Virtual Port Channel is a Cisco Nexus-only feature that lets you form a port channel across two different devices, which is incredibly handy for redundancy. For this scenario I used a Layer 2 switch that is connected to both leaves.


Leaf1 and Leaf2

feature vpc
feature lacp

int mgmt0
vrf member management
ip add 172.16.1.1/24 #used for peer keep-alive packets

vpc domain 10
peer-switch
peer-keepalive destination 172.16.1.2 source 172.16.1.1 
peer-gateway
ip arp synchronize

int e1/5
switchport mode trunk
spanning-tree port type network
channel-group 1 mode active

int po1
vpc peer-link

int e1/6 
switchport mode trunk
channel-group 2 mode active

int po2
vpc 2

int lo1
ip add 10.2.2.10/32 secondary #used as VTEP IP in VPC, same on both leaves

route-map permit-static-bgp permit 10

router bgp 65100
vrf TNT1
address-family ipv4 unicast
redistribute static route-map permit-static-bgp
address-family l2vpn evpn
advertise-pip #important when a border leaf is in VPC with a non-border leaf: advertise the physical IP (primary) instead of the shared VTEP IP (secondary) so the other devices know exactly where the border leaf resides

int nve1
advertise virtual-rmac #must be configured when using the command advertise-pip

For Leaf2 the only config difference is the mgmt0 IP and the swapped peer-keepalive addresses, as sketched below.
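Leaf2 (only the differing lines):

int mgmt0
vrf member management
ip add 172.16.1.2/24

vpc domain 10
peer-keepalive destination 172.16.1.1 source 172.16.1.2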


L2switch

feature lacp

int e1/1
switchport mode trunk
channel-group 2 mode active
int e1/2
switchport mode trunk
channel-group 2 mode active

Verify with show vpc brief that the VPC and the port channels have formed.
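A couple of additional standard checks:

show port-channel summary
show vpc consistency-parameters global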

BGP EVPN Ingress Replication

NVE = VTEP

Ingress replication, or head-end replication, for VXLAN EVPN is deployed when an IP multicast underlay is not used. It is a unicast approach to handling multi-destination traffic. Handling BUM traffic with ingress replication means that the ingress device (the NVE interface) replicates every BUM packet and sends a separate unicast copy to each remote egress device.

This approach is also used in Cisco ACI Multisite, where the VTEPs replicate every BUM packet and send the copies as unicast to the remote VTEPs. The VTEPs in ACI are basically the NVE interfaces.

On NX-OS you simply replace the mcast-group line under the VNI member with ingress replication:

int nve1
 source-interface lo1
 host-reachability protocol bgp
 member vni 10100
   ingress-replication protocol bgp #instead of the mcast-group config

#show nve vni ingress-replication

Thanks for reading my article. If you have any questions or recommendations you can message me via arvednetblog@gmail.com.