VXLAN BGP evpn Multipod and Multisite

Configure VXLAN BGP evpn Multipod and Multisite

This article builds on the VXLAN BGP EVPN config from my previous VXLAN BGP EVPN article. All configurations in this article were done with the Nexus image nxos.9.2.4.bin.

Multipod

Multipod is used when you want to connect two data centers and share the same control plane. The pods are connected spine to spine.

On the spines, you basically have to make sure that the next hop of EVPN routes from the remote pod is not altered.


Spine1 and Spine2

int e1/3 #connection to remote spine
no switchport
mtu 9216
no ip redirects
ip add 100.200.0.1/30
ip ospf network point-to-point
ip router ospf UNDERLAY area 0
ip pim sparse-mode
no shut

route-map MultiPod-VXLAN permit 10 
set ip next-hop unchanged #keep the originating leaf of the remote pod as next hop instead of the spine that forwards the route into its own pod

router bgp 65100
  neighbor 10.0.0.13
    remote-as 65101
    update-source loopback0
    ebgp-multihop 10 #raise the TTL to 10 so the session can be built between loopback interfaces, even of directly connected neighbors
    address-family ipv4 unicast
    address-family l2vpn evpn
      send-community
      send-community extended
      route-map MultiPod-VXLAN out

Verify the BGP EVPN sessions with show bgp l2vpn evpn summary.

The config for Spine1 and Spine2 is the same except for the IPs used for OSPF/BGP peering.
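
For a quick check on the spines, a few show commands can be run once the inter-pod link is up. This is only a sketch: it assumes the addressing from the config above, and the exact output depends on your topology.

```
show ip ospf neighbors #the remote spine should be listed as neighbor on e1/3
show bgp l2vpn evpn summary #the session to 10.0.0.13 should show a prefix count instead of Idle/Active
show bgp l2vpn evpn neighbors 10.0.0.13 advertised-routes #routes sent to the remote spine, the next hop should still be the originating leaf
```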


On the leaves, the only thing you have to configure is the import of the route-targets from the remote pod.


all Leaves

vrf context TNT1 #for l3vni (intervlan)
vni 10999
address-family ipv4 unicast
route-target import 65101:10999 #you want to import the route-targets of the remote AS, change to 65100 for AS65101
route-target import 65101:10999 evpn

evpn #for l2vni (intravlan)
vni 10100 l2
route-target import 65101:10100 #change to 65100 for AS65101

Now create an endpoint in each pod, try to ping between them, and verify the EVPN routes with the command show bgp l2vpn evpn.
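
Beyond the BGP table, a possible set of checks on a leaf could look like this. It assumes the VNIs and the VRF name TNT1 from the config above.

```
show nve peers #leaves of the remote pod should show up as NVE peers
show l2route evpn mac all #MAC addresses learned locally and via BGP
show bgp l2vpn evpn vni-id 10100 #EVPN routes for the L2VNI
show ip route vrf TNT1 #host routes of the remote pod, reachable over the L3VNI
```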




Multisite

In Multisite, each pod has its own control plane (MP-BGP EVPN). Between the pods, BUM traffic uses unicast replication: instead of flooding a packet over a multicast tree, the multicast packet is encapsulated and forwarded as a unicast packet to the remote egress device.


Instead of connecting the spines together as in Multipod, you use BGW (Border Gateway) leaves that connect to the DCI (Data Center Interconnect), which could be your ISP. You could connect the BGW leaves directly to each other, but normally you have a DCI between them and use a separate device as BGW leaf.


Underlay OSPF/MP-BGP/PIM

BGW1

feature bgp
feature ospf
feature pim
feature nv overlay
nv overlay evpn
feature interface-vlan
feature vn-segment-vlan-based

evpn multisite border-gateway 100 #all bgw on same site need same id

router ospf UNDERLAY
router-id 10.10.1.10

int e1/1 #OSPF peering intrasite
desc OSPF SITE-INTERNAL INTERFACE
  no switchport
  mtu 9216
  no ip redirects
  ip address 192.168.10.1/30
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  evpn multisite fabric-tracking
  no shutdown

int e1/2 #BGP peering intersite
desc BGP
no switchport
no shut
mtu 9216
ip add 192.168.101.1/30 tag 54321
evpn multisite dci-tracking #to monitor interface

int lo0 #ipv4/evpn bgp peering
desc BGP peering
ip add 10.10.1.10/32 tag 54321 #the tag lets a route-map pick up this network for advertisement in your BGP process instead of declaring every single network you want to advertise
ip router ospf UNDERLAY area 0 
ip pim sparse-mode #only for intra communication of BUM traffic

router bgp 65100
router-id 10.10.1.10 #lo0
address-family ipv4 unicast
maximum-paths 4
address-family l2vpn evpn 

neighbor 192.168.101.2 #peer to DCI e1/1 ipv4
remote-as 65200
update-source e1/2
address-family ipv4 unicast

neighbor 10.0.0.3 #peer to spine1 lo0 evpn
remote-as 65100
update-source lo0
address-family l2vpn evpn
send-community
send-community extended

neighbor 10.10.2.10 #peer to BGW2 lo0 evpn
remote-as 65101
update-source lo0
ebgp-multihop 2
peer-type fabric-external #enables the next hop rewrite for multi-site
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn #this command changes the incoming route target’s AS number to match the BGP-configured neighbor’s remote AS number. So we don't have the AS of the DCI as RT.

ip pim rp-address 11.11.11.2 group-list 239.0.0.0/24 

The BGW2 config on the remote pod is basically the same but with different IPs/IDs.


BGW2

feature bgp
feature ospf
feature pim
feature nv overlay
nv overlay evpn
feature interface-vlan
feature vn-segment-vlan-based

evpn multisite border-gateway 200

router ospf UNDERLAY
router-id 10.10.2.10

int e1/1 #OSPF peering intrasite
desc OSPF SITE-INTERNAL INTERFACE
  no switchport
  mtu 9216
  no ip redirects
  ip address 192.168.20.1/30
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0
  ip pim sparse-mode
  evpn multisite fabric-tracking
  no shutdown

int e1/2
desc BGP
no switchport
no shut
mtu 9216
ip add 192.168.201.1/30 tag 54321
evpn multisite dci-tracking

int lo0 
desc BGP peering
ip add 10.10.2.10/32 tag 54321
ip router ospf UNDERLAY area 0 
ip pim sparse-mode

router bgp 65101
router-id 10.10.2.10 #lo0
address-family ipv4 unicast
maximum-paths 4
address-family l2vpn evpn

neighbor 192.168.201.2 #peer to DCI e1/1 ipv4
remote-as 65200
update-source e1/2
address-family ipv4 unicast

neighbor 10.0.0.5 #peer to spine1 lo0 evpn
remote-as 65101
update-source lo0
address-family l2vpn evpn
send-community
send-community extended

neighbor 10.10.1.10 #peer to BGW1 lo0 evpn
remote-as 65100
update-source lo0
ebgp-multihop 2
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended
rewrite-evpn-rt-asn

ip pim rp-address 11.11.11.2 group-list 239.0.0.0/24 

The spines just need to peer with the new BGW leaf.


Spines

int e1/3
  description OSPF
  no switchport
  mtu 9216
  ip address 192.168.20.2/30
  ip ospf network point-to-point
  ip router ospf UNDERLAY area 0.0.0.0
  ip pim sparse-mode
  no shutdown

router bgp 65101 #65100 for spine1
  neighbor 10.10.2.10 #10.10.1.10 for spine1
    inherit peer TO_LEAFS

The DCI is the connector between both pods and nothing more. The BGW leaves basically need to see each other as direct peers, as if the DCI were not between them. This means that the DCI has to keep all route-targets and send them to the remote pod, and it must not advertise itself as next hop in EVPN.


DCI

feature bgp
nv overlay evpn

int e1/1
desc BGP
no switchport
no shut
mtu 9216
ip add 192.168.101.2/30

int e1/2
desc BGP
no switchport
no shut
mtu 9216
ip add 192.168.201.2/30

int lo0
ip add 100.100.100.100/32

route-map UNCHANGED permit 10
  set ip next-hop unchanged #BGP only uses a learned route if its next hop is reachable, so for BGW1 and BGW2 to see each other's EVPN routes they need to see each other directly as next hop and not the DCI between them
 
router bgp 65200
address-family ipv4 unicast
network 100.100.100.100/32
maximum-paths 64 #allows to install multiple paths in the RIB for load-balancing.
maximum-paths ibgp 64
address-family l2vpn evpn
  retain route-target all #keep all route-targets when forwarding to BGW2

neighbor 192.168.101.1 #BGW1 ipv4 e1/2
remote-as 65100
update-source e1/1
address-family ipv4 unicast
next-hop-self #the DCI advertises itself as next hop, otherwise the BGW could install a next hop that it can't reach

neighbor 192.168.201.1 #BGW2 ipv4 e1/2
remote-as 65101
update-source e1/2
address-family ipv4 unicast
next-hop-self

template peer OVERLAY-PEERING
update-source loopback0
ebgp-multihop 5
address-family l2vpn evpn
send-community both
route-map UNCHANGED out #dont overwrite next hop

neighbor 10.10.1.10 #peering to BGW1 lo0 
remote-as 65100
inherit peer OVERLAY-PEERING
address-family l2vpn evpn
rewrite-evpn-rt-asn 

neighbor 10.10.2.10 #peering to BGW2 lo0 
remote-as 65101
inherit peer OVERLAY-PEERING
address-family l2vpn evpn
rewrite-evpn-rt-asn

Overlay VXLAN

Most of the config is now the same as in my VXLAN BGP EVPN article. The new part is a loopback, lo100, that will be used as source IP for multisite traffic.


BGW1

evpn storm-control broadcast level 10 #means that max. 10% of available bandwidth will be used for broadcast traffic
evpn storm-control multicast level 10
evpn storm-control unicast level 10

int lo1 #L3VNI src ip intra traffic
desc NVE src ip intra traffic
ip add 10.200.200.21/32 tag 54321
ip router ospf UNDERLAY area 0 
ip pim sparse-mode

int lo100 #L3VNI src ip multisite
desc EVPN Multi-Site source interface 
ip add 10.111.111.1/32 tag 54321 #tag used together with route-map
ip router ospf UNDERLAY area 0
ip pim sparse-mode

route-map RMAP-REDIST-DIRECT permit 10 #matches the tagged IPs so you don't have to declare every network in the BGP process
match tag 54321

router bgp 65100
address-family ipv4 unicast
redistribute direct route-map RMAP-REDIST-DIRECT #redistribute the ips with tag to BGP
neighbor 100.100.100.100 #DCI lo0 bgp evpn peering
remote-as 65200
update-source lo0
ebgp-multihop 5 #ttl 5 in case dci or bgw is multiple hops away
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended 
rewrite-evpn-rt-asn

vlan 100
  vn-segment 10100

vlan 999
  name L3VNI
  vn-segment 10999

vrf context TNT1
  vni 10999
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface Vlan999
  no shutdown
  mtu 9216
  vrf member TNT1
  ip forward
  no ip redirects #to prevent icmp redirects that are used to advertise a more optimal route

hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  multisite border-gateway interface loopback100 #src int for multisite
  member vni 10100
    suppress-arp
    multisite ingress-replication #multicast to unicast, replicating every BUM packet and sending them as a separate unicast to the remote egress devices
    mcast-group 239.0.0.10
  member vni 10999 associate-vrf

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target import 65101:10100
    route-target export auto


BGW2

evpn storm-control broadcast level 10
evpn storm-control multicast level 10
evpn storm-control unicast level 10

int lo1 #L3VNI src ip intra traffic
desc NVE src ip intra traffic
ip add 10.200.200.22/32 tag 54321
ip router ospf UNDERLAY area 0 
ip pim sparse-mode

int lo100 #L3VNI src ip multisite
desc EVPN Multi-Site source interface 
ip add 10.111.111.2/32 tag 54321 #tag used together with route-map
ip router ospf UNDERLAY area 0
ip pim sparse-mode

route-map RMAP-REDIST-DIRECT permit 10 #matches the tagged IPs so you don't have to declare every network in the BGP process
match tag 54321

router bgp 65101
address-family ipv4 unicast
redistribute direct route-map RMAP-REDIST-DIRECT #redistribute the ips with tag to BGP
neighbor 100.100.100.100 #DCI lo0 bgp evpn peering
remote-as 65200
update-source lo0
ebgp-multihop 5 #ttl 5 in case dci or bgw is multiple hops away
peer-type fabric-external
address-family l2vpn evpn
send-community
send-community extended 
rewrite-evpn-rt-asn

vlan 100
  vn-segment 10100

vlan 999
  name L3VNI
  vn-segment 10999

vrf context TNT1
  vni 10999
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn

interface Vlan999
  no shutdown
  mtu 9216
  vrf member TNT1
  ip forward
  no ip redirects #keeps the router from sending ICMP redirect messages to clients. These are sent when a router knows a more optimal path for a client than going through itself

hardware access-list tcam region racl 512
hardware access-list tcam region arp-ether 256 double-wide

interface nve1
  no shutdown
  host-reachability protocol bgp
  source-interface loopback1
  multisite border-gateway interface loopback100 #src int for multisite
  member vni 10100
    suppress-arp
    multisite ingress-replication #multicast to unicast, replicating every BUM packet and sending them as a separate unicast to the remote egress devices
    mcast-group 239.0.0.10
  member vni 10999 associate-vrf

evpn
  vni 10100 l2
    rd auto
    route-target import auto
    route-target import 65100:10100
    route-target export auto

Now create an endpoint in each pod, try to ping between them, and verify the EVPN routes with the command show bgp l2vpn evpn.
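
If the ping fails, it can help to check the multisite pieces one by one on a BGW. The commands below are a sketch that assumes the interfaces and addresses from the config above. The ping is run on BGW1 and only works once the tagged loopbacks are redistributed into BGP.

```
show nve multisite dci-links #e1/2 should be listed and up
show nve multisite fabric-links #e1/1 should be listed and up
show nve interface nve1 detail #shows the multisite source interface (loopback100) and its state
show nve peers #the remote BGW should show up as NVE peer
show bgp l2vpn evpn summary #sessions to the spine and the DCI
ping 10.10.2.10 source 10.10.1.10 #lo0 reachability from BGW1 to BGW2 across the DCI
```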


Thanks for reading my article. If you have any questions or recommendations you can message me via arvednetblog@gmail.com.