Saturday, June 29, 2019

Spine and Leaf Practical Applications, The IP Portability Problem

So far, this is all great, but it IS missing something...

Networks are useless unless you do something with them. In most cases, a device (in this case, we'll use a server) needs to connect via redundant links to a common Layer 2 segment.

Why is this?


Well, most servers are incapable of dynamic routing. Instead, the server (which is a perfectly capable router as far as forwarding is concerned) simply has a static route (default gateway) that is used for all Layer 3 forwarding. This is not really a deal breaker for Clos fabrics - there are a few ways to solve this problem - and several of them intermix really well:
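To sketch what that looks like on a typical Linux server (the addresses and interface name here are hypothetical, not from this lab), the server's entire "routing policy" amounts to one connected subnet and one static default:

```shell
# Hypothetical dual-homed server: both NICs bonded into one logical
# interface on a shared Layer 2 segment, plus a single static default route.
ip addr add 10.6.50.10/24 dev bond0
ip route add default via 10.6.50.1 dev bond0
```

Both physical links must land on the same Layer 2 segment, because the server has no way to react to topology changes beyond its one default gateway.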

The VMware Way

This is probably the most achievable. It's not really a Clos fabric, due to some deficiencies (ESXi doesn't do BGP yet) that will probably be resolved at some point, but it is close enough to achieve our goals. Let's review those:
  • Make change frictionless and low risk so network changes can be on-demand (The Change Problem)
  • Ensure that the network utilizes all links, with consistent forwarding (The Loop Problem)
The primary value proposition of Layer 3 leaf-spine (a Clos implementation) is a consistent 3-stage (leaf, spine, leaf) forwarding topology where all links have identical latency and link speed. This, along with some other features (ECMP support being the big one), allows for N-scaling of leaf-to-leaf communications - you can have 1, 2, ... 64 spines in a network. 

Cisco really pushed this to the limit, publishing a paper on a reference implementation where leveraging 16+ spines actually saved money versus using QSFP+ capable devices. The conclusion is somewhat dated due to QSFP28 coming out and being more affordable, but the takeaway should be the same - BGP/IS-IS are built to facilitate tens of thousands of network nodes in irregular topologies. Datacenter networks with hundreds of switches don't really hold a candle to that, but we can use this overkill to our advantage.

VMware is also now on board with this topology, because they're starting to solve the routing problem with NSX. The published reference architecture (VMware Validated Design 5.0 at the time of this post) introduced a compromise with version 4.0 - linking ToR switches into pairs more like a traditional switch deployment, with VLANs subtending the leafs to provide server reachability. 

There's a problem here - how do you get virtual machines to keep their IP address when moving between ToR pairs?

This is where NSX comes in. NSX-V and NSX-T both provide overlay networking, where dynamically pinned tunnel adapters (like GRE on ubersteroids) manage membership for virtual network segments inside an encapsulation method (VXLAN/GENEVE), providing a fully virtualized Layer 2 segment that is portable anywhere. This ensures that the only thing that isn't portable is the servers, which is good enough for now.

VMware's approach isn't "pure" (whatever that means), but when revisiting our goals here (to provide an ultra-stable, change-friendly datacenter network), it meets our needs and provides a demarcation point where changes are substantially simpler as far as the datacenter fabric is concerned. If NSX breaks on a host, you may lose part of a vn-segment or a few VMs at worst. The fabric failing is far more disastrous.

Pros:
  • Change risk is low due to distributing the work
  • Highly flexible
  • NSX-T can run on things that aren't ESXi

Con:

You're going to need to blur the line between "Network" and "Systems". I've seen a pattern in many prod environments - where an organization's networking team will manage the vSphere Distributed/Standard Switches to ensure that switch and host are well-integrated. If this model is not organizationally feasible, you'll have a difficult time with NSX. Even if it is, your Network and Systems teams must cross-train.

The Cisco / Big Switch Way

Another option is to fully offload the responsibility of overlay networking onto the datacenter fabric, maintaining a "pure" Clos topology, and handling the ToR bonding in software with the same overlay technology. I'm keeping this at a high level because, honestly, I haven't worked anywhere large enough to benefit yet.

Pros:
  • Works on just about anything
  • Usually comes with an automated lifecycle management and provisioning platform
  • You can keep your network and systems teams separated

Cons:
  • If it's not a use case your vendor anticipated, you don't get the flexibility you need
  • Vendor lock-in is basically guaranteed
  • Doesn't run on generic hardware
  • $$$

The Mad Scientist Way

...just install a routing package on every virtual machine, Docker host, or virtualization host. Run DHCP for the initial address issuance, run OSPF or BGP with a dynamic peer range, and advertise a loopback address for the service you're offering. 

It's not actually all that hard. If you have a Linux OS, you just need to install a software package to run a routing service. If you're a systems guy, this is easier than setting up a LEMP stack! Here are some examples of open-source, publicly available software that will perform this task:
My new favorite is Free Range Routing (FRR). It's under the hood of VMware's new version of NSX (NSX-T), Cumulus Linux, and a ton of other stuff. It's the most feature-complete of this list, performing tasks that you'd normally pay far more for (Cisco still has BGP as an add-on license). 
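As a sketch of what host routing with FRR could look like (the ASNs and addresses here are hypothetical, not from this lab), the host speaks eBGP to its ToR and advertises only a stable service loopback:

```
! /etc/frr/frr.conf on the host (hypothetical addressing)
interface lo
 ip address 10.9.9.9/32
!
router bgp 64512
 neighbor 10.6.50.1 remote-as 64900
 address-family ipv4 unicast
  network 10.9.9.9/32
 exit-address-family
!
```

The uplink address can still come from DHCP - only the loopback needs to stay stable, because that's the address the service binds to and the fabric routes to.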

One neat thing this can provide is the concept of anycast network services. For a stateless service like DHCP, DNS, etc., it is possible to leverage one of these daemons to advertise a common address. Instead of searching for the correct server or assembling a shortlist of DNS servers, clients can simply ask for the nearest one - this is how many exascale DNS implementations work, like:
1.1.1.1 (CloudFlare)
8.8.8.8 (Google)
9.9.9.9 (Quad9)
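To sketch the anycast idea (again with hypothetical addressing), every DNS server advertises the exact same /32 loopback; the fabric's shortest-path and ECMP behavior delivers each client to the nearest instance:

```
! Identical fragment on each DNS server; the fabric picks the closest copy
interface lo
 ip address 10.53.53.53/32
!
router bgp 64512
 address-family ipv4 unicast
  network 10.53.53.53/32
 exit-address-family
!
```

If one server dies, its BGP session drops, its copy of the route is withdrawn, and clients converge on the next-nearest instance with no client-side changes.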

The downside to this approach is that you have no formal support whatsoever - which is a pretty big con. The good news is that there are some commercially viable host-routing products out there, like Cumulus' Host Pack (white paper here). Eventually, products like this will be run as a plug-in on common hypervisors.

Conclusion

Why and where should I try this?

Let's keep this simple - applied Clos datacenter fabrics will require some level of solution design - they cannot simply be forklifted into the datacenter, but many products are available today that will solve the near-term issues. Few of these implementations are perfect, so design for iterative improvement, leave extra ports at the datacenter perimeter for new versions, etc.

What routing protocol, technology should I use?

Use what you know. The lab examples I provided in this blog ran on switches manufactured in 2002. If it's Layer 3 and familiar, use it. We only have one hard requirement - fast Layer 3 switching.

With routing protocols, use what you know - if a protocol is unfamiliar, you'll have a difficult time supporting it. There's nothing wrong with running OSPF (or even RIP!) for these purposes. My personal favorite is actually running two - either IS-IS or OSPF combined with BGP - but this is driven by a few requirements I have for the future:
  • NSX-T only supports BGP
  • BGP is the way to go for highly scalable deployments
  • Any carrier network engineer will feel at home using it
This concludes this part on Clos networking. Later, I might even apply it!

Saturday, June 22, 2019

Spine and Leaf Practical Applications, EGP and IGP combined!

So far, all examples to date have been extremely simple, and stand rather well on their own (OSPF Clos would work just FINE in a campus network if you have cheap L3) but may not effectively address some more advanced use cases.

In short, we're about to enter niche territory.

Generally speaking, BGP is regarded as the most advanced networking topic out there, until it isn't. Most of the complexity lies in iBGP (same Autonomous System everywhere), because BGP's primary loop prevention mechanism is AS-Path (a count of the autonomous systems a route has transited).

To successfully implement iBGP (a staple of every carrier, so totally do-able), a network engineer must choose one of the following paths (non-inclusive list):
  • Maintain a full mesh of iBGP sessions between all speakers
  • Designate route reflectors that re-advertise routes to their clients
  • Break the network into confederations

In this case, the more scalable option is to opt for a route reflector, but it isn't that easy. 
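To put a number on why full-mesh iBGP doesn't scale, session count grows quadratically with the number of speakers, while route reflection grows roughly linearly. A quick back-of-the-envelope (plain arithmetic, not lab output):

```python
def full_mesh_sessions(speakers: int) -> int:
    """Full-mesh iBGP: every speaker peers with every other speaker."""
    return speakers * (speakers - 1) // 2

def route_reflector_sessions(speakers: int, reflectors: int = 2) -> int:
    """Route reflection: each client peers only with the reflectors,
    and the reflectors maintain a full mesh among themselves."""
    clients = speakers - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)

# Four switches barely differ, but at datacenter scale the gap is dramatic:
print(full_mesh_sessions(4))          # 6
print(full_mesh_sessions(100))        # 4950
print(route_reflector_sessions(100))  # 197
```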

iBGP, when compared to eBGP:
  • Doesn't care about hop count to a peer speaker. As long as the route reflector is less than 255 hops away, there is no issue
  • Usually doesn't care about resolving paths to peer speakers - that's a problem for something else
To provide a good example of an expandable, scalable fabric that can offer eBGP as a service to subtending network devices, we will implement IS-IS as the intra-fabric routing protocol, and then leverage iBGP with the spine switches as route reflectors.

First things first, diagram is here: (YAML)

First, we'd configure the spines. Note that the BGP dynamic neighbor feature found in more modern network operating systems (my home lab is rockin' IOS 12.2.55, too old for this) is really useful here.

Note: route reflector client status is configured on the SERVER side:

router bgp 65000
 bgp log-neighbor-changes
 neighbor 10.6.0.0 remote-as 65000
 neighbor 10.6.0.0 update-source L0
 neighbor 10.6.0.0 route-reflector-client
 neighbor 10.6.0.1 remote-as 65000
 neighbor 10.6.0.1 update-source L0
 neighbor 10.6.0.1 route-reflector-client
 neighbor FD00:6::0 remote-as 65000
 neighbor FD00:6::0 update-source L0
 neighbor FD00:6::0 route-reflector-client
 neighbor FD00:6::1 remote-as 65000
 neighbor FD00:6::1 update-source L0
 neighbor FD00:6::1 route-reflector-client
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.0.0 activate
  neighbor 10.6.0.1 activate
  maximum-paths 2
  no auto-summary
  network 10.6.0.240 mask 255.255.255.254
  network 10.6.240.0 mask 255.255.255.254
  network 10.6.240.2 mask 255.255.255.254
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6::0 activate
  neighbor FD00:6::1 activate
  network FD00:6::240/127
  network FD00:6:240::/126
  network FD00:6:240::4/126
  no synchronization
  maximum-paths 2
 exit-address-family
!
And then the leaf configuration:

router bgp 65000
 bgp log-neighbor-changes
 neighbor 10.6.0.240 remote-as 65000
 neighbor 10.6.0.240 update-source L0
 neighbor 10.6.0.241 remote-as 65000
 neighbor 10.6.0.241 update-source L0
 neighbor FD00:6::240 remote-as 65000
 neighbor FD00:6::240 update-source L0
 neighbor FD00:6::241 remote-as 65000
 neighbor FD00:6::241 update-source L0
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.0.240 activate
  neighbor 10.6.0.241 activate
  maximum-paths 2
  no auto-summary
  network 10.6.0.240 mask 255.255.255.254
  network 10.6.240.0 mask 255.255.255.254
  network 10.6.240.2 mask 255.255.255.254
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6::240 activate
  neighbor FD00:6::241 activate
  network FD00:6::240/127
  network FD00:6:240::/126
  network FD00:6:240::4/126
  no synchronization
  maximum-paths 2
 exit-address-family
!

Note that no BGP peers are up yet - and BGP knows what the problem is, too!

bgp-rr0-s0#show ip bgp neighbors
06:12:15: %SYS-5-CONFIG_I: Configured from console by console
BGP neighbor is 10.6.0.0,  remote AS 65000, internal link
  BGP version 4, remote router ID 0.0.0.0
  BGP state = Active
  Last read 00:03:02, last write 00:03:02, hold time is 180, keepalive interval is 60 seconds
  Message statistics:
    InQ depth is 0
    OutQ depth is 0
                         Sent       Rcvd
    Opens:                  0          0
    Notifications:          0          0
    Updates:                0          0
    Keepalives:             0          0
    Route Refresh:          0          0
    Total:                  0          0
  Default minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  BGP table version 1, neighbor version 0/0
  Output queue size : 0
  Index 1, Offset 0, Mask 0x2
  Route-Reflector Client
  1 update-group member
                                 Sent       Rcvd
  Prefix activity:               ----       ----
    Prefixes Current:               0          0
    Prefixes Total:                 0          0
    Implicit Withdraw:              0          0
    Explicit Withdraw:              0          0
    Used as bestpath:             n/a          0
    Used as multipath:            n/a          0

                                   Outbound    Inbound
  Local Policy Denied Prefixes:    --------    -------
    Total:                                0          0
  Number of NLRIs in the update sent: max 0, min 0

  Address tracking is enabled, the RIB does not have a route to 10.6.0.0
  Address tracking requires at least a /0 route to the peer
  Connections established 0; dropped 0
  Last reset never
  Transport(tcp) path-mtu-discovery is enabled
  No active TCP connection
Note how it says the RIB does not have a route to 10.6.0.0 - that's because iBGP doesn't resolve next-hops for us. Let's fix it by rolling out an Interior Gateway Protocol (IGP) to support iBGP here. I'm using IS-IS for a few reasons - namely:

  • Like BGP, one routing protocol for both IPv4 and IPv6
  • Selective flooding with ISPF
  • I'm too hip for OSPF now


router isis CLOS-1
 net 42.0000.0000.0000.0240.00
 is-type level-2-only
 ispf level-2
 log-adjacency-changes
!
interface 
 ip router isis CLOS-1
interface Loopback0
 ip router isis CLOS-1
This is applied to every router, while changing the net-ID for each device. It's fun watching the adjacencies pop up, so I'll add that here too.

*Mar  1 06:31:34.670: %CLNS-5-ADJCHANGE: ISIS: Adjacency to 0000.0000.0240 (FastEthernet0/22) Up, new adjacency
*Mar  1 06:31:34.670: %CLNS-5-ADJCHANGE: ISIS: Adjacency to 0000.0000.0241 (FastEthernet0/23) Up, new adjacency 
*Mar  1 06:31:41.565: %BGP-5-ADJCHANGE: neighbor 10.6.0.240 Up
*Mar  1 06:31:47.169: %BGP-5-ADJCHANGE: neighbor 10.6.0.241 Up
Note how BGP pops up immediately after IS-IS resolves the next-hop for the loopback in this case. Sadly, it doesn't look like my ancient lab switches support IS-IS for IPv6 - so I'll add OSPFv3 for the next topic: actually using Clos in a datacenter network.

bgp-rr0-s0#show ip bgp sum
BGP router identifier 10.6.0.240, local AS number 65000
BGP table version is 3, main routing table version 3
2 network entries using 234 bytes of memory
4 path entries using 208 bytes of memory
3/1 BGP path/bestpath attribute entries using 420 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 862 total bytes of memory
BGP activity 4/0 prefixes, 6/0 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.6.0.0        4 65000       8       7        3    0    0 00:03:03        1
10.6.0.1        4 65000       6       5        3    0    0 00:01:51        1
FD00:6::        4 65000       0       0        0    0    0 never    Active
FD00:6::1       4 65000       0       0        0    0    0 never    Active
Configurations generated by this lab, if you want to replicate it, are here.

Saturday, June 8, 2019

Spine and Leaf Practical Applications, eBGP

Overview

First off, here's the reference diagram (YAML):

Assumptions about difficulty

Most people I've met outside of the carrier space are pretty intimidated by BGP, as it is truly impressive in scope. Here, we're going to break out BGP usage into two categories:


  • iBGP: This is where all nodes have the same Autonomous System number. A great deal of complexity exists with this deployment model, because BGP's primary loop prevention mechanism is a string of all the autonomous system numbers a route has traversed, counting each entry as a "hop" as it were.
  • eBGP: Every single device has its own ASN. Loops are easy to prevent by simply reading the AS-Path.
eBGP is not very difficult to learn.
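The eBGP loop-prevention rule is simple enough to express in a few lines (a toy sketch of the rule, not real BGP code):

```python
def accept_route(local_asn: int, as_path: list[int]) -> bool:
    """eBGP loop prevention: drop any update whose AS_PATH already
    contains our own ASN - it has looped back to us."""
    return local_asn not in as_path

# A route originated in AS 64900 and advertised through the spine (AS 65000)
# is rejected if it arrives back at AS 64900, but accepted elsewhere:
print(accept_route(64900, [65000, 64900]))  # False
print(accept_route(64901, [65000, 64900]))  # True
```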

This is worthwhile, because BGP has a pretty substantial strength within data center networks, and that is an emphasis on reliability.

I'm not going to be doing a deep-dive on BGP here - but can recommend some truly excellent resources on the subject:

How is BGP different from IGPs like OSPF, EIGRP?

First, we must examine some key differences between BGP and IGPs:
  • IGPs are multicast-based and discover peers dynamically. BGP is TCP-based and needs statically defined peers (note: you can define a dynamic peer range, which in a future example will be truly valuable)
  • EIGRP has one area, and OSPF generally supports up to 16 areas without getting specific hardware. BGP supports 65,536 autonomous systems with 2-byte ASNs, or 4,294,967,296 with 4-byte ASNs
  • IGPs are designed to trust their routing protocol peers to prevent loops, while BGP is designed to control route advertisement
  • IGPs (other than IS-IS, of course) only support IP-based address families, while MP-BGP can support any number of units defined as "Network Layer Reachability Information," making it extensible in numerous ways like EVPN, Segment Routing, or even MPLS. The key thematic point here is that BGP behaves more like a distributed database than a traditional routing protocol
  • IGPs value fast reconvergence, while BGP values reliable reconvergence. It's slow moving, but is extremely change-friendly.

Applying Concepts

In a controlled environment, like a Clos fabric, eBGP is pretty easy to set up, troubleshoot, and maintain. So let's get started!

First, we configure the spines with the appropriate AS and neighbors. It looks like there's a lot going on here, but that's simply because we're running two address-families: IPv4 and IPv6:


bgp-as65000-s0#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
bgp-as65000-s0(config)#
router bgp 65000
 bgp log-neighbor-changes
 neighbor 10.6.240.1 remote-as 64900
 neighbor 10.6.240.1 update-source FastEthernet0/24
 neighbor 10.6.240.3 remote-as 64901
 neighbor 10.6.240.3 update-source FastEthernet0/22
 neighbor FD00:6:240::2 remote-as 64900
 neighbor FD00:6:240::2 update-source FastEthernet0/24
 neighbor FD00:6:240::6 remote-as 64901
 neighbor FD00:6:240::6 update-source FastEthernet0/22
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.240.1 activate
  neighbor 10.6.240.3 activate
  no neighbor FD00:6:240::2 activate
  no neighbor FD00:6:240::6 activate
  maximum-paths 2
  no auto-summary
  no synchronization
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6:240::2 activate
  neighbor FD00:6:240::6 activate
 exit-address-family

bgp-as65001-s1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
bgp-as65001-s1(config)#
router bgp 65001
 bgp log-neighbor-changes
 neighbor 10.6.241.1 remote-as 64900
 neighbor 10.6.241.1 update-source FastEthernet0/21
 neighbor 10.6.241.3 remote-as 64901
 neighbor 10.6.241.3 update-source FastEthernet0/23
 neighbor FD00:6:241::2 remote-as 64900
 neighbor FD00:6:241::2 update-source FastEthernet0/21
 neighbor FD00:6:241::6 remote-as 64901
 neighbor FD00:6:241::6 update-source FastEthernet0/23
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.241.1 activate
  neighbor 10.6.241.3 activate
  no neighbor FD00:6:241::2 activate
  no neighbor FD00:6:241::6 activate
  maximum-paths 2
  no auto-summary
  no synchronization
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6:241::2 activate
  neighbor FD00:6:241::6 activate
 exit-address-family

And then the leafs:

bgp-as64900-l0#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
bgp-as64900-l0(config)#
router bgp 64900
 bgp log-neighbor-changes
 neighbor 10.6.240.0 remote-as 65000
 neighbor 10.6.240.0 update-source FastEthernet1/0/24
 neighbor 10.6.241.0 remote-as 65001
 neighbor 10.6.241.0 update-source FastEthernet1/0/21
 neighbor FD00:6:240::1 remote-as 65000
 neighbor FD00:6:240::1 update-source FastEthernet1/0/24
 neighbor FD00:6:241::1 remote-as 65001
 neighbor FD00:6:241::1 update-source FastEthernet1/0/21
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.240.0 activate
  neighbor 10.6.241.0 activate
  no neighbor FD00:6:240::1 activate
  no neighbor FD00:6:241::1 activate
  maximum-paths 2
  no auto-summary
  no synchronization
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6:240::1 activate
  neighbor FD00:6:241::1 activate
 exit-address-family

bgp-as64901-l1#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
bgp-as64901-l1(config)#
router bgp 64901
 bgp log-neighbor-changes
 neighbor 10.6.240.2 remote-as 65000
 neighbor 10.6.240.2 update-source FastEthernet0/22
 neighbor 10.6.241.2 remote-as 65001
 neighbor 10.6.241.2 update-source FastEthernet0/23
 neighbor FD00:6:240::5 remote-as 65000
 neighbor FD00:6:240::5 update-source FastEthernet0/22
 neighbor FD00:6:241::5 remote-as 65001
 neighbor FD00:6:241::5 update-source FastEthernet0/23
 maximum-paths 2
 !
 address-family ipv4
  neighbor 10.6.240.2 activate
  neighbor 10.6.241.2 activate
  no neighbor FD00:6:240::5 activate
  no neighbor FD00:6:241::5 activate
  maximum-paths 2
  no auto-summary
  no synchronization
 exit-address-family
 !
 address-family ipv6
  neighbor FD00:6:240::5 activate
  neighbor FD00:6:241::5 activate
 exit-address-family

We can now verify that all peers are up with both stacks:

bgp-as65000-s0#show ip bgp sum
BGP router identifier 10.6.0.240, local AS number 65000
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.6.240.1      4 64900      23      23        1    0    0 00:20:39        0
10.6.240.3      4 64901      19      18        1    0    0 00:17:04        0
bgp-as65000-s0#show bgp ipv6 unicast summary
BGP router identifier 10.6.0.240, local AS number 65000
BGP table version is 1, main routing table version 1

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
FD00:6:240::2   4 64900      13      12        1    0    0 00:10:17        0
FD00:6:240::6   4 64901       9       9        1    0    0 00:06:30        0
We do still have a problem - there are no prefixes received! Let's fix that by adding network statements to all relevant devices. On the demo equipment, the network statement must be an exact match of the route to be advertised.
Network statements are not required for peering interfaces, as BGP does not use multicast for peer discovery:

bgp-as64900-l0(config)#router bgp 64900
bgp-as64900-l0(config-router)#address-family ipv4
bgp-as64900-l0(config-router-af)#network 10.6.0.0 mask 255.255.255.255
After this is completed, we'll see more routes - note that the above step must be repeated on the spines for all applicable networks, to ensure end to end reachability. This hardware does not appear to support ECMP for IPv6.

bgp-as64900-l0#show ip bgp sum
BGP router identifier 10.6.0.0, local AS number 64900
BGP table version is 13, main routing table version 13
8 network entries using 936 bytes of memory
9 path entries using 468 bytes of memory
8/4 BGP path/bestpath attribute entries using 1120 bytes of memory
6 BGP AS-PATH entries using 144 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 2668 total bytes of memory
BGP activity 16/0 prefixes, 22/1 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.6.240.0      4 65000      47      47       13    0    0 00:40:51        4
10.6.241.0      4 65001      47      45       13    0    0 00:40:03        4

bgp-as64900-l0#show bgp ipv6 unicast summary
BGP router identifier 10.6.0.0, local AS number 64900
BGP table version is 10, main routing table version 10
8 network entries using 1128 bytes of memory
12 path entries using 912 bytes of memory
8/4 BGP path/bestpath attribute entries using 1120 bytes of memory
6 BGP AS-PATH entries using 144 bytes of memory
0 BGP route-map cache entries using 0 bytes of memory
0 BGP filter-list cache entries using 0 bytes of memory
BGP using 3304 total bytes of memory
BGP activity 16/0 prefixes, 22/1 paths, scan interval 60 secs

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
FD00:6:240::1   4 65000      44      43       10    0    0 00:37:56        5
FD00:6:241::1   4 65001      43      43       10    0    0 00:37:18        6


bgp-as64900-l0#show ipv6 ro
IPv6 Routing Table - Default - 11 entries
Codes: C - Connected, L - Local, S - Static, U - Per-user Static route
       B - BGP, R - RIP, D - EIGRP, EX - EIGRP external
       ND - Neighbor Discovery
       O - OSPF Intra, OI - OSPF Inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
LC  FD00:6::/128 [0/0]
     via Loopback0, receive
B   FD00:6::1/128 [20/0]
     via FE80::216:C8FF:FE04:4742, FastEthernet1/0/24
B   FD00:6::240/128 [20/0]
     via FE80::216:C8FF:FE04:4742, FastEthernet1/0/24
B   FD00:6::241/128 [20/0]
     via FE80::223:4FF:FE42:F3C1, FastEthernet1/0/21
C   FD00:6:240::/126 [0/0]
     via FastEthernet1/0/24, directly connected
L   FD00:6:240::2/128 [0/0]
     via FastEthernet1/0/24, receive
B   FD00:6:240::4/126 [20/0]
     via FE80::216:C8FF:FE04:4742, FastEthernet1/0/24
C   FD00:6:241::/126 [0/0]
     via FastEthernet1/0/21, directly connected
L   FD00:6:241::2/128 [0/0]
     via FastEthernet1/0/21, receive
B   FD00:6:241::4/126 [20/0]
     via FE80::223:4FF:FE42:F3C1, FastEthernet1/0/21
L   FF00::/8 [0/0]
     via Null0, receive
bgp-as64900-l0#show ip ro
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
C       10.6.0.0/32 is directly connected, Loopback0
B       10.6.0.1/32 [20/0] via 10.6.240.0, 00:12:49
C       10.6.240.0/31 is directly connected, FastEthernet1/0/24
B       10.6.0.240/32 [20/0] via 10.6.240.0, 00:10:03
C       10.6.241.0/31 is directly connected, FastEthernet1/0/21
B       10.6.0.241/32 [20/0] via 10.6.241.0, 00:07:40
B       10.6.240.2/31 [20/0] via 10.6.240.0, 00:08:47
B       10.6.241.2/31 [20/0] via 10.6.241.0, 00:07:40
I have posted the base configs here.

Sunday, June 2, 2019

Switching Lab Topology Diagram

Here's the example topology used in the Spine-and-Leaf labs: (YAML)

Spine and Leaf Practical Applications, OSPF

As covered in the previous post, base configuration of a spine-and-leaf fabric is actually pretty simple. This will be pretty short, but we'll cover the conversion of the previously built fabric to OSPF.

Here's the updated diagram: (YAML). As we move to a more full-fledged implementation, we'll do dual-stack.


The cleanup for this is as follows:


no router rip

From here, we can configure the router statements on all devices. It can be the same for all, because of the summarization performed while planning out the network.


router ospf 1
 ispf
 log-adjacency-changes
 nsf cisco
 network 10.6.0.0 0.0.0.255 area 0
 network 10.6.240.0 0.0.1.255 area 0
In a production environment, you should add passive-interface default on the leafs if the ToR does not run dynamic routing with anything subtending it.
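On a leaf in this lab, that would look something like the following (the uplink port numbers vary per switch, so treat these as an example):

```
router ospf 1
 passive-interface default
 no passive-interface FastEthernet1/0/21
 no passive-interface FastEthernet1/0/24
```

This keeps OSPF from forming (or soliciting) adjacencies on server-facing ports while still advertising their subnets.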

Unsurprisingly, this just works. Now, to setup IPv6!

ospf-s0#conf t
Enter configuration commands, one per line.  End with CNTL/Z.
ospf-s0(config)#ipv6?
% Unrecognized command
Well, it looks like IPv6 is not available until IOS 12.2.55. Let's use this network to upgrade it, by hooking up a TFTP server to leaf-1:

interface FastEthernet0/14
 no switchport
 ip address 10.66.0.1 255.255.255.0
!
router ospf 1
 network 10.66.0.1 0.0.0.0 area 0

We test reachability from the other leaf - this is a fully layer 3 switched path:

ospf-l0#ping 10.66.0.180

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.66.0.180, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/2/8 ms
And then we copy it over TFTP:

ospf-s1#copy tftp flash:
Address or name of remote host []? 10.66.0.180
Source filename []? c3560-ipservicesk9-mz.122-55.SE6.bin
Destination filename [c3560-ipservicesk9-mz.122-55.SE6.bin]?
Accessing tftp://10.66.0.180/c3560-ipservicesk9-mz.122-55.SE6.bin...
Loading c3560-ipservicesk9-mz.122-55.SE6.bin from 10.66.0.180 (via FastEthernet0/23): !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
[OK - 12752912 bytes]

12752912 bytes copied in 201.133 secs (63405 bytes/sec)
Note that this, while a practical application, is still non-redundant.

UPGRADING INTENSIFIES

Now to implement IPv6 as follows: (YAML)

Note: We used ; instead of : due to a feature issue with drawthe.net. We're using /126 prefixes because this is on older equipment, which may not support /127 prefixes reliably.
On all devices, we need to enable ipv6 routing / OSPFv3:


ipv6 unicast-routing
ipv6 router ospf 2
 log-adjacency-changes

We then configure each device:

ospf-l0# configure terminal
interface Loopback0
 ip address 10.6.0.0 255.255.255.255
 ipv6 address FD00:6::/128
 ipv6 ospf 2 area 0
interface FastEthernet1/0/21
 no switchport
 ip address 10.6.241.1 255.255.255.254
 ipv6 address FD00:6:241::2/126
 ipv6 enable
 ipv6 ospf 2 area 0
interface FastEthernet1/0/24
 no switchport
 ip address 10.6.240.1 255.255.255.254
 ipv6 address FD00:6:240::2/126
 ipv6 enable
 ipv6 ospf 2 area 0

ospf-l1# configure terminal
interface Loopback0
 ip address 10.6.0.1 255.255.255.255
 ipv6 address FD00:6::1/128
 ipv6 ospf 2 area 0
interface FastEthernet0/22
 no switchport
 ip address 10.6.240.3 255.255.255.254
 ipv6 address FD00:6:240::6/126
 ipv6 enable
 ipv6 ospf 2 area 0
interface FastEthernet0/23
 no switchport
 ip address 10.6.241.3 255.255.255.254
 ipv6 address FD00:6:241::6/126
 ipv6 enable
 ipv6 ospf 2 area 0

ospf-s0# configure terminal
interface Loopback0
 ip address 10.6.0.240 255.255.255.255
 ipv6 address FD00:6::240/128
 ipv6 ospf 2 area 0
interface FastEthernet0/22
 no switchport
 ip address 10.6.240.2 255.255.255.254
 ipv6 address FD00:6:240::5/126
 ipv6 enable
 ipv6 ospf 2 area 0
interface FastEthernet0/24
 no switchport
 ip address 10.6.240.0 255.255.255.254
 ipv6 address FD00:6:240::1/126
 ipv6 enable
 ipv6 ospf 2 area 0

ospf-s1# configure terminal
interface Loopback0
 ip address 10.6.0.241 255.255.255.255
 ipv6 address FD00:6::241/128
 ipv6 ospf 2 area 0
interface FastEthernet0/21
 no switchport
 ip address 10.6.241.0 255.255.255.254
 ipv6 address FD00:6:241::1/126
 ipv6 enable
 ipv6 ospf 2 area 0
interface FastEthernet0/23
 no switchport
 ip address 10.6.241.2 255.255.255.254
 ipv6 address FD00:6:241::5/126
 ipv6 enable
 ipv6 ospf 2 area 0

From here, we test by pinging Leaf-0's loopback from Leaf-1, and checking the routing tables:

ospf-l1#ping ipv6 fd00:6::

Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to FD00:6::, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 0/1/8 ms
ospf-l1#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
       D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
       N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
       E1 - OSPF external type 1, E2 - OSPF external type 2
       i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
       ia - IS-IS inter area, * - candidate default, U - per-user static route
       o - ODR, P - periodic downloaded static route

Gateway of last resort is not set

     10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
O       10.6.0.0/32 [110/3] via 10.6.241.2, 00:03:58, FastEthernet0/23
                    [110/3] via 10.6.240.2, 00:03:58, FastEthernet0/22
C       10.6.0.1/32 is directly connected, Loopback0
O       10.6.240.0/31 [110/2] via 10.6.240.2, 00:03:58, FastEthernet0/22
O       10.6.0.240/32 [110/2] via 10.6.240.2, 00:03:58, FastEthernet0/22
O       10.6.241.0/31 [110/2] via 10.6.241.2, 00:03:58, FastEthernet0/23
O       10.6.0.241/32 [110/2] via 10.6.241.2, 00:03:58, FastEthernet0/23
C       10.6.240.2/31 is directly connected, FastEthernet0/22
C       10.6.241.2/31 is directly connected, FastEthernet0/23
ospf-l1#show ipv6 route
IPv6 Routing Table - Default - 11 entries
Codes: C - Connected, L - Local, S - Static, U - Per-user Static route
       B - BGP, R - RIP, D - EIGRP, EX - EIGRP external
       ND - Neighbor Discovery
       O - OSPF Intra, OI - OSPF Inter, OE1 - OSPF ext 1, OE2 - OSPF ext 2
       ON1 - OSPF NSSA ext 1, ON2 - OSPF NSSA ext 2
O   FD00:6::/128 [110/2]
     via FE80::216:C8FF:FE04:4741, FastEthernet0/22
     via FE80::223:4FF:FE42:F3C2, FastEthernet0/23
LC  FD00:6::1/128 [0/0]
     via Loopback0, receive
O   FD00:6::240/128 [110/1]
     via FE80::216:C8FF:FE04:4741, FastEthernet0/22
O   FD00:6::241/128 [110/1]
     via FE80::223:4FF:FE42:F3C2, FastEthernet0/23
O   FD00:6:240::/126 [110/2]
     via FE80::216:C8FF:FE04:4741, FastEthernet0/22
C   FD00:6:240::4/126 [0/0]
     via FastEthernet0/22, directly connected
L   FD00:6:240::6/128 [0/0]
     via FastEthernet0/22, receive
O   FD00:6:241::/126 [110/2]
     via FE80::223:4FF:FE42:F3C2, FastEthernet0/23
C   FD00:6:241::4/126 [0/0]
     via FastEthernet0/23, directly connected
L   FD00:6:241::6/128 [0/0]
     via FastEthernet0/23, receive
L   FF00::/8 [0/0]
     via Null0, receive
Note: technically, we don't have to number the leaf-spine-leaf links in IPv6 with OSPFv3/RIPng/EIGRP - link-local addresses are enough - but numbering them is a personal preference of mine: it stays consistent with future designs and makes troubleshooting easier.

As always, example configurations are here.

Saturday, June 1, 2019

Spine and Leaf Practical Applications, RIPv2

This is only slightly trolling, but is primarily to outline the topological simplicity of Spine-and-Leaf networking, in a way that is suspiciously similar to Cisco classes.

First things first, here's the diagram. This is performed using a set of four Cat3560s, enterprise licensed and wired in a redundant square topology to simulate a wide variety of topologies with minimal modification. At some point I'll post this setup as well; it was recommended in the book CCIE Routing and Switching v5.1, Bridging the Gap Between CCNP and CCIE.
YAML Link

So this is actually pretty simple - as everything should be Layer 3. We begin by configuring the Spines:


hostname rip-s0
interface Loopback0
 ip address 10.6.0.240 255.255.255.255
interface FastEthernet0/22
 no switchport
 ip address 10.6.240.2 255.255.255.254
interface FastEthernet0/24
 no switchport
 ip address 10.6.240.0 255.255.255.254

hostname rip-s1
interface Loopback0
 ip address 10.6.0.241 255.255.255.255
interface FastEthernet0/21
 no switchport
 ip address 10.6.241.0 255.255.255.254
interface FastEthernet0/23
 no switchport
 ip address 10.6.241.2 255.255.255.254

Some explanation here:
  • We're using /31s to save address space as leaf-spine-leaf links are numerous and chew through address space like no tomorrow. If you'd like to know more about /31 usage, it's here.
  • I focused on IP Address Management (IPAM) before the actual network design, assigning pre-planned prefixes. In this example, each switch has a virtual number, making it easy to pre-provision and organize network topologies for scale. Remember, this is all to handle frequent loop-free changes at scale - this is important!
    • S0: 240 (10.6.240.x/31, 10.6.0.240)
    • S1: 241 (10.6.241.x/31, 10.6.0.241)
    • L0: 0 (10.6.0.0)
    • L1: 1 (10.6.0.1)
  • The no switchport command forces ports into Layer 3 (routed) mode.
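The numbering convention above lends itself to generation rather than hand-assignment. Here's a sketch of the scheme using Python's ipaddress module (the function names are illustrative, not part of the lab):

```python
import ipaddress

def loopback(device_number: int) -> ipaddress.IPv4Interface:
    """Virtual number <n> maps straight to the /32 loopback 10.6.0.<n>."""
    return ipaddress.IPv4Interface(f"10.6.0.{device_number}/32")

def spine_link(spine_number: int, link_index: int):
    """Spine <n> owns 10.6.<n>.0/24, carved into /31 point-to-point links.
    The spine takes the even (first) address, the leaf the odd (second)."""
    block = ipaddress.ip_network(f"10.6.{spine_number}.0/24")
    p2p = list(block.subnets(new_prefix=31))[link_index]
    spine_side, leaf_side = p2p.hosts()  # /31: both addresses are usable
    return spine_side, leaf_side

# s0 is device 240; its second /31 is the s0<->l1 link from the configs.
spine_side, leaf_side = spine_link(240, 1)
print(loopback(240), spine_side, leaf_side)  # 10.6.0.240/32 10.6.240.2 10.6.240.3
```

A /24 per spine yields 128 /31 links, so the scheme scales well past this four-switch lab without renumbering.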
And then the Leafs:

hostname rip-l0
interface Loopback0
 ip address 10.6.0.0 255.255.255.255
interface FastEthernet1/0/21
 no switchport
 ip address 10.6.241.1 255.255.255.254
interface FastEthernet1/0/24
 no switchport
 ip address 10.6.240.1 255.255.255.254

hostname rip-l1
interface Loopback0
 ip address 10.6.0.1 255.255.255.255
interface FastEthernet0/22
 no switchport
 ip address 10.6.240.3 255.255.255.254
interface FastEthernet0/23
 no switchport
 ip address 10.6.241.3 255.255.255.254

Normally, you'd also attach downstream networks to these devices, but loopbacks suffice for this example.
This is a functional base configuration, but it doesn't route anything yet - so let's turn on routing (all switches):

ip routing
router rip
 version 2
 network 10.0.0.0
 no auto-summary
Poof! It's working!

rip-l0#show ip route
Codes: C - connected, S - static, R - RIP, M - mobile, B - BGP
   D - EIGRP, EX - EIGRP external, O - OSPF, IA - OSPF inter area
   N1 - OSPF NSSA external type 1, N2 - OSPF NSSA external type 2
   E1 - OSPF external type 1, E2 - OSPF external type 2, E - EGP
   i - IS-IS, su - IS-IS summary, L1 - IS-IS level-1, L2 - IS-IS level-2
   ia - IS-IS inter area, * - candidate default, U - per-user static route
   o - ODR, P - periodic downloaded static route
Gateway of last resort is not set
  10.0.0.0/8 is variably subnetted, 8 subnets, 2 masks
C   10.6.0.0/32 is directly connected, Loopback0
R   10.6.0.1/32 [120/2] via 10.6.241.0, 00:00:06, FastEthernet1/0/21
          [120/2] via 10.6.240.0, 00:00:06, FastEthernet1/0/24
C   10.6.240.0/31 is directly connected, FastEthernet1/0/24
R   10.6.0.240/32 [120/1] via 10.6.240.0, 00:00:12, FastEthernet1/0/24
C   10.6.241.0/31 is directly connected, FastEthernet1/0/21
R   10.6.0.241/32 [120/1] via 10.6.241.0, 00:00:17, FastEthernet1/0/21
R   10.6.240.2/31 [120/1] via 10.6.240.0, 00:00:13, FastEthernet1/0/24
R   10.6.241.2/31 [120/1] via 10.6.241.0, 00:00:17, FastEthernet1/0/21

Oddly enough, RIPv2 isn't usually associated with ECMP, but Cisco IOS installs up to four equal-cost paths by default (maximum-paths 4) - which is exactly what's happening here.
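The "consistent forwarding" part of ECMP comes from per-flow hashing: the switch hashes each flow's header fields onto one of the equal-cost next hops, so a single flow never straddles paths (and never reorders), while distinct flows spread across the spines. A toy sketch of the idea - not how CEF actually computes its hash:

```python
import hashlib

def pick_next_hop(src: str, dst: str, next_hops: list) -> str:
    """Hash the flow's source/destination pair onto one equal-cost next hop.
    The same flow always hashes the same way (stable forwarding), while
    different flows land on different spines (load sharing)."""
    digest = hashlib.sha256(f"{src}->{dst}".encode()).digest()
    return next_hops[digest[0] % len(next_hops)]

# rip-l0's two equal-cost next hops, from the routing table above.
spines = ["10.6.240.0", "10.6.241.0"]
first = pick_next_hop("10.6.0.0", "10.6.0.1", spines)
assert pick_next_hop("10.6.0.0", "10.6.0.1", spines) == first  # stable per flow
```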
Hopefully, this was enlightening - because in this case, this topology is incredibly simple when involving an IGP. There are a few downsides to RIP deployed in this manner:
  • It's chatty - it floods periodic full updates, so any change (such as a network addition) causes a reconvergence.
  • A link failure won't trigger an immediate re-route; RIP has no link-state awareness, so convergence waits on its timers (up to the 180-second invalid timer).
  • It's RIP.
Configurations generated are here, for anyone who would want to experiment with them.

Using VM Templates and NSX-T for Repeatable Virtual Network Deployments

So far, we've provided the infrastructure for continuous delivery / continuous integration, but it's been for those other guys. Is ...