Data Center Fabric Architectures
The term Data Center Fabric is as meaningless as switching or cloud: every major networking vendor is announcing or selling a data center fabric solution, and no two vendors have anything remotely similar in mind. Even worse, all fabric architectures announced so far are proprietary.
To choose the best solution for your data center, you must look beyond white papers and marketectures and figure out what’s really going on behind the scenes. An in-depth understanding of how different fabric architectures work will help you identify their benefits, drawbacks and potential pitfalls.
This article describes five classes of fabric architectures based on how they use the management, control and data (forwarding) planes. Throughout the article we’ll use the generic term switch to describe a forwarding device that can forward either Ethernet frames (layer 2 switch) or IP datagrams (layer 3 switch).
Every forwarding device has three “planes”:
- Management plane interacts with the network operator (usually via CLI or web interface), network management systems (usually via SNMP) and other external entities (sometimes using NETCONF or other XML- or REST-based API). Management plane software configures the device and keeps track of current and saved device configuration.
- Control plane runs control protocols (LACP, LLDP, CDP, STP, ARP/ND, OSPF, BGP ...), exchanges network topology and reachability information with adjacent devices and uses the exchanged information (or information gleaned from the forwarded packets) to build forwarding tables, which are used to forward the frames/packets traversing the device.
- Data (forwarding) plane uses the forwarding tables to forward layer-2 frames or layer-3 datagrams. The data plane might send packets it cannot process to the control plane (example: IP datagrams violating access lists) or extract information from forwarded packets and send that information to the control plane (dynamic MAC address learning).
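The split between the three planes can be illustrated with a toy model. The sketch below is not how any real switch is implemented; the class `ToySwitch` and its method names are hypothetical and only show which plane owns which piece of state, using dynamic MAC address learning as the example.

```python
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    """Hypothetical layer-2 switch split into the three planes."""
    name: str
    config: dict = field(default_factory=dict)     # management-plane state
    mac_table: dict = field(default_factory=dict)  # forwarding table (MAC -> port)

    def configure(self, key, value):
        """Management plane: store configuration supplied by the operator."""
        self.config[key] = value

    def learn(self, mac, port):
        """Control plane: turn information gleaned from a frame into forwarding state."""
        self.mac_table[mac] = port

    def forward(self, src_mac, dst_mac, in_port):
        """Data plane: forward a frame using only the forwarding table."""
        if self.mac_table.get(src_mac) != in_port:
            # New or moved source address: hand the information to the control plane.
            self.learn(src_mac, in_port)
        # Unknown destinations are flooded; known ones go out of a single port.
        return self.mac_table.get(dst_mac, "flood")
```

A frame sent to an unknown destination is flooded, while the reply can already be forwarded out of a single port because the source address of the first frame was learned.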
Impact of Multi-Chassis Link Aggregation
Link Aggregation Group (LAG, defined in 802.3ad and 802.1AX) is a mechanism that allows Ethernet devices to use parallel links deployed for redundancy or load balancing purposes in an active-active configuration (without LAG, STP would disable some of the links). Redundant uplinks are commonly used for server attachments; multiple links between adjacent network devices are often used to increase the available bandwidth.
The 802.1AX standard defines link bundles between two adjacent devices. All networking vendors have implemented proprietary extensions to support multi-chassis link aggregation (MLAG).
Every fabric architecture should support MLAG. Architectures using a single control plane support MLAG by definition; all other architectures have to use proprietary mechanisms between devices terminating an MLAG bundle.
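How a device spreads traffic across the member links of a bundle is implementation-specific. A common approach is to hash selected header fields so that all frames of one flow use the same link and stay in order. The sketch below assumes a hash over the source and destination MAC addresses; the function name `pick_member_link` and the choice of CRC32 are illustrative assumptions, not part of 802.1AX.

```python
import zlib


def pick_member_link(src_mac, dst_mac, links):
    """Pick one active member link of a LAG for a given flow.

    Assumption: the flow is identified by the source/destination MAC pair;
    real devices often add IP addresses and TCP/UDP ports to the hash input.
    """
    if not links:
        raise ValueError("LAG has no active member links")
    key = f"{src_mac}-{dst_mac}".encode()
    # The same flow always maps to the same link as long as the set of
    # active links does not change.
    return links[zlib.crc32(key) % len(links)]
```

Removing a failed link from `links` remaps flows onto the surviving members, which is the active-active behavior STP alone cannot provide.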
- Multi-chassis Link Aggregation Basics
- MLAG and hot potato switching
- MLAG and load balancing
- More link-aggregation related blog posts
Business as Usual
Summary: Each device has its own independent management, control and data planes.
Each switch operates independently and remains a separate management and configuration entity. This approach has been used for decades in building the global Internet and thus has proven scalability. It also has well-known drawbacks (large number of managed devices) and usually requires thorough design to scale well.
A business-as-usual approach using Spanning Tree Protocol (STP) does not work well with large layer-2 domains commonly deployed in virtualized data centers. A data center fabric using Business as Usual architecture has to replace STP with a more scalable alternative. TRILL and SPB (802.1aq) are the standard candidates; Cisco’s FabricPath and Brocade’s VCS are proprietary alternatives.
As long as access-layer switches are not TRILL/SPB-enabled, the data center fabric must support multi-chassis link aggregation (MLAG) to optimize bandwidth utilization.
- Cisco’s Nexus 5000 and Nexus 7000 switches
- Brocade’s VCS fabric
- Force 10 Z-series switches will support TRILL with a software upgrade
- STP integration in TRILL- or FabricPath-based fabrics
- STP integration with Brocade’s VCS fabric
- Brocade’s VCS uses TRILL data plane and proprietary control plane
- Load balancing in Brocade’s VCS fabric
- Force 10 announced ZettaScale switches
The Borg
Summary: Devices in a Borg fabric have a single (usually redundant) management and control plane. Each device has an independent data plane managed by the single centralized control plane.
In the Borg architecture (sometimes known as stacking on steroids) numerous switches form a collective (a cluster) and elect a single control plane (or outsource the control plane functions to an external device) that controls the whole cluster. The cluster of devices appears as a single control- and management-plane entity to the outside world. It’s managed as a single device and has a single configuration, a single instance of STP and one set of routing adjacencies with the outside world.
Examples: stackable switches, Juniper’s virtual chassis, HP’s IRF, Cisco’s VSS
Like the original Borg, the switch cluster architectures cannot cope well with splits from the central brains. Cisco’s VSS reloads the primary switch when it detects a split brain scenario; HP’s IRF and Juniper’s virtual chassis disable the switches that lose cluster quorum.
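The quorum idea can be sketched with a generic strict-majority rule. This is not the actual algorithm used by IRF, virtual chassis or VSS; the function `surviving_members` is a hypothetical illustration of why a partition that loses quorum is disabled.

```python
def surviving_members(cluster_size, partitions):
    """Return the members allowed to keep forwarding after a cluster split.

    Assumption: a partition stays active only if it holds a strict majority
    of the original cluster; every other partition disables itself.
    """
    for part in partitions:
        if len(part) * 2 > cluster_size:
            return set(part)
    # No partition has a majority (for example an even split): with this
    # simple rule nobody survives, so real products need extra tie-breakers.
    return set()
```

With five members split three to two, only the three-member partition keeps forwarding. An even split leaves no majority at all, which is exactly the situation a two-member cluster faces whenever the link between its members fails.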
While vendors like to talk about all-encompassing fabrics, the current implementations usually limit the number of high-end devices in the cluster to two (Cisco’s VSS, Juniper’s EX8200+XRE200 and HP’s IRF), reducing the Borg architecture to a Siamese twin one.
Furthermore, most implementations of the Borg architecture still limit the switch clusters to devices of the same type (exception: you can build mixed-model virtual chassis with EX4200 and EX4500 switches from Juniper).
As you cannot combine access- and core-layer switches into the same fabric, you still need MLAG between the access and the core layer.
At the moment, all Borg-like implementations are proprietary.
- Multi-chassis link aggregation with Borg architecture (also describes the split brain problems)
- Analysis of HP’s IRF
- Analysis of Juniper’s Virtual Chassis with XRE200 control engine
The Big Brother
Also known as controller-based fabric, this architecture uses dumb(er) switches that perform packet forwarding based on instructions downloaded from the central controller(s). The instructions might be control-plane driven (L3 routing tables downloaded into the switches) or data-plane driven (5-tuples downloaded into the switches to enable per-flow forwarding).
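The two kinds of downloaded instructions can be contrasted in a small model. The sketch below is not OpenFlow or any vendor API; the classes `DumbSwitch` and `Controller` and their methods are hypothetical, and a flow is assumed to be the 5-tuple (source IP, destination IP, protocol, source port, destination port).

```python
import ipaddress


class DumbSwitch:
    """Forwards packets using only state downloaded by the controller."""

    def __init__(self):
        self.routes = {}  # IP prefix -> output port (control-plane driven)
        self.flows = {}   # 5-tuple   -> output port (data-plane driven)

    def lookup(self, five_tuple):
        # An exact per-flow entry wins over the routing table.
        if five_tuple in self.flows:
            return self.flows[five_tuple]
        dst = ipaddress.ip_address(five_tuple[1])
        matches = [prefix for prefix in self.routes if dst in prefix]
        if not matches:
            return None  # no instruction: a real switch would ask the controller
        # Longest-prefix match among the downloaded routes.
        best = max(matches, key=lambda prefix: prefix.prefixlen)
        return self.routes[best]


class Controller:
    """Central brain that downloads forwarding instructions into all switches."""

    def __init__(self, switches):
        self.switches = switches

    def push_route(self, prefix, port):
        for sw in self.switches:
            sw.routes[ipaddress.ip_network(prefix)] = port

    def push_flow(self, five_tuple, port):
        for sw in self.switches:
            sw.flows[five_tuple] = port
```

The routing table grows with the number of prefixes, while the flow table grows with the number of active flows, which is one reason per-flow forwarding is harder to scale.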
The controller-based approach is ideal for protocol and architecture prototyping (which is the primary use case for OpenFlow) and for architectures with hub-and-spoke traffic flow (wireless controllers), but it has yet to be shown to scale in large any-to-any networks.
Some implementations appear from the outside to use one of the architectures but actually use a different architecture internally. For example, the stackable switches from Juniper use VCCP (an IS-IS-like protocol) internally to distribute MAC address reachability information.
A stack of Juniper switches is thus a Quilt (each device has its own control plane; the whole stack has a single management plane), but appears as a Borg from the outside (single STP and LACP instance).
|Date|Change|
|2011-03-11|Original text published in @ioshints blog|
|2011-05-24|Document migrated to www.ioshints.info. Added Principles section and numerous links to in-depth articles|
|2011-05-25|Juniper supports mixed-model virtual chassis with EX4200 / EX4500 switches|