Data Center Fabric Architectures
The Data Center Fabric term is as meaningless as switching or cloud – every major networking vendor is announcing or selling a data center fabric solution, and no two vendors have anything remotely similar in mind. Even worse, all fabric architectures announced so far are proprietary.
To choose the best solution for your data center, you must look beyond white papers and marketectures and figure out what’s really going on behind the scenes. An in-depth understanding of how different fabric architectures work will help you identify their benefits, drawbacks and potential pitfalls.
This article describes five classes of fabric architectures based on how they use the management, control and data (forwarding) planes. Throughout the article we’ll use the generic term switch to describe a forwarding device that can forward either Ethernet frames (layer 2 switch) or IP datagrams (layer 3 switch).
Every forwarding device has three “planes”:
Management plane interacts with the network operator (usually via CLI or web interface), network management systems (usually via SNMP) and other external entities (sometimes using NETCONF or other XML- or REST-based API). Management plane software configures the device and keeps track of current and saved device configuration.
Control plane runs control protocols (LACP, LLDP, CDP, STP, ARP/ND, OSPF, BGP ...), exchanges network topology and reachability information with adjacent devices and uses the exchanged information (or information gleaned from the forwarded packets) to build forwarding tables, which are used to forward the frames/packets traversing the device.
Data (forwarding) plane uses the forwarding tables to forward layer-2 frames or layer-3 datagrams. The data plane might send packets it cannot process to the control plane (example: IP datagrams violating access lists) or extract information from forwarded packets and send that information to the control plane (dynamic MAC address learning).
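The split between the three planes can be sketched with a toy layer-2 switch. This is a minimal illustration, not a model of any real device: the `Switch` class and its method names are hypothetical, and flooding on an unknown destination is the usual layer-2 behavior assumed here.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy layer-2 switch; all names are illustrative, not a vendor API."""
    name: str
    running_config: dict = field(default_factory=dict)  # management plane state
    mac_table: dict = field(default_factory=dict)       # built by the control plane

    # Management plane: accepts configuration from the operator.
    def configure(self, key, value):
        self.running_config[key] = value

    # Control plane: turns learned information into forwarding entries.
    def learn(self, mac, port):
        self.mac_table[mac] = port

    # Data plane: forwards using the table and reports source MACs upward.
    def forward(self, src_mac, dst_mac, in_port):
        self.learn(src_mac, in_port)  # dynamic MAC address learning
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            return "flood"            # unknown destination (assumed behavior)
        return out_port
```

The first frame toward an unknown MAC address is flooded; once the data plane has seen traffic from that address, later frames are sent to the learned port.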
Impact of Multi-Chassis Link Aggregation
Link Aggregation Group (LAG, defined in 802.3ad and 802.1AX) is a mechanism that allows Ethernet devices to use parallel links deployed for redundancy or load balancing purposes in an active-active configuration (without LAG, STP would disable some of the links). Redundant uplinks are commonly used for server attachments; multiple links between adjacent network devices are often used to increase the available bandwidth.
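Active-active use of a link bundle usually relies on hashing header fields so that all frames of one flow stay on the same member link. The sketch below assumes a CRC32 hash over the source and destination MAC addresses; real switches use implementation-specific hash functions and inputs, and `pick_member_link` is a hypothetical helper.

```python
import zlib


def pick_member_link(src_mac, dst_mac, member_links):
    """Pick one link of the bundle for a flow identified by its MAC pair."""
    key = f"{src_mac}-{dst_mac}".encode()
    index = zlib.crc32(key) % len(member_links)  # assumed hash, for illustration
    return member_links[index]


links = ["eth1", "eth2"]
first = pick_member_link("00:00:00:00:00:01", "00:00:00:00:00:02", links)
again = pick_member_link("00:00:00:00:00:01", "00:00:00:00:00:02", links)
```

Because the choice depends only on the flow's addresses, `first` and `again` name the same link, which keeps the frames of a flow in order while different flows spread across the bundle.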
The 802.1AX standard defines link bundles between two adjacent devices. All networking vendors have implemented proprietary extensions to support multi-chassis link aggregation (MLAG).
Every fabric architecture should support MLAG. Architectures using a single control plane support MLAG by definition; all other architectures have to use proprietary mechanisms or standard protocols like ICCP or EVPN between devices terminating an MLAG bundle.
- Multi-chassis Link Aggregation Basics
- MLAG and hot potato switching
- MLAG and load balancing
- More link-aggregation related blog posts
Summary: Each device has its own independent management, control and data planes.
Each switch operates independently and remains a separate management and configuration entity. This approach has been used for decades in building the global Internet and thus has proven scalability. It also has well-known drawbacks (large number of managed devices) and usually requires thorough design to scale well.
A business-as-usual approach using Spanning Tree Protocol (STP) does not work well with large layer-2 domains commonly deployed in virtualized data centers. A data center fabric using independent devices has to replace STP with a more scalable alternative. Solutions using exclusively layer-2 technologies include standard technologies (TRILL and SPB – 802.1aq); Cisco’s FabricPath and Brocade’s VCS are proprietary alternatives. More recently, vendors have focused on MAC-over-IP transport using VXLAN encapsulation with either a proprietary or a standard EVPN-based control plane.
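To make MAC-over-IP transport more concrete, here is a minimal sketch of the 8-byte VXLAN header defined in RFC 7348, which is placed in front of the original Ethernet frame. The function names are hypothetical, and the outer UDP, IP and Ethernet headers that a real encapsulation also needs are deliberately left out.

```python
import struct

VXLAN_UDP_PORT = 4789   # IANA-assigned destination UDP port (RFC 7348)
VNI_VALID_FLAG = 0x08   # "I" flag: the VNI field is valid


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # First 32-bit word: flags byte followed by 24 reserved bits.
    # Second 32-bit word: 24-bit VNI followed by 8 reserved bits.
    return struct.pack("!II", VNI_VALID_FLAG << 24, vni << 8)


def encapsulate(inner_ethernet_frame, vni):
    """Prepend the VXLAN header; outer UDP/IP/Ethernet headers are omitted."""
    return vxlan_header(vni) + inner_ethernet_frame
```

The 24-bit VNI identifies the layer-2 segment, so the IP transport network never has to learn the MAC addresses carried inside the encapsulated frames.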
As long as access-layer switches use STP to discover forwarding loops, the data center fabric must support multi-chassis link aggregation (MLAG) to optimize bandwidth utilization.
- Cisco’s Nexus switches (including 3000, 5500, 7x00 and 9000 series)
- Arista EOS
- Juniper switches when not configured as a Virtual Chassis or Virtual Chassis Fabric
- Brocade’s VCS fabric
- Dell switches
- Switches running Cumulus Linux
- STP integration in TRILL- or FabricPath-based fabrics
- STP integration with Brocade’s VCS fabric
- Brocade’s VCS uses TRILL data plane and proprietary control plane
- Load balancing in Brocade’s VCS fabric
Centralized control plane
Summary: These solutions use a single (usually redundant) management and control plane. Each device has independent data plane managed by the single centralized control plane.
In this architecture (sometimes known as stacking on steroids) numerous switches form a cluster collective and elect a single control plane (or outsource the control plane functions to an external device) that controls the whole cluster. The cluster of devices appears as a single control- and management-plane entity to the outside world. It’s managed as a single device, has a single configuration, single instance of STP and one set of routing adjacencies with the outside world.
Examples: stackable switches, Juniper’s virtual chassis, HP’s IRF, Cisco’s VSS, well-implemented OpenFlow controllers like Big Cloud Fabric
The switch cluster architectures cannot cope well with splits from the central brains. Cisco’s VSS reloads the primary switch when it detects a split brain scenario; HP’s IRF and Juniper’s virtual chassis disable the switches that lose cluster quorum.
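The quorum behavior can be sketched as follows. This assumes a simple strict-majority rule, which is an illustrative assumption and not a description of how IRF, Virtual Chassis or VSS actually decide; both function names are hypothetical.

```python
def surviving_partition(partitions, cluster_size):
    """Return the partition that holds a strict majority, or None."""
    for members in partitions:
        if len(members) * 2 > cluster_size:
            return members
    return None


def switches_to_disable(partitions, cluster_size):
    """List switches outside the majority partition (all if none has quorum)."""
    keep = surviving_partition(partitions, cluster_size)
    return sorted(
        switch
        for members in partitions
        if members is not keep
        for switch in members
    )
```

With a four-switch cluster split three to one, only the isolated switch is disabled. With an even two-two split, neither half has a majority under this rule, which is one reason split-brain handling needs extra tie-breaking mechanisms in practice.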
While vendors like to talk about all-encompassing fabrics, the current implementations usually limit the number of high-end devices in the cluster to two or four (Cisco’s VSS and HP’s IRF).
Furthermore, most implementations of this architecture still limit the switch clusters to devices of the same type (exception: Juniper Virtual Chassis and Virtual Chassis Fabric can use different switch models in the same fabric).
As you cannot combine access- and core-layer switches into the same fabric, you still need MLAG between the access and the core layer.
At the moment, all centralized control plane implementations (including OpenFlow controller-based ones) are either proprietary or use non-standard protocol extensions implemented by a single vendor.
- Multi-chassis link aggregation with Borg architecture (also describes the split brain problems)
- Analysis of HP’s IRF
- Analysis of Juniper’s Virtual Chassis with XRE200 control engine
Also known as controller-based fabric, this architecture uses dumb(er) switches that perform packet forwarding based on instructions downloaded from the central controller(s). The instructions might be control-plane driven (L3 routing tables downloaded into the switches) or data-plane driven (5-tuples downloaded into the switches to enable per-flow forwarding).
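The difference between the two kinds of downloaded instructions can be sketched with two lookup tables. The table contents, port names and the `punt-to-controller` result are made up for illustration; sending a flow-table miss to the controller is an assumed (reactive) behavior, not something every controller does.

```python
import ipaddress

# Control-plane driven: the controller downloads a routing table.
routing_table = {
    ipaddress.ip_network("10.0.0.0/8"): "port1",
    ipaddress.ip_network("10.1.0.0/16"): "port2",
}

# Data-plane driven: the controller downloads one entry per flow (5-tuple:
# source IP, destination IP, protocol, source port, destination port).
flow_table = {
    ("192.0.2.1", "10.1.2.3", "tcp", 49152, 443): "port3",
}


def route_lookup(dst_ip):
    """Longest-prefix match over the downloaded routing table."""
    address = ipaddress.ip_address(dst_ip)
    matches = [net for net in routing_table if address in net]
    if not matches:
        return None
    return routing_table[max(matches, key=lambda net: net.prefixlen)]


def flow_lookup(five_tuple):
    """Exact match on the 5-tuple; a miss is handed to the controller."""
    return flow_table.get(five_tuple, "punt-to-controller")
```

A routing table covers many flows with a few prefixes, while per-flow forwarding needs one entry for every active flow, which hints at why the two approaches scale so differently.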
The controller-based approach is ideal for protocol and architecture prototyping (which is the primary use case for OpenFlow) and for architectures with hub-and-spoke traffic flow (wireless controllers), but has yet to prove that it scales in large any-to-any networks.
Some implementations appear to be using one of the architectures from the outside but actually use a different architecture internally. For example, the stackable switches from Juniper use VCCP (an IS-IS-like protocol) internally to distribute MAC address reachability information.
A stack of Juniper switches is thus a combination of independent devices (linecard protocols like LACP are run locally) and centralized control and management planes (routing protocols are run on one switch in the cluster).
|2011-03-11||Original text published in @ioshints blog|
|2011-05-24||Document migrated to www.ioshints.info. Added Principles section and numerous links to in-depth articles|
|2011-05-25||Juniper supports mixed-model virtual chassis with EX4200 / EX4500 switches|