Scale-Out Private Cloud Infrastructure


ACME Inc. is building a large fully redundant private infrastructure-as-a-service (IaaS) cloud using standardized single-rack building blocks. They plan to use several geographically dispersed data centers with each data center having one or more standard infrastructure racks.

This document summarizes design challenges sent in by readers of the ipSpace.net blog and discussed in numerous ExpertExpress engagements. It’s based on real-life queries and network designs but does not represent an actual customer network. The complete document is available as a downloadable PDF to ipSpace.net subscribers.


Design Guidelines

A standard infrastructure rack in each data center will have:

  • Two ToR switches providing intra-rack connectivity and access to the corporate backbone;
  • Dozens of high-end servers, each server capable of running between 50 and 100 virtual machines;
  • Storage elements, either a storage array, server-based storage nodes, or distributed storage (example: VMware VSAN, Nutanix, Ceph…).

Figure 1: Standard cloud infrastructure rack
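
To get a feel for how many workloads share the fate of a single rack, the per-server VM range above can be multiplied out. The sketch below is only a back-of-the-envelope estimate: the case study says "dozens" of servers, so the value of 24 servers per rack is an assumption made for illustration, and the names `SERVERS_PER_RACK` and `vms_per_rack` are hypothetical.

```python
def vms_per_rack(servers: int, vms_per_server: int) -> int:
    """Number of VMs hosted in one rack for a given server density."""
    return servers * vms_per_server


# Assumption: "dozens of servers" is read here as two dozen.
SERVERS_PER_RACK = 24

# The 50..100 VMs-per-server range comes from the rack description.
low = vms_per_rack(SERVERS_PER_RACK, 50)
high = vms_per_rack(SERVERS_PER_RACK, 100)

print(f"A single rack failure could affect between {low} and {high} VMs")
```

With a different server count the numbers scale linearly, but the order of magnitude (on the order of a thousand or more VMs per rack) is what makes the failure domain size worth limiting.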

Racks in smaller data centers (example: colocation) connect straight to the WAN backbone, racks in data centers co-resident with a significant user community connect to WAN edge routers, and racks in larger scale-out data centers connect to WAN edge routers or to an internal data center backbone.

Figure 2: Planned WAN connectivity

The cloud infrastructure design should:

  • Guarantee full redundancy;
  • Minimize failure domain size: a failure domain should not span more than a single infrastructure rack, making each rack an independent availability zone;
  • Enable unlimited workload mobility.

This case study focuses on failure domain analysis and workload mobility challenges. Typical rack design is described in the Redundant Server-to-Network Connectivity case study, WAN connectivity aspects in the Redundant Data Center Internet Connectivity case study, and security aspects in the High-Speed Multi-Tenant Isolation case study.

Cloud Infrastructure Failure Domains

A typical cloud infrastructure has numerous components, including:

  • Compute and storage elements;
  • Physical and virtual network connectivity within the cloud infrastructure;
  • Network connectivity with the outside world;
  • Virtualization management system;
  • Cloud orchestration system;
  • Common network services (DHCP, DNS);
  • Application-level services (example: authentication service, database service, backup service) and associated management and orchestration systems.

Figure 3: Cloud infrastructure components

Cloud infrastructure environments based on enterprise server virtualization products commonly use separate virtualization management systems (example: VMware vCenter, Microsoft System Center Virtual Machine Manager) and cloud orchestration systems (example: VMware vCloud Automation Center, Microsoft System Center Orchestrator). Single-purpose IaaS solutions (example: OpenStack, CloudStack on Xen/KVM) build the functionality typically provided by a virtualization management system into the cloud orchestration platform.

ACME Inc. wants each infrastructure rack to be an independent failure domain. Each infrastructure rack must therefore have totally independent infrastructure and should not rely on critical services, management or orchestration systems running in other racks.
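
The independence requirement can be checked mechanically once the service dependencies are written down. The following is a minimal sketch, not part of the original design: it assumes a hand-maintained inventory in which every service instance is tagged with the rack hosting it, and all names (`Service`, `cross_rack_dependencies`, `rack-a`, `orchestrator-a`…) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Service:
    name: str
    rack: str  # rack hosting this service instance
    depends_on: list[str] = field(default_factory=list)


def cross_rack_dependencies(services: list[Service]) -> list[tuple[str, str]]:
    """Return (service, dependency) pairs where the dependency runs in another rack."""
    by_name = {svc.name: svc for svc in services}
    violations = []
    for svc in services:
        for dep in svc.depends_on:
            if by_name[dep].rack != svc.rack:
                violations.append((svc.name, dep))
    return violations


# Hypothetical inventory: rack A is self-contained, rack B is not.
inventory = [
    Service("dns-a", "rack-a"),
    Service("orchestrator-a", "rack-a", ["dns-a"]),
    Service("dns-b", "rack-b"),
    # Violation: compute nodes in rack B rely on the orchestrator in rack A.
    Service("compute-b", "rack-b", ["dns-b", "orchestrator-a"]),
]

violations = cross_rack_dependencies(inventory)
for service, dependency in violations:
    print(f"{service} depends on {dependency}, which runs in another rack")
```

Any pair reported by the check means a failure in one rack can propagate to another, so the two racks effectively form a single failure domain.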

There’s More

The rest of this case study discusses these topics:

  • Impact of shared management or orchestration systems
  • Impact of long-distance virtual subnets and layer-2 connectivity requirements
  • Workload mobility considerations, including hot and cold VM mobility
  • Impact of static IP addressing
  • Disaster recovery and workload migration scenarios

Get the complete document

The complete case study, including design and deployment guidelines and sample configuration snippets, is available to ipSpace.net subscribers. Select the Case studies tab after logging into the webinar management system.
