Embrane heleos: scale-out distributed virtual appliance

Embrane’s heleos is a scale-out distributed virtual appliance architecture running on commonly used hypervisor platforms. Cloud builders can use Embrane’s load balancers and firewall/VPN appliances to implement flexible and scalable multi-tenant L4-7 services.


Virtual appliances: The Problem

Let’s recap the virtues of physical appliances:

  • They can use dedicated hardware to perform simple high-bandwidth tasks (encryption, IP/TCP header rewrites, packet filters);
  • They can run data path routines in the kernel, minimizing per-packet overhead;
  • They can run optimized operating systems to further reduce per-packet overhead.

Virtual appliances are usually burdened with a virtualization tax (they work with virtual NICs emulated within the hypervisor kernel) and high per-packet overhead. Every packet processed by a VM-based appliance has to travel from the hypervisor kernel to user space (the VM), where it’s processed and then sent back through the hypervisor kernel. VM-based appliances could also be limited by the hypervisor’s resource management rules.
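To see why per-packet overhead matters, here’s a back-of-the-envelope per-packet CPU budget. The 4 Gbps rate, the single 3 GHz core and the two packet sizes are illustrative assumptions, and Ethernet preamble and inter-frame gap are ignored:

```latex
\[
\text{pps} = \frac{B}{8\,S}, \qquad \text{cycles per packet} = \frac{f}{\text{pps}}
\]
\[
B = 4\ \text{Gbps},\ S = 1500\ \text{bytes}:\quad
\text{pps} = \frac{4\times10^{9}}{8 \times 1500} \approx 333{,}333,\qquad
\frac{3\times10^{9}}{333{,}333} \approx 9{,}000\ \text{cycles}
\]
\[
B = 4\ \text{Gbps},\ S = 64\ \text{bytes}:\quad
\text{pps} = \frac{4\times10^{9}}{8 \times 64} = 7.8125\times10^{6},\qquad
\frac{3\times10^{9}}{7.8125\times10^{6}} = 384\ \text{cycles}
\]
```

Every trip from the hypervisor kernel to the VM and back has to fit into that budget, so small packets are the hardest case for a VM-based appliance.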

[Figure: Typical VM appliance architecture]
Looking for more information? Read my Virtual Network Appliances: benefits and drawbacks article.

On the other hand, it’s easy to deploy the virtual appliances whenever and wherever you need them, and they can sit on top of whatever virtual networking technology you choose (including VXLAN or MAC-over-GRE); physical appliances most often support only VLANs.

Virtual appliances: Performance issues

Virtual appliances are always slower than their physical counterparts, be it due to the lack of dedicated packet-processing hardware or due to virtualization overhead. An interesting data point is the relative performance of VMware’s vShield App (a Linux-based VM using iptables) and Juniper’s VGW: vShield App, which processes all packets in a VM, tops out at around 4 Gbps, while VGW (running in the vSphere kernel with significantly lower overhead) gives you tens of gigabits per second.

Now imagine your virtual appliance performs CPU-intensive processing (VPN termination, L7 load balancing, WAF); you won’t get anywhere close to the 3-4 Gbps you could otherwise expect from a VM-based appliance, because the appliance becomes CPU-bound.

You could assign more virtual CPUs to the appliance, a solution that might work well in a private data center but not in cloud environments where you’re charged for the resources you are allowed to consume. The only solution that works well in cloud environments is a scale-out appliance: an architecture where you can add data path processing modules on demand.
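Scale-out turns appliance sizing into simple arithmetic. The sketch below is a minimal illustration that assumes throughput grows linearly with the number of data path VMs; the per-DP throughput is a hypothetical input you would measure yourself, while the ~3 Gbps per-dispatcher figure and the 8-DP limit are the numbers quoted later in this article for the initial heleos release.

```python
import math

# From the article: ~3 Gbps per DPD and at most 8 DPs per appliance (initial release).
DPD_LIMIT_GBPS = 3.0
MAX_DPS = 8


def dps_needed(target_gbps: float, per_dp_gbps: float) -> int:
    """Number of data path VMs needed to carry target_gbps.

    Assumes linear scaling; per_dp_gbps is a hypothetical, workload-specific
    measurement (e.g. a single-vCPU DP doing VPN termination).
    """
    if target_gbps > DPD_LIMIT_GBPS:
        raise ValueError("beyond a single DPD: add DPDs plus external (e.g. DNS) load balancing")
    count = max(1, math.ceil(target_gbps / per_dp_gbps))
    if count > MAX_DPS:
        raise ValueError(f"needs {count} DPs, the limit is {MAX_DPS}")
    return count


# A CPU-bound workload doing ~0.25 Gbps per DP:
print(dps_needed(1.0, 0.25))  # 4
```

The two error paths mirror the two limits discussed in the article: once a single dispatcher is saturated, adding DPs no longer helps.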

The Embrane architecture

Embrane has created an interesting scale-out distributed virtual appliance architecture. They took the architecture of a typical appliance (input NIC and dispatcher, data path processors, output NIC) and separated the three components into individual virtual machines:

[Figure: Embrane's DVA architecture]

Their solution consists of the Elastic Services Manager (ESM) and Distributed Virtual Appliances (DVA). ESM manages the interaction with the underlying IaaS cloud infrastructure and launches new distributed virtual appliances as requested by the users or the cloud orchestration platform (all communication with ESM goes through a REST API).

Within a virtual appliance, the Data Plane Manager (DPM) provides the management and control plane of the DVA, the Data Plane Dispatchers (DPDs) act as external interfaces that link the DVA to the outside world, and the Data Planes (DPs) do the actual work.

All DVA elements are virtual machines provisioned by ESM from a common resource pool on an as-needed basis.

Starting and configuring a DVA

When you want to deploy a new DVA, be it a firewall, load balancer or VPN termination point, ESM launches a Data Plane Manager (DPM) and provisions all the other VMs in the DVA (DPDs and DPs). The DPM has a virtual disk to store configuration and logging information; DPDs and DPs are diskless and use static PXE boot (without DHCP) to download their images from the DPM.

Even though there are numerous VMs within a DVA, it’s deployed as a single image (DP and DPD images are within the DPM image), simplifying the deployment process.

You configure the distributed virtual appliance through a REST API exposed by the DPM. The internal complexity of the DVA is hidden: the object model used by the DPM’s REST API exposes objects relevant to the DVA’s functionality (interfaces, routes, ACLs, virtual servers). After receiving configuration requests, the DPM distributes the configuration changes to DPs and DPDs.
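Here’s a minimal sketch of what a configuration payload built on such an object model might look like. The class and field names are my assumptions, not Embrane’s actual schema; the interesting part is what’s missing, since nothing in the payload refers to DPs or DPDs.

```python
import json
from dataclasses import asdict, dataclass, field


# Hypothetical object model; names are illustrative, not Embrane's schema.
@dataclass
class Interface:
    name: str
    address: str          # e.g. "192.0.2.10/24"


@dataclass
class Route:
    prefix: str
    next_hop: str


@dataclass
class Acl:
    name: str
    rules: list


@dataclass
class VirtualServer:
    vip: str
    port: int
    pool: list            # real servers behind the virtual server


@dataclass
class ApplianceConfig:
    interfaces: list = field(default_factory=list)
    routes: list = field(default_factory=list)
    acls: list = field(default_factory=list)
    virtual_servers: list = field(default_factory=list)


cfg = ApplianceConfig(
    interfaces=[Interface("outside", "192.0.2.10/24"), Interface("inside", "10.0.0.1/24")],
    routes=[Route("0.0.0.0/0", "192.0.2.1")],
    acls=[Acl("web-in", ["permit tcp any host 192.0.2.10 eq 443"])],
    virtual_servers=[VirtualServer("192.0.2.10", 443, ["10.0.0.11", "10.0.0.12"])],
)
# A body a client might send to the DPM; the DPM, not the client, decides
# which DPs and DPDs need which part of it.
payload = json.dumps(asdict(cfg), indent=2)
print(payload)
```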

You don’t have to change DVA configuration when you increase or decrease the number of DPs within the DVA to cope with dynamic changes in its load; DPM automatically reacts to the internal topology changes and reconfigures the DVA elements.

There is a clean separation between provisioning and configuration processes: ESM manages all VM provisioning (including dynamic provisioning of new DP VMs); DPM configures all DVA elements.
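The provisioning/configuration split, and the claim that scaling needs no configuration change, can be modeled in a few lines. Everything here (class names, method names, the dict-based config) is hypothetical and only tracks DPs; the single number taken from this article is the 8-DP limit of the initial release.

```python
class DataPlaneManager:
    """Configuration side: holds the user-visible config and pushes it to elements."""

    def __init__(self, config: dict):
        self.config = dict(config)
        self.elements: dict = {}  # element name -> config that element has received

    def element_joined(self, name: str) -> None:
        # Internal topology change: the new element simply gets the current config.
        self.elements[name] = dict(self.config)

    def element_left(self, name: str) -> None:
        self.elements.pop(name, None)

    def update_config(self, changes: dict) -> None:
        # User-driven change (the REST API call): distribute it to every element.
        self.config.update(changes)
        for name in self.elements:
            self.elements[name] = dict(self.config)


class ElasticServicesManager:
    """Provisioning side: creates and destroys DP VMs, never touches the config."""

    def __init__(self, dpm: DataPlaneManager, max_dps: int = 8):
        self.dpm = dpm
        self.max_dps = max_dps
        self.dps: list = []

    def scale_to(self, n: int) -> None:
        n = max(1, min(n, self.max_dps))
        while len(self.dps) < n:
            name = f"dp{len(self.dps) + 1}"   # "boot" a new DP VM ...
            self.dps.append(name)
            self.dpm.element_joined(name)     # ... and let the DPM configure it
        while len(self.dps) > n:
            self.dpm.element_left(self.dps.pop())


dpm = DataPlaneManager({"vip": "192.0.2.10:443"})
esm = ElasticServicesManager(dpm)
esm.scale_to(3)
dpm.update_config({"acl": "web-in"})   # configuration change
esm.scale_to(5)                        # load increase: no configuration change needed
print(sorted(dpm.elements), dpm.elements["dp5"])
```

The user only ever calls `update_config`; scaling is a provisioning event that the DPM absorbs on its own.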

DVA external and internal connectivity

DVA uses IP to transport control- and data-path information between its elements (DPM, DPD, DP). You don’t have to provision additional virtual networks to support the internal communication between DVA elements; you should, however, isolate the network (or VLAN) over which the internal DVA communication is running from the rest of the data center and particularly from the users.

All DVA components run as virtual machines and use standard virtual NICs; you can thus use any transport infrastructure (from VLANs to VXLAN) to connect them to the outside world or to interconnect the DVA components.

Each DPD within the DVA architecture represents an external interface (and an external IP address), making it possible to create appliances with more than two interfaces, or to scale the external connectivity by adding new interfaces and IP addresses.

Last but definitely not least, Embrane claims that they have a stateless solution, where the dispatcher elements send packets to data plane elements without keeping per-flow state. The scalability of the architecture is thus not limited by the flow tables in the dispatchers.
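Embrane didn’t say how the dispatchers choose a DP. One well-known way to spread flows without per-flow state is to hash the 5-tuple; the sketch below uses rendezvous (highest-random-weight) hashing, which is my choice and not necessarily Embrane’s, because adding a DP then moves only the flows the new DP wins, while plain hash-modulo-N would reshuffle most existing flows.

```python
import hashlib


def weight(flow: tuple, dp: str) -> int:
    # Stable pseudo-random weight for a (flow, DP) pair. Python's built-in hash()
    # is salted per process for strings, so SHA-256 keeps all dispatchers in agreement.
    digest = hashlib.sha256(repr((flow, dp)).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_dp(flow: tuple, dps: list) -> str:
    """Rendezvous hashing: the DP is a pure function of the 5-tuple and the
    current DP list, so the dispatcher keeps no per-flow table."""
    return max(dps, key=lambda dp: weight(flow, dp))


# 5-tuples: (src IP, src port, dst IP, dst port, protocol); addresses are hypothetical.
flows = [(f"10.0.0.{i % 250}", 40000 + i, "192.0.2.10", 443, "tcp") for i in range(1000)]
dps = ["dp1", "dp2", "dp3", "dp4"]
before = {f: pick_dp(f, dps) for f in flows}
after = {f: pick_dp(f, dps + ["dp5"]) for f in flows}  # scale out by one DP
moved = [f for f in flows if before[f] != after[f]]
print(f"{len(moved)} of {len(flows)} flows moved, all to dp5: "
      f"{all(after[f] == 'dp5' for f in moved)}")
```

A real dispatcher would also have to steer return traffic to the same DP, which this sketch ignores.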

Use cases

There are two perfect use cases for Embrane’s architecture (I’m positive you’ll identify a few more):

Large-scale CPU-intensive appliances. You can deploy DVAs in environments that require massive throughput that’s hard to achieve with traditional single-VM virtual appliances. As the load increases, you simply keep adding DPs and the DPM reconfigures the DVA on the fly.

Even when the load exceeds the throughput of a single input or output DPD (~3 Gbps at the moment), you could use a combination of multiple DPDs with an external load balancing solution (for example, DNS-based load balancing).

Flexible multi-tenant/cloud deployments. Cloud operators usually use virtual appliances to provide easy-to-deploy L4-7 services... and hit an interesting roadblock once the required throughput exceeds the capabilities of a single VM. At that point, they can usually use one of the following two solutions to increase a tenant appliance’s throughput:

  • Add multiple parallel virtual appliances to cope with increased throughput requirements, and deal with multiple configuration/management points and interesting load balancing/routing issues;
  • Migrate the user from a VM-based solution to a physical appliance, potentially disrupting the traffic and changing the IP addresses while doing so.

Using Embrane’s DVA, cloud operators can offer their customers a VM-based appliance that easily scales up to the limits of a single DPD, and goes beyond that limit with a DNS-based load balancing solution.

Unless your goal is maximum flexibility, Embrane’s solution won’t reach its maximum potential if you use it to implement simple firewalls or hash-based load balancers; the true benefits of their architecture become evident when the DP VMs perform CPU-intensive data processing that limits the throughput of a single-vCPU VM to a few hundred Mbps or less.

Benefits and drawbacks

The benefits of Embrane’s solution are obvious:

  • VM form factor that allows you to use DVA in any virtualized environment;
  • Truly distributed scale-out architecture;
  • Dynamic on-demand DVA provisioning with REST API;
  • DVA scaling without configuration changes;
  • Clean separation of responsibilities between configuration (DPM) and provisioning (ESM) that allows you to implement a variety of workflows that suit your processes or service definitions.
For more information, read Embrane’s white paper.

Even the potential drawbacks that a rifleman like me could quickly identify looking at the DVA architecture are not as bad as they seem.

Data Plane Dispatcher elements are single points of failure. ESM could use hypervisor-based high availability solutions (VMware HA, for example), or restart DVA components once DPM discovers they (or the physical server on which they’re running) have failed. In both cases, the outage would last a few seconds (the time it takes to detect a failed VM and reboot it from the DPM).
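The “few seconds” outage is essentially detection time plus reboot time. Here’s a minimal heartbeat-based failure detector illustrating that; the class, its timers and the `restart` callback are hypothetical, and the article doesn’t say how DPM actually detects failures.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class HeartbeatMonitor:
    """Hypothetical DPM-side failure detector.

    An element is declared failed after `timeout` seconds without a heartbeat;
    `restart` stands in for whatever asks ESM or the hypervisor to bring it back.
    """
    timeout: float
    restart: Callable[[str], None]
    last_seen: Dict[str, float] = field(default_factory=dict)

    def heartbeat(self, element: str, now: Optional[float] = None) -> None:
        self.last_seen[element] = time.monotonic() if now is None else now

    def check(self, now: Optional[float] = None) -> List[str]:
        now = time.monotonic() if now is None else now
        failed = [e for e, t in self.last_seen.items() if now - t > self.timeout]
        for element in failed:
            self.restart(element)
            self.last_seen[element] = now  # grace period while the VM reboots
        return failed


def outage_estimate(timeout: float, check_interval: float, boot_time: float) -> float:
    # Rough worst case: failure right after a heartbeat, noticed one check interval late.
    return timeout + check_interval + boot_time


restarted = []
mon = HeartbeatMonitor(timeout=1.0, restart=restarted.append)
mon.heartbeat("dpd-outside", now=0.0)
mon.heartbeat("dp1", now=0.0)
mon.heartbeat("dp1", now=0.8)         # dp1 keeps talking, dpd-outside goes silent
print(mon.check(now=1.5))             # ['dpd-outside']
print(outage_estimate(1.0, 0.5, 3.0)) # 4.5
```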

Embrane’s DVA architecture also makes it possible to implement an even faster-converging solution using multiple parallel dispatchers and traditional FHRP protocols (HSRP, GLBP or VRRP) on the outside interfaces.
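With VRRP, the decision which dispatcher owns the shared outside IP address follows a simple rule. The sketch below models only the master-election outcome defined in RFC 5798 (not timers, advertisements or preemption, and not HSRP or GLBP); the dispatcher names and addresses are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Dispatcher:
    name: str       # hypothetical DPD name
    priority: int   # VRRP priority: 1-254 for backups, 255 for the virtual IP owner
    address: str    # primary IPv4 address on the outside segment


def elect_master(live):
    """VRRP master election: highest priority wins, and equal priorities
    are broken in favor of the higher primary IP address."""
    return max(live, key=lambda d: (d.priority, ipaddress.ip_address(d.address)))


dpds = [Dispatcher("dpd-a", 100, "192.0.2.2"), Dispatcher("dpd-b", 100, "192.0.2.3")]
print(elect_master(dpds).name)      # dpd-b (same priority, higher address)
print(elect_master(dpds[:1]).name)  # dpd-a takes over once dpd-b fails
```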

Data Plane Dispatchers are choke points. According to discussions we had with Embrane during the Networking Tech Field Day event, DPD tops out at approximately 3 Gbps. They could go a bit beyond that by optimizing the VM packet processing; reaching way higher per-DPD throughput would probably require bare-metal implementation or a custom-built hypervisor (like F5’s vCMP).

However, you can always scale a DVA by deploying multiple input or output DPDs. The solution gets a bit more complex (you have to combine intra-DVA load balancing with DNS-based external load balancing), but it should be possible to pre-provision it (using multiple IP addresses on a single DPD) for customers expecting huge traffic surges.
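The DNS part of that setup can be as simple as rotating the order of A records for the service name. This toy answer generator is only an illustration with hypothetical addresses; in practice you’d use your DNS server’s built-in round-robin or a global load balancing product.

```python
class RoundRobinDNS:
    """Toy answer generator for one name: rotates the A-record order on every
    query, so clients that connect to the first address spread across DPDs."""

    def __init__(self, addresses):
        if not addresses:
            raise ValueError("need at least one address")
        self.addresses = list(addresses)
        self._offset = 0

    def answer(self):
        n = len(self.addresses)
        records = [self.addresses[(self._offset + i) % n] for i in range(n)]
        self._offset = (self._offset + 1) % n
        return records


# Hypothetical addresses pre-provisioned on two DPDs (two of them on the first DPD).
dns = RoundRobinDNS(["192.0.2.10", "192.0.2.11", "198.51.100.10"])
for _ in range(4):
    print(dns.answer()[0])
```

Pre-provisioning several addresses up front means you can later move one of them to a new DPD without touching the DNS records.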

The initial heleos release supports 8 DPs and 4 DPDs per appliance.

The architecture is VM-intensive. You need at least three VMs for a single DVA: the DPM, at least one DP, and at least one DPD. However, the total footprint of all three VMs is not huge (a combined 4 GB of RAM, 1 GB of disk space, and 1.5 GHz of one CPU core), and in most cases the flexibility and scalability benefits more than outweigh the somewhat larger initial footprint.

The summary

There’s a definite need for scalable virtual appliances in public and private cloud environments, and Embrane’s architecture is one of the best proposed solutions I’ve seen so far. You should definitely consider them as part of your IaaS cloud solution – talk to them and ask for an evaluation.

Disclosure: Embrane indirectly covered some of my travel expenses during the Networking Tech Field Day, but nobody has ever asked me to write about their products or solutions. Read the full disclosure.