Optimize Data Center Infrastructure/Use Distributed File System

Use Distributed File System

The previous data center infrastructure optimization steps (from massive virtualization to two 10GE uplinks per server) are well accepted in the industry. The use of a distributed file system (VMware VSAN, Nutanix, Ceph or GlusterFS) is still hotly debated.

The underlying idea is very simple. Instead of using traditional storage arrays, organize the local storage (hard disks or SSDs) in hypervisor hosts into a global distributed file system or object store, with replication between hypervisor nodes running across the data center IP network.

The benefits of this approach are obvious:

  • Reduction in the number of different hardware components;
  • Improved resilience – highly-available centralized components (storage arrays) are replaced with a distributed network of lower-cost devices;
  • Reduced failure domain – failure of a single node in a distributed file system has less impact than the failure of a storage array;
  • Increased overall performance, assuming hypervisors primarily access their local file system;
  • Linearly scalable performance – the more nodes you have in a distributed file system, the better the overall performance is.
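
The failure-domain and scaling claims above can be put into numbers with a minimal sketch. It assumes data is spread evenly across nodes and that throughput scales ideally with node count; both are simplifications, and the function names and the per-node throughput figure are made up for illustration.

```python
def capacity_share_per_node(nodes: int) -> float:
    """Fraction of raw capacity that sits on one node.

    Assumption: data is spread evenly, so a single node failure
    touches roughly 1/nodes of the stored data. A single storage
    array corresponds to nodes == 1 (everything is affected).
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return 1.0 / nodes


def ideal_aggregate_throughput(nodes: int, per_node_mbps: float) -> float:
    """Best-case total throughput under perfectly linear scaling.

    Assumption: no network or replication bottleneck; real clusters
    will land below this figure.
    """
    if nodes < 1:
        raise ValueError("need at least one node")
    return nodes * per_node_mbps


if __name__ == "__main__":
    for n in (1, 4, 16):
        share = capacity_share_per_node(n) * 100
        total = ideal_aggregate_throughput(n, 500.0)  # 500 MB/s is a hypothetical per-node figure
        print(f"{n:2d} nodes: {share:5.1f}% of data per node, up to {total:.0f} MB/s")
```

With 16 nodes, a single failure touches about 6% of the data under these assumptions, compared to 100% for a single array.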

Some drawbacks of distributed file systems are also obvious, others less so:

  • Data replication between storage nodes requires a higher-performance data center network;
  • 3-way replication used by many distributed file systems has a 200% capacity overhead, as compared to the 20-25% overhead of RAID-6 (assuming you use a single storage array);
  • Network infrastructure becomes crucial – a network failure quickly results in a total storage meltdown;
  • Distributed file systems are not as mature as storage arrays, and are thus considered less reliable by conservative storage administrators.
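
The overhead figures in the list above follow from simple arithmetic, sketched below. The RAID-6 group sizes (8+2 and 10+2 disks) are assumptions chosen because they reproduce the 20-25% range; the function names are illustrative, and the network helper assumes one replica is written locally on the hypervisor that issued the write.

```python
def replication_overhead(copies: int) -> float:
    """Extra raw capacity per unit of usable data with N full copies."""
    if copies < 1:
        raise ValueError("need at least one copy")
    return float(copies - 1)


def parity_overhead(data_disks: int, parity_disks: int) -> float:
    """Extra raw capacity per unit of usable data in a parity group."""
    if data_disks < 1 or parity_disks < 0:
        raise ValueError("invalid group layout")
    return parity_disks / data_disks


def network_copies_per_write(copies: int, one_copy_local: bool = True) -> int:
    """Replicas that must cross the network for each write.

    Assumption: when one_copy_local is True, one replica lands on the
    local disks of the writing hypervisor and the rest travel over the
    data center network.
    """
    if copies < 1:
        raise ValueError("need at least one copy")
    return copies - 1 if one_copy_local else copies


if __name__ == "__main__":
    print(f"3-way replication: {replication_overhead(3):.0%} overhead")
    print(f"RAID-6 (8+2):      {parity_overhead(8, 2):.0%} overhead")
    print(f"RAID-6 (10+2):     {parity_overhead(10, 2):.0%} overhead")
    print(f"Network copies per write (3-way): {network_copies_per_write(3)}")
```

The same arithmetic shows why replication puts pressure on the network: under the stated assumption, every write with 3-way replication sends two additional copies across the data center fabric.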

A few years back, distributed file systems had no chance of being considered in most enterprise environments, even though Hyper-V and Linux had supported distributed file systems for years.

Today, some very large deployments successfully use distributed file systems. For example, public clouds built using OpenStack often use Ceph or GlusterFS, or even Swift (OpenStack’s object store) to store VM images.

vSphere environments were the last bastion of traditional storage. The first distributed file system offered on vSphere was built by Nutanix, followed by VMware VSAN a few years later (VSAN has been available since late 2013).

Obviously, it will be hard to persuade anyone to store an Oracle database on a distributed file system, but you might find a distributed file system good enough for VM virtual disks, resulting in lower storage array requirements.