What hyperscale storage really means

Commodity-based and software-defined, hyperscale infrastructure picks up where hyperconvergence leaves off

Let’s be clear: Hyperscale isn’t about how large you are.

Organizations don’t have to be huge to leverage hyperscale solutions. But that’s exactly what many IT infrastructure, operations, and devops pros think when they first learn about hyperscale.

The prevailing belief is that hyperscale architecture is meant for extremely large infrastructures -- like those operated by LinkedIn, Amazon, or Netflix -- because it scales to thousands of instances and petabytes of data. As it turns out, it’s better to think of hyperscale as describing an approach rather than size. It’s about automation, orchestration, and building IT that intelligently scales as and when the business needs it. Hyperscale deployments can and should start small, then scale indefinitely. They should also allow you to independently scale only the portion of the infrastructure that needs it, which is counter to another emerging enterprise data center trend, hyperconvergence.

Confused yet? If so, you’re not alone. Let’s dive in a bit deeper.

Defining hyperscale

The concept of building a hyperscale architecture is muddied by many tangential terms. In particular, we see customers confused about hyperconverged, hyperscale (or Web-scale), converged, software-defined, and commodity-based infrastructure.

Let’s take a moment to clarify definitions of these ingredient terms:

  • Software-defined: Infrastructure where the functionality is completely decoupled from the underlying hardware and is both extensible and programmatic.
  • Commodity-based: Infrastructure built atop commodity or industry-standard infrastructure, usually an x86 rack-mount or blade server. As we’ve written in the past, don’t conflate commodity with cheap.
  • Converged: A scale-out architecture where server, storage, network, and virtualization/containerization components are tied together as a pretested, pre-integrated solution. Components are still distinct in this architecture.
  • Hyperconverged: A scale-out architecture that takes converged infrastructure one step further by combining software-defined components atop commodity hardware, packaged as a single solution -- often a single appliance. Components are no longer distinct.
  • Hyperscale: A scale-out architecture that is also software-defined and commodity-based, but where the server, storage, network, and virtualization/containerization resources remain separate. Each component is distinct and can be independently scaled.

In summary, think of hyperconverged infrastructure as the modern, logical extreme of converged systems, whereas hyperscale is the modern, logical extreme of how we’ve been building data centers for 30 years. Both make sense for specific environments, as shown below.

[Figure: Hyperconverged vs. hyperscale]
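
To make the difference concrete, here is a minimal sketch in Python of how capacity grows under each model. The per-node figures are hypothetical numbers chosen for illustration, not vendor specs: a hyperconverged cluster adds compute and storage in lockstep, while a hyperscale cluster can grow its storage tier alone.

```python
# Illustrative only: per-node capacities below are hypothetical.

def grow_hyperconverged(nodes, cpu_per_node=16, tb_per_node=10):
    """Each appliance added brings compute AND storage, needed or not."""
    return {"cpu_cores": nodes * cpu_per_node, "storage_tb": nodes * tb_per_node}

def grow_hyperscale(compute_nodes, storage_nodes, cpu_per_node=16, tb_per_node=10):
    """Compute and storage tiers scale independently."""
    return {"cpu_cores": compute_nodes * cpu_per_node,
            "storage_tb": storage_nodes * tb_per_node}

# To reach 80TB, hyperconverged drags along compute you may not need:
print(grow_hyperconverged(nodes=8))                       # 128 cores, 80 TB
print(grow_hyperscale(compute_nodes=2, storage_nodes=8))  # 32 cores, 80 TB
```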

Hyperscale and hyperconverged

At Hedvig, we strive to deliver a storage solution that can be flexibly tailored for any workload, from private clouds (including Docker and OpenStack) to big data deployments running Hadoop or NoSQL to more traditional server virtualization, disaster recovery, backup, and archiving. The Hedvig Distributed Storage Platform virtualizes and aggregates flash and spinning disk in a server cluster or cloud, presenting it as a single, elastic storage system that can be accessed by file, block, or object interfaces.

The Hedvig Distributed Storage Platform consists of three components:

  • Hedvig Storage Service: A patented distributed-systems engine that scales storage performance and capacity with off-the-shelf x86 and ARM servers. The Hedvig Storage Service can be run on-premises or on public clouds like AWS, Azure, and Google. It delivers all of the storage options and capabilities required for an enterprise deployment, including inline deduplication, inline compression, snapshots, clones, thin provisioning, autotiering, and caching.
  • Hedvig Storage Proxy: A lightweight VM or container that enables access to the Hedvig Storage Service via industry-standard protocols. Hedvig currently supports NFS for file and iSCSI for block, as well as OpenStack Cinder and Docker drivers. The Hedvig Storage Proxy also enables client-side caching and deduplication with local SSD and PCIe flash resources for fast local reads and efficient data transfers.
  • Hedvig APIs: REST- and RPC-based APIs for both object storage and Hedvig operations. Hedvig currently supports Amazon S3 and Swift for object storage. Developers and IT operations admins can use the management APIs to expose all Hedvig storage features and automate provisioning and management through self-service portals, applications, and clouds. (A brief usage sketch follows this list.)
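
As a hedged illustration of the object interface, the sketch below drives an S3-compatible endpoint with the standard boto3 client. The endpoint URL, credentials, and bucket are placeholders invented for the example, not documented Hedvig values.

```python
# Sketch: using an S3-compatible object interface via boto3.
# Endpoint, credentials, and bucket below are hypothetical placeholders.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://storage-proxy.example.com:9000",  # assumed endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

s3.create_bucket(Bucket="backups")
s3.put_object(Bucket="backups", Key="db/nightly.dump", Body=b"...")
print(s3.list_objects_v2(Bucket="backups")["KeyCount"])  # -> 1
```
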
[Figure: Hedvig architecture]

Hedvig supports hyperconvergence by bundling the Hedvig Storage Proxy and the Hedvig Storage Service as virtual appliances running on a commodity server with a hypervisor or container OS. For hyperscale, the Hedvig Storage Service is deployed on bare-metal servers to form a dedicated storage tier while the Hedvig Storage Proxy is deployed as a VM or container on each server at the compute tier.
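
In the hyperscale layout, a container host running the proxy can provision storage with an ordinary volume call. The sketch below uses the Docker SDK for Python; the driver name “hedvig” and its options are our assumptions for illustration, not the documented interface of the Hedvig plugin.

```python
# Sketch: provisioning a volume through a Docker volume plugin.
# The driver name "hedvig" and its options are hypothetical here.
import docker

client = docker.from_env()

vol = client.volumes.create(
    name="pgdata",
    driver="hedvig",                # assumed plugin name
    driver_opts={"size": "100GB"},  # assumed option
)

# Any container on this host can now mount the volume.
client.containers.run(
    "postgres:13",
    detach=True,
    environment={"POSTGRES_PASSWORD": "example"},
    volumes={vol.name: {"bind": "/var/lib/postgresql/data", "mode": "rw"}},
)
```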

[Figure: Hedvig cluster]

Why choose hyperscale for storage

Data is growing far faster than storage budgets. The economics are crippling for enterprises that do not have the resources of Internet goliaths like Amazon, Google, and Facebook. Thus, enterprises must embrace software-defined and commodity-based storage to reduce costs and maintain the flexibility and scalability needed to keep up with business requirements.

At Hedvig, we’ve noticed that about 80 percent of the time, customers choose a hyperscale architecture rather than hyperconverged, even though we support both. What’s even more interesting is that many of our customers come to us thinking the exact opposite: about 80 percent initially request a hyperconverged solution, but after they do their homework, they opt for the hyperscale approach.

Why? In a nutshell, because they favor flexibility (or agility, if you must use that term) above all else when architecting their infrastructure. Consider the following:

  • A hyperconverged system offers a simplified “building block” approach to IT. For lean IT organizations looking to lower the overhead of deploying and expanding a cloudlike infrastructure, hyperconvergence provides a good solution. But it requires a relatively predictable set of workloads where “data locality” is a top priority, meaning the application or VM must sit as close to the data as possible. This is why VDI has been a poster child for hyperconvergence: users want their “virtual C: drive” local. Hyperconvergence is not flexible, however, as it scales all elements in lockstep.
  • A hyperscale system keeps storage independent of compute, enabling enterprise IT to scale capacity when the business requires. The hyperscale approach to data center and cloud infrastructure offers a high level of elasticity, helping organizations rapidly respond to changing application and data storage needs. It’s also an architecture that better matches modern workloads like Hadoop and NoSQL, as well as those architected with cloud platforms like OpenStack and Docker. All of these are examples of distributed systems that benefit from independently scaled shared storage.

What we’ve experienced with our customers is growing confirmation of what we’ve been noting for a while now: that hyperconverged is an answer and not the answer when exploring modern storage architectures. To be sure, the industry is seeing a big pendulum swing to hyperconverged because of its simplicity. But if your data is growing exponentially and your compute needs are not, you have an impedance mismatch that hyperconvergence is ill-suited to absorb.

Hyperscale or hyperconverged?

Hyperconverged can be a simpler, more cost-effective approach. However, what our customers discover with Hedvig is that we support a feature that makes hyperscale appropriate for almost all workloads: client-side caching. Hedvig can take advantage of local SSD and PCIe devices in your compute tier to build a write-through cache. This significantly improves read performance and, more important, solves the data locality challenge. Storage is still decoupled and runs in its own dedicated, hyperscale tier, but applications, VMs, and containers can benefit from data cached locally at the compute tier. This also solves the problem of how to grow your caching tier, but that’s a topic for another article.
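
To show the mechanism in miniature (a conceptual sketch, not Hedvig’s implementation), a write-through cache commits every write to both the local cache and the backing store, so the store stays consistent while repeat reads of hot data are served locally:

```python
# Minimal write-through cache: writes hit the local cache (think SSD at
# the compute tier) and the backing store (the dedicated storage tier);
# reads are served locally whenever possible.

class WriteThroughCache:
    def __init__(self, backing_store):
        self.cache = {}               # stand-in for local SSD/PCIe flash
        self.backing = backing_store  # stand-in for the hyperscale tier

    def write(self, key, value):
        self.cache[key] = value       # populate the local cache...
        self.backing[key] = value     # ...and persist to the storage tier

    def read(self, key):
        if key in self.cache:         # hot data: fast local read
            return self.cache[key]
        value = self.backing[key]     # miss: fetch from the storage tier
        self.cache[key] = value       # warm the cache for next time
        return value

store = {}
cache = WriteThroughCache(store)
cache.write("block-42", b"payload")
assert cache.read("block-42") == store["block-42"] == b"payload"
```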

As an example of this benefit, one customer chose Hedvig’s hyperscale approach for VDI, a workload traditionally reserved for hyperconverged solutions, as discussed above. In this instance, the customer had “power users” who required 16 vCPUs and 32GB of memory dedicated to each hosted desktop. As a result, the company was forced to deploy a large number of hyperconverged nodes to meet the processing and memory requirements, unnecessarily increasing storage capacity in lockstep.

With the Hedvig platform, the customer was able to create dedicated nodes to run the Citrix XenDesktop farm on beefy blade servers with adequate CPU and RAM. The data was kept on a separate hyperscale Hedvig cluster on rack-mount servers, with data cached back on the XenDesktop servers in local SSDs. The result? A dramatically less expensive solution (60 percent less). More significant, it also provided a more flexible environment in which the company could ride Moore’s Law, buying the most powerful servers needed to upgrade desktop performance without having to upgrade its storage servers.

Based on our experience, there are some easy rules of thumb to determine which architecture is right for you; they’re also captured as a short code sketch after the list.

  • Choose hyperscale when… your organization has 5,000 employees or more, more than 500 terabytes of data, more than 500 applications, or more than 1,000 VMs.
  • Choose hyperconverged when… you’re below these watermark numbers, have five or fewer staff managing your virtual infrastructure, or you’re in a remote or branch office.
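
Captured as code, the rules look roughly like this. The article doesn’t say how to break ties when the criteria conflict (say, a large organization with a five-person virtualization team), so resolving those in favor of hyperconverged’s simplicity is our assumption:

```python
# The rules of thumb above as a simple decision helper. Tie-breaking in
# favor of hyperconverged is an assumption, not guidance from the article.

def recommend(employees, data_tb, apps, vms, virt_admins, remote_office=False):
    if remote_office or virt_admins <= 5:
        return "hyperconverged"
    if employees >= 5000 or data_tb > 500 or apps > 500 or vms > 1000:
        return "hyperscale"
    return "hyperconverged"

print(recommend(employees=8000, data_tb=700, apps=200, vms=1500, virt_admins=12))
# -> hyperscale
```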

The good news is that it doesn’t have to be an either/or decision. You can start in a hyperconverged environment, then switch to hyperscale, or you can mix and match the two. Our philosophy is that your applications dictate which one you should use. And as your application needs change over time, so should your deployment.

In modern businesses, change and growth are mandatory. Increasingly, there’s no way to solve this conundrum without the hyperscale architecture that the Web giants pioneered. What’s changed is that any enterprise can now benefit from the hyperscale approach.

Rob Whiteley is the VP of marketing at Hedvig.

New Tech Forum provides a venue to explore and discuss emerging enterprise technology in unprecedented depth and breadth. The selection is subjective, based on our pick of the technologies we believe to be important and of greatest interest to InfoWorld readers. InfoWorld does not accept marketing collateral for publication and reserves the right to edit all contributed content. Send all inquiries to newtechforum@infoworld.com.

Copyright © 2016 IDG Communications, Inc.