Data storage in Azure: Everything you need to know

Microsoft Azure has many data storage options, so how do you choose what to use? This guide explains the options.

Choosing the right cloud storage option is never as straightforward as you think it might be. You end up having to juggle prices, for both saving and reading your data, for bandwidth, and even for the class of server that’s hosting your bits. And there are different storage technology options as well.

The first question you need to ask is “What kind of data am I trying to store?” Cloud services have the opportunity to step beyond the tiering model we often use in on-premises infrastructures, using storage models that are more suited to cloud applications and their particular needs. They may look like disks to the outside world, but you’re going to be working against specialized code that won’t offer the same features as a general-purpose disk file system.

But don’t fear that specialized focus. Modern disk file systems are complex tools, designed to handle anything you might do with a PC or a server. By focusing on a specific task, cloud file systems can tune performance and reliability features, building on underlying hardware and on newer, reliable file systems that are only now starting to roll out in the wider, on-premises world.

Understanding Azure’s blob stores

Microsoft tried to deliver an object file system for Windows—and failed. There’s too much overhead in building and managing an index for all the many different types of files stored on a PC.

On Azure, things are different. Instead of having to manage data at an operating-system level, Azure’s object file system leaves everything up to your code. After all, you’re storing and managing the data that’s needed by only one app, so the management task is much simpler.

That’s where Azure’s blob storage comes in. Blobs are binary large objects: any unstructured data you want to store. With a RESTful interface, Azure’s blob store hides much of the underlying complexity of handling files. The Azure platform keeps the same object available across multiple storage replicas, using strong consistency so that a write is committed to every replica before the object can be read. Data can be tiered according to how often you expect it to be read, with hot, cool, and (in preview) archive options available.
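To make the RESTful addressing concrete, here is a minimal Python sketch of how a blob’s URL is put together. The account, container, and blob names are hypothetical, and the sketch assumes the default public-cloud endpoint; a real request would also need authentication, which is left out here.

```python
from urllib.parse import quote


def blob_url(account: str, container: str, blob_name: str) -> str:
    """Build the endpoint URL for a blob.

    Assumes the default Azure public-cloud host suffix; sovereign
    clouds and custom domains use different hosts.
    """
    # Blob names may contain "/" to mimic folders, so leave it unescaped
    # and percent-encode everything else that needs it.
    path = quote(blob_name, safe="/")
    return f"https://{account}.blob.core.windows.net/{container}/{path}"


# Hypothetical names, for illustration only.
url = blob_url("examplestore", "catalog", "images/red shoe.png")
print(url)
```

Each blob is simply a resource at a predictable URL, which is why the same object can be read by any code that can issue an HTTP request.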

Using blob storage makes sense when you’re writing code that needs to access media or other content; a blob can easily be an image from a catalog, or a document in an enterprise content management system. All you need to do is build an index and then populate the store. Depending on the redundancy option you choose, Azure then replicates it within a single region or across to a second one.
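The “build an index, then populate the store” pattern might look something like the following sketch. It uses an in-memory class as a stand-in for a real blob container, so `InMemoryBlobStore`, `populate`, and the hash-prefixed naming scheme are all illustrative assumptions rather than part of any Azure SDK.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class InMemoryBlobStore:
    """Hypothetical stand-in for a blob container: blob name -> bytes."""
    blobs: dict = field(default_factory=dict)

    def put(self, name: str, data: bytes) -> None:
        self.blobs[name] = data

    def get(self, name: str) -> bytes:
        return self.blobs[name]


def populate(store: InMemoryBlobStore, items: dict) -> dict:
    """Upload each item and return an index of catalog key -> blob name.

    `items` maps a catalog key to a (filename, content) pair.
    """
    index = {}
    for key, (filename, data) in items.items():
        # Prefix with a short content hash so two different files that
        # share a filename do not overwrite each other (an arbitrary choice).
        digest = hashlib.sha256(data).hexdigest()[:12]
        blob_name = f"{digest}/{filename}"
        store.put(blob_name, data)
        index[key] = blob_name
    return index


store = InMemoryBlobStore()
index = populate(store, {"sku-1001": ("shoe.png", b"\x89PNG...")})
assert store.get(index["sku-1001"]) == b"\x89PNG..."
```

The application keeps the index (in a database, say) and treats the blob store as a flat namespace of opaque objects; the store itself knows nothing about catalog keys.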

Understanding files in Azure

Not every piece of code is born in the cloud, and not every project needs rewriting for Azure. If you’re hosting Windows Server and Linux apps, either in containers or in IaaS virtual servers, you likely need some level of file-based storage that works with familiar protocols.

That’s where the SMB 3.0-based Azure File storage comes into play, using a protocol that’s familiar to both Windows and Linux platforms. It also has a REST API you can write code against, so new and old apps can share the same storage, and on-premises applications can even connect to an Azure storage instance over a VPN.

Using blobs and files together in Azure

It’s important to note that an Azure storage account can contain instances of both blob and file storage, and it’s possible to programmatically copy data from one store to another.

That gives you the opportunity to use cloud-hosted storage as a boundary between legacy on-premises applications and cloud code. You upload files and data to file-based cloud storage, then automatically copy them across to a blob store running on the same account, using Azure’s REST APIs and storage SDKs from your code.
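The shape of that file-to-blob handoff can be sketched without any Azure dependency. In the Python below, two plain dictionaries stand in for the file share and the blob container; `copy_new_files` and the `incoming/` prefix are hypothetical names chosen for illustration, and real code would swap the dictionary operations for calls to the storage SDKs or REST APIs.

```python
def copy_new_files(share: dict, container: dict, prefix: str = "incoming/") -> list:
    """Copy files from the share that are not yet in the container.

    `share` maps file paths to bytes and `container` maps blob names to
    bytes. Returns the blob names written on this run.
    """
    copied = []
    for path, data in sorted(share.items()):
        blob_name = prefix + path.lstrip("/")
        if blob_name in container:
            continue  # already copied on an earlier run, so skip it
        container[blob_name] = data
        copied.append(blob_name)
    return copied


# Hypothetical contents: a legacy app has dropped two reports on the share.
share = {"/reports/q1.csv": b"a,b\n1,2\n", "/reports/q2.csv": b"a,b\n3,4\n"}
container = {}

first_run = copy_new_files(share, container)
second_run = copy_new_files(share, container)
print(first_run)   # both files are copied the first time
print(second_run)  # nothing left to copy the second time
```

Skipping names that already exist makes the job safe to rerun on a schedule, which matters when the legacy application keeps adding files to the share.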

Using physical disks in Azure

Of course, you’re not limited to managed storage options on Azure. If you’re writing high-performance code, you may well need access to actual physical disks. With Azure Disk storage, you can quickly provision either SSD or hard-drive storage. SSD gives you low latency and high throughput, ideal for use with high-performance Azure VMs. That way, you can lift and shift hefty on-premises apps running on SQL Server or Dynamics CRM to Azure, without affecting performance. Alternatively, you can use slower hard drives to host data that’s needed for a test environment, keeping data and test machines separate. Connecting new test machines to previously provisioned storage gets data into test environments faster.

Disk storage on Azure is like disk storage anywhere: It’s fixed and doesn’t scale with your application. If you need more storage, you’ll have to provision it and mount it before your code can use it. If you’re planning on automatically scaling applications, you need to remember that disk storage can be a bottleneck, especially if you’re accessing it from many containers or VMs simultaneously.

Getting specialized in the cloud

Although the basic storage options in Azure are fine for most purposes, Azure also offers specialized storage services. One supports massive amounts of unstructured data, ideal for hosting data lakes for large-scale analysis. There’s also queue storage for managing asynchronous interapplication communications (IAC): its message queues can grow arbitrarily deep to absorb spikes in data traffic.
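A small simulation shows why a deep queue helps with traffic spikes. This Python sketch uses an in-memory `deque` as a stand-in for a cloud message queue; the `simulate` function, the arrival counts, and the per-tick capacity are all made-up values for illustration.

```python
from collections import deque


def simulate(arrivals: list, capacity_per_tick: int) -> list:
    """Return the queue depth after each tick.

    `arrivals[i]` is the number of messages produced during tick i; the
    consumer can process at most `capacity_per_tick` messages per tick.
    """
    queue = deque()
    depths = []
    for tick, count in enumerate(arrivals):
        # Producer side: enqueue everything that arrived this tick.
        for i in range(count):
            queue.append((tick, i))
        # Consumer side: drain up to the fixed capacity.
        for _ in range(min(capacity_per_tick, len(queue))):
            queue.popleft()
        depths.append(len(queue))
    return depths


# A burst of 10 messages hits a consumer that handles 3 per tick.
print(simulate([2, 10, 1, 0, 0], capacity_per_tick=3))  # [0, 7, 5, 2, 0]
```

The producer never has to wait or drop messages during the burst; the backlog simply grows and then drains once traffic settles, which is the decoupling that asynchronous queues provide.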

Some of Azure’s storage services are designed to extend and protect on-premises data. StorSimple storage appliances look like storage servers, with physical hard-disk and SSD storage arrays. But those arrays are best thought of as a cache for data that’s being transferred to and from cloud storage. What would have been a rack full of disks is now a few rack units (Us) of space, with data replicated in multiple Azure datacenters. Similarly, Azure offers both a backup service for on-premises desktops and servers, and a larger-scale disaster-recovery option that not only backs up servers but also can run them in the event of an outage.

Good storage is, to be honest, the barest minimum of table stakes in the modern public cloud marketplace. But what’s interesting about Azure is the breadth of its offering, supporting both cloud-native applications and on-premises code that’s moving out of existing datacenters, as well as enabling hybrid scenarios.

Recent changes to pricing models are making cloud storage more and more attractive, with high volumes at low cost. Now you know what factors to consider in choosing the storage option that’s right for you and your code.

Copyright © 2017 IDG Communications, Inc.