How and why to use blob storage in Azure

Work with unstructured and specialized binary data without the overhead of a file system

When it comes to building applications on the public cloud, you’re spoiled for storage. You’ve got SQL, NoSQL, graph databases, document databases, and even good old-fashioned file systems. Choosing the right storage is usually a simple decision, because familiar databases and file systems are only a click in a console away.

Much of how you think about storage is dictated by the on-premises world, where you needed to consider the underlying architecture and the servers your applications ran on. There, licensing economics often dictated reusing existing SQL storage, with the same database holding everything from product data to images, and even videos.

Introducing cloud blobs

But with the advent of the public cloud, and services like Azure, aggregated storage stopped being necessary. Instead of one store and a file system for all your content, you have the option of using specific optimized stores that handle one type of content—and handle it well. Part of that transition was led by the NoSQL stores, with quick key-value lookups and no need for complex relational structures. With cloud services, economics drive technology choices for both users and providers, and what had been monolithic structures have been broken up and delivered as individual services.

Way back when, I was the architect of an early photo-sharing site. We built it around a large-scale distributed file system, storing user photos in a hierarchical store that mixed spinning disk and fast-access tape. That model wouldn’t work today; the scale of modern systems would quickly overwhelm the file structure. Today, by breaking storage into task-specific modules, you can build systems that can quickly scale, because they’re agnostic to the underlying hardware.

That’s where blobs—binary large object storage—come in. Originally a technique for storing binary content in relational databases, standalone blob storage was part of the original release of Microsoft’s Azure Data Services. Intended to support early cloud-native applications, Azure’s blob support was designed to host application content for both mobile and desktop. That initial release has grown significantly, adding support for tiered storage and for different blob types.

Much of what developers do with storage is focused on working with unstructured binary data. Azure’s blob storage is one way of handling this without the overhead of a file system. It’s a quick and easy way of working with binary data in your apps, and as an added bonus, it’s also integrated with Azure’s Data Lake analytic tools, letting you get insights on what your users or devices are storing and how it’s being used.

Using blobs in Azure

Like all Azure services, blob storage needs to be part of a resource group and associated with an account for billing. You start with a general-purpose Azure storage account, which hosts all Azure’s core storage services. Once you’ve created an account from the Azure Portal or the Azure command-line interface, you’re ready to provision blob storage.

Azure uses a hierarchical model to manage blobs. You first need a container to host your blobs. Again, you use either the Azure Portal or the command-line interface to create your container, following the standard Azure naming rules. Storage accounts can host multiple containers, so you can create separate containers for separate types of content, or for handling content from specific users. There are no limits to how many containers you can have in an account, or how many blobs can be stored in a container.
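Those naming rules are strict: container names must be 3 to 63 characters of lowercase letters, numbers, and hyphens, must start with a letter or number, and can’t contain consecutive or trailing hyphens. A small validation helper—a sketch, not part of any Azure SDK—can catch bad names before a create call fails:

```python
import re

# One leading letter/digit, then optional single-hyphen-separated
# runs of letters/digits (so no "--", and no trailing "-").
_CONTAINER_RE = re.compile(r"^[a-z0-9](-?[a-z0-9])*$")

def is_valid_container_name(name: str) -> bool:
    """Check Azure's container naming rules: 3-63 chars, lowercase
    letters, digits, and hyphens; starts and ends with a letter or
    digit; no consecutive hyphens."""
    return 3 <= len(name) <= 63 and bool(_CONTAINER_RE.fullmatch(name))
```

For example, `is_valid_container_name("user-photos")` passes, while `"Photos"` (uppercase) and `"double--dash"` are rejected.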

Once your first container is in place, it’s ready to receive data. You can upload an initial blob through the Azure Portal, and you can see it and download it using the same tools.

Three types of blobs are supported: block, append, and page blobs.

  • Most of your applications will likely use block blobs, which handle text and other binary data.
  • Append blobs are a specialized form of block blob that can only be extended by appending new data to the end of the blob; existing content can’t be modified, which makes them a good fit for logs.
  • Page blobs are massive random-access files, which can be up to 8TB in size. They’re what Azure uses to host its own VHD files for virtual servers, and you can use them to build your own cloud file systems (and even use them as addressable storage by your on-premises applications).
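As an illustration of the append pattern, here’s a minimal Python sketch using the `azure-storage-blob` v12 SDK to add log lines to an append blob. The function and parameter names are placeholders, and the SDK import is deferred so the sketch parses even without the package installed (`pip install azure-storage-blob`):

```python
def append_log_line(conn_str: str, container: str,
                    blob_name: str, line: str) -> None:
    """Append one line to an append blob, creating it if needed."""
    # Deferred import so this sketch parses without the SDK installed.
    from azure.storage.blob import BlobClient

    blob = BlobClient.from_connection_string(
        conn_str, container_name=container, blob_name=blob_name)

    # Append blobs must be created before blocks can be appended.
    if not blob.exists():
        blob.create_append_blob()

    # Each append_block call adds data to the end of the blob.
    blob.append_block(line.encode("utf-8") + b"\n")
```

A logging process would simply call `append_log_line` per event; Azure serializes the appends, so no offset bookkeeping is needed.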

Uploading binary data to Azure

Getting data into blob storage can be complex if you’re not using it for application-generated data. Although there are tools for uploading blob content in the Azure Portal, uploading files one at a time is at best tedious. Instead you should use tools like the AzCopy command-line tool for Linux and Windows. It can copy data from on-premises systems to blobs, between blob containers in an Azure storage account, and even between different storage accounts. If you’re using Linux as your development platform, Blobfuse mounts Azure blob storage as a virtual file system, letting you copy data over a VPN.

If you’ve got a lot of data to load, network bandwidth can be a limiting factor. Instead of using VPNs, you can use Azure’s Data Box service to copy data onto a set of managed disks. Once shipped back to Microsoft, Data Box contents are loaded into your storage as blobs, ready for use. The similar Azure Import/Export service uses your own hard drives for data transfer.

Microsoft gives you plenty of tools for writing code that uses Azure Blob Storage. There are SDKs for most popular languages and frameworks, with REST APIs for general access. Like all Azure client code, you start by authenticating with Azure. Once connected, you can either create a new container or add blob content to an existing container. The APIs in the SDKs are designed to be used asynchronously, coping with the latency of mobile and remote connections.
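That flow—authenticate, get a container client, upload—looks roughly like this in Python with the `azure-storage-blob` v12 SDK. This is a hedged sketch: the connection string and names are placeholders, and the SDK import is deferred so the sketch parses without the package installed:

```python
def upload_file_as_blob(conn_str: str, container_name: str,
                        local_path: str, blob_name: str) -> None:
    """Upload a local file to a block blob in an existing container."""
    # Deferred import so this sketch parses without the SDK installed
    # (pip install azure-storage-blob).
    from azure.storage.blob import BlobServiceClient

    # Authenticate with the storage account's connection string.
    service = BlobServiceClient.from_connection_string(conn_str)
    container = service.get_container_client(container_name)

    # Stream the local file up as a block blob, replacing any
    # existing blob with the same name.
    with open(local_path, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)
```

In production code you’d use the SDK’s async variants (`azure.storage.blob.aio`) rather than blocking calls like this.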

There are limits to the SDK. For example, a single listing call returns at most 5,000 blob IDs, so large collections need paging code that follows continuation tokens until everything has been retrieved.
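The paging pattern itself is simple: keep calling with the continuation marker the service returns until there is no marker left. This sketch runs that loop against an in-memory stand-in for the listing call; the 5,000-per-call cap is Azure’s, but the function names are hypothetical:

```python
PAGE_SIZE = 5000  # Azure returns at most 5,000 results per listing call

def list_all_blobs(fetch_page):
    """Drain a paged listing.

    fetch_page(marker) -> (names, next_marker); next_marker is None
    when there are no more pages, mirroring Azure's continuation token.
    """
    names, marker = [], None
    while True:
        page, marker = fetch_page(marker)
        names.extend(page)
        if marker is None:
            return names

def make_fake_store(n):
    """Hypothetical in-memory stand-in for the listing REST call."""
    items = [f"blob-{i:05d}" for i in range(n)]
    def fetch_page(marker):
        start = marker or 0
        chunk = items[start:start + PAGE_SIZE]
        nxt = start + PAGE_SIZE
        return chunk, (nxt if nxt < len(items) else None)
    return fetch_page
```

With 12,000 blobs, `list_all_blobs(make_fake_store(12000))` makes three calls and returns all 12,000 names. The v12 Python SDK hides this loop behind its `list_blobs` iterator, but REST clients must implement it themselves.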

Administrators wanting to monitor usage can use the Azure Storage Explorer app. Available for Windows, macOS, and Linux, it’s a quick way of understanding how your Azure Storage accounts are being used and for seeing the data that’s been uploaded.

Working with hierarchical blob storage tiers

More complex blob-based applications can take advantage of Azure features like redundancy and tiering. You can provision redundant storage that spans multiple Azure regions, though cross-region replication is asynchronous, so the secondary copy can briefly lag the primary. Replication carries an additional cost, but it’s on the order of a few cents per gigabyte. Tiered storage options let you trade access speed against storage cost, with high-speed premium blob storage at one end of the scale and archive storage at the other.

Architecting your apps around tiering can make them more cost-effective. A photo service might use a hot storage layer for recent image thumbnails, with older thumbnails in cool storage. High-resolution images can be stored in an archive tier, so that they’re retrieved only when specifically requested. As a user scrolls through their images, your code can prefetch thumbnails to avoid latency issues, with background processes aging thumbnails from one tier to the next.
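One way to express that aging policy is a small function mapping a photo’s age to a target tier. The tier names match Azure’s access tiers, but the thresholds here are hypothetical, not Azure defaults:

```python
from datetime import timedelta

def pick_tier(age: timedelta) -> str:
    """Map a photo's age to an Azure access tier.

    Hypothetical policy for a photo service: fresh content stays
    hot, mid-aged content moves to cool, old content is archived.
    """
    if age < timedelta(days=30):
        return "Hot"
    if age < timedelta(days=180):
        return "Cool"
    return "Archive"
```

A nightly background job could walk the container, call `pick_tier` on each blob’s last-modified age, and issue a set-tier request when the result differs from the blob’s current tier.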

Copyright © 2019 IDG Communications, Inc.