6 hidden bottlenecks in cloud data migration

From navigating cloud storage options to verifying the data post-transfer, follow these steps to avoid the hazards of cloud migration

Moving terabytes or even petabytes of data to the cloud is a daunting task. But it is important to look beyond the number of bytes. You probably know that your applications are going to behave differently when accessed in the cloud, that cost structures will be different (hopefully better), and that it will take time to move all that data.

Because my company, Data Expedition, is in the business of high-performance data transfer, customers come to us when they expect network speed to be a problem. But in the process of helping companies overcome that problem, we have seen many other factors that threaten to derail cloud migrations if overlooked.

Collecting, organizing, formatting, and validating your data can present much bigger challenges than moving it. Here are some common factors to consider in the planning stages of a cloud migration, so you can avoid time-consuming and expensive problems later.

Cloud migration bottleneck #1: Data storage

The most common mistake we see in cloud migrations is pushing data into cloud storage without considering how that data will be used. The typical thought process is, “I want to put my documents and databases in the cloud and object storage is cheap, so I’ll put my document and database files there.” But files, objects, and databases behave very differently. Putting your bytes into the wrong one can cripple your cloud plans.

Files are organized by a hierarchy of paths, a directory tree. Each file can be quickly accessed, with minimal latency (time to first byte) and high speed (bits per second once the data begins flowing). Individual files can be easily moved, renamed, and changed down to the byte level. You can have many small files, a small number of large files, or any mix of sizes and data types. Traditional applications can access files in the cloud just like they would on premises, without any special cloud awareness.

All of these advantages make file-based storage the most expensive option, and storing files in the cloud has a few other disadvantages. To achieve high performance, most cloud-based file systems (like Amazon EBS) can be attached to only one cloud-based virtual machine at a time, which means all applications needing that data must run on a single cloud VM. Serving multiple VMs (as Azure Files does) requires fronting the storage with a NAS (network-attached storage) protocol like SMB, which can severely limit performance. File systems are fast, flexible, and legacy-compatible, but they are expensive, useful only to applications running in the cloud, and do not scale well.
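To make the single-attachment constraint concrete, here is a minimal sketch using boto3, the AWS SDK for Python. The volume and instance IDs are placeholders, and error handling is omitted; it is an illustration of the pattern, not a migration script.

```python
# Sketch: attaching an EBS volume to a single EC2 instance with boto3.
# The IDs below are placeholders. A standard EBS volume can be attached
# to only one instance at a time, so a second attach_volume call for a
# different instance would fail until the volume is detached.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

ec2.attach_volume(
    VolumeId="vol-0123456789abcdef0",   # placeholder volume ID
    InstanceId="i-0123456789abcdef0",   # placeholder instance ID
    Device="/dev/sdf",                  # device name exposed to the VM
)

# Inside that one VM, the volume is then formatted and mounted like any
# local disk, and applications use ordinary file paths to reach the data.
```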

Objects are not files. Remember that, because it is easy to forget. Objects live in a flat namespace, like one giant directory. Latency is high, sometimes hundreds or thousands of milliseconds, and throughput is low, often topping out around 150 megabits per second unless clever tricks are used. Much about accessing objects comes down to such tricks: multipart upload, byte-range access, and key name optimization. Objects can be read by many cloud-native and web-based applications at once, from both within and outside the cloud, but traditional applications require performance-crippling workarounds.

Most interfaces for accessing object storage make objects look like files: key names are filtered by prefix to look like folders, custom metadata is attached to objects to appear like file metadata, and some systems, such as FUSE-based adapters, cache objects on a VM file system to allow access by traditional applications. But such workarounds are brittle and sap performance. Object storage is cheap, scalable, and cloud-native, but it is also slow and difficult to access.
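The sketch below shows what those tricks look like in practice with boto3 against a hypothetical bucket: a multipart upload driven by a transfer configuration, a byte-range read, and a prefix listing that makes flat key names look like folders. The bucket, key, and file names are placeholders.

```python
# Sketch: common object-storage access patterns with boto3.
# Bucket, key, and file names are placeholders.
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")
BUCKET = "example-migration-bucket"

# 1. Multipart upload: split a large file into parallel 64 MB parts
#    to work around per-connection throughput limits.
config = TransferConfig(multipart_threshold=64 * 1024 * 1024,
                        multipart_chunksize=64 * 1024 * 1024,
                        max_concurrency=8)
s3.upload_file("big-dataset.tar", BUCKET, "datasets/big-dataset.tar",
               Config=config)

# 2. Byte-range access: read only the first kilobyte of an object
#    instead of downloading the whole thing.
header = s3.get_object(Bucket=BUCKET,
                       Key="datasets/big-dataset.tar",
                       Range="bytes=0-1023")["Body"].read()

# 3. Prefix filtering: list keys under "datasets/" so the flat
#    namespace appears folder-like to the caller.
for page in s3.get_paginator("list_objects_v2").paginate(
        Bucket=BUCKET, Prefix="datasets/", Delimiter="/"):
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```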

Databases have their own complex structure, and they are accessed by query languages such as SQL. Traditional databases may be backed by file storage, but they require a live database process to serve queries. This can be lifted into the cloud by copying the database files and applications onto a VM, or by migrating the data into a cloud-hosted database service. But copying a database file into object storage is only useful as an offline backup. Databases scale well as part of a cloud-hosted service, but it is critical to ensure that the applications and processes that depend on the database are fully compatible and cloud-native. Database storage is highly specialized and application-specific.
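As a rough illustration of migrating data into a cloud-hosted database service, rather than copying raw database files into object storage, the sketch below streams a pg_dump of an on-premises PostgreSQL database into a managed cloud endpoint. The hostnames and database names are placeholders, credentials are assumed to come from the environment, and the same dump-and-restore idea applies to other database engines' native tools.

```python
# Sketch: streaming an on-premises PostgreSQL database into a cloud-hosted
# database service. Hostnames and database names below are placeholders;
# authentication is assumed to be handled via PGPASSWORD or a .pgpass file.
import subprocess

# Dump the source database as plain SQL on stdout.
dump = subprocess.Popen(
    ["pg_dump", "--host", "onprem-db.internal", "--dbname", "sales",
     "--format", "plain", "--no-owner"],
    stdout=subprocess.PIPE,
)

# Replay that SQL against the managed cloud database endpoint.
subprocess.run(
    ["psql", "--host", "sales.cluster-example.us-east-1.rds.amazonaws.com",
     "--dbname", "sales"],
    stdin=dump.stdout,
    check=True,
)
dump.stdout.close()
dump.wait()
```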

Balancing the apparent cost savings of object storage against the functionality of files and databases requires careful consideration of exactly which capabilities you need. For example, if you want to store and distribute many thousands of small files, archive them into a ZIP file and store that as a single object instead of trying to store each individual file as a separate object, as in the sketch below. Incorrect storage choices can lead to complex dependencies that are difficult and expensive to change later.
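Here is a minimal sketch of that small-files example, assuming a local directory of documents and a hypothetical target bucket: the files are bundled into one ZIP archive and uploaded as a single object rather than thousands of separate ones.

```python
# Sketch: bundling many small files into one ZIP archive and storing it
# as a single object, instead of one object per file.
# The directory path, bucket name, and key are placeholders.
import pathlib
import zipfile

import boto3

source_dir = pathlib.Path("documents")        # placeholder local directory
archive_path = pathlib.Path("documents.zip")

with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
    for path in source_dir.rglob("*"):
        if path.is_file():
            archive.write(path, arcname=path.relative_to(source_dir))

# One PUT of one object, rather than thousands of small PUTs.
boto3.client("s3").upload_file(str(archive_path),
                               "example-migration-bucket",
                               "archives/documents.zip")
```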
