Don't get hung up on best practices

When it comes to designing real apps in the real world, best practices aren't always applicable

Despite what many startups may tell you, application and API design isn’t automatic and doesn't "just happen." There are best practices and important guidelines to follow, but they all make necessary assumptions about the application or service you might be building. The best practices for a social media mobile app don’t apply to an app that collects back-end performance metrics, and vice versa.

The ultimate goal of designing and building an application or infrastructure is to look as far ahead as possible and build a foundation that doesn’t handcuff future expansion or plans nobody has thought of yet. When we build IT infrastructures, we need to account for the unexpected as much as possible. We don’t always succeed, but we have to try.

That’s a significant departure from other types of architecture and design. For instance, architects designing a house don’t have to design a foundation for an addition that might or might not happen a few decades later; they can focus on designing the house. We typically don’t have that luxury in IT.

Generally speaking, however, we have some idea of the scale and usage that our app or infrastructure will need to support, and that helps us make many of the initial design decisions. If we’re talking about that back-end metrics collection app, then we probably have a finite number of collection points. If that number will grow, we probably have a reasonable idea of the expansion plans and rates.

If we’re talking about a social media app, we have no firm idea of how many users we may need to support, but we’d better design the API to scale massively by adding resources, not by adding more code. We also need to plan to make that scalability as simple and seamless as possible. Again, the precise nature of the app we are building makes all the difference in terms of design on nearly every level.

As an example, perhaps we’re building a service that needs to reference a fixed set of data for data interpretation and action logic. Let's say this data is published by a third party and laid out in a well-known format. It might change only once or twice a year. Our back-end analysis code needs that data to decide what actions to take when inspecting data flowing in from collection points, whether that is human input or machine metrics. Where should we put that fixed data set?

To answer that, we first need to answer the question of scale. If we’re talking about a relatively small number of endpoints sending data, we can ship this file with the back-end app code itself, to be read at startup and used for all subsequent transactions. If the file changes, the code will also need to change, so the new file is integrated into the code base as part of the same update. If we’re talking about potentially massive scale, and data within the file that might change without accompanying code changes, then it might be best to load the file into a central database and cache it, returning the data to requesting servers as needed.
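To make the first option concrete, here is a minimal Python sketch of reading the reference file once at startup and serving every later lookup from memory. It assumes the third-party data is JSON that maps a code to an action name; that format, along with the names parse_reference, load_reference_file, and Analyzer, is hypothetical and only for illustration.

```python
import json
from pathlib import Path


def parse_reference(text):
    """Turn the raw file contents into a dict (JSON format is an assumption)."""
    return json.loads(text)


def load_reference_file(path):
    """Read the reference file shipped with the code. Call this once at startup."""
    return parse_reference(Path(path).read_text(encoding="utf-8"))


class Analyzer:
    """Keeps the reference data in memory for all subsequent transactions."""

    def __init__(self, reference):
        self._reference = reference

    def action_for(self, code):
        # Pure in-memory lookup: no file or database access per transaction.
        # Unknown codes fall back to "ignore" (an illustrative default).
        return self._reference.get(code, "ignore")
```

At startup the app would build a single Analyzer from load_reference_file(...) and hand it to whatever code processes incoming data. Updating the reference data then means shipping a new file with the next code release.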

Then again, if we go the second route, we’d definitely need to make sure we’re loading the reference data only once, at the start of the process, and not requesting it every time it needs to be referenced. This might sound obvious, but you’d be surprised at how many performance bottlenecks are caused by this type of design flaw.
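One way to guard against that flaw is to put the reference data behind a small cache that fetches it only once per process. The Python sketch below is an illustration under assumptions, not a prescribed implementation: ReferenceCache is a hypothetical name, and fetch stands in for whatever function actually queries the central database.

```python
import threading


class ReferenceCache:
    """Loads reference data through `fetch` at most once per process."""

    def __init__(self, fetch):
        self._fetch = fetch  # hypothetical callable that queries the database
        self._data = None
        self._loaded = False
        self._lock = threading.Lock()

    def get(self):
        # Fast path: the data is already in memory, so no database round trip.
        if not self._loaded:
            with self._lock:
                # Re-check inside the lock so concurrent callers fetch only once.
                if not self._loaded:
                    self._data = self._fetch()
                    self._loaded = True
        return self._data

    def refresh(self):
        """Reload on demand, for example when a new version is published."""
        with self._lock:
            self._data = self._fetch()
            self._loaded = True
```

Calling get() once during startup gives the "load at the start of the process" behavior described above. Every later call is a memory read, and refresh() covers the once-or-twice-a-year update without a restart.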

There’s another option that combines the two: using data center orchestration tools such as Puppet or Ansible to distribute updated reference data and code as files to groups of servers. This removes the need to maintain the reference data in the database, but it adds what may be unnecessary complexity to the infrastructure.

There are arguments on both sides of this design question:

  1. Those who claim that there’s no need to take the database hit for fixed data that can be included from a file.
  2. Those who say the database is the ultimate authority and reading data from files is liable to cause problems.

Both are right in the right context, and both are wrong in another. (Note: I don’t understand the notion that reading fixed data from files is somehow problematic. The code itself is a file. The required modules, packages, gems, and so on are files. If it’s a Web app, the Web server and database configuration files are, well, files.)

This type of problem blends devops and pure development, and it can cause friction. In many situations like this, some best practices simply don’t apply, because following them means forcing a square peg into a round hole. The trick is to recognize the round hole and adjust your peg accordingly.