6 data sources you should secure for your digital business

Freight train

Credit: Kabelleger (CC BY-SA 3.0)

What would happen to your digital business if the data that feeds it were suddenly unavailable? Let’s look at the various sources you may use, and how to secure them.


Your data-driven business is fueled by insights obtained from processing and analyzing data from various sources. Have you considered what would happen if one or several of the data sources that feed this digital engine were to dry up? What if you were suddenly unable to access the critical data that makes your business run?

Let's look at where your data comes from, and consider which concrete actions you can take to secure its supply.

Internal transactional data

Internal data, and especially data whose primary purpose differs from the use your digital business makes of it, is both the easiest and the trickiest to secure. It's easy because you don't have to negotiate a formal contract with a third party, and if there is executive buy-in for what you do, then getting the data owner to provide access should not be a problem. But it's also tricky precisely because of this lack of a formal contract: people change, and priorities shift. Whether by accident or not, you may find your access cut off overnight, with restoring it far from a top priority for the data owner. Or data schemas may change and force you to rebuild your entire collection processes.

Action: make sure the proper processes and SLAs are in place, and track organizational and staff changes closely so that you can explain to new stakeholders why your access to data must remain safe.
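
One way to notice a schema change before it breaks your collection process is to compare incoming records against the fields you expect. Here is a minimal Python sketch, assuming records arrive as dictionaries. The field names and the `find_schema_drift` helper are hypothetical and stand in for whatever your internal source provides.

```python
from typing import Any, List, Mapping

# Hypothetical expected schema: field name -> expected Python type.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}


def find_schema_drift(
    record: Mapping[str, Any],
    expected: Mapping[str, type] = EXPECTED_SCHEMA,
) -> List[str]:
    """Return a list of human-readable differences between record and schema."""
    problems = []
    # Check that every expected field is present with the expected type.
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Report fields the data owner added without notice.
    for field in record:
        if field not in expected:
            problems.append(f"unexpected field: {field}")
    return problems
```

Running a check like this on a sample of each delivery, and alerting when the list is not empty, gives you an early warning to raise with the data owner.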

Connected objects data

If you process data from the Internet of Things, and especially from consumer connected devices, your challenge in securing access is primarily legal. There are two questions you need to consider:

  • Who owns the data? Does it belong to the owner of the device, the account holder, or to your organization?
  • What can you do with the data? Surely, you can use it to render a service to your subscriber, but can you aggregate it with data from other subscribers? Can you resell this data (anonymized or not)? Can you derive insights, and resell those insights?

Action: review your terms of use and ensure these questions are being addressed. Also consider whether privacy laws and customs in various countries or regions may have an impact.

Syndicated data

Syndicated data is usually the easiest to control. Because you are paying a service provider to deliver data to you, you have a contract with this provider. This contract will cover service level agreements, licensing and usage limitations, and should ensure continued access.

However, you still need to consider what will happen if the service provider goes out of business, or changes its business model (like Twitter's recent announcement that they are shutting down their firehose to better control their supply chain).

Action: review if alternate sources are available, and keep these options at hand in case you need them.

Trading partners data

The case of trading partners data is very similar to that of syndicated data, except that the data is usually not provided as a standalone service but as part of a broader relationship, for example between a retailer and a manufacturer. Enforcing service level agreements can become tricky if it puts an otherwise profitable relationship at risk.

Action: as with syndicated data, always keep alternate sources in mind, if applicable.

Open data

The good news with open data is that it's free -- but it's also the bad news. Assuming you carefully study the terms of use and licensing agreement for the data, you should be safe legally. But there is no guarantee that the service will be provided in the long run, or that it will be provided consistently. The risk of changes in the data structures and the access methods is very high. And if the service is not responding, you have no recourse.

Action: find multiple sources, and do not build your business on the assumption that open data feeds will remain available in the long run.
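
Keeping alternate sources at hand can be as simple as trying them in order of preference. The following Python sketch illustrates the idea under stated assumptions: each source is wrapped in a function that either returns rows or raises an exception, and the names `fetch_with_fallback`, `primary_feed` and `backup_feed` are hypothetical stubs rather than real services.

```python
from typing import Callable, List, Sequence, Tuple


class AllSourcesFailed(Exception):
    """Raised when no configured source returned data."""


def fetch_with_fallback(
    sources: Sequence[Tuple[str, Callable[[], list]]],
) -> Tuple[str, list]:
    """Try each (name, fetch) pair in order; return the first that succeeds."""
    errors: List[str] = []
    for name, fetch in sources:
        try:
            return name, fetch()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(f"{name}: {exc}")
    raise AllSourcesFailed("; ".join(errors))


# Hypothetical stubs standing in for real feeds.
def primary_feed() -> list:
    raise ConnectionError("feed not responding")


def backup_feed() -> list:
    return [{"station": "A", "temp_c": 21.5}]


name, rows = fetch_with_fallback(
    [("primary", primary_feed), ("backup", backup_feed)]
)
```

Returning the name of the source that answered lets you log how often you are running on a backup, which is a useful signal that the primary feed is degrading.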

Harvested data

Harvesting data from web sites (screen scraping) or public APIs is common practice, but it is also the least secure source of data you can consider.

From the legal standpoint, this practice is often borderline, since there is usually no licensing agreement that permits you to use data harvested this way.

From the data availability standpoint, web sites change all the time, and your scraping routines will become obsolete in no time.

Action: stay away from data harvesting! And if data harvesting is your only option, be prepared to suffer outages, and to have to redevelop your routines all the time. And maybe get a lawyer....
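
If you do end up scraping, it helps to detect a page redesign before your routine silently returns garbage. Below is a minimal Python sketch using only the standard library. It assumes your routine depends on elements with known `id` attributes; the marker names and the `missing_markers` helper are hypothetical.

```python
from html.parser import HTMLParser
from typing import Iterable, List


class IdCollector(HTMLParser):
    """Collects every id attribute seen in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for attr_name, value in attrs:
            if attr_name == "id" and value:
                self.ids.add(value)


def missing_markers(html: str, required_ids: Iterable[str]) -> List[str]:
    """Return the required element ids that no longer appear in the page."""
    collector = IdCollector()
    collector.feed(html)
    collector.close()
    return sorted(set(required_ids) - collector.ids)
```

For example, `missing_markers('<div id="price">9</div>', ["price", "stock"])` reports that the `stock` marker is gone, which is a cue to pause the routine and review the page rather than keep ingesting bad data.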

This article is published as part of the IDG Contributor Network.