How to query and extract data from SaaS applications

Almost every department can derive value from SaaS data. Here are 3 ways to integrate it more effectively

How to query and extract data from SaaS applications

Behind every SaaS application are databases storing business information about employees, suppliers, customers, and other partners. SaaS applications support workflows such as CRM for sales and marketing, cloud ERPs for financials, workforce management for human resource functions, and other enterprise and departmental services. Today, many businesses use a wide range of SaaS applications—from mainstream products such as Salesforce, Slack, Workday, and Atlassian, to many smaller SaaS tools.

SaaS applications shouldn’t operate in silos, and most organizations need to integrate capabilities across them and with other enterprise applications managed in private or public clouds.

If a workflow across multiple applications requires application integration, then development teams can leverage a SaaS platform’s APIs to trigger events from one platform to another. Enterprise integration platforms such as Boomi, SnapLogic, or MuleSoft are options when many applications and services need integration. If lighterweight integrations that follow the If This, Then That pattern are required, then an IFTTT platform may provide sufficient integration. Development teams should also explore low-code platforms such as Appian, OutSystems, and PowWow if they are developing new applications that connect to multiple SaaS and enterprise workflows.

Leveraging SaaS data for different business needs

What if you need to integrate the data from a SaaS platform with other data sources? There are a few reasons why data integration across SaaS tools may be required:

  • Business analysts want to develop reports and dashboards using this data.
  • Data science teams want the data for machine learning experiments.
  • Business teams want to centralize the data to support workflows and other types of applications. For example, marketing teams often use customer data platforms or master data platforms to centralize data on customers, products, and other business entities.
  • IT teams should extract the data for backups or enable transitioning data to other platforms.
  • Legal teams sometimes need to perform legal discovery on the underlying data.
  • Data stewards often want to cleanse, transform, or enrich the underlying data.

Sure, you can leverage the SaaS platforms’ APIs to extract data, but this may require a significant development effort to learn the APIs, understand the SaaS platform’s data model, create data stores for any new data, write the code to load the data, and develop the logic for any transformations. In addition, IT teams have to define cloud or data center infrastructure to host this application or service. Lastly, ongoing support is required for any data integrations designed to run on a schedule or on demand. Developing the integration from scratch may be expensive for development teams and IT organizations with other, more strategic priorities.

Another methodology is to consider data integration, data streaming, ETL (extraction, transformation, and loading), or other data prep platforms. Using a data integration platform may be the optimal method when working with large volumes of data that frequently change, since these platforms enable flexible extraction and transformation. However, they also require upfront development for the integration before end-users get access and utilize the information.

Lighterweight means of querying and managing SaaS data may be desirable. Sometimes, these are useful to experiment, discover, and prototype rapidly. Other times these approaches can easily be used for operation or production needs, especially when data volumes are low and query throughput isn’t significant. Here are three options.

1. BI platforms that directly query SaaS applications

If your primary requirement is reporting, then many self-service BI and data visualization platforms have direct connectors to the more popular SaaS applications.

At a minimum, these integrations provide an easy way to query and discover the underlying SaaS data sources. At best, the out-of-the-box integration is sufficient for end-users to create the required data blending, reports, and dashboards.

There are some considerations.

  • These platforms enable joins and data blends when columns have matching keys. They become harder to use if significant data transformation is required before integrating the data source or blending it with other data sources.
  • Review whether SaaS data integrations are performed with real-time queries, or whether the data is extracted or cached.
  • Performance may be a factor if the SaaS application contains large data volumes, if there are complex joins with many other data sources, or if dashboards will be utilized concurrently by many users.

2. Platforms that emulate ODBC, JDBC, OData, or other drivers

If the business needs to go beyond reporting and dashboarding, and a lightweight integration approach is still desirable, then some commercial tools convert SaaS APIs into standard database drivers such as ODBC, JDBC, or OData. Two options for drivers to common SaaS platforms are Progress DataDirect and CData Driver Technologies.

The driver method may be most useful to data science teams who want to perform ad hoc queries into SaaS databases before pulling the data into their analysis. It’s also a good option for application developers who require real-time querying of SaaS application data.

Development and data science teams should investigate the performance of this integration, especially if high query volumes, large data sets, or low latency is required. Also, many SaaS applications throttle or charge customers based on API usage, so this may be a factor if higher query or data volumes are needed.

3. Lightweight ETL platforms that sync SaaS data to cloud databases

One final idea is to instrument a data integration out of the SaaS application into a cloud database that your organization sets up and manages. This strategy adds some operational complexity and costs, and it may not be ideal if real-time querying of the SaaS application data is required. But it does have several advantages:

  • It provides more control over the database platform and data architecture that business users, data scientists (including citizen data scientists), and application developers utilize. The platform and architecture should meet the volume, performance, and latency requirements.
  • Storing the data independent of the SaaS database provides greater flexibility to transform, join, cleanse, cube, or aggregate data as required by downstream users and applications.
  • If data security, data privacy, or other data governance controls for querying this data are different from the access and entitlement controls available in the SaaS applications, then hosting the data in a separate database may be required.
  • Hosting the data independent of the SaaS platform may be more cost-effective for higher data and query volume needs.

Although you could instrument this integration with data integration or data prep platforms, there are SaaS data integration platforms with out-of-the-box connectors to many SaaS applications. Stitch, a Talend company, is a plug-and-play solution if your objective is to stream data from SaaS applications to cloud databases. You can select what data to replicate and the replication frequency, but it does not provide any tools for transforming or filtering the data. Skyvia offers a similar product, and both have free tiers to let development teams try out integrations. Alooma, part of Google Cloud, focuses on moving data into big data platforms such as Google BigQuery, Amazon Redshift, and Snowflake, and provides some data transformation capabilities.

If your organization is utilizing many SaaS platforms, then a one-size-fits-all strategy may not work. Each integration path supports different SaaS integrations, and the type of integration must align with anticipated business needs. Reviewing the tools and considering multiple options is a best practice, especially when data integration needs vary.

Copyright © 2020 IDG Communications, Inc.