Bridging the developer and data scientist gap with cloud, notebooks and PixieDust

When developers and data scientists work together, the benefits abound

A wealth of information hides in the vast amount of data produced every day: roadside sensors measuring traffic volume, medical imaging for rapid diagnosis, and satellites circling overhead analyzing weather patterns. In nearly every industry, the cloud enables exponential growth by providing cheap, remote storage of data, access through a variety of devices, and elastic compute for data processing at scale. But how can we capture the full potential of this data?

Doing so requires closer collaboration between data scientists and developers. As data-driven intelligence becomes an integral component of nearly every function, from inventory management to personalized customer marketing, these two roles increasingly need to work in tandem. Yet many teams still struggle to do so, because they continue to work with different tools and in different languages.

Notebooks, for example, are powerful, cloud-ready tools that often require experience with programming languages popular among data scientists, such as Python, which is valued for its strength in numerical analysis. Because of this Python base in particular, notebooks are often overlooked by developers, who typically prefer working in languages such as Java or JavaScript (Node.js).

However, notebooks offer tremendous potential to bridge the gap between developers and data scientists, and can bring benefits to both sides. Notebooks allow users to write and share code and rich text in one environment understood by both data scientists and developers. This allows them to work on the same data sets simultaneously, instead of following the traditional process in which developers hand off raw data to data scientists, who translate it into languages like Python for analysis and then give findings and models back to developers, who must translate them yet again into their preferred language, such as Java or HTML.

While the hand-off approach has worked in the past, today’s era of constant iteration and continuous demand for new and competitive features requires a more agile and connected approach than passing data around in a relay.

Let’s consider an example. If a marketing department wants to quickly build an application that generates real-time sentiment analysis from Twitter, team members can turn to their data science and development teams for support. More likely than not, the data scientist ingesting the social data and the developer building the dashboard will be working in different languages, which can cause friction, bottlenecks and time-to-market delays.

This is where notebooks come in, combined with the magic of PixieDust. PixieDust is an open source helper library for Jupyter notebooks that allows developers to explore data analysis models without having to learn or code in statistical languages. Fueled by the collaborative power of the cloud, PixieDust enables users to visualize data, build dashboards, and more efficiently share data findings within notebooks.  

By using notebooks and PixieDust together, the data scientist and developer can each work in their preferred language in the same notebook. This means a developer can obtain early insights into raw data at the same time a data scientist begins working with the same data sets, allowing both sides to immediately view trends worth exploring and to communicate feedback about potential new features without waiting for the typical translation to be completed first.

PixieDust allows different languages to be used in tandem by abstracting trends and patterns out of the data and turning them into visualizations rather than lines of code, so the insights can be interpreted by almost any user, including non-technical line-of-business users.
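
To make the Twitter example above a little more concrete, here is a minimal sketch of the kind of cell a data scientist and a developer might share in a Python notebook. Everything in it is an assumption made for illustration: the sample tweets, the tiny word lists, and the helper names `classify` and `summarize` are hypothetical, and the word-counting approach is a toy stand-in for a real sentiment model. The point is only that the analysis ends in a small table of labels and counts, which is the shape a chart or dashboard consumes.

```python
from collections import Counter

# Hypothetical sample data standing in for tweets pulled from Twitter.
TWEETS = [
    "I love the new release, great work",
    "Terrible update, the app keeps crashing",
    "The keynote starts at noon",
    "Great support team, love it",
]

# Toy word lists (assumptions for illustration, not a real sentiment model).
POSITIVE = {"love", "great", "good", "happy"}
NEGATIVE = {"terrible", "crashing", "bad", "hate"}


def classify(text):
    """Label a tweet by comparing its counts of positive and negative words."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def summarize(tweets):
    """Return (label, count) rows, the tabular shape a chart would consume."""
    counts = Counter(classify(t) for t in tweets)
    return [(label, counts[label]) for label in ("positive", "negative", "neutral")]


rows = summarize(TWEETS)
for label, count in rows:
    print(f"{label}: {count}")
```

In a notebook with PixieDust installed, rows like these could be loaded into a DataFrame and passed to PixieDust's `display()` call to chart them interactively. That step is left out here so the sketch runs without any dependencies.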

PixieDust supports a data strategy tuned for cloud and AI by improving developer productivity and quickly turning data into logic that users across a business can understand. It offers value for developers and data scientists alike by tapping into the power of the cloud to understand data, and it helps them work together to quickly identify business opportunities through data visualization. PixieDust derives meaning from numbers and delivers that intelligence clearly to developers, data scientists and business users.

When developers and data scientists work together, the benefits abound. Just imagine what you and your teams could build once you remove the barriers that stand in the way of what’s possible with data.

This article is published as part of the IDG Contributor Network.