Data is the currency of the new millennium. But distilling and simplifying that data to gain real insight requires an increasingly complex skill set and equally sophisticated tools. Gartner researcher Peter Sondergaard places the power of data analysis alongside other notable innovations in history: “Information is the oil of the 21st century, and analytics is the combustion engine.”
If you break down the origins of data, you’ll find that 20 percent of the world’s data is public, while the other 80 percent is proprietary. But, like any powerful tool, data needs a guiding hand, and data scientists have quickly risen to this challenge with developers and data engineers as their collaborators. These groups come together virtually and in person to learn, curate, build and deploy analytic solutions that extract insights from their vast data stores. This new cross-functional collaboration has helped the data unit function as one, elevating the role of the data science team within the larger enterprise.
As organizations become increasingly data-driven and the influence of the data scientist skyrockets, what are the soft skills that determine success or failure for the new A-Team?
- Creativity and imagination: The best data science teams are patient, persistent and focused. They understand how data pipelines function and are able to identify alternate solutions if something goes awry. The members of a data science team also love to learn, and their curiosity helps them come up with unexpected fixes to problems. When the different roles in a data science team come together, the result is a combined knowledge of numerous types of data sets and different programming languages.
- Rigor and discipline: Data science teams manage enormous amounts of data every day. A good understanding of procedures and standards is crucial to staying on top of it all. When each member of the data science team is clear on best practices, the data management process is streamlined, which makes life easier for those who rely on data to do their jobs. With a firm grasp of algorithms, code and how both benefit the infrastructure, data science teams gain outsized influence within their organizations.
- Business acumen: Data science teams are the foundation of any data-driven organization. As such, they need a holistic view of how the business operates and what problems the company is looking to solve. A successful data science team has this information at its fingertips, so it knows how the data will ultimately be used to propel the organization toward its larger goals.
Once the data science team is built with these traits and proper guidelines are in place, the organization must create a technological infrastructure with flexibility at its core. Today’s organizations store data on a combination of public cloud, private cloud and on-premises hardware. Data science teams must be able to manage data consistently no matter where it is stored. In addition, because every industry has its own processes and compliance standards that data science tools must incorporate, the platforms themselves should be easily customizable.
Consider an actual example from IBM, NASA and the SETI Institute. These organizations are working together to analyze more than six terabytes of complex deep space radio signals to hunt for patterns that might identify the presence of intelligent extraterrestrial life. With the proper tools (IBM Analytics on Apache Spark, part of the Data Science Experience), SETI has been able to embark on its Stellar Pair Eavesdropping campaign, which enables the organization to look for potential communications between planets that might be orbiting in double star systems. More than half of all stars are, in fact, part of these types of systems. By extracting new features from millions of observations, researchers are able to use machine-learning techniques to classify signals and sharpen their focus for subsequent deep analysis on clusters of signals that are anomalous.
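To make the idea of "extracting features, then focusing on anomalous signals" concrete, here is a minimal sketch in plain Python. It is not SETI's or IBM's pipeline and does not use Apache Spark. The synthetic data, the two features (mean power and peak amplitude), the function names and the median-based outlier rule are all assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def extract_features(signal):
    """Reduce a raw signal (a list of floats) to (mean power, peak amplitude)."""
    mean_power = sum(x * x for x in signal) / len(signal)
    peak = max(abs(x) for x in signal)
    return (mean_power, peak)


def modified_z_scores(values):
    """Median/MAD-based z-scores, which are less distorted by the outliers themselves."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # All values (or most of them) are identical: nothing stands out.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(feature_rows, threshold=3.5):
    """Return indices of rows where any feature's |modified z-score| exceeds threshold."""
    flagged = set()
    for column in zip(*feature_rows):  # iterate feature by feature
        for i, z in enumerate(modified_z_scores(list(column))):
            if abs(z) > threshold:
                flagged.add(i)
    return sorted(flagged)


# Synthetic stand-in data (an assumption): 50 noise-only signals plus one
# signal that also contains a strong sinusoidal tone.
rng = random.Random(0)
n_samples = 256
signals = [[rng.gauss(0.0, 1.0) for _ in range(n_samples)] for _ in range(50)]
tone = [5.0 * math.sin(2 * math.pi * 8 * t / n_samples) + rng.gauss(0.0, 1.0)
        for t in range(n_samples)]
signals.append(tone)  # the injected signal sits at index 50

features = [extract_features(s) for s in signals]
flagged = flag_outliers(features)
print(flagged)  # index 50 should appear; a few noise-only signals may be flagged too
```

In a real project, the flagged indices would be the short list handed to researchers for deeper analysis. The features and the threshold would come from domain knowledge rather than from the arbitrary choices made here.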
Without high-performing data science professionals and the right collaboration tools, organizations like SETI would not be able to handle their data, let alone realize its full potential. Just as an artist requires different tools for different creations, a data scientist needs a palette of capabilities for the different problems they need to solve. IBM’s data science environment offers advanced analytics, open source technology and an integrated development community, all built to encourage creativity and collaboration.
This article is published as part of the IDG Contributor Network.