Data Science

Use Cython to accelerate array iteration in NumPy

NumPy is known for being fast, but there's always room for improvement. Learn how to use Cython to iterate over NumPy arrays at the speed of C.

IT career roadmap: Data scientist

Reading Freakonomics awakened his passion for data science. Here's how further education and thoughtful career moves led him to become a data scientist.

RStudio changes name to Posit, expands focus to include Python and VS Code

RStudio is updating its name as it aims to expand use of its commercial products among data science teams using both Python and R.

DataStax

3 data quality metrics dataops should prioritize

Data-driven decisions require data that is trustworthy, available, and timely. Upping the dataops game is a worthwhile way to offer business leaders reliable insights.

Why do businesses suck at using data?

Few enterprises can effectively leverage their data inside or outside of the cloud, and a new study says that's still the case. It's time to make a plan.

How to attend RStudio Conference 2022 remotely for free

Keynotes and presentations will be streamed live. Plus, there will be a Discord server for virtual attendees.

What is behavioral analytics and when is it important?

The ability to mine large amounts of data to study how users behave offers far-reaching business benefits and opportunities to reduce risk.

What is TensorFlow? The machine learning library explained

TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning and developing neural networks faster and easier.

As data science goes mainstream, so does its language

Python may be the second choice after R, but its popularity and ease of use position it to dominate data science.

What is a data lake? Massively scalable storage for big data analytics

Dive into data lakes: what they are, how they're used, and how they are both different from and complementary to data warehouses.

Career roadmap: Machine learning scientist

Data scientists and machine learning scientists have similar roles, but a machine learning scientist specializes in researching and implementing complex algorithms.

Review: Databricks Lakehouse Platform

Databricks Lakehouse Platform combines cost-effective data storage with machine learning and data analytics, and it's available on AWS, Azure, and GCP. Could it be an affordable alternative for your data warehouse needs?

Databricks targets data pipeline automation with Delta Live Tables

The company's new ETL framework aims to cut the time data scientists and engineers spend setting up reliable data pipelines and managing infrastructure.

Career roadmap: Machine learning engineer

As organizations worldwide adopt machine learning across virtually every industry, the demand for machine learning engineers is on the rise.

5 ways spreadsheets kill your business

Potentially error-prone, unsecured, and hard to maintain, spreadsheets create data silos and discourage collaboration.

Use synthetic data for continuous testing and machine learning

Where using real data is unethical, or where such data is unavailable or doesn't exist, synthetic data sets can provide the needed quantity and variety.

Google releases differential privacy pipeline for Python

PipelineDP allows datasets containing personal information to be aggregated in a way that preserves the privacy of individuals.

Best practices for developing governable AI

Focus on these engineering best practices to build high-quality models that can be governed effectively.
