Big Data

Big Data | News, how-tos, features, reviews, and videos

Delta Lake gives Apache Spark data sets new powers

A new open source project from Databricks adds ACID transactions, versioning, and schema enforcement to Spark data sources that don't have them

Pub/sub messaging: Apache Kafka vs. Apache Pulsar

Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own

The best machine learning and deep learning libraries

Why TensorFlow, Spark MLlib, Scikit-learn, PyTorch, MXNet, and Keras shine for building and training machine learning and deep learning models

IBM preps Watson AI services to run on Kubernetes

IBM Watson services arrive in versions that can run on the public cloud or on privately hosted container infrastructure

How to use Azure Data Explorer for large-scale data analysis

Microsoft’s tool for querying terabytes of data finally arrives for everyone to use

Tutorial: Spark application architecture and clusters

Learn how Spark components work together and how Spark applications run on standalone and YARN clusters

Why you should use Gandiva for Apache Arrow

An execution engine for Arrow-based in-memory processing, Gandiva brings dramatic performance improvements to analytical workloads

Review: MXNet deep learning shines with Gluon

With the addition of the high-level Gluon API, Apache MXNet rivals TensorFlow and PyTorch for developing deep learning models

The future is cloudy, with a chance of success

True cloud computing metaphors: not every cloud is a rain cloud, and too much rain is disastrous for the unprepared

Real-time data processing with data streaming: new tools for a new era

Real-time data streaming is still early in its adoption, but over the next few years organizations with successful rollouts will gain a competitive advantage

Bossies 2018: The Best of Open Source Software Awards

InfoWorld recognizes the leading open source projects for software development, cloud computing, big data, and machine learning

The best open source software for data storage and analytics

InfoWorld’s 2018 Best of Open Source Software Award winners in databases and data analytics

What is a data lake? Flexible big data management explained

A data lake can be a much more flexible repository than a data warehouse. Or it can be a trash dump that grows and grows

Matei Zaharia, creator of the Apache Spark project, on the big data framework | True Technologist Ep 2

In this episode of True Technologist, host Eric Knorr talks with Matei Zaharia, chief technologist at Databricks and an assistant professor of computer science at Stanford, about the Apache Spark and Apache Mesos projects

Why there are no shortcuts to machine learning

As long as companies understand that good data science takes time in an enterprise, and give their data scientists room to learn and grow, they won’t need shortcuts

Why we lose out if we leave everything to algorithms

If we trust a measurement system wholly to data and algorithms, will it inevitably be gamed by the humans it measures?

How to build stateful streaming applications with Apache Flink

Take advantage of Flink’s DataStream API, ProcessFunctions, and SQL support to build event-driven or streaming analytics applications

Introducing BigQuery ML for building predictive models with SQL

Google’s beta extension performs linear regression forecasting and binary logistic classification in the BigQuery data warehouse
