Big Data

Big Data | News, how-tos, features, reviews, and videos

Machine learning algorithms explained

Machine learning uses algorithms to turn a data set into a model. Which algorithm works best depends on the problem

Delta Lake gives Apache Spark data sets new powers

A new open source project from Databricks adds ACID transactions, versioning, and schema enforcement to Spark data sources that don't have them

Pub/sub messaging: Apache Kafka vs. Apache Pulsar

Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own

Apache Kafka vs. Apache Pulsar: How to choose

Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own

IBM preps Watson AI services to run on Kubernetes

IBM Watson services arrive in versions that can run on the public cloud or on privately hosted container infrastructure

How to use Azure Data Explorer for large-scale data analysis

Microsoft’s tool for querying terabytes of data finally arrives for everyone to use

Tutorial: Spark application architecture and clusters

Learn how Spark components work together and how Spark applications run on standalone and YARN clusters

Why you should use Gandiva for Apache Arrow

An execution engine for Arrow-based in-memory processing, Gandiva brings dramatic performance improvements to analytical workloads

Review: MXNet deep learning shines with Gluon

With the addition of the high-level Gluon API, Apache MXNet rivals TensorFlow and PyTorch for developing deep learning models

Microsoft revamps machine learning tools for Apache Spark

The new open source release integrates Spark with Cognitive Toolkit and other Microsoft machine learning offerings

The future is cloudy, with a chance of success

True cloud computing metaphors: not every cloud is a rain cloud, and too much rain is disastrous for the unprepared

Built for realtime: Big data messaging with Apache Kafka, Part 2

Learn how to use Apache Kafka's partitions, message offsets, and consumer groups to distribute load and scale your applications horizontally, handling up to millions of messages per day

Built for realtime: Big data messaging with Apache Kafka, Part 1

Apache Kafka scales horizontally and offers much higher throughput than some traditional messaging systems. Get started with installation, then build your first Kafka messaging system

Real-time data processing with data streaming: new tools for a new era

Real-time data streaming is still early in its adoption, but over the next few years organizations with successful rollouts will gain a competitive advantage

Bossies 2018: The Best of Open Source Software Awards

InfoWorld recognizes the leading open source projects for software development, cloud computing, big data, and machine learning

The best open source software for data storage and analytics

InfoWorld’s 2018 Best of Open Source Software Award winners in databases and data analytics

What is a data lake? Flexible big data management explained

A data lake can be a much more flexible repository than a data warehouse. Or it can be a trash dump that grows and grows

LinkedIn open-sources a tool to run TensorFlow on Hadoop

The Tony project uses Hadoop's native scheduler to run TensorFlow jobs, making fault tolerance and GPU usage easier
