Big Data

Big Data | News, how-tos, features, reviews, and videos

Tutorial: Spark application architecture and clusters

Learn how Spark components work together and how Spark applications run on standalone and YARN clusters

Why you should use Gandiva for Apache Arrow

An execution engine for Arrow-based in-memory processing, Gandiva brings dramatic performance improvements to analytical workloads

Review: MXNet deep learning shines with Gluon

With the addition of the high-level Gluon API, Apache MXNet rivals TensorFlow and PyTorch for developing deep learning models

The future is cloudy, with a chance of success

True cloud computing metaphors: not every cloud is a rain cloud, and too much rain is disastrous for the unprepared

Built for realtime: Big data messaging with Apache Kafka, Part 2

Learn how to use Apache Kafka's partitions, message offsets, and consumer groups to distribute load and scale your applications horizontally, handling up to millions of messages per day

Built for realtime: Big data messaging with Apache Kafka, Part 1

Apache Kafka scales horizontally and offers much higher throughput than some traditional messaging systems. Get started with installation, then build your first Kafka messaging system

Real-time data processing with data streaming: new tools for a new era

Real-time data streaming is still early in its adoption, but over the next few years organizations with successful rollouts will gain a competitive advantage

Bossies 2018: The Best of Open Source Software Awards

InfoWorld recognizes the leading open source projects for software development, cloud computing, big data, and machine learning

The best open source software for data storage and analytics

InfoWorld’s 2018 Best of Open Source Software Award winners in databases and data analytics

What is a data lake? Flexible big data management explained

A data lake can be a much more flexible repository than a data warehouse. Or it can be a trash dump that grows and grows

Why there are no shortcuts to machine learning

As long as companies understand that good data science takes time in an enterprise, and give their data scientists room to learn and grow, they won’t need shortcuts

Why we lose out if we leave everything to algorithms

If we trust a measurement system wholly to data and algorithms, will it inevitably be gamed by the humans it measures?

How to build stateful streaming applications with Apache Flink

Take advantage of Flink’s DataStream API, ProcessFunctions, and SQL support to build event-driven or streaming analytics applications

Introducing BigQuery ML for building predictive models with SQL

Google’s beta extension performs linear regression forecasting and binary logistic classification in the BigQuery data warehouse

Big data: enabling new approaches to IT infrastructure security

Big data technologies and advanced analytics, including AI, are promising a way to get ahead of cyber threats

3 big data platforms look beyond Hadoop

Learn how the Cloudera, Hortonworks, and MapR data platforms are evolving to meet the demands for real-time analytics and machine learning
