Big Data

Big Data | News, how-tos, features, reviews, and videos

What is deep learning? Algorithms that mimic the human brain

Deep neural networks can solve the most challenging problems, but require abundant computing power and massive amounts of data

4 reasons big data projects fail—and 4 ways to succeed

Nearly all big data projects end up in failure, despite all the mature technology available. Here's how to make big data efforts actually succeed

What is machine learning? Intelligence derived from data

Machine learning algorithms learn from data to solve problems that are too complex to solve with conventional programming

Machine learning algorithms explained

Machine learning uses algorithms to turn a data set into a model. Which algorithm works best depends on the problem

Delta Lake gives Apache Spark data sets new powers

A new open source project from Databricks adds ACID transactions, versioning, and schema enforcement to Spark data sources that don't have them

Pub/sub messaging: Apache Kafka vs. Apache Pulsar

Apache Kafka set the bar for large-scale distributed messaging, but Apache Pulsar has some neat tricks of its own

IBM preps Watson AI services to run on Kubernetes

IBM Watson services arrive in versions that can run on the public cloud or on privately hosted container infrastructure

How to use Azure Data Explorer for large-scale data analysis

Microsoft’s tool for querying terabytes of data finally arrives for everyone to use

Tutorial: Spark application architecture and clusters

Learn how Spark components work together and how Spark applications run on standalone and YARN clusters

Why you should use Gandiva for Apache Arrow

An execution engine for Arrow-based in-memory processing, Gandiva brings dramatic performance improvements to analytical workloads

Review: MXNet deep learning shines with Gluon

With the addition of the high-level Gluon API, Apache MXNet rivals TensorFlow and PyTorch for developing deep learning models

The future is cloudy, with a chance of success

True cloud computing metaphors: not every cloud is a rain cloud, and too much rain is disastrous for the unprepared

Real-time data processing with data streaming: new tools for a new era

Real-time data streaming is still early in its adoption, but over the next few years organizations with successful rollouts will gain a competitive advantage

Bossies 2018: The Best of Open Source Software Awards

InfoWorld recognizes the leading open source projects for software development, cloud computing, big data, and machine learning

The best open source software for data storage and analytics

InfoWorld’s 2018 Best of Open Source Software Award winners in databases and data analytics

What is a data lake? Flexible big data management explained

A data lake can be a much more flexible repository than a data warehouse. Or it can be a trash dump that grows and grows

Why there are no shortcuts to machine learning

As long as companies understand that good data science takes time in an enterprise, and give their data scientists room to learn and grow, they won’t need shortcuts

Why we lose out if we leave everything to algorithms

If we trust a measurement system wholly to data and algorithms, will it inevitably be gamed by the humans it measures?

How to build stateful streaming applications with Apache Flink

Take advantage of Flink’s DataStream API, ProcessFunctions, and SQL support to build event-driven or streaming analytics applications

Introducing BigQuery ML for building predictive models with SQL

Google’s beta extension performs linear regression forecasting and binary logistic classification in the BigQuery data warehouse
