Big Data
Big Data | News, how-tos, features, reviews, and videos
What is a data lake? Massively scalable storage for big data analytics
Dive into data lakes: what they are, how they're used, and how they both differ from and complement data warehouses.
Where AI has made real progress
Better data infrastructure has provided a big boost to AI’s growth, but some things still require a human.
Working with Azure Managed Instance for Cassandra
Use open-source tools to build big data systems that bridge on-premises and cloud environments.
How to use R with BigQuery
See how to use R to query data in Google BigQuery with the bigrquery and dplyr R packages.
How the cloud and big compute are remaking HPC
High-performance computing projects require massive quantities of compute resources. Pairing simulation and specialized hardware with the cloud powers the breakthroughs of the future.
Why developers use Confluent to manage Apache Kafka
How the fully managed Kafka service can bring peace and simplicity to the lives of those who depend on event streaming infrastructure.
Google’s Logica language addresses SQL’s flaws
Open source logic programming language compiles to SQL and runs on Google BigQuery, with experimental support for PostgreSQL and SQLite.
Ahana Cloud for Presto review: Fast SQL queries against data lakes
Ahana Cloud for Presto turns a data lake on Amazon S3 into what is effectively a data warehouse, without moving any data. SQL queries run quickly even when joining multiple heterogeneous data sources.
Solving query optimization in Presto
By combining machine learning and adaptive query execution, query optimization in Presto could become smarter and more efficient over repeated use.
Microsoft brings .NET dev to Apache Spark
.NET for Apache Spark 1.0 provides high-performance .NET APIs to Apache Spark, including Spark SQL, Spark Streaming, and MLlib.
Azure Databricks previews parallelized Photon query engine
Microsoft and Databricks say the vectorized query engine, written in C++, accelerates Apache Spark workloads by up to 20x.
Why you should use Presto for ad hoc analytics
A federated SQL query execution engine created at Facebook, Presto brings interactive querying to all of your data, no matter where it resides.
15 hot tech skills getting hotter -- no certification required
Employers are more likely to pay cash premiums for noncertified tech skills than for certifications. Here are a few of the skills they covet most, now and going forward.
Deep Dive
Machine learning megaguide: Amazon, Microsoft, Databricks, Google, HPE, IBM
Download InfoWorld's massive roundup of Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits.
Deep Dive
Public cloud megaguide: Amazon, Microsoft, Google, IBM, and Joyent compared
The top five public clouds pile on the services and options, while adding unique twists.
Deep Dive
Quick guide: Learn to crunch big data with R
Get started using the open source R programming language to do statistical computing and graphics on large data sets.
Deep Dive
Build an IoT analytics solution with big data tools
The Internet of Things seems futuristic, but real systems are delivering real analytics value today. Here's some real-world IoT advice from the field.
Deep Dive
Download the Hadoop Deep Dive
Businesses are using Hadoop across low-cost hardware clusters to find meaningful patterns in unstructured data. In this in-depth PDF, InfoWorld explains how Hadoop works and how you can reap its benefits.