Big Data
Big Data | News, how-tos, features, reviews, and videos
AWS Glue upgrades Spark engines, backs Ray framework
Serverless data integration service in the Amazon cloud also adds support for built-in Pandas APIs and the Apache Hudi, Apache Iceberg, and Delta Lake formats.
Starburst Galaxy gets data discoverability updates
At AWS re:Invent 2022, the company also announced support for AWS Lake Formation via the Starburst Enterprise suite to help joint customers implement a data mesh architecture.
When is enough data enough?
Maybe we don’t need more data; we just need people who understand the data we already have and its value in a business context.
Dremio Cloud review: A fast and flexible data lakehouse on AWS
Dremio Cloud leaps big data in a single bound with a fast SQL engine and optimizations that can accelerate queries dramatically. Plus, it lets you use other engines on the same data.
Why Apache Iceberg will rule data in the cloud
Apache Iceberg is an open table format that offers scalability, usability, and performance advantages for very large data sets. Here are five reasons Iceberg is optimal for cloud data workloads.
Databricks adds data governance, marketplace features
The data marketplace and other features are expected to accelerate data engineering tasks with an option for data monetization down the road, Databricks said.
Databricks open sources its Delta Lake data lakehouse
Databricks is open sourcing Delta Lake to counter criticism from rivals and take on Apache Iceberg as well as data warehouse products from Snowflake, Starburst, Dremio, Google Cloud, AWS, Oracle, and HPE.
12 programming tricks to cut your cloud bill
Cutting cloud costs is a team effort, and that includes developers. Here are 12 tricks for developing software that is cheaper to run in the cloud.
What is TensorFlow? The machine learning library explained
TensorFlow is a Python-friendly open source library for numerical computation that makes machine learning and developing neural networks faster and easier.
What is a data lake? Massively scalable storage for big data analytics
Dive into data lakes: what they are, how they're used, and how they both differ from and complement data warehouses.
Where AI has made real progress
Better data infrastructure has provided a big boost to AI’s growth, but some things still require a human.
Working with Azure Managed Instance for Cassandra
Use open-source tools to build big data systems that bridge on-premises and cloud environments.
How to use R with BigQuery
See how to use R to query data in Google BigQuery with the bigrquery and dplyr R packages.
How the cloud and big compute are remaking HPC
High-performance computing projects require massive quantities of compute resources. Pairing simulation and specialized hardware with the cloud powers the breakthroughs of the future.
Why developers use Confluent to manage Apache Kafka
How the fully managed Kafka service can bring peace and simplicity to the lives of those who depend on event streaming infrastructure.
Google’s Logica language addresses SQL’s flaws
Open source logic programming language compiles to SQL and runs on Google BigQuery, with experimental support for PostgreSQL and SQLite.
Ahana Cloud for Presto review: Fast SQL queries against data lakes
Ahana Cloud for Presto turns a data lake on Amazon S3 into what is effectively a data warehouse, without moving any data. SQL queries run quickly even when joining multiple heterogeneous data sources.
Deep Dive
Machine learning megaguide: Amazon, Microsoft, Databricks, Google, HPE, IBM
Download InfoWorld's massive roundup of Amazon, Microsoft, Databricks, Google, HPE, and IBM machine learning toolkits.