Big Data

Big Data | News, how-tos, features, reviews, and videos

Ahana Cloud for Presto review: Fast SQL queries against data lakes

Ahana Cloud for Presto turns a data lake on Amazon S3 into what is effectively a data warehouse, without moving any data. SQL queries run quickly even when joining multiple heterogeneous data sources.

Solving query optimization in Presto

By combining machine learning and adaptive query execution, query optimization in Presto could become smarter and more efficient over repeated use.

Microsoft brings .NET dev to Apache Spark

.NET for Apache Spark 1.0 provides high-performance .NET APIs to Apache Spark including Spark SQL, Spark Streaming, and MLlib

Azure Databricks previews parallelized Photon query engine

Microsoft and Databricks say the vectorized query engine written in C++ accelerates Apache Spark workloads by up to 20x

Why you should use Presto for ad hoc analytics

A federated SQL query execution engine created at Facebook, Presto brings interactive querying to all of your data — no matter where it resides

15 hot tech skills getting hotter -- no certification required

Employers are more apt to pay cash premiums for noncertified tech skills than for certifications. Here are a few they’re coveting the most, now and going forward.

Rakuten frees itself of Hadoop investment in two years

The U.S. arm of the Japanese e-commerce giant has moved away from Hadoop in a bid to cut hardware costs and ease the management of its estate

Julia vs. Python: Which is best for data science?

Python has turned into a data science and machine learning mainstay, while Julia was built from the ground up to do the job

Apache Spark 3.0 adds Nvidia GPU support for machine learning

The next major release of the in-memory data processing framework will support GPU-accelerated functions courtesy of Nvidia RAPIDS

Cython tutorial: How to speed up Python

How to use Cython and its Python-to-C compiler to give your Python applications a rocket boost

Is your data lake open enough? What to watch out for

Like yesterday’s data warehouses, today’s data lakes threaten to lock us into proprietary formats and systems that restrict innovation and raise costs

Data structures and algorithms in Java: A beginner's guide

Learn all about array and list data structures in Java, and the algorithms you can use to search and sort the data they contain

What is Apache Spark? The big data platform that crushed Hadoop

Fast, flexible, and developer-friendly, Apache Spark is the leading platform for large-scale SQL, batch processing, stream processing, and machine learning

Neo4j 4.0 targets scalability, security, and performance

Leading native graph database adds long-awaited horizontal sharding, granular security, and reactive processing

Why data-driven businesses need a data catalog

Enterprises need better tools to learn and collaborate around data sources. Data catalogs with pioneering machine learning capabilities can help you tap your valuable data

Qubole review: Self-service big data analytics

Cloud-native data platform puts Spark, Presto, Hive, and Airflow at your fingertips, while controlling your cloud spending

Who should be responsible for your data? The knowledge scientist

Organizations that recognize the importance of clean and reliable data while elevating knowledge work will move faster along the path to true data-driven decision-making

Will data gravity favor the cloud or the edge?

An industry standard confidential computing framework could unlock secure data processing at both the center and the edge
