Tutorial: Spark application architecture and clusters

Learn how Spark components work together and how Spark applications run on standalone and YARN clusters


Before you begin your journey as an Apache Spark programmer, you should have a solid understanding of the Spark application architecture and how applications are executed on a Spark cluster. This article closely examines the components of a Spark application, how those components work together, and how Spark applications run on standalone and YARN clusters.

Anatomy of a Spark application

A Spark application contains several components, all of which exist whether you’re running Spark on a single machine or across a cluster of hundreds or thousands of nodes.
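To make this concrete, here is a minimal PySpark sketch showing that point: the application code is identical regardless of where it runs, and only the master URL changes. The app name and the cluster host are hypothetical, and this assumes PySpark is installed.

from pyspark.sql import SparkSession

# The master URL selects where the application runs; the
# application code itself does not change.
#   "local[*]"              -> single machine, all cores
#   "spark://host:7077"     -> a standalone cluster (hypothetical host)
#   "yarn"                  -> a YARN cluster
spark = (SparkSession.builder
         .appName("architecture-demo")   # hypothetical app name
         .master("local[*]")
         .getOrCreate())

# A trivial computation; on a cluster, this work would be split
# into tasks and distributed across executors.
df = spark.range(1_000_000)
print(df.selectExpr("sum(id)").collect())

spark.stop()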

Each component has a specific role in executing a Spark program. Some of these roles, such as the client components, are passive during execution; others, including the components that execute computation functions, take an active part in running the program.

The components of a Spark application are the driver, the master, the cluster manager, and the executors.

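As a rough illustration of the division of labor, consider the following sketch (a minimal PySpark program with a hypothetical app name). The comments mark which parts are handled by the driver and which by the executors.

from pyspark.sql import SparkSession

# Driver: the process running this script. It creates the
# SparkSession, builds the execution plan, and requests
# resources from the cluster manager (for example, the
# standalone master or YARN's ResourceManager).
spark = SparkSession.builder.appName("components-demo").getOrCreate()
sc = spark.sparkContext

# Transformations are only recorded by the driver; no work
# happens on the executors yet.
rdd = sc.parallelize(range(100)).map(lambda x: x * x)

# An action triggers execution: the driver schedules tasks,
# and the executors run them and return results.
print(rdd.sum())

spark.stop()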