Apache Spark on YARN – Resource Planning

Overview

This is the second article in a four-part series about Apache Spark on YARN. Because Apache Spark is an in-memory distributed data processing engine, application performance depends heavily on the resources allocated to it, such as executors, cores, and memory. The resources required depend on application characteristics such as storage and computation.

A few performance bottlenecks were identified in the SFO Fire Department call service dataset use case with the YARN cluster manager. One of them was improper usage of resources in the YARN cluster and execution of the application with the default Spark configuration.

To understand the use case and the performance bottlenecks identified, refer to our previous blog, Apache Spark on YARN – Performance and Bottlenecks. In this blog, let us discuss resource planning for the same use case and how to improve the performance of the Spark application used in it.

Our other articles of the four-part series are:

Spark Resource Planning Principles

The general principles to follow when deciding resource allocation for a Spark application are as follows:

  • The most granular level of resource allocation (the smallest possible executors) reduces application performance, because it forfeits the ability to run multiple tasks inside a single executor. Multiple tasks within an executor share cached data, which speeds up computation.
  • The least granular level (the biggest possible executors) also hurts application performance, due to over-use of resources and failure to account for the memory overhead of the OS and other daemons.
  • Balanced resources (executors, cores, and memory), with memory overhead accounted for, improve the performance of the Spark application, especially when running on YARN.
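As a concrete illustration, a balanced allocation is typically expressed through spark-submit flags such as the following. The values shown are placeholders to adjust for your own cluster, and the application name is hypothetical:

```shell
# Balanced allocation: a few right-sized executors rather than
# many tiny ones or one giant one. All values are illustrative.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1g \
  --num-executors 2 \
  --executor-cores 2 \
  --executor-memory 2g \
  your_application.py
```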

Understanding Use Case Performance

The Spark application is executed in YARN cluster mode. The resource allocation for the use case Spark application is shown in the table below:

[Table: Resource allocation for the use case Spark application]

The observations from the Spark UI are as follows:

  • The high-level API implementation of the application completed and produced results in 1.8 and 1.9 minutes.
  • The low-level RDD API implementation completed in 22 minutes; even with Kryo serialization, it took 21 minutes.

[Figure: Fire Service Call output]

Let us understand the YARN resources before performing Spark application resource tuning.

Understanding YARN Resources

A cluster is set up, and the YARN resource availability from the YARN configuration is shown in the table below:

[Table: YARN resource availability per node]

The maximum memory and vcores available per node are 8 GB and 3 cores. In total, across the two nodes, we have 16 GB and 6 cores, as shown in the diagram below:

[Figure: YARN cluster metrics]

If the underlying instances have more memory and cores, the above configuration can be increased. Let us stick with this YARN configuration and tune the resources within it. If the resources allocated to the Spark application exceed these limits, the application is terminated with error messages such as the following:

[Figure: Executor memory exceeds cluster memory error]

[Figure: Executor core exceeds cluster vcore error]
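These errors can be anticipated with a simple pre-submission check of an executor request against the per-node YARN maxima. Below is a minimal sketch; the function and its names are our own illustration, not part of Spark or YARN:

```python
def fits_in_yarn(executor_memory_gb, executor_cores,
                 max_alloc_gb=8, max_alloc_vcores=3):
    """Return True if a single executor request fits within the
    per-node YARN maximum allocation (8 GB / 3 vcores here)."""
    return (executor_memory_gb <= max_alloc_gb
            and executor_cores <= max_alloc_vcores)

print(fits_in_yarn(7, 2))  # fits within 8 GB / 3 vcores: True
print(fits_in_yarn(9, 2))  # exceeds the 8 GB memory cap: False
```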

We hope you now understand the use case performance and the resources available in YARN.

Spark on YARN – Resource Planning

Let us work out reasonable resources for executing the Spark application on YARN.

Memory available per node: 8 GB
Cores available per node: 3

 

To find the number of executors, cores, and memory that works for our use case with a notable performance improvement, perform the following steps:

Step 1: Allocate 1 GB of memory and 1 core per node for the driver. The driver can be launched on any one of the nodes at run time. If the output of an action returns a large amount of data (for example, more than 1 GB), the driver memory must be adjusted upward.

Memory available per node: 7 GB
Cores available per node: 2

 

Step 2: Reserve 1 GB of memory and 1 core per instance for OS and Hadoop daemon overhead. Let us look at the instance used to launch the cluster:

Instance details: m4.xlarge (4 cores, 16 GB RAM)

Of the instance's 4 cores and 16 GB RAM, 1 core and 8 GB RAM are left outside YARN, which is configured with 8 GB RAM and 3 cores per node. These freed-up resources cover the OS and Hadoop daemon overhead, so the memory and cores available per node remain unchanged after Step 2.

Step 3: Find the number of cores per executor. As 2 cores per node are available, set the number of cores per executor to 2.

Note: If you have more cores per instance (for example, 16 − 1 for overhead = 15), stick with 5 cores per executor when running on YARN with HDFS, as this preserves high HDFS throughput.
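The rule of thumb in the note can be written as a tiny helper. This is our own illustration of the heuristic, not part of any Spark tooling:

```python
def cores_per_executor(usable_cores_per_node, hdfs_cap=5):
    """Cap cores per executor at 5 when running on YARN with HDFS
    (to preserve HDFS throughput); with fewer usable cores per
    node, just use what the node offers."""
    return min(usable_cores_per_node, hdfs_cap)

print(cores_per_executor(2))   # our cluster: 2 cores per executor
print(cores_per_executor(15))  # larger instance: capped at 5
```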

Step 4: Find the number of executors and the memory per executor.

Number of cores per executor: 2

Total cores = Number of nodes × Cores per node (after overhead) = 2 × 2 = 4

Total executors = Total cores / Cores per executor = 4 / 2 = 2

Number of executors per node = Total executors / Number of nodes = 2 / 2 = 1, so each node runs one executor.

Memory per executor = Memory per node / Executors per node = 7 GB / 1 = 7 GB (this must be adjusted for the application payload)

 

This calculation works well for our use case except for the memory per executor: the input dataset is only 1.5 GB, and using 7 GB per executor to process 1.5 GB over-allocates memory.

Executor memory was tested starting at 2 GB and increased up to 7 GB per executor. We settled on 2 GB per executor, as increasing executor memory from 2 GB to 7 GB yielded no additional performance improvement.
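The four steps above can be collected into one small calculation. The function and its names are our own sketch of the article's arithmetic, assuming the OS/daemon overhead is already kept outside the YARN allocation as in Step 2:

```python
def plan_resources(nodes, mem_per_node_gb, cores_per_node,
                   driver_mem_gb=1, driver_cores=1,
                   cores_per_executor=2):
    """Steps 1-4: reserve driver resources per node, then derive
    the executor count and per-executor memory."""
    usable_mem = mem_per_node_gb - driver_mem_gb        # Step 1
    usable_cores = cores_per_node - driver_cores
    total_cores = nodes * usable_cores                  # Step 4
    total_executors = total_cores // cores_per_executor
    executors_per_node = total_executors // nodes
    mem_per_executor_gb = usable_mem // executors_per_node
    return total_executors, cores_per_executor, mem_per_executor_gb

# Two nodes with 8 GB and 3 cores each, as in this use case:
print(plan_resources(2, 8, 3))  # (2, 2, 7)
```

The 7 GB result is the upper bound from the arithmetic; as noted above, 2 GB per executor proved sufficient for this payload.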

The resource allocation derived from the above steps for the use case Spark application is shown in the table below:

[Table: Tuned resource allocation for the use case Spark application]

Note: Different organizations have different workloads, and the above steps may not work well for every case; they should, however, give you an idea of how to calculate executors, cores, and memory.

Running Spark on YARN with Tuned Resource

DataFrame Implementation of Spark Application

The DataFrame implementation of the Spark application is executed at the most granular, least granular, and balanced (as calculated above) resource levels.

[Figure: Fire Service Call Analysis DataFrame Test2 executor stats]

[Figure: Fire Service Call Analysis DataFrame Test1 executor stats]

[Figure: Fire Service Call Analysis DataFrame Test executor stats]

The balanced resource allocation provides a notable performance improvement, from 1.8 minutes to 1.3 minutes.

[Figure: FireServiceCallAnalysisSPTuneOutput]

RDD Implementation of Spark Application

The RDD implementation of the Spark application is executed at the most granular, least granular, and balanced (as calculated above) resource levels.

The RDD implementation with balanced resource allocation is 2 times faster than execution with the default Spark configuration: the default configuration produced results in 22 minutes, whereas after resource tuning the results are produced in 11 minutes.

[Figure: FireServiceCallAnalysisRDDSPTuneOutput]

Spark Applications with Default Configuration

[Figure: Fire Service Call output]

Spark Application After Resource Tuning

[Figure: FireServiceCallAnalysisRDDSPTuneOutput]

[Figure: FireServiceCallAnalysisSPTuneOutput]

Conclusion

In this blog, we discussed Spark resource planning principles and reviewed the use case performance and the YARN resource configuration before tuning resources for the Spark application.

We followed a series of steps to calculate the resources (executors, cores, and memory) for the Spark application. The results are as follows:

  • A significant performance improvement in the DataFrame implementation of the Spark application, from 1.8 minutes to 1.3 minutes.
  • The RDD implementation of the Spark application is 2 times faster, from 22 minutes to 11 minutes.

We covered one of the bottlenecks discussed in our previous blog, Apache Spark on YARN – Performance and Bottlenecks. In the following blog posts, we will cover the other two bottlenecks and the performance tuning that eliminates them.

After performance tuning and fixing of the bottlenecks, the final durations to complete the application with both the high-level and low-level APIs are:

[Figure: Straggler fix output]
