Hadoop is one of the best-known and most widely used open source distributed frameworks for large-scale data processing. It is based on five main building blocks: the MapReduce framework, the YARN infrastructure, storage, HDFS Federation and the cluster. This is a guide on how to use YARN in Hadoop.
YARN stands for Yet Another Resource Negotiator. It provides a central platform for resource management and for delivering consistent operations, security and data governance tools across a Hadoop cluster. It is also cost-effective: it is the framework that provides the computational resources required for executing applications. When using YARN on Hadoop, there are two elements that you must know about.
- The Resource Manager
The resource manager is the master: it has the main authority and keeps track of the cluster's resources. It runs many services, the most essential of which is the resource scheduler, the program that decides how resources are assigned.
- The Node Manager
The node manager communicates with the resource manager and helps it set up containers for executing tasks. Every node in the cluster runs a node manager, which is responsible for announcing itself to the resource manager. Every node manager contributes some resources to the cluster, and at run time the resource scheduler determines how that capacity is used.
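The interplay between the two elements can be sketched as a small simulation. This is only a toy model, not the Hadoop API: the class names, the `register` and `allocate` methods, the memory figures and the "most free memory first" placement rule are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Toy stand-in for a YARN node manager (not the real Hadoop API)."""
    name: str
    memory_mb: int                      # capacity this node offers to the cluster
    used_mb: int = 0
    containers: list = field(default_factory=list)

    @property
    def free_mb(self) -> int:
        return self.memory_mb - self.used_mb


class ResourceManager:
    """Toy master that tracks cluster resources and schedules containers."""

    def __init__(self):
        self.nodes = {}

    def register(self, node: NodeManager) -> None:
        # A node manager announces itself and its capacity to the master.
        self.nodes[node.name] = node

    def total_free_mb(self) -> int:
        return sum(n.free_mb for n in self.nodes.values())

    def allocate(self, app_id: str, memory_mb: int):
        # Hypothetical policy: pick the node with the most free memory.
        candidates = [n for n in self.nodes.values() if n.free_mb >= memory_mb]
        if not candidates:
            return None  # the request cannot be satisfied right now
        node = max(candidates, key=lambda n: n.free_mb)
        node.used_mb += memory_mb
        node.containers.append((app_id, memory_mb))
        return node.name


rm = ResourceManager()
rm.register(NodeManager("node-1", memory_mb=8192))
rm.register(NodeManager("node-2", memory_mb=4096))

first = rm.allocate("app-1", 6144)   # fits only on node-1
second = rm.allocate("app-2", 3072)  # node-1 has 2048 MB left, so node-2 is used
third = rm.allocate("app-3", 4096)   # no node has 4096 MB free any more
print(first, second, third, rm.total_free_mb())
```

In a real cluster the scheduler also weighs queues, CPU cores and data locality; the sketch keeps only the core idea that nodes offer capacity and the master decides where each container goes.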
Things you must know for using YARN
- YARN is compatible with MapReduce applications which were developed for Hadoop.
- The resource manager of YARN focuses mainly on scheduling, and it keeps managing clusters as they continue to expand to more nodes.
- YARN extends the power of Hadoop to both the new and the incumbent technologies found within the data centre, so that they can take advantage of its linear-scale, cost-effective processing and storage. It also offers developers a consistent framework for writing data access applications.
- YARN consists of three main components:
  - The Job Submitter, which works as the client
  - The Resource Manager, which works as the master
  - The Node Manager, which works as the slave
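The division of labour between these three roles can be illustrated with a minimal sketch. It is a simplified model under stated assumptions, not the real YARN client protocol: the method names `submit`, `submit_application` and `launch_container`, the `app-N` identifiers and the "least loaded worker" rule are all hypothetical.

```python
class NodeManager:
    """Worker: launches containers when the master asks it to."""

    def __init__(self, name):
        self.name = name
        self.running = []

    def launch_container(self, app_id, job_name):
        self.running.append(app_id)
        return f"{self.name} runs {job_name} as {app_id}"


class ResourceManager:
    """Master: accepts applications and hands them to a worker."""

    def __init__(self, workers):
        self.workers = workers
        self.next_id = 1

    def submit_application(self, job_name):
        app_id = f"app-{self.next_id}"  # made-up identifier format
        self.next_id += 1
        # Hypothetical placement rule: choose the least loaded worker.
        worker = min(self.workers, key=lambda w: len(w.running))
        return worker.launch_container(app_id, job_name)


class JobSubmitter:
    """Client: talks only to the master, never directly to a worker."""

    def __init__(self, resource_manager):
        self.resource_manager = resource_manager

    def submit(self, job_name):
        return self.resource_manager.submit_application(job_name)


workers = [NodeManager("nm-a"), NodeManager("nm-b")]
client = JobSubmitter(ResourceManager(workers))

r1 = client.submit("wordcount")
r2 = client.submit("sort")
print(r1)
print(r2)
```

The point of the sketch is the direction of the calls: the client only ever contacts the master, and the master decides which worker does the work.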
YARN allows many open source and proprietary access engines to use Hadoop as a common platform, so that interactive, batch and real-time engines can access the same data set simultaneously. YARN also supports the Reservation System, a resource reservation component that enables users to specify a particular profile of resources, reserve them in advance and ensure that their jobs execute on time.
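The idea behind reserving resources ahead of time can be shown with a toy admission check. This sketch is an assumption-laden illustration rather than the actual Reservation System interface: the `ReservationPlan` class, its discrete time slots and the memory-only capacity model are invented for the example.

```python
class ReservationPlan:
    """Toy plan that tracks how much memory is reserved in each time slot."""

    def __init__(self, capacity_mb, horizon):
        self.capacity_mb = capacity_mb
        self.reserved = [0] * horizon  # MB already reserved per slot

    def reserve(self, start, end, memory_mb):
        """Reserve memory_mb for slots start..end-1; return True if accepted."""
        if not (0 <= start < end <= len(self.reserved)):
            return False  # window falls outside the planning horizon
        window = range(start, end)
        # Reject the request if any slot in the window would be over capacity.
        if any(self.reserved[t] + memory_mb > self.capacity_mb for t in window):
            return False
        for t in window:
            self.reserved[t] += memory_mb
        return True


plan = ReservationPlan(capacity_mb=8192, horizon=6)
a = plan.reserve(0, 3, 6144)  # accepted: the plan is empty
b = plan.reserve(2, 5, 4096)  # rejected: slot 2 would need 10240 MB
c = plan.reserve(3, 5, 4096)  # accepted: slots 3 and 4 are still free
print(a, b, c, plan.reserved)
```

Accepting a reservation only when every slot in its window has room is what lets a job that holds a reservation count on its resources being available when its time comes.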