How-to: Deploy Apache Hadoop Clusters Like a Boss

The HDFS docs have some information, and logically it makes sense to separate the network of the Hadoop nodes from a “management” network. However, in our experience, multi-homed networks can be tricky to configure and support. The pain stems from Hadoop integrating with a large ecosystem of components that all have their own network and port-binding settings.
The traditional recommendation for worker nodes was to set swappiness (vm.swappiness) to 0. However, the way the kernel treats a value of 0 changed in newer kernels, and we now recommend setting it to 1. (This post has more details.)
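
As a rough illustration of the swappiness recommendation, the sketch below rewrites a sysctl-style config file so that it holds exactly one vm.swappiness line. The function name set_swappiness and the file path in the usage comment are assumptions made for this example, not part of any Cloudera tooling.

```shell
# set_swappiness FILE VALUE
# Hypothetical helper: make FILE contain exactly one "vm.swappiness = VALUE"
# line, leaving every other setting in the file untouched.
set_swappiness() {
  conf_file="$1"
  value="$2"
  touch "$conf_file"
  # Copy everything except existing vm.swappiness lines. grep -v exits
  # non-zero when nothing is left to print, hence the "|| true".
  grep -v '^[[:space:]]*vm\.swappiness[[:space:]]*=' "$conf_file" > "$conf_file.tmp" || true
  printf 'vm.swappiness = %s\n' "$value" >> "$conf_file.tmp"
  mv "$conf_file.tmp" "$conf_file"
}

# Example usage as root (the file name is an assumption):
#   set_swappiness /etc/sysctl.d/99-hadoop.conf 1
#   sysctl -p /etc/sysctl.d/99-hadoop.conf   # load the new value
#   cat /proc/sys/vm/swappiness              # check what the kernel reports
```

Writing the value to a file under /etc/sysctl.d (or to /etc/sysctl.conf) is what makes it survive a reboot; running sysctl vm.swappiness=1 alone only changes the running kernel.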

Cloudera Manager Like A Boss

We highly recommend using Cloudera Manager to manage your Hadoop cluster. Cloudera Manager offers many valuable features that make life much easier. The Cloudera Manager documentation is pretty clear on this, but to stamp out any ambiguity, below are the high-level steps for a production-ready Hadoop deployment with Cloudera Manager.

 

Set up an external database and pre-create the schemas needed for your deployment.

 

create database amon DEFAULT CHARACTER SET utf8;
grant all on amon.* TO 'amon'@'%' IDENTIFIED BY 'amon_password';
create database rman DEFAULT CHARACTER SET utf8;
grant all on rman.* TO 'rman'@'%' IDENTIFIED BY 'rman_password';
create database metastore DEFAULT CHARACTER SET utf8;
grant all on metastore.* TO 'metastore'@'%' IDENTIFIED BY 'metastore_password';
create database nav DEFAULT CHARACTER SET utf8;
grant all on nav.* TO 'nav'@'%' IDENTIFIED BY 'nav_password';
create database sentry DEFAULT CHARACTER SET utf8;
grant all on sentry.* TO 'sentry'@'%' IDENTIFIED BY 'sentry_password';

(Please change the passwords in the examples above!)

 

Install the cloudera-manager-server and cloudera-manager-daemons packages per documentation.

 

yum install cloudera-manager-server cloudera-manager-daemons


Run the scm_prepare_database.sh script specific to your database type.

 

/usr/share/cmf/schema/scm_prepare_database.sh mysql -h cm-db-host.cloudera.com -utemp -ptemp --scm-host cm-db-host.cloudera.com scm scm scm

Start the Cloudera Manager Service and follow the wizard from that point forward.

This is the simplest way to install Cloudera Manager and will get you started with a production-ready deployment in under 20 minutes.
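
The steps above can be sketched as a small script. This is only an outline under stated assumptions: the function names (schema_sql, run, deploy_cm) and the DRY_RUN, DB_HOST, and CM_HOST variables are invented for illustration, the hostnames and placeholder passwords are the ones from the examples above, and the service name cloudera-scm-server is assumed to match what your packages install. By default it only prints the commands so they can be reviewed first.

```shell
#!/bin/sh
# Sketch of the deployment steps. With DRY_RUN=1 (the default here)
# commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"
DB_HOST="${DB_HOST:-cm-db-host.cloudera.com}"   # database host from the example
CM_HOST="${CM_HOST:-cm-db-host.cloudera.com}"   # host running Cloudera Manager

# Step 1: print CREATE/GRANT statements for each schema named in the post.
# The "<name>_password" values are placeholders and must be changed.
schema_sql() {
  for db in amon rman metastore nav sentry; do
    printf "create database %s DEFAULT CHARACTER SET utf8;\n" "$db"
    printf "grant all on %s.* TO '%s'@'%%' IDENTIFIED BY '%s_password';\n" \
      "$db" "$db" "$db"
  done
}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Steps 2-4: install the packages, prepare the scm database, start the server.
# The && chain stops at the first command that fails.
deploy_cm() {
  run yum install -y cloudera-manager-server cloudera-manager-daemons &&
  run /usr/share/cmf/schema/scm_prepare_database.sh mysql \
    -h "$DB_HOST" -utemp -ptemp --scm-host "$CM_HOST" scm scm scm &&
  run service cloudera-scm-server start
}

# Example usage:
#   schema_sql | mysql -u root -p     # step 1, after editing the passwords
#   deploy_cm                         # review the printed commands
#   DRY_RUN=0 deploy_cm               # then run them for real, as root
```

Keeping the schema statements in one function makes it easy to add or drop a database if your deployment uses a different set of services.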
Many customers purchase new hardware in regular cycles; adding new generations of computing resources makes sense as data volumes and workloads increase. For environments with heterogeneous disk, memory, or CPU configurations, Cloudera Manager provides Role Groups, which let the administrator specify memory, YARN container, and cgroup settings per node or per group of nodes.
Previously, we published some recommendations on selecting new hardware for Apache Hadoop deployments. That post covered some important ideas regarding cluster planning and deployment such as workload profiling and general recommendations for CPU, disk, and memory allocations.
