Memory Optimization of your Hadoop Cluster

If you want to get better performance out of your Hadoop cluster, you might want to look into optimizing the memory settings of your NodeManagers. If you installed your cluster using Ambari and didn't change the memory settings during setup, you might underutilize the memory available on your machines. Therefore Hortonworks provides a nifty little tool for calculating the memory settings based on their best practices.

You can find the corresponding python script on github.

To calculate the recommended settings, just run:

git clone  
./2.1/ -c 16 -m 64 -d 4 -k True

Adjust the parameters to fit the sizing of your machines:

-c = cores (number of cores)  
-m = memory (amount of RAM in GB)  
-d = disks (number of disks)  
-k = HBase enabled (True/False)
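The heuristic behind the script can be sketched roughly like this. This is a simplified illustration, not the actual Hortonworks code; the reserved-memory lookup tables and thresholds below are assumptions approximating the published best-practice values:

```python
import math

# ASSUMPTION: approximate best-practice tables (total RAM in GB -> GB reserved).
RESERVED_OS = {4: 1, 8: 2, 16: 2, 24: 4, 48: 6, 64: 8, 72: 8, 96: 12, 128: 24}
RESERVED_HBASE = {4: 1, 8: 1, 16: 2, 24: 4, 48: 8, 64: 8, 72: 8, 96: 16, 128: 24}

def min_container_size_mb(memory_gb):
    # Smaller machines get smaller minimum container sizes.
    if memory_gb <= 4:
        return 256
    if memory_gb <= 8:
        return 512
    if memory_gb <= 24:
        return 1024
    return 2048

def recommend(cores, memory_gb, disks, hbase=False):
    # Reserve memory for the OS (and HBase, if enabled); the rest goes to YARN.
    reserved_gb = RESERVED_OS.get(memory_gb, 8)
    if hbase:
        reserved_gb += RESERVED_HBASE.get(memory_gb, 8)
    usable_mb = (memory_gb - reserved_gb) * 1024

    # Number of containers is bounded by cores, disks, and usable memory.
    min_container = min_container_size_mb(memory_gb)
    containers = int(min(2 * cores,
                         math.ceil(1.8 * disks),
                         usable_mb // min_container))

    # Split the usable memory evenly across the containers.
    container_ram_mb = max(min_container, usable_mb // containers)
    return reserved_gb, containers, container_ram_mb

reserved, containers, ram = recommend(16, 64, 4, hbase=True)
print(reserved, containers, ram)  # -> 16 8 6144
```

With 16 cores, 64 GB of RAM, 4 disks and HBase enabled, this sketch reproduces the numbers in the sample output below: 16 GB reserved, 8 containers of 6144 MB each.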

The result in this case would be:

Using cores=16 memory=64GB disks=4 hbase=True  
Profile: cores=16 memory=49152MB reserved=16GB usableMem=48GB disks=4  
Num Container=8  
Container Ram=6144MB  
Used Ram=48GB  
Unused Ram=16GB  
***** mapred-site.xml *****  
***** yarn-site.xml *****
***** tez-site.xml *****  

You can now change the corresponding properties in your *.xml files and restart the affected components/daemons. Now you should have more YARN memory available in your cluster and can run more and bigger containers on the NodeManagers.
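The yarn-site.xml section of the output maps the numbers above onto YARN memory properties along these lines (illustrative values from the 16-core/64 GB run; the exact set of properties the script prints may differ):

```xml
<!-- yarn-site.xml: illustrative values derived from the run above -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>49152</value> <!-- 8 containers x 6144 MB -->
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>6144</value> <!-- one container -->
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>49152</value> <!-- all memory on one NodeManager -->
</property>
```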

Tweaking those settings manually is also possible. But be careful not to overcommit your memory settings too much, because this might have severe performance implications: swapping memory out to disk is extremely slow.

Andreas Fritzler

Data Jedi | Cloud and Big Data Expert | Machine Learning Enthusiast | Deep Learning Fanatic @SAP Opinions are my own