By Khaled Tannir
Configure your Hadoop cluster to run optimal
MapReduce jobs
Overview
* Optimize your MapReduce job performance
* Identify your Hadoop cluster's weaknesses
* Tune your MapReduce configuration
In Detail
MapReduce is the distribution system that the Hadoop MapReduce
engine uses to distribute work around a cluster by working in
parallel on smaller data sets. It is useful in a wide range of
applications, including distributed pattern-based searching,
distributed sorting, web link-graph reversal, term-vector per
host, web access log stats, inverted index construction, document
clustering, machine learning, and statistical machine
translation.
This book introduces you to advanced MapReduce concepts and
teaches you everything from identifying the factors that affect
MapReduce job performance to tuning the MapReduce configuration.
Based on real-world experience, this book will help you to fully
utilize your cluster's node resources to run MapReduce jobs
optimally.
This book details the Hadoop MapReduce job performance
optimization process. Through a number of clear and practical
steps, it will help you to fully utilize your cluster's node
resources.
Starting with how MapReduce works and the factors that affect
MapReduce performance, you will be given an overview of Hadoop
metrics and several performance monitoring tools. Further on, you
will explore performance counters that help you identify resource
bottlenecks, check cluster health, and size your Hadoop cluster.
You will also learn about optimizing map and reduce tasks by
using Combiners and compression.
The book ends with best practices and recommendations on how to
use your Hadoop cluster optimally.
What you will learn from this book
* Learn about the factors that affect MapReduce performance
* Utilize the Hadoop MapReduce performance counters to identify
  resource bottlenecks
* Size your Hadoop cluster's nodes
* Set the number of mappers and reducers correctly
* Optimize mapper and reducer task throughput and code size using
  compression and Combiners
* Understand the various tuning properties and best practices to
  optimize clusters
Approach
This book is an example-based tutorial that deals with optimizing
MapReduce job performance.
Who this book is written for
If you are a Hadoop administrator, developer, MapReduce user, or
beginner, this book is the best choice available if you wish to
optimize your clusters and applications. Prior knowledge of
creating MapReduce applications is not necessary, but it will
help you better understand the concepts and the snippets of
MapReduce class template code.