In the last post (BigData Investigation 5 – MapReduce with Python and Hadoop Streaming) we came across different Hadoop cluster modes. This post explains the three supported Hadoop cluster modes.
A Hadoop cluster can be configured in one of three modes. Fully-Distributed Mode supports Hadoop clusters ranging from a few nodes to thousands of nodes. All Hadoop services run as daemons, with a separate Java Virtual Machine (JVM) started for each Hadoop component. The services communicate via TCP sockets. Fully-Distributed Mode is used for production. Smaller Fully-Distributed Hadoop clusters are also needed for development and quality assurance (QA).
In addition, Hadoop supports two single-node modes. Pseudo-Distributed Mode runs all Hadoop daemons on a single server and thus looks similar to a Fully-Distributed Hadoop cluster. All daemons communicate via TCP sockets, as in Fully-Distributed Mode. Pseudo-Distributed Mode is a good choice for testing and learning.
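To make this more concrete, here is a minimal sketch of the kind of settings that switch a single node into Pseudo-Distributed Mode. It assumes the two values used in the Apache single-node setup guide (`fs.defaultFS` set to `hdfs://localhost:9000` in core-site.xml and `dfs.replication` set to `1` in hdfs-site.xml). The helper `hadoop_config_xml` is a hypothetical name of my own, not part of Hadoop; it only renders the `<configuration>` XML that Hadoop reads.

```python
import xml.etree.ElementTree as ET


def hadoop_config_xml(properties):
    """Render a dict of property names and values as Hadoop configuration XML.

    Hypothetical helper for illustration; Hadoop itself only reads the XML.
    """
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Assumed values, taken from the Apache single-node setup guide.
CORE_SITE = {"fs.defaultFS": "hdfs://localhost:9000"}  # HDFS NameNode on this host
HDFS_SITE = {"dfs.replication": "1"}  # only one DataNode, so one replica

if __name__ == "__main__":
    print(hadoop_config_xml(CORE_SITE))  # content for core-site.xml
    print(hadoop_config_xml(HDFS_SITE))  # content for hdfs-site.xml
```

The point of the sketch is that the file system URI names a network address on the local machine, so clients talk to the NameNode daemon over a TCP socket even though everything runs on one server.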
Local (Standalone) Mode does not run any Hadoop daemons. All Hadoop services and the application run in a single Java Virtual Machine (JVM). Applications run faster in Local (Standalone) Mode than in Pseudo-Distributed Mode because no TCP communication is required. Local (Standalone) Mode is a good choice for development and testing.
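The three modes can be told apart roughly by two configuration properties. The following sketch is a heuristic of my own, not an official Hadoop API. It assumes the Hadoop defaults `fs.defaultFS = file:///` and `mapreduce.framework.name = local`, and it assumes that a file system URI pointing at `localhost` means Pseudo-Distributed Mode. The function name `guess_cluster_mode` is hypothetical.

```python
from urllib.parse import urlparse

LOOPBACK_HOSTS = {"localhost", "127.0.0.1"}


def guess_cluster_mode(conf):
    """Guess the Hadoop cluster mode from a dict of configuration properties.

    Rough heuristic for illustration only; real clusters can be
    configured in ways this does not cover.
    """
    # Fall back to the Hadoop defaults when a property is not set.
    fs = conf.get("fs.defaultFS", "file:///")
    framework = conf.get("mapreduce.framework.name", "local")

    # Local file system plus the local job runner: no daemons involved.
    if fs.startswith("file:") and framework == "local":
        return "local"

    host = urlparse(fs).hostname
    if host is None:
        return "unknown"
    # Assumption: a loopback NameNode address means a single-node cluster.
    if host in LOOPBACK_HOSTS:
        return "pseudo-distributed"
    return "fully-distributed"


if __name__ == "__main__":
    print(guess_cluster_mode({}))
    print(guess_cluster_mode({"fs.defaultFS": "hdfs://localhost:9000"}))
    # namenode.example.com is a made-up host name.
    print(guess_cluster_mode({"fs.defaultFS": "hdfs://namenode.example.com:8020"}))
```

An empty configuration falls through to the defaults, which is why a freshly unpacked Hadoop runs in Local (Standalone) Mode without any further setup.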
In the next post I will explain how to download Apache Hadoop and to install it on Linux.
Changes:
2016/09/30 – added link – “how to download Apache Hadoop and to install it on Linux” => BigData Investigation 7 – Installing Apache Hadoop in Local (Standalone) Mode