Configuring Apache Hadoop Cluster & High Availability – Chapter 2

Steps:

Step 1: Download and configure ZooKeeper

Step 2: Hadoop configuration and high availability settings

Step 3: Creating folders for the Hadoop cluster and setting file permissions

Step 4: HDFS service and file system format

Let us look at the steps in detail:

Step 1: Download and configure ZooKeeper

1.1  Download the ZooKeeper software package (zookeeper-3.4.5.tar.gz).

Extract the archive:

[hduser@mn1 ~]$ tar -zxvf zookeeper-3.4.5.tar.gz

1.2  The ZooKeeper configuration files and executables are located in the following directories:

Configuration files     : /home/hduser/zookeeper-3.4.5/conf

Binary executables      : /home/hduser/zookeeper-3.4.5/bin

The main configuration file is zoo.cfg. Create it from the sample file:

cp -rp zoo_sample.cfg zoo.cfg

Modify zoo.cfg as required for this installation:

[hduser@mn1 ~]$ vi /home/hduser/zookeeper-3.4.5/conf/zoo.cfg










Save & Exit!

Note: All three servers are hosted as virtual instances on the same physical machine, so each server uses different port numbers: mn1:2888:3888, mn2:2889:3889 and dn1:2890:3890.
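
A minimal sketch of a zoo.cfg consistent with the note above. The timing values and the client port 2181 are the defaults shipped in zoo_sample.cfg and are assumed unchanged; the data and log directories are the ones created in Step 3.3. ZK_CONF_DIR is a hypothetical variable that defaults to a scratch directory, so the sketch can be tried without touching a real installation.

```shell
#!/bin/sh
# Sketch only: writes a candidate zoo.cfg. Everything except the server
# ports and the data/log directories is an assumption.
ZK_CONF_DIR="${ZK_CONF_DIR:-$(mktemp -d)}"   # hypothetical; set to the real conf directory to use it

cat > "$ZK_CONF_DIR/zoo.cfg" <<'EOF'
# Timing values as shipped in zoo_sample.cfg (assumed unchanged)
tickTime=2000
initLimit=10
syncLimit=5
# Data and transaction-log directories (created in Step 3.3)
dataDir=/home/hduser/zookeeper/data
dataLogDir=/home/hduser/zookeeper/log
# Default client port (assumed unchanged)
clientPort=2181
# server.<myid>=<host>:<peer port>:<leader-election port>
server.1=mn1:2888:3888
server.2=mn2:2889:3889
server.3=dn1:2890:3890
EOF

echo "wrote $ZK_CONF_DIR/zoo.cfg"
```

The number after "server." must match the value stored in that node's myid file.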

Create the myid file in /home/hduser/zookeeper/data/ on each node and set it to that node's ID (mn1=1, mn2=2 and dn1=3).

Create the data and log directories first (see Step 3.3).

[hduser@mn1 ~]$ vi /home/hduser/zookeeper/data/myid

1

Save & Exit!

[hduser@mn2 ~]$ vi /home/hduser/zookeeper/data/myid

2

Save & Exit!

[hduser@dn1 ~]$ vi /home/hduser/zookeeper/data/myid

3

Save & Exit!
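
The host-to-ID mapping (mn1=1, mn2=2, dn1=3) can also be scripted instead of typed in vi. The sketch below is one possible way to do it; myid_for_host, ZK_DATA_DIR and HOST are hypothetical names, and ZK_DATA_DIR defaults to a scratch directory rather than /home/hduser/zookeeper/data.

```shell
#!/bin/sh
# Sketch only: map each host name to the ZooKeeper ID used in this guide.
myid_for_host() {
    case "$1" in
        mn1) echo 1 ;;
        mn2) echo 2 ;;
        dn1) echo 3 ;;
        *)   echo "unknown host: $1" >&2; return 1 ;;
    esac
}

# Hypothetical override; the guide uses /home/hduser/zookeeper/data.
ZK_DATA_DIR="${ZK_DATA_DIR:-$(mktemp -d)}"
# On a real node this could be "$(hostname -s)"; mn1 is used here as a sample.
HOST="${HOST:-mn1}"

id=$(myid_for_host "$HOST") || exit 1
echo "$id" > "$ZK_DATA_DIR/myid"
cat "$ZK_DATA_DIR/myid"
```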

Step 2: Hadoop configuration and high availability settings

2.1  Add or modify the following lines in the file below to apply the environment variable settings.

[hduser@mn1 ~]$ vi /home/hduser/2.3.0/etc/hadoop/

export JAVA_HOME=/usr/java/jdk1.7.0_45/

export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hduser/2.3.0/lib/native/

export HADOOP_OPTS="-Djava.library.path=/home/hduser/2.3.0/lib/native/"

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hduser/2.3.0/etc/hadoop"}

2.2  Add the following lines to core-site.xml, within the <configuration> tag, to configure journaling, the default file system, the temp directory and the HDFS cluster.

[hduser@mn1 ~]$ vi /home/hduser/2.3.0/etc/hadoop/core-site.xml
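
A minimal sketch of properties that would cover the settings named in 2.2. The journal and temp paths are the ones created in Step 3; the nameservice ID "mycluster" is a hypothetical placeholder and not taken from the original setup. The script only writes the fragment to a scratch file (OUT, also hypothetical) so it can be reviewed before being pasted inside the <configuration> tag.

```shell
#!/bin/sh
# Sketch only: candidate core-site.xml properties, written to a scratch file.
NAMESERVICE="mycluster"        # hypothetical nameservice ID (an assumption)
OUT="${OUT:-$(mktemp)}"        # hypothetical output file

cat > "$OUT" <<EOF
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://$NAMESERVICE</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/home/hduser/journal/node/local/data</value>
</property>
EOF

cat "$OUT"
```

With HA enabled, fs.defaultFS points at the logical nameservice rather than at a single NameNode host, so clients keep working after a failover.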













2.3  Add the following lines to hdfs-site.xml, within the <configuration> tag, to configure the DFS nameservice, the cluster, DFS high availability, ZooKeeper and failover.

[hduser@mn1 ~]$ vi /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml
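
A sketch of the kind of properties an HA setup with two NameNodes (mn1 and mn2) and three JournalNodes typically needs. The nameservice ID "mycluster", the NameNode IDs nn1 and nn2, the default ports (8020, 50070, 8485, 2181) and the SSH key path are all assumptions, not values from the original setup. The Hadoop documentation places ha.zookeeper.quorum in core-site.xml; it appears here only because 2.3 lists ZooKeeper among the hdfs-site.xml settings. The script prints the fragment and changes nothing.

```shell
#!/bin/sh
# Sketch only: prints candidate hdfs-site.xml properties for NameNode HA.
NS="mycluster"       # hypothetical nameservice ID
NN_IDS="nn1,nn2"     # hypothetical NameNode IDs (nn1 on mn1, nn2 on mn2)

# Print one <property> element.
prop() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

hdfs_site_fragment() {
    prop dfs.nameservices "$NS"
    prop "dfs.ha.namenodes.$NS" "$NN_IDS"
    # RPC and web addresses of both NameNodes (default ports assumed)
    prop "dfs.namenode.rpc-address.$NS.nn1" "mn1:8020"
    prop "dfs.namenode.rpc-address.$NS.nn2" "mn2:8020"
    prop "dfs.namenode.http-address.$NS.nn1" "mn1:50070"
    prop "dfs.namenode.http-address.$NS.nn2" "mn2:50070"
    # Shared edit log on the three JournalNodes
    prop dfs.namenode.shared.edits.dir "qjournal://mn1:8485;mn2:8485;dn1:8485/$NS"
    # Client-side failover and automatic failover through ZooKeeper
    prop "dfs.client.failover.proxy.provider.$NS" \
         "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider"
    prop dfs.ha.automatic-failover.enabled "true"
    prop ha.zookeeper.quorum "mn1:2181,mn2:2181,dn1:2181"
    # Fencing method and key file (assumed; adjust to the real setup)
    prop dfs.ha.fencing.methods "sshfence"
    prop dfs.ha.fencing.ssh.private-key-files "/home/hduser/.ssh/id_rsa"
}

hdfs_site_fragment
```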























































2.4  List the DataNode hosts in the slaves configuration file, one hostname per line.

[hduser@mn1 ~]$ vi /home/hduser/2.3.0/etc/hadoop/slaves




Save & Exit!
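
As a sketch, the slaves file for this cluster would presumably list all three hosts, since the jps output at the end of this guide shows a DataNode on mn1, mn2 and dn1. SLAVES_FILE is a hypothetical variable that defaults to a scratch file.

```shell
#!/bin/sh
# Sketch only: write one DataNode hostname per line.
SLAVES_FILE="${SLAVES_FILE:-$(mktemp)}"   # hypothetical; the real file is etc/hadoop/slaves

printf '%s\n' mn1 mn2 dn1 > "$SLAVES_FILE"
cat "$SLAVES_FILE"
```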

2.5  Add the following lines to yarn-site.xml, within the <configuration> tag, to apply the MapReduce settings.

[hduser@mn1 ~]$ vi /home/hduser/2.3.0/etc/hadoop/yarn-site.xml









Save & Exit!
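
A sketch of a minimal yarn-site.xml fragment for running MapReduce on YARN in Hadoop 2.3.0. Placing the ResourceManager on mn1 is inferred from the jps output at the end of this guide; the rest is the usual shuffle-service setup and is an assumption about what the original file contained. The script only prints the fragment.

```shell
#!/bin/sh
# Sketch only: prints candidate yarn-site.xml properties.

# Print one <property> element.
prop() {
    printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

yarn_site_fragment() {
    # Auxiliary shuffle service needed by MapReduce jobs
    prop yarn.nodemanager.aux-services "mapreduce_shuffle"
    prop yarn.nodemanager.aux-services.mapreduce_shuffle.class \
         "org.apache.hadoop.mapred.ShuffleHandler"
    # ResourceManager host (mn1, inferred from the jps output)
    prop yarn.resourcemanager.hostname "mn1"
}

yarn_site_fragment
```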

Step 3: Creating folders for the Hadoop cluster and setting file permissions

3.1  Create the folder structure for the JournalNode as defined in core-site.xml. Repeat this step on all the cluster nodes (mn1, mn2 and dn1).

[hduser@mn1 ~]$ mkdir -p /home/hduser/journal/node/local/data

3.2  Create the temp folder for the Hadoop cluster as defined in core-site.xml. Repeat this step on all the cluster nodes (mn1, mn2 and dn1).

[hduser@mn1 ~]$ mkdir /home/hduser/tmp

3.3  Create the folder structure for the ZooKeeper data and logs as defined in zoo.cfg. Repeat this step on all the cluster nodes (mn1, mn2 and dn1).

[hduser@mn1 ~]$ mkdir -p /home/hduser/zookeeper/data/

[hduser@mn1 ~]$ mkdir -p /home/hduser/zookeeper/log/

Skip this step if these directories were already created in Step 1.
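
The directories from steps 3.1 to 3.3 can be created in one go. The sketch below wraps the mkdir calls in a hypothetical helper, make_cluster_dirs, which takes the base directory as its argument. BASE defaults to a scratch directory here; on the cluster nodes it would be /home/hduser.

```shell
#!/bin/sh
# Sketch only: create the journal, temp and ZooKeeper directories under a base directory.
make_cluster_dirs() {
    base="$1"
    mkdir -p "$base/journal/node/local/data" \
             "$base/tmp" \
             "$base/zookeeper/data" \
             "$base/zookeeper/log"
}

BASE="${BASE:-$(mktemp -d)}"    # hypothetical; the guide uses /home/hduser
make_cluster_dirs "$BASE"

# Show what was created
ls -d "$BASE/journal/node/local/data" "$BASE/tmp" "$BASE/zookeeper/data" "$BASE/zookeeper/log"
```

Because mkdir -p does nothing when a directory already exists, the script is safe to run again on a node where some of the folders are already present.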

3.4  Copy the Hadoop and ZooKeeper directories and the .bash_profile configured on mn1 to mn2 and dn1.

Compress them with tar:

[hduser@mn1 ~]$ tar -zcvf hadoopmove.tgz 2.3.0 zookeeper-3.4.5 .bash_profile

Copy hadoopmove.tgz to mn2 and dn1:

[hduser@mn1 ~]$ scp hadoopmove.tgz hduser@mn2:/home/hduser/

[hduser@mn1 ~]$ scp hadoopmove.tgz hduser@dn1:/home/hduser/

Log in to mn2 and dn1 and extract hadoopmove.tgz:

[hduser@mn2 ~]$ tar -zxvf hadoopmove.tgz

[hduser@dn1 ~]$ tar -zxvf hadoopmove.tgz

Step 4: HDFS service and file system format

4.1  Start the ZooKeeper service on every cluster node that runs ZooKeeper (mn1, mn2 and dn1).

[hduser@mn1 ~]$ ./ start

[hduser@mn2 ~]$ ./ start

[hduser@dn1 ~]$ ./ start

4.2  Initialize the HA state in ZooKeeper from mn1

[hduser@mn1 ~]$ hdfs zkfc -formatZK

Before formatting the NameNode, start the JournalNode on all the cluster nodes (mn1, mn2 and dn1):

$ start journalnode

4.3  Format the NameNode on mn1

[hduser@mn1 ~]$ hdfs namenode -format

4.4  Copy the metadata to the standby NameNode (mn2 in this guide) by running the command below on mn2.

Make sure that the NameNode service is running on the master node (mn1) first:

$ start namenode

[hduser@mn2 ~]$ hdfs namenode -bootstrapStandby
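
The order of steps 4.1 to 4.4 matters: ZooKeeper must be up before the HA state is initialized, and the JournalNodes must be up before the NameNode is formatted. The sketch below only prints that order as "host: command" lines and executes nothing. The script names zkServer.sh and hadoop-daemon.sh are assumptions based on the standard ZooKeeper 3.4 and Hadoop 2.x distributions; bootstrap_plan is a hypothetical helper.

```shell
#!/bin/sh
# Sketch only: print the first-time HA bootstrap order. Nothing is executed.
# Script names (zkServer.sh, hadoop-daemon.sh) are assumptions.
ZK_NODES="mn1 mn2 dn1"

bootstrap_plan() {
    # 4.1: ZooKeeper on every node
    for h in $ZK_NODES; do echo "$h: zkServer.sh start"; done
    # 4.2: initialize the HA state in ZooKeeper
    echo "mn1: hdfs zkfc -formatZK"
    # JournalNodes must be running before the NameNode is formatted
    for h in $ZK_NODES; do echo "$h: hadoop-daemon.sh start journalnode"; done
    # 4.3: format the NameNode on mn1
    echo "mn1: hdfs namenode -format"
    # 4.4: start the NameNode on mn1, then bootstrap the standby on mn2
    echo "mn1: hadoop-daemon.sh start namenode"
    echo "mn2: hdfs namenode -bootstrapStandby"
}

bootstrap_plan
```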

Start the Hadoop services:

$ cd /home/hduser/2.3.0/sbin


and start again.


Run jps to check the services running on mn1, mn2 and dn1.

Note: the hostname was incorrectly configured in /etc/sysconfig/network in this setup. After correcting it, all nodes were restarted for the change to take effect.

[hduser@mn1 sbin]$ jps

1597 QuorumPeerMain

1990 JournalNode

1835 DataNode

2358 NodeManager

2256 ResourceManager

1743 NameNode

2570 Jps

2168 DFSZKFailoverController

[hduser@mn2 bin]$ jps

1925 DFSZKFailoverController

2035 NodeManager

1833 JournalNode

1667 NameNode

2075 Jps

1573 QuorumPeerMain

1743 DataNode

[hduser@dn1 bin]$ jps

1958 Jps

1595 QuorumPeerMain

1711 JournalNode

1655 DataNode

1840 NodeManager
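
Comparing the jps listings by eye is error-prone, so a small check can help. The sketch below reads jps-style output on standard input and prints every expected daemon that is missing. missing_daemons is a hypothetical helper, and the sample input is the dn1 listing from this guide.

```shell
#!/bin/sh
# Sketch only: report expected daemons that do not appear in jps-style output.
# Usage: jps | missing_daemons "Daemon1 Daemon2 ..."
missing_daemons() {
    expected="$1"
    output=$(cat)
    for d in $expected; do
        # jps prints "<pid> <main class>"; compare against the second column
        echo "$output" | awk -v d="$d" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$d"
    done
}

# Sample input: the dn1 listing from this guide
sample="1958 Jps
1595 QuorumPeerMain
1711 JournalNode
1655 DataNode
1840 NodeManager"

# NameNode is added to the expected list on purpose to show a reported miss
echo "$sample" | missing_daemons "QuorumPeerMain JournalNode DataNode NodeManager NameNode"
```

For the sample this prints NameNode, which is correct for dn1 because that node does not run a NameNode. On a live node the function would be fed from jps directly, with the expected list adjusted per host.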

Thank You.

