Configuring Apache Hadoop Cluster & High Availability – Chapter 2

Steps:

Step 1: Download and configure ZooKeeper

Step 2: Hadoop configuration and high-availability settings

Step 3: Create folders for the Hadoop cluster and set file permissions

Step 4: Start the HDFS service and format the file system

Let us see the steps in detail:

Step 1: Download and configure ZooKeeper

1.1  Download the ZooKeeper software package from https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz

[hduser@mn1~]$wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz

Extract the archive:

[hduser@mn1~]$tar -zxvf zookeeper-3.4.5.tar.gz

1.2  The ZooKeeper-related files are located as follows:

Configuration files     : /home/hduser/zookeeper-3.4.5/conf

Binary executables      : /home/hduser/zookeeper-3.4.5/bin

The main configuration file is:

/home/hduser/zookeeper-3.4.5/conf/zoo.cfg

Create it from the bundled sample:

cp -rp zoo_sample.cfg zoo.cfg

Modify zoo.cfg as per our installation guide:

[hduser@mn1~]$vi /home/hduser/zookeeper-3.4.5/conf/zoo.cfg

tickTime=2000

clientPort=2181

initLimit=5

syncLimit=2

dataDir=/home/hduser/zookeeper/data/

dataLogDir=/home/hduser/zookeeper/log/

server.1=mn1:2888:3888

server.2=mn2:2889:3889

server.3=dn1:2890:3890

Save & Exit!

Note: In this guide all three servers are virtual instances hosted on the same physical machine, so each one is given its own pair of quorum ports: mn1:2888:3888, mn2:2889:3889 & dn1:2890:3890. On separate physical machines they could all use the default 2888:3888.
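The server list in zoo.cfg must agree with the ha.zookeeper.quorum value used later in hdfs-site.xml (mn1:2181,mn2:2181,dn1:2181). As a minimal sketch, the client quorum string can be built from the host list; the quorum() helper below is hypothetical, not part of ZooKeeper:

```shell
# Hypothetical helper: build the client quorum string (host:2181,...)
# from a list of hostnames, for cross-checking against hdfs-site.xml.
quorum() {
  printf '%s:2181,' "$@" | sed 's/,$//'
}

quorum mn1 mn2 dn1   # -> mn1:2181,mn2:2181,dn1:2181
```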

Create the myid file in /home/hduser/zookeeper/data/ on each node and assign the node's ID (mn1=1, mn2=2 & dn1=3), matching the server.N entries above.

Create the data and log directories first (refer to Step 3.3).

[hduser@mn1~]$vi /home/hduser/zookeeper/data/myid

1

Save and Exit!

[hduser@mn2~]$vi /home/hduser/zookeeper/data/myid

2

Save & Exit!

[hduser@dn1~]$vi /home/hduser/zookeeper/data/myid

3

Save & Exit!
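The myid values must mirror the server.N lines in zoo.cfg. A minimal sketch of that mapping, assuming the three hostnames used in this guide (myid_for_host is a hypothetical helper, not a ZooKeeper command):

```shell
# Hypothetical helper: map a hostname to its ZooKeeper myid, mirroring
# the server.1=mn1 / server.2=mn2 / server.3=dn1 lines in zoo.cfg.
myid_for_host() {
  case "$1" in
    mn1) echo 1 ;;
    mn2) echo 2 ;;
    dn1) echo 3 ;;
    *)   echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# On each node, the myid file could then be written as:
# myid_for_host "$(hostname)" > /home/hduser/zookeeper/data/myid
```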

Step 2: Hadoop configuration and high-availability settings

2.1  Add/modify the following lines in the hadoop-env.sh file to apply the environment variable settings.

[hduser@mn1~]$ vi /home/hduser/2.3.0/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_45/

export HADOOP_COMMON_LIB_NATIVE_DIR=/home/hduser/2.3.0/lib/native/

export HADOOP_OPTS="-Djava.library.path=/home/hduser/2.3.0/lib/native/"

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/home/hduser/2.3.0/etc/hadoop"}

2.2  Add the following lines, within the <configuration> tag of the core-site.xml file, to configure journaling, the default FS, the temp directory & the HDFS cluster.

[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/core-site.xml

<property>

<name>fs.defaultFS</name>

<value>hdfs://mycluster</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/home/hduser/journal/node/local/data</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hduser/tmp</value>

</property>

2.3  Add the following lines, within the <configuration> tag of the hdfs-site.xml file, to configure the DFS nameservice, the cluster, DFS high availability, ZooKeeper & failover.

[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml

<property>

<name>dfs.nameservices</name>

<value>mycluster</value>

<final>true</final>

</property>

<property>

<name>dfs.ha.namenodes.mycluster</name>

<value>mn1,mn2</value>

<final>true</final>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.mn1</name>

<value>mn1:8020</value>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.mn2</name>

<value>mn2:8020</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.mn1</name>

<value>mn1:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.mn2</name>

<value>mn2:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://mn1:8485;dn1:8485;mn2:8485/mycluster</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>mn1:2181,mn2:2181,dn1:2181</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/hduser/.ssh/id_rsa</value>

</property>

<property>

<name>dfs.replication</name>

<value>3</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.connect-timeout</name>

<value>3000</value>

</property>
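After editing hdfs-site.xml it is easy to mistype a property name. As a quick sanity check, a value can be pulled out of a *-site.xml file with sed; get_prop below is a hypothetical sketch that assumes <name> and <value> sit on adjacent lines (on a running cluster, hdfs getconf -confKey is the proper tool):

```shell
# Hypothetical helper: print the value of a property from a Hadoop
# *-site.xml file, assuming <name> and <value> are on adjacent lines.
get_prop() {  # usage: get_prop <file> <property-name>
  sed -n "/<name>$2<\/name>/{n;s/.*<value>\(.*\)<\/value>.*/\1/p;}" "$1"
}

# e.g. get_prop /home/hduser/2.3.0/etc/hadoop/hdfs-site.xml dfs.nameservices
```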

2.4  Add the datanodes to the slaves configuration file as shown below.

[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/slaves

mn1

mn2

dn1

Save & Exit!
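The slaves file is a plain list of hostnames, one per line; the start scripts walk it to launch a datanode on each host. A minimal sketch of that iteration (for_each_slave is a hypothetical helper, not part of Hadoop):

```shell
# Hypothetical helper: print each non-empty hostname from a slaves file,
# roughly how the start scripts iterate over the node list.
for_each_slave() {  # usage: for_each_slave <slaves-file>
  while read -r host; do
    [ -n "$host" ] && echo "starting datanode on: $host"
  done < "$1"
  return 0
}
```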

2.5  Add the following lines to yarn-site.xml, within the <configuration> tag, to apply the MapReduce shuffle settings.

[hduser@mn1~]$vi /home/hduser/2.3.0/etc/hadoop/yarn-site.xml

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce_shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

Save & Exit!

Step 3: Creating folders for the Hadoop cluster and setting file permissions

3.1  Create the folder structure for the journalnode as defined in core-site.xml. Repeat this step on all the cluster nodes (mn1, mn2 & dn1).

[hduser@mn1~]$mkdir -p /home/hduser/journal/node/local/data

3.2  Create the temp folder for the Hadoop cluster as defined in core-site.xml. Repeat this step on all the cluster nodes (mn1, mn2 & dn1).

[hduser@mn1~]$mkdir /home/hduser/tmp

3.3  Create the folder structure for the ZooKeeper data and logs as defined in zoo.cfg. Repeat this step on all the nodes in the cluster (mn1, mn2 & dn1).

[hduser@mn1~]$mkdir -p /home/hduser/zookeeper/data/

[hduser@mn1~]$mkdir -p /home/hduser/zookeeper/log/

(These directories may already exist if they were created during the ZooKeeper setup in Step 1.)

3.4  Copy the Hadoop source, ZooKeeper and the .bash_profile configured on the mn1 node to mn2 and dn1.

Compress them using tar:

[hduser@mn1~]$tar -zcvf hadoopmove.tgz 2.3.0 zookeeper-3.4.5 .bash_profile

Copy hadoopmove.tgz to mn2 and dn1:

[hduser@mn1~]$scp hadoopmove.tgz hduser@mn2:~/

[hduser@mn1~]$scp hadoopmove.tgz hduser@dn1:~/

Log in to mn2 and dn1 and extract hadoopmove.tgz:

[hduser@mn2~]$tar -zxvf hadoopmove.tgz

[hduser@dn1~]$tar -zxvf hadoopmove.tgz

Step 4: HDFS service and file system format

4.1  Start the ZooKeeper service on all the cluster nodes running ZooKeeper (mn1, mn2 & dn1). Run zkServer.sh from /home/hduser/zookeeper-3.4.5/bin:

[hduser@mn1~]$./zkServer.sh start

[hduser@mn2~]$./zkServer.sh start

[hduser@dn1~]$./zkServer.sh start
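To confirm the ensemble formed, run zkServer.sh status on each node; it prints a line such as "Mode: leader" or "Mode: follower". A hedged one-liner to extract just that mode from the status output (zk_mode is a hypothetical helper):

```shell
# Hypothetical helper: pull the "Mode:" value out of zkServer.sh status output.
zk_mode() {
  grep -o 'Mode: .*' | awk '{print $2}'
}

# e.g. ./zkServer.sh status 2>/dev/null | zk_mode
```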

4.2  Format the ZooKeeper failover controller znode from mn1:

[hduser@mn1~]$hdfs zkfc -formatZK

Before formatting the namenode, start the journalnode on all the cluster nodes (mn1, mn2 & dn1):

$hadoop-daemon.sh start journalnode

4.3  Format the namenode on mn1:

[hduser@mn1~]$hdfs namenode -format

4.4  Copy the metadata to the standby namenode (mn2 in our guide). First make sure the namenode service is running on the master node:

$hadoop-daemon.sh start namenode

Then run the following command on mn2 (the standby):

[hduser@mn2~]$hdfs namenode -bootstrapStandby

Start the Hadoop services:

$cd /home/hduser/2.3.0/sbin

Stop anything already running:

./stop-all.sh

and start everything again:

./start-all.sh

Run jps to check the services running on mn1, mn2 and dn1.

Note: In our setup the hostname was incorrectly configured; it had to be fixed in /etc/sysconfig/network and all nodes restarted for the change to take effect.

[hduser@mn1 sbin]$ jps

1597 QuorumPeerMain

1990 JournalNode

1835 DataNode

2358 NodeManager

2256 ResourceManager

1743 NameNode

2570 Jps

2168 DFSZKFailoverController

[hduser@mn2 bin]$ jps

1925 DFSZKFailoverController

2035 NodeManager

1833 JournalNode

1667 NameNode

2075 Jps

1573 QuorumPeerMain

1743 DataNode

[hduser@dn1 bin]$ jps

1958 Jps

1595 QuorumPeerMain

1711 JournalNode

1655 DataNode

1840 NodeManager
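The jps listings above can be checked mechanically. A minimal sketch, assuming the daemon names shown in the listings (check_daemons is a hypothetical helper, not a Hadoop tool):

```shell
# Hypothetical helper: read jps output on stdin and verify that every
# expected daemon name appears, e.g.  jps | check_daemons NameNode DataNode
check_daemons() {
  local out d
  out=$(cat)
  for d in "$@"; do
    echo "$out" | grep -qw "$d" || { echo "missing: $d"; return 1; }
  done
  echo "all daemons running"
}
```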

Thank You.


Copyright ©Solutions@Experts.com