
Setting up an Apache Hadoop Multi-Node Cluster

We are sharing our experience of installing Apache Hadoop on Linux-based machines in a multi-node setup. We will also share our troubleshooting experience here and keep this post updated in the future.

User creation and other configuration steps

  • We start by adding a dedicated Hadoop system user on each node of the cluster.


$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
  • Next we configure SSH (Secure Shell) on all the cluster nodes to enable secure data communication.
user@node1:~$ su - hduser
hduser@node1:~$ ssh-keygen -t rsa -P ""

The output will be something like the following:

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
.....
  • Next we need to enable SSH access to the local machine with this newly created key:
hduser@node1:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

Repeat the above steps on all the cluster nodes and test by executing the following command:

hduser@node1:~$ ssh localhost

This step is also needed to save the local machine's host key fingerprint to the hduser user's known_hosts file.
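If you would rather add the fingerprint non-interactively than answer the usual yes/no prompt, something like the following should work (a minimal sketch using the standard ssh-keyscan tool; the same command can be repeated later for the master and slave hostnames once /etc/hosts is set up):

hduser@node1:~$ ssh-keyscan -H localhost >> $HOME/.ssh/known_hosts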

Next we need to edit the /etc/hosts file, in which we put the IP address and name of each system in the cluster.

In our scenario we have one master (with IP 192.168.0.100) and one slave (with IP 192.168.0.101).

$ sudo vi /etc/hosts

and we put the entries into the hosts file as IP/hostname pairs:

192.168.0.100 master
192.168.0.101 slave
  • Providing SSH access

The hduser user on the master node must be able to connect

    1. to its own user account on the master via ssh master (in this context, not necessarily ssh localhost), and
    2. to the hduser account of the slave(s) via a password-less SSH login.

So we distribute the SSH public key of hduser@master to all the slaves (in our case we have only one slave; if you have more, execute the following command for each of them, changing the machine name, i.e. slave, slave1, slave2).

hduser@master:~$ ssh-copy-id -i $HOME/.ssh/id_rsa.pub hduser@slave

Try connecting from the master to the master itself and from the master to the slave(s), and check that everything works.
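A quick way to verify the password-less logins is shown below (a minimal sketch; the hostname command is only there to confirm which machine answered):

hduser@master:~$ ssh master hostname
hduser@master:~$ ssh slave hostname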

Configuring Hadoop

  • Let us edit the conf/masters file (only on the master node)

and we enter master into the file (see the sketch below for one way to create it).

By doing this we have told Hadoop to start the SecondaryNameNode of our multi-node cluster on this machine.

The primary NameNode and the JobTracker will always run on the machine on which we run bin/start-dfs.sh and bin/start-mapred.sh, respectively.
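If you cannot find the file, note that conf/ here refers to the configuration directory of the Hadoop installation; /usr/local/hadoop/conf below is only our assumption about a typical Hadoop 1.x install path, so adjust it to wherever you unpacked Hadoop. Creating the file can be as simple as:

hduser@master:~$ echo "master" > /usr/local/hadoop/conf/masters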

  • Let us now edit the conf/slaves file (only on the master node) with
master
slave

This means that we also run a DataNode process on the master machine, where the NameNode is running. We can leave the master out of the slaves file (so it does not act as a slave) if we have more machines available as DataNodes.

If we have more slaves, we add one host per line, like the following (a sketch for writing the file is shown after the list):

master
slave
slave2
slave3

and so on.
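A minimal way to write the conf/slaves file on the master (again assuming /usr/local/hadoop as the installation directory, which is our assumption):

hduser@master:~$ printf "master\nslave\n" > /usr/local/hadoop/conf/slaves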

Let's now edit two important files (on all the nodes in our cluster):

  1. conf/core-site.xml
  2. conf/hdfs-site.xml

1) conf/core-site.xml

We have to change the fs.default.name parameter, which specifies the NameNode host and port (in our case this is the master machine).

<property>

<name>fs.default.name</name>
<value>hdfs://master:54310</value>

…..[Other XML Values]

</property>

Create a directory in which Hadoop will store its data:

$ sudo mkdir -p /app/hadoop

We have to ensure the directory is writable by any user:

$ sudo chmod 777 /app/hadoop
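If you prefer not to leave the directory world-writable, an alternative (our assumption, not part of the original walkthrough) is to hand it over to the hduser user and hadoop group created earlier, since all Hadoop daemons run as hduser in this setup:

$ sudo chown -R hduser:hadoop /app/hadoop
$ sudo chmod 750 /app/hadoop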

Modify core-site.xml once again to add the following property:

<property>
<name>hadoop.tmp.dir</name>
<value>/app/hadoop</value>
</property>
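Putting the two properties together, a minimal conf/core-site.xml for this cluster might look like the following sketch (any other site-specific properties you already have stay alongside these):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://master:54310</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/app/hadoop</value>
  </property>
</configuration>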

2) conf/hdfs-site.xml

We have to change the dfs.replication parameter, which specifies the default block replication. It defines how many machines each block of a file is replicated to before the file becomes available. If we set this to a value higher than the number of available slave nodes (more precisely, the number of DataNodes), we will start seeing a lot of "(Zero targets found, forbidden1.size=1)" type errors in the log files.

The default value of dfs.replication is 3. However, since we have only two nodes available in our scenario, we set dfs.replication to 2.

<property>
<name>dfs.replication</name>
<value>2</value>
…..[Other XML Values]
</property>
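For completeness, a minimal conf/hdfs-site.xml for this two-node setup might look like the following sketch (keep any other properties you already have):

<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
</configuration>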
  • Let us format the HDFS filesystem via the NameNode.

Run the following command on the master:

bin/hadoop namenode -format
  • Let us start the multi-node cluster:

Run the command (in our case we will run it on the machine named master):

bin/start-dfs.sh

Checking Hadoop Status

After everything has started, run the jps command on all the nodes to check whether everything is running properly.

On the master node the desired output will be:

$ jps

14799 NameNode
15314 Jps
14880 DataNode
14977 SecondaryNameNode

On the slave(s):

$ jps
15314 Jps
14880 DataNode

Of course, the process IDs will vary from machine to machine.

Troubleshooting

It is possible that the DataNode does not get started on all of our nodes. In that case, check

logs/hadoop-hduser-datanode-.log

on the affected nodes for the exception:

java.io.IOException: Incompatible namespaceIDs

In this case we need to do the following:

  1. Stop the full cluster, i.e. both MapReduce and HDFS layers.
  2. Delete the data directory on the problematic DataNode: the directory is specified by dfs.data.dir in conf/hdfs-site.xml. In our case, with hadoop.tmp.dir set to /app/hadoop and dfs.data.dir left at its default of ${hadoop.tmp.dir}/dfs/data, the relevant directory is /app/hadoop/dfs/data.
  3. Reformat the NameNode. All HDFS data will be lost during the format process.
  4. Restart the cluster.

Or

We can manually update the namespaceID of the problematic DataNodes (a sketch follows the list):

  1. Stop the problematic DataNode(s).
  2. Edit the value of namespaceID in ${dfs.data.dir}/current/VERSION to match the corresponding value of the current NameNode in ${dfs.name.dir}/current/VERSION.
  3. Restart the fixed DataNode(s).
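A minimal sketch of this manual fix is shown below. The paths assume hadoop.tmp.dir=/app/hadoop with dfs.name.dir and dfs.data.dir left at their defaults, and 123456789 is only a placeholder for the value you actually read from the NameNode:

# On the master: read the NameNode's current namespaceID
hduser@master:~$ grep namespaceID /app/hadoop/dfs/name/current/VERSION
namespaceID=123456789

# On the affected DataNode (after stopping it): write that value into its VERSION file
hduser@slave:~$ sed -i 's/^namespaceID=.*/namespaceID=123456789/' /app/hadoop/dfs/data/current/VERSION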

In a follow-up post, Running a Map-Reduce Job in Apache Hadoop (Multi-node Cluster), we will share our experience of running a MapReduce job using the standard Apache Hadoop examples.


Piyas De

Piyas is a Sun Microsystems certified Enterprise Architect with 10+ years of professional IT experience in areas such as architecture definition, enterprise application design, and client-server/e-business solutions. Currently he is engaged in providing solutions for digital asset management in media companies. He is also the founder and main author of "Technical Blogs (blog about small technical know-hows)": http://www.phloxblog.in
7 Comments
mohammad
11 years ago

Hi & thanks for this post.
Is it possible for you to do these steps on a Windows-based system?
(I have this error: Cannot connect to the Map/Reduce location java.net.ConnectException … in Eclipse, based on http://ebiquity.umbc.edu/Tutorials/Hadoop/00%20-%20Intro.html)

thanks.

Shiv
10 years ago

Sir, at the step
Configuring Hadoop

Let us edit the conf/masters (only in the masters node)

and we enter master into the file.

where is the conf/masters file located in the package, or do we have to create the file ourselves, and if so with what extension and how do we write in it?
Please clarify, sir.

ddba
8 years ago
Reply to Shiv

Hello! Did you find the solution? Where is the folder? What did I do wrong?

Hafiz Muhammad Shafiq
10 years ago

Good tutorial, but without an example.
Also, I had to face the issue discussed in the Troubleshooting part above. I solved it by changing the port numbers on the slaves and master (it was not mentioned above).

thelinuxfaq
10 years ago

Shall I use SSH login without a password?

http://www.linuxproblem.org/art_9.html

pooja
9 years ago

Nice and crisp article, I will try the multi-node cluster. The single-node cluster is beautifully explained in the following article.

http://mainframewizard.com/content/setup-single-node-hadoop-cluster

Robetro
9 years ago

Hi:

If I want to have 1 NameNode and 1 secondary NameNode on different machines, where do I type it?
