Tuesday, January 4, 2011

Installing Hadoop on a cluster - A complete step-by-step tutorial




Hadoop Cluster Setup:

Hadoop is a highly scalable, fault-tolerant distributed system for data storage and processing.
Hadoop has two important parts:

1. Hadoop Distributed File System (HDFS): A distributed file system that provides high-throughput access to application data.

2. MapReduce: A software framework for distributed processing of large data sets on compute clusters.

In this tutorial, I will describe how to set up and run a Hadoop cluster. We will build the cluster from three Ubuntu machines.

The nodes in our cluster will act in the following capacities:

1. NameNode: Manages the namespace, file system metadata, and access control. There is exactly one NameNode in each cluster.

2. SecondaryNameNode: Downloads periodic checkpoints from the NameNode for fault tolerance. There is exactly one SecondaryNameNode in each cluster.

3. JobTracker: Hands out tasks to the slave nodes. There is exactly one JobTracker in each cluster.

4. DataNode: Holds file system data. Each DataNode manages its own locally attached storage (i.e., the node's hard disk) and stores a copy of some or all blocks in the file system. There are one or more DataNodes in each cluster.

5. TaskTracker: Slaves that carry out map and reduce tasks. There are one or more TaskTrackers in each cluster.

In our case, one machine in the cluster is designated as the NameNode, SecondaryNameNode, and JobTracker. This is the master. The rest of the machines in the cluster act as both DataNode and TaskTracker. These are the slaves.

The diagram below shows how the Hadoop cluster will look after installation:

[Fig: Hadoop cluster layout after installation]

Installing, configuring, and running the Hadoop cluster is done in three steps:
1. Installing and configuring the Hadoop namenode.
2. Installing and configuring the Hadoop datanodes.
3. Starting and stopping the Hadoop cluster.

INSTALLING AND CONFIGURING HADOOP NAMENODE

1. Download hadoop-0.20.2.tar.gz from http://www.apache.org/dyn/closer.cgi/hadoop/core/ and extract it to some path on your machine. From here on, I refer to the Hadoop installation root as $HADOOP_INSTALL_DIR.

2. Edit the file /etc/hosts on the namenode machine and add the following lines:

            192.168.41.53    hadoop-namenode
            192.168.41.87    hadoop-datanode1
            192.168.41.67    hadoop-datanode2

Note: Run the command “ping hadoop-namenode” to check that the hostname resolves to the machine's actual IP rather than the localhost IP.
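The host entries above can also be scripted. The sketch below (using the example IPs from this tutorial; replace them with your own) writes the entries to a temporary file rather than /etc/hosts, so it is safe to try before editing the real file as root:

```shell
#!/bin/sh
# Sketch: generate the cluster host entries used in this tutorial.
# Writing to a temp file instead of /etc/hosts so this is safe to run;
# on a real machine you would append these lines to /etc/hosts as root.
HOSTS_FILE="$(mktemp)"

cat >> "$HOSTS_FILE" <<'EOF'
192.168.41.53    hadoop-namenode
192.168.41.87    hadoop-datanode1
192.168.41.67    hadoop-datanode2
EOF

# After updating the real /etc/hosts, verify resolution with:
#   ping -c 1 hadoop-namenode
grep -c 'hadoop-' "$HOSTS_FILE"   # prints 3
```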

3. We need to configure passwordless login from the namenode to all datanode machines.
            3.1. Execute the following commands on the namenode machine.
                        $ ssh-keygen -t rsa
                        $ scp .ssh/id_rsa.pub ilab@192.168.41.87:~ilab/.ssh/authorized_keys
                        $ scp .ssh/id_rsa.pub ilab@192.168.41.67:~ilab/.ssh/authorized_keys
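One caveat about the scp commands above: they overwrite any existing authorized_keys on the datanodes. A hedged sketch of a safer variant, generating the key non-interactively into a temporary directory (on the real namenode you would use ~/.ssh, and append the key or use ssh-copy-id if it is available):

```shell
#!/bin/sh
# Sketch: create a passphrase-less RSA keypair non-interactively.
# KEYDIR is a temp directory so this runs standalone; on the real
# namenode you would use ~/.ssh instead.
KEYDIR="$(mktemp -d)"
ssh-keygen -q -t rsa -N "" -f "$KEYDIR/id_rsa"

# Safer than overwriting authorized_keys with scp: append the key, e.g.
#   ssh-copy-id ilab@192.168.41.87
#   ssh-copy-id ilab@192.168.41.67
# Afterwards, "ssh hadoop-datanode1 hostname" should work without a password.
ls "$KEYDIR"
```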

4. Open the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and set $JAVA_HOME.
export JAVA_HOME=/path/to/java
e.g.: export JAVA_HOME=/usr/lib/jvm/java-6-sun

Note: If you are using OpenJDK, give the path of that OpenJDK installation instead.

5. Go to $HADOOP_INSTALL_DIR and create a new directory hadoop-datastore. This directory will store HDFS metadata and temporary data.

6. Open the file $HADOOP_INSTALL_DIR/conf/core-site.xml and add the following properties. This file is edited to configure the namenode with information like the port number and metadata directories. Note that Hadoop does not expand shell variables inside these XML files, so substitute the actual absolute path for $HADOOP_INSTALL_DIR. Add the properties in the format below:
            <!-- Defines the namenode and port number -->
            <property>
                        <name>fs.default.name</name>
                        <value>hdfs://hadoop-namenode:9000</value>
                        <description>This is the namenode URI</description>
            </property>
            <property>
                        <name>hadoop.tmp.dir</name>
                        <value>$HADOOP_INSTALL_DIR/hadoop-0.20.2/hadoop-datastore</value>
                        <description>A base for other temporary directories.</description>
            </property>
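Since Hadoop reads these XML values literally, a small script can substitute the absolute datastore path at write time. A minimal sketch (HADOOP_INSTALL_DIR falls back to a temporary directory here so the sketch runs as-is; on a real node it would be your installation root):

```shell
#!/bin/sh
# Sketch: generate core-site.xml with the absolute datastore path
# baked in, since Hadoop does not expand shell variables in the XML.
# HADOOP_INSTALL_DIR defaults to a temp dir so this is safe to run.
HADOOP_INSTALL_DIR="${HADOOP_INSTALL_DIR:-$(mktemp -d)}"

# Unquoted EOF so $HADOOP_INSTALL_DIR is expanded into the file.
cat > "$HADOOP_INSTALL_DIR/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-namenode:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>$HADOOP_INSTALL_DIR/hadoop-datastore</value>
  </property>
</configuration>
EOF

grep 'hadoop-datastore' "$HADOOP_INSTALL_DIR/core-site.xml"
```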

7. Open the file $HADOOP_INSTALL_DIR/conf/hdfs-site.xml and add the following properties. This file is edited to configure the replication factor of the Hadoop setup. Add the properties in the format below:

            <property>
                        <name>dfs.replication</name>
                        <value>2</value>
                        <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
                        </description>
            </property>

8. Open the file $HADOOP_INSTALL_DIR/conf/mapred-site.xml and add the following properties. This file is edited to configure the host and port of the MapReduce JobTracker on the namenode of the Hadoop setup. Add the properties in the format below:
            <property>
                        <name>mapred.job.tracker</name>
                        <value>hadoop-namenode:9001</value>
                        <description>The host and port that the MapReduce job tracker runs
                        at.  If "local", then jobs are run in-process as a single map and reduce 
                        task.
                        </description>
            </property>

9. Open the file $HADOOP_INSTALL_DIR/conf/masters and add the machine name where the secondary namenode will run. This file is edited to configure the Hadoop secondary namenode:
            hadoop-namenode

Note: In my case, both the primary namenode and the secondary namenode run on the same machine, so I have added hadoop-namenode to the $HADOOP_INSTALL_DIR/conf/masters file.

10. Open the file $HADOOP_INSTALL_DIR/conf/slaves and add all the datanode machine names:
            hadoop-namenode     /* Add the namenode here only if you want it to also store data (i.e., behave as a datanode as well). */
            hadoop-datanode1
            hadoop-datanode2
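The masters and slaves files are plain newline-separated host lists, so they can be written in one step. A small sketch (using a temporary directory in place of $HADOOP_INSTALL_DIR/conf so it is safe to run):

```shell
#!/bin/sh
# Sketch: write the masters and slaves host lists. CONF_DIR stands in
# for $HADOOP_INSTALL_DIR/conf; a temp dir here so this runs standalone.
CONF_DIR="$(mktemp -d)"

# The secondary namenode runs on the master in this tutorial.
echo 'hadoop-namenode' > "$CONF_DIR/masters"

# Add hadoop-namenode to this list too only if the master should
# double as a datanode.
printf '%s\n' hadoop-datanode1 hadoop-datanode2 > "$CONF_DIR/slaves"

cat "$CONF_DIR/slaves"
```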

INSTALLING AND CONFIGURING HADOOP DATANODE


1. Download hadoop-0.20.2.tar.gz from http://www.apache.org/dyn/closer.cgi/hadoop/core/ and extract it to some path on your machine. From here on, I refer to the Hadoop installation root as $HADOOP_INSTALL_DIR.

2. Edit the file /etc/hosts on the datanode machine and add the following lines:

            192.168.41.53    hadoop-namenode
            192.168.41.87    hadoop-datanode1
            192.168.41.67    hadoop-datanode2

Note: Run the command “ping hadoop-namenode” to check that the hostname resolves to the machine's actual IP rather than the localhost IP.

3. We need to configure passwordless login from all datanode machines to the namenode machine.
            3.1. Execute the following commands on each datanode machine.
                        $ ssh-keygen -t rsa
                        $ scp .ssh/id_rsa.pub ilab@192.168.41.53:~ilab/.ssh/authorized_keys2

4. Open the file $HADOOP_INSTALL_DIR/conf/hadoop-env.sh and set $JAVA_HOME.
export JAVA_HOME=/path/to/java
e.g.: export JAVA_HOME=/usr/lib/jvm/java-6-sun

Note: If you are using OpenJDK, give the path of that OpenJDK installation instead.

5. Go to $HADOOP_INSTALL_DIR and create a new directory hadoop-datastore. On a datanode, this directory will hold the HDFS block data.

6. Open the file $HADOOP_INSTALL_DIR/conf/core-site.xml and add the following properties. This file is edited to point the datanode at the namenode's filesystem URI (host and port). Note that Hadoop does not expand shell variables inside these XML files, so substitute the actual absolute path for $HADOOP_INSTALL_DIR. Add the properties in the format below:
            <!-- The URI's authority is used to determine the host, port, etc. for a filesystem. -->
            <property>
                        <name>fs.default.name</name>
                        <value>hdfs://hadoop-namenode:9000</value>
                        <description>This is the namenode URI</description>
            </property>
            <property>
                        <name>hadoop.tmp.dir</name>
                        <value>$HADOOP_INSTALL_DIR/hadoop-0.20.2/hadoop-datastore</value>
                        <description>A base for other temporary directories.</description>
            </property>

7. Open the file $HADOOP_INSTALL_DIR/conf/hdfs-site.xml and add the following properties. This file is edited to configure the replication factor of the Hadoop setup. Add the properties in the format below:
            <property>
                                    <name>dfs.replication</name>
                                    <value>2</value>
                                    <description>Default block replication.
                                    The actual number of replications can be specified when the file 
                                    is created. The default is used if replication is not specified in
                                    create time.
                                    </description>
            </property>

8. Open the file $HADOOP_INSTALL_DIR/conf/mapred-site.xml and add the following properties. This file is edited to identify the host and port at which the MapReduce JobTracker runs on the namenode of the Hadoop setup. Add the properties in the format below:
            <property>
                        <name>mapred.job.tracker</name>
                        <value>hadoop-namenode:9001</value>
                        <description>The host and port that the MapReduce job tracker runs
                         at.  If "local", then jobs are run in-process as a single map and reduce 
                         task.
                        </description>
</property>

Note: Steps 9 and 10 are not mandatory.

9. Open $HADOOP_INSTALL_DIR/conf/masters and add the machine name where the secondary namenode will run.
            hadoop-namenode

Note: In my case, both the primary namenode and the secondary namenode run on the same machine, so I have added hadoop-namenode to the $HADOOP_INSTALL_DIR/conf/masters file.

10. Open $HADOOP_INSTALL_DIR/conf/slaves and add all the datanode machine names:
            hadoop-namenode     /* Add the namenode here only if you want it to also store data (i.e., behave as a datanode as well). */
            hadoop-datanode1
            hadoop-datanode2

  
Note: The above steps are required on every datanode in the Hadoop cluster.

START AND STOP HADOOP CLUSTER

1. Formatting the namenode:-
Before we start our new Hadoop cluster, we have to format the Hadoop distributed filesystem (HDFS) via the namenode. This only needs to be done the first time we set up the cluster. Do not format a running Hadoop namenode; that would erase all data in HDFS.
Execute the following command on namenode machine to format the file system.
$HADOOP_INSTALL_DIR/bin/hadoop namenode -format
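Because formatting a second time destroys HDFS data, it can help to guard the format command. A sketch of such a guard, assuming the 0.20-era layout where a formatted namenode leaves a VERSION file under ${hadoop.tmp.dir}/dfs/name/current (verify the path against your own setup; the format command itself is only echoed here):

```shell
#!/bin/sh
# Sketch: refuse to format HDFS if it was already formatted once.
# Assumption: a formatted 0.20.x namenode leaves a VERSION file under
# ${hadoop.tmp.dir}/dfs/name/current/. DATASTORE is a temp dir so this
# runs standalone without a real cluster.
DATASTORE="$(mktemp -d)"

format_if_needed() {
    if [ -f "$DATASTORE/dfs/name/current/VERSION" ]; then
        echo "already formatted"
    else
        # On the real namenode this branch would run:
        #   $HADOOP_INSTALL_DIR/bin/hadoop namenode -format
        echo "formatting"
    fi
}

format_if_needed   # first run prints "formatting"
```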

2. Starting the Hadoop cluster:-
            Starting the cluster is done in two steps.
           
2.1 Start HDFS daemons:-
           
Execute the following command on namenode machine to start HDFS daemons.
            $HADOOP_INSTALL_DIR/bin/start-dfs.sh
            Note:-
            At this point, the following Java processes should run on the namenode
            machine.
                        ilab@hadoop-namenode:$jps // (the process IDs don’t matter of course.)
                        14799 NameNode
                        15314 Jps
                        14977 SecondaryNameNode
                        ilab@hadoop-namenode:$
            and the following Java processes should run on the datanode machines.
                        ilab@hadoop-datanode1:$jps //(the process IDs don’t matter of course.)
                        15183 DataNode
                        15616 Jps
                        ilab@hadoop-datanode1:$

            2.2 Start MapReduce daemons:-
            Execute the following command on the machine you want the jobtracker to run 
            on.
$HADOOP_INSTALL_DIR/bin/start-mapred.sh     
//In our case, we will run bin/start-mapred.sh on namenode machine:
           Note:-
           At this point, the following Java processes should run on the namenode machine.
                        ilab@hadoop-namenode:$jps // (the process IDs don’t matter of course.)
                        14799 NameNode
                        15314 Jps
                        14977 SecondaryNameNode
                        15596 JobTracker                 
                        ilab@hadoop-namenode:$

            and the following Java processes should run on the datanode machines.
                        ilab@hadoop-datanode1:$jps //(the process IDs don’t matter of course.)
                        15183 DataNode
                        15616 Jps
                        15897 TaskTracker               
                        ilab@hadoop-datanode1:$
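The jps checks above can be automated. The sketch below verifies that each expected master daemon name appears in the jps output; it uses simulated output so it runs without a cluster (on a real node you would set JPS_OUT="$(jps)"):

```shell
#!/bin/sh
# Sketch: check that the expected master daemons appear in `jps` output.
# JPS_OUT is simulated here so the check is testable without a cluster.
JPS_OUT='14799 NameNode
14977 SecondaryNameNode
15596 JobTracker
15314 Jps'

missing=""
for daemon in NameNode SecondaryNameNode JobTracker; do
    echo "$JPS_OUT" | grep -q "$daemon" || missing="$missing $daemon"
done

if [ -z "$missing" ]; then
    echo "all master daemons running"
else
    echo "missing:$missing"
fi
```

For the datanodes the same loop would check DataNode and TaskTracker instead.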

3. Stopping the Hadoop cluster:-
            Like starting the cluster, stopping it is done in two steps.
3.1 Stop MapReduce daemons:-
Run the command $HADOOP_INSTALL_DIR/bin/stop-mapred.sh on the jobtracker machine. In our case, we will run it on the namenode machine.
            3.2 Stop HDFS daemons:-
                        Run the command $HADOOP_INSTALL_DIR/bin/stop-dfs.sh on the namenode machine.




