HBase cluster setup:
HBase is an open-source, distributed, versioned, column-oriented store modeled after Google's Bigtable.
This tutorial describes how to set up and run an HBase cluster, without much explanation of HBase itself; a number of other articles describe HBase in detail.
We will build the HBase cluster using three Ubuntu machines.
A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to reach the running ZooKeeper cluster. By default HBase manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it. In our case, we use the default ZooKeeper cluster, which is managed by HBase.
Following are the capacities in which nodes may act in our cluster:
1. HBase Master: The HBase master is responsible for assigning regions to the regionservers and for monitoring the health of each regionserver.
2. ZooKeeper: ZooKeeper is a centralized service for distributed applications that maintains configuration information and naming, and provides distributed synchronization and group services.
3. HBase Regionserver: The regionserver is responsible for handling client read and write requests. It communicates with the HBase master to get a list of regions to serve and to tell the master that it is alive.
In our case, one machine in the cluster is designated as both HBase master and ZooKeeper. The rest of the machines in the cluster act as regionservers.
Before we start:
Before we start configuring HBase, you need a running Hadoop cluster, which will be the storage for HBase (HBase stores its data in the Hadoop Distributed File System). Please refer to the post "Installing Hadoop in the cluster - A complete step by step tutorial" before continuing.
INSTALLING AND CONFIGURING HBASE MASTER
1. Download hbase-0.20.6.tar.gz from http://www.apache.org/dyn/closer.cgi/hbase/ and extract it to some path on your computer. From here on, the HBase installation root is referred to as $HBASE_INSTALL_DIR.
2. Edit the file /etc/hosts on the master machine and add the following lines.
192.168.41.53 hbase-master hadoop-namenode
#HBase master and Hadoop namenode are configured on the same machine
192.168.41.67 hbase-regionserver1
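192.168.41.67 hbase-regionserver2
#Use the actual IP address of the second regionserver machine here; it must differ from the hbase-regionserver1 address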
Note: Run the command "ping hbase-master" to check that the hbase-master host name resolves to the machine's actual IP address and not to a localhost (loopback) address.
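On Ubuntu the machine's own host name is often mapped to 127.0.1.1 in /etc/hosts, which is a common reason for the check above to fail. As a rough sketch, the same check can be scripted. The function names is_loopback and check_host are made up for this example, and getent is assumed to be available (it ships with glibc on Ubuntu).

```shell
# Succeed (exit status 0) when the given address is a loopback address:
# anything in 127.0.0.0/8, or the IPv6 loopback ::1.
is_loopback() {
  case "$1" in
    127.*|::1) return 0 ;;
    *) return 1 ;;
  esac
}

# Resolve a host name through the system resolver (which consults /etc/hosts)
# and report a problem if it is unknown or maps to a loopback address.
check_host() {
  ip=$(getent hosts "$1" | awk '{ print $1; exit }')
  if [ -z "$ip" ]; then
    echo "$1: not resolvable"
    return 1
  elif is_loopback "$ip"; then
    echo "$1: resolves to loopback address $ip"
    return 1
  fi
  echo "$1: $ip"
}
```

Running "check_host hbase-master" on each machine should print the real address of the master, for example 192.168.41.53 in this setup.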
3. Configure passwordless SSH login from hbase-master to all regionserver machines.
3.1. Execute the following commands on the hbase-master machine.
$ ssh-keygen -t rsa
$ scp .ssh/id_rsa.pub ilab@hbase-regionserver1:~ilab/.ssh/authorized_keys
$ scp .ssh/id_rsa.pub ilab@hbase-regionserver2:~ilab/.ssh/authorized_keys
Note: scp replaces any existing authorized_keys file on the target machine.
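Copying id_rsa.pub over authorized_keys works on a fresh machine, but it discards any keys already stored there. A minimal sketch of an append-only alternative follows. add_authorized_key is a hypothetical helper, not part of OpenSSH, and it assumes the public key file has already been copied to the target machine.

```shell
# Append the public key in file $1 to the authorized_keys file $2,
# unless exactly that key line is already present.
add_authorized_key() {
  pubkey_file=$1
  auth_file=$2
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # grep flags: -q quiet, -x match the whole line, -F treat the key as a fixed string
  if ! grep -qxF "$(cat "$pubkey_file")" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
  # sshd typically refuses key files that other users can write to
  chmod 600 "$auth_file"
}
```

For example, after copying the key to the regionserver as ~/hbase-master.pub (a file name chosen here for illustration), run "add_authorized_key ~/hbase-master.pub ~/.ssh/authorized_keys" on that machine. Where the ssh-copy-id tool is installed, "ssh-copy-id ilab@hbase-regionserver1" performs a similar append in one step.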
4. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Note: If you are using OpenJDK, give the path of your OpenJDK installation instead.
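A wrong JAVA_HOME only shows up later, when the daemons fail to start, so it can help to check the path first. The sketch below assumes that a usable Java installation has an executable bin/java under its root; check_java_home is a name invented for this example.

```shell
# Succeed if the directory in $1 looks like a Java installation,
# i.e. it contains an executable bin/java.
check_java_home() {
  if [ -x "$1/bin/java" ]; then
    echo "OK: $1/bin/java found"
  else
    echo "JAVA_HOME problem: $1/bin/java is missing or not executable"
    return 1
  fi
}
```

For the path used in this tutorial the call would be "check_java_home /usr/lib/jvm/java-6-sun".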
5. Open the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>hbase-master:60000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver
in a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-namenode:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hbase-master</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com".
By default this is set to localhost for local and
pseudo-distributed modes of operation. For a
fully-distributed setup, this should be set to a full
list of ZooKeeper quorum servers. If
HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop
ZooKeeper on.
</description>
</property>
</configuration>
Note: In our case, ZooKeeper and the HBase master both run on the same machine.
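A typo in hbase-site.xml is easy to miss, so it can be useful to read the values back before starting the cluster. The following is only a sketch: get_hbase_property is a hypothetical helper, and it assumes each <name> and <value> element sits on a single line with the value following its name, as in the file above. It is not a general XML parser.

```shell
# Print the value of property $2 from the hbase-site.xml file $1.
# Remembers the most recent <name> and prints the next <value> when the name matches.
get_hbase_property() {
  awk -v key="$2" '
    /<name>/  { name = $0; gsub(/.*<name>|<\/name>.*/, "", name) }
    /<value>/ {
      if (name == key) {
        value = $0
        gsub(/.*<value>|<\/value>.*/, "", value)
        print value
        exit
      }
    }
  ' "$1"
}
```

With the configuration above, "get_hbase_property $HBASE_INSTALL_DIR/conf/hbase-site.xml hbase.rootdir" should print hdfs://hadoop-namenode:9000/hbase, and the same value should come back on every machine in the cluster.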
6. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and uncomment the following line:
export HBASE_MANAGES_ZK=true
7. Open the file $HBASE_INSTALL_DIR/conf/regionservers and add all the regionserver machine names.
hbase-regionserver1
hbase-regionserver2
hbase-master
Note: Add the hbase-master machine name only if you are also running a regionserver on the hbase-master machine.
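The start script reaches the machines listed in conf/regionservers over SSH, so every entry should accept a login without a password prompt. A small sketch of such a check follows; the function names are invented for this example, and the 5 second timeout is an arbitrary choice.

```shell
# Print the host names listed in the regionservers file $1, skipping blank lines.
regionserver_hosts() {
  awk 'NF { print $1 }' "$1"
}

# Try a non-interactive SSH login to every listed host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
  for host in $(regionserver_hosts "$1"); do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "$host: ok"
    else
      echo "$host: passwordless login failed"
    fi
  done
}
```

Run "check_passwordless_ssh $HBASE_INSTALL_DIR/conf/regionservers" on the hbase-master machine; every host should report ok before the cluster is started.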
INSTALLING AND CONFIGURING HBASE REGIONSERVER
1. Download hbase-0.20.6.tar.gz from http://www.apache.org/dyn/closer.cgi/hbase/ and extract it to some path on your computer. From here on, the HBase installation root is referred to as $HBASE_INSTALL_DIR.
2. Edit the file /etc/hosts on the hbase-regionserver machine and add the following lines.
192.168.41.53 hbase-master hadoop-namenode
Note: In my case, hbase-master and hadoop-namenode are running on the same machine.
Note: Run the command "ping hbase-master" to check that the hbase-master host name resolves to the machine's actual IP address and not to a localhost (loopback) address.
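3. Configure passwordless SSH login from each hbase-regionserver machine to the hbase-master machine.
3.1. Execute the following commands on the hbase-regionserver machine.
$ ssh-keygen -t rsa
$ scp .ssh/id_rsa.pub ilab@hbase-master:~ilab/.ssh/authorized_keys2
Note: scp replaces any existing authorized_keys2 file on hbase-master, so with more than one regionserver, append each key to that file instead of copying over it.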
4. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and set JAVA_HOME.
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Note: If you are using OpenJDK, give the path of your OpenJDK installation instead.
5. Open the file $HBASE_INSTALL_DIR/conf/hbase-site.xml and add the following properties.
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>hbase.master</name>
<value>hbase-master:60000</value>
<description>The host and port that the HBase master runs at.
A value of 'local' runs the master and a regionserver
in a single process.
</description>
</property>
<property>
<name>hbase.rootdir</name>
<value>hdfs://hadoop-namenode:9000/hbase</value>
<description>The directory shared by region servers.</description>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
<description>The mode the cluster will be in. Possible values are
false: standalone and pseudo-distributed setups with managed Zookeeper
true: fully-distributed with unmanaged Zookeeper Quorum (see hbase-env.sh)
</description>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2222</value>
<description>Property from ZooKeeper's config zoo.cfg.
The port at which the clients will connect.
</description>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>hbase-master</value>
<description>Comma separated list of servers in the ZooKeeper Quorum.
For example, "host1.mydomain.com,host2.mydomain.com".
By default this is set to localhost for local and
pseudo-distributed modes of operation. For a fully-distributed
setup, this should be set to a full list of ZooKeeper quorum
servers. If HBASE_MANAGES_ZK is set in hbase-env.sh
this is the list of servers which we will start/stop ZooKeeper on.
</description>
</property>
</configuration>
6. Open the file $HBASE_INSTALL_DIR/conf/hbase-env.sh and uncomment the following line:
export HBASE_MANAGES_ZK=true
Note: The above steps are required on all the datanodes in the Hadoop cluster.
START AND STOP HBASE CLUSTER
1. Starting the HBase cluster:
We need to start the daemons only on the hbase-master machine; it will start the daemons on all regionserver machines. Execute the following command to start the HBase cluster.
$HBASE_INSTALL_DIR/bin/start-hbase.sh
Note: At this point, the following Java processes should be running on the hbase-master machine.
ilab@hbase-master:$ jps
14143 Jps
14007 HQuorumPeer
14066 HMaster
and the following Java processes should be running on each hbase-regionserver machine.
23026 HRegionServer
23171 Jps
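Checking the jps listings by eye gets tedious with more machines. As a sketch, the check can be reduced to a small filter over jps output; has_daemon is a hypothetical helper that only looks at the second column of each line.

```shell
# Read jps-style output ("<pid> <main class>") on stdin and succeed
# if some line names exactly the class given in $1.
has_daemon() {
  awk -v name="$1" '$2 == name { found = 1 } END { exit !found }'
}
```

On the hbase-master machine, "jps | has_daemon HMaster && jps | has_daemon HQuorumPeer" should succeed. On a regionserver, "jps | has_daemon HRegionServer" should succeed.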
2. Starting the HBase shell:
$HBASE_INSTALL_DIR/bin/hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Version: 0.20.6, r965666, Mon Jul 19 16:54:48 PDT 2010
hbase(main):001:0>
Now, create a table in HBase.
hbase(main):001:0> create 't1','f1'
0 row(s) in 1.2910 seconds
hbase(main):002:0>
Note: If the table is created successfully, then everything is running fine.
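Creating a table mainly exercises the master. To also exercise a regionserver, one cell can be written and read back. The sketch below only prints the HBase shell commands; smoke_test_commands is a name made up here, the row key, column qualifier and value are placeholders, and feeding the commands to the shell through a pipe assumes that the shell reads commands from standard input.

```shell
# Print HBase shell commands that write one cell, read it back and scan the table.
# $1 = table name, $2 = column family.
smoke_test_commands() {
  cat <<EOF
put '$1', 'row1', '$2:c1', 'value1'
get '$1', 'row1'
scan '$1'
EOF
}
```

For the table created above, run "smoke_test_commands t1 f1 | $HBASE_INSTALL_DIR/bin/hbase shell", or type the three printed commands at the hbase(main) prompt. The get and scan output should show value1 in column f1:c1.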
3. Stopping the HBase cluster:
Execute the following command on the hbase-master machine to stop the HBase cluster.
$HBASE_INSTALL_DIR/bin/stop-hbase.sh