Friday, May 6, 2011

Installing Flume in pseudo-distributed mode - A complete step by step tutorial




Flume is a distributed, reliable, and available service for efficiently moving large amounts of data soon after the data is produced.

The primary use case for Flume is as a logging system that gathers a set of log files on every machine in a cluster and aggregates them to a centralized persistent store such as the Hadoop Distributed File System (HDFS).

Installation in pseudo-distributed mode:-


In pseudo-distributed mode, several Flume processes run on a single machine. There are two kinds of processes in the system:
1. Flume Master: - The Flume Master is the central management point; it controls the Flume nodes' data flows and monitors the Flume nodes.
2. Flume Node: - The Flume nodes are divided into two categories:-
2.1 Flume Agent: - The agent Flume nodes are co-located on machines with the service that is producing logs.
2.2 Flume Collector: - The collector listens for data from multiple agents, aggregates the logs, and eventually writes the data to HDFS.

Fig: - Flume processes and their configuration.
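
The data flow shown in the figure boils down to two logical node mappings, which we will configure in steps 4 and 6. In Flume's node : source | sink notation (using the example values from later in this post) they are:

flume-agent : tail("/home/impetus/logAnalytics/dot.log") | agentSink("localhost",35853) ;
flume-collector : collectorSource(35853) | collectorSink("hdfs://hadoop-namenode:9000/user/flume/logs/%H00","%{host}-") ;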

Before we start:-

Before we start configuring Flume, you need a running Hadoop cluster, which will be the centralized storage for Flume. Please refer to the Installing Hadoop in the cluster - A complete step by step tutorial post before continuing.
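
A quick way to confirm that the cluster is actually up before continuing (a rough check; $HADOOP_HOME below is an assumption for wherever your Hadoop installation lives):

# the Hadoop daemons (NameNode, DataNode, JobTracker, etc.) should show up here
jps
# HDFS should answer a simple listing
$HADOOP_HOME/bin/hadoop fs -ls /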


Installation steps:-

1. Download flume-0.9.1.tar.gz from https://github.com/cloudera/flume/downloads and extract it to a path on your computer. From now on I will refer to the Flume installation root as $Flume_INSTALL_DIR.



2. The Master can be manually started by executing the following command:


2.1 $Flume_INSTALL_DIR/bin/flume master


2.2 After the Master is started, you can access it by pointing a web browser to http://localhost:35871/. This web page displays the status of all Flume nodes that have contacted the Master, and shows each node’s currently assigned configuration. When you start this up without Flume nodes running, the status and configuration tables will be empty.
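
The command in 2.1 keeps the Master in the foreground. If you would rather free up that terminal, a background start such as the following works as well (the log file name here is just an example):

# run the Master in the background and capture its output in a log file
nohup $Flume_INSTALL_DIR/bin/flume master > flume-master.log 2>&1 &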


3. The Flume collector can be started manually by executing the following command in another terminal.


3.1 $Flume_INSTALL_DIR/bin/flume node -n flume-collector


3.2 To check whether a Flume node is up, point your browser to the Flume node status page at http://localhost:35862/. Each node displays its own data on a single table that includes diagnostics and metrics data about the node, its data flows, and system metrics about the machine it is running on. If you have multiple instances of the flume node program running on a machine, each additional instance automatically increments the port number, attempts to bind to the next port (35863, 35864, etc.), and logs the port it eventually selects.


3.3 If the node is up, you should also refresh the Master’s status page (http://localhost:35871) to make sure that the node has contacted the Master. You brought up one node whose name is flume-collector, so you should have one node listed in the Master’s node status table.


4. Configuring the collector via the Master:-

4.1 On the Master’s web page click on the config link. Enter the following values into the "Configure a node" form, and then click Submit.

Node name: flume-collector

Source: collectorSource(35853)

Sink: collectorSink("hdfs://hadoop-namenode:9000/user/flume/logs/%H00","%{host}-")

Note: - The collector writes to an HDFS cluster (here the HDFS NameNode host is assumed to be called hadoop-namenode, matching the hdfs:// URI above).
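
If you prefer the command line to the web form, the same mapping can be submitted through the Flume shell (a sketch, assuming the Master is running on localhost and reachable on its default ports; the source and sink are exactly the values from the form above):

$Flume_INSTALL_DIR/bin/flume shell -c localhost
# at the shell prompt, assign the collector's source and sink, then leave the shell
exec config flume-collector 'collectorSource(35853)' 'collectorSink("hdfs://hadoop-namenode:9000/user/flume/logs/%H00","%{host}-")'
quit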


5. The Flume agent node can be started manually by executing the following command in another terminal.

5.1 $Flume_INSTALL_DIR/bin/flume node -n flume-agent

5.2 Perform steps 3.2 and 3.3 again.


6. Configuring the agent via the Master:-

6.1 On the Master’s web page, click on the config link. Enter the following values into the "Configure a node" form, and then click Submit.

Node name: flume-agent

Source: tail("path/to/logfile")
Ex: - tail("/home/impetus/logAnalytics/dot.log")

Sink: agentSink("localhost",35853)
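
Once both nodes have picked up their configurations, you can push some test data through the pipeline by appending lines to the file the agent is tailing (a sketch using the example log path from above):

# append a few test lines to the tailed file; the agent should forward them to the collector
for i in 1 2 3 4 5; do
  echo "test event $i from $(hostname)" >> /home/impetus/logAnalytics/dot.log
done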


7. To check whether the data is being stored in HDFS, point your browser to the NameNode web UI at http://localhost:50070/.
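
The same check can be done from the command line (again assuming $HADOOP_HOME points at your Hadoop installation, and using the collector sink path from step 4):

# list the hourly directories and files written by the collector
$HADOOP_HOME/bin/hadoop fs -ls hdfs://hadoop-namenode:9000/user/flume/logs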





