Jun 21, 2012

Secondary Namenode - What does it really do?

Secondary Namenode is one of the most poorly named components in Hadoop. Its name gives the sense that it is a backup for the Namenode, but in reality it is not. A lot of Hadoop beginners get confused about what exactly the Secondary Namenode does and why it is present in HDFS. So in this blog post I try to explain the role of the Secondary Namenode in HDFS.

By its name, you may assume that it has something to do with the Namenode, and you are right. So before we dig into the Secondary Namenode, let's see what exactly the Namenode does.

Namenode

The Namenode holds the metadata for HDFS, such as namespace information, block information etc. When in use, all this information is stored in main memory. But this information is also stored on disk for persistent storage.




The above image shows how the Namenode stores information on disk. The two files are:

  1. fsimage - It's the snapshot of the filesystem when the namenode started
  2. Edit logs - It's the sequence of changes made to the filesystem after the namenode started

Edit logs are applied to the fsimage only when the namenode restarts, to get the latest snapshot of the file system. But namenode restarts are rare in production clusters, which means edit logs can grow very large on clusters where the namenode runs for a long period of time. In this situation we will encounter the following issues:

  1. The edit log becomes very large, which makes it challenging to manage
  2. Namenode restart takes a long time because a lot of changes have to be merged
  3. In the case of a crash, we will lose a huge amount of metadata since the fsimage is very old

So to overcome these issues we need a mechanism which keeps the edit log at a manageable size and keeps the fsimage up to date, so that the load on the namenode reduces. It's very similar to a Windows Restore Point, which allows us to take a snapshot of the OS so that if something goes wrong, we can fall back to the last restore point.

So now we understand the Namenode's functionality and the challenges of keeping the metadata up to date. So what does all this have to do with the Secondary Namenode?

Secondary Namenode

The Secondary Namenode helps to overcome the above issues by taking over from the namenode the responsibility of merging the edit logs with the fsimage.


The above figure shows the working of the Secondary Namenode:

  1. It gets the edit logs from the namenode at regular intervals and applies them to the fsimage
  2. Once it has the new fsimage, it copies it back to the namenode
  3. The namenode will use this fsimage for the next restart, which will reduce the startup time
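
The three steps above can be sketched as a toy model. This is not Hadoop code: the fsimage is assumed to be nothing more than a sorted set of paths, the edit log a list of create/delete records, and the names CheckpointSketch, Edit, merge and checkpointDemo are made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Toy model of a checkpoint. This is NOT Hadoop code; every name here is hypothetical.
public class CheckpointSketch {

    // One logged change to the namespace: "create" or "delete" of a path.
    static final class Edit {
        final String op;
        final String path;

        Edit(String op, String path) {
            this.op = op;
            this.path = path;
        }
    }

    // Step 1: apply the edits, in order, to a copy of the old snapshot.
    static TreeSet<String> merge(TreeSet<String> fsimage, List<Edit> edits) {
        TreeSet<String> merged = new TreeSet<>(fsimage);
        for (Edit e : edits) {
            if (e.op.equals("create")) {
                merged.add(e.path);
            } else if (e.op.equals("delete")) {
                merged.remove(e.path);
            }
        }
        return merged;
    }

    static String checkpointDemo() {
        TreeSet<String> fsimage = new TreeSet<>(Collections.singleton("/data/a.txt"));
        List<Edit> editLog = new ArrayList<>();
        editLog.add(new Edit("create", "/data/b.txt"));
        editLog.add(new Edit("delete", "/data/a.txt"));

        // Steps 2 and 3: the merged snapshot replaces the old fsimage, and the
        // edits it already contains no longer need to be replayed at restart.
        fsimage = merge(fsimage, editLog);
        editLog.clear();
        return fsimage + " pending edits: " + editLog.size();
    }

    public static void main(String[] args) {
        System.out.println(checkpointDemo()); // prints "[/data/b.txt] pending edits: 0"
    }
}
```

The point of the sketch is only that the merge work happens away from the restart path: after a checkpoint, a restart starts from a fresh snapshot and a short (here empty) list of pending edits.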

The Secondary Namenode's whole purpose is to provide a checkpoint in HDFS. It's just a helper node for the namenode. That’s why it is also known as the checkpoint node inside the community.

So we now understand that all the Secondary Namenode does is put a checkpoint in the filesystem, which helps the Namenode to function better. It's not a replacement or backup for the Namenode. So from now on, make a habit of calling it the checkpoint node.

Jan 6, 2012

Say Hello To "Android" !! Part- I [Introduction To Android]

    It's an era of mobiles and tablets, and a very exciting time for developers!! Mobile phones have never been more popular and powerful. Smartphones have become very stylish and versatile, packing hardware features like GPS, accelerometers, etc., which makes them an enticing platform that inspires developers to create innovative mobile applications.

    With existing mobile development built on proprietary operating systems that restrict third-party applications, Android offers an open alternative, and the best one. Without artificial barriers, Android developers are free to write applications that take full advantage of increasingly powerful mobile hardware. As a result, developer interest in Android devices has made their 2008 release a hugely anticipated mobile technology event.

    The open philosophy of Android is welcomed by the majority of developers, as it features very powerful SDK libraries. Experienced mobile developers can now tinker with Android and explore the platform, leveraging its unique features to enhance existing products or create more innovative ones.

ANDROID
    It's an open source software stack that includes
• Operating System
• Middleware
• Key Applications + set of API Libraries
which changes the look, feel and function of the mobile.

    In Android, native and third-party applications are written using the same APIs and executed on the same runtime. These APIs feature hardware access, location-based services, support for background services, map-based activities, relational databases, inter-device peer-to-peer messaging, and 2D and 3D graphics.

Just A Flashback ...

    Historically, developers had to code in low-level C or C++ and had to learn the specific hardware for which they were coding. But as hardware features grew, this became more cumbersome. More recently, Symbian was far more successful in giving developers room to better leverage the available hardware. However, it required writing complex C/C++ code and making heavy use of proprietary APIs that are notoriously difficult to use. This difficulty was amplified when developing applications that must work on different hardware implementations, and was particularly true when developing for a particular hardware feature like GPS.

    Then came the Java-hosted MIDlets, which are executed on a JVM, abstracting the underlying hardware and letting developers create apps that run on the wide variety of hardware that supports the Java runtime.

    So our Andy is here... :) Ouch! Forgot about Apple's iPhone and Windows Mobile phones?? ;) No offence!
    They provide a richer UI and UX and a simplified development environment! "BUT" they’re built on proprietary operating systems that often prioritize native applications over those created by third parties and restrict communication among applications and native phone data.

    Third-party and native Android applications are written using the same APIs and are executed on the same runtime. Users can remove and replace any native application with a third-party developer alternative; even the dialer and home screens can be replaced...{ Wanna Try ?? :p }

    Google describes Android as:
"The first truly open and comprehensive platform for mobile devices, all of the software to run a mobile phone but without the proprietary obstacles that have hindered mobile innovation."
          http://googleblog.blogspot.com/2007/11/wheres-my-gphone.html   :)

    The FLASHBACK is incomplete...... without the introduction of OHA { Oh Haa ??  ;) }

OHA: The Open Handset Alliance comprises
• Developers
• Hardware Manufacturers
• Mobile Carriers

The tech companies prominently involved are Motorola, HTC, T-Mobile and Qualcomm, and in their words:
"A commitment to openness, a shared vision for the future, and concrete plans to make the vision a reality. To accelerate innovation in mobile and offer consumers a richer, less expensive, and better mobile experience."
                http://www.openhandsetalliance.com/oha_faq.html :)

        Android offers an excellent enterprise platform and targets developers, making their lives far simpler, with Google and the OHA betting that the way to deliver better mobile software to consumers is by making it easier for developers to write it themselves. This openness and power ensure that anyone with the inclination can bring a vision to life at minimal cost.

       Use open source software, share it and give back something better to the community..... !! :)

       Next post...
• Merits of Android
• Unboxing SDK features, the software stack and lots more ..... :)


      Well, http://developer.android.com is a great reference site!
                                       
                                              Happy Learning! :)

Sep 25, 2011

Getting Started With ANTLR: Basics

Yeah! It's after a lapse of a month or so that there is a post on this blog! :)
Well, this post takes you through the basics of ANTLR. Previously, we learnt about setting up ANTLR as an external tool.

RECAP! It's here:
antlr-external tool:)
So, here we go....

What is ANTLR?
• ANTLR, ANother Tool for Language Recognition, is a language tool that provides a framework for constructing recognizers, interpreters, compilers, and translators from grammatical descriptions containing actions.

What can be the target languages?
• Ada
• ActionScript
• C
• C#; C#2
• C#3
• D
• Emacs ELisp
• Objective C
• Java
• JavaScript
• Python
• Ruby
• Perl6
• Perl
• PHP
• Oberon
• Scala

What does ANTLR support?
• Tree construction
• Error recovery
• Error handling
• Tree walking
• Translation

What environment does it support?
ANTLRWorks is the IDE for ANTLR. It is the graphical grammar editor and debugger, written by Jean Bovet using Swing.


What can ANTLR be used for?

• "REAL" programming languages
• domain-specific languages [DSL]

Who is using ANTLR?
• Programming languages: Boo, Groovy, Mantra, Nemerle, XRuby etc.
• Other tools: Hibernate, IntelliJ IDEA, Jazillian, JBoss Rules, Keynote (Apple), WebLogic (Oracle) etc.

Where can you look for ANTLR?

You can always go to http://www.antlr.org
• to download ANTLR and ANTLRWorks, which are free and open source
• for docs, articles, wiki, mailing list, examples.... You can find everything there!


Row your Boat....
  • Basic terms

• Lexer : converts a stream of characters to a stream of tokens.
• Parser: processes a stream of tokens, possibly creating an AST
• Abstract Syntax Tree (AST): an intermediate tree representation of the parsed input that is simpler to process than the stream of tokens. It can also be processed multiple times.
• Tree Parser: processes an AST
• StringTemplate: a library that supports using templates with placeholders for outputting text
  • General Steps

• Write the grammar in one or more files
• Write string templates [optional]
• Debug your grammar with ANTLRWorks
• Generate classes from grammar
• Write an application that uses generated classes
• Feed the application text that conforms to the grammar

A Bit Further....

Let's write a simple grammar which consists of
• Lexer
• Parser

Lexer: Breaks the input stream into tokens
Let's take the example of a simple declaration in C of the form "int a,b;" or "int a;", and the same with float.
We can write the lexer as follows:
//TestLexer.g

lexer grammar TestLexer;
// DATATYPE comes before ID so that 'int' and 'float' are not matched as identifiers
DATATYPE: 'int' | 'float';
ID  : ('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'0'..'9'|'_')*;
COMMA: ',';
SEMICOLON:';';
// Whitespace between tokens is kept off the parser's token channel
WS  : (' '|'\t'|'\r'|'\n')+ {$channel=HIDDEN;};

As we can see, these rules describe how the input characters are converted to tokens.
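
To make the lexer's job concrete, here is a small hand-written sketch of the same idea in plain Java. It is not what ANTLR generates; it only assumes the four token kinds named in the grammar (DATATYPE, ID, COMMA, SEMICOLON) plus skipped whitespace, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written illustration of what a lexer does; this is not ANTLR-generated code.
public class TokenizeSketch {
    // Alternatives are tried in order, so the DATATYPE keywords come before ID.
    private static final Pattern TOKEN = Pattern.compile(
        "(?<DATATYPE>(?:int|float)\\b)"
        + "|(?<ID>[A-Za-z_][A-Za-z0-9_]*)"
        + "|(?<COMMA>,)"
        + "|(?<SEMICOLON>;)"
        + "|(?<WS>\\s+)");

    // Token kinds handed on to a parser; whitespace is matched but dropped.
    private static final String[] TYPES = {"DATATYPE", "ID", "COMMA", "SEMICOLON"};

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            m.region(pos, input.length());
            if (!m.lookingAt()) {
                throw new IllegalArgumentException("Unexpected character at position " + pos);
            }
            for (String type : TYPES) {
                if (m.group(type) != null) {
                    tokens.add(type + "(" + m.group(type) + ")");
                }
            }
            pos = m.end();
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [DATATYPE(int), ID(a), COMMA(,), ID(b), SEMICOLON(;)]
        System.out.println(tokenize("int a,b;"));
    }
}
```

Note how "integer" would come out as a single ID rather than the keyword int followed by an ID, because the keyword alternative requires a word boundary after it.
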
So, now let's write some rules which process the generated tokens and may create a parse tree accordingly.
//TestParser.g

parser grammar TestParser;
options {
language = Java;
tokenVocab = TestLexer; // use the token types generated from TestLexer.g
}
decl: DATATYPE ID (COMMA ID)* SEMICOLON;
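
For intuition about what a parser rule like decl does, the following is a minimal hand-written recognizer in plain Java. It is only a sketch, not ANTLR output: it assumes declarations of the form "int a,b;" with the trailing semicolon treated as part of the declaration, it uses its own tiny tokenizer, and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hand-written recognizer for illustration; this is not ANTLR-generated code.
public class DeclSketch {
    private final List<String> types = new ArrayList<>(); // token types, in input order
    private int pos = 0;                                   // index of the next unread token

    private DeclSketch(String input) {
        // Tiny tokenizer: keywords, identifiers, ',' and ';'. Whitespace is skipped.
        Matcher m = Pattern
            .compile("\\s*(?:(int\\b|float\\b)|([A-Za-z_]\\w*)|(,)|(;))")
            .matcher(input);
        String[] names = {"DATATYPE", "ID", "COMMA", "SEMICOLON"};
        int end = 0;
        while (m.find() && m.start() == end) {
            for (int g = 1; g <= 4; g++) {
                if (m.group(g) != null) {
                    types.add(names[g - 1]);
                }
            }
            end = m.end();
        }
        // Anything left over that is not whitespace could not be tokenized.
        if (!input.substring(end).trim().isEmpty()) {
            types.add("ERROR");
        }
    }

    // Consume the next token if it has the expected type.
    private boolean match(String type) {
        if (pos < types.size() && types.get(pos).equals(type)) {
            pos++;
            return true;
        }
        return false;
    }

    // decl : DATATYPE ID (COMMA ID)* SEMICOLON
    private boolean decl() {
        if (!match("DATATYPE") || !match("ID")) {
            return false;
        }
        while (match("COMMA")) {
            if (!match("ID")) {
                return false;
            }
        }
        return match("SEMICOLON") && pos == types.size();
    }

    static boolean accepts(String input) {
        return new DeclSketch(input).decl();
    }

    public static void main(String[] args) {
        System.out.println(accepts("int a,b;")); // true
        System.out.println(accepts("float x;")); // true
        System.out.println(accepts("int a,;"));  // false: an ID must follow the comma
    }
}
```

Each grammar element maps to one step: a token reference becomes a match call, and the (COMMA ID)* loop becomes a while loop.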

Running ANTLR on the grammars just generates the lexer and parser classes, TestLexer and TestParser. To actually try the grammar on some input, we need a test rig with a main() method as follows:
// Test.java

import org.antlr.runtime.*;

public class Test {
    public static void main(String[] args) throws Exception {
        // Create an input character stream from the file named "input"
        ANTLRFileStream input = new ANTLRFileStream("input"); // give the path to the input file
        // Create a TestLexer that feeds from that stream
        TestLexer lexer = new TestLexer(input);
        // Create a stream of tokens fed by the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // Create a parser that feeds off the token stream
        TestParser parser = new TestParser(tokens);
        // Begin parsing at rule decl
        parser.decl();
    }
}

We shall see how to create an AST and walk over the tree in the next blog post...
Happy learning....! :)

Aug 23, 2011

Hadoop workshop : First success story


We completed our first Hadoop workshop on 20th August with great success. This post summarizes some of the insights and feedback we got from the event.

People love to learn a hot new technology in the market. Many people are interested in learning Hadoop, but they just did not have the right place to start. I think our workshop gave them the right platform to kick-start with Hadoop. We sold all 17 tickets to the event within a few days. We have even sold out the next workshop, and tickets for the third workshop are already selling. Yeah! It's on fire.. We are doing small workshops to get feedback and improve the overall experience.

Out of 17, twelve people attended the workshop. Participants thoroughly enjoyed the interactive sessions and said that the hands-on sessions were great. The hands-on sessions went as planned and gave the participants an insight into Hadoop and map/reduce. In their own words, this is what people said....

“Great work by small company having effective people...Impressed! I want to have the same training once again” -Vijesh
“Good and Interactive sessions delivered. Nice job by Madhu and company” -Devang Gandhi
"Hands-on trainings were good" -Uma Mahewari
"Content delivery was very good" -Puneetha

With this kind of positive response, we are charged up to host more workshops. We have also sold a few tickets for a student-centric workshop on 27th Aug. People are already signing up for our third workshop. So if you are interested, you can register here http://hadoopworkshopsept.eventbrite.com/ asap, since we are sure that it is going to sell out soon.

We are also launching advanced trainings, particularly for the workshop attendees, which give them the opportunity to go deep into Hadoop and start their career as a Hadoop developer. If you know Hadoop and want to know more, this will be a great opportunity.

So overall it was a great experience, and it gave us the feeling that we are on the right path.
If you are interested in Hadoop and its ecosystem, meet us at any of the above events. We can assure you that it will be a great experience for you.

Aug 5, 2011

Using ANTLR with Maven


As a part of Nectar, we are trying to build a custom language using ANTLR. Since our project uses Maven for its build, we have to integrate ANTLR with Maven. Though ANTLR provides a Maven plug-in, it's a little tricky to use. So, in this post I explain the steps to integrate ANTLR with Maven using the ANTLR3 Maven plug-in.

Step 1 :  You have to put all your grammar files, aka .g files, in the default directory required by the plug-in. Custom placement will not work because of a bug in the plug-in. Hence, place the .g files in the following manner:

src/main/antlr3/<required-package>/.g

Here, <required-package> is the package you specified in the .g file, laid out as a directory path with one directory per package segment.
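
As a quick illustration of that layout rule, the sketch below computes where a grammar file would be expected to live. The package name com.example.lang and the grammar name Expr are made-up examples, and the helper itself is hypothetical; only the src/main/antlr3 directory comes from the step above.

```java
// Illustrates the directory convention only; it does not call Maven or ANTLR.
public class GrammarPathSketch {
    // Default grammar source directory expected by the plug-in (from Step 1).
    static final String SOURCE_DIR = "src/main/antlr3";

    // Map a package name and a grammar name to the expected .g file location.
    static String grammarPath(String packageName, String grammarName) {
        String packageDir = packageName.replace('.', '/');
        return SOURCE_DIR + "/" + packageDir + "/" + grammarName + ".g";
    }

    public static void main(String[] args) {
        // prints src/main/antlr3/com/example/lang/Expr.g
        System.out.println(grammarPath("com.example.lang", "Expr"));
    }
}
```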

Step 2 :  Add the plug-in to the pom as follows:
<plugin>
    <groupId>org.antlr</groupId>
    <artifactId>antlr3-maven-plugin</artifactId>
    <version>3.1.3-1</version>
    <executions>
        <execution>
            <configuration>
                <outputDirectory>src/main/java</outputDirectory>
            </configuration>
            <goals>
                <goal>antlr</goal>
            </goals>
        </execution>
    </executions>
</plugin>

We added a configuration which generates the lexer and parser files in the src directory rather than in the default generated-sources directory under target.

For more information about the plug-in, refer here

Step 3: Just run the Maven build with this pom and your .g files will be compiled.

Aug 2, 2011

One day Hadoop Workshop in Bangalore


After releasing Nectar, our open source analytics framework, we got positive feedback, and many people wanted to know more about how we use Hadoop in our company and how to get started with Hadoop development. So, we thought that a workshop on Hadoop would be a great idea.

Thus, we have arranged a workshop on Hadoop on 20th August, 2011, to be held in Bangalore. In the workshop, we have scheduled sessions on how we are using Hadoop to build our own analytics products and about Nectar. We are also going to talk about how you can use Hadoop in your organization. Attendees will get hands-on experience in the labs, setting up a Hadoop cluster, running map/reduce jobs etc. For more details about the event, refer to this page.

Hadoop and small things

As you know, Hadoop always wants to play with Big Data. It doesn’t like small files. Initially, we thought we were going to have a workshop for 10 people, and the tickets were made free. But within 12 hours, all the tickets were sold out!!! Now we have a workshop for 30 people, by adding 20 more paid tickets. On a lighter note, we learnt that we cannot do small things with Hadoop! ;)

So, if you are interested in the Hadoop event and want to know more about it, then do come and join us at the workshop. You can register here.

Aug 1, 2011

ANTLR as an external tool in Eclipse on Ubuntu

This tutorial explains how to set up ANTLR in Eclipse.
STEP 1:
Download the jar file antlrworks-1.4.2.jar from http://www.antlr.org/download.
For further details about ANTLRWorks: The ANTLR GUI Development Environment, follow the link: http://www.antlr.org/works/index.html
STEP 2:
Create a Java project in Eclipse as follows:
File->New->Project
Select Java and Java Project.
Click on Next.
Name the project "TestANTLR".
Press Finish.
Add antlrworks-1.4.2.jar to the project classpath:
Right click on the "TestANTLR" project.
Select Properties->Java Build Path->Libraries.
Click on "Add External JARs".
Select the complete path of "antlrworks-1.4.2.jar" and press OK.
STEP 3: Make it an external tool
Go to Run->External Tools->Configure
Click on New.
Name: ANTLR Compiler
Tool Location: /usr/lib/jvm/java-6-sun-1.6.0.26/bin/java
// this must be the complete path to your java
Tool Arguments: -classpath complete_path_to_antlrworks-1.4.2.jar org.antlr.Tool ${resource_name}
Working Directory: ${container_loc}
Here, org.antlr.Tool is the main class which would take the ${resource_name} for processing.
${resource_name} and ${container_loc} can be selected with "Browse Variables" option too.
Going ahead :
*Creating a grammar file
Create a grammar file with the .g extension, say Example.g
//sample code
grammar Example;
start : 'hello' ID ';' {System.out.println("hiii... "+$ID.text);} ;
ID : 'a'..'z'+ ;
WS : (' ' | '\n' | '\r')+ {$channel=HIDDEN;} ;
*Running the above code:
Run->External Tools->ANTLR Compiler
Press F5 or right click on the project and select "Refresh".
You will see the generated lexer and parser files along with the tokens file.
In our example,
ExampleLexer.java, ExampleParser.java and Example.tokens
Create a Main.java program in the same project with the following code:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        // create a CharStream that reads from standard input
        ANTLRInputStream input = new ANTLRInputStream(System.in);
        // create a lexer that feeds off of input CharStream
        ExampleLexer lexer = new ExampleLexer(input);
        // create a buffer of tokens pulled from the lexer
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        // create a parser that feeds off the tokens buffer
        ExampleParser parser = new ExampleParser(tokens);
        // begin parsing at rule start
        parser.start();
    }
}
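Set the arguments in the Run configurations and click on Apply and Run.
Main reads from standard input, so type the input (for example, hello world;) in the console and end the input. Now you have the output at the console.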
:)