Saturday, November 21, 2015

Hadoop Oozie Framework


Oozie  -  Framework

Oozie is a workflow/coordination system that you can use to manage Apache Hadoop jobs.

The Oozie server is a web application that runs in a Java servlet container (the standard Oozie distribution uses Tomcat).

This server supports reading and executing Workflow, Coordinator, and Bundle definitions.

Oozie is a framework used to schedule and run Hadoop jobs.
It is similar to scheduling tools such as Autosys, cron, and Control-M.

hPDL (Hadoop Process Definition Language) defines the job details: start node, end node, input directory, output directory, and so on.
Main features:
Execute and Monitor workflows in Hadoop
Periodic scheduling of workflows
Trigger execution on data availability
HTTP and command-line interfaces and a web console
Starting Oozie
Go to the installation directory of Oozie.
Ex: cd /usr/lib/oozie-4.0.0/
./bin/oozie-start.sh
Once it has started, open the URL below to check whether Oozie is running.
http://localhost:11000/oozie
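You can also check the server status from the command line (same server URL as above):

oozie admin -oozie http://localhost:11000/oozie -status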

In the Oozie web console, you can see job information (logs, configuration, etc.).

Oozie Workflow:

PREP: When a workflow job is first created, it is in the PREP state: the job is defined but not running.

RUNNING: When a created workflow job is started, it goes into the RUNNING state. It remains in RUNNING until it reaches its end state, ends in error, or is suspended.

SUSPENDED: A RUNNING workflow job can be suspended; it remains in the SUSPENDED state until the job is resumed or killed.
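These transitions can also be driven from the command line; a quick sketch using the same server URL (the job id is illustrative):

oozie job -oozie http://localhost:11000/oozie -suspend job_123
oozie job -oozie http://localhost:11000/oozie -resume job_123
oozie job -oozie http://localhost:11000/oozie -kill job_123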

                   
                                   




Scheduling with Oozie


Coordinator -> Map Reduce: launch MR jobs at regular intervals

Map Reduce -> HDFS: write output files
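For the "regular intervals" case, a coordinator wraps the workflow and gives it a frequency. A minimal sketch; the app name, dates, and application path are illustrative, not from this post:

<coordinator-app name="MRCoordinatorTest" frequency="${coord:hours(1)}"
                 start="2015-11-21T00:00Z" end="2015-11-22T00:00Z" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <action>
        <workflow>
            <app-path>hdfs://localhost:9000/WordCountTest</app-path>
        </workflow>
    </action>
</coordinator-app>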
         


Oozie – workflow.xml

The workflow definition language is XML based and is called hPDL (Hadoop Process Definition Language).

At a minimum, workflow.xml must define a name, a starting point, and an ending point.

Ex:
<workflow-app xmlns="uri:oozie:workflow:0.4" name="WorkFlowRunnerTest">
    <start to="wordCountAction"/>  <!-- node name is illustrative -->
    <!-- action nodes go here -->
    <end name="end"/>
</workflow-app>

Flow control nodes provide a way to control the workflow execution path.
Start node <start>: specifies the starting point of an Oozie workflow.
End node <end>: specifies the end point of an Oozie workflow.
To this we add an <action> node, and within it a <map-reduce> element that specifies the MapReduce parameters.

   
<action name="wordCountAction">
    <map-reduce>
        <job-tracker>localhost:8032</job-tracker>
        <name-node>hdfs://localhost:9000</name-node>
        <configuration>
            <property>
                <name>mapred.input.dir</name>
                <value>${inputDir}</value>
            </property>
            <property>
                <name>mapred.output.dir</name>
                <value>${outputDir}</value>
            </property>
        </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="fail"/>  <!-- "fail" would be a <kill> node defined in the workflow -->
</action>

An <action> requires <ok> and <error> tags to direct the next node on success or failure.

The job.properties file provides details such as the input directory and output directory.
The job.properties file does not need to be moved to HDFS; it stays on the local filesystem.
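A minimal job.properties sketch, reusing the host/port values from the workflow above; the HDFS paths and application directory are illustrative and should be adjusted to your setup:

# paths below are illustrative
nameNode=hdfs://localhost:9000
jobTracker=localhost:8032
inputDir=${nameNode}/user/${user.name}/input
outputDir=${nameNode}/user/${user.name}/output
oozie.wf.application.path=${nameNode}/user/${user.name}/WordCountTest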


Running an Oozie Application

1. Create a directory for the Oozie job (WordCountTest).
2. Write an application and create a jar (e.g., a MapReduce jar). Move this jar to the lib folder inside the WordCountTest directory.
3. Place job.properties and workflow.xml inside the WordCountTest directory.
4. Move this directory to HDFS.
5. Run the application:
oozie job -oozie http://localhost:11000/oozie -config job.properties -run
    (job.properties is read from the local path; see the sketch below)
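Putting steps 4 and 5 together, a sketch of the commands; the local layout and the HDFS user directory are illustrative:

# assumed local layout: WordCountTest/workflow.xml, WordCountTest/job.properties, WordCountTest/lib/<your-app>.jar
hdfs dfs -put WordCountTest /user/hadoop/WordCountTest
oozie job -oozie http://localhost:11000/oozie -config WordCountTest/job.properties -run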

Workflow job status:
oozie job -info job_123
Workflow job log:
oozie job -log job_123
Workflow job definition:
oozie job -definition job_123
Oozie version:
oozie admin -oozie http://localhost:11000/oozie -version

Oozie Coordinator

The Oozie Coordinator supports the automated starting of Oozie workflow processes.

It is typically used to design and execute recurring invocations of workflows, triggered by time and/or data availability.




  
      
<coordinator-app name="WordCountCoordinatorTest" frequency="${coord:hours(1)}"
                 start="${startTime}" end="${endTime}" timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
    <!-- the app name, frequency, and start/end values were not shown in the original post;
         the frequency is inferred from the hourly uri-template, and start/end are
         placeholders expected to come from job.properties -->
    <datasets>
        <dataset name="inputLogs" frequency="${coord:hours(1)}"
                 initial-instance="${startTime}" timezone="UTC">
            <uri-template>hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}</uri-template>
        </dataset>
    </datasets>
    <input-events>
        <data-in name="inputLogs" dataset="inputLogs">
            <instance>${coord:current(0)}</instance>
        </data-in>
    </input-events>
    <action>
        <workflow>
            <app-path>hdfs://localhost:9000/WordCountTest_TimeBased</app-path>
            <configuration>
                <property>
                    <name>inputData</name>
                    <value>${coord:dataIn('inputLogs')}</value>
                </property>
            </configuration>
        </workflow>
    </action>
</coordinator-app>
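To submit this coordinator, job.properties points at the coordinator application directory instead of the workflow; a sketch, with an illustrative HDFS directory name:

oozie.coord.application.path=hdfs://localhost:9000/WordCountTest_Coordinator

oozie job -oozie http://localhost:11000/oozie -config job.properties -run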
                    
                  
              
          


Oozie commands

Checking multiple workflow jobs:

oozie jobs -oozie http://localhost:11000/oozie -localtime -len 20 -filter status=RUNNING

Checking the status of multiple coordinator jobs:

oozie jobs -oozie http://localhost:11000/oozie -jobtype coordinator

Killing a Workflow, Coordinator, or Bundle job:

oozie job -oozie http://localhost:11000/oozie -kill <job_id>

Checking the status of a Workflow, Coordinator, or Bundle job, or of a Coordinator action:

oozie job -oozie http://localhost:11000/oozie -info <job_id>


Hope this guides you in working with the Oozie framework.


Saturday, September 5, 2015

struts multibox example

Struts multibox [multiple check boxes] example

1. Create member variables and their respective setters and getters in a form class, as below

import org.apache.struts.action.ActionForm;

public class LanguageForm extends ActionForm {

 private String[] selectedLanguages = {};
 private String[] languages = {"Java", "J2EE", "JSP", "STRUTS", "Spring"};

 public String[] getSelectedLanguages() {
  return selectedLanguages;
 }

 public void setSelectedLanguages(String[] selectedLanguages) {
  this.selectedLanguages = selectedLanguages;
 }

 public String[] getLanguages() {
  return languages;
 }

 public void setLanguages(String[] languages) {
  this.languages = languages;
 }
}
 

2. Add the below code to the action class to process the languages selected in the JSP

import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

import org.apache.struts.action.Action;
import org.apache.struts.action.ActionForm;
import org.apache.struts.action.ActionForward;
import org.apache.struts.action.ActionMapping;

public class LanguageAction extends Action {
 private static final String SUCCESS = "success";

 public ActionForward execute(ActionMapping mapping, ActionForm form,
   HttpServletRequest request, HttpServletResponse response)
   throws Exception {
  String selectedLanguageValues = "";
  LanguageForm languageForm = (LanguageForm) form;
  // Concatenate the languages the user checked, comma-separated
  for (String selectedLanguage : languageForm.getSelectedLanguages()) {
   selectedLanguageValues = selectedLanguageValues.concat(selectedLanguage + ",");
  }
  if (selectedLanguageValues != null && !selectedLanguageValues.isEmpty()) {
   // Drop the trailing comma
   selectedLanguageValues = selectedLanguageValues.substring(0, selectedLanguageValues.length() - 1);
  }
  System.out.println("selectedLanguageValues[" + selectedLanguageValues + "]");
  return mapping.findForward(SUCCESS);
 }
}

3. JSP code as mentioned below

<%@taglib uri="http://struts.apache.org/tags-html" prefix="html"%>
<%@taglib uri="http://struts.apache.org/tags-bean" prefix="bean"%>
<%@taglib uri="http://struts.apache.org/tags-logic" prefix="logic"%>

<%-- Reconstructed sketch of the multibox markup; the /selectLanguage action path and the
     languageForm bean name are assumptions and must match your struts-config.xml --%>
<html:form action="/selectLanguage">
    <logic:iterate id="language" name="languageForm" property="languages">
        <html:multibox property="selectedLanguages">
            <bean:write name="language"/>
        </html:multibox>
        <bean:write name="language"/><br/>
    </logic:iterate>
    <html:submit value="Submit"/>
</html:form>

4. Do the corresponding configuration in struts-config.xml, such as the action mapping and form bean; a sketch follows.
Refer to the Struts Step By Step example in this blog for configuration details.
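A rough struts-config.xml fragment for this example; the com.example package, the /selectLanguage path, and the success JSP are assumptions for illustration, not from the original post:

<form-beans>
    <form-bean name="languageForm" type="com.example.LanguageForm"/>
</form-beans>

<action-mappings>
    <action path="/selectLanguage" type="com.example.LanguageAction"
            name="languageForm" scope="request">
        <forward name="success" path="/result.jsp"/>
    </action>
</action-mappings>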

Friday, September 4, 2015

Create WordCount example using mapreduce framework in hadoop using eclipse run on windows

1. Open Eclipse
2. Create a new Java project and name it mapreduce_demo
3. Create a Java class named WordCount.java

How map reduce will work in hadoop
Approach:1
The input.txt file has the following data. Each map() call (P1..P10) processes one line and emits <word, 1> pairs:
--------------------------------------------------------------------------------------
abc def ghi jkl mno pqr stu vwx    P1-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P2-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P3-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P4-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P5-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P6-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P7-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P8-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P9-  abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
abc def ghi jkl mno pqr stu vwx    P10- abc 1 def 1 ghi 1 jkl 1 mno 1 pqr 1 stu 1 vwx 1
---------------------------------------------------------------------------------------
map
-----
  processes one line at a time, as provided by the specified TextInputFormat
  emits a key-value pair of <word, 1>.


Reducer
----------
The Reducer implementation, via the reduce method, just sums up the values, which are the occurrence counts for each key:
    emit(eachWord, sum)
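For the sample input above, the shuffle groups the 1s emitted for each word, so each reduce call sees something like:

reduce("abc", [1,1,1,1,1,1,1,1,1,1])  ->  ("abc", 10)
reduce("def", [1,1,1,1,1,1,1,1,1,1])  ->  ("def", 10)
...and so on for each distinct word.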

4. Paste the below code in the WordCount.java file
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

  public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {

    public void map(LongWritable key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        value.set(itr.nextToken());
        context.write(value, new IntWritable(1));
      }
    }
  }

  public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
   
   
    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context) throws IOException, InterruptedException {
     
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
       context.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf,"mywordcount");
    
    job.setJarByClass(WordCount.class);
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);
    
    Path outputPath = new Path(args[1]);
    
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    
    outputPath.getFileSystem(conf).delete(outputPath, true); // delete any previous output so the job can be re-run
    
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
5. Set up the build path to avoid errors.
Right Click on the project(mapreduce_demo)->Properties->Java Build Path-> Libraries ->Add External Jars
Select jars from:
c:\hadoop\hadoop-2.4.1\share\hadoop\common
 c:\hadoop\hadoop-2.4.1\share\hadoop\common\lib
 c:\hadoop\hadoop-2.4.1\share\hadoop\mapreduce
 c:\hadoop\hadoop-2.4.1\share\hadoop\mapreduce\lib
 
6. Once the above steps are completed, create a jar to run WordCount.java on Hadoop.
7. Create Jar:

Right click on the project (mapreduce_demo) -> Export -> JAR file (under Java) -> Next -> Next -> Main Class -> select WordCount -> Finish
8. Now we have created jar successfully.
9. Before executing the jar, start the daemons (NameNode, DataNode, ResourceManager, and NodeManager):
10. c:\hadoop\hadoop-2.4.1\sbin>start-dfs.cmd
11. c:\hadoop\hadoop-2.4.1\sbin>start-yarn.cmd
12. Check whether an input directory already exists in HDFS; if not, create it.

Hadoop basic commands:


c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -ls input    (if the input directory does not exist, use the command below to create it)
c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -mkdir input
Copy any text file into the input directory of HDFS:
c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -copyFromLocal input_file.txt input
Verify the file has been copied to HDFS using the command below:
c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -ls input
Verify the contents of the file you copied into HDFS:
c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -cat input/input_file.txt
 
13. Once the above steps are done, run the MapReduce program using the following command:
c:\hadoop\hadoop-2.4.1\bin>
yarn jar c:\hadoop\hadoop-2.4.1\wordcount.jar input/ output/

14. Verify the result:

c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -cat output/part-r-00000
Verify the job status and output through the web UIs:

http://localhost:50070 (NameNode web UI)
http://localhost:8088/cluster (YARN ResourceManager)

