Saturday, November 21, 2015

Hadoop Oozie Framework


Oozie  -  Framework

Oozie is a workflow/coordination system that you can use to manage Apache Hadoop jobs.

Oozie server a web application that runs in java servlet container(the standard Oozie distribution is using Tomcat).

This server supports reading and executing Workflows. Coordinators and Bundles definitions.

Oozie is a framework which is used to handle to run the Hadoop jobs.
It is same like Autosys,Cron jobs, ContolM  Scheduling tools

HPDF – Hadoop process definition language – defining the job details, start node , stop node ,input directory ,output directory etc.. details.
Main features:
Execute and Monitor workflows in Hadoop
Periodic scheduling of workflows
Trigger execution of data availability
HTTP and command line interface and webconsole
Oozie work flow start
Go to installation directory of oozie
Ex: cd /usr/lib/oozie-4.0.0/
:  ./bin/oozie-start.sh
Once it’s started then click the below URL to check whether Oozie started or not.
localhost:11000/oozie

In Oozie webconsole, you can see the job information (logs,configuration,etc..)

Oozie Workflow:

PREP: when a workflow job is first created it will be PREP state. Job is defined but not running.

RUNNING: When a CREATED workflow job is started.it goes into RUNNING state, it will remain in RUNNING state while it does not reach it’s end state, ends in error or it is suspended.

SUSPENDED: A RUNNING workflow job can be suspended, it will remain in SUSPENDED state until the workflow job is resumed or it’s killed.

                   
                                   




Scheduling with Oozie


Coordinator to Map Reduce  ->   Launch MR jobs at Regular intervals

Map Reduce to HDFS -> Write Files
         


Oozie – workflow.xml

The workflow definition language is  XML based and it is called HPDL- Hadoop Process Definiton language).

Workflow.xml minimum details we need to mention like name, starting point and ending point.

Ex:
                                   Name=”WorkFlowRunnerTest”>
        
        

Flow control nodes: Provides a way to control the workflow execution path
Start node start : This specifies a starting point of an Oozie Workflow
End node end: This specifies an end point for an Oozie Workflow.
To this we need to add action , and within that we will specify the map-reduce parameters.

   
      localhost:8032
      hdfs://localhost:9000 
    
   
       
           mapred.input.dir
           {inputDir}
      
       
           mapred.output.dir
           (outputDir)
        
action requires and tags to direct the next action on success or failure.

Job.properties file needs to mention details like input dir, output dir.
Job.properties file no need to move to HDFS.


Running a Oozie Application

1. Create a directory for Oozie Job(WordCountTest)
2. Write a application and create a jar (ex:Mapreduce jar). Move this jar to lib folder in WordCountTest directory.
3. Job.properties and workflow.xml inside WordCountTest directory
4. Move this directory to HDFS
5. Running the application
oozie job  -oozie http://localhost:11000/oozie  - config  job.properties –run
    (job.properties should be from local path)

Workflow Job Status command
oozie job –info job_123
Workflow Job Log
oozie job  –log  job_123
 Workflow Job definition
  oozie job  –definition job_123
   Oozie version
   oozie admin –oozie http://localhost:11000/oozie -version

Oozie Coordinator

The Oozie Coordinator supports the automated starting of Oozie workflow process.

It is typically used for the design and execution of recurring invocations of Workflow processed triggered by time and/or data availability.




  
      
         hdfs://bar:9000/app/logs/${YEAR}/${MONTH}/${DAY}/${HOUR}
        
      
       
            
                       ${current(0)}
            
            
            
              
                      
                              hdfs://localhost:9000/WordCountTest_TimeBased
                              
              inputData
              ${data(‘inputLogs’)}
                    
                  
              
          


Oozie commands

Checking the multiple workflow jobs

oozie jobs –oozie http://localhost:11000/oozie -localtime -len -filter status=RUNNING

Checking the status of the multiple coordinator jobs

oozie  jobs –oozie  http://localhost:11000/oozie -jobtype coordinator

Killing a workflow , Coordinator or Bundle Job

oozie  job –oozie  http://localhost:11000/oozie -kill

Checking the Status of a workflow , Coordinator or Bundle Job or a Coordinator Action

oozie job –oozie  http://localhost:11000/oozie –info


Hope this will guide you how to work with Oozie framework.


AddToAny

Contact Form

Name

Email *

Message *