Thursday, September 3, 2015

Hadoop setup on Windows and running the word count example

Hadoop setup on Windows

1. Download Hadoop from the link below

http://hadoop.apache.org/releases.html

One additional file is required to run Hadoop on Windows: winutils.exe (Hadoop looks for it in the bin folder under HADOOP_HOME)

2. Unzip the downloaded Hadoop archive
3. Set the Java and Hadoop paths in the environment variables

How to set the path in environment variables

   1. Right-click My Computer and choose Properties, then Advanced system settings
   2. Click Environment Variables
   3. Under User variables, click New
      Variable Name: HADOOP_HOME
      Variable Value: c:\hadoop\hadoop-2.4.1
   4. Under System variables, select Path and click Edit

      Add the Java path up to bin: c:\Java\jdk1.6.0_34\bin;
      Add the Hadoop path up to bin: c:\hadoop\hadoop-2.4.1\bin;
4. Open a new command prompt (so that it picks up the updated variables)

Use the commands below to check whether the Hadoop and Java paths have been set properly

hadoop version
java -version
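
The same check can be scripted. The following is a minimal Python sketch, not part of the original setup: the function name check_hadoop_env is made up for illustration, and it only tests that HADOOP_HOME is set and that the hadoop and java commands can be found on PATH. It does not verify versions.

```python
import os
import shutil


def check_hadoop_env(env=None, which=shutil.which):
    """Return a list of problems; an empty list means the basics look fine.

    `env` and `which` are injectable only so the function is easy to test.
    """
    env = os.environ if env is None else env
    problems = []
    if not env.get("HADOOP_HOME"):
        problems.append("HADOOP_HOME is not set")
    # shutil.which honours PATHEXT on Windows, so it also finds hadoop.cmd
    for tool in ("hadoop", "java"):
        if which(tool) is None:
            problems.append(f"{tool} was not found on PATH")
    return problems


if __name__ == "__main__":
    found = check_hadoop_env()
    for line in found or ["HADOOP_HOME is set and both commands are on PATH"]:
        print(line)
```

If the script reports a problem, revisit the environment variable steps above and open a fresh command prompt before retrying.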

5. Now configure Hadoop so that it can be started
6. Open the hadoop-env.cmd file, which is under c:\hadoop\hadoop-2.4.1\etc\hadoop, and set JAVA_HOME:
set JAVA_HOME=c:\Java\jdk1.6.0_34

7. Open core-site.xml and add the details below. This file is under c:\hadoop\hadoop-2.4.1\etc\hadoop

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
        

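8. Open hdfs-site.xml and add the details below. This file is under c:\hadoop\hadoop-2.4.1\etc\hadoop

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/hadoop/data/dfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/hadoop/data/dfs/datanode</value>
  </property>
  <property>
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>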
 
   
 
9. Open yarn-site.xml and add the details below. This file is under c:\hadoop\hadoop-2.4.1\etc\hadoop

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.application.classpath</name>
    <value>
      %HADOOP_HOME%\etc\hadoop,
      %HADOOP_HOME%\share\hadoop\common\*,
      %HADOOP_HOME%\share\hadoop\common\lib\*,
      %HADOOP_HOME%\share\hadoop\mapreduce\*,
      %HADOOP_HOME%\share\hadoop\mapreduce\lib\*,
      %HADOOP_HOME%\share\hadoop\hdfs\*,
      %HADOOP_HOME%\share\hadoop\hdfs\lib\*,
      %HADOOP_HOME%\share\hadoop\yarn\*,
      %HADOOP_HOME%\share\hadoop\yarn\lib\*
    </value>
  </property>
</configuration>
 
     

   
10. Rename mapred-site.xml.template to mapred-site.xml and add the details below. This file is under c:\hadoop\hadoop-2.4.1\etc\hadoop

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
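
All four site files edited in steps 7 to 10 use the same layout: a configuration element containing property entries, each with a name and a value. As a rough illustration of that layout, here is a small Python sketch that builds and reads such a document with the standard library. The helper names build_site_xml and read_site_xml are made up for this example and are not part of Hadoop.

```python
import xml.etree.ElementTree as ET


def build_site_xml(properties):
    """Build a Hadoop-style *-site.xml string from a {name: value} mapping."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def read_site_xml(xml_text):
    """Parse a *-site.xml string back into a {name: value} mapping."""
    root = ET.fromstring(xml_text)
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


if __name__ == "__main__":
    # Values taken from the core-site.xml step of this post
    print(build_site_xml({"fs.defaultFS": "hdfs://localhost:9000"}))
```

Reading a hand-edited file back with read_site_xml is a quick way to catch a missing closing tag before starting the daemons.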
         
    


11. After successful configuration, check that everything works

Open a command prompt and go to the Hadoop installation folder

Basic commands for Hadoop:
     1. Format the namenode using the following command
        c:\hadoop\hadoop-2.4.1\bin>hdfs namenode -format
     2. Start the namenode and datanode using the following command
        c:\hadoop\hadoop-2.4.1\sbin>start-dfs.cmd

        After running the command, 2 windows will open, named namenode and datanode

     3. Then start the resource manager and node manager using the following command

        c:\hadoop\hadoop-2.4.1\sbin>start-yarn.cmd

        After running the command, 2 windows will open, named resourcemanager and nodemanager
       

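12. Once the namenode, datanode, resourcemanager and nodemanager have started, open the following URLs in a browser to check that Hadoop was installed successfully (in Hadoop 2.x, port 50070 serves the namenode web UI and port 50075 the datanode web UI)

http://localhost:50070
http://localhost:50075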
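
As an alternative to opening each page by hand, a short Python sketch can test whether the ports are accepting connections. The port numbers come from the URLs used in this post; the labels in WEB_UIS and the function name port_is_open are only illustrative. An open port shows that something is listening, not that the daemon is healthy.

```python
import socket

# Ports taken from the URLs in this post; the labels are for display only
WEB_UIS = {"namenode": 50070, "datanode": 50075, "resourcemanager": 8088}


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in WEB_UIS.items():
        state = "up" if port_is_open("localhost", port) else "not reachable"
        print(f"{name} (port {port}): {state}")
```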

13. Now run the sample wordcount example, a MapReduce program that ships with Hadoop
   Problem: Find out how many times each word occurs in the given file
   Solution:
   1. Create a file with any name (here, input.txt)

     input.txt -> under c:\hadoop\

   2. Create an input directory in HDFS

    c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -mkdir -p input

   3. Copy the file from the local file system to the HDFS input directory

    c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -copyFromLocal c:\hadoop\input.txt input

   4. Verify that the file was copied

    c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -ls input

     If this does not display the result, run the same command from the installation folder

    c:\hadoop\hadoop-2.4.1>hdfs dfs -ls input
      
   5. Run the wordcount program

    c:\hadoop\hadoop-2.4.1\bin>yarn jar c:\hadoop\hadoop-2.4.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.4.1.jar wordcount input/ output/

   6. Verify the result. output is a directory, so print the files inside it

    c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -cat output/*

   The status of the job and its output can also be checked through the web URLs

    http://localhost:50075
    http://localhost:8088/cluster
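
To make clear what the wordcount job computes, here is a small in-memory Python sketch of the same idea: a map step that emits (word, 1) for each whitespace-separated token and a reduce step that sums the counts per word. This is not the Hadoop implementation, and the function names and sample lines are made up for illustration. The output format (word, a tab, then the count, sorted by word) mirrors what the job is expected to write for a small input.

```python
from collections import Counter


def map_words(line):
    """Map step: emit a (word, 1) pair for every whitespace-separated token."""
    return [(word, 1) for word in line.split()]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return totals


def word_count(lines):
    """Run the map step over every line, then reduce all emitted pairs."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return reduce_counts(pairs)


def format_output(totals):
    """Render one 'word<TAB>count' line per word, sorted by word."""
    return [f"{word}\t{count}" for word, count in sorted(totals.items())]


if __name__ == "__main__":
    sample = ["hello hadoop", "hello windows"]  # stand-in for input.txt
    print("\n".join(format_output(word_count(sample))))
```

Running the sketch on the contents of input.txt and comparing it with the output of step 6 is a simple sanity check for small files.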
