Hadoop setup on Windows
1. Download Hadoop from the following link:
   http://hadoop.apache.org/releases.html
   File required to run Hadoop on Windows: winutils.exe (copy it into the Hadoop bin directory)
2. Unzip the downloaded Hadoop archive (e.g. to c:\hadoop\hadoop-2.4.1)
3. Set up the Java and Hadoop paths in the environment variables.
   How to set up a path in the environment variables:
   1. Right-click on My Computer and choose Properties -> Advanced system settings
   2. Click on Environment Variables
   3. Under User variables, click on New:
      Variable name: HADOOP_HOME
      Variable value: c:\hadoop\hadoop-2.4.1
   4. Under System variables, select Path and click on Edit, then append:
      the Java path up to bin: c:\Java\jdk1.6.0_34\bin;
      the Hadoop path up to bin: c:\hadoop\hadoop-2.4.1\bin;
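A quick way to sanity-check the variables from step 3 is to read them back programmatically. Below is a minimal Python sketch; check_hadoop_env is an illustrative helper (not part of Hadoop), and the example values match the paths used in this guide, so adjust them to your machine. If you pass os.environ instead, run it from a fresh command prompt so the new variables are picked up.

```python
def check_hadoop_env(env):
    """Check that HADOOP_HOME is set and that its bin directory is on PATH.

    env is a mapping like os.environ; returns (hadoop_home, bin_on_path).
    """
    hadoop_home = env.get("HADOOP_HOME")
    # Windows PATH entries are separated by semicolons.
    path_entries = [p.strip().rstrip("\\").lower()
                    for p in env.get("PATH", "").split(";")]
    bin_on_path = (hadoop_home is not None and
                   (hadoop_home.rstrip("\\") + "\\bin").lower() in path_entries)
    return hadoop_home, bin_on_path

print(check_hadoop_env({
    "HADOOP_HOME": "c:\\hadoop\\hadoop-2.4.1",
    "PATH": "c:\\Java\\jdk1.6.0_34\\bin;c:\\hadoop\\hadoop-2.4.1\\bin",
}))  # ('c:\\hadoop\\hadoop-2.4.1', True)
```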
4. Go to the command prompt and use the commands below to check whether the Hadoop and Java paths have been set properly:
   hadoop version
   java -version
5. Now it is time to configure and start Hadoop.
6. Open the hadoop-env.cmd file, which is under c:\hadoop\hadoop-2.4.1\etc\hadoop, and set the Java home:
   set JAVA_HOME=c:\Java\jdk1.6.0_34
7. Open core-site.xml (under c:\hadoop\hadoop-2.4.1\etc\hadoop) and add the details below:
   <configuration>
     <property>
       <name>fs.defaultFS</name>
       <value>hdfs://localhost:9000</value>
     </property>
   </configuration>
8. Open hdfs-site.xml (under c:\hadoop\hadoop-2.4.1\etc\hadoop) and add the details below:
   <configuration>
     <property>
       <name>dfs.replication</name>
       <value>1</value>
     </property>
     <property>
       <name>dfs.namenode.name.dir</name>
       <value>file:/hadoop/data/dfs/namenode</value>
     </property>
     <property>
       <name>dfs.datanode.data.dir</name>
       <value>file:/hadoop/data/dfs/datanode</value>
     </property>
     <property>
       <name>dfs.webhdfs.enabled</name>
       <value>true</value>
     </property>
   </configuration>
9. Open yarn-site.xml (under c:\hadoop\hadoop-2.4.1\etc\hadoop) and add the details below:
   <configuration>
     <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
     </property>
     <property>
       <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
       <value>org.apache.hadoop.mapred.ShuffleHandler</value>
     </property>
     <property>
       <name>yarn.application.classpath</name>
       <value>%HADOOP_HOME%\etc\hadoop, %HADOOP_HOME%\share\hadoop\common\*, %HADOOP_HOME%\share\hadoop\common\lib\*, %HADOOP_HOME%\share\hadoop\mapreduce\*, %HADOOP_HOME%\share\hadoop\mapreduce\lib\*, %HADOOP_HOME%\share\hadoop\hdfs\*, %HADOOP_HOME%\share\hadoop\hdfs\lib\*, %HADOOP_HOME%\share\hadoop\yarn\*, %HADOOP_HOME%\share\hadoop\yarn\lib\*</value>
     </property>
   </configuration>
10. Rename mapred-site.xml.template to mapred-site.xml (under c:\hadoop\hadoop-2.4.1\etc\hadoop) and add the details below:
   <configuration>
     <property>
       <name>mapreduce.framework.name</name>
       <value>yarn</value>
     </property>
   </configuration>
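All of the *-site.xml files above share the same layout: a root <configuration> element containing <property> elements, each with a <name> and a <value>. A quick way to catch typos is to parse a file back into a dict; below is a minimal Python sketch (read_properties is an illustrative helper, not part of Hadoop), using the core-site.xml content from step 7 as the sample:

```python
import xml.etree.ElementTree as ET

# Sample content in the Hadoop *-site.xml format (here: core-site.xml).
CORE_SITE = """
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
"""

def read_properties(xml_text):
    """Parse a Hadoop *-site.xml string into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    return {prop.findtext("name"): prop.findtext("value")
            for prop in root.findall("property")}

print(read_properties(CORE_SITE))  # {'fs.defaultFS': 'hdfs://localhost:9000'}
```

If the XML is malformed (e.g. a missing closing tag), ET.fromstring raises a ParseError, which points you at the broken file before Hadoop does.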
11. After successful configuration, we need to check whether everything is working:
    Go to the command prompt and change into the Hadoop installation directory.
    Basic commands for Hadoop:
    1. Format the namenode using the following command:
       c:\hadoop\hadoop-2.4.1\bin>hdfs namenode -format
    2. Start the namenode and datanode using the following command:
       c:\hadoop\hadoop-2.4.1\sbin>start-dfs.cmd
       After running the command, two windows will open, named namenode and datanode.
    3. Start the resourcemanager and nodemanager using the following command:
       c:\hadoop\hadoop-2.4.1\sbin>start-yarn.cmd
       After running the command, two windows will open, named resourcemanager and nodemanager.
12. After the namenode, datanode, resourcemanager and nodemanager have started successfully, check the web UIs to confirm that Hadoop is installed correctly:
    http://localhost:50070 (namenode)
    http://localhost:50075 (datanode)
13. Now we need to run the sample wordcount MapReduce example in Hadoop.
    Problem: find out how many times each word occurs in a given file.
    Solution:
    1. Create a file with any name (e.g. input.txt) under c:\hadoop\
    2. Create an input directory in HDFS:
       c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -mkdir -p input
    3. Copy the file from the local filesystem to the HDFS input directory:
       c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -copyFromLocal c:\hadoop\input.txt input
    4. Verify that the file was copied:
       c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -ls input
       If it does not display the result, use the following command instead:
       c:\hadoop\hadoop-2.4.1>hdfs dfs -ls input
    5. Run the wordcount program:
       c:\hadoop\hadoop-2.4.1\bin>yarn jar c:\hadoop\hadoop-2.4.1\share\hadoop\mapreduce\hadoop-mapreduce-examples-2.4.1.jar wordcount input/ output/
    6. Verify the result:
       c:\hadoop\hadoop-2.4.1\bin>hdfs dfs -cat output/*
       You can also verify the status of the job and its output through the web UIs:
       http://localhost:50075
       http://localhost:8088/cluster
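The wordcount job in the examples jar splits the input on whitespace in the map phase and sums a count per word in the reduce phase. A minimal local Python sketch of the same logic (not the actual MapReduce implementation) is useful for predicting what the job should print for a small input.txt:

```python
from collections import Counter

def wordcount(text):
    """Count whitespace-separated words, like the examples-jar wordcount job."""
    # Map phase: emit one occurrence per word; reduce phase: sum counts per word.
    return Counter(text.split())

sample = "hadoop is fun and hadoop is fast"
print(wordcount(sample))
# Counter({'hadoop': 2, 'is': 2, 'fun': 1, 'and': 1, 'fast': 1})
```

Each (word, count) pair in this dict corresponds to one line of the part-r-00000 file that `hdfs dfs -cat output/*` displays in step 6.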