Learn and shine: Getting started with Hadoop on Oracle VirtualBox or VMware and Ubuntu

Sunday, February 14, 2016

Hadoop installation with a single DataNode (VMware or Oracle VirtualBox)
Download the latest version of VMware from the link below:
http://www.traffictool.net/vmware/
Download Oracle VirtualBox from the site below and install it on your local system:
http://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html
Run the VirtualBox application (VirtualBox.exe).
Click New ->

Then click Next -> Next and create the virtual machine.
Once that is done, the virtual machine is listed in VirtualBox. Select the downloaded Ubuntu package.

Start the virtual machine, then enter the password for the user you want to log in as.

Once the virtual machine has started, the Ubuntu desktop appears.

Open a terminal, either by right-clicking on the desktop or by searching for "terminal".

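Command: update the Ubuntu package lists
1. sudo apt-get update
Once the update is complete, continue with the next step.

Command: install the OpenSSH server
2. sudo apt-get install openssh-server
Command: create a hadoop directory (sudo is needed because /usr/local belongs to root)
3. sudo mkdir -p /usr/local/hadoop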
Download the latest Hadoop version from the link below:
http://hadoop.apache.org/releases.html
Copy the downloaded archive into the virtual machine and extract it.
Here I extracted it under /usr/local/hadoop/
Command: extract the tar file (use the file name of the archive you downloaded)
4. tar -xvf .tar.gz
After extracting, run ls -lrt to see the list of Hadoop folders.
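One thing to watch here: Hadoop release archives unpack into a single top-level folder, so extracting under /usr/local/hadoop/ puts bin/ and etc/ one level deeper than the /usr/local/hadoop/etc/hadoop paths used in the later steps. A minimal sketch of one way around this is shown below. The function name extract_flat is made up for this example, and the archive path in the usage comment is a placeholder, not a real file name.

```shell
#!/bin/sh
# extract_flat ARCHIVE DEST   (hypothetical helper, not part of Hadoop)
# Unpack ARCHIVE into DEST and drop the archive's single top-level folder,
# so that bin/, etc/ and sbin/ end up directly under DEST.
extract_flat() {
    archive="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    # --strip-components=1 removes the first path component of every entry.
    tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Usage (run as a user who can write to the destination):
#   extract_flat /path/to/downloaded-archive.tar.gz /usr/local/hadoop
#   ls -lrt /usr/local/hadoop
```

If you prefer to keep the plain tar -xvf command, you can instead move the contents of the extracted folder up one level by hand.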
Command: create a group called hadoop
5. sudo addgroup hadoop
Command: create a new user called hduser in that group
6. sudo adduser --ingroup hadoop hduser

Command: add hduser to the sudo group
7. sudo adduser hduser sudo
Command: make hduser the owner of the hadoop directory
8. sudo chown -R hduser:hadoop /usr/local/hadoop
Command: switch to hduser
9. su - hduser

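Command: install ssh
10. sudo apt-get install ssh
Command: generate an ssh key with an empty passphrase
11. ssh-keygen -t rsa -P ""
When prompted for the file in which to save the key, press Enter to accept the default:
/home/hduser/.ssh/id_rsa
Command: append the id_rsa.pub key to authorized_keys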
12. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
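Running the cat command twice adds the key twice, and sshd is usually strict about the permissions on the .ssh folder and the authorized_keys file. The sketch below shows one way to make this step repeatable. The function name authorize_key is made up for this example, and the 700/600 permissions are the commonly recommended values rather than something taken from this setup.

```shell
#!/bin/sh
# authorize_key PUBKEY_FILE SSH_DIR   (hypothetical helper)
# Append the public key to SSH_DIR/authorized_keys only if it is not
# already listed, then tighten the permissions.
authorize_key() {
    pubkey_file="$1"
    ssh_dir="$2"
    auth="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    touch "$auth" || return 1
    key=$(cat "$pubkey_file") || return 1
    # -x matches the whole line, -F treats the key as a fixed string.
    grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    chmod 600 "$auth"
}

# Usage:
#   authorize_key ~/.ssh/id_rsa.pub ~/.ssh
```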
Command: install vim editor
13. sudo apt-get install vim
Command: edit the sysctl.conf file to disable IPv6
14. sudo gedit /etc/sysctl.conf or sudo vi /etc/sysctl.conf
Add the lines below:

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
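A typo in one of these keys is easy to miss, because the kernel only knows the all, default and lo (loopback) variants of disable_ipv6. The sketch below is one way to check the file before reloading it. The function name ipv6_settings_missing is made up for this example.

```shell
#!/bin/sh
# ipv6_settings_missing FILE   (hypothetical helper)
# Print every expected key that FILE does not set to 1.
# No output means all three settings are present.
ipv6_settings_missing() {
    file="$1"
    for key in net.ipv6.conf.all.disable_ipv6 \
               net.ipv6.conf.default.disable_ipv6 \
               net.ipv6.conf.lo.disable_ipv6; do
        # Escape the dots so they match literally in the regular expression.
        pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
        grep -Eq "^[[:space:]]*${pattern}[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$file" \
            || echo "$key"
    done
}

# Usage:
#   ipv6_settings_missing /etc/sysctl.conf
#   sudo sysctl -p                                  # reload /etc/sysctl.conf
#   cat /proc/sys/net/ipv6/conf/all/disable_ipv6    # 1 means IPv6 is disabled
```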

Command: connect to localhost over ssh to check that the key-based login works
15. ssh localhost
Command: get the updates
16. sudo apt-get update
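Command: edit the .bashrc file in hduser's home directory to add the paths of Java and Hadoop
17. vi ~/.bashrc or gedit ~/.bashrc

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr   # or wherever Java is installed on your system
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin   # so that the hadoop command is found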

Command: source the .bashrc file
18. source ~/.bashrc

Command: now check the versions of Java and Hadoop
19. java -version
20. hadoop version
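If either command is not found, the usual cause is a wrong path in .bashrc. The sketch below checks the two variables and the PATH in one go. The function name check_hadoop_env is made up for this example, and it assumes the layout used in this post (bin/hadoop under HADOOP_HOME and bin/java under JAVA_HOME).

```shell
#!/bin/sh
# check_hadoop_env   (hypothetical helper)
# Print one line per problem found; return 0 only if everything looks usable.
check_hadoop_env() {
    rc=0
    if [ ! -x "${HADOOP_HOME:-}/bin/hadoop" ]; then
        echo "HADOOP_HOME: no executable bin/hadoop under '${HADOOP_HOME:-}'"
        rc=1
    fi
    if [ ! -x "${JAVA_HOME:-}/bin/java" ]; then
        echo "JAVA_HOME: no executable bin/java under '${JAVA_HOME:-}'"
        rc=1
    fi
    # command -v succeeds only if the shell can find hadoop on the PATH.
    if ! command -v hadoop >/dev/null 2>&1; then
        echo "PATH: the hadoop command was not found"
        rc=1
    fi
    return $rc
}

# Usage:
#   source ~/.bashrc
#   check_hadoop_env && echo "environment looks fine"
```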
Command: create a data directory inside /usr/local/hadoop
21. mkdir /usr/local/hadoop/data
Command: edit the hadoop-env.sh file to add the configuration
22. sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"

Command: edit the yarn-env.sh file to add the configuration
23. sudo gedit /usr/local/hadoop/etc/hadoop/yarn-env.sh

export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX:-"lib/native"}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

Now we need to edit some of the Hadoop configuration files to start the single node.
Go to /usr/local/hadoop/etc/hadoop
Command: edit the existing file and add the configuration below
24. sudo gedit core-site.xml


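<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/data</value>
</property>
</configuration>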
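Each Hadoop *-site.xml file wraps its settings in a configuration element, with one property element (holding a name and a value) per setting. If you would rather generate the file than type the tags by hand, the sketch below shows one way. The function name write_site_xml is made up for this example, it overwrites the target file, and it does not escape XML special characters in the values.

```shell
#!/bin/sh
# write_site_xml FILE NAME=VALUE [NAME=VALUE ...]   (hypothetical helper)
# Write a Hadoop-style configuration file with one <property> per pair.
# Warning: FILE is overwritten, and values are written without XML escaping.
write_site_xml() {
    file="$1"
    shift
    {
        printf '%s\n' '<?xml version="1.0"?>'
        printf '%s\n' '<configuration>'
        for pair in "$@"; do
            name=${pair%%=*}    # text before the first '='
            value=${pair#*=}    # text after the first '='
            printf '%s\n' '  <property>'
            printf '%s\n' "    <name>$name</name>"
            printf '%s\n' "    <value>$value</value>"
            printf '%s\n' '  </property>'
        done
        printf '%s\n' '</configuration>'
    } > "$file"
}

# Usage (from /usr/local/hadoop/etc/hadoop):
#   write_site_xml core-site.xml \
#       fs.default.name=hdfs://localhost:9000 \
#       hadoop.tmp.dir=/usr/local/hadoop/data
```

The same helper would work for the other site files edited in the following steps, since they all share this structure.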

Command: Rename mapred-site.xml.template to mapred-site.xml
Go to /usr/local/hadoop/etc/hadoop
25. mv mapred-site.xml.template mapred-site.xml
26. sudo gedit mapred-site.xml


    
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

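Then close this file.
Command: edit the hdfs-site.xml file
27. sudo gedit hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
Note: with a single DataNode, HDFS cannot actually store 3 copies of a block, so blocks will be reported as under-replicated. A value of 1 is the usual choice for a single-node setup.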

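Command: edit the yarn-site.xml file
28. sudo gedit yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>localhost:8025</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>localhost:8030</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>localhost:8050</value>
</property>
</configuration>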

Command: format the namenode
29. /usr/local/hadoop/bin/hadoop namenode -format
Once the format is done, start DFS and YARN:
30. /usr/local/hadoop/sbin/start-dfs.sh
31. /usr/local/hadoop/sbin/start-yarn.sh
Command: list the running Hadoop Java processes, including the NameNode and DataNode
32. jps
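On a single node, after both start scripts have run, jps normally lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below reports which of these are missing from the jps output. The function name missing_daemons is made up for this example.

```shell
#!/bin/sh
# missing_daemons   (hypothetical helper)
# Read jps output on standard input and print each expected Hadoop daemon
# that is not listed. No output means all five are running.
missing_daemons() {
    listing=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # jps prints "<pid> <name>"; compare the name as the whole second field
        # so that NameNode does not also match SecondaryNameNode.
        printf '%s\n' "$listing" \
            | awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }' \
            || echo "$daemon"
    done
}

# Usage:
#   jps | missing_daemons
```

If a daemon is reported as missing, its log file under /usr/local/hadoop/logs is the first place to look.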