Sunday, February 14, 2016

Getting started Hadoop with oracle or vmware virtual box and Ubuntu

Hadoop installation with Single DataNode( VMware or Oracle virtual box)
Download latest version VM ware from the below link
Download Oracle virtual box from the below site and install the same in local system.
Run the Virtual box(VirtualBox.exe) Application
click on new ->

And click on Next->Next-And create virtual box
Once that’s done virtual box will look like this. Select the Ubuntu downloaded package.

Start the Virtual box, then provide password from which user you want to start.

Once virtual box started then screeb will look like this

Open the terminal, by right click on the screen or search for terminal and open the same.

Command:to update the ubuntu
1. sudo apt-get update
Once update is complete

Command: install openssh server
2. sudo apt-get install openssh–server
Command: create a hadoop directory
3. mkdir /usr/local/hadoop
Download the hadoop latest version from below link
copy to virtual box and extract the tar file
Here I extracted under /usr/local/hadoop/
Command: to extract the tar file
4. tar -xvf .tar.gz
After extracting enter this command ls –lrt , you can see the list of folders related to hadoop
Command: To add hadoop to the group
5. sudo addgroup hadoop
Command: create new user called hduser
6. sudo adduser --ingroup hadoop hduser

Command: assign hduser to sudo
7. sudo adduser hduser sudo
Command: change the owner for hadoop as hduser
8. sudo chown –R hduser:hadoop /usr/local/hadoop
Command: switch to hduser
9. su – hduser

Command: install ssh
10. sudo apt-get install ssh
Command: generate a ssh key
11. ssh-keygen -t rsa –P ""
Command: copy key to authorized_keys
12. cat ~/.ssh/ >> ~/.ssh/authorized_keys
Command: install vim editor
13. sudo apt-get install vim
Command: Edit the sysctl.conf file to dispable few of the ipv6 realted configuration
14. sudo gedit /etc/sysctl.conf or sudo vi /etc/sysctl.conf
Add below lines

Command:Start the ssh
15. ssh localhost
Command: get the updates
16. sudo apt-get update
Command: edit the bashrc file to add the path of java and hadoop
17. sudo vi ./bashrc or sudo gedit ./bashrc

export  HADOOP_HOME = /usr/local/hadoop
         export  JAVA_HOME=/usr   [or] where ever your java installed location
Command: Source the bashrc file
18. source .bashrc

Command: Now check the version of java and hadoop
19. java –version
20. hadoop version
Command:Create a data directory inside /usr/local/hadoop
21. mkdir /usr/loca/hadoop/data
Command: edit the file to add the configuration
22. sudo gedit /usr/loca/hadoop/etc/hadoop/
export JAVA_HOME=/usr 
        export HADOOP_OPTS=”$HADOOP_OPTS – true  -Djava.library.path=$HADOOP_PREFIX/lib”
Command: edit the file to add the configuration
23. sudo gedit /usr/loca/hadoop/etc/hadoop/
        export HADOOP_OPTS=” Djava.library.path=$HADOOP_PREFIX/lib”
Now we need to edit the some of the hadoop related files, to start the single node
Go to /usr/local/hadoop/etc/hadoop$
Command: Edit the existing file and add the below configuration
24. sudo gedit core-site.xml


Command: Rename mapred-site.xml.template to mapred-site.xml
Go to /usr/local/hadoop/etc/hadoop
25. mv mapred-site.xml.template mapred-site.xml
26. sudo gedit mapred-site.xml


Then close this file
Edit the hdfs-site.xml,
Command: to edit the hdfs-site.xml
27. sudo gedit hdfs-site.xml


Command:Edit the yarn.xml
28. sudo gedit yarn.xml





Command: Need to format the namenode
29. /usr/local/hadoop/bin/hadoop namenode –format
After this format done then we need to start the dfs and yarn
30. /usr/local/hadoop/sbin/
31. /usr/local/hadoop/sbin/
Command: to display all the running datanodes and namemodes
32. jps

This is how we can setup the hadoop using oracle/vmware virtual box.

Thank you for viewing this post.

Contact Form


Email *

Message *