Learn and shine

Friday, March 25, 2016

HBase Basics, HBase Architecture, Getting started with No SQL Database HBase, HBase Components

This post covers the history of HBase, the HBase architecture, basic HBase concepts, and the different types of NoSQL databases.
History of HBase
The ideas behind HBase started at Google. Each Google system has an open-source counterpart in the Hadoop ecosystem:
GFS -> HDFS
MapReduce -> Hadoop MapReduce
Bigtable -> Apache HBase

Any SQL system (RDBMS)
1. As user data grows, we add a caching mechanism to improve performance.
2. The caching mechanism also has its limits.
3. Next we remove indexes.
4. Then we avoid joins.
5. Finally we fall back on materialized views.
Once we do all of the above, most advantages of an RDBMS are gone.
Google faced the same problem, which led them to build Bigtable.
We use HBase when we need fast random access.
Features that Hive does not support well, such as row-level CRUD operations, can be handled with HBase.
HBase is very useful when data must be updated and read in real time.
Real-time ad targeting is a typical use case.
What are the common requirements that are hard to meet with existing data processing in Hadoop or Hive?
1. Huge data
2. Fast random access
3. Structured data
4. Variable schema: columns can be added at runtime, which an RDBMS does not support.
5. Need for compression
6. Need for distribution (sharding)
How would a traditional system (RDBMS) solve this?
Case: suppose we want to design a LinkedIn-style database to maintain connections.
There are 2 tables
1. Users: id, Name, Sex, age
2. Connections: User_id, Connection_id, type
In HBase, by contrast, we can store all the details about users and their connections in the same column family.
Characteristics of a probable solution
1. Distributed Database
2. Sorted Data
3. Sparse Data Store
4. Automatic Sharding.

Sorted Data
Example: how is data stored in sorted order?
1. www.abc.com
2. www.ghf.com
3. mail.abc.com
With normal storage, keys are sorted by their leading characters, so www.abc.com and mail.abc.com end up far apart. When a user scans for abc.com, mail.abc.com is not returned.

If we reverse the domain name before storing it, the sorted data looks like this.
com.abc.mail
com.abc.www
com.ghf.www

Stored this way, all rows for abc.com are adjacent and can be read together.
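The reversed-domain idea can be sketched in a few lines of Python. This is only an illustration of the key design: `reverse_domain` and `prefix_scan` are hypothetical helper names, and a plain sorted list stands in for HBase's sorted row keys.

```python
def reverse_domain(host):
    """Reverse the dot-separated labels of a host name (hypothetical helper)."""
    return ".".join(reversed(host.lower().split(".")))


def prefix_scan(sorted_keys, prefix):
    """Return every key that starts with the prefix, like a row-key prefix scan."""
    return [key for key in sorted_keys if key.startswith(prefix)]


hosts = ["www.abc.com", "www.ghf.com", "Mail.abc.com"]

# Sorting the reversed names places all abc.com hosts next to each other.
row_keys = sorted(reverse_domain(h) for h in hosts)

print(row_keys)
print(prefix_scan(row_keys, "com.abc."))
```

Because the keys share the prefix com.abc., one contiguous range read returns both the www and the mail rows.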
Sparse Data Store
Sparse is a mathematical term (as in a sparse matrix). If a column has a null value for a particular row, nothing is stored for that cell.
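A minimal sketch of the sparse idea, assuming a toy in-memory table (the `SparseTable` class and its method names are made up for illustration and are not an HBase API): only cells that actually have a value take up space.

```python
class SparseTable:
    """Toy sparse store: keeps only the cells that have a value."""

    def __init__(self):
        self.cells = {}  # maps (row_key, column) -> value

    def put(self, row_key, column, value):
        if value is None:
            return  # a null cell is simply not stored
        self.cells[(row_key, column)] = value

    def get(self, row_key, column):
        # A missing cell reads back as None, like a NULL column.
        return self.cells.get((row_key, column))


table = SparseTable()
table.put("user1", "name", "siva")
table.put("user1", "age", None)   # nothing is written for this cell
table.put("user2", "name", "raju")
table.put("user2", "age", 30)

print(len(table.cells))
```

Two rows with two columns each would need four slots in a fixed-width layout; here only three cells exist.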

No SQL Landscape

1. The NoSQL databases mentioned above are not all the same; each was developed for its own purpose.
2. Dynamo was developed by Amazon and is available in the cloud, where we can access it.
3. Cassandra was developed at Facebook, which also uses it. It combines ideas from Dynamo and from Bigtable, the model behind HBase.

No NoSQL database has all three characteristics of the CAP theorem (consistency, availability, partition tolerance).
Each satisfies only two of these properties at the same time.

HBase Definition
It is a non-relational (NoSQL) database that stores data as key-value pairs. It is also called the Hadoop database.
1. Sparse
2. Distributed
3. Multi-dimensional (a cell is addressed by row key, column name, and timestamp)
4. Sorted map
5. Consistent

Difference between HBase and RDBMS

When to use HBase

When not to use HBase?
1. When you have only a few thousand or a few million records, HBase is not advisable.
2. HBase lacks RDBMS commands. If the application requires SQL features, do not choose HBase.
3. When the cluster has fewer than 5 DataNodes (HDFS block replication defaults to 3), there is no need for HBase. It only adds overhead to the system.

HBase can run on a local system, but this should be considered a development configuration only.
How Facebook uses HBase for its messaging system

1. Facebook monitored its usage and figured out what it really needed.
What it needed was a system that could handle two types of data patterns
1. A short set of temporal data that tends to be volatile
2. An ever-growing set of data that is rarely accessed.

1. Real Time
2. Key Value
3. Linearly
4. Big Data
5. Column oriented
6. Distributed
7. Robust
8. Scalable
9. Open source
These are the characteristics of HBase.
HBase is used not only by Facebook but also by Twitter, Yahoo, and others to process their large volumes of data.

Major components of HBase
1. The HBase Master
It keeps track of all HBase tables and coordinates the region servers. Its role is similar to that of the NameNode in HDFS.
2. The HRegionServer
The actual data is stored on and served from this server.
3. The HBase Client
We use the client to perform CRUD operations and process the data.
How is data distributed in HBase?

We have data rows with keys from A to Z
     Rows                   Region                    Server
     A1, A2                 Region Null - A3          Region server1
     B2, B3, B23, B43       Region B2 - B43           Region server2
     K1, K2, Z30            Region K1 - Z30           Region server3
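The lookup from row key to region server can be sketched as a binary search over sorted region start keys. This is a simplified model, not HBase client code: the start keys and server names below are hypothetical, and each region is assumed to run from its start key up to the next region's start key.

```python
import bisect

# Hypothetical layout: region i covers [REGION_START_KEYS[i], REGION_START_KEYS[i + 1]).
# An empty start key means "from the beginning of the table".
REGION_START_KEYS = ["", "B", "K"]
REGION_SERVERS = ["regionserver1", "regionserver2", "regionserver3"]


def locate(row_key):
    """Return the server whose region contains row_key."""
    # bisect_right finds the first start key greater than row_key;
    # the region just before that position owns the key.
    index = bisect.bisect_right(REGION_START_KEYS, row_key) - 1
    return REGION_SERVERS[index]


for key in ["A1", "B23", "K1", "Z30"]:
    print(key, "->", locate(key))
```

Because row keys are kept sorted, finding the owning region never needs a scan over all rows, only a search over the region boundaries.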

How does HBase write data?

1. Every write to HBase requires confirmation from both the Write Ahead Log (WAL) and the MemStore.
2. These two steps ensure that every write to HBase happens as fast as possible while maintaining durability.
3. The MemStore is flushed to a new HFile when it fills up.
4. The MemStore flush threshold is configurable (hbase.hregion.memstore.flush.size, 128 MB by default in recent releases). Once it is reached, the contents are written to an HFile, whose default block size is 64 KB.
5. Once written, an HFile is immutable.
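The write steps above can be modeled with a toy class. This is a rough sketch under simplifying assumptions: `TinyRegion` is a made-up name, the flush threshold counts rows instead of bytes, and Python lists and tuples stand in for the WAL and HFiles.

```python
class TinyRegion:
    """Toy model of the HBase write path: WAL, MemStore, then flush to an HFile."""

    def __init__(self, flush_threshold=3):
        self.wal = []            # append-only log, written first for durability
        self.memstore = {}       # in-memory buffer of recent writes
        self.hfiles = []         # flushed snapshots, never modified afterwards
        self.flush_threshold = flush_threshold

    def put(self, row_key, value):
        self.wal.append((row_key, value))   # step 1: record the write in the WAL
        self.memstore[row_key] = value      # step 2: apply it to the MemStore
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # The snapshot is sorted by row key and stored as a tuple (immutable).
        snapshot = tuple(sorted(self.memstore.items()))
        self.hfiles.append(snapshot)
        self.memstore = {}


region = TinyRegion(flush_threshold=3)
for row_key, value in [("b", "2"), ("a", "1"), ("c", "3"), ("d", "4")]:
    region.put(row_key, value)

print(region.hfiles)
print(region.memstore)
```

After four writes the first three rows sit in one sorted, immutable snapshot and the fourth is still in memory, which mirrors the flush behavior described in the list.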
How does HBase read data?
1. Data is reconciled from the block cache, the MemStore, and the HFiles to give the client an up-to-date view of the rows it requested.
2. HFiles contain a snapshot of the MemStore at the point when it was flushed. Data for a complete row can be spread across multiple HFiles.
3. To read a complete row, HBase must read across all HFiles that might contain information for that row and compose the complete record.

HFile Compaction

During compaction, multiple HFiles are merged and rewritten as a single compacted HFile.
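A small sketch of the read and compaction ideas, assuming each HFile and the MemStore can be modeled as a dict of row key to columns. The function names and sample values are hypothetical, and the block cache is left out to keep the example short.

```python
def read_row(row_key, memstore, hfiles):
    """Compose one row from all HFiles (oldest first) and then the MemStore."""
    result = {}
    for source in list(hfiles) + [memstore]:
        # Later sources are newer, so their column values overwrite older ones.
        result.update(source.get(row_key, {}))
    return result


def compact(hfiles):
    """Merge all HFiles into one, keeping the newest value of every column."""
    merged = {}
    for hfile in hfiles:  # oldest to newest
        for row_key, columns in hfile.items():
            merged.setdefault(row_key, {}).update(columns)
    return [merged]


hfiles = [
    {"user1": {"name": "siva"}},   # older flush
    {"user1": {"city": "pune"}},   # newer flush
]
memstore = {"user1": {"city": "bangalore"}}  # most recent, not yet flushed

print(read_row("user1", memstore, hfiles))
print(compact(hfiles))
```

The row is assembled from two files plus memory before compaction; afterwards the same read touches a single file and returns the same result.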
HBase Components
1. Region: a range of rows stored together
2. Region server: serves one or more regions
a. A region is served by only one region server
3. Master server: the daemon responsible for managing the HBase cluster.
4. HBase stores its data in HDFS and relies on HDFS's high availability and fault tolerance.
HBase Architecture

This architecture shows how HBase works.

These are the basics of HBase. My next post will show how to install and work with HBase.
Thank you very much for viewing this post.

Thursday, March 24, 2016

Hive Dynamic , Static Partitions,User defined functions(UDF) with Java

This post covers more advanced Hive concepts: dynamic partitions, static partitions, custom map/reduce scripts, and Hive UDFs using Java and Python.

Configuring Hive to allow partitions
A query across all partitions can trigger an enormous MapReduce job if the table data and the number of partitions are large. A highly recommended safety measure is to put Hive into strict mode, which prohibits queries on a partitioned table without a WHERE clause that filters the partitions.
For dynamic partitioning, we set the dynamic partition mode to nonstrict, as in the following session.

Dynamic Partitioning –configuration

Hive> set hive.exec.dynamic.partition.mode=nonstrict;
Hive> set hive.exec.dynamic.partition=true;
Hive> set hive.enforce.bucketing=true;

Once this is configured, we can create a dynamically partitioned table.

Example:
Source table:
Hive> create table transaction_records(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, City STRING, State STRING, Spendby STRING) row format delimited fields terminated by ',' stored as textfile;
Create Partitioned table:
Hive> create table transaction_recordsByCat(txnno INT, txndate STRING, custno INT, amount DOUBLE, product STRING, City STRING, State STRING, Spendby STRING)
partitioned by (category STRING)
clustered by (State) INTO 10 buckets
row format delimited fields terminated by ',' stored as textfile;

In the partitioned table above, we partition on category and bucket on state into 10 buckets. Hive creates buckets 0-9 and assigns each row to a bucket based on the hash of its state value.
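The bucket assignment can be approximated as "hash of the column value modulo the number of buckets". The sketch below uses a Java-style string hash as a stand-in; this is an assumption for illustration, since the exact hash function Hive applies depends on the Hive version and column type. Both function names are hypothetical.

```python
def java_string_hash(text):
    """Java-style String.hashCode(), reduced to a signed 32-bit integer."""
    h = 0
    for ch in text:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h - (1 << 32) if h >= (1 << 31) else h


def bucket_for(value, num_buckets=10):
    """Map a column value to a bucket number in the range 0..num_buckets-1."""
    # Masking with 0x7FFFFFFF keeps the hash non-negative before the modulo.
    return (java_string_hash(value) & 0x7FFFFFFF) % num_buckets


for state in ["Texas", "Ohio", "Utah"]:
    print(state, "-> bucket", bucket_for(state))
```

Rows with the same state always land in the same bucket, which is what makes bucket-wise sampling and map-side joins possible.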

The category column is not listed in the table's column definitions because it is the partition column.

Insert the existing table data into the newly created partitioned table.

Hive> from transaction_records txn INSERT OVERWRITE TABLE transaction_recordsByCat PARTITION(category) select txn.txnno, txn.txndate, txn.custno, txn.amount,
txn.product, txn.City, txn.State, txn.Spendby, txn.category DISTRIBUTE BY category;

Static partition
If data arrives every month and is processed per month, we can use static partitions.

Hive> create table logmessage(name string, id int) partitioned by (year int, month int) row format delimited fields terminated by '\t';

How do we add a partition to a static partition table?

Hive>alter table logmessage add partition(year=2014,month=2);

Custom Map Reduce script using Hive

Hive QL lets traditional map/reduce programmers plug in their custom mappers and reducers to do more sophisticated analysis that may not be supported by the built-in capabilities of the language.

Sample data scenario
We have movie rating data: different users give ratings for the same movie or for different movies.

The user_movie_data.txt file has tab-separated data like below (userid, rating, unixtime)

1      1       134564324567
2      3       134564324567
3      1       134564324567
4      2       134564324567
5      2       134564324567
6      1       134564324567

With the above data, we create a table called u_movie_data and load the data into it.

Hive> CREATE TABLE u_movie_data(userid INT, rating INT, unixtime STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' STORED AS TEXTFILE;
Hive> LOAD DATA LOCAL INPATH '/usr/local/hive_demo/user_movie_data.txt' OVERWRITE INTO TABLE u_movie_data;

We can use any custom logic to convert the Unix time into a weekday. Here we use a Python script, saved as weekday_mapper.py.

import sys
import datetime

# Read tab-separated rows from stdin and replace the Unix time with the ISO weekday (1 = Monday).
for line in sys.stdin:
    line = line.strip()
    userid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, rating, str(weekday)]))

To execute the Python script in Hive, first add the file in the Hive shell.

Hive> add FILE /usr/local/hive_demo/weekday_mapper.py;

Now use TRANSFORM to load the data into a new table. The columns match the three-column u_movie_data table, and the target table u_movie_data_new (userid, rating, weekday) must already exist.

INSERT OVERWRITE TABLE u_movie_data_new
       SELECT TRANSFORM(userid, rating, unixtime)
       USING 'python weekday_mapper.py'
       AS (userid, rating, weekday) FROM u_movie_data;
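To check the mapper logic outside Hive, the per-line transformation can be pulled into a function and run on a few sample lines. This is a sketch with hypothetical function names, and it uses UTC explicitly (an assumption, so that the result does not depend on the machine's time zone).

```python
from datetime import datetime, timezone


def to_weekday(unixtime):
    """Convert seconds since the epoch to an ISO weekday (1 = Monday, 7 = Sunday)."""
    return datetime.fromtimestamp(float(unixtime), tz=timezone.utc).isoweekday()


def map_line(line):
    """Transform one tab-separated input row: userid, rating, unixtime."""
    userid, rating, unixtime = line.strip().split("\t")
    return "\t".join([userid, rating, str(to_weekday(unixtime))])


# 0 is 1970-01-01 (a Thursday) and 86400 is the following day (a Friday).
print(map_line("1\t3\t0"))
print(map_line("2\t1\t86400"))
```

Hive's TRANSFORM feeds the script the same kind of tab-separated lines on standard input, so a function that works here should behave the same way inside the query.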

Hive QL- User-defined function
1. Suppose we have 2 columns: id of type string and unixtimestamp of type string.
Create a data set with these 2 columns (udf_input.txt) and place it inside /usr/local/hive_demo/

one,1456432145676
two,1456432145676
three,1456432145676
four,1456432145676
five,1456432145676
six,1456432145676

Now we can create a table and load the data into it.

Hive> create table udf_testing (id string, unixtimestamp string)
      row format delimited fields terminated by ',';
Hive> load data local inpath '/usr/local/hive_demo/udf_input.txt' into table udf_testing;
Hive> select * from udf_testing;

Now we will write a user-defined function in Java to get a more meaningful date and time format.

Open Eclipse, create a new Java project and a new class, and add the code below to the class.
Add the jars from the Hive installation to the build path.

import java.text.DateFormat;
import java.util.Date;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.Text;

public class UnixTimeToDate extends UDF {
    // Hive calls evaluate() once per row; a NULL input yields a NULL output.
    public Text evaluate(Text text) {
        if (text == null) return null;
        // trim() tolerates stray spaces around the value in the input file
        long timestamp = Long.parseLong(text.toString().trim());
        return new Text(toDate(timestamp));
    }

    // Treats the input as seconds since the epoch; drop the multiplication
    // if the values are already in milliseconds.
    private String toDate(long timestamp) {
        Date date = new Date(timestamp * 1000);
        return DateFormat.getInstance().format(date);
    }
}

Once the class is created, export it as a jar file named unixtime_to_java_date.jar
Now we need to use the jar file from Hive
1. Add the jar file in the Hive shell, register the function, and call it

Hive> add JAR /usr/local/hive_demo/unixtime_to_java_date.jar;
Hive> create temporary FUNCTION userdate AS 'UnixTimeToDate';
Hive> select id, userdate(unixtimestamp) from udf_testing;

This is how we work with Hive. Hope you like this post.
Thank you for viewing this post.

Monday, March 21, 2016

Apache Hive Advanced topics

This post describes more Hive concepts.
Partitions:
1. Determine how data is stored in HDFS
2. Group data based on some column
3. Can be defined on one or more columns.
How does partitioning work?
Usually a table's data is stored in HDFS like below
/user/hive/warehouse//
/user/hive/warehouse//
/user/hive/warehouse//
/user/hive/warehouse//

If we know how the data arrives from the source and which column our WHERE conditions will filter on,
then we can partition the data on that column, like below

/user/hive/warehouse///month=jan/
/user/hive/warehouse///month=feb/
/user/hive/warehouse///month=march/
/user/hive/warehouse///month=april/

Bucketing is used to improve the performance.

What do we mean by Partitions?
1. Partitioning means dividing a table into coarse-grained parts based on the value of a particular column, such as a date.
2. This makes queries on slices of the data faster.
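The directory-per-partition layout, and why a filter on the partition column is cheap, can be sketched as follows. The helper names are hypothetical, and the retail database and transaction_records table are used only as example names; the point is that a query filtering on month needs to read just one directory.

```python
import posixpath

WAREHOUSE = "/user/hive/warehouse"


def partition_dir(database, table, column, value):
    """Build the HDFS directory for one partition (column=value naming)."""
    return posixpath.join(WAREHOUSE, database + ".db", table, f"{column}={value}")


def prune(partition_dirs, column, wanted):
    """Keep only the directories a 'WHERE column = wanted' filter has to read."""
    suffix = f"/{column}={wanted}"
    return [d for d in partition_dirs if d.endswith(suffix)]


dirs = [
    partition_dir("retail", "transaction_records", "month", m)
    for m in ["jan", "feb", "march", "april"]
]

print(dirs[0])
print(prune(dirs, "month", "feb"))
```

Without the filter all four directories would be scanned, which is the situation strict mode is meant to prevent on large tables.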

Buckets or Clusters
1. Partitions are divided further into buckets based on some other column.
2. Used for data sampling.

Buckets:
1. Buckets give extra structure to the data that may be used for more efficient queries.
2. A join of two tables that are bucketed on the same columns, including the join column, can be implemented as a map-side join (based on the hash value).
3. Bucketing by user id means we can quickly evaluate a user-based query by running it on a randomized sample of the total set of users.

Now we will see how to work with partitions and bucketing.
1. First we will create a table called transaction_records.
2. For that, we first create a database called retail.

Command: to create database

Hive> create database retail;

Command: to use database

Hive> use retail;

Now we need to create a table.

Hive> create table transaction_records(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, City STRING, State STRING, Spendby STRING)
row format delimited fields terminated by ',' stored as textfile;

How to load data into the table?

Hive> LOAD DATA LOCAL INPATH '/usr/local/hive_demo/transaction/' INTO TABLE transaction_records;
Hive> select count(*) from transaction_records;

We can try different queries, much like SQL. Ex:
Aggregation:
1. select category, sum(amount) from transaction_records group by category;
Distinct values:
2. select DISTINCT category from transaction_records;

How to copy table data into another table, a local file, or HDFS?
1. Insert output into another table

Insert overwrite table results select * from transaction_records;
Create table results as select * from transaction_records;

2. Insert output into a local directory.

Insert overwrite local directory 'results' select * from transaction_records;

3. Insert output into HDFS

Insert overwrite directory '/results' select * from transaction_records;

How to write all queries in a single script file and execute it?
Hive scripts are used to execute a set of Hive commands collectively. This reduces the time and effort of writing and executing each command manually. Hive supports scripting from version 0.10.0 onwards.
Name the file hive_script.hql and place it wherever you like (here I keep it inside /usr/local/hive_demo/).

use retail;
create table transaction_records_script(txnno INT, txndate STRING, custno INT, amount DOUBLE, category STRING, product STRING, City STRING, State STRING, Spendby STRING)
row format delimited fields terminated by ',' stored as textfile;
LOAD DATA LOCAL INPATH '/usr/local/hive_demo/transaction/' INTO TABLE transaction_records_script;
select count(*) from transaction_records_script;
select category, sum(amount) from transaction_records group by category;

How to run the Hive script file:
hive -f hive_script.hql
OR
hive -f hive_script.sql (if we named our script file with the .sql extension)

Hive Joins (table joining)
Create a script that creates two tables called employee and email.
Before creating the script, we need to create 2 files (emp.txt, email.txt) and fill them with data.

/usr/local/hive_demo/emp.txt

siva,56000,bangalore
raju,67000,chennai
arjun,25000,mumbai
sweety,54000,pune

/usr/local/hive_demo/email.txt

siva,siva@gmail.com
raju,raju@yahoo.com
arjun,arjun@aol.com
sweety,sweety@rediff.com
jatin,jatin@gmail.com
sneha,sneha@hotmail.com

Create a script, hive_join_demo.hql, to set up the tables for the join demo

use retail;
create table employee(name string, salary float, city string) row format delimited fields terminated by ',';
load data local inpath '/usr/local/hive_demo/emp.txt' into table employee;
create table email(name string, email string) row format delimited fields terminated by ',';
load data local inpath '/usr/local/hive_demo/email.txt' into table email;

After creating the script, run the hive_join_demo.hql file:
hive -f hive_join_demo.hql

Now we will work with joins.

Inner join

Hive> select a.name, a.city, a.salary, b.email from employee a join email b on a.name = b.name;

It displays the name, city, salary, and email for rows that match in both tables.

Left outer join

Hive> select a.name, a.city, a.salary, b.email from employee a LEFT OUTER join email b on a.name = b.name;

It displays all the records from the first table and the matching records from the second table.

Right outer join

Hive> select a.name, a.city, a.salary, b.email from employee a RIGHT OUTER join email b on a.name = b.name;

It displays all the records from the second table and the matching records from the first table.
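To see what each join returns for the sample emp.txt and email.txt data, here is a small Python sketch. It assumes names are unique in both files, so plain dicts can stand in for the two tables; None plays the role of NULL, and the function names are made up for the example.

```python
# name -> (salary, city), from emp.txt
employees = {
    "siva": (56000.0, "bangalore"),
    "raju": (67000.0, "chennai"),
    "arjun": (25000.0, "mumbai"),
    "sweety": (54000.0, "pune"),
}

# name -> email, from email.txt
emails = {
    "siva": "siva@gmail.com",
    "raju": "raju@yahoo.com",
    "arjun": "arjun@aol.com",
    "sweety": "sweety@rediff.com",
    "jatin": "jatin@gmail.com",
    "sneha": "sneha@hotmail.com",
}


def inner_join():
    """Rows (name, city, salary, email) present in both tables."""
    return [(n, c, s, emails[n]) for n, (s, c) in employees.items() if n in emails]


def left_outer_join():
    """Every employee row; email is None when there is no match."""
    return [(n, c, s, emails.get(n)) for n, (s, c) in employees.items()]


def right_outer_join():
    """Every email row; the employee columns are None when there is no match."""
    rows = []
    for n, email in emails.items():
        salary, city = employees.get(n, (None, None))
        rows.append((n if n in employees else None, city, salary, email))
    return rows


print(len(inner_join()), len(left_outer_join()), len(right_outer_join()))
```

With this data the inner and left outer joins both return four rows, while the right outer join returns six, with NULL employee columns for jatin and sneha.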

This is how we work with Hive SQL joins.
Thank you very much for viewing this.

Monday, March 7, 2016

Getting started with Apache Hive

This post explains the points below.
1. How to install and configure Hive on Ubuntu.
2. How to create a table using Hive.
3. How to load local data and external HDFS data.
4. Basic SQL command usage in Hive

Step 1: Download the latest Hive tar file from the link below
https://hive.apache.org/downloads.html
Command: untar the file using the command below

/usr/local> tar -xvzf  /usr/local/

Step 2: Once the archive is extracted, we need some configuration before starting Hive.

Command: edit the bashrc file

sudo gedit  ~/.bashrc

Step 3: Add the configuration below to the bashrc file

       export HIVE_HOME="/usr/local/apache-hive-1.2.1-bin"
       export PATH=$PATH:$HIVE_HOME/bin
       export HADOOP_USER_CLASSPATH_FIRST=true

Step 4: The line below avoids the error "[ERROR] Terminal initialization failed; falling back to unsupported java.lang.IncompatibleClassChangeError: Found class jline.Terminal, but interface was expected". It makes Hadoop put the user classpath, and so Hive's newer jline jar, first.

export HADOOP_USER_CLASSPATH_FIRST=true

Step 5: Add the Hadoop home configuration to the hive-config.sh file.
Command: open hive-config.sh for editing

       cd  /usr/local/apache-hive-1.2.1-bin/bin
       sudo gedit hive-config.sh

Add the configuration below to hive-config.sh

       export HADOOP_HOME=/usr/local/hadoop

Step 6: Once the above configuration is complete, start Hive

Type hive in the terminal to open the Hive shell.

Step 7: This is how we install and configure Hive.
Now we are ready to work with Hive.

Step 8: How to list the databases available in Hive?
Hive> show databases;
Step 9: How to list the tables available in Hive?
Hive> show tables;
Step 10: How to create a database in Hive?
Hive> create database cricket;
Step 11: How to use the created database?
Hive> use cricket;
Step 12: How to create a table inside the cricket database

       Hive> create table matchscore(
                 match_name string,
                 match_score int,
                 match_location string
             ) row format delimited fields terminated by ',';

Now we have created the database and table. We need to verify that they exist.

Open another terminal and go to /usr/local>

Step 13: How to check whether the database was created?
$usr/local> hadoop fs -ls /user/hive/warehouse

Step 14: How to check whether the table was created?

$usr/local> hadoop fs -ls /user/hive/warehouse/cricket.db
Now we have created the database and table and verified both.
Next we need to insert data into the table.
Let us see how to load data into Hive tables.

First create a file in the local directory /usr/local/hive_demo. If the hive_demo directory does not exist, create it.
Step 15: How to create the file?
$usr/local/hive_demo> sudo gedit matchinfo.txt

Once the file is created, we load it into the Hive table. Go to the Hive shell.

Step 16: How to load data from the local system into a Hive table

    Hive> LOAD DATA LOCAL INPATH '/usr/local/hive_demo/matchinfo.txt' INTO TABLE matchscore;

After loading, we can check whether the file was placed under the table's directory.
Go to the terminal at /usr/local
Step 17: How to check whether the data file was loaded into the table's directory?
$usr/local> hadoop fs -ls /user/hive/warehouse/cricket.db/matchscore

Step 18: How to verify that the data has been loaded into the Hive table
Hive> select * from matchscore;

This is how we load local data into Hive tables.
Now let us see how to load HDFS data into Hive tables.
We can edit the existing file, add more details, and save it as matchinfo_details.txt

Step 19: Create an HDFS directory
$usr/local> hadoop fs -mkdir -p /usr/local/hive_demo/input

Step 20: How to put a file in HDFS?

$usr/local> hadoop fs -put /usr/local/hive_demo/matchinfo_details.txt /usr/local/hive_demo/input/

Now we have created the HDFS directory and added the file to it.
Step 21: How to load HDFS data into a Hive table?

    Hive> create EXTERNAL table matchscore_result(
              match_name string,
              match_score int,
              match_location string,
              match_result string)
          row format delimited fields terminated by ','
          LOCATION '/usr/local/hive_demo/input';

The external table now reads the file data directly from that HDFS location.
To check the table data, run select * from matchscore_result in the Hive shell.
The advantage of an external table is that if we modify the file and put the updated file into the same HDFS directory,
there is no need to load the data into Hive again. We simply run select * from matchscore_result and get the updated results.

Step 22: How to describe the table structure?
Hive> describe formatted matchscore;

Step 23: How to rename the existing table?
Hive> alter table matchscore rename to matchscore_altered;

Step 24: How to show the updated table list?
Hive> show tables;

This is how we can install and work with Hive basics.
Thank you for viewing this post.

Sunday, February 28, 2016

Apache Hive Basics

Hive Background

1. Hive started at Facebook.
2. Data was collected into an Oracle DB by nightly cron jobs.
3. ETL was done with hand-coded Python.
4. Data grew from 10s of GBs (2006) to 1 TB/day of new data in 2007, and is now 10x that.

Facebook use case
1. Facebook has more than 1000 million users
2. Data is more than 500 TB per day
3. More than 80k queries per day
4. More than 500 million photos per day.

5. A traditional RDBMS is not the right solution for this workload.
6. Hadoop MapReduce can solve it.
7. But Facebook developers lacked the Java knowledge needed to write MapReduce code.
8. They knew SQL well.
So they introduced Hive.
Hive
1. Tables can be partitioned and bucketed.
Partitioning and bucketing are used for performance.
2. Schema flexibility and evolution
3. Easy to plug in custom mapper and reducer code
4. JDBC/ODBC drivers are available.
5. Hive tables can be defined directly on HDFS data
6. Extensible: types, formats, functions, and scripts.
What do we mean by Hive
1. A data warehousing package built on top of Hadoop.
2. Used for data analytics
3. Targeted at users comfortable with SQL.
4. Its query language is similar to SQL and is called HiveQL.
5. It is used for managing and querying structured data.
6. It hides the complexity of Hadoop.
7. No need to learn Java and the Hadoop APIs
8. Developed by Facebook and contributed to the community.
9. Facebook analyses terabytes of data using Hive.

Hive can be defined as below
• Hive defines a SQL-like query language called QL
• Data warehouse infrastructure
• Allows programmers to plug in custom mappers and reducers.
• Provides tools for easy data ETL
Where to use Hive (Hive applications)?
1. Log processing
2. Data mining
3. Document indexing
4. Customer-facing business intelligence
5. Predictive modeling and hypothesis testing
Why we go for Hive
1. It has SQL-like types, and we provide an explicit schema and types.
2. We can partition the data.
3. It has its own Thrift server, so data can be accessed from other places.
4. It supports serialization and deserialization.
5. DFS is accessed implicitly.
6. It supports joining, ordering, and sorting.
7. It has its own shell and supports Hive scripts.
8. It has a web interface.
Hive Architecture

1. Hive data is stored in the Hadoop file system.
2. All Hive metadata, such as schema names, table structures, and view names, is stored in the Metastore.
3. The Hive Driver takes the request, compiles it, converts it into jobs that Hadoop understands, and executes them.
4. The Thrift server lets clients access Hive and fetch data from DFS.

Hive Components

Hive Limitations
1. Not designed for online transaction processing.
2. Does not offer real time queries and row level updates
3. Latency for Hive queries is high (a query can take minutes to process)
4. Provides acceptable latency for interactive data browsing
5. It is not suitable for OLTP type applications.
Hive Query Language Abilities

What are the differences between a traditional RDBMS and Hive?
1. Hive does not verify the data when it is loaded; it does so when a query is issued (schema on read).
2. Schema on read makes the initial load very fast. The load operation is just a file copy or move.
3. No updates, transactions, or indexes.
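Schema on read can be illustrated with a short Python sketch. Everything here is a simplified model: `load` and `query` are hypothetical names, the sample rows are made up, and the three fields mirror a (match_name, match_score, match_location) table. Loading keeps the raw text untouched, and the schema is applied only when the data is read; a value that does not fit the declared type comes back as None, in the way Hive returns NULL.

```python
import csv
import io


def load(raw_text):
    """'Loading' only keeps the raw file contents; nothing is validated here."""
    return raw_text


def query(raw_text):
    """Apply the schema (string, int, string) while reading the rows."""
    rows = []
    # Assumes every line has exactly three comma-separated fields.
    for name, score, location in csv.reader(io.StringIO(raw_text)):
        try:
            score_value = int(score)
        except ValueError:
            score_value = None  # bad value is reported as NULL at query time
        rows.append((name, score_value, location))
    return rows


table = load("final,250,mumbai\nsemi,abc,pune\n")  # the bad score is accepted at load time
print(query(table))
```

The malformed score is only noticed when the query runs, which is why the initial load can be as cheap as a file copy.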
Hive supported data types

Hive Complex types:
Complex types can be built up from primitive types and other composite types using the operators below.

Operators
1. Structs: fields are accessed using dot (.) notation
2. Maps (key-value tuples): elements are accessed using ['element-name'] notation
3. Arrays (indexable lists): elements are accessed using [n] notation, where n is a zero-based index into the array.
Hive Data Models
1. Databases
Namespaces, ex: the finance and inventory databases can each have their own Employee table
2. Tables
Schemas in namespaces
3. Partitions
Determine how data is stored in HDFS
Group data based on some columns
Can be defined on one or more columns
4. Buckets or Clusters
Partitions are divided further into buckets based on some other column
Used for data sampling

Hive Data in the order of granularity

Buckets
Buckets give extra structure to the data that may be used for more efficient queries
A join of two tables that are bucketed on the same columns – including the join column can be implemented as Map Side Join
Bucketing by user ID means we can quickly evaluate a user based query by running it on a randomized sample of the total set of users.

These are the basics about Hive.

Thank you for viewing the post.

Thursday, February 25, 2016

Clickjacking prevention using X Frame Options and J2EE Filter

1. What is Clickjacking?
It is also known as a user interface redress attack, UI redress attack, or UI redressing.
It is a malicious technique of tricking a web user into clicking on something different from what the user perceives they are clicking on, thus potentially revealing confidential information or taking control of their computer while clicking on seemingly innocuous web pages. It is a browser security issue that affects a variety of browsers and platforms.
2. How to prevent Clickjacking using a Filter in Java
The example below shows how clickjacking happens and how we can prevent it.

Here I have created a simple LoginServlet. After a successful login, the page is redirected to a success page.
Most readers know how to create and deploy a servlet, but I describe the steps here for those who have not done it before.
Step 1: Start Eclipse
Step 2: Create a Dynamic Web Project -> clickjacking_prevention
Step 3: First create a login.jsp page under WebContent of the project

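<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
    pageEncoding="ISO-8859-1"%>
<!DOCTYPE html>
<html>
<head>
<title>Login page</title>
</head>
<body>
<form action="loginServlet" method="post">
User Name <input type="text" name="username">
Password <input type="password" name="password">
<input type="submit" value="Submit">
</form>
</body>
</html>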

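Step 4: Create a success page, loginSuccess.jsp

<%@ page language="java" contentType="text/html; charset=ISO-8859-1"
    pageEncoding="ISO-8859-1"%>
<!DOCTYPE html>
<html>
<head>
<title>Login Success</title>
</head>
<body>
<h2>Login Successful</h2>
</body>
</html>

You can construct the page as you like.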

Step 5: Now we need to create a LoginServlet

package com.siva;

import java.io.IOException;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class LoginServlet extends HttpServlet{

 /**
  * 
  */
 private static final long serialVersionUID = 1L;

 public void doPost(HttpServletRequest request, HttpServletResponse response)
   throws ServletException, IOException {

  String username = request.getParameter("username");
  String password = request.getParameter("password");
  if("siva".equalsIgnoreCase(username)&& "raju".equalsIgnoreCase(password)){
   System.out.println("inside if condition");
   response.sendRedirect("loginSuccess.jsp");
  }
 }
}

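Step 6: Now configure the LoginServlet in web.xml

<web-app>
  <display-name>clickjacking_prevention</display-name>
  <welcome-file-list>
    <welcome-file>login.jsp</welcome-file>
  </welcome-file-list>
  <servlet>
    <servlet-name>LoginServlet</servlet-name>
    <servlet-class>com.siva.LoginServlet</servlet-class>
  </servlet>
  <servlet-mapping>
    <servlet-name>LoginServlet</servlet-name>
    <url-pattern>/loginServlet</url-pattern>
  </servlet-mapping>
</web-app>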

Step 7: Once this configuration is done, we can run the project on any server such as Apache Tomcat or JBoss.
Open http://localhost:8080/clickjacking_prevention/

The login page opens. Enter siva as the username and raju as the password, then submit.
You will be redirected to the loginSuccess page.

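Create an HTML file with any name you like and paste the code below.

<html>
<head><title>click jacking</title></head>
<body>
<iframe src="http://localhost:8080/clickjacking_prevention/loginSuccess.jsp"></iframe>
</body>
</html>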

Once we open this HTML file, we can see the same content that is shown on the loginSuccess page.

Step 10: Compare the two pages. One is loaded directly from the URL and the other is built with an iframe, yet both look the same.
An attacker can exploit this by embedding your actual site in their own page and stealing data.
How do we prevent this?
We need to add this code in our filter or JSP page.
response.addHeader("X-FRAME-OPTIONS", "DENY");
Here I have written a Filter to prevent clickjacking

package com.siva;

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletResponse;



public class ClickjackingPreventionFilter implements Filter 
{
  private String mode = "DENY";
  
  // Add the X-FRAME-OPTIONS response header to tell browsers not to display this content in a frame.
     public void doFilter(ServletRequest request, ServletResponse response, FilterChain chain) throws IOException, ServletException {
         HttpServletResponse res = (HttpServletResponse)response;
         res.addHeader("X-FRAME-OPTIONS", mode );   
         chain.doFilter(request, response);
     }
     public void destroy() {
     }
     
     public void init(FilterConfig filterConfig) {
         String configMode = filterConfig.getInitParameter("mode");
         if ( configMode != null ) {
             mode = configMode;
         }
     }
}
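
The init method above accepts whatever value the mode init-param holds. As a minimal sketch, assuming you want to restrict it to DENY and SAMEORIGIN (the older ALLOW-FROM value is obsolete in current browsers), a small helper could normalize the parameter before the filter uses it. FrameOptionsMode and normalize are hypothetical names and are not part of the original project.

```java
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper (not in the original project) for the filter's "mode" init-param. */
public final class FrameOptionsMode {
    // Assumption: only these two X-Frame-Options values are accepted.
    private static final Set<String> ALLOWED = Set.of("DENY", "SAMEORIGIN");

    private FrameOptionsMode() {
    }

    /**
     * Returns the configured mode in upper case when it is an accepted value.
     * Falls back to DENY, the strictest setting, for null or unknown input.
     */
    public static String normalize(String configured) {
        if (configured == null) {
            return "DENY";
        }
        String candidate = configured.trim().toUpperCase(Locale.ROOT);
        return ALLOWED.contains(candidate) ? candidate : "DENY";
    }

    public static void main(String[] args) {
        System.out.println(normalize(" sameorigin "));
        System.out.println(normalize("ALLOWALL"));
        System.out.println(normalize(null));
    }
}
```

Inside the filter, init could then assign mode = FrameOptionsMode.normalize(filterConfig.getInitParameter("mode")); so that a typo in web.xml never produces a header value the browser ignores.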

Step 11: Once the filter is complete, add its configuration to the web.xml file.


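<filter>
  <filter-name>ClickjackPreventionFilterDeny</filter-name>
  <filter-class>com.siva.ClickjackingPreventionFilter</filter-class>
  <init-param>
    <param-name>mode</param-name>
    <param-value>DENY</param-value>
  </init-param>
</filter>
<filter-mapping>
  <filter-name>ClickjackPreventionFilterDeny</filter-name>
  <url-pattern>/*</url-pattern>
</filter-mapping>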

Once this configuration is done, run the same iframe example again. The page now appears without any content: IE shows a warning, and other browsers show no details at all.

This is how we can prevent the clickjacking attacks.
Thank you for viewing the post.

Sunday, February 14, 2016

Getting started with Hadoop using Oracle VirtualBox or VMware and Ubuntu

Hadoop installation with a single DataNode (VMware or Oracle VirtualBox)
Download the latest VMware version from the link below
http://www.traffictool.net/vmware/
Download Oracle VirtualBox from the site below and install it on your local system.
http://www.oracle.com/technetwork/server-storage/virtualbox/downloads/index.html
Run the VirtualBox application (VirtualBox.exe)
Click on New ->

Click Next -> Next and create the virtual machine.
Once that is done, VirtualBox will look like this. Select the downloaded Ubuntu package.

Start the virtual machine, then enter the password for the user you want to log in as.

Once the virtual machine has started, the screen will look like this.

Open a terminal by right-clicking on the screen, or search for terminal and open it.

Command: update the Ubuntu package lists
1. sudo apt-get update
Once the update is complete, continue with the steps below.

Command: install the OpenSSH server
2. sudo apt-get install openssh-server
Command: create a hadoop directory
3. sudo mkdir -p /usr/local/hadoop
Download the latest Hadoop version from the link below
http://hadoop.apache.org/releases.html
Copy it to the virtual machine and extract the tar file.
Here I extracted it under /usr/local/hadoop/
Command: extract the tar file
4. tar -xvf .tar.gz
After extracting, run ls -lrt to see the list of Hadoop folders.
Command: create a group named hadoop
5. sudo addgroup hadoop
Command: create a new user called hduser in the hadoop group
6. sudo adduser --ingroup hadoop hduser

Command: add hduser to the sudo group
7. sudo adduser hduser sudo
Command: change the owner of the hadoop directory to hduser
8. sudo chown -R hduser:hadoop /usr/local/hadoop
Command: switch to hduser
9. su - hduser

Command: install ssh
10. sudo apt-get install ssh
Command: generate an SSH key with an empty passphrase
11. ssh-keygen -t rsa -P ""
When prompted for the file in which to save the key, accept the default: /home/hduser/.ssh/id_rsa
Command: append the id_rsa.pub key to authorized_keys
12. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Command: install the vim editor
13. sudo apt-get install vim
Command: edit the sysctl.conf file to disable IPv6
14. sudo gedit /etc/sysctl.conf or sudo vi /etc/sysctl.conf
Add the lines below

net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1

Command: test the SSH connection to localhost
15. ssh localhost
Command: get the updates
16. sudo apt-get update
Command: edit the .bashrc file to add the Java and Hadoop paths
17. vi ~/.bashrc or gedit ~/.bashrc

export HADOOP_HOME=/usr/local/hadoop
export JAVA_HOME=/usr   [or] wherever Java is installed on your system
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Command: source the .bashrc file
18. source ~/.bashrc

Command: check the Java and Hadoop versions
19. java -version
20. hadoop version
Command: create a data directory inside /usr/local/hadoop
21. mkdir /usr/local/hadoop/data
Command: edit the hadoop-env.sh file to add the configuration
22. sudo gedit /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.library.path=$HADOOP_PREFIX/lib"

Command: edit the yarn-env.sh file to add the configuration
23. sudo gedit /usr/local/hadoop/etc/hadoop/yarn-env.sh

export HADOOP_CONF_LIB_NATIVE_DIR=${HADOOP_PREFIX:-"lib/native"}
export HADOOP_OPTS="-Djava.library.path=$HADOOP_PREFIX/lib"

Now we need to edit some of the Hadoop configuration files to start the single node.
Go to /usr/local/hadoop/etc/hadoop
Command: edit the existing file and add the configuration below
24. sudo gedit core-site.xml


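<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/data</value>
  </property>
</configuration>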

Command: rename mapred-site.xml.template to mapred-site.xml and edit it
Go to /usr/local/hadoop/etc/hadoop
25. mv mapred-site.xml.template mapred-site.xml
26. sudo gedit mapred-site.xml


    
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

Then close this file.
Next, edit hdfs-site.xml.
Command: edit the hdfs-site.xml file
27. sudo gedit hdfs-site.xml


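<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>

With a single DataNode, HDFS cannot place three copies of a block, so a value of 1 avoids under-replicated block warnings.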

Command: edit the yarn-site.xml file
28. sudo gedit yarn-site.xml


   
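<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8050</value>
  </property>
</configuration>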
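
Each of the site files edited above is a list of name/value properties inside a configuration element, and a missing or misspelled tag is easy to overlook. As a quick sanity check, the following sketch parses such a file with the JDK's built-in XML parser and returns the properties it finds. SiteXmlCheck and readProperties are hypothetical names, and the sample XML in main only repeats two values from this post. This is not how Hadoop itself loads its configuration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: lists the properties in a Hadoop-style site file. */
public final class SiteXmlCheck {
    private SiteXmlCheck() {
    }

    /** Parses configuration XML into name -> value pairs, in file order. */
    public static Map<String, String> readProperties(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Malformed XML (for example an unclosed tag) ends up here.
            throw new IllegalArgumentException("Not well-formed XML: " + e.getMessage(), e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element property = (Element) nodes.item(i);
            String name = childText(property, "name");
            String value = childText(property, "value");
            if (name != null && value != null) {
                props.put(name, value);
            }
        }
        return props;
    }

    // Returns the trimmed text of the first child with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList found = parent.getElementsByTagName(tag);
        return found.getLength() == 0 ? null : found.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        String sample = "<configuration>"
                + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
                + "<property><name>dfs.replication</name><value>3</value></property>"
                + "</configuration>";
        readProperties(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

To check a real file, read its contents into a string (for example with java.nio.file.Files.readString) and pass that string to readProperties.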

Command: format the NameNode
29. /usr/local/hadoop/bin/hadoop namenode -format
After the format is done, start DFS and YARN
30. /usr/local/hadoop/sbin/start-dfs.sh
31. /usr/local/hadoop/sbin/start-yarn.sh
Command: display all the running DataNodes and NameNodes
32. jps
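
On a single node, start-dfs.sh and start-yarn.sh normally start five daemons: NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The sketch below, assuming the usual jps output of one "pid name" pair per line, reports which of them are missing. JpsCheck and missingDaemons are hypothetical names, and the sample text in main is made up for illustration and was not captured from a real run.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: compares jps output with the expected Hadoop daemons. */
public final class JpsCheck {
    // Daemons expected after start-dfs.sh and start-yarn.sh on a single node.
    private static final List<String> EXPECTED = List.of(
            "NameNode", "DataNode", "SecondaryNameNode", "ResourceManager", "NodeManager");

    private JpsCheck() {
    }

    /** Returns the expected daemons that do not appear in the given jps output. */
    public static List<String> missingDaemons(String jpsOutput) {
        Set<String> running = new HashSet<>();
        for (String line : jpsOutput.split("\\R")) {
            // Each jps line is assumed to look like "<pid> <name>".
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                running.add(parts[1]);
            }
        }
        List<String> missing = new ArrayList<>();
        for (String daemon : EXPECTED) {
            if (!running.contains(daemon)) {
                missing.add(daemon);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Illustrative input only; the process ids are invented.
        String sample = "101 NameNode\n102 DataNode\n103 Jps";
        System.out.println("Missing: " + missingDaemons(sample));
    }
}
```

If a daemon is reported missing, its log file under the logs directory of the Hadoop installation is the first place to look.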

This is how we can set up Hadoop using Oracle VirtualBox or VMware.

Thank you for viewing this post.

Saturday, February 6, 2016

Getting started with web2py using pythonanywhere web hosting service (cloud)

This post explains how to deploy and run Python code written with the web2py framework on
PythonAnywhere, a cloud web hosting service for Python.
First, write some sample code using the web2py framework. For sample code, see my previous posts, such as getting started with web2py and the blog app using web2py.

Once you have completed a simple project, the next step is to deploy it to the cloud.
web2py can be deployed on any web hosting service. Here we will look at how to deploy it using https://www.pythonanywhere.com/

Click on Signup here! (if you have not signed up yet)

Click on Create a Beginner account and provide the required details. A Beginner account is enough to publish an application on the internet.

After you sign up and log in successfully, you are redirected to PythonAnywhere.

Click on Dashboard, then click on Web.

Now create a new web app (click on Add a new web app).

PythonAnywhere supports many Python frameworks. Select web2py, since we are implementing the application using web2py.

Provide the password, which is required to access the application. PythonAnywhere creates a directory under my username (/home/siva82k/web2py/).
Click on Next, which creates a URL for you.

Now you can reach your application over the internet. Usually the welcome application is copied to your account.
In my case the URL is http://siva82k.pythonanywhere.com. Clicking on it takes you to your application.

Now it is time to deploy our existing code to the PythonAnywhere site. First go to the local application where we wrote our code and click on pack all, as shown in the image below.

Then save the packed file on your local system. Once that is done, go to your PythonAnywhere site.
Click on Administrative Interface and provide the password you set while creating the web2py app.

After a successful login, you are redirected to the page below.

We need to upload the file to the PythonAnywhere site. Provide the details under upload and install packed application.
I am using sivaweb2py as the application name.
Upload the package that you downloaded earlier to your local system.
Then click on install, and the application is installed on the PythonAnywhere machine.
Now everything we did on the localhost machine is also available on PythonAnywhere over the internet.
You can check my previous posts for the related web2py examples.
Earlier we checked role-based access. We will check the same thing on PythonAnywhere.
Click on https://siva82k.pythonanywhere.com/sivaweb2py/blog/view

It asks for a username and password. In my case only one user (siva82k@gmail.com) has access to post to the blog.
https://siva82k.pythonanywhere.com/sivaweb2py/blog/post

If we provide the correct username and password, it takes us to the page for posting to the blog.

If we log in as a user who does not have post access, it says you are not authorized.

One more example that we worked on earlier is the basic one that adds two numbers:
https://siva82k.pythonanywhere.com/sivaweb2py/basics/request_args/10/20

Everything we tested in our previous examples on the local system is now available on the internet as well.
This is how we can deploy our web2py code using PythonAnywhere.
Thanks for viewing this post