Wednesday, August 8, 2018

Getting Started with Machine Learning Using R on Windows: Step-by-Step Process


This post explains how to install R on Windows and how to work through a machine learning project in R using a simple dataset.

First, download R from https://cran.r-project.org/bin/windows/base/. You can download the latest version of R.
1. Once the download completes, install R on your machine. This works like any other software installation; no special instructions are required.
2. After a successful installation, set up the path: right-click My Computer, open the environment variables, and under System variables add the R bin directory to the Path variable (adjust the version number to match your installation):
C:\Program Files\R\R-3.5.1\bin


3. After setting up the path, we need to start R.
4. Open a command prompt and type R.
5. You should now see the simple R terminal.


6. Next, let's look at what a machine learning project involves and where datasets fit in.
7. When we apply machine learning to our own datasets, we are working on a project.
The process of a machine learning project may not be linear, but there are a number of well-known steps:

Define Problem.
Prepare Data.
Evaluate Algorithms.
Improve Results.
Present Results.

8. The best way to really come to terms with a new platform or tool is to work through a machine learning project end-to-end and cover the key steps,
namely loading data, summarizing the data, evaluating algorithms, and making some predictions.
Machine Learning using R Step By Step
Now it is time to work through a simple machine learning program using R and the built-in dataset called iris.

We have already installed and started R.
Install the required packages using the following syntax.

Packages are third-party add-ons or libraries that we can use in R.

install.packages("caret")
# While installing the package, R may ask you to select a CRAN mirror; you can choose the default one.
# To install caret together with the packages it depends on and suggests:
install.packages("caret", dependencies=c("Depends", "Suggests"))
install.packages("ellipse")
# Load the package that we are going to use.
library(caret)
Load the built-in data and rename it using the following syntax.
# Attach the iris dataset to the current environment
data(iris)
# Rename the iris dataset to dataset
dataset <- iris
The iris data is now loaded in R and accessible through the variable dataset. Next we will create a validation dataset. We will split the loaded dataset into two parts: 80% that we will use to train our models and 20% that we will hold back as a validation dataset.
# create a list of 80% of the rows in the original dataset to use for training
validation_index <- createDataPartition(dataset$Species, p=0.80, list=FALSE)
# select the remaining 20% of the data for validation
validation <- dataset[-validation_index,]
# use the 80% of the data for training and testing the models
dataset <- dataset[validation_index,]
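To make the idea of the 80/20 split more concrete, here is a minimal base-R sketch of a stratified split, where 80% of the rows are sampled separately within each species. It only illustrates the idea and is not caret's actual implementation of createDataPartition; the seed value and the names rows_by_class, train_index and training are assumptions introduced for this example.

```r
# Base-R sketch of a stratified 80/20 split (illustration only, not caret's code).
data(iris)
set.seed(7)  # assumed seed, used only to make the sketch reproducible

# Group the row numbers by species, then sample 80% of the rows within each group.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_index <- unlist(lapply(rows_by_class, function(rows) {
  sample(rows, size = round(0.80 * length(rows)))
}), use.names = FALSE)

training   <- iris[train_index, ]   # 80% of each species
validation <- iris[-train_index, ]  # the remaining 20%

dim(training)
dim(validation)
table(training$Species)
```

Sampling within each class keeps the class proportions the same in both parts, which matters more for datasets that are less balanced than iris.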

          
Now we have training data in the dataset variable and a validation set, which we will use later, in the validation variable. Note that we replaced our dataset variable with the 80% sample of the dataset.
1. The dim function: we can get a quick idea of how many instances (rows) and how many attributes (columns) the data contains with the dim function.
dim(dataset)
2. Attribute types - Knowing the types is important as it will give us an idea of how to better summarize the data we have and the types of transforms we might need to use to prepare the data before we model it.
sapply(dataset,class)
3. The head function displays the first six rows by default.
head(dataset)
4. The class variable is a factor. A factor is an R data type for categorical values; its possible values are called levels.
levels(dataset$Species)
5. Class distribution: let's now look at the number of instances (rows) that belong to each class. We can view this as an absolute count and as a percentage.
6. Summary of each attribute:
    summary(dataset)
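Step 5 above describes the class distribution but shows no code for it. One way to get both the absolute count and the percentage is with base R's table and prop.table, as sketched below. The sketch uses the full iris data so that it runs on its own; in the walkthrough you would apply it to the 80% training subset held in dataset.

```r
# Class distribution as absolute counts and as percentages (base R only).
data(iris)
dataset <- iris  # assumption: full data, so the sketch runs without caret

counts <- table(dataset$Species)        # number of rows per species
percentage <- prop.table(counts) * 100  # share of each species in percent
cbind(freq = counts, percentage = percentage)
```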
Visualize Dataset
We have now seen the basic details about the data. We need to extend that with some visualizations. We are going to look at two types of plots:
1. Univariate plots to better understand each attribute.
2. Multivariate plots to better understand the relationships between attributes.
First we will look at the univariate plots, that is, plots of each individual variable. It helps to split the data into the input attributes x and the output attribute y.
# Split input and output
x <- dataset[,1:4]
y <- dataset[,5]
Given that the input variables are numeric, we can create box and whisker plots of each.
par(mfrow=c(1,4))
for(i in 1:4) {
  boxplot(x[,i], main=names(iris)[i])
}
 
We can also create a barplot of the Species class variable to get a graphical representation of the class distribution (generally uninteresting in this case because they’re even).
plot(y)
This confirms what we learned in the last section: the instances are evenly distributed across the three classes.
Multivariate Plots
First let's look at scatterplots of all pairs of attributes and color the points by class. In addition, because the scatterplots show that points for each class are generally separate, we can draw ellipses around them.
featurePlot(x=x, y=y, plot="ellipse")
We can also look at box and whisker plots of each input variable again, but this time broken down into separate plots for each class. This can help to tease out obvious linear separations between the classes.
featurePlot(x=x, y=y, plot="box")
Next we can get an idea of the distribution of each attribute, again like the box and whisker plots, broken down by class value. Sometimes histograms are good for this, but in this case we will use some probability density plots to give nice smooth lines for each distribution.
# density plots for each attribute by class value
scales <- list(x=list(relation="free"), y=list(relation="free"))
featurePlot(x=x, y=y, plot="density", scales=scales)
Evaluating the Algorithms
Set up the test harness to use 10-fold cross-validation. We will split our dataset into 10 parts, train on 9 and test on 1, and repeat for all combinations of train-test splits. The control object below runs this procedure once per algorithm; to also repeat it 3 times with different splits of the data into 10 groups, caret's trainControl accepts method="repeatedcv" with repeats=3.
We are using the metric of "Accuracy" to evaluate models. This is the number of correctly predicted instances divided by the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g. 95% accurate). We will use the metric variable when we build and evaluate each model next.
control <- trainControl(method="cv", number=10)
metric <- "Accuracy"
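To show what 10-fold cross-validation and the accuracy metric amount to, here is a minimal base-R sketch that does the splitting and scoring by hand. It is not how caret implements train. The classifier used here, predict_nearest_centroid, is a hypothetical stand-in written for this example (it predicts the species whose mean measurements are closest) and is not one of the five models in this post. The seed is also an assumption.

```r
# Hand-rolled 10-fold cross-validation in base R (illustration only).
data(iris)
set.seed(7)  # assumed seed, for reproducibility of the sketch
k <- 10

# Randomly assign every row to one of k folds of equal size.
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

# Hypothetical stand-in classifier: pick the species with the nearest centroid.
predict_nearest_centroid <- function(train, test) {
  centroids <- aggregate(. ~ Species, data = train, FUN = mean)
  features <- as.matrix(test[, 1:4])
  centers <- as.matrix(centroids[, -1])
  # Squared Euclidean distance from every test row to every class centroid.
  distances <- apply(centers, 1, function(center) {
    rowSums(sweep(features, 2, center)^2)
  })
  distances <- matrix(distances, nrow = nrow(features))
  # The column with the smallest distance gives the predicted class.
  centroids$Species[max.col(-distances, ties.method = "first")]
}

# Train on 9 folds, test on the held-out fold, and record accuracy in percent.
accuracy <- sapply(seq_len(k), function(fold) {
  train <- iris[folds != fold, ]
  test  <- iris[folds == fold, ]
  predicted <- predict_nearest_centroid(train, test)
  mean(as.character(predicted) == as.character(test$Species)) * 100
})

accuracy        # one accuracy value per fold
mean(accuracy)  # cross-validated accuracy estimate
sd(accuracy)    # spread of the estimate across folds
```

Each row is used for testing exactly once, so the mean over the 10 folds is an estimate of accuracy on unseen data, and the standard deviation indicates how much that estimate varies between splits.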
Build 5 different models to predict species from flower measurements:
Linear Discriminant Analysis (LDA).
Classification and Regression Trees (CART).
k-Nearest Neighbors (kNN).
Support Vector Machines (SVM) with a radial kernel.
Random Forest (RF).
# a) linear algorithms
# LDA
set.seed(7)
fit.lda <- train(Species~., data=dataset, method="lda", metric=metric, trControl=control)
# b) nonlinear algorithms
# CART
set.seed(7)
fit.cart <- train(Species~., data=dataset, method="rpart", metric=metric, trControl=control)
# kNN
set.seed(7)
fit.knn <- train(Species~., data=dataset, method="knn", metric=metric, trControl=control)
# c) advanced algorithms
# SVM
set.seed(7)
fit.svm <- train(Species~., data=dataset, method="svmRadial", metric=metric, trControl=control)
# Random Forest
set.seed(7)
fit.rf <- train(Species~., data=dataset, method="rf", metric=metric, trControl=control)






We reset the random number seed before each run to ensure that the evaluation of each algorithm is performed using exactly the same data splits. This ensures the results are directly comparable.
Select the Best Model
We now have 5 models and accuracy estimates for each. We need to compare the models to each other and select the most accurate. We can report on the accuracy of each model by first creating a list of the created models and then using the summary function.
# summarize accuracy of models
results <- resamples(list(lda=fit.lda, cart=fit.cart, knn=fit.knn, svm=fit.svm, rf=fit.rf))
summary(results)

We can also create a plot of the model evaluation results and compare the spread and the mean accuracy of each model. There is a population of accuracy measures for each algorithm because each algorithm was evaluated 10 times (10-fold cross-validation).
dotplot(results)
The results for the best model can be summarized by printing the fitted object, for example print(fit.lda). This gives a nice summary of what was used to train the model and the mean and standard deviation (SD) of the accuracy achieved, specifically 97.5% accuracy +/- 4%.
Making Predictions Using predict and confusionMatrix
The LDA was the most accurate model. Now we want to get an idea of the accuracy of the model on our validation set. This will give us an independent final check on the accuracy of the best model. It is valuable to keep a validation set in case you made a slip during training, such as overfitting to the training set or a data leak. Both will result in an overly optimistic result. We can run the LDA model directly on the validation set and summarize the results in a confusion matrix.
predictions <- predict(fit.lda,validation)
    confusionMatrix(predictions,validation$Species)
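To see what a confusion matrix contains and how accuracy follows from it, the calculation can be sketched in base R with table. The labels below are hypothetical values made up purely for illustration; they are not results from the models in this post.

```r
# Confusion matrix and accuracy by hand (base R, hypothetical labels).
classes <- c("setosa", "versicolor", "virginica")
actual <- factor(
  c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"),
  levels = classes
)
predicted <- factor(
  c("setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"),
  levels = classes
)

# Rows are predictions, columns are the true classes.
confusion <- table(Prediction = predicted, Reference = actual)
confusion

# Correct predictions sit on the diagonal; accuracy is their share of all cases.
accuracy <- sum(diag(confusion)) / sum(confusion)
accuracy
```

Off-diagonal cells show which classes are being confused with each other, which a single accuracy number cannot show.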
   
