Wednesday, August 8, 2018

Getting Started with Machine Learning Using R on Windows: A Step-by-Step Process


This post explains how to install R on Windows and how to work through a machine learning project in R using a simple dataset.

First, download R from https://cran.r-project.org/bin/windows/base/.
You can download the latest version of R.
1. Once the download completes, install R on your machine. This is like any other software installation; no special instructions are required.
2. After a successful installation, we need to set up the path. Go to My Computer > right-click > Environment Variables > System variables, and add the R bin directory to the Path variable, for example:
C:\Program Files\R\R-3.5.1\bin


3. After setting up the path, we need to start R.
4. Open a command prompt and type R.
5. We can now see the simple R terminal.


6. Next, we need to understand what machine learning is and what a dataset is.
7. When we apply machine learning to our own datasets, we are working on a project.
The process of a machine learning project may not be linear, but there are a number of well-known steps:

Define Problem.
Prepare Data.
Evaluate Algorithms.
Improve Results.
Present Results.

8. The best way to really come to terms with a new platform or tool is to work through a machine learning project end-to-end and cover the key steps,
namely loading data, summarizing the data, evaluating algorithms, and making some predictions.
Machine Learning using R Step By Step
Now it is time to work through a simple machine learning program using R and the built-in dataset called iris.

We have already installed R and started it.
Install the required packages using the following syntax.

Packages are third-party add-ons or libraries that we can use in R.

install.packages("caret")
# While installing the package, R will ask us to select a CRAN mirror; you can select the default one.
install.packages("caret", dependencies=c("Depends", "Suggests"))
install.packages("ellipse")
# Load the package that we are going to use.
library(caret)
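Before reinstalling, it can help to check which packages are already present. The following is a minimal sketch, not part of the original walkthrough; the helper name missing_packages is hypothetical, while requireNamespace is a base R function.

```r
# Hypothetical helper: returns the names of packages that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Packages used in this walkthrough.
to_install <- missing_packages(c("caret", "ellipse"))
print(to_install)

# Uncomment to install only what is missing:
# if (length(to_install) > 0) install.packages(to_install)
```

The install call is left commented out so that the check itself never changes the system.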
Load the built-in data and give it a new name using the following syntax.
# Attach the iris dataset to the current environment
data(iris)
# Copy the iris dataset to a variable called dataset
dataset <- iris
The iris data is now loaded in R and accessible through the variable called dataset. Next we will create a validation dataset. We will split the loaded dataset into two: 80% of it will be used to train our models, and 20% will be held back as a validation dataset.
# Create a list of 80% of the rows in the original dataset that we can use for training
validation_index <- createDataPartition(dataset$Species, p=0.80, list=FALSE)
# Select the remaining 20% of the data for validation
validation <- dataset[-validation_index,]
# Use the 80% of the data for training and testing the models
dataset <- dataset[validation_index,]
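To see what a stratified 80/20 split does, here is a minimal base R sketch. It is only an approximation of the idea behind createDataPartition, not caret's implementation; the names train_idx, train_part and valid_part are made up for this illustration.

```r
data(iris)
set.seed(7)

# Sample 80% of the row numbers within each species separately,
# so every class keeps the same share in both parts.
rows_by_class <- split(seq_len(nrow(iris)), iris$Species)
train_idx <- unlist(
  lapply(rows_by_class, function(rows) sample(rows, size = round(0.8 * length(rows)))),
  use.names = FALSE
)

train_part <- iris[train_idx, ]   # 80% of the rows
valid_part <- iris[-train_idx, ]  # the held-back 20%

print(dim(train_part))
print(table(valid_part$Species))
```

Because iris has 50 rows per species, each class contributes 40 rows to the training part and 10 rows to the validation part.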

          
Now we have training data in the dataset variable and a validation set, which we will use later, in the validation variable. Note that we replaced our dataset variable with the 80% sample of the dataset.

1. Dimensions - We can get a quick idea of how many instances (rows) and how many attributes (columns) the data contains with the dim function.
dim(dataset)
2. Attribute types - Knowing the types is important, as it gives us an idea of how to better summarize the data we have and the types of transforms we might need to prepare the data before we model it.
sapply(dataset, class)
3. The head function displays the first six rows by default.
head(dataset)
4. The class variable is a factor. A factor is a type that holds a fixed set of class labels, called levels.
levels(dataset$Species)
5. Class distribution - Let’s now take a look at the number of instances (rows) that belong to each class. We can view this as an absolute count and as a percentage.
6. Summary of each attribute - The summary function reports the minimum, maximum, mean, median, and quartiles of each numeric attribute.
summary(dataset)
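The class distribution described in step 5 can be computed with table and prop.table. This is a minimal sketch; the fallback to the full iris data is only there so the snippet runs on its own, and the name distribution is an assumption made for this example.

```r
# Stand-in so the snippet runs by itself; in the walkthrough,
# `dataset` is the 80% training sample created earlier.
if (!exists("dataset")) dataset <- iris

counts <- table(dataset$Species)            # absolute count per class
percentage <- prop.table(counts) * 100      # share of each class in percent
distribution <- cbind(freq = counts, percentage = percentage)
print(distribution)
```

Each row of the result is one species, with its count and its percentage of all rows.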
Visualize Dataset
We have now seen the basic details of the data. We need to extend that with some visualizations. We are going to look at two types of plots:
1. Univariate plots to better understand each attribute.
2. Multivariate plots to better understand the relationships between attributes.

Univariate Plots
First we will look at the univariate plots, which show each variable individually. It helps to split the data into the input attributes x and the output attribute y.
# Split input and output
x <- dataset[,1:4]
y <- dataset[,5]
Given that the input variables are numeric, we can create box and whisker plots of each.
par(mfrow=c(1,4))
for (i in 1:4) {
  boxplot(x[,i], main=names(iris)[i])
}
 
We can also create a barplot of the Species class variable to get a graphical representation of the class distribution (generally uninteresting in this case because the classes are even).
plot(y)
This confirms what we learned in the last section: the instances are evenly distributed across the three classes.

Multivariate Plots
First let’s look at scatterplots of all pairs of attributes and color the points by class. In addition, because the scatterplots show that the points for each class are generally separate, we can draw ellipses around them.
featurePlot(x=x, y=y, plot="ellipse")
We can also look at box and whisker plots of each input variable again, but this time broken down into separate plots for each class. This can help to tease out obvious linear separations between the classes.
featurePlot(x=x, y=y, plot="box")
Next we can get an idea of the distribution of each attribute, again broken down by class value like the box and whisker plots. Sometimes histograms are good for this, but in this case we will use probability density plots to give nice smooth lines for each distribution.
# Density plots for each attribute by class value
scales <- list(x=list(relation="free"), y=list(relation="free"))
featurePlot(x=x, y=y, plot="density", scales=scales)
Evaluating the Algorithms
Set up the test harness to use 10-fold cross-validation. We will split our dataset into 10 parts, train on 9, test on 1, and repeat for all combinations of train-test splits. The control object below runs a single pass of 10-fold cross-validation; to repeat the process 3 times with different splits of the data into 10 groups, trainControl also accepts method="repeatedcv" with repeats=3.
We are using the metric of "Accuracy" to evaluate models. This is the number of correctly predicted instances divided by the total number of instances in the dataset, multiplied by 100 to give a percentage (e.g. 95% accurate). We will use the metric variable when we build and evaluate each model next.
control <- trainControl(method="cv", number=10)
metric <- "Accuracy"
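The idea behind 10-fold cross-validation can be sketched in base R as follows. This is only an illustration of how rows are divided into folds, not how caret assigns them internally; the names fold_id, test_rows and train_rows, and the assumed 120 training rows, are made up for this example.

```r
set.seed(7)
n <- 120   # assumed number of training rows (80% of the 150 iris rows)
k <- 10    # number of folds

# Give every row a fold number from 1 to k, then shuffle the assignment.
fold_id <- sample(rep(seq_len(k), length.out = n))
fold_sizes <- table(fold_id)
print(fold_sizes)

# For the first of the k rounds: fold 1 is held out, the other 9 folds train.
test_rows  <- which(fold_id == 1)
train_rows <- which(fold_id != 1)
print(length(test_rows))
print(length(train_rows))
```

Each round holds out a different fold, so every row is used for testing exactly once across the 10 rounds.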
Build 5 different models to predict species from flower measurements:
Linear Discriminant Analysis (LDA).
Classification and Regression Trees (CART).
k-Nearest Neighbors (kNN).
Support Vector Machines (SVM) with a radial kernel.
Random Forest (RF).

# a) linear algorithms
# LDA
set.seed(7)
fit.lda <- train(Species~., data=dataset, method="lda", metric=metric, trControl=control)
# b) nonlinear algorithms
# CART
set.seed(7)
fit.cart <- train(Species~., data=dataset, method="rpart", metric=metric, trControl=control)
# kNN
set.seed(7)
fit.knn <- train(Species~., data=dataset, method="knn", metric=metric, trControl=control)
# c) advanced algorithms
# SVM
set.seed(7)
fit.svm <- train(Species~., data=dataset, method="svmRadial", metric=metric, trControl=control)
# Random Forest
set.seed(7)
fit.rf <- train(Species~., data=dataset, method="rf", metric=metric, trControl=control)






We reset the random number seed before each run to ensure that the evaluation of each algorithm is performed using exactly the same data splits. This ensures the results are directly comparable.

Select the Best Model
We now have 5 models and an accuracy estimate for each. We need to compare the models to each other and select the most accurate. We can report on the accuracy of each model by first creating a list of the fitted models and then using the summary function.
# summarize accuracy of models
results <- resamples(list(lda=fit.lda, cart=fit.cart, knn=fit.knn, svm=fit.svm, rf=fit.rf))
summary(results)

We can also create a plot of the model evaluation results and compare the spread and the mean accuracy of each model. There is a population of accuracy measures for each algorithm because each algorithm was evaluated 10 times (10-fold cross-validation).
dotplot(results)
The results for the best model can be summarized by printing it, for example with print(fit.lda). This gives a nice summary of what was used to train the model and the mean and standard deviation (SD) of the accuracy achieved, specifically 97.5% accuracy +/- 4%.

Making Predictions Using predict and confusionMatrix
The LDA was the most accurate model. Now we want to get an idea of the accuracy of the model on our validation set. This will give us an independent final check on the accuracy of the best model. It is valuable to keep a validation set in case you made a slip during training, such as overfitting to the training set or a data leak. Both will result in an overly optimistic result. We can run the LDA model directly on the validation set and summarize the results in a confusion matrix.
predictions <- predict(fit.lda, validation)
confusionMatrix(predictions, validation$Species)
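To make the link between a confusion matrix and accuracy concrete, here is a minimal base R sketch. The predicted and actual labels below are made-up toy values, not output from the models above, and the variable names are assumptions made for this example.

```r
species <- c("setosa", "versicolor", "virginica")

# Hypothetical toy labels for six flowers (not real model output).
predicted <- factor(c("setosa", "setosa", "versicolor", "virginica", "virginica", "versicolor"),
                    levels = species)
actual    <- factor(c("setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"),
                    levels = species)

# Rows are predictions, columns are the true classes.
confusion <- table(Prediction = predicted, Reference = actual)
print(confusion)

# Correct predictions sit on the diagonal of the table.
accuracy <- sum(diag(confusion)) / sum(confusion) * 100
print(accuracy)
```

Here four of the six toy predictions match the true class, so the accuracy is 4 / 6, about 66.7%.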
   
