R is one of the most popular languages for learning data science. Even so, beginners often struggle to find the right learning path. The internet is full of data science and R learning materials, which can be overwhelming at first.

As the saying goes, ‘Think Twice, Code Once’: with a well-planned route, you can save time and effort. This post lays out a clear, systematic approach to learning R for data science.

 

1. Download R and RStudio

The first thing you need is R itself, the core piece. R is a free, open-source language and environment for statistical analysis. The next thing you need is RStudio, an integrated development environment (IDE) that makes R much easier to use.

Download R from HERE

Download RStudio from HERE

If you open R on its own, it looks very plain.

[Screenshot: R]

You don’t need RStudio to use R, but making RStudio your default IDE makes R more user-friendly, because it puts plots, package management and the editor in one place.

 

2. Get familiar with the RStudio IDE

When you first open RStudio, this is what you see.

[Screenshot: RStudio]

The left panel is the console for R. Type 1 + 1, hit ‘Enter’ and R will return the answer.

It’s a good idea to use a script so you can save your code. Open a new script by selecting File -> New File -> R Script and it will appear in the top left panel of RStudio.

[Screenshot: RStudio]

A script is basically a text document that can be saved (File -> Save As). You can run more than one line at a time by highlighting the lines and clicking the ‘Run’ button on the script toolbar.

[Screenshot: RStudio]

The top right panel gives you information about what variables you’re working with during your R session.

The bottom right panel can be used to find and open files, view plots, load packages, and look at help pages.

 

3. Learn the basics

Next, learn the basics of R. Start by trying some math operations in the console.
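
As a minimal sketch, here are a few arithmetic operations to try at the console. The numbers are arbitrary and chosen only for illustration.

```r
# Basic arithmetic: type each line at the console and press Enter.
1 + 1      # addition
7 - 2      # subtraction
3 * 4      # multiplication
10 / 4     # division
2 ^ 3      # exponentiation
10 %% 3    # remainder after division
10 %/% 3   # integer division
sqrt(16)   # square root
```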

Create a variable

To create a variable in R, use the assignment operator <-:

variable <- value

e.g. number <- 1
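
The short sketch below shows assignment in use. The name total is a hypothetical one added for illustration.

```r
number <- 1            # assign the value 1 to the name `number`
number                 # typing the name prints the stored value
number <- number + 4   # a variable can be reassigned
total <- number * 2    # and used in further calculations
print(total)
```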

 

Learn about variable types

R has three main variable types – character (text), numeric (numbers) and logical (TRUE or FALSE).

[Image: variable_types]
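
One way to check the type of a variable is the class() function. This is a minimal sketch; the variable names and values are made up for illustration.

```r
city <- "Kathmandu"     # character: text in quotes
temperature <- 21.5     # numeric: a number
is_raining <- FALSE     # logical: TRUE or FALSE

class(city)
class(temperature)
class(is_raining)
```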

 

Learn about Grouping Data

Learn about vectors, lists, matrices and data frames.

Vectors – contain multiple values of the same type (e.g., all numbers or all words)

Lists – contain multiple values of different types (e.g., some numbers and some words)

Matrices – tables, like a spreadsheet, holding only one data type

Data frames – like a matrix, but each column can hold a different data type
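
The four structures can be sketched in a few lines of base R. All names and values here are invented for illustration.

```r
scores <- c(90, 85, 78)                                  # vector: all numeric
person <- list(name = "Asha", age = 30, member = TRUE)   # list: mixed types
grid <- matrix(1:6, nrow = 2, ncol = 3)                  # matrix: one type, 2 rows x 3 columns
people <- data.frame(                                    # data frame: columns of different types
  name = c("Asha", "Bikash", "Chandra"),
  score = scores
)

scores[2]      # second element of the vector
person$name    # element of the list, accessed by name
grid[2, 3]     # row 2, column 3 of the matrix
people$score   # one column of the data frame
str(people)    # compact summary of the structure
```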

 

Learn about Functions

Functions let you repeat the same task on different data.

e.g. x <- c(2, 4, 5, 6, 10)

mean(x)

Another example of a function is plot().
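
You can also write your own functions. The sketch below defines a small one and applies it to two different vectors; value_range is a hypothetical name, not a built-in.

```r
x <- c(2, 4, 5, 6, 10)
mean(x)   # built-in function

# User-defined function: the spread between the largest and smallest value.
value_range <- function(values) {
  max(values) - min(values)
}

value_range(x)               # same task...
value_range(c(1.5, 3, 9))    # ...on different data
```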

 

Commenting

To write a comment, type # in front of your comment. Comments do not get evaluated.

e.g. #This is a comment

 

4. Learn to access the help files

The help files will be your most useful tool. Learn how to get help for a particular function, search the help files for a word or phrase, and find help for a package.

?function_name – get help for a function

e.g. ?mean

help.search("phrase") – search the help files for a word or phrase

e.g. help.search("weighted mean")

help(package = "package_name") – find help for a package

e.g. help(package = "dplyr")

 

5. Learn about R packages

R comes with base functionality, meaning that some functions are always available when you start an R session. However, anyone can write functions that are not part of the base functionality and make them available to other R users in a package. A package must be installed once and then loaded in each session before you can use it.

To install a package, click on the ‘Packages’ tab in the bottom right panel of RStudio and then click ‘Install’ on the toolbar.

[Screenshot: packages]

A window will pop up. Type the name of the package into the ‘Packages’ box, select that package, and click ‘Install’.

[Screenshot: install]

Installing a package does not make its functions available yet. You must load the package first with the library() function, for example to load the ‘dplyr’ package:

library("dplyr")

You don’t have to install the package again after you close and re-open RStudio, but you do need to load it in every new session before using any of its functions.
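
Installation can also be done from the console with install.packages(). The sketch below additionally defines a small helper, load_if_installed, which is a hypothetical name and not part of R. It loads a package only when it is already installed and otherwise reports what is missing.

```r
# Install a package once from the console (needs an internet connection):
# install.packages("dplyr")

# Hypothetical helper: load a package only if it is already installed.
load_if_installed <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    library(pkg, character.only = TRUE)
    TRUE
  } else {
    message("Package '", pkg, "' is not installed; run install.packages(\"", pkg, "\")")
    FALSE
  }
}

load_if_installed("stats")   # ships with R, so it is always available
```

The install.packages() call is left commented out so that running the script does not download anything by accident.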

 

6. Data Loading

Getting the data into R is the first step of the data science process. R offers a variety of options for reading data in many formats. These packages are commonly used for data loading.

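·     readr – reads delimited text files such as CSV and TSV

·     data.table – reads large delimited files quickly with fread()

·     XLConnect – reads and writes Excel workbooks

·     rjson – converts JSON to R objects and back

·     XML – parses XML and HTML documents

·     foreign – reads files saved by other statistical software such as SPSS, Stata and SAS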
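
As a minimal sketch of loading data, the example below writes a tiny CSV to a temporary file and reads it back with base R's read.csv(), so it runs without installing anything. The file contents and the name scores are invented for illustration.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("name,score", "Asha,90", "Bikash,85", "Chandra,78"), csv_path)

# Base R reader. With the packages above installed, readr::read_csv(csv_path)
# or data.table::fread(csv_path) would read the same file.
scores <- read.csv(csv_path, stringsAsFactors = FALSE)

str(scores)   # check that the columns were read with the expected types
```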

 

7. Exploring the data

Use the following code to obtain the data.

[Image: dataExploration]

airquality is a built-in data frame that comes with R, so you can start exploring data right away.

Try the following commands and see what you get:

colnames(airquality)

nrow(airquality)
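
Beyond colnames() and nrow(), a few other base R functions give a quick first look at a data frame. A minimal sketch using airquality:

```r
data(airquality)      # load the built-in data set into the workspace

head(airquality)      # first six rows
dim(airquality)       # number of rows and columns
str(airquality)       # column names, types and sample values
summary(airquality)   # per-column summary statistics, including NA counts
```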

 

Viewing the data

The View() function opens a data frame in a spreadsheet-style viewer, which RStudio shows in its own tab, making the data easier to look at.

View(airquality)

 

8. Data Analysis and Visualization

Once you know how to get data into R and explore it, it’s time to learn some exploratory analysis. Install the packages below, which simplify data analysis and visualization.

·        dplyr – helps you do simple and elegant data manipulation

·        data.table – handles big data with ease, provides faster data analysis

·        ggplot2 – awesome package for data visualization
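
The sketch below computes the average temperature per month in airquality. The first version uses base R so that it runs without extra packages; the dplyr and ggplot2 parts are guarded and run only if those packages are installed. The names monthly_temp, by_month and mean_temp are hypothetical.

```r
# Base R: mean temperature for each month.
monthly_temp <- aggregate(Temp ~ Month, data = airquality, FUN = mean)
print(monthly_temp)

# The same summary with dplyr, if it is installed.
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_month <- dplyr::group_by(airquality, Month)
  print(dplyr::summarise(by_month, mean_temp = mean(Temp)))
}

# A scatter plot of ozone against wind speed with ggplot2, if it is installed.
# Rows with missing Ozone values are dropped with a warning.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(airquality, ggplot2::aes(x = Wind, y = Ozone)) +
    ggplot2::geom_point()
  print(p)
}
```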

 

9. Data Preparation

Data preparation is another important step, because clean data is hard to find and raw data often needs to be transformed and molded into a form on which we can run models.

·     reshape2 – melt and cast the dataset into the shape you want

·     tidyr – provides a standardized way to link the structure of a dataset (its physical layout) with its semantics (its meaning)

·     Amelia – missing value imputation
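
As a minimal sketch of data preparation, the example below handles the missing values in airquality with base R only. Note that the second option is simple mean imputation, a much cruder technique than the multiple imputation Amelia performs; it is shown only to illustrate the idea. The names complete_rows and imputed are hypothetical.

```r
# Count missing values in each column.
colSums(is.na(airquality))

# Option 1: keep only the rows with no missing values.
complete_rows <- airquality[complete.cases(airquality), ]

# Option 2: replace missing Ozone values with the column mean
# (simple mean imputation, not what Amelia does).
imputed <- airquality
imputed$Ozone[is.na(imputed$Ozone)] <- mean(imputed$Ozone, na.rm = TRUE)

nrow(airquality)       # rows before cleaning
nrow(complete_rows)    # rows left after dropping incomplete ones
anyNA(imputed$Ozone)   # check whether any missing Ozone values remain
```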

 

10. Communicate Results

You have learned to extract insights from data, but they are of little use without effective communication and presentation of the results. R Markdown is a great tool for reporting your insights and sharing your findings with fellow data scientists.

[Image: RMarkdown]
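
An R Markdown file mixes text with chunks of R code that are run when the report is built. The sketch below writes a very small report to a temporary folder and renders it only if the rmarkdown package and pandoc are available. The file name report.Rmd and the report contents are assumptions made for illustration.

```r
fence <- strrep("`", 3)   # three backticks, which open and close a code chunk

rmd_lines <- c(
  "---",
  "title: \"Air quality summary\"",
  "output: html_document",
  "---",
  "",
  "Average temperature by month:",
  "",
  paste0(fence, "{r}"),
  "aggregate(Temp ~ Month, data = airquality, FUN = mean)",
  fence
)

rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(rmd_lines, rmd_path)

# Rendering needs the rmarkdown package and pandoc (RStudio bundles pandoc).
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  rmarkdown::render(rmd_path, quiet = TRUE)
}
```

In RStudio the same result is usually reached by opening the .Rmd file and clicking the ‘Knit’ button.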

Samikshya Gautam
