R Commands

To get the working directory    getwd()
To set the working directory    setwd("path/to/folder")  (replace with your own path)
To create a vector    v <- c(1, 2, 3)
To create a matrix    M1 <- matrix(1:20, nrow = 4, ncol = 5)

To enter the data row-wise    M1 <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
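The effect of byrow is easiest to see on a smaller matrix. The sketch below uses a hypothetical 2 x 3 matrix (the names by_col and by_row are illustrative and do not come from the commands above) and compares the first row under each fill order.

```r
# Default: values fill down each column first
by_col <- matrix(1:6, nrow = 2, ncol = 3)
# byrow = TRUE: values fill across each row first
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)

print(by_col)
print(by_row)

by_col[1, ]  # first row of the column-wise fill: 1 3 5
by_row[1, ]  # first row of the row-wise fill:    1 2 3
```

Both matrices have the same dimensions; only the order in which the values are placed differs.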
To get the value of a vector at a particular position    v[2]
To load a particular package from the library    library(ggplot2)
To list the variables (objects) currently defined in the workspace    ls()

To get a dataset from the datasets package    datasets::mtcars
To view the dataset    View(mtcars)
To store the dataset in your own object    data(mtcars)
    File1 <- mtcars
    View(File1)
To find out which class the data belongs to    class(File1)
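One detail worth checking when copying a built-in dataset, offered as a sketch: data() loads the dataset into the workspace but returns only its name as a character string, so assigning the result of data(mtcars) does not copy the data itself. Assigning mtcars directly does. The names name_only and copy are illustrative.

```r
name_only <- data(mtcars)   # data() returns the dataset name, not the data
class(name_only)            # "character"

copy <- mtcars              # assigning the object copies the data frame
class(copy)                 # "data.frame"
dim(copy)                   # 32 rows, 11 columns
```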
To import data from a file without a header row    Myfile <- read.csv(file.choose(), sep = "", header = FALSE)
To store the data from one data frame in another    Myfile <- as.data.frame(mtcars)
    This stores the data from the mtcars data frame in the Myfile data frame.
To show the first 6 rows of the data    head(Myfile)
To show the last 6 rows of the data    tail(Myfile)
To find out the structure of the data    str(Myfile)
To get descriptive statistics for the whole data set    summary(Myfile)
To get descriptive statistics for a particular variable in the data set    summary(Myfile$mpg)
    The $ operator selects the variable whose statistics you want. Here mpg is a variable in the data frame named Myfile.
To get the variance of a particular variable    var(Myfile$mpg)
To get the standard deviation of a particular variable    sqrt(var(Myfile$mpg)), or equivalently sd(Myfile$mpg)
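As a quick check of how the variance and the standard deviation relate, the sketch below uses the built-in mtcars data directly (rather than Myfile) and compares sqrt(var()) with R's sd() function. The variable names are illustrative.

```r
mpg <- mtcars$mpg

v <- var(mpg)          # sample variance (divides by n - 1)
s <- sqrt(v)           # standard deviation as the square root of the variance

# sd() gives the same value directly
all.equal(s, sd(mpg))  # TRUE
```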

To import the data file Advertising.txt    Advertising <- read.csv(file.choose(), sep = " ")
    Then type the name to print the data:
    Advertising
To import an Excel file    First, install the package:
    install.packages("readxl")
    Then load the package:
    library("readxl")
    To open the Excel file:
    my_data <- read_excel(file.choose())

Cleaning of Data
• Mismatched data
• Missing values
• Irrelevant data
• Outliers
• Infeasible values (e.g., age can never be negative)
• Redundant data

To check for missing values across the whole data set (named Advertising)    is.na(Advertising)
Does the data set contain any missing values?    any(is.na(Advertising))
    The output is either TRUE or FALSE.
To get the total number of missing values in the data set    sum(is.na(Advertising))
To find the exact location of the missing values    which(is.na(Advertising))
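These checks can be tried on a small made-up data frame. The data frame ads and its values below are hypothetical, standing in for the Advertising data. The arr.ind = TRUE argument to which() reports row and column positions instead of a single running index.

```r
# Hypothetical data with two missing values
ads <- data.frame(
  TVAds        = c(10, NA, 30),
  NewspaperAds = c(5, 6, NA)
)

any(is.na(ads))    # TRUE: at least one value is missing
sum(is.na(ads))    # 2 missing values in total
which(is.na(ads))  # running positions, counted column by column: 2 6

# Row and column of each missing value
pos <- which(is.na(ads), arr.ind = TRUE)
pos

# Number of missing values in each column
colSums(is.na(ads))
```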

Remove Missing Values

If the data set has missing values, we can handle them in two ways:

• Eliminate the whole row that contains the missing value
• Replace the missing value with the mean value

To remove the rows that contain a missing value    Advertising_new <- na.omit(Advertising)
    Advertising_new
To replace the missing value with the mean value at each position where NA is written
    sum(Advertising$NewspaperAds, na.rm = TRUE)
    This gives the total of all observations other than NA.
    mean(Advertising$NewspaperAds, na.rm = TRUE)
    This gives the mean of all observations under NewspaperAds, not including NA.
    Avg_newspaper <- mean(Advertising$NewspaperAds, na.rm = TRUE)
    This stores the mean in the variable Avg_newspaper.
    Advertising$NewspaperAds[is.na(Advertising$NewspaperAds)] <- Avg_newspaper
    This puts the mean at every position where NA is written.
    View(Advertising$NewspaperAds)
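The two ways of handling missing values can be compared side by side on a small made-up data frame. The object ads and its values are hypothetical; only the column name NewspaperAds is taken from the commands above.

```r
ads <- data.frame(
  City         = c("A", "B", "C", "D"),
  NewspaperAds = c(10, NA, 30, 20)
)

# Approach 1: drop every row that contains an NA
ads_dropped <- na.omit(ads)
nrow(ads_dropped)                      # 3 rows remain

# Approach 2: keep all rows and fill the NA with the column mean
avg <- mean(ads$NewspaperAds, na.rm = TRUE)   # (10 + 30 + 20) / 3 = 20
ads_filled <- ads
ads_filled$NewspaperAds[is.na(ads_filled$NewspaperAds)] <- avg
ads_filled$NewspaperAds                # 10 20 30 20
```

Dropping rows also discards the other values in those rows, while mean replacement keeps every row but reduces the spread of the variable. Which is preferable depends on the data.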

Rename the Column Headers

To rename the header of each column    names(Advertising) <- c("City", "TVAds", "RadioAds", "NewspaperAds", "Sales", "StoreType")

Convert into a Different Data Type

To convert a numeric value into an integer    Example:
    V1 <- 10
    This is not an integer; it is stored in numeric form.
    To convert it into an integer:
    V1 <- as.integer(10)
    Now it is stored as an integer.

    OR

    Put "L" at the end of the number to enter it as an integer:
    V1 <- 10L
    Now the value is stored as an integer.
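The difference between the two storage types can be confirmed with class() and is.integer(). The variable names below are illustrative. Note that as.integer() discards the fractional part of a number rather than rounding it.

```r
a <- 10              # stored as numeric (double)
b <- as.integer(10)  # converted to integer
c_lit <- 10L         # integer literal

class(a)      # "numeric"
class(b)      # "integer"
class(c_lit)  # "integer"

identical(b, c_lit)  # TRUE: both are the integer 10

as.integer(3.9)      # 3, the fractional part is discarded
```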

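Visualization
Before plotting graphs with ggplot2, load the package with the command:
library(ggplot2)
(hist() and plot() below are base R functions and work without it.)
To create a histogram    hist(Advertising$RadioAds)
To create a scatter plot    plot(Advertising$RadioAds)
    With a single variable, plot() draws the values against their position in the data; for two variables use plot(x, y), e.g. plot(Advertising$RadioAds, Advertising$Sales).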
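Since the Advertising file is not available here, the sketch below uses the built-in mtcars data as a stand-in and relies on base R graphics only. It shows a histogram of one variable and a scatter plot of one variable against another; the axis labels are illustrative choices.

```r
# Histogram of one variable
hist(mtcars$mpg, main = "Miles per gallon", xlab = "mpg")

# Scatter plot of two variables: weight on the x-axis, mpg on the y-axis
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "mpg")

# hist() also returns the bin counts, which can be inspected without plotting
h <- hist(mtcars$mpg, plot = FALSE)
sum(h$counts)   # 32, one count per car
```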

Binding the rows/columns

To bind two vectors column-wise    x <- 21:23
    y <- 7:9
    cbind(x, y)
    Use vectors of the same length; otherwise R recycles the shorter one.

    The output will be
          x y
    [1,] 21 7
    [2,] 22 8
    [3,] 23 9
To bind two vectors row-wise    x <- 21:23
    y <- 7:9
    rbind(x, y)
    Use vectors of the same length; otherwise R recycles the shorter one.

    The output will be
      [,1] [,2] [,3]
    x   21   22   23
    y    7    8    9
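A short sketch with the same two vectors shows how the two bindings relate: cbind() gives a matrix with one column per vector, rbind() gives one row per vector, and each result is the transpose of the other. The names cols and rows are illustrative.

```r
x <- 21:23
y <- 7:9

cols <- cbind(x, y)   # x and y become columns
rows <- rbind(x, y)   # x and y become rows

dim(cols)   # 3 2
dim(rows)   # 2 3

# The two results are transposes of each other
all.equal(t(cols), rows)   # TRUE
```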

Dimension Names of a Matrix


To assign names to the rows and columns of a matrix    m <- matrix(1:6, nrow = 2, dimnames = list(c("a", "b"), c("c", "d", "e")))
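Once the names are assigned, elements can be looked up by name as well as by position. A short sketch using the same matrix:

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("c", "d", "e")))

rownames(m)   # "a" "b"
colnames(m)   # "c" "d" "e"

m["a", "d"]   # 3, the same element as m[1, 2]
m["b", ]      # whole row "b": 2 4 6, labelled c d e
```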
