
This tutorial on R is prepared by the Applied Statistics and Computing lab at the Indian School of Business, Hyderabad. This presentation is a comprehensive guide for someone who wishes to begin using R for data analysis.
Hope you find the tutorial interesting and useful.
Happy learning :)

Aug 26, 2013

© Attribution Non-Commercial (BY-NC)

R: Ice Breaker

Applied Statistics and Computing Lab Indian School of Business

Learning Goals

• What is R?

• Why do we use R?

• How to read data into R

• Getting familiar with basic commands & coding

• More of R: What next?


R: What is it and Why we use it

• Open-source, cross-platform, free statistical language and program

• Works on Windows, macOS, Linux and Unix platforms

• Flexible: write your own functions, or modify existing functions/commands to suit your purpose

• Powerful: open source, constantly being updated by users (scientists, statisticians, researchers, students!)

• And: beautiful graphics, facilitates research, comes with an enormous library of pre-defined functions, and can be integrated into many environments and platforms such as LaTeX, Hadoop etc.


Installing R

• Can be downloaded for free from http://www.r-project.org/

• Download the version compatible with your OS

• Simple/Standard installation process


R Interface

[Screenshots: the R interface on Windows and on Mac]

Interacting with R

• The console shows the command prompt ‘>’, which indicates that R is ready for us to enter a command

• Basic rule: type a command and press Enter to execute it

• E.g. x<-1:100 (creates a vector of length 100, with elements 1, 2, 3, …, 100)
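As a small sketch of the command above, the lines below create the vector and inspect it. The calls to `length`, `head` and `sum` are additions for illustration and are not on the slide.

```r
# Create the integers 1, 2, ..., 100 and store them under the name x
x <- 1:100

length(x)    # number of elements: 100
head(x, 5)   # first five elements: 1 2 3 4 5
sum(x)       # 1 + 2 + ... + 100 = 5050
```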


Interacting with R: R Script

• Write and save code here: File → New script, or ‘Ctrl+N’

• Write code, select the part you want to run, and press ‘Ctrl+R’ to execute it

R Console: As a Calculator

• Type this in the console:

12+5 Enter

• Let us try something more complex:

(12+5)*(39-13) /45 Enter

• Can be used like any other calculator

• WARNING: Beware of lurking square brackets. Only parentheses ‘()’ group terms in R, so this gives an error:

[(12+5)*(39-13)]/45 Enter

We will see later on in this tutorial that ‘[]’ means something else in R.

• Much more than a calculator!
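The calculator examples above can be tried as follows. This is a minimal sketch: wrapping the bracketed expression in `parse` and `try` is an addition here, used only so that the error can be observed without stopping the script.

```r
# Simple arithmetic at the console
12 + 5                              # 17
result <- (12 + 5) * (39 - 13) / 45
result                              # 442 / 45, about 9.82

# Square brackets are not grouping symbols in R, so this text does not parse.
# try() captures the syntax error instead of halting.
bad <- try(parse(text = "[(12+5)*(39-13)]/45"), silent = TRUE)
inherits(bad, "try-error")          # TRUE
```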


R Commands

• Are mostly in the form of functions

E.g.: plot(x,y), mean(x)

• How do we tell R what x and y are?

– We can assign values to x and y ourselves

– Or import a dataset that contains x and y

– We will learn this through examples
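A minimal sketch of the first option, assigning values to x and y ourselves and then passing them to a function. The numbers are made up for illustration.

```r
# Assign values to x and y ourselves (illustrative numbers)
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 6, 8, 10)

# Commands are function calls that take these objects as arguments
mean(x)   # 3
mean(y)   # 6
# plot(x, y) would draw a scatter plot of y against x in a graphics window
```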


R: The Very Basics

• Essential basics to move forward with R:

– Create your own Objects (Variables, Vectors, Matrices, Lists etc)

– Assign names to these Objects

– Learn to access an Object or any subset/part of it

– Perform simple calculations, transformations on these objects


R: The Very Basics

Vectors

• Suppose you own 5 cars

– Type: Compact, Minivan, SUV, Roadster and a Pickup Truck

– Mileage: 1256,237,6780,1000,12000

• Let us define our first vector using the ‘c’ function in R, which “Combines Values into a Vector or List”

• Vector Mileage

– Create the vector:

c(1256,237,6780,1000,12000)

– Assign the name ‘mileage’ to this vector using ‘<-’

mileage<-c(1256,237,6780,1000,12000)
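Once the vector has a name, it can be inspected and indexed. The sketch below uses the slide's `mileage` values; the indexing and summary calls are additions for illustration, and they show the role of ‘[]’ mentioned earlier.

```r
# The mileage vector from the slide
mileage <- c(1256, 237, 6780, 1000, 12000)

length(mileage)        # 5 elements
mileage[3]             # third element: 6780
mileage[c(1, 5)]       # first and fifth elements: 1256 12000
mean(mileage)          # 4254.6
```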


R: The Very Basics

Vectors contd…

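– Vector ‘type’. This attempt gives an error, because R reads unquoted words as object names:

type<-c(Compact, Minivan, SUV, Roadster,Pickup Truck)

To create a vector of strings (a character vector), enclose each element in double quotes and separate the elements with commas. This would work:

type<-c("Compact", "Minivan", "SUV", "Roadster", "Pickup Truck")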
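A short sketch of the quoted version, plus a check of what happens without quotes. The use of `try` is an addition so the failure can be seen without stopping the script; it assumes no objects named `Compact` or `Minivan` exist in the session.

```r
# Character vector: every element is quoted
type <- c("Compact", "Minivan", "SUV", "Roadster", "Pickup Truck")
class(type)    # "character"
type[5]        # "Pickup Truck"

# Without quotes R looks for objects called Compact and Minivan and fails
unquoted <- try(c(Compact, Minivan), silent = TRUE)
inherits(unquoted, "try-error")   # TRUE when no such objects exist
```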


R:Tip 1

• R is case sensitive
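One way to see this, as a small sketch: `mileage` and `Mileage` are different names. The check with `exists` is an addition for illustration and assumes no object called `Mileage` has been created in the session.

```r
mileage <- c(1256, 237, 6780, 1000, 12000)

exists("mileage")   # TRUE: this name was just assigned
exists("Mileage")   # FALSE: a capital M makes it a different name
```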


R: The Very Basics

Matrices, Data Frames

• Create a simple 2x2 matrix, let's call it ‘m’:

m<-matrix(data=c(2,3,4,5),nrow=2,ncol=2)
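The sketch below shows how the four values are laid out: `matrix` fills column by column unless told otherwise. The `byrow = TRUE` variant and the name `m_byrow` are additions for comparison.

```r
m <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
m
#      [,1] [,2]
# [1,]    2    4
# [2,]    3    5

m[1, 2]    # row 1, column 2: 4
dim(m)     # 2 2

# Fill row by row instead (hypothetical name m_byrow)
m_byrow <- matrix(c(2, 3, 4, 5), nrow = 2, byrow = TRUE)
m_byrow[1, 2]   # 3
```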


R: The Very Basics

Matrices, Data Frames Contd…

• Consider the 5 cars in our previous example. Along with ‘type’ and ‘mileage’, the following data are also available:

– Price:

price<-c(36790,3445,66789,2455,76889)

– Number of cylinders in the engine:

no.cyl<-c(3,4,4,4,4)

• Create a Data Frame that contains all this information:

cars<-data.frame(type,price,mileage,no.cyl)
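Putting the four vectors together, a self-contained sketch of the data frame looks like this. The inspection calls and the price filter at the end are additions for illustration.

```r
# The four vectors defined on the slides
type    <- c("Compact", "Minivan", "SUV", "Roadster", "Pickup Truck")
mileage <- c(1256, 237, 6780, 1000, 12000)
price   <- c(36790, 3445, 66789, 2455, 76889)
no.cyl  <- c(3, 4, 4, 4, 4)

# One row per car, one column per vector
cars <- data.frame(type, price, mileage, no.cyl)

dim(cars)      # 5 rows, 4 columns
names(cars)    # "type" "price" "mileage" "no.cyl"
cars$price     # access one column by name

# Rows for cars priced below 10000: the Minivan and the Roadster
cheap <- cars[cars$price < 10000, ]
cheap
```

Note that naming the data frame `cars` hides R's built-in `cars` dataset for the rest of the session.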


R: Packages

• A package is a collection of R functions and data sets

• A few standard ones come with the R installation; others have to be downloaded (from http://cran.r-project.org/, or a simple Google search could lead you to the download site) and manually installed

• Alternatively, packages can be installed with install.packages("package name"), after which you select the CRAN mirror closest to your location

• Once installed, a package must be loaded in every session that needs it, using library("package name")


R: Packages

Example

– Package: ‘gdata’

– Various R programming tools for data manipulation
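The install-then-load pattern can be sketched as follows. The ‘gdata’ lines are commented out because installing needs an internet connection; the runnable part uses ‘tools’, a package that ships with R, which is a substitution made here only so the example runs anywhere.

```r
# For a CRAN package such as 'gdata':
# install.packages("gdata")   # run once; downloads and installs the package
# library(gdata)              # run in each session that uses the package

# Same loading step with 'tools', which comes with the R installation
library(tools)
ext <- file_ext("mydata.csv")   # a function provided by the 'tools' package
ext                             # "csv"
```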


R: Working Directory (WD)

• Some location/Folder on your PC where you have the data, code etc

• You want to import files, code from this location

• You want to save your output here

• Setting a WD on starting your R session makes importing, exporting data files, code files etc easier


R: Working Directory

• File → Change dir…
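The working directory can also be set from code with `getwd` and `setwd`. In this sketch `tempdir()` stands in for your own folder, which is an assumption made so the example runs on any machine.

```r
# Remember where we are now
old_wd <- getwd()

# Switch to another folder; replace tempdir() with your own path,
# for example "C:/Users/xyz/Desktop/folderX"
setwd(tempdir())
getwd()          # now reports the new working directory

# Switch back when done
setwd(old_wd)
```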


R: Importing Data

• More often than not, data are already available in some file format, ready to be imported into R.

• R accepts files of many formats. We will learn how to import files of the following formats:

– Text (.txt)

– CSV (.csv)

– Excel (.xls)

– SPSS ( .sav)

– STATA (.dta)

– SAS (.ssd)

(For more formats you can visit http://cran.r-project.org/doc/manuals/R-data.pdf, where you also get information on how to import image files!)


R: Importing Data

Text , CSV and Excel files

• Text Files:

– Comma Delimited Text Files:

data1<- read.table("C:/Users/xyz/Desktop/folderX/mydata.txt", header=TRUE, sep=",")

– Space as the separator:

data1<- read.table("C:/Users/xyz/Desktop/folderX/mydata.txt", header=TRUE)

– Another (easier) way: set your working directory first, then the command is:

data1<- read.table("mydata.txt", header=TRUE)

• CSV Files:

– Similar way, use ‘read.csv’ instead of ‘read.table’

• Excel Files:

– Use ‘read.xls’ (needs package ‘gdata’; use ‘library(gdata)’ after installing this package)
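To try the text and CSV readers without a file of your own, the sketch below first writes a tiny comma-delimited file to a temporary location and then reads it back both ways. The file contents and the temporary path are stand-ins, not part of the tutorial's data.

```r
# Write a small comma-delimited file to a temporary location (stand-in data)
tmp <- tempfile(fileext = ".csv")
writeLines(c("type,price,mileage",
             "Compact,36790,1256",
             "Minivan,3445,237"), tmp)

# read.table needs to be told about the header row and the separator
data1 <- read.table(tmp, header = TRUE, sep = ",")

# read.csv assumes a header row and a comma separator by default
data2 <- read.csv(tmp)

names(data2)      # "type" "price" "mileage"
data2$price[2]    # 3445
```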


R: Importing Data

From other Statistical Software

• SPSS:

– Need library ‘foreign’

– Use command: ‘read.spss’

• STATA:

– Need library ‘foreign’

– Use command: ‘read.dta’

• SAS:

– Need library ‘foreign’

– Use command: ‘read.ssd’
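As a sketch of the STATA case, the lines below write a small data frame to a temporary .dta file with `write.dta` from the same ‘foreign’ package and read it back with `read.dta`. The round trip is an addition so the example needs no external file; the data values and the names `df`, `dta_file` and `back` are made up.

```r
library(foreign)

# Small stand-in data frame
df <- data.frame(price = c(36790, 3445), mileage = c(1256, 237))

# Write a Stata file to a temporary location, then import it
dta_file <- tempfile(fileext = ".dta")
write.dta(df, dta_file)
back <- read.dta(dta_file)

back$price    # 36790 3445
```

`read.spss` and `read.ssd` are called in the same way, with the path to the file as the first argument.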


R: Tip 2

• For help on any function, just type the following in the R console:

?'function name' or help('function name')

Nothing is printed in the console, because these commands open a help page (in your browser or a help window) where the function and its arguments are explained.
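For instance, with the `mean` function. This is a small sketch; storing the result of `help` in `h` is an addition that lets us confirm a help page was found without opening it.

```r
# In an interactive session either of these opens the help page for mean():
# ?mean
# help("mean")

# help() returns an object pointing at the help file(s) it found
h <- help("mean")
length(h) > 0     # TRUE: a help page for "mean" exists
```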


R: Master Example

• The Used Cars Data:

– Data collected from Kelley Blue Book for several used 2005 cars

– The goal is to determine a model for car value based on a variety of characteristics such as mileage, make, model, engine size, interior style, and cruise control

– 810 observations, 12 variables

– File name: ‘Used Cars’, CSV format


R: Master Example

Input the Used cars data
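The command for this step is not reproduced in this text, so what follows is only a sketch under stated assumptions: the file is called "Used Cars.csv", it is read with `read.csv`, and the result is stored as `used.cars` (the name used later in the tutorial). A six-row stand-in with made-up values replaces the real 810-row file so that the example runs.

```r
# Stand-in for the real file: made-up values, column names assumed
demo <- data.frame(
  Price   = c(17314, 9500, 29142, 8870, 12500, 9990),
  Mileage = c(8221, 30500, 7892, 41000, 22000, 35000),
  Make    = c("Buick", "Chevrolet", "Cadillac", "Chevrolet", "Pontiac", "Saturn"),
  Type    = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)
csv_file <- file.path(tempdir(), "Used Cars.csv")
write.csv(demo, csv_file, row.names = FALSE)

# Import step: with the working directory set, this would be
# used.cars <- read.csv("Used Cars.csv", header = TRUE)
used.cars <- read.csv(csv_file, header = TRUE)
dim(used.cars)    # 6 rows and 4 columns in this stand-in
```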


R: Master Example

Summary of the Data


R: Master Example

View the Dataset
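Summarising and viewing a dataset might look like the sketch below. The choice of `str`, `summary`, `head` and `View` is an assumption about what these steps involve, and the six-row `used.cars` data frame is a made-up stand-in for the real data.

```r
# Made-up stand-in for the used cars data (column names assumed)
used.cars <- data.frame(
  Price   = c(17314, 9500, 29142, 8870, 12500, 9990),
  Mileage = c(8221, 30500, 7892, 41000, 22000, 35000),
  Make    = c("Buick", "Chevrolet", "Cadillac", "Chevrolet", "Pontiac", "Saturn"),
  Type    = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)

str(used.cars)                     # variable names and types
price_summary <- summary(used.cars$Price)
price_summary                      # min, quartiles, mean, max of Price

top_rows <- head(used.cars, 3)     # first three rows
top_rows
# View(used.cars)   # opens a spreadsheet-style viewer in an interactive session
```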


R: Master Example

Variable Calling

• Suppose you want a frequency table of the ‘Make’ variable:

– Use function ‘table()’
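A variable inside a data frame is called with `$`. The sketch below applies `table` to a made-up stand-in for the ‘Make’ column; the values are illustrative only.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price = c(17314, 9500, 29142, 8870, 12500, 9990),
  Make  = c("Buick", "Chevrolet", "Cadillac", "Chevrolet", "Pontiac", "Saturn")
)

# Frequency table: number of cars per make
make_counts <- table(used.cars$Make)
make_counts
make_counts[["Chevrolet"]]   # 2 in this stand-in
```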


R: Master Example

Certain Rows or Columns in the Dataset
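Rows and columns are selected with square brackets in the form `data[rows, columns]`. This is a sketch on a made-up stand-in for the used cars data; the specific selections are illustrative.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price = c(17314, 9500, 29142, 8870, 12500, 9990),
  Make  = c("Buick", "Chevrolet", "Cadillac", "Chevrolet", "Pontiac", "Saturn"),
  Type  = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)

first_three <- used.cars[1:3, ]                  # rows 1 to 3, all columns
two_columns <- used.cars[, c("Make", "Price")]   # all rows, two named columns
used.cars[2, "Price"]                            # one cell: 9500
used.cars$Type                                   # one column as a vector
```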


R: Master Example

Subsets of the data

• How to obtain a subset that contains cars whose price is less than or equal to 10,000 Dollars?

– Use the ‘which’ function

cars.subset1<-used.cars[which(used.cars$Price<=10000),]


R: Master Example

Subsets of the data contd

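• Sedans that cost 10,000 Dollars or less. The columns are referred to as ‘used.cars$Price’ and ‘used.cars$Type’, since bare column names are not visible outside the data frame:

cars.subset2<-used.cars[which(used.cars$Price<=10000 & used.cars$Type=="Sedan"),]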
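The two ‘which’ subsets can be tried on a small stand-in. `which` returns the positions of the rows where the condition is TRUE, and those positions then index the rows of the data frame. The data values below are made up.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price = c(17314, 9500, 29142, 8870, 12500, 9990),
  Type  = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)

# Positions of the cars priced at 10000 or less
idx <- which(used.cars$Price <= 10000)
idx                                   # 2 4 6 in this stand-in
cars.subset1 <- used.cars[idx, ]

# Two conditions combined with & (both must hold)
cars.subset2 <- used.cars[which(used.cars$Price <= 10000 &
                                used.cars$Type == "Sedan"), ]
cars.subset2$Price                    # 9500 9990
```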


R: Master Example

Subsets of the data contd

• Other functions:

– ‘subset’:

cars.subset2<-subset(used.cars,Price<=10000 & Type=="Sedan")

– ‘sample’ : For random samples

For more, you can look at:

http://www.ats.ucla.edu/stat/r/modules/subsetting.htm
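A sketch of both functions on a made-up stand-in. Inside `subset`, column names can be used directly. For `sample`, drawing row numbers and then indexing with them is one common pattern; the seed and the sample size of 3 are arbitrary choices made here.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price = c(17314, 9500, 29142, 8870, 12500, 9990),
  Type  = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)

# subset(): the condition refers to columns by name
cars.subset2 <- subset(used.cars, Price <= 10000 & Type == "Sedan")
cars.subset2$Price          # 9500 9990

# sample(): pick 3 row numbers at random, then take those rows
set.seed(1)                 # arbitrary seed, only for reproducibility
random.rows <- used.cars[sample(nrow(used.cars), 3), ]
nrow(random.rows)           # 3
```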


R: Transformations
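The examples for this topic are not reproduced in this text. As a hedged sketch of what a transformation can look like, the lines below add two derived columns to a made-up stand-in; the column names `LogPrice` and `MileageThousands` are hypothetical.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price   = c(17314, 9500, 29142, 8870, 12500, 9990),
  Mileage = c(8221, 30500, 7892, 41000, 22000, 35000)
)

# New columns computed element by element from existing ones
used.cars$LogPrice         <- log(used.cars$Price)      # natural logarithm
used.cars$MileageThousands <- used.cars$Mileage / 1000  # rescaled units

names(used.cars)   # now includes the two new columns
```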


R: Plots


R: Plots Contd…
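The plots themselves are not reproduced in this text. The sketch below shows three common base-R plots on a made-up stand-in; which plots the tutorial actually used is not known, so treat the choice as an assumption. `pdf(NULL)` sends the drawing to a null device so the code also runs outside an interactive session; leave that line and `dev.off()` out to see the plots on screen.

```r
# Made-up stand-in for the used cars data
used.cars <- data.frame(
  Price   = c(17314, 9500, 29142, 8870, 12500, 9990),
  Mileage = c(8221, 30500, 7892, 41000, 22000, 35000),
  Type    = c("Sedan", "Sedan", "Sedan", "Hatchback", "Coupe", "Sedan")
)

pdf(NULL)   # null graphics device: nothing is written to disk

# Histogram of one variable
h <- hist(used.cars$Price, main = "Price", xlab = "Price (Dollars)")

# Scatter plot of two variables
plot(used.cars$Mileage, used.cars$Price, xlab = "Mileage", ylab = "Price")

# Box plots of Price, one per car type
boxplot(Price ~ Type, data = used.cars)

dev.off()   # close the device
```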


R: Write your own functions

• Syntax:

my.function<-function(arg1, arg2, …) {
  Statement 1
  Statement 2
  …
  return(return.value)
}

• Example: Add two numbers/vectors

addition.mine<-function(x,y) { return(x+y) }

• Example: Sum of Diagonal elements of a matrix ( Trace of a matrix)

trace.mine<-function(mat) { sum(diag(mat)) }
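The two example functions can be defined and called as in this sketch. The test inputs are additions for illustration. Note that `trace.mine` has no explicit `return`: an R function returns the value of its last expression.

```r
# Add two numbers or two vectors
addition.mine <- function(x, y) {
  return(x + y)
}

# Trace of a matrix: sum of its diagonal elements
trace.mine <- function(mat) {
  sum(diag(mat))
}

addition.mine(2, 3)                   # 5
addition.mine(c(1, 2), c(10, 20))     # 11 22

m <- matrix(data = c(2, 3, 4, 5), nrow = 2, ncol = 2)
trace.mine(m)                         # 2 + 5 = 7
```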


RStudio

• A free and open source integrated development environment (IDE) for R

• Can be downloaded from

http://www.rstudio.com/


R: Extra Help

• Rseek : An exclusive R search engine

• More help and resources:

– R-bloggers

– UCLA’s R help

– Quick-r

– R-help

• Google!


Thank you
