
Introduction to R

R is open-source software, a well-organized and sophisticated package
that facilitates data analysis, modelling, inferential testing and
forecasting. It is user-friendly and lets users write new functions to
solve statistical problems. It runs on a variety of UNIX platforms (and
similar systems such as Linux), Windows and macOS.
R is one of the most widely used open-source languages for analytics and data science. At Microsoft, R is
used by its data scientists, who apply machine learning to data from Bing, Azure, Office, and the
Sales, Marketing, and Finance departments. Twitter has been using R for measuring user
experience. In addition, the cross-platform compatibility of R and its capacity to handle
large and complex data sets make it a well-suited tool for academics analyzing data in their
labs.

R can be used for simple calculations, matrix calculations, differential equations, optimization,
statistical analysis, plotting graphs, etc. It is also useful to anyone who wishes to undertake
extensive statistical computations and data visualization.
R programming

• Functions

• x!   factorial(x)
• (n choose x)   choose(n,x)
• Γ(x)   gamma(x)
• e^x   exp(x)
• ln x   log(x)
• √x   sqrt(x)
• x^n   x^n
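A minimal sketch of these functions in use. The values of x and n are illustrative assumptions, not taken from the slides.

```r
# Illustrative inputs (assumed values)
x <- 5
n <- 7

factorial(x)   # x!
choose(n, x)   # binomial coefficient "n choose x"
gamma(x)       # Gamma(x); equals (x - 1)! for a positive integer x
exp(x)         # e^x
log(x)         # natural logarithm ln(x)
sqrt(x)        # square root of x
x^n            # x raised to the power n
```

Note that log(x) is the natural logarithm; other bases are available through log10(x), log2(x) or log(x, base = b).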
R programming

• Statistics

• Given a data set stored in a vector x:

• n   length(x)
• Σx   sum(x)
• Σx²   sum(x^2)
• x̄   mean(x)
• Median   median(x)
• Σ(x − x̄)²   sum((x-mean(x))^2)
• sd sd(x)
• Variance var(x)
• Q1 quantile(x,0.25)
• Q3 quantile(x,0.75)
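The summary functions above can be tried on a small vector. The data values here are made up for illustration; quantile() is used with its default interpolation method.

```r
# Illustrative data set (assumed values)
x <- c(2, 4, 4, 5, 7, 9, 11)

n  <- length(x)              # number of observations: 7
sx <- sum(x)                 # sum of the values: 42
s2 <- sum(x^2)               # sum of squares: 312
m  <- mean(x)                # sample mean: 6
md <- median(x)              # median: 5
ss <- sum((x - mean(x))^2)   # sum of squared deviations: 60

v  <- var(x)                 # sample variance, divisor n - 1: 10
s  <- sd(x)                  # sample standard deviation: sqrt(10)

q1 <- quantile(x, 0.25)      # lower quartile: 4
q3 <- quantile(x, 0.75)      # upper quartile: 8
```

var(x) and sd(x) use the divisor n - 1, so var(x) equals ss / (n - 1) rather than ss / n.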
R programming

• lines(x,y) adds a line to an existing plot; has the same options for line
types, widths and colours.
• points(x,y) adds points to an existing plot; has the same options for point
characters and colours.
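A short sketch of adding to an existing plot. The sine and cosine curves and the option values (lty, lwd, pch, col) are illustrative choices, not from the slides.

```r
# Illustrative data (assumed): 50 points over one period
x <- seq(0, 2 * pi, length.out = 50)

# plot() must be called first to create the plot
plot(x, sin(x), type = "l", xlab = "x", ylab = "y")

# lines() adds a second curve: dashed (lty), thicker (lwd), blue (col)
lines(x, cos(x), lty = 2, lwd = 2, col = "blue")

# points() marks every fifth value: solid circles (pch), red (col)
idx <- seq(1, length(x), by = 5)
points(x[idx], sin(x[idx]), pch = 19, col = "red")
```

Both lines() and points() draw on the current plot, so calling them before any plot() call gives an error.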

• Plot y = a + bx

• abline(a,b) plots a straight line with intercept a and gradient b
• has the same options for line types, widths and colours
• abline(model) if given a fitted linear model, plots the regression line
• abline(h=...) plots a horizontal line at ...
• abline(v=...) plots a vertical line at ...
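A sketch of the four uses of abline on simulated data. The function name is lower-case in R, which is case-sensitive. The data, the seed and the styling options are assumptions made for illustration.

```r
# Simulated data (assumed): y = 3 + 2x plus normal noise
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20)

plot(x, y)

# Straight line with intercept a = 3 and gradient b = 2
abline(3, 2, lty = 2, col = "grey40")

# Regression line from a fitted linear model
model <- lm(y ~ x)
abline(model, col = "red", lwd = 2)

# Horizontal and vertical reference lines at the sample means
abline(h = mean(y), lty = 3)
abline(v = mean(x), lty = 3)
```

A least-squares line always passes through the point (mean(x), mean(y)), so the two dotted lines cross on the red regression line.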
QQ plots

• qqnorm(x) plots quantiles of the sample (on vertical) against normal
quantiles (on horizontal)
• qqline(x) adds a diagonal line to qqnorm to show “true” position
• qqplot(sim,x) plots quantiles of the sample (on vertical) against
simulations of a theoretical distribution (on horizontal),
e.g. sim <- rgamma(1000,3,2)
• abline(0,1) adds the correct diagonal line to a qqplot
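The QQ plot functions can be sketched as follows. The sample x is simulated here purely for illustration (an assumption); in practice it would be the observed data.

```r
# Simulated "observed" sample (assumed): 200 draws from Gamma(3, 2)
set.seed(42)
x <- rgamma(200, 3, 2)

# Sample quantiles against normal quantiles, with reference line
qqnorm(x)
qqline(x)

# Sample quantiles against simulated Gamma(3, 2) quantiles
sim <- rgamma(1000, 3, 2)
qqplot(sim, x)

# Line y = x: points lie close to it when the distribution fits
abline(0, 1)
```

Because x is skewed, the points in the qqnorm plot would be expected to curve away from the qqline, while the qqplot against the gamma simulations should follow abline(0,1) more closely.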
Discrete Distributions
• The names and parameters for discrete distributions are:
• Binomial binom(..., <n>,<p>)
• Poisson pois(..., <mu>)
• Type 2 geometric geom(..., <p>)
• Type 2 Negative Binomial nbinom(..., <k>,<p>)
• Hypergeometric hyper(..., <successes in pop>, <failures in pop>, <sample size>)
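In base R these names are not called on their own: each takes a prefix, d (probability function), p (cumulative probability), q (quantile) or r (random sample). A minimal sketch, with all parameter values chosen only for illustration:

```r
# Binomial: P(X = 3) for n = 10 trials, p = 0.5
b <- dbinom(3, 10, 0.5)

# Poisson: P(X <= 2) for mean mu = 4
p <- ppois(2, 4)

# Geometric (number of failures before the first success): P(X = 0), p = 0.2
g <- dgeom(0, 0.2)

# Negative binomial: 5 random values with k = 3, p = 0.4
set.seed(1)
nb <- rnbinom(5, 3, 0.4)

# Hypergeometric: P(X = 2) with 7 successes and 13 failures in the
# population and a sample of size 5
h <- dhyper(2, 7, 13, 5)
```

The first argument, shown as ... in the list above, depends on the prefix: a value for d and p, a probability for q, and the number of values to simulate for r.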
Continuous Distributions
• The names and parameters for continuous distributions are:
• Exponential exp(..., <lambda>)
• Gamma gamma(..., <alpha>,<lambda>)
• Chi square chisq(..., <dof>)
• Uniform unif(..., <min>,<max>)
• Beta beta(..., <alpha>,<beta>)
• Normal norm(..., <mean>,<sd>)
• Log normal lnorm(..., <mu>,<sigma>)
• T t(..., <dof>)
• F f(..., <dof1>,<dof2>)
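As with the discrete case, base R uses these names with a prefix: d (density), p (cumulative probability), q (quantile) or r (random sample). A minimal sketch with illustrative parameter values (all assumed):

```r
# Normal: P(Z <= 1.96) for mean 0, sd 1, and the 97.5% quantile
pn <- pnorm(1.96, 0, 1)
qn <- qnorm(0.975, 0, 1)

# Exponential: P(X <= 1) for lambda = 2
pe <- pexp(1, 2)

# Gamma: P(X <= 1) for alpha = 1, lambda = 2 (same as the exponential)
pg <- pgamma(1, 1, 2)

# Uniform: P(X <= 3) on (0, 10)
pu <- punif(3, 0, 10)

# Chi square: 95% quantile with 1 degree of freedom
qc <- qchisq(0.95, 1)

# t and F: upper quantiles for the given degrees of freedom
qt10 <- qt(0.975, 10)
qf   <- qf(0.95, 3, 12)

# Beta and log normal: random samples of size 5
set.seed(1)
rb <- rbeta(5, 2, 3)
rl <- rlnorm(5, 0, 1)
```

Two quick checks on the output: a gamma distribution with alpha = 1 is an exponential, so pg equals pe, and a chi square variable with 1 degree of freedom is a squared standard normal, so qc equals qn^2.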
