
R PROGRAMMING - INTRODUCTION

INSTRUCTED BY:
TUSHAR KAKAIYA
PROFESSOR OF PRACTICE
NARAYANA BUSINESS SCHOOL
The R Language
- R is a language and environment for statistical computing and graphics.
- R provides a wide variety of statistical techniques, such as:
  - linear and nonlinear modelling,
  - classical statistical tests,
  - time-series analysis,
  - classification, clustering and many more.
- It is highly extensible.
- R has become one of the most popular tools for computational statistics, visualization and data science.
Evolution of the R Language
- It was inspired by, and is mostly compatible with, the statistical language S, developed at Bell Laboratories (formerly AT&T, now Lucent Technologies).
- Although there are some important differences between R and S, much of the code written for S runs unaltered in R.
- R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland, New Zealand. R made its first appearance in 1993.
- A large group of individuals has contributed to R by sending code and bug reports.
- Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive.
Features of R
- As stated earlier, R is a programming language and software environment for statistical analysis, graphical representation and reporting.
- The following are the important features of R:
- R is a well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.
- R has effective data handling and storage facilities.
- R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
- R provides a large, coherent and integrated collection of tools for data analysis.
- R provides graphical facilities for data analysis and display, either directly on screen or in print.
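The language features listed above (conditionals, loops, user-defined recursive functions and operators on matrices) can be sketched in a few lines of base R. The function name `factorial_rec` is a hypothetical example chosen for illustration; base R already provides `factorial()`.

```r
# A user-defined recursive function built on a conditional (if/else).
factorial_rec <- function(n) {
  if (n <= 1) {
    1
  } else {
    n * factorial_rec(n - 1)
  }
}

# A for loop that prints the factorials of 1 to 5.
for (i in 1:5) {
  cat(i, "! = ", factorial_rec(i), "\n", sep = "")
}

# Operators work on whole matrices: %*% is matrix multiplication,
# while * multiplies element by element.
m <- matrix(1:4, nrow = 2)
m %*% m
m * m
```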
Why should you learn R?
- If you need to run statistical calculations in your application, learn and deploy R. It integrates with programming languages such as Java, C++, Python and Ruby.
- If you need to run your own analysis, think of R.
- If you are working on an optimization problem, use R.
- If you need reusable libraries to solve a complex problem, leverage the thousands of free packages (libraries) available for R.
- If you wish to create compelling charts, leverage the power of R.
- If you aspire to be a data scientist in the future, learn R.
- If you wish to have fun with statistics, learn R.
- As of August 2021, R was ranked one of the top five programming languages of the year, and it is a favorite among data analysts and research programmers.
Why should you learn R?
- R is free. It is available in source code form under the terms of the Free Software Foundation’s GNU General Public License.
- It is available for Windows, macOS and a wide variety of Unix platforms (including FreeBSD and Linux).
- In addition to enabling statistical operations, it is a general programming language, so you can automate your analyses and create new functions.
- It has excellent tools for creating graphics, from bar charts and scatter plots to multi-panel lattice charts.
- It supports both object-oriented and functional programming.
- It is supported by a robust, vibrant community.
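Because R is a general programming language, a repeated analysis step can be wrapped in a function and applied to many inputs in a functional style. The function `summarise_weights` and the `deliveries` data below are hypothetical illustrations, not part of R.

```r
# Hypothetical helper: summarise a numeric vector in one call.
summarise_weights <- function(x) {
  c(n = length(x), mean = mean(x), max = max(x))
}

# Hypothetical data: fruit weights (kg) from two deliveries.
deliveries <- list(
  monday  = c(3.4, 3.1, 3, 4.5),
  tuesday = c(2.9, 3.8, 4.1)
)

# sapply() applies the function to every element of the list and
# simplifies the results into a matrix (one column per delivery).
res <- sapply(deliveries, summarise_weights)
res
```

Writing the step once as a function means a new delivery only needs to be added to the list; the analysis itself does not change.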
Why should you learn R?
- It has a flexible analysis toolkit, which makes it easy to access data in various formats, manipulate it (transform, merge, aggregate, etc.) and apply traditional and modern statistical models to it (such as regression, ANOVA and tree models).
- R can be extended easily via packages.
- R integrates easily with other programming languages. Existing as well as emerging software can be connected to R through packages, making both more productive.
- R can import data from Microsoft Excel, Microsoft Access, MySQL, SQLite, Oracle and more.
- It can connect to databases using ODBC (Open Database Connectivity) and the ROracle package.
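The access, merge, aggregate and model steps described above can be sketched with base R functions only (`read.csv()`, `merge()`, `aggregate()`, `lm()`). The sales and store data are made up for illustration, and the CSV is read from inline text so the sketch needs no external file.

```r
# Hypothetical sales data, read from inline CSV lines;
# read.csv("file.csv") would read a file on disk the same way.
sales <- read.csv(text = c(
  "store,month,revenue",
  "A,Jan,120",
  "A,Feb,135",
  "B,Jan,90",
  "B,Feb,110"
))

# Hypothetical lookup table of store regions.
stores <- data.frame(store = c("A", "B"), region = c("North", "South"))

# Merge: join the two tables on their shared 'store' column.
combined <- merge(sales, stores, by = "store")

# Aggregate: total revenue per region.
totals <- aggregate(revenue ~ region, data = combined, FUN = sum)
totals

# Model: a simple linear regression of revenue on month
# (month is a text column, so lm() treats it as a factor).
fit <- lm(revenue ~ month, data = combined)
coef(fit)
```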
Why should you learn R?
[Figure: "R Programming Language" at the centre, surrounded by: Advanced Statistics; Fun with statistics; Supportive open source community; Free Open Source; Integration with Other Programming Languages; Great Visualization; Easy extensibility; Cross Platform Compatibility]
Where is R being used?
- Google
- LinkedIn
- Facebook
- IBM
- Bing
- Mozilla
- SAP
- Oracle
- New York Times
- Airbnb
- Microsoft
- Uber
- HP
- Twitter
- American Express
- And many more…
Where is R being used?
- Fintech companies (financial services)
- Academic research
- Government (FDA, National Weather Service)
- Retail
- Social media
- Data journalism
- Manufacturing
- Healthcare
- And many more…
R Usage
In order of increasing complexity:
- Use R as a calculator.
- Compute several statistics about data.
- Plot data.
- Develop machine learning algorithms.
R as a Calculator
- R performs most of the mathematical calculations you can think of:
  - 1 + 2 on the console yields 3
  - 100 - 50.6 yields 49.4
- R has built-in functions for well-known mathematical operations:
  - sqrt() computes a square root.
  - exp() computes an exponential.
  - log() computes the natural logarithm (base e) by default.
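Typing the following lines at the R console reproduces the calculations above; the comments show the value R prints for each line. The `base` argument of `log()` and the `log10()` function are standard base R.

```r
1 + 2             # 3
100 - 50.6        # 49.4
sqrt(16)          # 4
exp(1)            # 2.718282 (Euler's number e)
log(exp(1))       # 1, because log() is the natural logarithm by default
log10(1000)       # 3
log(8, base = 2)  # 3
```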
R as a Calculator
- You can also perform more complex calculations that follow the usual mathematical rules:

(10/10) / (5/5) = 1

exp(0) * 20 = 20

exp(sqrt(9)) = 20.08554
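The "usual mathematical rules" are R's operator precedence: powers before multiplication and division, which come before addition and subtraction, with parentheses overriding both. A short console sketch:

```r
(10 / 10) / (5 / 5)  # 1
exp(0) * 20          # 20
exp(sqrt(9))         # 20.08554
2 + 3 * 4            # 14: * is evaluated before +
(2 + 3) * 4          # 20: parentheses change the order
2^3                  # 8: ^ is the power operator
-2^2                 # -4: ^ binds tighter than the unary minus
7 %/% 2              # 3: integer division
7 %% 2               # 1: remainder (modulo)
```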
R Objects
- Almost everything we use in R is an object.
- Each object has its own:
  - set of characteristics (size, dimensions, data type, etc.);
  - set of functions that work on it.
- Examples of R objects:
  - Vector
  - Matrix
  - List
  - Data frame
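Each of the four object types can be created and inspected with base R functions such as `length()`, `dim()`, `names()` and `str()`. The variable names and the fruit data below are hypothetical.

```r
# A vector: one dimension, all elements of one type.
v <- c(3.4, 3.1, 3, 4.5)
length(v)    # 4
class(v)     # "numeric"

# A matrix: values arranged in rows and columns.
m <- matrix(1:6, nrow = 2)
dim(m)       # 2 3

# A list: elements may have different types.
l <- list(name = "melon", weight = 3.4, sold = TRUE)
names(l)     # "name" "weight" "sold"

# A data frame: a table whose columns may have different types.
df <- data.frame(fruit = c("melon", "apple"), weight = c(3.4, 0.2))
dim(df)      # 2 2
str(df)      # compact description of each column
```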
R Vectors

One of the simplest, yet most powerful, objects is the vector.

- Vectors are characterized by their size and the type of their elements.
- All elements in a vector have to be of the same type.
- Vectors are index-based: you can access the elements by their position (index), starting at 1.
- You create a vector with the function c().
- To create a vector melons holding the weights of 4 melons, in kilograms:
  melons <- c(3.4, 3.1, 3, 4.5)
- To access the weight of melon number 1, use melons[1].
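Indexing and the "same type" rule can be checked directly at the console. `mixed` is a hypothetical name used only to show what happens when types are mixed.

```r
melons <- c(3.4, 3.1, 3, 4.5)

melons[1]        # 3.4: indexing starts at 1
melons[c(2, 4)]  # 3.1 4.5: several positions at once
melons[-1]       # 3.1 3.0 4.5: a negative index drops that element
length(melons)   # 4
typeof(melons)   # "double"

# Mixing types forces every element to one common type:
# here the number and the logical are converted to character strings.
mixed <- c(1, "two", TRUE)
typeof(mixed)    # "character"
```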


The Environment
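Assigning objects to variables using = or <- gives the same result! (At the top level, that is: inside a function call, = names an argument rather than assigning.)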

melons = c(3.4, 3.1, 3, 4.5)


melons <- c(3.4, 3.1, 3, 4.5)
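A quick way to confirm that both assignment forms store the same object is `identical()`; `ls()` lists the objects in the current environment. The names `melons_a` and `melons_b` are hypothetical.

```r
melons_a = c(3.4, 3.1, 3, 4.5)
melons_b <- c(3.4, 3.1, 3, 4.5)
identical(melons_a, melons_b)  # TRUE

# Both objects now live in the current (global) environment.
ls()

# Inside a function call, = names an argument instead of assigning.
mean(x = melons_a)             # 3.5
```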
Vector Operations
It would be good if R let us make calculations on our whole vector... and it does!

melons <- c(3.4, 3.1, 3, 4.5)

“There was a problem when weighing: the melons weigh only half as much as recorded”:
melons/2

Each item in the vector is divided by 2!

“There was a problem when weighing: the melons weigh twice as much as recorded”:
melons*2

Each item in the vector is multiplied by 2!
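The sketch below shows the printed results and compares the vectorised form with the explicit loop it replaces; `halved` is a hypothetical name. No loop is needed in practice, because R applies the operator to every element.

```r
melons <- c(3.4, 3.1, 3, 4.5)
melons / 2   # 1.70 1.55 1.50 2.25
melons * 2   # 6.8 6.2 6.0 9.0

# The equivalent explicit loop, which the vectorised form replaces:
halved <- numeric(length(melons))
for (i in seq_along(melons)) {
  halved[i] <- melons[i] / 2
}
identical(halved, melons / 2)  # TRUE
```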


Vector Operations
melons <- c(3.4, 3.1, 3, 4.5)

“There was a problem with the measurements: you have to add these values to their weights: 0.4, 0.2, 0.4, 0.3”
melons + c(0.4, 0.2, 0.4, 0.3)

Each melon will have its weight summed with the corresponding element of the new vector, so the resulting vector will be:
(3.4+0.4, 3.1+0.2, 3+0.4, 4.5+0.3) = (3.8, 3.3, 3.4, 4.8)

The cool thing is that we can make this more meaningful by giving the c(0.4, 0.2, 0.4, 0.3) vector a related name, such as adjust_weight; our calculation then becomes:
adjust_weight = c(0.4, 0.2, 0.4, 0.3)
new_melons = melons + adjust_weight
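Run at the console, the named version of this element-wise addition gives the result worked out above:

```r
melons <- c(3.4, 3.1, 3, 4.5)
adjust_weight <- c(0.4, 0.2, 0.4, 0.3)

# Element-wise: first with first, second with second, and so on.
new_melons <- melons + adjust_weight
new_melons   # 3.8 3.3 3.4 4.8
```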
Vector Operations
melons <- c(3.4, 3.1, 3, 4.5)

“The melons’ real weights are the square roots of the values we gave you”
sqrt(melons)

“Please sum all the melons’ weights”
sum(melons) would yield 14

“What’s the mean of the melons’ weights?”
mean(melons) would yield 3.5
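Besides `sum()` and `mean()`, base R has other summary functions that work on the whole vector in one call; a few of them are shown below, with `round()` used only to keep the square roots short.

```r
melons <- c(3.4, 3.1, 3, 4.5)
round(sqrt(melons), 2)  # 1.84 1.76 1.73 2.12
sum(melons)             # 14
mean(melons)            # 3.5
median(melons)          # 3.25
range(melons)           # 3.0 4.5 (minimum and maximum)
summary(melons)         # minimum, quartiles, median, mean and maximum
```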
Vector Operations
melons <- c(3.4, 3.1, 3, 4.5)

R also lets you make comparisons:

“Which melons weigh more than 4 kilograms?”
melons > 4 would yield:
[1] FALSE FALSE FALSE  TRUE

“Which melons weigh exactly 3 kilograms?”
melons == 3 would yield:
[1] FALSE FALSE  TRUE FALSE
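A logical vector produced by a comparison is useful in its own right: `which()` returns the positions of the TRUE values, the vector can be used as an index, and `sum()` counts the TRUEs. The name `heavy` is hypothetical. The last two lines are a common pitfall with == on decimal results.

```r
melons <- c(3.4, 3.1, 3, 4.5)

heavy <- melons > 4
heavy                 # FALSE FALSE FALSE TRUE
which(heavy)          # 4: the position of each TRUE
melons[heavy]         # 4.5: keep only the elements where the test is TRUE
sum(melons > 3)       # 3: TRUE counts as 1, FALSE as 0

# Caution with == on non-integer results: floating-point rounding
# means 0.1 + 0.2 == 0.3 is FALSE; all.equal() tolerates tiny errors.
0.1 + 0.2 == 0.3                   # FALSE
isTRUE(all.equal(0.1 + 0.2, 0.3))  # TRUE
```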
Vector Operations
melons <- c(3.4, 3.1, 3, 4.5)

AND (&) and OR (|) are also possible:

“Which melons weigh exactly 4 kilograms AND 3 kilograms?”
melons == 4 & melons == 3 would yield (no melon can weigh both):
[1] FALSE FALSE FALSE FALSE

“Which melons weigh exactly 4 kilograms OR 3 kilograms?”
melons == 4 | melons == 3 would yield:
[1] FALSE FALSE  TRUE FALSE
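The same combinations at the console, plus two related idioms: `%in%`, a shorter way to write several == tests joined by |, and a range test built with &.

```r
melons <- c(3.4, 3.1, 3, 4.5)

melons == 4 & melons == 3    # FALSE FALSE FALSE FALSE
melons == 4 | melons == 3    # FALSE FALSE TRUE FALSE

# %in% checks membership in a set of values, like a chain of == with |.
melons %in% c(3, 4)          # FALSE FALSE TRUE FALSE

# A range test needs &: melons between 3 and 3.5 kg (inclusive).
melons >= 3 & melons <= 3.5  # TRUE TRUE TRUE FALSE
```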
Watch out!
When you divide a non-zero number by 0, you get Inf, R’s symbol for infinity (or -Inf for a negative number).

If a vector contains an infinite element, calculations on it no longer give a meaningful result:
sum(c(3.4, 3.1, 3, 4.5/0)) would yield Inf

The same happens with NaN (short for Not a Number):
sum(c(3.4, 3.1, 3, sqrt(-1))) would yield NaN (sqrt(-1) also gives a "NaNs produced" warning)

The same happens with NA (short for Not Available):
sum(c(3.4, 3.1, 3, NA)) would yield NA
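These special values can be produced and tested for with base R functions: `is.finite()`, `is.nan()` and `is.na()`. Note that `is.na()` is also TRUE for NaN, while `is.nan()` is FALSE for NA.

```r
1 / 0                          # Inf
-1 / 0                         # -Inf
0 / 0                          # NaN
is.finite(c(1, Inf, NaN, NA))  # TRUE FALSE FALSE FALSE
is.nan(c(1, NaN, NA))          # FALSE TRUE FALSE
is.na(c(1, NaN, NA))           # FALSE TRUE TRUE

# sqrt(-1) returns NaN with a warning; suppressWarnings() silences it here.
suppressWarnings(sqrt(-1))     # NaN
```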
Watch out!
- NAs are very common in data analysis, so watch out for them.

Fortunately, many functions, including sum() and mean(), can dodge this with the extra argument na.rm = TRUE, which removes the NAs before computing:
sum(c(3.4, 3.1, 3, NA), na.rm = TRUE) would yield 9.5

mean(c(3.4, 3.1, 3, NA), na.rm = TRUE) would yield 3.166667
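Before reaching for `na.rm = TRUE`, it often helps to find and count the missing values first. A minimal sketch with base R, where `weights` is a hypothetical vector:

```r
weights <- c(3.4, 3.1, 3, NA)

is.na(weights)               # FALSE FALSE FALSE TRUE
sum(is.na(weights))          # 1: number of missing values
weights[!is.na(weights)]     # 3.4 3.1 3.0: drop the NAs by hand
mean(weights)                # NA
mean(weights, na.rm = TRUE)  # 3.166667
```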
