
Name of Institution

Amity Law College


Course Title: Data Visualisation for New Age Managers
Course Code: CSIT245
Course Level: UG

Module II: Data Visualization in R

Weightage: 25%
Basic Syntax: RStudio, R Console, Code Editor
By: Jitendra K Sharma


Table of Contents
• What is R?
• Basic Syntax: RStudio, R Console, Code Editor
• Graphical Parameters
• ggplot: Bar Chart, Histogram, Density Plot, Scatterplot
• Base R Graphics
• Use Case
• Project work



What is R?
R is an open-source programming language and environment used for statistical analysis, data
visualization, and data science.
• Because R is open source, it has a large community that continuously improves the environment and
helps members worldwide learn and innovate.
• R can be used for data analytics, statistical analysis, and machine learning.
• R is highly flexible and works with many other technologies.
• It has over 10,000 packages that extend its already significant capabilities, including graphics
libraries for both static and dynamic graphics.

History of R
• R is based on the S programming language, which was created by John Chambers at Bell
Laboratories (formerly AT&T, now Lucent Technologies) in 1976. S was a premier tool for statistical
research, but it was not widely used outside scholarly research.
• In 1992, Ross Ihaka and Robert Gentleman began creating R at the University of Auckland, New Zealand,
as a tool that their students could learn and use easily. They released the initial version in 1995,
and the first stable version (1.0.0) followed in 2000. Since then, R has been maintained by the R
Development Core Team (now called the R Core Team).

Before You Start Learning R
• R has no mandatory prerequisites, but some basic knowledge of the following is recommended:
– Basic understanding of statistics, mathematics, and probability
– General understanding of data science and the processes involved
– Basic understanding of various types of graphs and data representation techniques

Introduction to R

[Figure: bar chart "Survey of Data Science Mining Experts". Y-axis: percentage, 0% to 60%. X-axis (tools): R, RapidMiner, Python, SQL, MS Excel, KNIME, Hadoop, Tableau, SAS, Spark.]

[Figure: IEEE Ranking of Top Programming Languages]

Why R Is So Popular
• R is the language of data scientists.
• Free and open source: R is free to use and its source is open.
• Flexibility in operations: free packages can be added to get the desired results.
• Vast community support is available.
• A large number of third-party packages are available.
• Visit the RStudio Community for more details:
https://community.rstudio.com/

Installation of RStudio

Installing R and RStudio Connect

Installing R
To use RStudio Connect, you first need to install R 2.11.1 (or higher). If you don’t
already have R, you can download it from the CRAN R project website.
Please note that to use multiple versions of R in RStudio Connect, you will
need to compile R from source, as explained here:
https://support.rstudio.com/hc/en-us/articles/218004217-Building-R-from-Source

RStudio is an integrated development environment (IDE) for R. It includes a console and a
syntax-highlighting editor that supports direct code execution, as well as tools for
plotting, history, debugging, and workspace management.

Installing RStudio Connect
To install RStudio Connect, click the link included on the trial page or in the email sent
with your license key. If you don’t have this link, contact RStudio Support.
A list of common dependencies for RStudio Connect is available on the RStudio support site.
If you have previously been using Shiny Server, see the support site for information on migrating
from Shiny Server to RStudio Connect.
• Installation of RStudio (IDE Environment)

Installation of R

• Installing R and RStudio on Windows
• To install R and RStudio on Windows, go through the following steps:
• Install R on Windows
• Step 1: Go to the CRAN R project website.

• Configuration of R Studio


Download RStudio, an IDE (Integrated Development Environment) for R:
https://www.rstudio.com/products/rstudio/download/


Variables in R
• Variables are temporary storage spaces.
• We can store data in R's memory. These memory locations have names; they are called variables.
• Example: a shopping cart is temporary storage during shopping. We place our selected items in it
and keep replacing or adding items before the final billing at the cash counter.
[Figure: a shopping cart, whose contents we keep changing before the final billing]


Components of a Variable

Name      Memory Address   Value   Data Type
Radius    0x6d8d8c8        3       Numeric
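
The components above can be sketched in a few lines of R. This is only an illustration: the names area and cart are hypothetical and do not come from the slides, and the memory address is left out because R manages it and it differs on every machine.

```r
# Name: radius, Value: 3, Data type: numeric
radius <- 3
radius            # prints the stored value
class(radius)     # "numeric"

# A variable can be used in further calculations
area <- pi * radius^2

# Like a shopping cart, a variable can be changed before "final billing"
cart <- c("Mobile")            # hypothetical cart with one item
cart <- c(cart, "Pen Drive")   # add one more item
length(cart)                   # 2

# Reassigning replaces the old value
radius <- 5
```

Both <- and = assign a value; the slides use =, while <- is the more common convention in R code.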



Data Types in R
The data you assign to a variable can be of different types, as below:
• Numeric: 3.14 or 20,500
• Character: "Jitendra", "Noida"
• Logical (Boolean value): TRUE, FALSE (yes or no)
• Complex: 6+1i, 100i-20 (a real part and an imaginary part)
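
As a small sketch, class() reports the data type of a value. The variable names below are illustrative only. Note that numbers are typed without a thousands separator in R, so 20,500 is written 20500.

```r
price  <- 3.14          # numeric
count  <- 20500         # numeric (no comma inside the number)
city   <- "Noida"       # character
passed <- TRUE          # logical
z      <- 6 + 1i        # complex: real part 6, imaginary part 1

class(price)    # "numeric"
class(city)     # "character"
class(passed)   # "logical"
class(z)        # "complex"

Re(z)           # real part: 6
Im(z)           # imaginary part: 1
```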


• Programming in RStudio


Working with - Variables


Basic Syntax: RStudio, R Console, Code Editor

Visit the link below for more keyboard shortcuts:
https://support.rstudio.com/hc/en-us/articles/200711853-Keyboard-Shortcuts

1. Practice some easy commands
a = "Mobile"
b = "Pen Drive"
c = "Amity University"
The variables and their values appear in the Environment pane.

2. Arithmetic Operators
Set values to variables:
a = 1
b = 2
num1 = 10
num2 = 20
Now do the calculations:
num1 + num2   # 30
num1 - num2   # -10
num1 * num2   # 200
num1 / num2   # 0.5


• Installation of Package (ggplot2)

Installation of ggplot2

The ggplot2 package, created by Hadley Wickham, offers a powerful
graphics language for creating elegant and complex plots.
"gg" stands for "Grammar of Graphics".
ggplot2 lets you create graphs that represent both univariate and
multivariate numerical and categorical data in a straightforward manner.
Grouping can be represented by color, symbol, size, and transparency.
Creating faceted plots (i.e., conditioning) is relatively simple.
ggplot2 is an enhanced data visualization package for R that makes it easy
to create multi-layered graphics.

Installation link: https://rdrr.io/cran/ggplot2/
URL: https://ggplot2.tidyverse.org
Package repository: https://github.com/tidyverse/ggplot2
Install the latest version of this package by entering the following in R:
install.packages("ggplot2")

ggplot: Bar Chart, Histogram, Density Plot, Scatterplot

The plot() Function
• The most used plotting function in R is plot(). It is a generic function,
meaning it has many methods, which are called according to the type of
object passed to plot().
• https://rstudio-pubs-static.s3.amazonaws.com/114588_15d2bd7cc4b546ffa79af3d22a7c32ba.html

In the simplest case, we can pass in a single vector and
we will get a scatter plot of magnitude vs. index.
But generally, we pass in two vectors, and a scatter
plot of these points is drawn.

For example, the command plot(c(1,2), c(3,5))
would plot the points (1,3) and (2,5).

x <- seq(-pi, pi, 0.1)
plot(x, sin(x), main = "The Sine Wave Function by Jitendra", ylab = "sin(x)")
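
The "generic function" idea can be sketched with a few calls on built-in objects. The inputs are illustrative only; each call uses the same plot() function, and R picks the method from the type of the first argument.

```r
# One numeric vector: values are plotted against their index (1, 2, 3, ...)
plot(c(3, 5, 2, 8))

# Two numeric vectors: the points (1,3) and (2,5)
plot(c(1, 2), c(3, 5))

# A factor: plot() draws a bar chart of the category counts
plot(factor(c("a", "b", "a", "c", "a")))

# A data frame: plot() draws a matrix of pairwise scatter plots
plot(mtcars[, c("wt", "mpg", "hp")])
```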

Histogram

Let us use the built-in dataset airquality, which has daily air quality measurements
in New York, May to September 1973 (R documentation).
str(airquality)
temperature <- airquality$Temp
hist(temperature)

Bar Plot
Plotting categorical data
• Plotting the table of counts gives us the required bar plot. Note below that we set
the argument density to shade the bars.
age <- c(17,18,18,17,18,19,18,16,18,18)
table(age)
barplot(table(age),
main="Age Count of 10 Students",
xlab="Age",
ylab="Count",
border="red",
col="blue",
density=10)

Bar Chart – ggplot2 function, customization, and export

# Install the tidyverse once, then load it (it includes ggplot2)
install.packages("tidyverse")
library(tidyverse)

# Load the ggplot2 library
library(ggplot2)

# Load the prebuilt data set
data("mpg")

# Bar chart with a single fill color
ggplot(mpg) +
  geom_bar(aes(x = class), fill = "green")

# Bars filled by drive type
ggplot(mpg) +
  geom_bar(aes(x = class, fill = drv))

# Bars filled by number of cylinders
ggplot(mpg) +
  geom_bar(aes(x = class, fill = factor(cyl)))

• We can also save the last plot as a PNG or PDF file
ggsave("data.png")
ggsave("data.pdf")
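
The module outline also lists histogram, density plot, and scatterplot under ggplot. A minimal sketch of each follows, assuming ggplot2 is already installed. The choice of datasets (airquality, mpg), variables, colors, and bin width is illustrative and not taken from the slides.

```r
library(ggplot2)  # assumes install.packages("ggplot2") was run earlier

# Histogram: distribution of daily temperature in the airquality data
p_hist <- ggplot(airquality, aes(x = Temp)) +
  geom_histogram(binwidth = 5, fill = "steelblue", color = "white") +
  labs(title = "Temperature Histogram", x = "Temperature (F)", y = "Count")

# Density plot: a smoothed view of the same variable
p_density <- ggplot(airquality, aes(x = Temp)) +
  geom_density(fill = "lightgreen", alpha = 0.5)

# Scatterplot: engine displacement vs highway mileage, colored by drive type
p_scatter <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point()

# Printing a ggplot object draws it
print(p_hist)
print(p_density)
print(p_scatter)
```

Each plot follows the same pattern as the bar charts: a data set, an aes() mapping, and one geom_ layer.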

Overlaying Plots: plot() Function

• Calling plot() multiple times plots the current graph on the
same window, replacing the previous one.
• However, sometimes we want to overlay plots in order to compare them.
This is made possible with the functions lines() and points(), which add lines and
points, respectively, to the existing plot.

x <- seq(-pi, pi, 0.1)
plot(x, sin(x), main="Overlaying Graphs", ylab="", type="l", col="blue")
lines(x, cos(x), col="red")
legend("topleft", c("sin(x)", "cos(x)"), fill=c("blue", "red"))

Scatter Plot
• Scatter plots are similar to line graphs in that they use horizontal and vertical axes to plot data points.
However, they have a very specific purpose. Scatter plots show how much one variable is affected by
another. The relationship between two variables is called their correlation.
• A scatter plot pairs up the values of two quantitative variables in a data set and displays them as geometric
points inside a Cartesian diagram.
• Each point represents the values of two variables.

# Get the input values.
input <- mtcars[,c('wt','mpg')]

plot(x = input$wt, y = input$mpg, xlab = "Weight", ylab = "Mileage",
xlim = c(2.5,5), ylim = c(15,30), main = "Weight vs Mileage")
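
Since a scatter plot is used to judge correlation, the relationship in the plot above can also be put into a number. The following is a sketch using base R's cor() and lm(); the variable names r and fit are illustrative.

```r
input <- mtcars[, c("wt", "mpg")]

# Pearson correlation between weight and mileage
r <- cor(input$wt, input$mpg)
round(r, 2)   # -0.87: heavier cars tend to have lower mileage

# Draw the scatter plot and overlay a fitted straight line
plot(input$wt, input$mpg,
     xlab = "Weight", ylab = "Mileage", main = "Weight vs Mileage")
fit <- lm(mpg ~ wt, data = input)   # simple linear regression
abline(fit, col = "red")
```

A value close to -1 or +1 indicates a strong linear relationship, and a value near 0 indicates a weak one.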


Basic Syntax: RStudio, R Console, Code Editor

RStudio – An Overview
Code Editor
Hello Amity World with R using RStudio

• Estimated time needed: 15 minutes


• Learning Objectives
– Get familiar with RStudio
– Write your first R code snippet in RStudio

Introduction to Data Visualization: Practice R
• Introduction to R
• Write the first Hello Amity code snippet in the Console
• Let's write your first Hello Amity in the RStudio Console
• Find the blinking cursor in the Console panel, type an
incomplete prin or print, and pause a little to see the auto-completion suggestions
• Type print("Hello Amity World")
• Then type print("My name is Jitendra")
• For practice, you can type anything, starting with your name
(write your name)
• Practice to understand the command
• To clear the console, press the Ctrl (Control) + L key combination

Recommended Study:
https://rstudio-education.github.io/hopr/starting.html
https://docs.rstudio.com/

Graphical Parameters
• Graphical parameters in R are used to customize the features of
graphs (fonts, colors, axes, titles) through R's graphics options.
• Below is a reference tutorial on graphical parameters:
• http://rstudio-pubs-static.s3.amazonaws.com/315576_85cccd774c29428ba46969316cbc76c0.html#bty
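
As a minimal sketch of how graphical parameters are set in base R, par() changes settings for all following plots, while arguments such as col and pch change a single plot. The specific colors, symbols, and titles below are illustrative choices, not values from the slides.

```r
# Save the current settings so they can be restored afterwards
old_par <- par(no.readonly = TRUE)

par(
  mfrow = c(1, 2),        # two plots side by side
  bty = "l",              # box type: draw only the left and bottom axes
  cex.main = 0.9,         # slightly smaller titles
  col.lab = "darkblue"    # color of the axis labels
)

# Per-plot parameters: point symbol (pch), color (col), titles and labels
plot(airquality$Temp, pch = 19, col = "tomato",
     main = "Daily Temperature", xlab = "Day index", ylab = "Temp")

hist(airquality$Temp, col = "steelblue", border = "white",
     main = "Temperature Histogram", xlab = "Temp")

# Restore the original settings
par(old_par)
```

Saving and restoring the settings keeps later plots from inheriting the two-panel layout.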


Features of R
Community to support and explore:
• R has a large community that works to improve and extend R’s abilities. CRAN, the
Comprehensive R Archive Network, has over 10,000 packages or extensions that can be used for anything from
producing high-definition graphics to creating interactive web apps.
• R can perform complex mathematical and statistical operations on vectors, matrices, data frames,
arrays, and other data objects of varying sizes.
• R is an interpreted language and does not need a compiler. R code is machine-independent,
easy to debug, and highly portable.
• R is a comprehensive programming language that supports object-oriented as well as procedural
programming, with generic and first-class functions.
• It supports matrix arithmetic.
• R can present data graphically. Its static graphics produce production-quality visualizations, and
extended libraries provide interactive graphics, so data visualization and data representation
become very easy. From concise charts to elaborate flow diagrams, all are well within R’s repertoire.
• R can be used throughout the data analysis process. It helps you gather the data, clean it,
investigate it, model it, and finally compile the results into eye-catching, easy-to-understand
reports with R Markdown. R can also help you build apps to show the results to the world.
• It can use distributed computing to process large datasets in parallel.
• R has packages that allow it to interact with multiple databases of different formats. It can also interact
with various database management systems.
• R is compatible with many other programming languages, such as C, C++, Java, and Python.
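
The points about operations on vectors and matrix arithmetic can be sketched with base R alone. The objects v, A, and B are illustrative.

```r
# Vectors: arithmetic is applied to every element at once
v <- c(1, 2, 3)
v * 2            # 2 4 6
sum(v)           # 6

# Matrices: matrix() fills values column by column
A <- matrix(1:4, nrow = 2)   # columns are (1, 2) and (3, 4)
B <- diag(2)                 # 2 x 2 identity matrix

A + B            # element-wise addition
A * A            # element-wise multiplication
A %*% B          # matrix multiplication; multiplying by the identity returns A
t(A)             # transpose
```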

R in the Professional World
• Social media companies like Facebook and Twitter use R for behavior analysis of
their users, to analyze trends, and to improve online marketing strategies.
• E-commerce companies like Amazon use R to identify potential customers, to judge
the effectiveness of their advertising campaigns, and to analyze customer feedback
and sentiment.
• Banking and finance firms use it for risk assessment, fraud detection, predicting
stock market trends, and assisting in business decision-making.
• Uber and other transportation companies use it to improve their pathfinding and
navigation algorithms.
• Airbnb uses R to improve its listings and to isolate factors resulting in better
ratings and business.
• Search engines such as Google use it to improve their search results.
• R is also used in the bioinformatics sector to analyze genetic sequences and to
assist in pharmaceutical research.
• R is a statistical analysis tool. It was created as a research aid and is still used
for academic and statistical research.
• Ford Motor Company uses R along with Hadoop to analyze customer feedback and
social media for user opinions that may help improve its
products, and to predict market trends and demand.

• Summary
– In this lab, you have been introduced to
RStudio. You have practiced how to write
and run R code in the console.
– Share your console snippet in the Teams
chat window.
– Thanks – Jitendra Sharma


References to Read
• Text Reading:

• 1. Applied Data Visualization with R and ggplot2, Dr. Tania Moulik, Packt, 2018

• 2. Learn R for Applied Statistics: With Data Visualizations, Regressions, and Statistics, Eric Goh Ming Hui, Apress, 2019

• 3. Pro Tableau: A Step-by-Step Guide, Seema Acharya, Subhashini Chellappan, Apress, 2019

• 4. Mastering Tableau, David Baldwin, Packt, 2016

• 5. Data Analysis and Visualization Using Python: Analyze Data to Create Visualizations for BI Systems, Dr. Ossama
Embarak, Apress, 2018

• 6. Practical Tableau: 100 Tips, Tutorials, and Strategies from a Tableau Zen Master, Ryan Sleeper, O’Reilly, 2018

• 7. Python: Data Analytics and Visualization, Understand, evaluate, and visualize data, Packt, 2017

• 8. Mastering Python Data Visualization: Generate effective results in a variety of visually appealing charts using the
plotting packages in Python, Kirthi Raman, Packt, 2015
