Overview
Tables and visualizations
Categorical
  1 Variable: tables, Pie Chart, Bar chart / Bar plot, Pareto Chart
  2 Variables: Contingency tables, Side-by-Side Chart
Numerical
  Ordered Arrays, Stem-and-Leaf
  Frequency Distribution: Histogram, Percentage Polygon
  Cumulative Distribution Function: Cumulative Percentage Polygon
Furtwangen University 2
Column Chart
z <- c(1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,6,6)
z1 <- table(z)
plot(z1, xlim=c(0,6), ylim=c(0,10), xlab = "z-Values",
ylab = "Frequencies", main = "Column Chart")
z1
z
1 2 3 4 5 6
2 3 7 4 3 2
Task:
Create a Column Chart for the following data:
"a","a","b","b","b","c","c","c","c",
"c","c","c","d","d","d","d","e","e","e","f","f"
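One possible approach to the task with the letters "a" to "f" is sketched below. It reuses the table() and plot() pattern from the numeric example; the variable names x and x1 and the axis labels are chosen here for illustration and are not taken from the slides.

```r
# Letters from the task, stored as a character vector
x <- c("a","a","b","b","b","c","c","c","c",
       "c","c","c","d","d","d","d","e","e","e","f","f")

# Absolute frequency of each letter
x1 <- table(x)

# For a one-dimensional table, plot() draws one vertical line per category
plot(x1, ylim = c(0, 10), xlab = "x-Values",
     ylab = "Frequencies", main = "Column Chart")
```

Because the categories are characters, xlim is left out and R places the categories on the x-axis itself.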
Column Chart
Task:
Create the figure on the right side for data Cars93
The data are in the package "MASS"
Column Chart
install.packages("MASS")
library(MASS) # for Cars93
Pie Chart
For categorical variables, the table() function returns the frequency of each category:
tab <- table(Cars93$Type)   # named "tab" so that the table() function is not masked
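A frequency table like this can be passed directly to the base R function pie(). The following is a minimal sketch; the object name type_freq, the rainbow() palette and the title are assumptions made for illustration, since the original figure is not reproduced here.

```r
library(MASS)   # provides the Cars93 data set

# Absolute frequencies of the six car types
type_freq <- table(Cars93$Type)

# One slice per category, one colour per slice
pie(type_freq, col = rainbow(length(type_freq)), main = "Car Types")
```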
Pie Chart
install.packages("MASS")
library("MASS")
Task:
Create the same figure in shades of grey instead of colours,
using the grey.colors() function
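A minimal sketch of a grey-scale pie chart follows. Note that in base R the function is spelled grey.colors() (or gray.colors()). The object names and the title are assumptions for illustration only.

```r
library(MASS)   # provides the Cars93 data set

type_freq <- table(Cars93$Type)

# grey.colors(n) returns n grey levels, here one per car type
greys <- grey.colors(length(type_freq))

pie(type_freq, col = greys, main = "Car Types")
```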
Pie Chart
Task
Create the figure on the right, based on the following
statements:
3D Pie Chart
install.packages("plotrix")
library(plotrix)
Bar Chart
library("MASS")
table(Cars93$Type)
barplot(table(Cars93$Type), ylim = c(0, 30),
xlab = "Car Type",
ylab = "Frequencies of the car types",
axis.lty = "solid",
space = 0.1,
main = "Frequencies of Car Types")
Bar Chart
library("MASS")
library("ggplot2")
Task:
Try to create the same figure with ggplot2,
+ with different colours, according to Type
+ x-axis = only car types, no additional text
+ y-axis = "Absolute frequencies of the car types"
+ header = "Frequencies of the Car Types"
+ header in the center of the figure
Use:
ggplot(Cars93, aes(Type))+
  geom_bar(fill = "grey80", colour = 'black')
Bar Chart, Relative Frequencies
library("MASS")
library("plotly")
Task:
Try to create the same figure with ggplot,
+ with different colours according to palette = "viridis"
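Relative frequencies for a bar chart can be obtained from the absolute frequencies with prop.table(), which divides each count by the total. The sketch below uses base R's barplot(); the object names, labels and the y-axis limit are assumptions for illustration and do not reproduce the original figure.

```r
library(MASS)   # provides the Cars93 data set

abs_freq <- table(Cars93$Type)      # absolute frequencies n_i
rel_freq <- prop.table(abs_freq)    # relative frequencies n_i / n

barplot(rel_freq, ylim = c(0, 0.3),
        xlab = "Car Type",
        ylab = "Relative frequencies of the car types",
        main = "Relative Frequencies of Car Types")
```

The relative frequencies always sum to 1, which makes a quick plausibility check possible.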
Bar Chart, Absolute Frequencies
library("MASS")
library("plotly")
ggplot(Cars93, aes(Manufacturer))+
  geom_bar(fill = "grey70", colour = 'black')+
  theme_bw()+
  theme(legend.position = "none")+
  theme(axis.text.x = element_text(angle = -45, vjust = 0.5))+
  labs(x = "", y = "Absolute frequencies of manufacturers")+
  ggtitle("Absolute Frequencies of Manufacturers")+
  theme(plot.title = element_text(hjust = 0.5))
Task:
Try to create the same figure with ggplot,
+ with different colours, given by manufacturer
Bar Chart, stacking
library("MASS")
library("plotly")
A bad example!
Bar Chart, stacking
library("MASS")
library("plotly")
Task:
Try to create the same figure with ggplot,
+ with different shades of grey, given by manufacturer
Bar Chart, Horizontal Display
library("MASS")
library("plotly")
Task
Create the same figure
in different colours, given by manufacturer
Bar Chart, Horizontal Display
library("MASS")
library("plotly")
Task
Create the same figure
in different shades of grey, legend on the right
Pareto Chart
➢ In a Pareto chart, the frequencies for each category are plotted as vertical bars in descending order and are combined
with the cumulative percentage line on the same chart
➢ Pareto charts get their name from the Pareto Principle: in many data sets, a few categories of a categorical variable
represent the majority of the data, while all other categories represent a relatively small amount of data
Percentage distribution
46% of all cars are Small or Midsize
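The 46% figure can be checked by sorting the categories in descending order and accumulating their percentages, which is what the cumulative line of a Pareto chart shows. This is a minimal base R sketch; the object names tab_sorted and cum_perc are chosen here for illustration.

```r
library(MASS)   # provides the Cars93 data set

# Frequencies of the car types, largest category first
tab_sorted <- sort(table(Cars93$Type), decreasing = TRUE)

# Cumulative percentage after each category
cum_perc <- cumsum(tab_sorted) / sum(tab_sorted) * 100
round(cum_perc, 1)
```

The first two categories are Midsize (22 cars) and Small (21 cars), so the second cumulative value is 43 / 93, about 46%.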
Pareto Chart
library(MASS) #Cars93
library(qcc) # Pareto chart
tab = table(Cars93$Type)
pareto.chart(tab, xlab = "Car types",
ylab = "Absolute frequency of car types",
col = c("red", "blue"),
cumperc = seq(0, 100, by = 5), # tick marks of the percentage axis on the right
ylab2 = "Cumulative relative frequency of car types",
main = "Pareto Chart for Car Types") # title of the chart
Pareto Chart
Task
The file "Interruptions" contains the number of network
interruptions per day in a company for 130 days
Pareto Chart
Task
Create the same Pareto chart in grey colours
Solution
library(qcc) # Pareto chart
interruptions <- read.csv("D:/HFU Arbeitskreise/Leuchtturm/Buch/Data/Interruptions.csv", sep = ";", header = TRUE)
tab = table(interruptions$Interruptions)
pareto.chart(tab, xlab = "Number of interruptions",
ylab = "Absolute frequency of interruptions",
col = c("grey50"),
cumperc = seq(0, 100, by = 5),
ylab2 = "Cumulative relative frequency of interruptions",
main = "Network Interruptions - Pareto Chart")
Pareto Chart
Task
Create a Pareto chart for the interruptions.
Use ggplot()
Pareto Chart
Task
Create a Pareto chart for the car types.
Use ggplot()
Pareto Chart
library(MASS)     # Cars93
library(ggplot2)
library(ggQC)     # stat_pareto()
tab <- table(Cars93$Type)
tab1 <- as.data.frame(tab)   # columns Var1 (car type) and Freq (count)
ggplot(tab1, aes(x = Var1, y = Freq)) +
  labs(x = "Car Types", y = "Absolute frequency of car types")+
  ggtitle("Pareto Chart for Car Types")+
  theme(plot.title = element_text(hjust = 0.5))+
  stat_pareto(point.color = "grey50",
              point.size = 3,
              line.color = "black",
              bars.fill = c("grey50", "grey90")) +
  theme(axis.text.x = element_text(angle = 45, hjust = 1, vjust=0.5))
Stem-and-Leaf Diagram
The prices on the right have one decimal place: View(Cars93$Price)
With the following R code we round the values to the nearest whole number and sort them in ascending order.
To display the rounded values we can use the cat() function; its fill argument sets the maximum output width
in characters per line:
cat(round(sort(Cars93$Price)), fill=50)
Stem-and-Leaf Diagram
stem(Cars93$Price)
➢ The number to the left of the vertical line is the stem, the numbers to the right are the leaves.
➢ The decimal point information tells you how to rebuild the values: multiply each stem by 10 (or 100, or 1000, …),
then add each leaf to the stem
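The stem and leaf rule can be illustrated with a few two-digit numbers. The values in x are hypothetical and chosen only for this sketch; they are not taken from Cars93.

```r
# Hypothetical two-digit values, chosen for illustration
x <- c(12, 15, 21, 23, 23, 38, 41)

stems  <- x %/% 10   # integer division: the tens digit is the stem
leaves <- x %% 10    # remainder: the ones digit is the leaf

# Leaves grouped by stem, as in a stem-and-leaf diagram
split(leaves, stems)

# Rebuilding the values: stem times 10 plus leaf
rebuilt <- stems * 10 + leaves

# R's own display for comparison
stem(x)
```

Note that stem() may choose a different stem width than the manual split, depending on the data and its scale argument.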
Stem-and-Leaf Diagram
Example
Take the files "Fastfood" and "Screwlength" from Felix and create the solutions shown below.
Use View() to display the amounts and widths.
Click on the column with the data to sort them, so you can check your results.

stem(Fastfood$Amount)
The decimal point is at the |
4 | 9
5 | 589
6 | 3558
7 | 149
8 | 33
9 | 56

stem(Screwlength$Width)
The decimal point is 2 digit(s) to the left of the |
830 | 27
832 | 3
834 | 381
836 | 3
838 | 2356
840 | 3559001234459
842 | 000279969
844 | 4778
846 | 002569
848 | 114988
Data Frames
Pareto Chart - Repetition
Solution
library(qcc) # Pareto chart
intrupt <- read.csv("D:/HFU Arbeitskreise/Leuchtturm/Buch/Data/Interruptions.csv",
                    sep = ";", header = TRUE)
tab <- table(intrupt$Interruptions)   # use the object that was just read in
pareto.chart(tab, xlab = "Number of interruptions",
ylab = "Absolute frequency of interruptions",
col = c("red", "blue"), # colors of the chart
cumperc = seq(0, 100, by = 5), # ranges on the right
ylab2 = "Cumulative relative frequency of interruptions",
main = "Network Interruptions - Pareto Chart")
Data Frames
View(intrupt)
str(intrupt)
How can we create a data frame that shows the number of days with the same
number of interruptions?
Data Frames
To make further operations easier, we convert the table to a data frame with data.frame():
frame.intrupt <- data.frame(tab.intrupt)
View(frame.intrupt)
Same display but different class:
class(frame.intrupt)
[1] "data.frame"
$f_i = \frac{n_i}{n}$ = relative frequency of $x_i$
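The following sketch shows how such a frequency data frame can be extended with relative and cumulative frequencies. Because the Interruptions file is not available here, it reuses the vector z from the column chart example; the column names x, n, f, N and F_cum are assumptions chosen to match the symbols used in the slides.

```r
# Data from the column chart example
z <- c(1,1,2,2,2,3,3,3,3,3,3,3,4,4,4,4,5,5,5,6,6)

# Table converted to a data frame: one row per distinct value
freq <- data.frame(table(z))
names(freq) <- c("x", "n")              # value x_i and absolute frequency n_i

freq$f     <- freq$n / sum(freq$n)      # relative frequency f_i = n_i / n
freq$N     <- cumsum(freq$n)            # cumulative absolute frequency N_i
freq$F_cum <- cumsum(freq$f)            # cumulative relative frequency F_i

freq
```

The last row of N equals the number of observations and the last row of F_cum equals 1.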
Data Frames
Interpretations
1. On 32 days we had 1 interruption (n2)
2. On 107 days there were 2 or fewer interruptions (N3)
3. The proportion of days with 1 interruption is 0.25 or 25% (f2)
4. The proportion of days with 2 or 3 interruptions is
0.15 + 0.09 = 0.24 or 24% (f3 + f4)
5. The proportion of days with 2 or fewer interruptions is
0.83 or 83% (F3)
6. The proportion of days with 3 or more interruptions is
0.17 or 17% (1 - proportion of days with 2 or fewer
interruptions, 1 - F3 = 1 - 0.83)
Pareto Chart
Task
In a previous slide we had the Pareto chart with the solution on the right side.
library(MASS) # Cars93
library(qcc) # Pareto chart
Histogram
➢ The class boundaries (or class midpoints) are shown on the horizontal axis
➢ The height of each bar represents the frequency, relative frequency, or percentage
Histogram
[Figure: histogram "Age Of Students"; x-axis: 5, 15, 25, 35, 45, 55, More; y-axis: Frequency]
Histogram
Absolute frequencies
Result:
+ prob = TRUE: density function (relative frequencies)
+ density plot
+ density distribution of the car prices
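The difference between the two displays can be sketched with base R's hist(). The code below is an illustration under assumptions: the original figures are not reproduced here, so the labels, titles and the overlaid density() curve are choices made for this example.

```r
library(MASS)   # provides the Cars93 data set

# Absolute frequencies: bar heights are counts per class
h <- hist(Cars93$Price, xlab = "Price", main = "Absolute frequencies")

# prob = TRUE: bar heights are densities, so the bar areas sum to 1
hist(Cars93$Price, prob = TRUE, xlab = "Price", main = "Density")
lines(density(Cars93$Price))   # kernel density estimate drawn on top

# Check: density times class width, summed over all classes
total_area <- sum(h$density * diff(h$breaks))
```

With prob = TRUE the bar height is the relative frequency divided by the class width, which is why the heights are much smaller than the relative frequencies themselves.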
Histogram
Task
Take the "state.x77" dataset from the package "datasets"
Create a histogram for "Income"
The solution should be the figure on the right
Cumulative Frequency Distribution
➢ The cumulative frequency of an interval (class) is the sum of its own frequency and the frequencies of all preceding classes
Cumulative Frequency Distribution
> prices
$breaks     # class boundaries, interval width = 5
[1] 5 10 15 20 25 30 35 40 45 50 55 60 65
$counts     # absolute frequencies n_i
[1] 12 21 29 10 9 5 4 1 1 0 0 1
$density    # densities: relative frequencies f_i divided by the interval width
[1] 0.025806452 0.045161290 0.062365591 0.021505376 0.019354839 0.010752688 0.008602151
[8] 0.002150538 0.002150538 0.000000000 0.000000000 0.002150538
$mids       # midpoints of the classes
[1] 7.5 12.5 17.5 22.5 27.5 32.5 37.5 42.5 47.5 52.5 57.5 62.5
$xname      # analyzed variable
[1] "Cars93$Price"
$equidist   # equidistant: all intervals have the same width
[1] TRUE
attr(,"class")
[1] "histogram"
Cumulative Frequency Distribution
➢ We can access each component of this object; for the cumulative frequency distribution we need the counts ($counts)
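A minimal sketch of how the counts can be accumulated follows. It assumes that prices is a histogram object for Cars93$Price with class boundaries from 5 to 65 in steps of 5, as suggested by the printed $breaks; the exact call used on the slides is not shown, and the names cum_abs and cum_rel are chosen here.

```r
library(MASS)   # provides the Cars93 data set

# Histogram object without drawing it; breaks as in the printed output
prices <- hist(Cars93$Price, breaks = seq(5, 65, by = 5), plot = FALSE)

cum_abs <- cumsum(prices$counts)            # cumulative absolute frequencies
cum_rel <- cum_abs / sum(prices$counts)     # cumulative relative frequencies

# One row per class, identified by its upper boundary
data.frame(upper   = prices$breaks[-1],
           n       = prices$counts,
           N       = cum_abs,
           cum_rel = round(cum_rel, 2))
```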
Empirical Cumulative Frequency Distribution
➢ The empirical cumulative distribution function is more detailed: it is not based on the frequencies within intervals
➢ For a certain value x it shows the proportion of values that are less than or equal to x
➢ In R, the ecdf() function is available for this
Empirical Cumulative Frequency Distribution
library(ggplot2)
library(MASS)   # Cars93
quants <- quantile(Cars93$Price)   # minimum, quartiles and maximum
ggplot(NULL, aes(x = Cars93$Price))+
  geom_step(stat = "ecdf", color = "red")+
  labs(x = "Price * $1,000", y = "Cumulative frequencies")+
  geom_vline(xintercept = quants, linetype = "dashed")+   # fixed positions, passed outside aes()
  scale_x_continuous(breaks = quants, labels = quants)+
  ggtitle("Empirical Cumulative Distribution Function (ECDF) of Car Prices")+
  theme_bw()
Other Examples
Scatterplots
It's 3-dimensional:
➢ X-axis = displ (engine size)
➢ Y-axis = hwy (fuel economy)
➢ Colour = class (class identifies a colour)
Cars with high fuel economy for their high engine sizes are 2-seaters
[Figure: scatterplot of fuel economy (low to high) against engine size (low to high), coloured by class]
Faceting
ggplot(mpg, aes(x = displ, y = hwy, colour = class))+
geom_point()+
facet_wrap(~class)
Boxplots