
MATH 2411: Descriptive Statistics and Graphics with R

CAI Mingxuan, ZHAO Jia

Department of Mathematics, HKUST

2020/2/22
Summary statistics for a single group

• Simple summary statistics

set.seed(1234)
x <- rnorm(1000)
mean(x) #mean

## [1] -0.0265972

sd(x) #standard deviation

## [1] 0.9973377

var(x) #variance

## [1] 0.9946825

median(x) #median

## [1] -0.03979419
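The built-in functions can be checked against the textbook formulas. A minimal sketch, re-creating the same simulated `x`: the mean is the sum divided by n, `var()` divides the sum of squared deviations by n - 1, `sd()` is the square root of that variance, and for an even n the median averages the two middle sorted values.

```r
set.seed(1234)
x <- rnorm(1000)
n <- length(x)

# Sample mean: sum of the observations divided by n
m <- sum(x) / n

# Sample variance: squared deviations from the mean divided by n - 1,
# the same divisor var() uses
v <- sum((x - m)^2) / (n - 1)

# Standard deviation: square root of the variance
s <- sqrt(v)

# Median: n = 1000 is even, so average the two middle sorted values
xs <- sort(x)
med <- (xs[n / 2] + xs[n / 2 + 1]) / 2

# Each comparison with the built-in function should print TRUE
all.equal(m, mean(x))
all.equal(v, var(x))
all.equal(s, sd(x))
all.equal(med, median(x))
```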

quantile(x) #quantile

By default you get the minimum, the maximum, and the three quartiles (the 0.25, 0.50, and 0.75 quantiles).
It is also possible to obtain other quantiles by adding an argument containing the desired percentage points, expressed as probabilities between 0 and 1:
pvec <- seq(0,1,0.1) # 0, 0.1, 0.2, ..., 1
quantile(x, pvec)
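R's `quantile()` supports nine interpolation rules through its `type` argument, and the default is `type = 7`. As a minimal sketch of that default rule (the helper name `q7` is hypothetical, not part of R), the p-quantile is found by linear interpolation between the order statistics around position h = (n - 1)p + 1 in the sorted sample:

```r
set.seed(1234)
x <- rnorm(1000)
pvec <- seq(0, 1, 0.1)

# Hypothetical helper reproducing quantile(..., type = 7):
# position h = (n - 1) * p + 1, then interpolate linearly between
# the order statistics at floor(h) and ceiling(h)
q7 <- function(x, p) {
  xs <- sort(x)
  n <- length(xs)
  h <- (n - 1) * p + 1
  lo <- floor(h)
  hi <- ceiling(h)
  xs[lo] + (h - lo) * (xs[hi] - xs[lo])
}

q7(x, pvec)
all.equal(q7(x, pvec), unname(quantile(x, pvec)))  # should print TRUE

# The 0.5 quantile coincides with the median
all.equal(q7(x, 0.5), median(x))
```

`unname()` is used because `quantile()` labels its result with percentages such as `"10%"`, while `q7` returns a plain vector.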
Summarize an entire data frame
• Galton height data
Galton <- read.table("/Users/jiazhao/Documents/HKUST/TA/20Spr
summary(Galton)
## family father mother midparentHei
## 185 : 15 Min. :62.0 Min. :58.00 Min. :64.4
## 066 : 11 1st Qu.:68.0 1st Qu.:63.00 1st Qu.:68.1
## 120 : 11 Median :69.0 Median :64.00 Median :69.2
## 130 : 11 Mean :69.2 Mean :64.09 Mean :69.2
## 166 : 11 3rd Qu.:71.0 3rd Qu.:65.88 3rd Qu.:70.1
## 097 : 10 Max. :78.5 Max. :70.50 Max. :75.4
## (Other):865
## children childNum gender childHeig
## Min. : 1.000 Min. : 1.000 female:453 Min. :56
## 1st Qu.: 4.000 1st Qu.: 2.000 male :481 1st Qu.:64
## Median : 6.000 Median : 3.000 Median :66
## Mean : 6.171 Mean : 3.586 Mean :66
## 3rd Qu.: 8.000 3rd Qu.: 5.000 3rd Qu.:69
## Max. :15.000 Max. :15.000 Max. :79
##
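`summary()` treats each column according to its class: numeric columns get the six-number summary (minimum, quartiles, median, mean, maximum), while factor columns such as `gender` get counts per level. To try this without the Galton file, the sketch below uses a small hypothetical data frame. One caveat: since R 4.0.0, `read.table()` no longer converts character columns to factors by default, so a character `gender` column would be summarised only by its length and class; passing `stringsAsFactors = TRUE` restores the per-level counts shown in the output above.

```r
# Hypothetical toy data, not the Galton file
df <- data.frame(
  height = c(64.0, 70.5, 66.2, 68.0, 62.5, 71.0),
  gender = factor(c("female", "male", "female", "male", "female", "male"))
)

summary(df)          # numeric column: six-number summary; factor: counts
summary(df$height)   # Min., 1st Qu., Median, Mean, 3rd Qu., Max.
summary(df$gender)   # number of observations in each level
```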
Graphical display of distributions
• Histograms
hist(x)
[Figure: Histogram of x, with x on the horizontal axis and Frequency on the vertical axis]
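`hist()` also returns the numbers behind the plot. With `plot = FALSE` it computes the bin `breaks` and `counts` without drawing anything. A minimal sketch reusing the simulated `x`:

```r
set.seed(1234)
x <- rnorm(1000)

h <- hist(x, plot = FALSE)
h$breaks   # bin boundaries chosen by R (Sturges' rule, then pretty())
h$counts   # number of observations in each bin

# Every observation falls in exactly one bin, so this should print TRUE
sum(h$counts) == length(x)
```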
• Empirical cumulative distribution
y <- rnorm(50)
n <- length(y)
plot(sort(y),(1:n)/n,type="s",ylim=c(0,1))
[Figure: empirical c.d.f. of y drawn as a step function, with sort(y) on the horizontal axis and (1:n)/n on the vertical axis]
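R also has a built-in `ecdf()` function that returns the empirical c.d.f. as a step function you can evaluate. Assuming no ties (true almost surely for `rnorm` data), it takes the value k/n at the kth smallest observation, which is what the manual `plot(sort(y), (1:n)/n, type = "s")` call draws. The seed below is an arbitrary choice for reproducibility.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
y <- rnorm(50)
n <- length(y)

Fn <- ecdf(y)               # Fn(t) = proportion of observations <= t
Fn(sort(y))                 # same values as (1:n)/n when there are no ties
all.equal(Fn(sort(y)), (1:n) / n)   # should print TRUE

# plot(Fn) draws the same step function with R's own plotting method
```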

• Q-Q plots: One purpose of calculating the empirical cumulative distribution function (c.d.f.) is to see whether the data can be assumed to be normally distributed. For a better assessment, you might plot the kth smallest observation against the expected value of the kth smallest observation out of n in a standard normal distribution.
• The point is that, plotted this way, the points should lie close to a straight line if the data come from a normal distribution, whatever its mean and standard deviation.
qqnorm(y)
[Figure: Normal Q-Q Plot of y, with Sample Quantiles on the vertical axis]
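`qqnorm()` does not compute exact expected order statistics; it uses the approximation `qnorm(ppoints(n))`, where `ppoints(n)` returns the probabilities (k - a)/(n + 1 - 2a) with a = 3/8 for n ≤ 10 and a = 1/2 otherwise. The sketch below rebuilds the plotted points by hand, then illustrates why any mean and standard deviation still give a straight line: for simulated normal data (the mean 10 and standard deviation 3 are arbitrary choices), a least-squares line through the Q-Q points has intercept near the mean and slope near the standard deviation.

```r
set.seed(1)                 # arbitrary seed, for reproducibility only
y <- rnorm(50)
n <- length(y)

# Points drawn by qqnorm(y): theoretical quantiles (x) vs. the data (y)
q <- qqnorm(y, plot.it = FALSE)

# Manual version: kth smallest observation against qnorm(ppoints(n))[k]
theo <- qnorm(ppoints(n))
all.equal(sort(q$x), theo)  # should print TRUE

# Normal data with another mean and sd still lie close to a line;
# the intercept estimates the mean and the slope the standard deviation
z <- rnorm(50, mean = 10, sd = 3)
coef(lm(sort(z) ~ theo))

# qqline() adds a reference line through the quartiles to an existing plot:
# qqnorm(z); qqline(z)
```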
Graphics for grouped data

• In dealing with grouped data, it is important to be able not only to create plots for each group but also to compare the plots between groups.
• Histograms

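# Common x-range covering both groups, so the histograms line up
rng <- range(Galton$father, Galton$mother)
op <- par(mfrow = c(2, 1))   # stack the two histograms vertically
hist(Galton$father, breaks = 100, xlim = rng, ylim = c(0, 150),
     col = "white", main = "Fathers", xlab = "Height")
hist(Galton$mother, breaks = 100, xlim = rng, ylim = c(0, 150),
     col = "grey", main = "Mothers", xlab = "Height")
par(op)                      # restore the previous plotting layout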
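A single number such as `breaks = 100` is only a suggestion to `hist()`: R rounds the breakpoints to "pretty" values, so two groups can end up with different bins. Passing the same explicit vector of breakpoints guarantees identical bins, which makes the counts directly comparable. A minimal sketch with simulated stand-ins for the father and mother heights (the means and standard deviations are made up, not Galton's):

```r
set.seed(2)
father <- rnorm(200, mean = 69, sd = 2.5)   # hypothetical data
mother <- rnorm(200, mean = 64, sd = 2.3)   # hypothetical data

# One set of bins of width 0.5 covering both groups
brks <- seq(floor(min(father, mother)), ceiling(max(father, mother)), by = 0.5)

hf <- hist(father, breaks = brks, plot = FALSE)
hm <- hist(mother, breaks = brks, plot = FALSE)

identical(hf$breaks, hm$breaks)   # TRUE: same bins for both groups
rbind(father = hf$counts, mother = hm$counts)
```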
• Parallel boxplots

boxplot(Galton[c("father","mother")])
[Figure: parallel boxplots of the father and mother columns]
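Each box is built from Tukey's five-number summary: the box spans the lower and upper hinges with a line at the median, and the whiskers reach the most extreme observations lying within 1.5 times the box length (the default `range = 1.5`) of the box; points beyond that are drawn individually. With `plot = FALSE`, `boxplot()` returns these numbers instead of drawing. A sketch on hypothetical simulated data standing in for the Galton columns:

```r
set.seed(3)
heights <- data.frame(father = rnorm(200, 69, 2.5),   # hypothetical data
                      mother = rnorm(200, 64, 2.3))   # hypothetical data

b <- boxplot(heights, plot = FALSE)
b$stats   # rows: lower whisker, lower hinge, median, upper hinge, upper whisker
b$out     # observations that would be plotted individually (if any)

# Hinges and median agree with fivenum(); should print TRUE
all.equal(b$stats[2:4, 1], fivenum(heights$father)[2:4])
```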
