Recommended reading
- Peña, D., Romo, J., "Introducción a la Estadística para las Ciencias Sociales", Chapters 4, 5
- [reference missing], Chapter 2
Graphical summaries by type of data:
- Categorical: piechart, barchart
- Numerical: histogram, polygon, boxplot
Class    Absolute Frequency    Relative Frequency
A        12                    0.300
B        11                    0.275
AB        8                    0.200
O         9                    0.225
Total    40                    1
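Each relative frequency in the table is the absolute frequency divided by the total number of observations. A minimal Python sketch of that calculation, using the counts from Example 1 (the names `counts` and `relative` are illustrative only):

```python
# Absolute frequencies from Example 1 (classes A, B, AB, O)
counts = {"A": 12, "B": 11, "AB": 8, "O": 9}

# Total number of observations
total = sum(counts.values())

# Relative frequency = absolute frequency / total
relative = {cls: n / total for cls, n in counts.items()}

for cls, f in relative.items():
    print(f"{cls:>2}  {counts[cls]:>2}  {f:.3f}")
```

The relative frequencies always add up to 1, which is a quick check on the table.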
Piechart
Example 1 cont.:
- Each slice is a fraction of the total size of the pie
- Many software packages order slices alphabetically
- Although visually appealing, piecharts are harder to read than barcharts
- Avoid 3D piecharts: in those, the area in the background appears smaller than the area in the foreground
[Figure: piechart of Example 1 with slices A 30%, B 27.5%, AB 20%, O 22.5%]
Class    Absolute     Relative     Cumulative Absolute    Cumulative Relative
         Frequency    Frequency    Frequency              Frequency
VU        62          0.07          62                    0.07
U        108          0.12         170                    0.19
S        319          0.35         489                    0.54
VS       412          0.46         901                    1
Total    901          1
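The cumulative columns are running totals of the absolute and relative frequencies, taken in the order of the classes. A short Python sketch with the counts of Example 2 (variable names are illustrative):

```python
from itertools import accumulate

# Classes of Example 2 in their table order, with absolute frequencies
classes = ["VU", "U", "S", "VS"]
abs_freq = [62, 108, 319, 412]

n = sum(abs_freq)                      # total number of observations
rel_freq = [a / n for a in abs_freq]   # relative frequencies
cum_abs = list(accumulate(abs_freq))   # cumulative absolute frequencies
cum_rel = [c / n for c in cum_abs]     # cumulative relative frequencies

for row in zip(classes, abs_freq, rel_freq, cum_abs, cum_rel):
    print("{:>2}  {:>3}  {:.2f}  {:>3}  {:.2f}".format(*row))
```

The last cumulative absolute frequency equals the total, and the last cumulative relative frequency equals 1.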
Barchart
Example 2 cont.:
- Bars have the same width and are equally spaced, with heights corresponding to the frequencies
- There are gaps between the bars
- Bars are labeled with class names
- Many software packages order bars alphabetically
[Figure: barchart of Example 2; vertical axis FREQUENCY from 0 to 400, one bar per class from VU to VS]
Barchart
Barcharts can also be constructed for discrete data if there are not too many values.
This is a barchart for Example 3 of Ch.1, where we looked at the number of leaves attacked by a pest for a sample of 50 plants.
[Figure: barchart of the number of attacked leaves; vertical axis FREQUENCY from 0 to 12]
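Midpoint    n_i    f_i     N_i    F_i
15           3     0.15     3     0.15
25           6     0.30     9     0.45
35           5     0.25    14     0.70
45           4     0.20    18     0.90
55           2     0.10    20     1
Total       20     1
(n_i: absolute frequency, f_i: relative frequency, N_i: cumulative absolute frequency, F_i: cumulative relative frequency)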
Polygon
[Figure: frequency polygon; horizontal axis TEMP (F) from 10 to 70, vertical axis FREQUENCIES]
[Figure: companion plot on the same TEMP (F) axis, vertical scale from 0.000 to 0.030]
Center: mean, median, mode
Variation: range, interquartile range, variance, standard deviation, coeff. of variation
Others: quartiles, percentiles

New notation:
$$\sum_{i=1}^{n} x_i = x_1 + x_2 + \ldots + x_n$$
($\sum$: sum, $i = 1$: the lower limit, $n$: the upper limit, $x_i$: example of a formula depending on $i$)

Example:
$$\sum_{i=-1}^{3} i^2 = (-1)^2 + 0^2 + 1^2 + 2^2 + 3^2 = 15$$
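The summation in the example runs over i = -1, 0, 1, 2, 3 and adds up the squares. The same sum can be sketched in Python, where the upper bound of `range` is exclusive:

```python
# Sum of i^2 for i = -1, 0, 1, 2, 3.
# range(-1, 4) yields -1, 0, 1, 2, 3 because the upper bound is excluded.
total = sum(i ** 2 for i in range(-1, 4))
print(total)
```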
Population mean
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + \ldots + x_N}{N}$$
Sample mean
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + \ldots + x_n}{n}$$
Example: X: 3, 1, 5, 4, 2 and Y: 3, 1, 5, 4, 200
$$\bar{x} = \frac{3 + 1 + 5 + 4 + 2}{5} = 3$$
$$\bar{y} = \frac{3 + 1 + 5 + 4 + 200}{5} = 42.6!$$
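The example shows how a single extreme value pulls the mean away from the bulk of the data. A small Python sketch reproducing both means (the helper name `mean` is illustrative):

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)

x = [3, 1, 5, 4, 2]
y = [3, 1, 5, 4, 200]   # same as x, except the last value is an outlier

mean_x = mean(x)
mean_y = mean(y)
print(mean_x, mean_y)
```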
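Median, odd number of observations (here n = 5):
$$M = x_{((5+1)/2)} = \overbrace{x_{(3)}}^{\text{3rd smallest}} = 3$$
Median, even number of observations (here n = 6):
$$M = \frac{x_{(6/2)} + x_{(6/2+1)}}{2}$$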
Left-skewed: $\bar{x} < M$
Symmetric: $\bar{x} = M$
Right-skewed: $M < \bar{x}$
Sensitive to outliers
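[Figure: boxplot with $x_{min} = 12$, $Q_1 = 24$, median ($Q_2$) $= 31$, $Q_3 = 42$, $x_{max} = 58$; each of the four segments contains 25% of the data; IQR $= Q_3 - Q_1 = 18$]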
Quartiles split the ranked data into four segments with an equal number of values per segment.
Example: Given the observations 22, 18, 17, 16, 16, 13, 12, 21, 11 (n = 9), first rank the data: 11, 12, 13, 16, 16, 17, 18, 21, 22. Then identify the positions 0.25(n + 1) = 2.5, 0.5(n + 1) = 5 and 0.75(n + 1) = 7.5, rounding a fractional position up:
$Q_1 = x_{(2.5)} = x_{(3)} = 13$
$Q_2 = x_{(5)} = 16$
$Q_3 = x_{(7.5)} = x_{(8)} = 21$
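The position rule in this example can be sketched in Python. Two points are assumptions rather than something the slides state: the position is computed as q(n + 1), and a fractional position is rounded to the nearest integer with halves going up (the example only shows positions ending in .5). The function name `quartile` is illustrative.

```python
import math

def quartile(values, q):
    """Quartile by position q * (n + 1) in the ranked data.

    Assumption: fractional positions are rounded to the nearest integer,
    with halves rounded up, as in the worked example (2.5 -> 3, 7.5 -> 8).
    """
    ordered = sorted(values)
    n = len(ordered)
    pos = q * (n + 1)               # 1-based position in the ranked data
    k = math.floor(pos + 0.5)       # round to nearest, halves up
    k = min(max(k, 1), n)           # keep the position inside 1..n
    return ordered[k - 1]           # convert to 0-based index

data = [22, 18, 17, 16, 16, 13, 12, 21, 11]
q1, q2, q3 = (quartile(data, q) for q in (0.25, 0.5, 0.75))
print(q1, q2, q3, "IQR =", q3 - q1)
```

Statistical packages often interpolate between neighbouring ranked values instead of rounding, so their quartiles can differ slightly from this rule.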
Population variance
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
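Sample variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \overbrace{\frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1}}^{\text{faster to calculate}}$$
Sample variance divided by $n$
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{n-1}{n}\, s^2$$
Standard deviation
- Population: $\sigma = \sqrt{\sigma^2}$
- Sample: $s = \sqrt{s^2}$
Has the same units as the original data, whilst the variance is in units$^2$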
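As a check that the definition and the faster formula agree, here is a Python sketch. It reuses the sample X: 3, 1, 5, 4, 2 from the mean example as illustrative data, and compares the results with the standard library's `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by n).

```python
import math
import statistics

x = [3, 1, 5, 4, 2]
n = len(x)
x_bar = sum(x) / n

# Definition: squared deviations from the mean, divided by n - 1
s2_def = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

# Faster formula: sum of squares minus n times the squared mean, divided by n - 1
s2_fast = (sum(xi ** 2 for xi in x) - n * x_bar ** 2) / (n - 1)

# Variance divided by n, obtained from s^2 via the factor (n - 1) / n
var_n = (n - 1) / n * s2_def

# Standard deviation: square root of the variance, in the units of the data
s = math.sqrt(s2_def)

print(s2_def, s2_fast, var_n, s)
print(statistics.variance(x), statistics.pvariance(x))
```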
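$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{124}{8} = 15.5, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{124}{8} = 15.5, \qquad \bar{z} = \frac{\sum_{i=1}^{n} z_i}{n} = \frac{124}{8} = 15.5$$
$$s_x^2 = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n-1} = \frac{2000 - 8(15.5)^2}{8-1} = \frac{78}{7} = 11.1429, \qquad s_x = 3.3381$$
$$s_y^2 = \frac{1928 - 8(15.5)^2}{8-1} = \frac{6}{7} = 0.8571, \qquad s_y = 0.9258$$
$$s_z^2 = \frac{2068 - 8(15.5)^2}{8-1} = \frac{146}{7} = 20.8571, \qquad s_z = 4.5670$$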
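The three variances above only need n, the sum of the values and the sum of their squares. A Python sketch of the faster formula applied to those totals (the function name `sample_variance_from_sums` is illustrative):

```python
import math

def sample_variance_from_sums(n, sum_x, sum_x2):
    """Sample variance from n, the sum of values and the sum of squared values."""
    mean = sum_x / n
    return (sum_x2 - n * mean ** 2) / (n - 1)

n, total = 8, 124   # all three samples share n = 8 and a sum of 124

# Sums of squares of the samples x, y and z, as used in the calculations above
sums_of_squares = {"x": 2000, "y": 1928, "z": 2068}

for name, sum_sq in sums_of_squares.items():
    s2 = sample_variance_from_sums(n, total, sum_sq)
    print(f"s_{name}^2 = {s2:.4f}, s_{name} = {math.sqrt(s2):.4f}")
```

The samples share the same mean, so the differences in the sums of squares translate directly into differences in spread.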
[Figure: dot plots of the samples on an axis from 11 to 21; panel titles include $\bar{y} = 15.5$, $s_y = 0.9$ and $\bar{z} = 15.5$, $s_z = 4.6$]
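For data summarized in $k$ classes with values (or midpoints) $x_i$ and absolute frequencies $n_i$, the sample variance is
$$s^2 = \frac{\sum_{i=1}^{k} x_i^2 n_i - n\bar{x}^2}{n-1}$$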
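The grouped-data formula can be tried on the frequency table with midpoints and absolute frequencies n_i = 3, 6, 5, 4, 2 shown earlier. This Python sketch rests on two assumptions: the classes have equal width, so the midpoints are taken as 15, 25, 35, 45, 55, and the mean is computed as the frequency-weighted average of the midpoints, which is the standard companion formula rather than one shown in this section.

```python
# Class midpoints (assumed equally spaced) and absolute frequencies n_i
midpoints = [15, 25, 35, 45, 55]
freqs = [3, 6, 5, 4, 2]

n = sum(freqs)  # total number of observations

# Frequency-weighted mean of the midpoints (assumed companion formula)
mean = sum(x * ni for x, ni in zip(midpoints, freqs)) / n

# Grouped-data sample variance: (sum of x_i^2 * n_i - n * mean^2) / (n - 1)
s2 = (sum(x ** 2 * ni for x, ni in zip(midpoints, freqs)) - n * mean ** 2) / (n - 1)

print(n, mean, s2)
```

Because every observation in a class is replaced by the class midpoint, the result approximates the variance of the raw data rather than reproducing it exactly.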
Empirical rule
If the data is bell-shaped (normal), that is, symmetric and with light
tails, the following rule holds:
I
s
|
x|