Populations & Samples
In a recent survey, 10,000 DU students in Delhi
were asked whether they regularly studied for 6 hours.
350 of the students said yes. Identify the
population and the sample.
Responses of all DU students in Delhi (population)
Responses of students in survey (sample)
Parameters & Statistics
A parameter is a numerical description of a population
characteristic.
A statistic is a numerical description of a sample
characteristic.
Parameter → Population
Statistic → Sample
Parameters & Statistics
• Decide whether the numerical value describes a
population parameter or a sample statistic.
Descriptive statistics: Involves the organization, summarization, and display of data.
Inferential statistics: Involves using a sample to draw conclusions about a population.
Descriptive and Inferential Statistics
• In a study, volunteers who had less than 6 hours of sleep were
four times more likely to answer incorrectly on a science test than
were participants who had at least 8 hours of sleep. Decide
which part is the descriptive statistic and what conclusion might
be drawn using inferential statistics.
Qualitative Data: Consists of attributes, labels, or nonnumerical entries.
Quantitative Data: Consists of numerical measurements or counts.
Designing a Statistical Study
$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{24 + 13 + 19 + 26 + 11}{5} = \frac{93}{5} = 18.6$$
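The arithmetic above can be checked with a few lines of Python. This is only a minimal sketch; the function name `population_mean` is a label chosen here for illustration and does not come from the slides.

```python
# Population mean: the sum of all observations divided by their count N.
data = [24, 13, 19, 26, 11]  # the five values used on the slide


def population_mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


mu = population_mean(data)
print(sum(data), len(data), mu)  # 93 5 18.6
```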
Example with a range of data
• When you are given a range of data, you need to find midpoints.
• To find a midpoint, sum the two endpoints on the range and divide by 2.
• Example: 14≤x<18. The midpoint is (14+18)/2=16.
• The total number of students is 5,542,000.

Age of males    Number of students
14≤x<18         94,000
18≤x<20         1,551,000
20≤x<22         1,420,000
22≤x<25         1,091,000
25≤x<30         865,000
30≤x<35         521,000
Total           5,542,000
Continuing the previous example
• To compute the mean, find the midpoint of each
range and multiply it by the frequency of that range.
• The midpoints are 16, 19, 21, 23.5, 27.5, 32.5.
• The mean is
[16(94,000)+19(1,551,000)+21(1,420,000)+
23.5(1,091,000)+27.5(865,000)+32.5(521,000)]
/5,542,000 ≈ 22.94
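The midpoint-times-frequency calculation can be sketched in Python as follows. The class limits and frequencies are the ones from the age table; the names `classes` and `grouped_mean` are illustrative assumptions, not part of the slides.

```python
# Each row is (lower limit, upper limit, frequency) for one age class.
classes = [
    (14, 18, 94_000),
    (18, 20, 1_551_000),
    (20, 22, 1_420_000),
    (22, 25, 1_091_000),
    (25, 30, 865_000),
    (30, 35, 521_000),
]


def grouped_mean(rows):
    """Approximate the mean of grouped data using class midpoints."""
    total = sum(freq for _, _, freq in rows)
    weighted = sum((low + high) / 2 * freq for low, high, freq in rows)
    return weighted / total


midpoints = [(low + high) / 2 for low, high, _ in classes]
mean_age = grouped_mean(classes)
print(midpoints)           # [16.0, 19.0, 21.0, 23.5, 27.5, 32.5]
print(round(mean_age, 2))  # 22.94
```

Because every observation in a class is replaced by the class midpoint, the result is an approximation of the true mean, not the exact value.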
The median.
• Here are the scores on a set of 10-point quizzes from
MATH F432:
• 9, 6, 7, 10, 9, 4, 9, 2, 9, 10, 7, 7, 5, 6, 7
• As you can see there are 15 data points.
• Now arrange the data points in order from
smallest to largest.
• 2, 4, 5, 6, 6, 7, 7, 7, 7, 9, 9, 9, 9, 10, 10
• Calculate the location of the median:
(15+1)/2=8. The eighth piece of data is the
median. Thus the median is 7.
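The sort-then-locate procedure can be written as a short Python sketch. The helper name `median_by_location` is an assumption made for this example; the even-count branch follows the usual rule of averaging the two middle values.

```python
scores = [9, 6, 7, 10, 9, 4, 9, 2, 9, 10, 7, 7, 5, 6, 7]


def median_by_location(values):
    """Sort the data, then pick the value at position (n + 1) / 2."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the 1-based position (n + 1) / 2 is index mid.
        return ordered[mid]
    # Even count: average the two values that straddle (n + 1) / 2.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sorted(scores))
print(median_by_location(scores))  # 7
```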
BITS Pilani, Pilani Campus
The mode
• The mode is the most frequent number in a
collection of data.
• Example A: 3, 10, 8, 8, 7, 8, 10, 3, 3, 3
• The mode of the above example is 3, because 3
has a frequency of 4.
• Example B: 2, 5, 1, 5, 1, 2
• This example has no mode because 1, 2, and 5
have a frequency of 2.
• Example C: 5, 7, 9, 1, 7, 5, 0, 4
• This example has two modes 5 and 7. This is
said to be bimodal.
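The three examples can be reproduced with a frequency count. This is a sketch under one stated assumption: following Example B, a data set in which every distinct value has the same frequency is reported as having no mode (an empty list). The function name `modes` is illustrative.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] if all values tie."""
    counts = Counter(values)
    top = max(counts.values())
    winners = sorted(v for v, c in counts.items() if c == top)
    # Convention taken from Example B: when every distinct value ties
    # for the highest frequency, report that there is no mode.
    if len(counts) > 1 and len(winners) == len(counts):
        return []
    return winners


print(modes([3, 10, 8, 8, 7, 8, 10, 3, 3, 3]))  # [3]
print(modes([2, 5, 1, 5, 1, 2]))                # []
print(modes([5, 7, 9, 1, 7, 5, 0, 4]))          # [5, 7]
```

Other conventions exist; for instance, Python's `statistics.multimode` would return all three values for Example B rather than an empty list.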
Mode -- Example
• The mode is 44.
• There are more 44s than any other value.

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
• Find the mean, median, and mode of the following data:

Score   Number of students
10      3
9       10
8       9
7       8
6       10
5       2

• Mean = [3(10)+10(9)+9(8)+8(7)+10(6)+2(5)]/42 = 7.57
• Median: find the location (42+1)/2=21.5. Use the 21st and 22nd values in the data set.
• The 21st and 22nd values are 8 and 8. Thus the median is (8+8)/2=8.
• The modes are 6 and 9 since they have frequency 10.
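One way to check all three answers is to expand the frequency table back into 42 individual scores and use Python's standard `statistics` module. This is a sketch; the dictionary name `freq` is an assumption made for the example.

```python
from statistics import mean, median, multimode

# Score -> number of students, as given in the table.
freq = {10: 3, 9: 10, 8: 9, 7: 8, 6: 10, 5: 2}

# Rebuild the raw data: each score repeated as often as it occurs.
expanded = [score for score, count in freq.items() for _ in range(count)]

print(len(expanded))                # 42
print(round(mean(expanded), 2))     # 7.57
print(median(expanded))             # 8.0
print(sorted(multimode(expanded)))  # [6, 9]
```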
Measures of Dispersion
• Measures of dispersion are descriptive
statistics that describe how similar a set of
scores are to each other
– The more similar the scores are to each other, the
lower the measure of dispersion will be
– The less similar the scores are to each other, the
higher the measure of dispersion will be
– In general, the more spread out a distribution is,
the larger the measure of dispersion will be
Measures of Dispersion
• Which of the distributions of scores has the larger dispersion?
[Figure: two histograms of scores, horizontal axis 1 to 10, vertical axis 0 to 125]
• The upper distribution has more dispersion because the scores are more spread out
• That is, they are less similar to each other
Measures of Dispersion
• There are three main measures of dispersion:
– The Range
– The Quartile
– Variance / Standard Deviation
The Range
• The range is defined as the difference
between the largest score in the set of data
and the smallest score in the set of data, XL -
XS
• What is the range of the following data:
4 8 1 6 6 2 9 3 6 9
• The largest score (XL) is 9; the smallest score
(XS) is 1; the range is XL - XS = 9 - 1 = 8
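The same calculation as a tiny Python sketch, with `data_range` as an illustrative name:

```python
scores = [4, 8, 1, 6, 6, 2, 9, 3, 6, 9]


def data_range(values):
    """Range = largest score minus smallest score."""
    return max(values) - min(values)


print(max(scores), min(scores), data_range(scores))  # 9 1 8
```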
Range
• The difference between the largest and the smallest values in a set of data
• Simple to compute
• Ignores all data points except the two extremes
• Example: Range = Largest - Smallest = 48 - 35 = 13

35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
Quartiles
Measures of central tendency that divide a
group of data into four subgroups
Q1 Q2 Q3
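• Q1: $i = \frac{25}{100}(8) = 2$, so $Q_1 = \frac{109 + 114}{2} = 111.5$
• Q2: $i = \frac{50}{100}(8) = 4$, so $Q_2 = \frac{116 + 121}{2} = 118.5$
• Q3: $i = \frac{75}{100}(8) = 6$, so $Q_3 = \frac{122 + 125}{2} = 123.5$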
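The quartile arithmetic on this slide appears to follow a location rule: compute i = (P/100)·n, and when i is a whole number, average the i-th and (i+1)-th ordered values. The sketch below implements that rule as an assumption. The slide shows only the values in positions 2 to 7 of the n = 8 data set (109, 114, 116, 121, 122, 125), so the first and last entries here (106 and 129) are placeholders invented for the example; they do not affect the three quartiles.

```python
def percentile_by_location(ordered, p):
    """Percentile for 0 < p < 100 using the location i = (p / 100) * n.

    If i is a whole number, average the i-th and (i + 1)-th values
    (1-based); otherwise take the value at the next whole position.
    """
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]  # 1-based position whole + 1


# 106 and 129 are assumed end values; the middle six come from the slide.
data = [106, 109, 114, 116, 121, 122, 125, 129]

q1, q2, q3 = (percentile_by_location(data, p) for p in (25, 50, 75))
print(q1, q2, q3)  # 111.5 118.5 123.5
```

Statistical packages use several different quartile conventions, so a library routine may return slightly different values for the same data.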
Deviation from the Mean
• Data set: 5, 9, 16, 17, 18
• Mean: $\mu = \frac{\sum X}{N} = \frac{65}{5} = 13$
Deviations from the mean: -8, -4, +3, +4, +5
[Figure: the deviations -8, -4, +3, +4, +5 marked on a number line from 0 to 20]
Population Variance
• Average of the squared deviations from the
arithmetic mean
X      X - μ    (X - μ)^2
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Sum    0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
Population Standard Deviation
• Square root of the
variance
X      X - μ    (X - μ)^2
5      -8       64
9      -4       16
16     +3       9
17     +4       16
18     +5       25
Sum    0        130

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{130}{5} = 26.0$$
$$\sigma = \sqrt{\sigma^2} = \sqrt{26.0} \approx 5.1$$
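The deviation table, population variance, and population standard deviation for the data set 5, 9, 16, 17, 18 can be reproduced step by step. This is a minimal sketch; the variable names are illustrative.

```python
import math

data = [5, 9, 16, 17, 18]
N = len(data)

mu = sum(data) / N                       # population mean: 65 / 5 = 13
deviations = [x - mu for x in data]      # X - mu for each observation
squared = [d ** 2 for d in deviations]   # (X - mu)^2

variance = sum(squared) / N              # divide by N for a population
std_dev = math.sqrt(variance)

print(deviations)         # [-8.0, -4.0, 3.0, 4.0, 5.0]
print(sum(deviations))    # 0.0
print(sum(squared))       # 130.0
print(variance)           # 26.0
print(round(std_dev, 1))  # 5.1
```

The deviations always sum to zero, which is why they are squared before averaging.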
Coefficient of Variation
• Ratio of the standard deviation to the mean,
expressed as a percentage
• Measurement of relative dispersion
$$C.V. = \frac{\sigma}{\mu}(100)$$
Coefficient of Variation
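$\mu_1 = 29$, $\sigma_1 = 4.6$
$$C.V._1 = \frac{\sigma_1}{\mu_1}(100) = \frac{4.6}{29}(100) = 15.86$$
$\mu_2 = 84$, $\sigma_2 = 10$
$$C.V._2 = \frac{\sigma_2}{\mu_2}(100) = \frac{10}{84}(100) = 11.90$$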
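The two coefficients of variation can be checked in Python. The function name `coefficient_of_variation` is chosen for this sketch; the inputs are the means 29 and 84 and standard deviations 4.6 and 10 from the example.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    return std_dev / mean * 100


cv1 = coefficient_of_variation(4.6, 29)
cv2 = coefficient_of_variation(10, 84)
print(f"{cv1:.2f} {cv2:.2f}")  # 15.86 11.90
```

Although the second data set has the larger standard deviation, its relative dispersion is smaller because its mean is much larger.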
Measures of Shape
• Skewness
– Absence of symmetry
– Extreme values in one side of a distribution
• Kurtosis
– Peakedness of a distribution
• Box and Whisker Plots
– Graphic display of a distribution
– Reveals skewness
Skewness
$$S = \frac{3(\mu - M_d)}{\sigma}$$
• If S < 0, the distribution is negatively skewed
(skewed to the left).
• If S = 0, the distribution is symmetric (not skewed).
• If S > 0, the distribution is positively skewed
(skewed to the right).
Coefficient of Skewness
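$\mu_1 = 23$, $M_{d_1} = 26$, $\sigma_1 = 12.3$: $S_1 = \frac{3(\mu_1 - M_{d_1})}{\sigma_1} = \frac{3(23 - 26)}{12.3} \approx -0.73$
$\mu_2 = 26$, $M_{d_2} = 26$, $\sigma_2 = 12.3$: $S_2 = \frac{3(\mu_2 - M_{d_2})}{\sigma_2} = \frac{3(26 - 26)}{12.3} = 0$
$\mu_3 = 29$, $M_{d_3} = 26$, $\sigma_3 = 12.3$: $S_3 = \frac{3(\mu_3 - M_{d_3})}{\sigma_3} = \frac{3(29 - 26)}{12.3} \approx +0.73$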
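The three cases on this slide share a median of 26 and a standard deviation of 12.3, with means of 23, 26, and 29. A short sketch of the coefficient S = 3(mean - median) / standard deviation, with `skewness_coefficient` as an illustrative name:

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


for mean in (23, 26, 29):
    s = skewness_coefficient(mean, median=26, std_dev=12.3)
    if s < 0:
        shape = "negatively skewed"
    elif s > 0:
        shape = "positively skewed"
    else:
        shape = "symmetric"
    print(mean, round(s, 2), shape)
```

The sign of S depends only on whether the mean lies below or above the median.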
[Figure: three curves labeled Leptokurtic, Mesokurtic, and Platykurtic]
Box and Whisker Plot
• Outer Fences
– Lower outer fence = Q1 - 3.0 IQR
– Upper outer fence = Q3 + 3.0 IQR
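A small sketch of the outer-fence formulas, taking IQR to be the interquartile range Q3 - Q1. The inputs Q1 = 111.5 and Q3 = 123.5 are borrowed from the quartile example earlier in these slides; the function name and the `factor` parameter are assumptions made for illustration.

```python
def outer_fences(q1, q3, factor=3.0):
    """Return (lower, upper) fences at q1 - factor*IQR and q3 + factor*IQR."""
    iqr = q3 - q1  # interquartile range
    return q1 - factor * iqr, q3 + factor * iqr


lower, upper = outer_fences(111.5, 123.5)
print(lower, upper)  # 75.5 159.5
```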
Box and Whisker Plot
[Figure: box and whisker plot marking Minimum, Q1, Q2, Q3, and Maximum]
Skewness: Box and Whisker Plots, and
Coefficient of Skewness
[Figure: box and whisker plots for S<0, S=0, and S>0]
9.69, 13.16, 17.09, 18.12, 23.7, 24.07, 24.29, 26.43, 30.75, 31.54,
35.07, 36.99, 40.32, 42.51, 45.64, 48.22, 49.98, 50.06, 55.02, 57.00,
58.41, 61.31, 64.25, 65.24, 66.14, 67.68, 81.40, 90.80, 92.17, 92.42,
100.82, 101.94, 103.61, 106.28, 106.8, 108.69, 114.61, 120.86, 124.54,
143.27, 143.75, 149.64, 167.79, 182.5, 192.55, 193.53, 271.57, 292.61,
312.45, 352.09, 371.47, 444.68, 460.86, 563.92, 690.11, 826.54, 1529.35
The data set consists of observations on shower-flow rate (L/min) for n=129
houses in Delhi.
4.6, 12.3, 7.1, 7.0, 4.0, 9.2, 6.7, 6.9, 11.5, 5.1, 11.2, 10.5, 14.3, 8.0, 8.8, 6.4,
5.1, 5.6, 9.6, 7.5, 7.5, 6.2, 5.8, 2.3, 3.4, 10.4, 9.8, 6.6, 3.7, 6.4, 8.3, 6.5, 7.6,
9.3, 9.2, 7.3, 5.0, 6.3, 13.8, 6.2, 5.4, 4.8, 7.5, 6.0, 6.9, 10.8, 7.5, 6.6, 5.0, 3.3,
7.6, 3.9, 11.9, 2.2, 15.0, 7.2, 6.1, 15.3, 18.9, 7.2, 5.4, 5.5, 4.3, 9.0, 12.7, 11.3,
7.4, 5.0, 3.5, 8.2, 8.4, 7.3, 10.3, 11.9, 6.0, 5.6, 9.5, 9.3, 10.4, 9.7, 5.1, 6.7,
10.2, 6.2, 8.4, 7.0, 4.8, 5.6, 10.5, 14.6, 10.8, 15.5, 7.5, 6.4, 3.4, 5.5, 6.6, 5.9,
15.0, 9.6, 7.8, 7.0, 6.9, 4.1, 3.6, 11.9, 3.7, 5.7, 6.8, 11.3, 9.3, 9.6, 10.4, 9.3,
6.9, 9.8, 9.1, 10.6, 4.5, 6.2, 8.3, 3.2, 4.9, 5.0, 6.0, 8.2, 6.3, 3.8, 6.0.
$$\frac{n(T)}{n(S)} = \frac{46}{100} = 0.46$$
n = 100000
n(Sample Space S) = 100000
n(H) = 49868, n(T) = 50132
$$\frac{n(H)}{n(S)} = \frac{49868}{100000} = 0.49868$$
$$\frac{n(T)}{n(S)} = \frac{50132}{100000} = 0.50132$$
n = 10000000000
n(Sample Space S) = 10000000000
$$\frac{n(H)}{n(S)} \approx 0.5, \qquad \frac{n(T)}{n(S)} \approx 0.5$$
Uniform Distribution
$n \to \infty$
$n(\text{Sample Space } S) \to \infty$
$n(H) \to \infty$, $n(T) \to \infty$
$$\frac{n(H)}{n(S)} \to \lim_{n \to \infty} \frac{n(H)}{n(S)}, \qquad \frac{n(T)}{n(S)} \to \lim_{n \to \infty} \frac{n(T)}{n(S)}$$
Theoretical Probability
$$P(H) = \lim_{n \to \infty} \frac{n(H)}{n(S)} = 0.5$$
$$P(T) = \lim_{n \to \infty} \frac{n(T)}{n(S)} = 0.5$$
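The idea that the relative frequencies n(H)/n(S) and n(T)/n(S) settle near 0.5 as the number of tosses grows can be explored with a simulation. This is only a sketch: it uses Python's pseudo-random generator as a stand-in for a fair coin, the seed is an arbitrary choice for repeatability, and the counts it produces will not match the ones quoted on the slides.

```python
import random


def relative_frequencies(n_tosses, seed=0):
    """Simulate n_tosses fair coin tosses; return (n(H)/n(S), n(T)/n(S))."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    tails = n_tosses - heads
    return heads / n_tosses, tails / n_tosses


for n in (100, 10_000, 100_000):
    freq_h, freq_t = relative_frequencies(n)
    print(n, freq_h, freq_t)
```

A finite simulation can only suggest the limit; the value 0.5 itself is the theoretical probability for a fair coin.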
Mutually Exclusive Events
$H \cap T = \emptyset$
$n(H \cap T) = 0$
$$P(H \cap T) = \frac{n(H \cap T)}{n(S)} = 0$$
$$P(H \cup T) = P(H) + P(T) = 1$$
Axioms of Probability
• 1. $0 \le P(E) \le 1$
• 2. $P(S) = 1$
$P(H) = 0.5$
$P(T) = 0.5$
$P(H) = P(T)$
Equally Likely
Uniform Distribution
$$P(E_i) = \frac{1}{N}$$
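$E \cap E^C = \emptyset$
$P(E^C) = 1 - P(E)$
$E \cup E^C = S$
[Figure: Venn diagram of the sample space S divided into E and E^C]
$E \subseteq F \Rightarrow P(E) \le P(F)$
[Figure: Venn diagram of E contained in F within S]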
Probability as a measure of belief
• Belief in a proposition f can be measured by a
number between 0 (impossible) and 1 (certain).
• That f has a probability between 0 and 1 does not
mean it is true to some degree; it means
that we are ignorant of its truth value.
Probability as a measurement of
uncertainty
• Uncertainty : The lack of certainty, a state of limited
knowledge where it is impossible to exactly describe
the existing state, a future outcome, or more than
one possible outcome. ...
• Quantification of uncertainty in terms of probability.
• Uncertainty arises in partially observable and/or
stochastic environments, as well as due to ignorance,
indolence, or both.
Exp: Rolling two dice
Sample Space
Equally Likely
Non Uniform Distribution
Observations
• (i) The outcomes (1, 1), (2, 2), (3, 3), (4, 4), (5,
5) and (6, 6) are called doublets.
A = {(3,6), (4,5), (4,6), (5,4), (5,5), (5,6), (6,3), (6,4), (6,5), (6,6)}
$$P(A) = \frac{10}{36} = \frac{5}{18}$$
A^C = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (5,1), (5,2), (5,3), (6,1), (6,2)}
$$P(A^C) = \frac{26}{36} = \frac{13}{18}$$
B = {(6,1), (6,2), (6,3), (6,4), (6,5), (6,6)}
$$P(B) = \frac{6}{36} = \frac{1}{6}$$
B^C = {(1,1), (1,2), (1,3), (1,4), (1,5), (1,6), (2,1), (2,2), (2,3), (2,4), (2,5), (2,6), (3,1), (3,2), (3,3), (3,4), (3,5), (3,6), (4,1), (4,2), (4,3), (4,4), (4,5), (4,6), (5,1), (5,2), (5,3), (5,4), (5,5), (5,6)}
$$P(B^C) = \frac{30}{36} = \frac{5}{6}$$
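A ∩ B = {(6,3), (6,4), (6,5), (6,6)}
$$P(A \cap B) = \frac{4}{36} = \frac{1}{9}$$
Outcomes in A but not in B: (3,6), (4,5), (4,6), (5,4), (5,5), (5,6)
Outcomes in B but not in A: (6,1), (6,2)
Among the outcomes in neither A nor B: (3,1), (3,2), (3,3), (3,4), (3,5), (4,1), (4,2), (4,3), (4,4), (5,1), (5,2), (5,3)
$$P(D) = P(B \mid A) = \frac{4}{10} = \frac{2}{5}$$
$$P(D^C) = P(B^C \mid A) = \frac{6}{10} = \frac{3}{5}$$
$$P(C) = P(A \mid B) = \frac{4}{6} = \frac{2}{3}$$
$$P(C^C) = P(A^C \mid B) = \frac{2}{6} = \frac{1}{3}$$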
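$P(A) = \frac{5}{18}$, $P(B) = \frac{1}{6}$, $P(A \cap B) = \frac{1}{9}$
$P(A^C) = \frac{13}{18}$, $P(B^C) = \frac{5}{6}$
$P(A \mid B) = \frac{2}{3}$, $P(A^C \mid B) = \frac{1}{3}$
$P(B \mid A) = \frac{2}{5}$, $P(B^C \mid A) = \frac{3}{5}$
$P(A) \cdot P(B \mid A) = \frac{1}{9}$, $P(B) \cdot P(A \mid B) = \frac{1}{9}$
$$P(A \cap B) = P(A) \cdot P(B \mid A) = P(B) \cdot P(A \mid B)$$
Conditional Probability
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \qquad P(B \mid A) = \frac{P(A \cap B)}{P(A)}$$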
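The two-dice probabilities can be verified by enumerating the 36 equally likely outcomes with exact fractions. This sketch makes one assumption: the outcomes listed for A are exactly those whose sum is at least 9, and those listed for B are exactly those whose first die shows 6, so the sets are built from those descriptions. The helper names `prob` and `prob_given` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs from two six-sided dice.
S = set(product(range(1, 7), repeat=2))

# Assumed descriptions that reproduce the outcomes listed for A and B.
A = {(i, j) for (i, j) in S if i + j >= 9}
B = {(i, j) for (i, j) in S if i == 6}


def prob(event):
    """P(event) for equally likely outcomes."""
    return Fraction(len(event), len(S))


def prob_given(event, given):
    """P(event | given) = n(event and given) / n(given)."""
    return Fraction(len(event & given), len(given))


print(prob(A), prob(B), prob(A & B))       # 5/18 1/6 1/9
print(prob_given(B, A), prob_given(A, B))  # 2/5 2/3
# Both factorizations of P(A and B) agree.
print(prob(A) * prob_given(B, A) == prob(B) * prob_given(A, B) == prob(A & B))
```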
Tree Diagram
P(A) = 5/18
    P(B | A) = 2/5
    P(B^C | A) = 3/5
P(B) = 1/6
    P(A | B) = 2/3
    P(A^C | B) = 1/3
Tree Diagram
Multiplication Rule
$$P\left(\bigcap_{i=1}^{n} E_i\right) = P(E_1)\,P(E_2 \mid E_1)\,P(E_3 \mid E_1 \cap E_2) \cdots P\left(E_n \mid \bigcap_{i=1}^{n-1} E_i\right)$$
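The multiplication rule can be checked on the two-dice sample space with three events. The choice of events is an assumption made for this sketch: E1 is "sum at least 9", E2 is "first die shows 6", and E3 is "doublet" as defined in the Observations slide. The function name `chain_probability` is illustrative.

```python
from fractions import Fraction
from itertools import product

S = set(product(range(1, 7), repeat=2))

events = [
    {o for o in S if o[0] + o[1] >= 9},  # E1: sum at least 9 (assumed)
    {o for o in S if o[0] == 6},         # E2: first die shows 6 (assumed)
    {o for o in S if o[0] == o[1]},      # E3: doublet
]


def chain_probability(events, space):
    """Multiply P(E1) * P(E2 | E1) * P(E3 | E1 and E2) * ...

    Each factor is computed by counting inside the intersection of the
    events seen so far, which must stay non-empty for the conditional
    probabilities to be defined.
    """
    result = Fraction(1)
    seen = set(space)
    for event in events:
        result *= Fraction(len(seen & event), len(seen))
        seen &= event
    return result


direct = Fraction(len(set.intersection(*events)), len(S))
print(chain_probability(events, S), direct)  # 1/36 1/36
```

Here the factors are 10/36, 4/10, and 1/4, whose product equals the directly counted probability of the single outcome (6, 6).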
Thanks