
STATISTICS

--------------------------------------------
REVIEW NOTES/ EXERCISES

Compiled by:
Melissa E. Agulto, Ph.D.
Professor, Central Luzon State University
2015 Review Class
(a)
STATISTICS: REVIEW NOTES and
EXERCISES
I. BASIC CONSIDERATIONS
II. ELEMENTS OF SAMPLING AND
DESCRIPTIVE STATISTICS
III. SETS AND PROBABILITY
IV. RANDOM VARIABLES
V. SAMPLING DISTRIBUTIONS
VI. ESTIMATION
VII. TESTS OF HYPOTHESES
VIII. REGRESSION AND CORRELATION
I. BASIC CONSIDERATIONS
Statistics, Defined

Statistics deals with:


a. Collection of data
b. Presentation of data
c. Analysis and interpretation of data
d. A set of numerical information
e. All of the above
Fields/Phases of Statistics

Identify whether the following are within the scope of


Descriptive or Inferential Statistics:

1. Collect data, e.g., Survey


2. Present data, e.g., Tables and graphs
3. Estimation, e.g., Estimate the population mean weight
using the sample mean weight
4. Hypothesis testing, e.g., Test the claim that the population
mean weight is 120 pounds
5. Characterize data, e.g., Sample mean
Statistical Inference

The main objective of statistical inference is to:


a. Give the descriptive measures of the parent population
b. Make inferences about the population based
on population values
c. Give inferences about the population based on sample
values of population
d. Give the descriptive measures of the sample data
Exercises: Identify each of the following examples as
attribute (qualitative) or numerical (quantitative)
variables.

1. The residence hall for each student in a statistics


class
2. The amount of gasoline pumped by the next 10
customers at the local gas station
3. The amount of rainfall collected from a storm in each
of the rainfall stations maintained in a watershed
4. The color of the t-shirt worn by each of 20 students.
5. The length of time to complete a statistics homework
assignment
6. The province in which each truck is registered when
stopped and inspected at a weigh station.
4 Basic Levels of Measurement Scales
Qualitative Scales:
1. Nominal No rank or order to the categories
–Gender
2. Ordinal All the characteristics of a nominal scale,
plus there is a ranking among the categories:
–First place, Second place, Third place
Quantitative Scales:
3. Interval
• Designates an equal-interval ordering
• No true zero point
–Celsius temperature scale: 0 °C does not mean no temperature
–30 °C is not twice as warm as 15 °C
4. Ratio
• All the above plus, a true zero point
– Wealth: PhP 0 means no money
– PhP100 is twice as much as PhP50
Exercises
CLASSIFY THE FOLLOWING AS TO QUALITATIVE OR QUANTITATIVE
MEASUREMENT. THEN STATE THE LEVEL OF MEASUREMENT.

• Eye Color (blue, brown, green, hazel)


• Rating scale (poor, good, excellent)
• GPA (2.52, 1.52)
• Temperature
• Student’s level of standing (freshman, sophomore,
junior , senior)
• Nationality
• Sound level (40 decibels, 80 db)
• Score in a quiz
• Academic rank of faculty in a university
Instructor, Asst Prof, Assoc Prof, Professor
Quantitative Data
Discrete Data – those that are
usually associated with count
values
Continuous Data – those that are
usually associated with
measurement values
Classify the following data as either discrete or continuous:

1. Number of AE board exam reviewees


2. Number of times you have been absent from the
review class
3. Temperature (33.7 degrees C)
4. Amount of rainfall
5. Travel speed of a vehicle
6. Rainfall duration
7. Number of thunderstorm occurrences in
a month
8. Time devoted for reading review materials in a day
Data Gathering

Ways of Data Gathering:

Objective Method – done by getting


measurements, making direct observations
Subjective Method – done by asking the
respondent for the data/information required
Use of Existing Records – done by utilizing the
data previously gathered by certain
persons/agencies
Data Gathering
Types of Data:

Primary Data – data gathered by the user directly


from the units in the universe
Secondary Data – data gathered not directly from
the units in the universe
Raw Data – the data as gathered/collected
Array – arrangement of raw numerical data in
ascending or descending order of
magnitude
Grouped Data – arrangement/organization of data
in a frequency distribution form
Types of Graphs
• Common graphical representations of tabulations of
data: (a) _________ (b) ___________

(a) (b)
(c ) _________
(d) __________
(e) Stem-and-Leaf Plot

The following data represents the ages of 30


students in a statistics class. Display the data in
a stem-and-leaf plot

.
Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Stem-and-Leaf Plot

Ages of Students
Key: 1|8 = 18
1 | 888999
2 | 0011124799
3 | 002234789
4 | 469
5 | 14
Most of the values lie between 20 and 39.
This graph allows us to see the shape of the data as well as the
actual values.
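The plot above can be reproduced with a short script. This is a minimal sketch, assuming two-digit whole-number data; the function name stem_and_leaf is illustrative and not part of the notes.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each tens digit (stem) to a string of sorted ones digits (leaves)."""
    stems = defaultdict(list)
    for v in sorted(values):          # sorting first keeps the leaves in order
        stems[v // 10].append(v % 10)
    return {stem: "".join(str(leaf) for leaf in leaves)
            for stem, leaves in sorted(stems.items())}

ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        37, 38, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]

plot = stem_and_leaf(ages)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}")
```

The same function can be applied to the temperature exercise that follows.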
Exercise. Try your own Stem and Leaf Plot with the
following temperatures (deg F)

77 80 82 68
57 50 62 69
67 70 65 76
87 82 83 79

Temperatures
Tens Ones
5 07
6 25789
7 0679
8 02237
(f) _______

[Figure: “Ages of Students”, plotted along a horizontal axis from 15 to 57 in steps of 3.]

From this graph, we can conclude that most of the values


lie between 18 and 32.
A ______________is a table that shows classes or intervals of
data with a count of the number in each class. The frequency f of a
class is the number of data points in the class.

Class Frequency, f
1 – 4 4
5 – 8 5
9 – 12 3
13 – 16 4
17 – 20 2
(The Class column shows the lower and upper class ________; the f column shows the frequencies.)
The _____________ is the distance between lower (or upper)
limits of consecutive classes.

Class Frequency, f Difference between consecutive lower limits
1 – 4 4
5 – 8 5 5 – 1 = 4
9 – 12 3 9 – 5 = 4
13 – 16 4 13 – 9 = 4
17 – 20 2 17 – 13 = 4

The class _______ is 4.

The __________ is the difference between the maximum and


minimum data entries.
Constructing a Frequency Distribution
The following data represents the ages of 30 students in a statistics
class. Construct a frequency distribution that has five classes.

Ages of Students
18 20 21 27 29 20
19 30 32 19 34 19
24 29 18 37 38 22
30 39 32 44 33 46
54 49 18 51 21 21
Continued.
Constructing a Frequency Distribution

1. The number of classes (5) is stated in the problem.

2. The minimum data entry is 18 and maximum entry is 54, so the
range is 36. Divide the range by the number of classes to find
the class width.

Class width = 36 / 5 = 7.2. Round up to 8.

Continued.
Constructing a Frequency Distribution

3. The minimum data entry of 18 may be used for the lower limit of
the first class. To find the lower class limits of the remaining
classes, add the width (8) to each lower limit.

The lower class limits are 18, 26, 34, 42, and 50.
The upper class limits are 25, 33, 41, 49, and 57.

4. Make a tally mark for each data entry in the appropriate class.

5. The number of tally marks for a class is the frequency for that
class.
Continued.
Constructing a Frequency Distribution
Ages of Students
Class (ages) Frequency, f (number of students)
18 – 25 13
26 – 33 8
34 – 41 4
42 – 49 3
50 – 57 2
Σf = 30
Check that the sum equals the number in the sample.
Midpoint

Find the midpoints for the “Ages of Students” frequency


distribution. Another term for midpoint is ____________.
Ages of Students
Class Frequency, f Midpoint
18 – 25 13
26 – 33 8
34 – 41 4
42 – 49 3
50 – 57 2
Σf = 30
Midpoint

Find the midpoints for the “Ages of Students” frequency


distribution.
Ages of Students
Class Frequency, f Midpoint
18 – 25 13 21.5
26 – 33 8 29.5
34 – 41 4 37.5
42 – 49 3 45.5
50 – 57 2 53.5
Σf = 30
Midpoint of the first class: 18 + 25 = 43, and 43 ÷ 2 = 21.5.
Relative Frequency

Find the relative frequencies for the “Ages of Students” frequency


distribution.

Class Frequency, f Relative Frequency
18 – 25 13
26 – 33 8
34 – 41 4
42 – 49 3
50 – 57 2
Σf = 30
Relative Frequency

Find the relative frequencies for the “Ages of Students” frequency


distribution.

Class Frequency, f Relative Frequency
18 – 25 13 0.433
26 – 33 8 0.267
34 – 41 4 0.133
42 – 49 3 0.1
50 – 57 2 0.067
Σf = 30 Σ(f/n) = 1
Portion of students in the first class: f/n = 13/30 ≈ 0.433.
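The construction steps for the “Ages of Students” table can be sketched in code. This is an illustrative sketch, not the notes' own procedure: the variable names are assumptions, and the running total of frequencies is included as an extra column.

```python
import math

ages = [18, 20, 21, 27, 29, 20, 19, 30, 32, 19, 34, 19, 24, 29, 18,
        37, 38, 22, 30, 39, 32, 44, 33, 46, 54, 49, 18, 51, 21, 21]
num_classes = 5

# Class width: range / number of classes, rounded up to a whole number.
# (If the quotient were already whole, one would usually move to the next
# integer; that case does not arise here because 36 / 5 = 7.2.)
data_range = max(ages) - min(ages)
width = math.ceil(data_range / num_classes)

table = []
running_total = 0
for i in range(num_classes):
    lower = min(ages) + i * width
    upper = lower + width - 1            # integer data: limits are 1 apart
    f = sum(1 for a in ages if lower <= a <= upper)
    running_total += f
    table.append({
        "class": (lower, upper),
        "f": f,
        "midpoint": (lower + upper) / 2,
        "relative": f / len(ages),
        "cumulative": running_total,      # running total of frequencies
    })

for row in table:
    print(row)
```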
Cumulative Frequency
The cumulative frequency of a class is the sum of the frequency
for that class and all the previous classes.

Ages of Students
Class Frequency, f Cumulative Frequency
18 – 25 13
26 – 33 + 8
34 – 41 + 4
42 – 49 + 3
50 – 57 + 2
Σf = 30
Cumulative Frequency
The cumulative frequency of a class is the sum of the frequency
for that class and all the previous classes.

Ages of Students
Class Frequency, f Cumulative Frequency
18 – 25 13 13
26 – 33 + 8 21
34 – 41 + 4 25
42 – 49 + 3 28
50 – 57 + 2 30 (total number of students)
Σf = 30
Drawn below is a _______________ for the “Ages of
Students” frequency distribution. The class boundaries are
used.

[Figure: “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5, with a broken axis. Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]
Frequency Histogram

Draw a frequency histogram for the “Ages of Students”


frequency distribution. Use the class boundaries.

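[Figure: frequency histogram, “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5, 25.5, 33.5, 41.5, 49.5, 57.5, with a broken axis. Vertical axis: frequency f. Bar heights: 13, 8, 4, 3, 2.]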
A_______________ is a line graph that emphasizes the
continuous change in frequencies.

[Figure: “Ages of Students”. Horizontal axis: Age (in years), class midpoints 13.5 to 61.5 in steps of 8, with a broken axis. Vertical axis: frequency f, 0 to 14. The line is extended to the x-axis.]
Frequency Polygon

A frequency polygon is a line graph that emphasizes the


continuous change in frequencies.
[Figure: frequency polygon, “Ages of Students”. Horizontal axis: Age (in years), class midpoints 13.5 to 61.5 in steps of 8, with a broken axis. Vertical axis: frequency f, 0 to 14. The line is extended to the x-axis.]
Drawn below is a______________ . It has the same shape
and the same horizontal scale as the corresponding
frequency histogram.

[Figure: “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5 to 57.5. Vertical axis: relative frequency (portion of students). Bar heights: 0.433, 0.267, 0.133, 0.1, 0.067.]
Relative Frequency Histogram
A relative frequency histogram has the same shape and
the same horizontal scale as the corresponding
frequency histogram.

[Figure: relative frequency histogram, “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5 to 57.5. Vertical axis: relative frequency (portion of students). Bar heights: 0.433, 0.267, 0.133, 0.1, 0.067.]
A_______________is a line graph that displays the
cumulative frequency of each class at its upper class
boundary.

[Figure: “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5 to 57.5. Vertical axis: cumulative frequency, 0 to 30. The graph ends at the upper boundary of the last class.]
Cumulative Frequency Graph
A cumulative frequency graph or ogive, is a line graph
that displays the cumulative frequency of each class at its
upper class boundary.

[Figure: ogive, “Ages of Students”. Horizontal axis: Age (in years), class boundaries 17.5 to 57.5. Vertical axis: cumulative frequency, 0 to 30. The graph ends at the upper boundary of the last class.]
II. ELEMENTS OF SAMPLING AND
DESCRIPTIVE STATISTICS
Sampling Methods
Non-probability Sampling – the elements of the
universe or population have no known
chance of being taken in the sample
Probability Sampling –assigns a known probability
of selection for all possible samples; allows
for the computation of sampling error, or the
error in inference inherent to the fact that
what was observed was only a sample
Classification of Sampling Methods

Sampling
Methods

Probability Non-
Samples probability

Systematic Stratified Convenience Snowball

Simple
Cluster Judgment Quota
Random
Sampling Procedures

Probability Sampling Procedures:


Simple Random Sampling – the elements of the universe
or population have equal chance of being included
in the sample; applicable when the universe is
believed to be homogeneous
Stratified Random sampling – the elements of the
universe/population are first grouped into strata
and simple random samples are taken from each
stratum; applicable under the following situations:
information is required for certain subdivisions of
the population; the population is extremely
heterogeneous; the problem of sampling may
differ in different parts of the population
Sampling Procedures

Probability Sampling Procedures:


Cluster Sampling – the elements are grouped into
clusters, for example, geographical location, and
a simple random sample of clusters is selected
and all the elements of the selected clusters are
included in the sample
Systematic Sampling – adopts a skipping pattern in the
selection of the sample units; the only sampling
scheme that allows sample selection without a
sampling frame
Sampling Procedures

Non-Probability Sampling Procedures:


Purposive Sampling - the process whereby the
researcher selects a sample based on experience
or knowledge of the group to be sampled …called
“judgment” sampling; relies upon belief
that participants fit characteristics.

Quota Sampling – the process whereby a researcher


gathers data from individuals possessing
identified characteristics; the process continues
until the researcher has the required quota;
emphasizes representation of specific characteristics.
Sampling Procedures

Non-Probability Sampling Procedures:


Convenience Sampling – relies upon convenience and
access

Snowball sampling - selecting a few individuals who


can identify other individuals who can identify
still other individuals who might be good
participants for a study; relies upon respondent
referrals of others with like characteristics.
Basic Terms
Exercise: A college professor is interested in learning
about the average age of his students in his hydrology
class. Identify the basic terms in this situation.

The population is __________.
A sample is __________.
The variable is __________.
One data value would be the __________.
The parameter of interest is __________.
The statistic is the __________.
Exercise: A college professor is interested in learning
about the average age of his students in his hydrology
class. Identify the basic terms in this situation.

The population is the age of all his students in his hydrology class
A sample is any subset of that population. For example, we might
select 10 students and determine their age.
The variable is the “age” of each student in the class.
One data value would be the age of a specific student.
The parameter of interest is the “average” age of all students in his
hydrology class.
The statistic is the “average” age for all students in the sample.
Some Descriptive Statistics
MEASURES OF CENTRAL TENDENCY – values
computed from the data around which the
observations tend to center or cluster
Arithmetic Mean – the arithmetic average of all the values
Ungrouped data

X̄ = ∑ Xi / n
where n = sample size
Grouped data
X̄ = ∑ fi Xi / ∑ fi

where fi = frequency of the ith class


Xi = class mark for the ith class
Mean
Exercise:
• The following are the ages of all seven employees of
a small company:

53 32 61 57 39 44 57
Calculate the population mean.
Mean
Exercise:
• The following are the ages of all seven employees of
a small company:

53 32 61 57 39 44 57
Calculate the population mean.

μ = Σx / N = 343 / 7 = 49 years (add the ages and divide by 7)

The mean age of the employees is 49 years.


Arithmetic Mean of Group Data
Class midpoint

Class x f
18 – 25 21.5 13
26 – 33 29.5 8
34 – 41 37.5 4
42 – 49 45.5 3
50 – 57 53.5 2
n = 30

The mean age of the students is _____ years.


Arithmetic Mean of Group Data
Class Class midpoint, x f (x · f)
18 – 25 21.5 13 279.5
26 – 33 29.5 8 236.0
34 – 41 37.5 4 150.0
42 – 49 45.5 3 136.5
50 – 57 53.5 2 107.0
n = 30 Σ(x · f) = 909.0

x̅ = Σ(x · f) / n = 909 / 30 = 30.3
The mean age of the students is 30.3 years.
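As a quick check of the grouped-mean arithmetic, the following sketch multiplies each class midpoint by its frequency and divides by n. The list names are illustrative.

```python
midpoints = [21.5, 29.5, 37.5, 45.5, 53.5]   # class marks x
freqs = [13, 8, 4, 3, 2]                      # class frequencies f

n = sum(freqs)
sum_xf = sum(x * f for x, f in zip(midpoints, freqs))
grouped_mean = sum_xf / n

print(n, sum_xf, grouped_mean)
```

Because every observation in a class is represented by its midpoint, the result is an approximation of the mean of the raw ages.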
Weighted Mean
Exercise:

Grades in a statistics class are weighted as follows:


Tests are worth 50% of the grade, homework is worth 30% of
the grade and the final is worth 20% of the grade. A student
receives a total of 80 points on tests, 100 points on
homework, and 85 points on his final. What is his current
grade?
Weighted Mean
Begin by organizing the data in a table.

Source Score, x Weight, w xw
Tests 80 0.50 40
Homework 100 0.30 30
Final 85 0.20 17
Σw = 1.00 Σ(xw) = 87

x̅ = Σ(xw) / Σw = 87 / 1.00 = 87

The student’s current grade is 87%.
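The weighted mean can be verified the same way. A minimal sketch, with the scores and weights taken from the table above; the dictionary layout is an assumption made for readability.

```python
# score and weight for each component of the grade
components = {
    "Tests": (80, 0.50),
    "Homework": (100, 0.30),
    "Final": (85, 0.20),
}

sum_xw = sum(x * w for x, w in components.values())
sum_w = sum(w for _, w in components.values())
weighted_mean = sum_xw / sum_w

print(weighted_mean)
```

Dividing by the sum of the weights keeps the formula correct even when the weights do not add up to 1.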


Mode

Find the mode of the ages of the seven employees.

53 32 61 57 39 44 57

__________ is a data entry that is far removed from the other


entries in the data set.
Mode
The mode of a data set is the data entry or category that occurs
with the greatest frequency. If no entry is repeated, the data set has
no mode. If two entries occur with the same greatest frequency,
each entry is a mode and the data set is called bimodal.

Exercise:
Find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs the most times.

An outlier is a data entry that is far removed from the other entries
in the data set.
Exercise : Find Mode

Slope Angle Midpoint Frequency


(°) (x) (f)
0-4 2 6
5-9 7 12
10-14 12 7
15-19 17 5
20-24 22 0
Total n = 30
Exercise : Find Mode

Mo = LMo + [d1/(d1+d2)] (w)


Mo = 4.5 + [6/(6+5)] (5) ≈ 7.23

Slope Angle Midpoint Frequency


(°) (x) (f)
0-4 2 6
5-9 7 12
10-14 12 7
15-19 17 5
20-24 22 0
Total n = 30
Mode – the value that occurs most frequently
(absolute mode). There could be several
modes; a relative mode is a value that occurs
more frequently than neighboring values even if
it is not an absolute mode

Mo = LMo + [d1/(d1+d2)] (w)

where: LMo = lower limit of the modal class


d1 = the difference, sign neglected, between
the frequency of the modal class and
the frequency of the preceding class
d2 = the difference, sign neglected,
between the frequency of the modal
class and the frequency of the
following class
w = width of the modal class
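The grouped-mode formula can be sketched as a small function and applied to the slope-angle exercise. The function and parameter names are illustrative; the lower limit is taken as the class boundary 4.5, as in the worked substitution.

```python
def grouped_mode(lower_limit, f_modal, f_prev, f_next, width):
    """Mo = LMo + [d1 / (d1 + d2)] * w for the modal class of grouped data."""
    d1 = abs(f_modal - f_prev)   # difference from the preceding class
    d2 = abs(f_modal - f_next)   # difference from the following class
    return lower_limit + d1 / (d1 + d2) * width

# Slope-angle exercise: the modal class is 5-9 (f = 12), with neighbouring
# frequencies 6 and 7, lower boundary 4.5 and width 5.
mode = grouped_mode(4.5, 12, 6, 7, 5)
print(round(mode, 2))
```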
Median
Exercise:

Calculate the median age of the seven employees.

53 32 61 57 39 44 57

The median age of the employees is _____ years.


Median
Exercise:

Calculate the median age of the seven employees.

53 32 61 57 39 44 57

To find the median, sort the data.


32 39 44 53 57 57 61

The median age of the employees is 53 years.


Example: Find Median

Age in years Number of births Cumulative number of


births
14.5-19.5 677 677
19.5-24.5 1908 2585
24.5-29.5 1737 4322
29.5-34.5 1040 5362
34.5-39.5 294 5656
39.5-44.5 91 5747
44.5-49.5 16 5763
All ages 5763 -
Example: Find Median

Md = LMd + [((N+1)/2 – S) / fMd] w = 24.5 + [(2882 – 2585) / 1737] (5) ≈ 25.35

Age in years Number of births Cumulative number of


births
14.5-19.5 677 677
19.5-24.5 1908 2585
24.5-29.5 1737 4322
29.5-34.5 1040 5362
34.5-39.5 294 5656
39.5-44.5 91 5747
44.5-49.5 16 5763
All ages 5763 -
Median – Half of the observations should have a
value less than the median and half should
have a value greater than the median

For grouped data,

Md = LMd + [((N+1)/2 – S) / fMd] w

Where:
LMd = lower limit of the median class
N = number of observations in the sample
S = sum of the frequencies in all classes
preceding the median class
fMd = frequency of the median class
w = width of the median class
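A sketch of the grouped-median formula, applied to the births-by-age table. The function name and the (lower, upper) tuple layout are assumptions; the formula follows the (N+1)/2 convention used in these notes.

```python
def grouped_median(classes, freqs):
    """Md = LMd + [((N + 1)/2 - S) / fMd] * w for grouped data.

    classes: list of (lower_boundary, upper_boundary) tuples
    freqs:   frequency of each class, in the same order
    """
    n = sum(freqs)
    target = (n + 1) / 2
    s = 0                                  # frequencies before the current class
    for (lower, upper), f in zip(classes, freqs):
        if s + f >= target:                # this is the median class
            return lower + (target - s) / f * (upper - lower)
        s += f
    raise ValueError("empty frequency table")

classes = [(14.5, 19.5), (19.5, 24.5), (24.5, 29.5), (29.5, 34.5),
           (34.5, 39.5), (39.5, 44.5), (44.5, 49.5)]
births = [677, 1908, 1737, 1040, 294, 91, 16]

median_age = grouped_median(classes, births)
print(round(median_age, 2))
```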
Finding Quartiles

The quiz scores for 15 students are listed below. Find the first,
second and third quartiles of the scores.

28 43 48 51 43 30 55 44 48 33 45 37 37 42 38

About one fourth of the students score _____ or less;
about one half score _____ or less; and
about three fourths score _____ or less.
Finding Quartiles

Exercise:
The quiz scores for 15 students are listed below. Find the first,
second and third quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38

Order the data.

28 30 33 37 37 38 42 43 43 44 45 48 48 51 55

Lower half: 28 30 33 37 37 38 42, so Q1 = 37
Q2 (median) = 43
Upper half: 43 44 45 48 48 51 55, so Q3 = 48

About one fourth of the students score 37 or less; about one half score
43 or less; and about three fourths score 48 or less.
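The quartiles above come from the median-of-halves method, which can be sketched as follows. Other quartile conventions exist and may give slightly different values, so the function below is an illustration of this one method only; its name is an assumption.

```python
from statistics import median

def quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves method (middle value excluded when n is odd)."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower_half = data[:half]
    upper_half = data[half + (n % 2):]     # skip the middle entry if n is odd
    return median(lower_half), median(data), median(upper_half)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
q1, q2, q3 = quartiles(scores)
print(q1, q2, q3)
```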
MEASURES OF DISPERSION – values used to
describe the extent of dispersion or variability
of data
Range – the difference between the largest and the
smallest measurement in the data set
R = Xmax - Xmin

Mean Deviation, MD – the arithmetic mean of the absolute


deviations from the mean

MD = ∑ │Xi − X̄│ / n
Exercise : Consider the following set of test scores for Rob in Statistics

40, 58, 60, 82

Range = 82 – 40
= 42

Mean = (40 + 58 + 60 + 82)/4
= 240/4 = 60

X │X – X̄│
40 20
58 2
60 0
82 22
Total 44

Mean Deviation = 44/4
= 11
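The range and mean deviation for Rob's scores can be checked with a few lines. A minimal sketch; the variable names are illustrative.

```python
scores = [40, 58, 60, 82]

data_range = max(scores) - min(scores)
mean = sum(scores) / len(scores)
abs_deviations = [abs(x - mean) for x in scores]      # |X - mean| for each score
mean_deviation = sum(abs_deviations) / len(scores)

print(data_range, mean, abs_deviations, mean_deviation)
```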
Population Variance and Standard Deviation

Exercise: Find the population variance and standard deviation for


the following set of data. The population mean is 61.

x
56
58
61
63
67
Σx = 305
Finding the Population Variance and
Standard Deviation
Find the population variance and standard deviation for the
following set of data. The population mean is 61.

x Deviation, x – μ Squared deviation, (x – μ)² (never negative)
56 –5 25
58 –3 9
61 0 0
63 2 4
67 6 36
Σx = 305 Σ(x – μ) = 0 Σ(x – μ)² = 74

SS = Σ(x – μ)² = 74
σ² = Σ(x – μ)² / N = 74 / 5 = 14.8
σ = √[Σ(x – μ)² / N] = √14.8 ≈ 3.85
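The population variance and standard deviation can be recomputed step by step. A minimal sketch with illustrative names; note the divisor is N, not N – 1, because the data are treated as the whole population.

```python
import math

data = [56, 58, 61, 63, 67]
N = len(data)

mu = sum(data) / N                             # population mean
deviations = [x - mu for x in data]            # x - mu
ss = sum(d ** 2 for d in deviations)           # sum of squared deviations
variance = ss / N                              # population variance
sigma = math.sqrt(variance)                    # population standard deviation

print(mu, ss, variance, round(sigma, 2))
```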
Coefficient of Variation, CV
CV = (S / X̄) × 100

Exercise: Calculate the coefficient of variation for the following
sample data: 2, 4, 8, 6, 10, and 12.

Solution:
X (X − X̄)²
2 (2 − 7)² = 25
4 9
8 1
6 1
10 9
12 25
∑X = 42 ∑(X − X̄)² = 70

X̄ = ∑X / n = 42/6 = 7
S = √[∑(X − X̄)² / (n – 1)] = √(70/5) = 3.74
CV = (3.74 / 7)(100) = 53.43%
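The coefficient of variation can be checked with Python's statistics module, whose stdev function uses the n – 1 divisor for sample data. This is a sketch; the unrounded result is about 53.45%, slightly different from the 53.43% obtained when S is first rounded to 3.74.

```python
import statistics

sample = [2, 4, 8, 6, 10, 12]

mean = statistics.mean(sample)
s = statistics.stdev(sample)        # sample standard deviation, divisor n - 1
cv = s / mean * 100                 # coefficient of variation, in percent

print(mean, round(s, 2), round(cv, 2))
```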
Interquartile Range

The interquartile range (IQR) of a data set is the difference


between the third and first quartiles.
Interquartile range (IQR) = Q3 – Q1.

The quartiles for 15 quiz scores are listed below. Find the
interquartile range.

Q1 = 37 Q2 = 43 Q3 = 48

IQR = Q3 – Q1
= 48 – 37
= 11
The quiz scores in the middle portion of the data set vary by at
most 11 points.
Exercises. Given the following sample set of numbers:
10 20 30 40 50

The arithmetic mean is


a. 30 b. 25 c. 37.5 d. 150

The sample variance is


a. 30 b. 250 c. 299 d. 1000

The standard deviation is


a. 5.48 b. 15.81 c. 14.14 d. 31.61

The range is
a. 50 b. 30 c. 40 d. 10

The standard deviation of the mean or standard error is


a. 1.096 b. 7.071 c. 3.162 d. 6.324
Box and Whisker Plot

Five-number summary
• The minimum entry 28
• Q1 37
• Q2 (median) 43
• Q3 48
• The maximum entry 55
Box and Whisker Plot

Five-number summary
• The minimum entry 28
• Q1 37
• Q2 (median) 43
• Q3 48
• The maximum entry 55

[Figure: box-and-whisker plot, “Quiz Scores”, marking 28, 37, 43, 48 and 55 on an axis from 28 to 56 in steps of 4.]
Exercises:

The recorded data in its original collected form is


a. Array
b. Event
c. Set
d. Raw data

A value computed from a population


a. Statistic
b. Population
c. Random variable
d. None of the above
MEASURES OF SKEWNESS

If Mean > Mode, the skewness is positive.


If Mean < Mode, the skewness is negative.
If Mean = Mode, the skewness is zero.

Symmetric
MEASURES OF SKEWNESS – values measuring the
extent of departure of the distribution from
symmetry
a. Pearson’s First Coefficient of Skewness
Skewness = (X̄ – Mo) / S
b. Pearson’s Second Coefficient of Skewness, SK
SK = 3 (X̄ – Md) / S

If SK = 0, distribution of data is symmetric


SK < 0, distribution is negatively skewed, or
the frequency curve of the distribution
has a longer tail to the left of the central
maximum than to the right
SK > 0, distribution is positively skewed
c. Moment Coefficient of Skewness

For Population Skewness, α3

α3 = m3 / S³ = [∑(X − X̄)³ / n] / S³
= m3 / m2^(3/2)
where
m3 = ∑(x − x̅)³ / n and m2 = ∑(x − x̅)² / n
m3 is called the third moment of the data set.
m2 is called the second moment of the data set;
also known as the variance,
or the square of the standard deviation.
c. Moment Coefficient of Skewness

For the Sample Skewness, a3

a3 = [√(n(n − 1)) / (n − 2)] α3
Exercise:
Here are grouped data for heights of 100 randomly selected
male students, adapted from Spiegel and Stephens (1999, 68).

A histogram shows that the data


are skewed left, not symmetric.
But how highly skewed are they,
compared to other data sets?
To answer this question,
compute the skewness.

Height Class Frequ-


(inches) Mark, x ency, f
59.5–62.5 61 5
62.5–65.5 64 18
65.5–68.5 67 42
68.5–71.5 70 27
71.5–74.5 73 8
Solution:
Begin with the sample size and sample mean. (The sample size
was given, but it never hurts to check.)

n = 5+18+42+27+8 = 100
x̅ = (61×5 + 64×18 + 67×42 + 70×27 + 73×8) ÷ 100
x̅ = (305 + 1152 + 2814 + 1890 + 584) ÷ 100
x̅ = 6745÷100 = 67.45

Height Class Mark Frequency


(inches) x f
59.5–62.5 61 5
62.5–65.5 64 18
65.5–68.5 67 42
68.5–71.5 70 27
71.5–74.5 73 8
Finally, the population skewness is

α3 = m3 / m2^(3/2) = −2.6933 / 8.5275^(3/2) = −0.1082

Class Mark, x Frequency, f xf (x−x̅) (x−x̅)²f (x−x̅)³f
61 5 305 −6.45 208.01 −1341.68
64 18 1152 −3.45 214.25 −739.15
67 42 2814 −0.45 8.51 −3.83
70 27 1890 2.55 175.57 447.70
73 8 584 5.55 246.42 1367.63
∑ 100 6745 n/a 852.75 −269.33
x̅, m2, m3 n/a 67.45 n/a 8.5275 −2.6933
The sample skewness:
a3 = [√(n(n − 1)) / (n − 2)] α3
= [√(100×99) / 98] [−2.6933 / 8.5275^(3/2)]
= −0.1098

Interpreting the skewness number?


Bulmer (1979) — a classic — suggests this rule of thumb:
If skewness is less than −1 or greater than +1,
the distribution is highly skewed.
If skewness is between −1 and −½ or between +½ and +1,
the distribution is moderately skewed.
If skewness is between −½ and +½,
the distribution is approximately symmetric.
With a skewness of −0.1098, the sample data
are approximately symmetric.
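The skewness calculation for the heights data can be reproduced from the class marks and frequencies. This is a sketch with illustrative names; central_moment is a helper defined here, not a library function.

```python
import math

class_marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

n = sum(freqs)
mean = sum(x * f for x, f in zip(class_marks, freqs)) / n

def central_moment(k):
    """k-th moment about the mean for the grouped data (divisor n)."""
    return sum(f * (x - mean) ** k for x, f in zip(class_marks, freqs)) / n

m2 = central_moment(2)
m3 = central_moment(3)

alpha3 = m3 / m2 ** 1.5                              # population skewness
a3 = math.sqrt(n * (n - 1)) / (n - 2) * alpha3       # sample skewness

print(round(mean, 2), round(m2, 4), round(m3, 4))
print(round(alpha3, 4), round(a3, 4))
```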
MEASURES OF KURTOSIS – a value that measures the
flatness or peakedness of the distribution of data,
usually taken relative to a normal curve

• Leptokurtic – a distribution having a relatively high peak


• Platykurtic – a flat-topped distribution
• Mesokurtic – a distribution which is not very peaked or
very flat-topped like the normal distribution
Leptokurtic
Platykurtic
Mesokurtic
a. Moment Coefficient of Kurtosis , a4

a4 = m4 / m2² = [∑(X − X̄)⁴ / n] / (S²)²

where
m4 = ∑(x − x̅)⁴ / n
m2 = ∑(x − x̅)² / n = S²

If a4 = 3, distribution is normal
b. Excess Kurtosis
The excess kurtosis is generally used because
the excess kurtosis of a normal distribution is 0
For the Population Excess Kurtosis, κ
κ = (a4 − 3)
If κ = 0, distribution is mesokurtic, like the normal distribution
κ > 0, distribution is leptokurtic
κ < 0, distribution is platykurtic
The sample excess kurtosis uses the formula below,
which comes from Joanes and Gill:

k = [(n − 1) / ((n − 2)(n − 3))] [(n + 1) κ + 6]
Exercise:
Let’s continue with the example of the heights,
and compute the kurtosis of the data set.
n = 100, x̅ = 67.45 inches, and the variance m2 = 8.5275 in²
The kurtosis is
a4 = m4 / m2² = 199.3760/8.5275² = 2.7418
and the excess kurtosis is
κ = 2.7418−3 = −0.2582
And the sample excess kurtosis is:
k = [99/(98×97)] [101×(−0.2582) + 6] = −0.2091
Class Mark, x Frequency, f x−x̅ (x−x̅)⁴f
61 5 −6.45 8653.84
64 18 −3.45 2550.05
67 42 −0.45 1.72
70 27 2.55 1141.63
73 8 5.55 7590.35
∑ 100 n/a 19937.60
m4 n/a n/a 199.3760

This sample is slightly platykurtic: its peak is just a bit
shallower than the peak of a normal distribution.
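The kurtosis figures can be recomputed in the same way as the skewness. A sketch with illustrative names; the sample excess kurtosis uses the formula quoted above from Joanes and Gill.

```python
class_marks = [61, 64, 67, 70, 73]
freqs = [5, 18, 42, 27, 8]

n = sum(freqs)
mean = sum(x * f for x, f in zip(class_marks, freqs)) / n

def central_moment(k):
    """k-th moment about the mean for the grouped data (divisor n)."""
    return sum(f * (x - mean) ** k for x, f in zip(class_marks, freqs)) / n

m2 = central_moment(2)
m4 = central_moment(4)

a4 = m4 / m2 ** 2                                    # moment coefficient of kurtosis
kappa = a4 - 3                                       # population excess kurtosis
k = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * kappa + 6)   # sample excess kurtosis

print(round(a4, 4), round(kappa, 4), round(k, 4))
```

A negative excess kurtosis corresponds to the platykurtic case described above.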
III. SETS AND PROBABILITY
Definition

Set – a well-defined collection of objects, e.g. the rivers


in the Philippines, the monthly rainfall in CLSU
Element – each object in a set; a member of the set
Equal Sets – sets having exactly the same elements
Null Set or Empty Set – a set that contains no elements;
the symbol “Ø” denotes this empty set
Subset – if every element of a set A is also an element
of a set B, then A is called a subset of B
B = { 2, 4, 6, 8, 10, 12 }
A = { 4, 6, 10 }
or A ⊂ B where “⊂” reads “is a subset of”
or “is contained in”
Definition
Sample Space – a set whose elements represent all
possible outcomes of an experiment. For example,
in tossing a die,
S = { 1, 2, 3, 4, 5, 6 }
Sample Point – an element of a sample space
Event – a subset of a sample space
Simple Event – if an event is a set containing only one
element of the sample space
Compound Event – an event which can be expressed as
the union of simple events
Let S = { heart, spade, club, diamond }
A = { heart } is a simple event of drawing a
heart from a deck of 52 cards
B = { heart } U { diamond } = { heart, diamond } is a
compound event of drawing a red card
Sample Space Examples

Experiment Sample Space

Toss a Coin, Note Face {Head, Tail}


Toss 2 Coins, Note Faces {HH, HT, TH, TT}
Select 1 Card, Note Kind {2♥, 2♠, ..., A♦} (52)
Select 1 Card, Note Color {Red, Black}
Play a Lawn Tennis Game {Win, Lose, Tie}
Inspect a Part, Note Quality {Defective, Good}
Observe Gender {Male, Female}
Event Examples
Experiment: Toss 2 Coins. Note Faces.

Sample Space: HH, HT, TH, TT

Event Outcomes in Event

1 Head & 1 Tail HT, TH


Head on 1st Coin HH, HT
At Least 1 Head HH, HT, TH
Heads on Both HH
Set Operations
The intersection of two sets A and B is the set
of elements that are common to A and B.
Let A = { 1, 2, 3, 4, 5 }
B = { 2, 4, 6, 8 }
Then,
A ∩ B = { 2, 4 }

Let P = { a, e, i, o, u }
Q = { x, y, z }
Then,
P∩Q=Ø Sets P and Q are disjoint;
that is P and Q have no
elements in common
Set Operations
The union of two sets A and B is the set of
elements that belong to A or to B, or to both.
Let A = { 1, 3, 5, 8, 10 }
B = { 3, 6, 8 }
Then,
A U B = { 1, 3, 5, 6, 8, 10 }
Set Operations
If A is a subset of the universal set U, then the
complement of A with respect to U is the set of all
elements of U that are not in A. The complement of
A is denoted by A’.
From the above definitions/operations,
it should be noted that:
• A ∩ Ø = Ø
• A U Ø = A
• A ∩ A’ = Ø
• A U A’ = U
• U’ = Ø
• Ø’ = U
• (A’)’ = A
Venn diagrams for

(a) Event (not E )


(b) Event (A ∩ B ) or Event (A & B)
(c) Event (A U B) or Event (A or B)
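The set operations and identities above map directly onto Python's built-in set type. In this sketch the universal set U = {1, ..., 10} is an assumption chosen only for illustration; the other sets come from the examples in the notes.

```python
U = set(range(1, 11))        # assumed universal set, for illustration only
A = {1, 2, 3, 4, 5}
B = {2, 4, 6, 8}

intersection = A & B         # elements common to A and B
union = A | B                # elements in A or B or both
complement_A = U - A         # A' with respect to U

P = set("aeiou")
Q = set("xyz")
disjoint = P.isdisjoint(Q)   # True when P ∩ Q = Ø

print(intersection, union, complement_A, disjoint)
```

The identities A ∩ A’ = Ø, A U A’ = U and (A’)’ = A can be confirmed by evaluating A & complement_A, A | complement_A and U - complement_A.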
Counting Principle

Exercise. How many sample points are in the sample


space when a pair of dice is thrown once?
Multiplication Rule

If an operation can be performed in n1 ways, and if


for each of these a second operation can be
performed in n2 ways, then the two operations can
be performed together in (n1)( n2) ways.

Exercise. How many sample points are in the sample


space when a pair of dice is thrown once?

Solution. The first die can land in any one of 6 ways.


For each of these 6 ways, the second die can
also land in 6 ways. Therefore, the pair of
dice can land in (6) (6) = 36 ways
Exercise

There are 8 soil and water majors and


25 crop processing majors

How many ways are there to pick one soil and


water major and one crop processing major?

Total is 8 * 25 = 200
Exercise. Participants at a large conference are
offered 6 sightseeing tours on each of 3
days.

In how many ways can a person arrange to


go on a sightseeing tour planned by this
conference:
(a) with repetition,
(b) without repetition.
Exercise. Participants at a large conference are
offered 6 sightseeing tours on each of 3
days.

In how many ways can a person arrange to


go on a sightseeing tour planned by this
conference:
(a) with repetition,
(b) without repetition.

Solution.
a. N = (6)(6)(6) = 216 ways
b. N = (6)(5)(4) = 120 ways
Exercise

How many strings of 4 decimal digits…

a) Do not contain the same digit twice?

b) End with an even digit?


Exercise
How many strings of 4 decimal digits…

a) Do not contain the same digit twice?


We want to choose a digit, then another
that is not the same, then another…
First digit: 10 possibilities
Second digit: 9 possibilities (all but first digit)
Third digit: 8 possibilities
Fourth digit: 7 possibilities
Total = 10*9*8*7 = 5040

b) End with an even digit?


First three digits have 10 possibilities
Last digit has 5 possibilities
Total = 10*10*10*5 = 5000
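Both counts can be confirmed by brute force, since there are only 10,000 strings of 4 decimal digits. A minimal sketch using itertools.product to enumerate them; the variable names are illustrative.

```python
from itertools import product

digits = "0123456789"
strings = ["".join(p) for p in product(digits, repeat=4)]   # all 4-digit strings

# (a) strings in which no digit appears twice
no_repeats = sum(1 for s in strings if len(set(s)) == 4)

# (b) strings whose last digit is even
even_ending = sum(1 for s in strings if s[-1] in "02468")

print(len(strings), no_repeats, even_ending)
```

Enumeration is practical only for small cases like this one; the multiplication rule gives the same counts without listing anything.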
Exercise
Suppose a computer requires 8 characters for a
password. The first character must be a letter, but
the remaining seven characters can be either a
letter or a digit (0 thru 9). The password is not
case-sensitive. How many passwords are
possible on this computer?
Exercise
Suppose a computer requires 8 characters for a
password. The first character must be a letter, but
the remaining seven characters can be either a
letter or a digit (0 thru 9). The password is not
case-sensitive. How many passwords are
possible on this computer?

26 · 36⁷ ≈ 2.037 × 10¹²


Exercise

There are 8 soil and water majors and 25 crop


processing majors
How many ways are there to pick one soil and
water major or one crop processing major?

Total is 8 + 25 = 33
Permutations
The number of permutations of n distinct objects
taken n at a time is:
n!

Exercise. How many distinct permutations can be


made from the letters of the word “help”

Solution.
N = n!
= 4!
= (4)(3)(2)(1)
= 24
Exercise:

How many different surveys are required to cover all


possible question arrangements if there are 7 questions
in a survey?

7! = 7 · 6 · 5 · 4 · 3 · 2 · 1

= 5040 surveys
Exercises:
Your little brother just broke into your Facebook account
and now you need to change your password. You want it to
be 5 characters long. The first 3 characters are to be letters
and the last 2 characters are to be numbers.
How many passwords are possible?
Answer: 26x26x26x10x10=1,757,600
What if you don’t want to repeat letters and numbers?
Answer: 26x25x24x10x9=1,404,000

You want to upload 6 pictures on Facebook. How many


permutations are possible for the 6 pictures?

Answer: 6x5x4x3x2x1=6!=720
Permutations
The number of permutations of n distinct objects
taken r at a time is:
n!
nPr = ---------
(n-r)!

Exercise. How many three digit numbers can be


formed from the digits 2 3 4 5 6 7 and 8
if each digit can be used only once?
Solution.
$_7P_3 = \dfrac{7!}{(7-3)!} = \dfrac{7!}{4!} = \dfrac{(7)(6)(5)(4!)}{4!} = (7)(6)(5) = 210$
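The formula nPr = n!/(n-r)! translates directly into code. The sketch below assumes Python 3.8 or later, where `math.perm` is available as a cross-check; `permutations_count` is a hypothetical helper name.

```python
import math

def permutations_count(n, r):
    """Number of ordered selections of r objects from n distinct objects."""
    return math.factorial(n) // math.factorial(n - r)

three_digit_numbers = permutations_count(7, 3)

# math.perm computes the same quantity directly
assert three_digit_numbers == math.perm(7, 3)
print(three_digit_numbers)
```

Setting r = n gives n!, the number of permutations of all n objects.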
Permutation of n Objects Taken r at a Time
Exercise:

You are required to read 5 books from a list of 8. In


how many different orders can you do so?
$_nP_r = {}_8P_5 = \dfrac{8!}{(8-5)!} = \dfrac{8!}{3!} = \dfrac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{3 \cdot 2 \cdot 1} = 6720$ ways
Exercise
An inspector randomly selects 2 of 5
parts for inspection. In a group of 5
parts, how many permutations of 2
parts can be selected?

$_5P_2 = \dfrac{5!}{(5-2)!} = \dfrac{5!}{3!} = \dfrac{(5)(4)(3)(2)(1)}{(3)(2)(1)} = \dfrac{120}{6} = 20$

Again let the parts be designated A, B, C, D, E. Thus we could


select:

AB BA AC CA AD DA AE EA BC CB BD DB BE EB CD DC CE EC
DE and ED
Permutations
The number of permutations of n distinct objects
arranged in a circle is
( n – 1)!

Exercise. In how many ways can 11 different bushes


be planted in a circular arrangement?

Solution.
N = (11 – 1)! = 10! = 3,628,800 ways
Permutations
The number of distinct permutations of n things of
which n1 are of one kind, n2 are of a second kind,
…, nk of a kth kind is:
n!
---------------------
n1! n2! … nk!
Exercise. How many distinct permutations can be
formed from the letters of the word
“PHILIPPINES”
Solution.
P = 3  H = 1  I = 3  L = 1  N = 1  E = 1  S = 1
$N = \dfrac{11!}{3!\,3!\,1!\,1!\,1!\,1!\,1!} = 1{,}108{,}800$ ways
Exercise

How many different vertical arrangements are


there of 9 flags if 4 are white, 3 are blue and 2 are
red?

9! 9•8•7•6•5•4! 9•8•7•6•5
----------- = ------------------ = --------------- = 1260
4!•3!•2! 4!•3!•2! 3•2•1•2•1
Exercise

Your mom just got Facebook… You want to


disguise your name by scrambling up the letters
in your first and last name so she can’t find you.
If your name is, Hannah Christopherson,
how many new first and last names can you
make with the letters?

Hannah: 6!/(2!2!2!)=90
Christopherson: 14!/(2!2!2!2!)=5,448,643,200
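The rule n!/(n1! n2! … nk!) for permutations of objects that are not all distinct can be sketched as a short function. `distinct_permutations` is an illustrative name, and letter case is ignored as an assumption so that "H" and "h" count as the same letter.

```python
from collections import Counter
from math import factorial

def distinct_permutations(word):
    """Distinct arrangements of the letters in word (case-insensitive)."""
    counts = Counter(word.upper())
    total = factorial(sum(counts.values()))  # n!
    for c in counts.values():
        total //= factorial(c)               # divide by each n_i!
    return total

print(distinct_permutations("PHILIPPINES"))
print(distinct_permutations("WWWWBBBRR"))    # 4 white, 3 blue, 2 red flags
print(distinct_permutations("Hannah"))
print(distinct_permutations("Christopherson"))
```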
Permutations

Exercise:
A combination lock will open when the right
choice of three numbers (from 1 to 30,
inclusive) is selected. How many different
lock combinations are possible assuming no
number is repeated?

$_{30}P_3 = \dfrac{30!}{(30-3)!} = \dfrac{30!}{27!} = 30 \cdot 29 \cdot 28 = 24{,}360$
Permutations

Exercise:
From a club of 24 members, a President,
Vice President, Secretary, Treasurer and
Historian are to be elected. In how many
ways can the offices be filled?

$_{24}P_5 = \dfrac{24!}{(24-5)!} = \dfrac{24!}{19!} = 24 \cdot 23 \cdot 22 \cdot 21 \cdot 20 = 5{,}100{,}480$
Exercise:
In some states, license plates have six characters:
three letters followed by three numbers. How many
distinct such plates are possible? (hint: with
replacement)

There are $26^3$ different ways to choose the letters and $10^3$ different
ways to choose the digits, so the

total number = $26^3 \times 10^3 = 17{,}576 \times 1000 = 17{,}576{,}000$
Combinations
The number of combinations of n distinct objects
taken r at a time is
n!
nCr = --------------
(n – r)! r!

Exercise. How many ways are there to select 3


candidates from 9 equally qualified recent
graduates for openings in an engineering
firm?
Solution.
9!
9C3 = --------- = 84
6! 3!
Exercise

If there are 8 researchers and 3 of them are to be


chosen to go to a meeting, how many different
groupings can be chosen?

$_8C_3 = \dfrac{8 \cdot 7 \cdot 6 \cdot 5 \cdot 4 \cdot 3 \cdot 2 \cdot 1}{(3 \cdot 2 \cdot 1)(5 \cdot 4 \cdot 3 \cdot 2 \cdot 1)} = \dfrac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56$
Exercise

You are having a party this weekend and


you want to invite 25 of your 30 Facebook
friends. How many groups of 25 can you
choose from your 30 friends?

Answer: $_{30}C_{25} = 142{,}506$
Combinations
Exercise:
A student must answer 3 out of 5 essay
questions on a test. In how many
different ways can the student select the
questions?

$_5C_3 = \dfrac{5!}{3!\,(5-3)!} = \dfrac{5!}{3!\,2!} = \dfrac{5 \cdot 4}{2 \cdot 1} = 10$
Combinations
Exercise:
A basketball team consists of two centers, five
forwards, and four guards. In how many ways can
the coach select a starting line up of one center,
two forwards, and two guards?

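Center: $_2C_1 = \dfrac{2!}{1!\,1!} = 2$
Forwards: $_5C_2 = \dfrac{5!}{2!\,3!} = \dfrac{5 \cdot 4}{2 \cdot 1} = 10$
Guards: $_4C_2 = \dfrac{4!}{2!\,2!} = \dfrac{4 \cdot 3}{2 \cdot 1} = 6$

Thus, the number of ways to select the starting lineup is
$_2C_1 \cdot {}_5C_2 \cdot {}_4C_2 = 2 \cdot 10 \cdot 6 = 120$.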
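The lineup count combines nCr with the multiplication rule. A minimal sketch, assuming Python 3.8+ for `math.comb`; the variable names are illustrative.

```python
from math import comb

centers = comb(2, 1)    # choose 1 of 2 centers
forwards = comb(5, 2)   # choose 2 of 5 forwards
guards = comb(4, 2)     # choose 2 of 4 guards

# Independent choices multiply
lineups = centers * forwards * guards
print(centers, forwards, guards, lineups)
```

Because nCr = nC(n-r), `comb(30, 25)` and `comb(30, 5)` return the same value.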
Probability
 A value between 0 and 1, inclusive of limits, that
measures how likely a particular event will occur
A priori Approach/ Classical/ Theoretical
– defines the probability of an event A as the
number of sample points in the event A, n(A),
divided by the number of sample points in the
sample space, n(S)
P(A) = n(A) / n(S)
Probability
Empirical (or statistical) probability is based on
observations obtained from probability
experiments. The empirical frequency of an event
E is the relative frequency of event E.
$P(E) = \dfrac{\text{Frequency of event } E}{\text{Total frequency}} = \dfrac{f}{n}$

Exercise:
A travel agent determines that in every 50 reservations she makes, 12
will be for a cruise.
What is the probability that the next reservation she makes will be for a
cruise?

$P(\text{cruise}) = \dfrac{12}{50} = 0.24$
Probability

Subjective Approach – uses the perception of the


person to determine the chance of occurrence
of an event; an intelligent “guessing”

Exercise:
A farmer predicts that the probability he gets a good
harvest is 0.75.
Some Probability Rules

a. The probability of an event is nonnegative and


never exceeds one.
0 ≤ P(Ei) ≤ 1

b. The sum of the probabilities of all possible


outcomes in the sample space is 1.
∑ P(Ei) = 1

c. If A and A’ are complementary events, then


P(A’) = 1 - P(A)
Exercise. One card is selected at random from 50
cards carrying numbers from 1 to 50. Find
the probability that the number of the card
is
(a) divisible by 10;
(b) an odd number.
Probability
Solution.

Let T be the event that the card carries a


number that is divisible by 10.
T = { 10, 20, 30, 40, 50}
P (T) = 5/50 = 1/10

Let O be the event that the card has an odd


number.
O = {1, 3, 5, …, 49}
P(O) = 25/50 = ½
Probabilities with Frequency Distributions
Exercise:
The following frequency distribution represents the ages of
30 students in a statistics class. What is the probability that
a student is between 26 and 33 years old?

Ages       Frequency, f
18 – 25    13
26 – 33    8
34 – 41    4
42 – 49    3
50 – 57    2
           Σf = 30

$P(\text{age 26 to 33}) = \dfrac{8}{30} \approx 0.267$
Exercise. A coin is tossed 6 times in succession. What
is the probability that at least 1 head occurs?

Solution. Let E be the event that at least 1 head occurs.


The sample space S consists of $2^6 = 64$
sample points, since each toss can result in
2 outcomes. Now,
P(E) = 1 - P(E’), where E’ is the event that
no head occurs. This can happen in only one way,
when all tosses result in a tail, so P(E’) = 1/64.
Then, P(E) = 1 - 1/64 = 63/64
Some Probability Rule

The more general rule for the union of probabilities is


P(E1 U E2) = P(E1) + P(E2) - P(E1 ∩ E2)

Exercise. The probability that a student passes


mathematics is 2/3 and the probability that
he passes English is 4/9. If the probability
of passing at least one course is 4/5, what is
the probability that he will pass both
courses?
Solution.
P(M ∩ E) = P(M) + P(E) - P(M U E)
= 2/3 + 4/9 - 4/5
= 14/45
Some Probability Rule

For conditional probabilities, i.e., the probability that event E1 occurs
given that event E2 has occurred (with P(E2) > 0):

$P(E_1/E_2) = \dfrac{P(E_1 \cap E_2)}{P(E_2)}$

If events E1 and E2 are independent, P(E1/E2) = P(E1)


Consider the following example of events that are not
independent or mutually exclusive:
An urban drainage canal reaches flood stage each
summer with relative frequency of 0.10; power failures in
industries along the canal occur with a probability of 0.20.
Experience shows that when there is flood the chances of a
power failure are raised to 0.40.

The probability statements are:


P(flood) = P(F) = 0.10
P(no flood) = P(F’) = 0.90
P(power failure) = P(P) = 0.20
P(no power failure) = P(P’) = 0.80
P(power failure given that a flood occurs) =
P(P/F) = 0.40
Find the probability of a flood or a power failure
occurring
The probability of a flood or a power failure


occurring is then:
P(F U P) = P(F) + P(P) - P(F ∩ P)
= P(F) + P(P) - P(P/F) x P(F)
= 0.10 + 0.20 - (0.40 x 0.10)
= 0.10 + 0.20 - 0.04
= 0.26
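The flood and power-failure calculation can be replayed in a few lines. This is only a sketch of the arithmetic; the variable names are illustrative.

```python
p_flood = 0.10
p_power = 0.20
p_power_given_flood = 0.40

# Multiplication rule: P(F ∩ P) = P(P/F) x P(F)
p_both = p_power_given_flood * p_flood

# General addition rule: P(F U P) = P(F) + P(P) - P(F ∩ P)
p_either = p_flood + p_power - p_both

# Under independence P(F ∩ P) would equal P(F) x P(P) = 0.02, not 0.04
independent = abs(p_both - p_flood * p_power) < 1e-12

print(round(p_both, 4), round(p_either, 4), independent)
```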
Other Exercises

a. What is the probability of getting a Jack or a Spade if


one card is drawn from a deck of 52 cards?

Let; A be the event that a Jack will be drawn, P(A) = 4/52


B be the event that a Spade will be drawn, P(B) = 13/52
A∩B be the event that card drawn is jack and spade,
P(A∩B) = 1/52

Then,
P(AUB) = P(A) + P(B) - P(A∩B)
= 4/52 + 13/52 - 1/52
= 16/52 = 4/13
Other Exercises

b. If 3 coins are tossed, what is the probability that at


least 1 head occurs?

P(at least one head) = 1 - P(no head)


= 1 - 1/8
= 7/8

c. A box contains 6 white balls and 4 black balls. If


three balls are to be drawn from the box, what is
the probability that all three are white balls?

P(W1,W2,W3) = (6/10) (5/9) (4/8)


= 120/720 = 1/6
IV. RANDOM VARIABLES
Concept of Random Variable

From performing random experiments, our outcomes


are treated as variables whose values occur by chance,
and thus are referred to as random variables.

Random Variable – a rule or function whose value is a


real number determined by each element in the
sample space .
Concept of Random Variable

Exercise. In the random experiment of tossing two coins,


we describe the event of “outcome of tails” by a numerical
characteristic.
Sample Space Real Number
HH 0
HT 1
TH 1
TT 2
The random variable Y = number of tails of tossing 2
coins
Y = 0 is equivalent to the event { HH }
Y = 1 is equivalent to the event { HT, TH }
Y = 2 is equivalent to the event { TT }
Probability Distribution
Discrete Probability Distribution
- the list of all the possible values of a discrete random
variable together with their associated probabilities.
- a graph, table or formula that specifies the probability
associated with each possible outcome the random
variable can assume.
• The sum of the probabilities for all the possible
values of the random variable in a probability
distribution is 1.

• Examples of discrete probability distributions are:


Uniform Poisson
Binomial Negative Binomial
Multinomial Geometric
Hypergeometric
What is a Probability Distribution?

PROBABILITY DISTRIBUTION A listing of all the outcomes of an


experiment and the probability associated with each outcome.

Experiment:
Toss a coin three times.
Observe the number of
heads. The possible results
are:
Zero head,
One head,
Two heads, and
Three heads.

What is the probability


distribution for the number of
heads?
Probability Distribution of Number of Heads
Observed in 3 Tosses of a Coin
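The distribution of the number of heads in 3 tosses can be obtained by listing the 8 equally likely outcomes and counting heads in each. A minimal sketch assuming a fair coin; the names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 2^3 equally likely outcomes, e.g. ('H', 'T', 'H')
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give 0, 1, 2, 3 heads
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability = favourable outcomes / total outcomes
pmf = {heads: Fraction(n, len(outcomes)) for heads, n in sorted(counts.items())}

for heads, prob in pmf.items():
    print(heads, prob)
```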
Probability Density Function

Let X be a one-dimensional discrete random
variable. Then the probability function $P(X = x)$
is called the probability density function (for a discrete
variable, more commonly the probability mass function) if it
satisfies
(a) $P(x) \ge 0$
(b) $\sum_x P(x) = 1$
Probability Mass Function (pmf)

x        p(x)
1        p(x=1) = 1/6
2        p(x=2) = 1/6
3        p(x=3) = 1/6
4        p(x=4) = 1/6
5        p(x=5) = 1/6
6        p(x=6) = 1/6
Total    1.0
Cumulative Distribution Function

Let X be a random variable. Then the function

$F(x) = \Pr(X \le x)$

is called the cumulative distribution function of X.
Cumulative Distribution Function (CDF)

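[Figure: step plot of the cumulative probability P(X ≤ x) for one roll of a die; vertical axis marked 1/6, 1/3, 1/2, 2/3, 5/6, 1.0; horizontal axis x = 1, 2, 3, 4, 5, 6]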
Cumulative Distribution Function

x        P(x≤A)
1        P(x≤1) = 1/6
2        P(x≤2) = 2/6
3        P(x≤3) = 3/6
4        P(x≤4) = 4/6
5        P(x≤5) = 5/6
6        P(x≤6) = 6/6
Exercise

Which of the following are probability functions?

a. f(x) = .25 for x = 9, 10, 11, 12

b. f(x) = (3 - x)/2 for x = 1, 2, 3, 4

c. $f(x) = (x^2 + x + 1)/25$ for x = 0, 1, 2, 3

Answer (a)

a. f(x) = .25 for x = 9, 10, 11, 12

x        f(x)
9        .25
10       .25
11       .25
12       .25
Total    1.0

Yes, probability function!

Answer (b)

b. f(x) = (3 - x)/2 for x = 1, 2, 3, 4

x        f(x)
1        (3-1)/2 = 1.0
2        (3-2)/2 = .5
3        (3-3)/2 = 0
4        (3-4)/2 = -.5

Though this sums to 1, you can’t have a negative probability;
therefore, it’s not a probability function.

Answer (c)

c. $f(x) = (x^2 + x + 1)/25$ for x = 0, 1, 2, 3

x        f(x)
0        1/25
1        3/25
2        7/25
3        13/25
Total    24/25

Doesn’t sum to 1. Thus, it’s not a probability function.
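Both requirements, non-negative values and a total of 1, can be checked mechanically. The sketch below uses exact fractions to avoid rounding error; `is_probability_function` is a hypothetical helper name, not something from the notes.

```python
from fractions import Fraction

def is_probability_function(f, support):
    """True if f(x) >= 0 for every x in support and the values sum to 1."""
    values = [Fraction(f(x)) for x in support]
    return all(v >= 0 for v in values) and sum(values) == 1

result_a = is_probability_function(lambda x: Fraction(1, 4), [9, 10, 11, 12])
result_b = is_probability_function(lambda x: Fraction(3 - x, 2), [1, 2, 3, 4])
result_c = is_probability_function(lambda x: Fraction(x * x + x + 1, 25), [0, 1, 2, 3])

print(result_a, result_b, result_c)
```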
Exercise

 The number of ships to arrive at a harbor on any


given day is a random variable represented by x. The
probability distribution for x is:

x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1

Find the probability that on a given day:


a. Exactly 14 ships arrive: p(x = 14) = .1
b. At least 12 ships arrive: p(x ≥ 12) = (.2 + .1 + .1) = .4
c. At most 11 ships arrive: p(x ≤ 11) = (.4 + .2) = .6
Exercise

If you toss a die, what’s the probability that


you roll a 3 or less?

a. 1/6
b. 1/3
c. 1/2
d. 5/6
e. 1.0

Answer: c. P(X ≤ 3) = 3/6 = 1/2
Exercise

Two dice are rolled and the sum of the face


values is six. What is the probability that at
least one of the dice came up a 3?

a. 1/5
b. 2/3
c. 1/2
d. 5/6
e. 1.0

How can you get a 6 on two dice? 1-5, 5-1, 2-4, 4-2, 3-3.
One of these five has a 3, so the answer is (a) 1/5.
Binomial Probability Distribution

A widely occurring discrete probability distribution

Characteristics of a Binomial Probability Distribution


1. There are only two possible outcomes on a
particular trial of an experiment.
2. The outcomes are mutually exclusive,
3. The random variable is the result of counts.
4. Each trial is independent of any other trial
The Binomial Distribution

 The Binomial Probability Distribution


– p = P(S) on a single trial
– q=1–p
– n = number of trials
– x = number of successes

$P(x) = \binom{n}{x} p^{x} q^{n-x}$
Binomial Distribution Properties

Properties:
• Mean = $np$
• Variance = $npq$
• Skewness = $\dfrac{q - p}{\sqrt{npq}}$
• Kurtosis = $3 + \dfrac{1 - 6pq}{npq}$
Exercise

In a random sample of 100 BSAEn students, 20 said they


opposed the plan to charge a new development fee.
An unbiased estimate of the proportion of students who
support the payment of the new development fee is:

a. 20/100
b. 20
c. 80/100
d. 80
Binomial Distribution - Exercise

• Exercise: The number X of seeds that
germinate in n=10 independent trials with
p=0.8 is b(10,0.8), that is,

$P(X = x) = \binom{10}{x} (0.8)^{x} (0.2)^{10-x}, \quad x = 0, 1, 2, \ldots, 10$

• Distribution function

$F(6) = P(X \le 6) = \sum_{x=0}^{6} \binom{10}{x} (0.8)^{x} (0.2)^{10-x}$
Binomial Distribution: Exercise

• If I toss a coin 20 times, what’s the probability
of getting 2 or fewer heads?

$\binom{20}{0}(.5)^{0}(.5)^{20} = \dfrac{20!}{20!\,0!}(.5)^{20} = 9.5 \times 10^{-7}$

$\binom{20}{1}(.5)^{1}(.5)^{19} = \dfrac{20!}{19!\,1!}(.5)^{20} = 20 \times 9.5 \times 10^{-7} = 1.9 \times 10^{-5}$

$\binom{20}{2}(.5)^{2}(.5)^{18} = \dfrac{20!}{18!\,2!}(.5)^{20} = 190 \times 9.5 \times 10^{-7} = 1.8 \times 10^{-4}$

Adding the three terms: $P(X \le 2) \approx 2.0 \times 10^{-4}$
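The binomial formula and its cumulative sum can be sketched as two small functions. `binom_pmf` and `binom_cdf` are illustrative names (not a library API), and Python 3.8+ is assumed for `math.comb`.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for a binomial variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def binom_cdf(x, n, p):
    """P(X <= x): sum of the pmf from 0 through x."""
    return sum(binom_pmf(k, n, p) for k in range(x + 1))

# 2 or fewer heads in 20 tosses of a fair coin
p_two_or_fewer = binom_cdf(2, 20, 0.5)

# 6 or fewer of 10 seeds germinate when p = 0.8
p_six_or_fewer = binom_cdf(6, 10, 0.8)

print(p_two_or_fewer, p_six_or_fewer)
```

For the coin example the exact value is 211/2^20, about 2.0 × 10^-4, since the three binomial coefficients 1, 20 and 190 add up to 211.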
Exercise

What’s the probability of getting exactly 5
heads in 10 coin tosses?

a. $\binom{10}{0}(.50)^{5}(.50)^{5}$
b. $\binom{10}{5}(.50)^{5}(.50)^{5}$
c. $\binom{10}{5}(.50)^{10}(.50)^{5}$
d. $\binom{10}{10}(.50)^{10}(.50)^{0}$

Answer: b
Exercise

A coin toss can be thought of as an example of


a binomial distribution with N=1 and p=.5. What
are the expected value and variance of a coin
toss?

a. .5, .25
b. 1.0, 1.0
c. 1.5, .5
d. .25, .5
e. .5, .5

Answer: a. Mean = np = (1)(.5) = .5; variance = npq = (1)(.5)(.5) = .25
Hypergeometric Probability Distribution

1. An outcome on each trial of an experiment is


classified into one of two mutually exclusive
categories—a success or a failure.

2. The probability of success and failure changes from


trial to trial.

3. The trials are not independent, meaning that the


outcome of one trial affects the outcome of any other
trial.

Note: Use hypergeometric distribution if experiment is


binomial, but sampling is without replacement from a
finite population where n/N is more than 0.05
Hypergeometric Distribution Formula

Consider an urn with N balls, M of which are white and N-M are red. Suppose
that we draw a sample of n balls at random (without replacement) from
the urn. Then the number k of white balls in the sample (k ≤ n)
follows a hypergeometric distribution.

pdf: Hg(N,M,n)

$P(X = k) = \dfrac{\binom{M}{k}\binom{N-M}{n-k}}{\binom{N}{n}}, \quad k = 0, 1, 2, \ldots, \min(n, M)$
Hypergeometric Distribution

Exercise: A pond contains 200 fish. A
catch of 50 fish was made; each fish was marked with a red
spot and returned alive to the pond.
After a reasonable period of time, another
catch of 30 fish was made. What is the
probability that exactly 11 of the spotted fish
were caught?
Hypergeometric Distribution

$P(X = k) = \dfrac{\binom{50}{k}\binom{150}{30-k}}{\binom{200}{30}}, \quad k = 0, 1, 2, \ldots, 30$

$P(X = 11) = \dfrac{\binom{50}{11}\binom{150}{19}}{\binom{200}{30}}$
Exercise: Milk Container Contents

 Suppose that milk is shipped to retail outlets in boxes


that hold 16 milk containers. One particular box, which
happens to contain six under weight containers, is
opened for inspection, and five containers are chosen at
random.
• The number of underweight milk containers in the sample chosen by the
inspector follows a hypergeometric distribution with N=16, r=6, and n=5.
Exercise: Milk Container Contents

• The probability that the inspector chooses
exactly two underweight containers is

$P(X = 2) = \dfrac{\binom{6}{2}\binom{10}{3}}{\binom{16}{5}} = \dfrac{\left(\dfrac{6!}{2!\,4!}\right)\left(\dfrac{10!}{3!\,7!}\right)}{\left(\dfrac{16!}{5!\,11!}\right)} = 0.412$

• The expected number of underweight containers
chosen by the inspector is

$E(X) = \dfrac{nr}{N} = \dfrac{5 \times 6}{16} = 1.875$
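The hypergeometric probabilities in the milk-container and fish exercises follow from three binomial coefficients. A minimal sketch; `hypergeom_pmf` and its argument names are illustrative, chosen to match the urn notation (N items, M successes, sample of n).

```python
from math import comb

def hypergeom_pmf(k, N, M, n):
    """P(X = k): k successes in a sample of n drawn without replacement
    from N items of which M are successes."""
    return comb(M, k) * comb(N - M, n - k) / comb(N, n)

# Milk containers: N = 16, 6 underweight, sample of 5
p_two_underweight = hypergeom_pmf(2, 16, 6, 5)

# The mean computed from the pmf agrees with n*r/N = 5*6/16
expected = sum(k * hypergeom_pmf(k, 16, 6, 5) for k in range(0, 6))

# Fish: N = 200, 50 marked, second catch of 30, exactly 11 marked
p_eleven_marked = hypergeom_pmf(11, 200, 50, 30)

print(round(p_two_underweight, 3), round(expected, 3), p_eleven_marked)
```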
Probability Distribution
Probability Density Function – a smooth curve or
function f(X), describing the distribution of a
continuous random variable; the area under the
curve is equal to one

Examples of continuous probability distributions are:


Normal Lognormal Pearson
Exponential Extreme Value Beta
Gamma
Probability Density Function
• A function f(x) is called a probability density function (over
the range a ≤ x ≤ b) if it meets the following requirements:

1) f(x) ≥ 0 for all x between a and b, and

2) The total area under the curve between a and b is 1.0

[Figure: curve f(x) plotted over the interval from a to b, with the area under the curve equal to 1]


Normal Distribution

Features:
• Has a bell-shaped curve
• Has a mean μ and variance $\sigma^2$, $X \sim N(\mu, \sigma^2)$
• Normal curve is symmetric
• Mean, median, and mode coincide at the center of
the curve
• Asymptotic to the X-axis
• Has a total area equal to one
The Normal Distribution

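• The normal distribution is the most important of all
probability distributions. The probability density function of a
normal random variable is given by:

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty$

It looks like this:

• Bell shaped,
• Symmetrical around
the mean …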
The Normal Distribution
Important things to note:

The normal distribution is fully defined by two parameters:


its standard deviation and mean

The normal distribution is bell shaped and


symmetrical about the mean

Unlike the range of the uniform distribution (a ≤ x ≤ b)


Normal distributions range from minus infinity to plus infinity
The Normal Distribution
• The normal distribution is described by two parameters:
its mean μ and its standard deviation σ.
• Increasing the mean shifts the curve to the right…
• Increasing the standard deviation “flattens” the curve…
The Normal Distribution
• A normal distribution whose mean is zero and standard
deviation is one is called the standard normal distribution.

• Any normal distribution can be converted to a standard
normal distribution with simple algebra. This makes
calculations much easier.
Normal Distribution

Standard Normal Distribution


• The transformation of the normal random variable X
to a standard normal variable Z takes the form:

$Z = \dfrac{X - \mu}{\sigma}$, where $Z \sim N(\mu = 0, \sigma^2 = 1)$
Standard Deviation and Normal
Distribution
Calculating Normal Probabilities
• P(45 < X < 60) ?
…mean of 50 minutes and a
standard deviation of 10 minutes…

Standardizing with Z = (X – 50)/10 gives P(45 < X < 60) = P(–.5 < Z < 1).

• The probability is the area under the curve…

• We will add up the two sections:
P(–.5 < Z < 0) and P(0 < Z < 1)

• How to use Table … [other forms of normal tables exist]

This table gives probabilities P(0 < Z < z)
First column = integer + first decimal
Top row = second decimal place

P(0 < Z < 0.5) = .1915, which by symmetry equals P(–.5 < Z < 0)

P(0 < Z < 1) = .3413

P(–.5 < Z < 1) = .1915 + .3413 = .5328

The probability that the time is between
45 and 60 minutes = .5328
Using the Normal Table
• What is P(Z > 1.6) ?
P(0 < Z < 1.6) = .4452

0 1.6

P(Z > 1.6) = .5 – P(0 < Z < 1.6)


= .5 – .4452
= .0548
Using the Normal Table
• What is P(Z < -2.23) ?

By symmetry,
P(Z < -2.23) = P(Z > 2.23)
= .5 – P(0 < Z < 2.23)
= .5 – .4871
= .0129
Using the Normal Table
• What is P(Z < 1.52) ?

P(Z < 0) = .5 P(0 < Z < 1.52)

0 1.52

P(Z < 1.52) = .5 + P(0 < Z < 1.52)


= .5 + .4357
= .9357
Using the Normal Table
• What is P(0.9 < Z < 1.9) ?

P(0 < Z < 0.9)

P(0.9 < Z < 1.9)

0 0.9 1.9

P(0.9 < Z < 1.9) = P(0 < Z < 1.9) – P(0 < Z < 0.9)
=.4713 – .3159
= .1554
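The table look-ups above can be cross-checked with the standard library. This sketch assumes Python 3.8+, where `statistics.NormalDist` provides the cumulative probability P(Z < z) through its `cdf` method; results may differ from the table in the fourth decimal because the table entries are rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

p_between = Z.cdf(1) - Z.cdf(-0.5)      # P(-.5 < Z < 1)
p_right_tail = 1 - Z.cdf(1.6)           # P(Z > 1.6)
p_left_tail = Z.cdf(-2.23)              # P(Z < -2.23)
p_below = Z.cdf(1.52)                   # P(Z < 1.52)
p_band = Z.cdf(1.9) - Z.cdf(0.9)        # P(0.9 < Z < 1.9)

for p in (p_between, p_right_tail, p_left_tail, p_below, p_band):
    print(round(p, 4))
```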
Exercise
• The return on investment is normally distributed with
a mean of 10% and a standard deviation of 5%.
What is the probability of losing money?

• We want to determine P(X < 0). Thus,

$P(X < 0) = P\left(\dfrac{X - \mu}{\sigma} < \dfrac{0 - 10}{5}\right)$
$= P(Z < -2)$
$= .5 - P(0 < Z < 2)$
$= .5 - .4772$
$= .0228$
Finding Values of Z
• What value of z corresponds to an area under the curve
of 2.5%? That is, what is z.025 ?

Area = .50 Area = .025

Area = .50–.025 = .4750

If you do a “reverse look-up” on Table for .4750,


you will get the corresponding zA = 1.96
Since P(z > 1.96) = .025, we say: z.025 = 1.96
Finding the Area or Probability
• P (Z < Zo) = area to the left of Zo
Exercise. P (Z< 1) = _____
P (Z< - 2) = _____

• P (Z > Zo) = area to the right of Zo


Exercise. P (Z > 3) = _____ P (Z > - 1) = _____

• P (Z1 < Z < Z2)


Exercise. P (-2 < Z < 1) = _____
P ( 1 < Z < 2) = _____
Finding the Area or Probability
• P (Z < Zo) = area to the left of Zo
Example. P (Z< 1) = ________
P (Z< - 2) = ________

• P (Z > Zo) = area to the right of Zo


Example. P (Z > 3) = _______
P (Z > -1) = _________

• P (Z1 < Z < Z2)


Example. P (-2 < Z < 3) = _________
P ( -1 < Z < 1) = _________
Finding the Value of Zo,
Given Area or Probability

• P (Z < Zo) = 0.025 ; Zo = _____

• P (Z > Zo) = 0.84; Zo = _____


Probability and Normal Distributions
Normal Distribution: μ = 10, σ = 5, shaded area P(x < 15)
Standard Normal Distribution: μ = 0, σ = 1, shaded area P(z < 1)

Since z = (15 – 10)/5 = 1, the two shaded areas are the same.

P(x < 15) = P(z < 1) = Shaded area under the curve
= 0.8413
Probability and Normal Distributions
Exercise:
The average on a statistics test was 78 with a standard deviation of 8.
If the test scores are normally distributed, find the probability that a
student receives a test score less than 90.

μ = 78, σ = 8

$z = \dfrac{x - \mu}{\sigma} = \dfrac{90 - 78}{8} = 1.5$

P(x < 90) = P(z < 1.5) = 0.9332

The probability that a student receives a test score less than 90 is 0.9332.


Probability and Normal Distributions
Exercise:
The average on a statistics test was 78 with a standard deviation of 8.
If the test scores are normally distributed, find the probability that a
student receives a test score greater than 85.

μ = 78, σ = 8

$z = \dfrac{x - \mu}{\sigma} = \dfrac{85 - 78}{8} = 0.875 \approx 0.88$

P(x > 85) = P(z > 0.88) = 1 – P(z < 0.88) = 1 – 0.8106 = 0.1894

The probability that a student receives a test score greater than 85 is 0.1894.
Probability and Normal Distributions
Exercise:
The average on a statistics test was 78 with a standard deviation of 8.
If the test scores are normally distributed, find the probability that a
student receives a test score between 60 and 80.

μ = 78, σ = 8

$z_1 = \dfrac{x - \mu}{\sigma} = \dfrac{60 - 78}{8} = -2.25$

$z_2 = \dfrac{x - \mu}{\sigma} = \dfrac{80 - 78}{8} = 0.25$

P(60 < x < 80) = P(–2.25 < z < 0.25) = P(z < 0.25) – P(z < –2.25)
= 0.5987 – 0.0122 = 0.5865

The probability that a student receives a test score between 60 and 80 is 0.5865.
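The three test-score probabilities can be computed without standardizing by hand. A sketch assuming Python 3.8+ (`statistics.NormalDist`); the variable names are illustrative. Since no rounding of z to two decimals takes place here, the second result comes out near 0.1908 rather than the table-based 0.1894.

```python
from statistics import NormalDist

scores = NormalDist(mu=78, sigma=8)  # test scores: mean 78, sd 8

p_below_90 = scores.cdf(90)                   # P(x < 90)
p_above_85 = 1 - scores.cdf(85)               # P(x > 85), z = 0.875 unrounded
p_60_to_80 = scores.cdf(80) - scores.cdf(60)  # P(60 < x < 80)

print(round(p_below_90, 4), round(p_above_85, 4), round(p_60_to_80, 4))
```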
Transforming a z-Score to an x-Score

To transform a standard z-score to a data value, x, in a given population,
use the formula

$x = \mu + z\sigma$.
Exercise:
The monthly electric bills in a city are normally distributed with a
mean of P120 and a standard deviation of P16. Find the x-value
corresponding to a z-score of 1.60.

$x = \mu + z\sigma$
= 120 + 1.60(16)
= 145.6
We can conclude that an electric bill of P145.60 is 1.6 standard
deviations above the mean.
Finding a Specific Data Value
Exercise:
The weights of bags of chips for a vending machine are normally
distributed with a mean of 1.25 ounces and a standard deviation
of 0.1 ounce. Bags that have weights in the lower 8% are too
light and will not work in the machine. What is the least a bag of
chips can weigh and still work in the machine?
P(z < ?) = 0.08
P(z < –1.41) = 0.08

$x = \mu + z\sigma$
= 1.25 + (–1.41)(0.1)
≈ 1.11

The least a bag can weigh and still work in the machine is 1.11 ounces.
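Going from a probability back to a data value is the inverse of the table look-up. The sketch below assumes Python 3.8+, where `NormalDist.inv_cdf` returns the value whose left-tail area equals the given probability; the variable names are illustrative.

```python
from statistics import NormalDist

# z-score that cuts off the lowest 8% of the standard normal curve
z_cut = NormalDist().inv_cdf(0.08)

# Same cut-off expressed in ounces: x = mu + z * sigma
bags = NormalDist(mu=1.25, sigma=0.1)
min_weight = bags.inv_cdf(0.08)

# Electric bill 1.60 standard deviations above a mean of 120 (sd 16)
bill = 120 + 1.60 * 16

print(round(z_cut, 2), round(min_weight, 2), round(bill, 2))
```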
V. SAMPLING DISTRIBUTIONS
Terminologies:
• Sampling Distribution - the probability distribution
of a statistic
• Standard Error of the Statistic – the standard
deviation of the sampling distribution
The Central Limit Theorem

The sampling distribution of sample means has a mean equal to the population
mean.

$\mu_{\bar{x}} = \mu$  (mean of the sample means)

The sampling distribution of sample means has a standard deviation equal to the
population standard deviation divided by the square root of n.

$\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}}$  (standard deviation of the sample means)

This is also called the standard error of the mean.
The Mean and Standard Error

Exercise:
The heights of fully grown bushes have a mean height of 8 feet and a standard
deviation of 0.7 feet. Samples of 38 bushes are randomly selected from the
population, and the mean of each sample is determined. Find the mean and
standard error of the mean of the sampling distribution.

Mean: $\mu_{\bar{x}} = \mu = 8$

Standard deviation (standard error): $\sigma_{\bar{x}} = \dfrac{\sigma}{\sqrt{n}} = \dfrac{0.7}{\sqrt{38}} \approx 0.11$
Normal Distribution
Exercise.
An electrical firm manufactures light bulbs that have a
length of life that is approximately normally distributed,
with mean equal to 800 hours and a standard deviation
of 40 hours. Find the probability that a random sample
of 16 bulbs will have an average life of less than 775
hours.
Solution.
$Z = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} = \dfrac{775 - 800}{40/\sqrt{16}} = -2.5$

$P(\bar{X} < 775) = P(Z < -2.5) = 0.006$
Exercise. The generator sets of manufacturer A have a
mean life time of 6.5 years and a standard deviation of
0.9 year, while those of manufacturer B have a mean
lifetime of 6.0 years and a standard deviation of 0.8
year. What is the probability that a random sample of 36
generator sets from manufacturer A will have a mean
lifetime that is at least 1 year more than the mean
lifetime of a sample of 49 generator sets from
manufacturer B?
Normal Distribution
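Solution.
$Z = \dfrac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}} = \dfrac{1.0 - (6.5 - 6.0)}{\sqrt{(0.81/36) + (0.64/49)}} = 2.646$

(The denominator is 0.1886; the value 2.646 results from rounding it to 0.189 before dividing.)

$P(\bar{X}_1 - \bar{X}_2 \ge 1.0) = P(Z \ge 2.646) = 1 - P(Z < 2.646) = 1 - 0.9959 = 0.0041$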
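Both sampling-distribution exercises reduce to a standard error followed by a normal probability. A minimal sketch, assuming Python 3.8+ for `statistics.NormalDist`; the variable names are illustrative. Without intermediate rounding, z for the generator sets is about 2.65 and the probability about 0.0040.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Light bulbs: mean 800 h, sd 40 h, sample of 16, P(sample mean < 775)
se_bulbs = 40 / sqrt(16)
z_bulbs = (775 - 800) / se_bulbs
p_bulbs = Z.cdf(z_bulbs)

# Generator sets: P(mean of A exceeds mean of B by at least 1 year)
se_diff = sqrt(0.9**2 / 36 + 0.8**2 / 49)
z_diff = (1.0 - (6.5 - 6.0)) / se_diff
p_diff = 1 - Z.cdf(z_diff)

print(z_bulbs, round(p_bulbs, 4))
print(round(z_diff, 3), round(p_diff, 4))
```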
t- Distribution
If $\bar{X}$ and $S^2$ are the mean and variance, respectively, of
a random sample of size n taken from a normal
population having the mean μ and unknown variance
$\sigma^2$, then

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$

is a value of a random variable T having the t-distribution
with df = n – 1 degrees of freedom.
The t Distributions
t- Distribution

Exercise. A manufacturer of light bulbs claims that


his bulbs will burn on the average 500 hours. To
maintain this average, he tests 25 bulbs each month.
If the computed t-value falls between $-t_{0.05}$ and
$t_{0.05}$, he is satisfied with his claim. What conclusion
should he draw from a sample that has a mean of
518 hours and a standard deviation of 40 hours?
Assume the distribution of burning times to be
approximately normal.
t- Distribution
Solution. From the t-table, $t_{0.05} = 1.711$ for df = 24.
Therefore the manufacturer is satisfied with
his claim if a sample of 25 bulbs yields a t-
value between -1.711 and 1.711.

$t = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} = \dfrac{518 - 500}{40/\sqrt{25}} = 2.25$

This is a value well above 1.711. Hence, the
manufacturer is likely to conclude that his bulbs
are a better product than what he thought.
Chi-Square Distribution
If $S^2$ is the variance of a random sample of size n
taken from a normal population having the variance
$\sigma^2$, then

$\chi^2 = \dfrac{(n - 1)S^2}{\sigma^2}$

is a value of a random variable $\chi^2$ having the Chi-
square distribution with df = n – 1.
Characteristics of the Chi-Square Distribution

… it is positively skewed
… it is non-negative
… it is based on degrees of
freedom
…when the degrees of freedom
change a new distribution is
created
Chi-Square Distribution

Exercise. A manufacturer of car batteries guarantees


that his batteries will last, on the average, 3 years with
a standard deviation of 1 year. If 5 of these batteries
have lifetimes of 1.9, 2.4, 3.0, 3.5, and 4.2 years, is the
manufacturer still convinced that his batteries have a
standard deviation of 1 year?
Chi-Square Distribution
Solution.
$S^2 = \dfrac{(1.9^2 + \cdots + 4.2^2) - (1.9 + \cdots + 4.2)^2/5}{5 - 1} = 0.815$

$\chi^2 = \dfrac{(n - 1)S^2}{\sigma^2} = \dfrac{(5 - 1)(0.815)}{1} = 3.26$

This is a value from a $\chi^2$-distribution with df = 4. Since
95% of the $\chi^2$ values with df = 4 fall between 0.484 and
11.143, the computed value with $\sigma^2 = 1$ is reasonable, and
therefore the manufacturer has no reason to suspect that
the standard deviation is different from 1 year.
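The battery calculation can be retraced with the standard library. A sketch only: `statistics.variance` returns the sample variance (divisor n – 1), and the two critical values 0.484 and 11.143 are copied from the solution above rather than computed.

```python
from statistics import variance

lifetimes = [1.9, 2.4, 3.0, 3.5, 4.2]  # years
sigma_claimed = 1.0                     # claimed standard deviation

n = len(lifetimes)
s2 = variance(lifetimes)                # sample variance S^2

# Chi-square statistic: (n - 1) S^2 / sigma^2
chi2 = (n - 1) * s2 / sigma_claimed**2

# Central 95% range for df = 4, taken from the chi-square table
lower, upper = 0.484, 11.143
consistent_with_claim = lower < chi2 < upper

print(round(s2, 3), round(chi2, 2), consistent_with_claim)
```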
F-Distribution

If $S_1^2$ and $S_2^2$ are the variances of independent random
samples of size $n_1$ and $n_2$ taken from normal populations
with variances $\sigma_1^2$ and $\sigma_2^2$, respectively, then

$F = \dfrac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \dfrac{\sigma_2^2 S_1^2}{\sigma_1^2 S_2^2}$

is a value of a random variable having the F-distribution
with $df_1 = n_1 - 1$ and $df_2 = n_2 - 1$.
F-Distribution
Writing $F_\alpha(df_1, df_2)$ for the f-value with $df_1$
and $df_2$ degrees of freedom, and taking $\alpha$ as the area to
the right of that f-value, it should be noted that

$$F_{1-\alpha}(df_1, df_2) = \frac{1}{F_\alpha(df_2, df_1)}$$
F-Distribution
Exercise. If $S_1^2$ and $S_2^2$ represent the variances of
independent random samples of size $n_1 = 25$ and $n_2 = 31$,
taken from normal populations with variances $\sigma_1^2 = 10$
and $\sigma_2^2 = 15$, respectively, find $P(S_1^2/S_2^2 > 1.26)$.
Solution.
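$$F = \frac{\sigma_2^2\,S_1^2}{\sigma_1^2\,S_2^2} = \frac{15}{10}(1.26) = 1.89$$

From the F-distribution table with $df_1 = 24$ and $df_2 = 30$,
$F_{0.05}(24, 30) = 1.89$, so
$P(S_1^2/S_2^2 > 1.26) = 0.05$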
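A simulation gives an informal sanity check of this answer. The sketch below is not the table-lookup method of the solution: it draws many pairs of normal samples with the stated sizes and variances and counts how often the variance ratio exceeds 1.26. The function names, the seed, and the number of repetitions are arbitrary choices made for illustration.

```python
import random

def sample_variance(xs):
    """Sample variance with divisor n - 1."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)

def simulate_ratio_tail(n1, n2, var1, var2, threshold, reps, seed=0):
    """Monte Carlo estimate of P(S1^2 / S2^2 > threshold) for normal samples."""
    rng = random.Random(seed)
    sd1, sd2 = var1 ** 0.5, var2 ** 0.5
    hits = 0
    for _ in range(reps):
        s1_sq = sample_variance([rng.gauss(0.0, sd1) for _ in range(n1)])
        s2_sq = sample_variance([rng.gauss(0.0, sd2) for _ in range(n2)])
        if s1_sq / s2_sq > threshold:
            hits += 1
    return hits / reps

estimate = simulate_ratio_tail(n1=25, n2=31, var1=10, var2=15,
                               threshold=1.26, reps=10_000)
print(f"simulated P(S1^2/S2^2 > 1.26) is about {estimate:.3f}")
```

The estimate should land close to the tabulated 0.05, with some simulation noise that shrinks as `reps` grows.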
F-distribution Table
Critical values of F are found in a series of tables
for different values of α.
For any given α, tables are read as follows (the table below is for α = 0.05):
Given n1 = 5 and n2 = 9 (df1 = 4, df2 = 8), F = 3.84

Columns: Numerator Degrees of Freedom (df1)
Rows: Denominator Degrees of Freedom (df2)

df2 \ df1      1      2      3      4      5      6      7      8      9
    1      161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5
    2      18.51  19.00  19.16  19.25  19.30  19.33  19.35  19.37  19.38
    3      10.13   9.55   9.28   9.12   9.01   8.94   8.89   8.85   8.81
    4       7.71   6.94   6.59   6.39   6.26   6.16   6.09   6.04   6.00
    5       6.61   5.79   5.41   5.19   5.05   4.95   4.88   4.82   4.77
    6       5.99   5.14   4.76   4.53   4.39   4.28   4.21   4.15   4.10
    7       5.59   4.74   4.35   4.12   3.97   3.87   3.79   3.73   3.68
    8       5.32   4.46   4.07   3.84   3.69   3.58   3.50   3.44   3.39
    9       5.12   4.26   3.86   3.63   3.48   3.37   3.29   3.23   3.18
Finding Values for the F-distribution
Find the F-value:
For α = 0.1, degrees of freedom in the numerator
= 8 and degrees of freedom in the denominator = 4:
$F_{0.1,8,4} = 3.95$
For α = 0.025 and α = 0.975, degrees of freedom in
the numerator = 20 and degrees of freedom in
the denominator = 15:
$F_{0.025,20,15} = 2.76$; $F_{0.975,20,15} = \frac{1}{F_{0.025,15,20}} = \frac{1}{2.57} = 0.39$
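The lower-tail value above comes from the reciprocal rule: swap the two degrees of freedom, look up the upper-tail value, and invert it. A small sketch of that step follows; the helper name `lower_tail_f` is hypothetical and the input 2.57 is the table value quoted above.

```python
def lower_tail_f(upper_tail_f_swapped_df):
    """Return F_(1-alpha)(df1, df2), given the table value F_alpha(df2, df1)."""
    return 1.0 / upper_tail_f_swapped_df

# Table value F_0.025 with numerator df = 15 and denominator df = 20.
F_025_15_20 = 2.57

f_975_20_15 = lower_tail_f(F_025_15_20)
print(round(f_975_20_15, 2))  # 0.39
```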
Secondary Data
• Secondary data – data someone else
has collected
• Examples of Sources
PAGASA – weather observations
CLSU-OAd – CLSU student databases
• Limitations
When was it collected? For how long? Data
set complete? Confounding problems?
Reliable?
• Advantages
No need to reinvent the wheel; Saves money,
time; may be accurate; has great exploratory
value
Primary Data
• Primary data – data you collect
• Examples of Sources
Surveys
Experiments
• Limitations
Have time? Money? Uniqueness? Research error?
Exercise
The access code to a house's security system consists of 5
digits. Each digit can be 0 through 9. How many different
codes are available if

a.) each digit can be repeated?
b.) each digit can only be used once and not repeated?

a.) Because each digit can be repeated, there are 10 choices for each of
the 5 digits.

10 · 10 · 10 · 10 · 10 = 100,000 codes

b.) Because each digit cannot be repeated, there are 10 choices for the first
digit, 9 choices left for the second digit, 8 for the third, 7 for the fourth
and 6 for the fifth.

10 · 9 · 8 · 7 · 6 = 30,240 codes
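Both counts are small enough to confirm by brute force. The sketch below compares the multiplication-rule answers with an explicit enumeration of every code; the variable names are illustrative only.

```python
import math
from itertools import permutations, product

digits = range(10)  # each position can hold 0 through 9

# a.) repetition allowed: 10 choices for each of the 5 positions
with_repetition = 10 ** 5
# b.) no repetition: 10 * 9 * 8 * 7 * 6 ordered selections
without_repetition = math.perm(10, 5)

# Brute-force enumeration of every possible code as a cross-check.
brute_with = sum(1 for _ in product(digits, repeat=5))
brute_without = sum(1 for _ in permutations(digits, 5))

print(with_repetition, brute_with)        # 100000 100000
print(without_repetition, brute_without)  # 30240 30240
```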
Exercise
Say that you have 42 friends on Facebook. Any
time you access your profile, 6 of your friends will
appear in a box below your picture. Each time
these friends are randomly generated out of your
42 friends. How many different arrangements of 6
people can show up?
Answer:
42 × 41 × 40 × 39 × 38 × 37 = 3,776,965,920

What if your friend Ally or friend Bobby must be in
position one, that is, they are your only 2 choices
for that first spot?
(Note: you still have 42 friends.)
Answer:
2 × 41 × 40 × 39 × 38 × 37 = 179,855,520
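These two products are too large to enumerate comfortably, but they are easy to evaluate directly. In this sketch `math.perm(n, k)` gives the number of ordered selections of k items from n; the variable names are illustrative.

```python
import math

friends = 42
slots = 6

# Any 6 of the 42 friends, order matters: 42 * 41 * 40 * 39 * 38 * 37
any_arrangement = math.perm(friends, slots)

# Position one is restricted to 2 people (Ally or Bobby); the remaining
# 5 positions are filled in order from the other 41 friends.
restricted_first = 2 * math.perm(friends - 1, slots - 1)

print(f"{any_arrangement:,}")   # 3,776,965,920
print(f"{restricted_first:,}")  # 179,855,520
```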
Exercise
How many ways are there to order 5 people to sit
in 5 chairs?
There are 5! ways to order 5 people in 5 chairs
(since a person cannot repeat):

$$\text{\# of permutations} = 5 \times 4 \times 3 \times 2 \times 1 = 5! = \frac{5!}{(5-5)!} = \frac{5!}{0!} = 5!$$

What if you had to arrange 5 people in only 3 chairs
(meaning 2 are out)?

$$5 \times 4 \times 3 = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = \frac{5!}{2!} = \frac{5!}{(5-3)!}$$
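The pattern in both cases is n!/(n - r)!, which can be written as a short function and checked against a direct listing of the seatings. This is a sketch; the function name `n_permute_r` and the placeholder labels for the people are made up for illustration.

```python
import math
from itertools import permutations

def n_permute_r(n, r):
    """Number of ordered arrangements of r items chosen from n: n! / (n - r)!."""
    return math.factorial(n) // math.factorial(n - r)

people = ["A", "B", "C", "D", "E"]  # placeholder labels for the 5 people

five_chairs = n_permute_r(5, 5)    # 5!/0! = 120
three_chairs = n_permute_r(5, 3)   # 5!/2! = 60

# Cross-check by listing every seating explicitly.
listed_five = len(list(permutations(people, 5)))
listed_three = len(list(permutations(people, 3)))

print(five_chairs, listed_five)    # 120 120
print(three_chairs, listed_three)  # 60 60
```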
Combinations
• How many unique 2-card sets out of 52 cards? $\frac{52 \times 51}{2} = \frac{52!}{(52-2)!\,2!}$
• 5-card sets? $\frac{52 \times 51 \times 50 \times 49 \times 48}{5!} = \frac{52!}{(52-5)!\,5!}$
• r-card sets? $\frac{52!}{(52-r)!\,r!}$
• r-card sets out of n cards? $\binom{n}{r} = \frac{n!}{(n-r)!\,r!}$
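The card counts can be evaluated with the general formula n!/((n - r)! r!). The sketch below writes that formula out by hand and compares it with Python's built-in `math.comb`; the function name `n_choose_r` is illustrative.

```python
import math

def n_choose_r(n, r):
    """Number of unordered r-item sets from n items: n! / ((n - r)! * r!)."""
    return math.factorial(n) // (math.factorial(n - r) * math.factorial(r))

two_card_sets = n_choose_r(52, 2)    # (52 * 51) / 2
five_card_sets = n_choose_r(52, 5)   # (52 * 51 * 50 * 49 * 48) / 5!

print(f"{two_card_sets:,}")   # 1,326
print(f"{five_card_sets:,}")  # 2,598,960

# The hand-written formula agrees with the standard library for every r.
assert all(n_choose_r(52, r) == math.comb(52, r) for r in range(53))
```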
