
CHAPTER 1
RANDOM VARIABLES AND PROBABILITY DISTRIBUTION

Objectives
The learner should be able to
1. illustrate a random variable;
2. distinguish between a discrete and a continuous random variable;
3. illustrate a probability distribution for a discrete random variable and its properties;
4. construct the probability mass function of a discrete random variable and its
corresponding histogram;
5. compute probabilities corresponding to a given random variable;
6. illustrate, calculate and interpret the mean and variance of a discrete random
variable; and
7. solve problems involving mean and variance of probability distribution.

Statistics
• a branch of applied mathematics concerned with collecting, organizing, and interpreting
data. The data are represented by means of graphs.
• the mathematical study of the likelihood and probability of events occurring based on
known quantitative data or a collection of data
• attempts to infer the properties of a large collection of data from inspection of a
sample of the collection thereby allowing educated guesses to be made with a
minimum of expense

Lesson 1.1 Random Variable

A random variable is a variable whose value depends on the outcome of a random experiment, so it can take on
multiple different values, each with an associated probability.
It assigns a numerical value to each possible outcome of the experiment.

Example: In tossing a coin once, we could get HEAD or TAIL.


Giving them the values : Head = 0 and Tail = 1 , and random variable as “X”

In short : X = { 0, 1 }
So:
• Experiment : tossing a coin
• Event : Head or Tail
• Values to each event : Head = 0 ; Tail = 1
• Random Variable ( X ) : set of values

Note: Other values of choice can be used for Head and Tail. Ex. Head = 5 and Tail = 10.

Population ( N ) – is a collection of individuals, objects or numerical data obtained by
measurements, whose properties need to be analyzed.

Sample ( n ) – is a subset of a population. It refers to a set of data collected from a larger set
called population.

Example: In a population of 2500 people, what will be the actual sample size if it is 15% of
the total population ?
Answer : n = 0.15 ( N )
= 0.15 ( 2500 )
n = 375

Two Types of Random Variable


A. Discrete Random Variable
- it is a variable that can assume only a finite or specific number of values
- a variable that can be counted or is associated with the set of counting numbers
Examples:
- number of students in a classroom
- number of blue cars in a parking area
B. Continuous Random Variable
- it is a variable that can take any value within an interval, so its possible values are infinite and cannot be counted
- a variable that is measurable
Examples:
- height
- weight
- time

Lesson 1.2 Probability Distribution for a Discrete Random Variable

Probability ( Pr )
• the chance or likelihood that a certain event will occur
• based on reasoning, written as the ratio of the number of favorable outcomes to the
number of possible outcomes : Pr (x) = ( number of favorable outcomes ) / ( number of possible outcomes )
• expressed in fraction ( e.g. 3/4 ), in decimal ( 0.75 ) or in percentage ( 75% )

Probability Distribution for a Discrete Random Variable


It consists of the values that the variables assume and the probabilities associated with
the values. It can be presented by using a graph, table, or notation formula.

Note: A discrete probability distribution Pr ( x ) must satisfy the following requirements:


• 0 ≤ Pr (x) ≤ 1 ; this implies that the probability of each event in the sample
space must be from 0 to 1.
• Σ Pr (x) = 1 ; this implies that the sum of the probabilities is equal to 1.
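
The two requirements above can be checked mechanically. The following is a minimal Python sketch, not part of the original module; the function name `is_valid_distribution` and the two sample lists are illustrative assumptions. Exact fractions are used so that the sum is not affected by floating-point round-off.

```python
from fractions import Fraction

def is_valid_distribution(probs):
    """Return True if every probability is from 0 to 1 and they sum to exactly 1."""
    return all(0 <= p <= 1 for p in probs) and sum(probs) == 1

# Illustrative lists of probabilities (assumed for this sketch).
valid = [Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)]
invalid = [Fraction(1, 2), Fraction(3, 4)]  # sums to 5/4, so it fails the second requirement

print(is_valid_distribution(valid))
print(is_valid_distribution(invalid))
```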


Example 1: Toss a coin 2 times. Present the discrete probability distribution using a notation
formula, a table and a graph.
Solution:
Possible outcomes ( tree diagram ):

1st toss :      H            T

2nd toss :   H     T      H     T

A. Notation formula
Sample space ( S ) = { HH, HT, TH, TT }

If “x” is the random variable for the number of heads, then x assumes the values
0, 1 and 2.
Hence ,
• the probability of getting 0 heads : Pr ( x = 0 ) = 1/4
• the probability of getting 1 head : Pr ( x = 1 ) = 2/4 or 1/2
• the probability of getting 2 heads : Pr ( x = 2 ) = 1/4

B. Table
The probability distribution is constructed by listing the outcomes and determining the
probability value for each of the outcomes.

Event    Outcome    No. of Heads    Probability, Pr(x)
1        HH         2               1/4 or 0.25
2        HT         1               1/4 or 0.25
3        TH         1               1/4 or 0.25
4        TT         0               1/4 or 0.25
                                    Σ Pr (x) = 1
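
The table can also be produced by listing the sample space and counting heads. The sketch below is one possible way to do this in Python; it assumes a fair coin, and the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two tosses of a fair coin: HH, HT, TH, TT.
sample_space = ["".join(tosses) for tosses in product("HT", repeat=2)]

# Count how many outcomes give each number of heads.
counts = Counter(outcome.count("H") for outcome in sample_space)

# Pr(x) = number of favorable outcomes / number of possible outcomes.
distribution = {
    x: Fraction(count, len(sample_space)) for x, count in sorted(counts.items())
}

for x, p in distribution.items():
    print(x, p)
```

Changing `repeat=2` to `repeat=3` gives the distribution for three coins.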

C. Graph ( histogram )
Shown below is the histogram of the discrete probability distribution.

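[Histogram: Pr ( X ) on the vertical axis against x, the number of heads ( 0, 1, 2 ), on the horizontal axis; the bars have heights 0.25, 0.50, and 0.25.]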

Example 2 : In a class of 40 students, 30 students passed in all subjects, 5 failed in one
subject, 3 failed in two subjects and 2 failed in three subjects. Find the probability
distribution of the variable for number of subjects a student from the given class
has failed in.
Solution :

A. Notation Formula
Probability of failing in 0 subjects : P ( X = 0 ) = 30/40 = 0.75
Probability of failing in 1 subject : P ( X = 1 ) = 5/40 = 0.125
Probability of failing in 2 subjects : P ( X = 2 ) = 3/40 = 0.075
Probability of failing in 3 subjects : P ( X = 3 ) = 2/40 = 0.05

B. Table

x 0 1 2 3

P(X) 0.75 0.125 0.075 0.05

C. Graph
[Bar graph: Pr ( X ) on the vertical axis against X, the number of failed subjects ( 0, 1, 2, 3 ), on the horizontal axis; the bars have heights 0.75, 0.125, 0.075, and 0.05.]

Activity
Construct a probability distribution table for the values of the variables and
the corresponding probabilities when :

1. two dice are rolled once and simultaneously

2. three coins are tossed once and simultaneously


Lesson 1.3 Mean, Variance, and Standard Deviation for a Probability Distribution

Mean
The mean of the probability distribution is different from the mean on measures of
central tendency. The mean of the probability distribution is obtained as the sum of the
product of the possible outcomes and the probability of the outcome. In mathematical notation
it is represented as:

µ = Σ [x • Pr ( x )]

where;
µ - mean of the probability distribution
x – possible outcome
Pr(x)- probability of the outcome

Variance and Standard Deviation


The variance ( s² ) and the standard deviation ( s ) of the probability distribution
measure the spread, or variability, of the values of the random variable about the mean.

Formula:

Variance : s² = Σ [ x² • Pr (x) ] – µ²

Standard Deviation: s = √( Σ [ x² • Pr (x) ] – µ² ) or s = √s²

Example 1:

When 3 coins were tossed once and simultaneously and the occurrences of the number
of heads were recorded, what will be the mean of the occurrences of the number of heads?
Compute also the variance and the standard deviation.

Solution:

Possible outcomes:
S = { HHH, HHT, HTH, HTT, THH, THT, TTH, TTT }
Note:
The probability of each of the sample points is 1/8.
Pr ( X ) X ( Number of Heads )
1/8 0 ( No Head )
3/8 1 Head
3/8 2 Heads
1/8 3 Heads

Σ Pr ( x ) = 1

Solution:
A. mean
µ = Σ [ x • Pr ( x )]
= 0(1/8) + 1(3/8) + 2(3/8) + 3(1/8)
= 0 + 3/8 + 6/8 + 3/8
= 12/8
µ = 3/2 or 1.5
B. variance

s² = Σ [ x² • Pr (x) ] – µ²
= [ 0²(1/8) + 1²(3/8) + 2²(3/8) + 3²(1/8) ] – 1.5²
= [ 0(1/8) + 1(3/8) + 4(3/8) + 9(1/8) ] – 2.25
= [ 0 + 3/8 + 12/8 + 9/8 ] – 2.25
= 24/8 – 2.25
= 3 – 2.25
s² = 0.75
C. standard deviation
s = √s²
= √0.75
s ≈ 0.87 ( the standard deviation is never negative )
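
The three computations above can be checked with a few lines of Python. This is only a sketch under the same assumptions as the example (three fair coins); the dictionary `distribution` simply restates the table of Pr ( X ).

```python
from fractions import Fraction
from math import sqrt

# Number of heads in three tosses and its probability, as tabulated in the example.
distribution = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

# Mean: sum of x * Pr(x)
mean = sum(x * p for x, p in distribution.items())

# Variance: sum of x^2 * Pr(x), minus the square of the mean
variance = sum(x**2 * p for x, p in distribution.items()) - mean**2

# Standard deviation: non-negative square root of the variance
std_dev = sqrt(variance)

print(mean, variance, round(std_dev, 2))
```

The same three lines work for any discrete distribution once the dictionary is replaced.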

Activity
The distribution of the number of students who obtained an average score of 85 and above
for 3 years and the corresponding probabilities are shown below. Find the mean, variance and
standard deviation of the probability distribution.

No. of students Probability Pr(x)


10 0.40
16 0.25
12 0.35


CHAPTER 1 REVIEW TEST

I. Tell whether the following is an example of discrete or continuous random variable.


____________________1. depth of a river
____________________2. expenses of the administration on school fares
____________________3. number of grade 11 STEM students enrolled for this school year
____________________4. result of blood test
____________________5. length of sleep of a baby
____________________6. number of vehicles passing EDSA
____________________7. travel time from Muntinlupa to Bicol
____________________8. number of employees of Lyceum of Alabang
____________________9. tire pressure
___________________10. room temperature

II. From a standard deck of 52 cards, find the probability of the following :


____________1. a red card will be picked up
____________2. an ace card will be picked up
____________3. 7 of hearts will be chosen
____________4. a black queen will be drawn
____________5. a face card and a nine of clubs will be picked up

III. A family has three children. Using B to stand for boy and G to stand for girl, find the
probability that :

____________1. all are boys


____________2. there are 2 girls
____________3. there are exactly two boys given that the first child is a girl
____________4. that the family has at least two boys
____________5. there are two boys and a girl

IV. Construct a probability distribution table and draw a histogram for the canteen’s customers
who will order 1, 2, or 3 viands with probabilities of 0.45, 0.35, and 0.20 respectively.

Table : Histogram:

V. A study was conducted to determine the total number of cars each household has in a
certain village. The result of the survey is shown below. Solve for the mean, variance, and standard
deviation of the distribution.

Number of Cars Probability Pr(x)


0 0.28
1 0.37
2 0.18
3 0.12
4 0.05

VI. The maximum life span of an iron is 5 years. Find the mean, variance, and standard
deviation given distribution of the life span below.

Life Span Probability


1 0.10
2 0.15
3 0.20
4 0.25
5 0.30

VII. Find the mean of the distribution if the probabilities that a family's children will include 0, 1, 2, or
3 boys are 1/5 , 4/15, 1/6, and 11/30 respectively.


CHAPTER 2

NORMAL DISTRIBUTION

Objectives
The learner should be able to
1. illustrate a normal random variable and its characteristics,
2. construct a normal curve,
3. identify regions under the normal curve corresponding to different standard normal
values,
4. convert a normal random variable to a standard normal variable and vice versa, and
5. compute probabilities and percentages using the standard normal table.

Lesson 2.1 Properties of a Normal Distribution

The normal distribution is a theoretical distribution since there is no continuous
random variable that fits a normal distribution perfectly. However, there are variables that can
be described as normally distributed since their deviations from the normal distribution are small
or insignificant. The properties of a normal distribution are given below:

1. The curve of a normal distribution is bell-shaped. It is a symmetric distribution since the
data are evenly distributed about the mean, forming the same shape on both sides of a
vertical line passing through the center.

2. The mean, median, and mode of a normal distribution coincide at its center and have
equal values. Note that the distribution is unimodal since it has only one mode.

3. The tails of the curve extend indefinitely in both directions, approaching the x-axis
asymptotically without ever touching it.

4. The area under a normal curve represents the total population. The total area under a
normal curve is 100% or 1.00. Each side of a normal curve measures 50% or 0.50.
The area under a normal curve indicates probability.

5. The shape of the normal curve depends on the values of the
mean and standard deviation. Values below the mean ( negative standard scores ) are located on the left
side of the curve and values above the mean ( positive standard scores ) are located on the right side of the
normal curve. The larger the standard deviation is, the more spread out the distribution.

Note:
The shape of a normal curve of a normal distribution depends on the following
conditions:
1. same means with different standard deviation
2. different means with different standard deviation
3. different means with same standard deviation

[Figure: normal curve with a total area of 100 %, divided into 50 % on each side of the mean.]

Empirical Rules of a Normal Distribution


1. Approximately 34% of the area lies between the mean and 1 standard deviation above
it, and likewise between the mean and 1 standard deviation below it.
2. Approximately 47.5% of the area lies between the mean and 2 standard deviations above
it, and likewise between the mean and 2 standard deviations below it.
3. Approximately 49.85% of the area lies between the mean and 3 standard deviations above
it, and likewise between the mean and 3 standard deviations below it.
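
The areas behind these rules can be reproduced without a printed table. The sketch below is an illustration only: it uses Python's `math.erf` to compute the area between the mean and a given number of standard deviations, and the helper name `area_mean_to_z` is an assumption of this sketch.

```python
from math import erf, sqrt

def area_mean_to_z(z):
    """Area under the standard normal curve between the mean (Z = 0) and |z|."""
    return 0.5 * erf(abs(z) / sqrt(2))

for k in (1, 2, 3):
    one_side = area_mean_to_z(k)
    # One-sided area, and the area within k standard deviations on both sides.
    print(k, round(one_side, 4), round(2 * one_side, 4))
```

The one-sided values agree with the entries for Z = 1, 2, and 3 in a table of areas under the normal curve.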

Lesson 2.2 Areas under Normal Curve


A standard normal distribution table is used to find the areas under a normal
curve, which indicate the proportion of the distribution being considered. To find the area under a
normal curve for a normally distributed variable, we first transform the variable into a standard normal
variable by using the formula of the standard score (Z):


Standard Score Formula ( Z ):

Z = ( x – x̄ ) / s

where: x - value of the variable

x̄ - the mean

s - the standard deviation
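
As a small illustration, the conversion to a standard score and back can be written as two Python functions. The names `z_score` and `raw_value`, and the sample numbers (mean 82, standard deviation 5), are assumptions chosen for this sketch.

```python
def z_score(x, mean, sd):
    """Convert a value x to a standard score: Z = (x - mean) / sd."""
    return (x - mean) / sd

def raw_value(z, mean, sd):
    """Convert a standard score back to the original scale: x = mean + Z * sd."""
    return mean + z * sd

z = z_score(87, 82, 5)     # 87 is one standard deviation above the mean
x = raw_value(z, 82, 5)    # converting back recovers the original value
print(z, x)
```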

Steps in Finding the Area under a Normal Curve


1. Draw a sketch of the normal curve and shade the desired area.
2. Transform the variable (x) into standard score (Z) if necessary.
3. Find the area of the variable in a normal distribution table.
4. Add or subtract the areas taken from the table of normal distribution if necessary.

Note:
1. The standard score (Z) represents the number of standard deviations the variable x is away
from the mean.
2. The area under a normal curve is always positive since there is no such thing as a
negative value for an area.
3. Add the values of the areas if the standard scores ( Z ) are located on
opposite sides of the distribution.
4. Subtract the smaller area from the larger if the standard scores ( Z ) are located
on the same side of the distribution.
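
The add-or-subtract rule can be sketched in Python as follows. This is an illustration, not the module's own method: `table_area` stands in for a lookup in the table of areas under the normal curve (it computes the area from the mean to Z with `math.erf`), and both function names are assumptions.

```python
from math import erf, sqrt

def table_area(z):
    """Area between the mean (Z = 0) and |z|, as listed in a normal curve table."""
    return 0.5 * erf(abs(z) / sqrt(2))

def area_between(z1, z2):
    """Net area between two standard scores using the add/subtract rule."""
    a1, a2 = table_area(z1), table_area(z2)
    if z1 * z2 < 0:
        # Opposite sides of the mean: add the two areas.
        return a1 + a2
    # Same side of the mean (or one score at the mean): subtract smaller from larger.
    return abs(a1 - a2)

print(round(area_between(0, 2), 4))
print(round(area_between(2.0, -1.0), 4))
```

Results may differ from hand computations in the last decimal place because the printed table is already rounded to four places.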

Example 1:

Problem : Find the area between Z = 0 and Z = 2

Solution:
Draw a sketch of the normal curve and shade the desired area.

From Table 1 ( Areas Under Normal Curve ):

Z       .00       .01

0.0     .0000

2.0     .4772

Area Z=0 = 0.0000

Area Z= 2 = 0.4772

Area net = Area Z= 2 - Area Z=0


= 0.4772 – 0.0000
= 0.4772 or 47.72 %

Example 2:

Problem : Find the area between Z = -1 and Z = - 3


Solution:
Draw a sketch of the normal curve and shade the desired area.

From Table 1 ( Areas Under Normal Curve ):


Area Z=-1 = 0.3413
Area Z=-3 = 0.4987

Area net = Area Z=-3 - Area Z=-1
= 0.4987 – 0.3413
= 0.1574 or 15.74 %
( Subtract since both Z values lie on the same side. )


Example 3:

Problem : Find the area between Z = 0.25 and Z = 1.86


Solution:
Draw a sketch of the normal curve and shade the desired area.

From Table 1 ( Areas Under Normal Curve ):


Area Z=0.25 = 0.0987
Area Z=1.86 = 0.4686

Area Net = Area Z=1.86 - Area Z=0.25
= 0.4686 – 0.0987
= 0.3699 or 36.99 %
( Subtract the smaller area from the larger since both Z values lie on the same side;
an area is never negative. )

Example 4:

Problem : Find the area between Z = 2.0 and Z = -1.0


Solution:
Draw a sketch of the normal curve and shade the desired area.

From Table 1 ( Areas Under Normal Curve ):


Area Z=2.0 = 0.4772
Area Z= - 1.0 = 0.3413

Area Net = Area Z=2.0 + Area Z= - 1.0
= 0.4772 + 0.3413
= 0.8185 or 81.85 %
( Add since the standard scores are located on opposite sides. )

Example 5:

Problem : Find the area left of Z = - 0.72


Solution:
Draw a sketch of the normal curve and shade the desired area.

Area Z= -0.72 = 0.2642

Area left of Z( - 0.72 ) = 0.5000 – 0.2642


= 0.2358 or 23.58 %


Exercises
With the following given, find the net area under a normal distribution. Draw a sketch of
the normal curve and shade the desired area.
1. between Z = 0 and Z = 2.47

2. between Z = 0 and Z = -1.13

3. between Z = 1.91 and Z = 2.92

4. between Z = 2.56 and Z = -1.89

5. between Z = -0.12 and Z = -2.81

6. area to the right of Z = 2.15

7. area below Z = 1.5

8. area to the left of Z = 0.95

9. area above Z = - 1.9 and below Z = 1.0

10. area between Z = 1.2 and Z = 2.25

Lesson 2.3 Probability Distribution Curve

For a normally distributed variable, the probability that the variable falls within an interval is equal
to the area under the normal curve over that interval. Note that
there are no gaps in a continuous distribution.

Example 1:

Problem: Find the probability of Pr (Z ≥ 1.25)


Solution:
Pr (Z ≥ 1.25) is the area under the normal curve for values of Z greater than
or equal to 1.25. Since 1.25 is located on the right side of the normal curve, this side
corresponds to 0.50 or 50% of the distribution, and the area from the mean to Z = 1.25 is
0.3944 or 39.44%, we subtract 0.3944 from 0.5000 to obtain 0.1056.

Pr ( Z ≥ 1.25 ) = 0.5 – 0.3944 = 0.1056 or 10.56 %

[Graph of Pr (Z ≥ 1.25): normal curve with the area to the right of Z = 1.25 shaded.]

Example 2:

Problem : Find the probability of Pr (-1.28 ≤ Z ≤ 0.50)


Solution:
The Pr (-1.28 ≤ Z ≤ 0.50) means to find the probability under a normal curve that is
greater than or equal to -1.28, but less than or equal to 0.50

Area Z= - 1.28 = 0.3997


Area Z= 0.50 = 0.1915

Area Net = Area Z= - 1.28 + Area Z= 0.50
= 0.3997 + 0.1915
= 0.5912 or 59.12 %
( Add the two areas since they are located on opposite sides of the distribution curve. )
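
Both examples can be cross-checked with the cumulative area to the left of Z instead of the table's area from the mean. The sketch below assumes this alternative approach; `phi` is an illustrative name for the cumulative standard normal function, computed here with `math.erf`.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

# Example 1: Pr(Z >= 1.25) is the area to the right of 1.25.
p_right = 1 - phi(1.25)

# Example 2: Pr(-1.28 <= Z <= 0.50) is the area between the two scores.
p_between = phi(0.50) - phi(-1.28)

print(round(p_right, 4), round(p_between, 4))
```

With the cumulative area there is no need to decide whether to add or subtract: the probability between two scores is always the difference of the two cumulative areas.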

[Graph of Pr (-1.28 ≤ Z ≤ 0.50): normal curve with the area between Z = -1.28 and Z = 0.50 shaded; Area = 0.5912 or 59.12 %.]

Lesson 2.4 Applications of a Normal Curve Area

Example 1:
1600 Grade 11 students took a statistics examination and obtained a mean score of
82% with a standard deviation of 5%. If the scores are normally distributed, find the number of
students who obtained:
a. a score of 87% and above
b. a score of 80% to 90%
c. a passing score
Solution:
a. a score of 87% and above

x = 87 ; s = 5 ; x̄ = 82

Step 1: Transform the variable (x) into standard score (Z) if necessary.

Z = ( x – x̄ ) / s
Z = ( 87 – 82 ) / 5
Z = 5 / 5 or 1
Hence : 87% is equivalent to Z = 1.00.

Step 2: Find the area of the variable in a normal distribution table : Area Z=1.0 = 0.3413

LYCEUM OF ALABANG

Step 3: Subtract the area taken from the table, 0.3413, from 0.5000, which is
the total area of the right side of the distribution.
Area Net = 0.5 – 0.3413
= 0.1587 or 15.87 %

Therefore, the probability that a student obtained a score of 87% and above is 0.1587.

Step 4: Multiply the computed probability by the total number of students who took the
examination to get the number of students who got a score of 87% and above.

No. of students who got a score of 87% and above = 0.1587 x 1600
= 253.92 or 254 students

Note: The result is rounded off to a whole number since the number of students cannot be a
decimal.

b. a score of 80% to 90%


x1 = 80% ; x2 = 90% ; s = 5% ; x̄ = 82%

Step 1: Transform the variables (x) into standard scores (Z) if necessary.

Z1 = ( x1 – x̄ ) / s          Z2 = ( x2 – x̄ ) / s
   = ( 80 – 82 ) / 5            = ( 90 – 82 ) / 5
Z1 = - 0.4                    Z2 = 1.6

Hence :
80% is equivalent to Z = – 0.4
90% is equivalent to Z = 1.6

Step 2: Find the area of the variables in the normal distribution table.
Area Z= -0.4 = 0.1554
Area Z= 1.6 = 0.4452

Area Net = Area Z= -0.4 + Area Z= 1.6


= 0.1554 + 0.4452
= 0.6006 or 60.06 %


Therefore, the probability that a student obtained a score of 80% to 90% is 0.6006.

Step 3: Multiply the computed probability by the total number of students who took the
examination to get the number of students who got a score of 80% to 90%.

No .of students who got a score of 80% to 90%.= 0.6006 x 1600


= 961

c. a passing score

Since the passing score is 75%, the value of the variable x is 75%.

x = 75% ; s = 5% ; x̄ = 82%

Step 1: Transform the variable (x) into standard score (Z) if necessary.

Z = ( x – x̄ ) / s
Z = ( 75 – 82 ) / 5
Z = -7 / 5
Z = - 1.4
Hence , 75% is equivalent to Z = – 1.4.

Step 2: Find the area of the variable in a normal distribution table.

Area Z = - 1.4 = 0.4192


Area (right side ) = 0.5000

Area Net = 0.4192 + 0.5000


= 0.9192 or 91.92 %
Therefore, the probability that a student obtained a passing score of at least 75% is 0.9192.

Step 3: Multiply the computed probability by the total number of students who took the
examination to get the number of students who got a passing score.


Number of students who got a passing score = 0.9192 x 1600


= 1471
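
The three parts of Example 1 can be verified together with a short Python sketch. It assumes the values used in the worked solution (1600 students, mean 82, standard deviation 5, passing score 75); the helper names `phi` and `z_score` are illustrative.

```python
from math import erf, sqrt

def phi(z):
    """Cumulative area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_score(x, mean, sd):
    """Standard score: Z = (x - mean) / sd."""
    return (x - mean) / sd

N, MEAN, SD = 1600, 82, 5  # values used in the worked solution

# a. a score of 87% and above: area to the right of Z = 1.0
p_a = 1 - phi(z_score(87, MEAN, SD))

# b. a score of 80% to 90%: area between Z = -0.4 and Z = 1.6
p_b = phi(z_score(90, MEAN, SD)) - phi(z_score(80, MEAN, SD))

# c. a passing score, taking 75% as the passing mark: area to the right of Z = -1.4
p_c = 1 - phi(z_score(75, MEAN, SD))

# Multiply each probability by the number of students and round to whole students.
counts = [round(p * N) for p in (p_a, p_b, p_c)]
print(counts)
```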

Example 2:
The weights of 1-year-old babies are approximately normally distributed, with a mean of
22.8 lbs and a standard deviation of about 2.15 lbs. If there were 164 randomly selected 1-year-old
babies, how many babies weigh at least 20 pounds?

Solution:
Given: x = 20 lbs. ; s = 2.15 lbs. ; x̄ = 22.8 lbs.

Step 1: Transform the variable (x) into standard score (Z)

Z = ( x – x̄ ) / s
Z = ( 20 – 22.8 ) / 2.15
Z = -2.8 / 2.15
Z = - 1.30
Hence, 20 lbs. is equivalent to Z = -1.30

Step 2: Find the area of the variable in a normal distribution table.


Area Z = - 1.30 = 0.4032

Since we are asked to find the number of babies that weigh at least 20 lbs., we
need the probability that a baby weighs 20 lbs. and above: the area from Z = -1.30 to the
mean plus the entire right side of the distribution.

Area Net = 0.4032 + 0.5000


= 0.9032 or 90.32 %

Therefore, the probability of the babies who weigh 20 lbs. and above is 0.9032 or 90.32%


Step 3: Multiply the computed probability by the total number of babies to get the number of
babies that weigh 20 lbs. and above.

Number of babies that weigh 20 lbs. and above = 0.9032 x 164


= 148

Example 3:
The tests for an individual's intelligence quotient (IQ) are designed to be normally
distributed, with a mean of 100 and a standard deviation of 15. In 1916, psychologist Lewis M.
Terman set a guideline of 120 for "potential genius". Using this information, what percentage
of individuals are "potential geniuses"?

Solution:

Based on the information, a person is considered a potential genius if he or she obtains a
score of 120 and above.

x = 120 ; s = 15 ; x̄ = 100


Step 1: Transform the variable (x) into standard score (Z) .

Z = ( x – x̄ ) / s
Z = ( 120 – 100 ) / 15
Z = 20 / 15
Z = 1.33
Hence, a score of 120 is equivalent to Z = 1.33

Step 2: Find the area of the variable in a normal distribution table.

Area Z = 1.33 = 0.4082


Area ( right side ) = 0.500
Step 3: Find the difference of the two areas since they both lie on the right side of the
distribution.
Area Net = 0.5000 – 0.4082
= 0.0918 or 9.18 %

Therefore, the percentage of individuals that are "potential geniuses" is 9.18%.

Note: No need to compute for the number of potential geniuses since only the percentage is
being asked.


CHAPTER TEST

I. Sketch the graph and find the net area under a normal curve distribution that lies between or
on :
1. Z = 0 to Z = 1.23

2. Z = 0 to Z = -1.72

3. Z = -0.34 to Z = 2.61

4. left side of 2.13

5. right side of 0.57

II. Using a standard normal distribution, find the probability of the following Z :

1. Pr (Z ≤ 2.56)

2. Pr (-1.25 ≤ Z ≤ 1.17)

3. Pr (0.25 ≤ Z ≤ 1.94)

4. Pr (-2.64 ≤ Z ≤ -1.72)

5. Pr (-2.58 ≤ Z ≤ 1.26)


III. Solve the following problem.

1. 7,618 students took the entrance examination in a certain university and obtained a mean score of
84%. If the scores were normally distributed with a standard deviation of 7%, how many
students obtained a score of:

a. 85% to 90%

b. 87% and higher

c. 74% and below

2. The weights of items produced by a manufacturing process can be
approximated by a normal distribution with a mean of 90 grams and a standard deviation of
1 gram. If you are going to select one item, what is the probability that it weighs:

a. 90.5 grams and above

b. 89.67 grams and below


CHAPTER 3 : SAMPLING AND SAMPLING DISTRIBUTION

Objectives
The learner should be able to
1. illustrate random sampling,
2. distinguish between parameter and statistic,
3. find the mean and variance of the sampling distribution of the sample mean,
4. illustrate the Central Limit Theorem, and
5. solve problems involving sampling distributions of the sample mean.

Lesson 3.1 Random Sampling

Since it is usually impractical to study an entire population (every individual in a country, all
college students, every geographic area, etc.), researchers typically rely on sampling to
acquire a section of the population to perform an experiment or observational study. It is
important that the group selected be representative of the population, and not biased in a
systematic manner. For example, a group comprised of the wealthiest individuals in a given
area probably would not accurately reflect the opinions of the entire population in that area.
For this reason, randomization is typically employed to achieve an unbiased sample. The most
common sampling designs are simple random sampling and stratified random sampling.

Simple Random Sampling


It is the basic sampling technique where we select a group of subjects (a sample) for
study from a larger group (a population). Each individual is chosen entirely by chance and
each member of the population has an equal chance of being included in the sample. Every
possible sample of a given size has the same chance of selection. Simple random sampling is
most appropriate when the entire population from which the sample is taken is homogeneous.

Stratified Random Sampling


It is obtained by taking samples from each stratum or sub-group of a population. There
may often be factors which divide up the population into sub-populations (groups / strata) and
we may expect the measurement of interest to vary among the different sub-populations. This
has to be accounted for when we select a sample from the population in order that we obtain a
sample that is representative of the population. Stratified sampling techniques are generally
used when the population is heterogeneous, or dissimilar, where certain homogeneous, or
similar, sub-populations can be isolated (strata). Some reasons for using stratified sampling
over simple random sampling are:

a) the cost per observation in the survey may be reduced;
b) estimates of the population parameters may be wanted for each sub-population; and
c) increased accuracy at given cost.
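
The two sampling designs described above can be sketched in Python with the standard `random` module. Everything in this sketch is a hypothetical illustration: the population of 100 numbered students, the strand labels other than STEM, the 10% sampling fraction, and the function names are all assumptions.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def stratified_sample(strata, fraction, seed=None):
    """Draw the same fraction from each stratum (dict: stratum name -> members)."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, round(len(members) * fraction))
        for name, members in strata.items()
    }

# Hypothetical population of 100 students, identified by number.
students = list(range(1, 101))

# Hypothetical strata: the strand sizes are assumed for illustration.
strata = {"STEM": students[:60], "ABM": students[60:90], "HUMSS": students[90:]}

srs = simple_random_sample(students, 10, seed=1)
strat = stratified_sample(strata, 0.10, seed=1)

print(srs)
print({name: len(chosen) for name, chosen in strat.items()})
```

In the stratified sample, each strand contributes in proportion to its size, which a simple random sample of the same size does not guarantee.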

Lesson 3.2 Populations, Samples, Parameters, and Statistics

The field of inferential statistics enables you to make educated guesses about the
numerical characteristics of large groups. The logic of sampling gives you a way to test
conclusions about such groups using only a small portion of its members.


A population is a group of phenomena that have something in common. The term
often refers to a group of people, as in the following examples:

• all registered voters in Muntinlupa City


• all regular employees of Lyceum of Alabang
• all students of Lyceum of Alabang under the strand of STEM

Often, researchers want to know things about populations but do not have data for
every person or thing in the population. If a company's customer service division wanted to
learn whether its customers were satisfied, it would not be practical (or perhaps even possible)
to contact every individual who purchased a product. Instead, the company might select a
sample of the population. A sample is a smaller group of members of a population selected to
represent the population. In order to use statistics to learn things about the population, the
sample must be random.

* parameter - is a characteristic of a population

* statistic - is a characteristic of a sample

For example, say you want to know the mean income of the subscribers to a particular
magazine—a parameter of a population. You draw a random sample of 100 subscribers and
determine that their mean income is $27,500 (a statistic). You conclude that the population
mean income μ is likely to be close to $27,500 as well. This example is one of statistical
inference.

Lesson 3.3 Mean, Variance and Standard Deviation of Probability Distribution

The mean of a discrete random variable X is a weighted average of the possible values
that the random variable can take. Unlike the sample mean of a group of observations, which
gives each observation equal weight, the mean of a random variable weights each outcome xi
according to its probability, pi. The common symbol for the mean (also known as the expected
value of X) is µ , formally defined by:

µ = Σ xi pi
where :
µ - mean
xi - the i-th possible value ( outcome ) of the random variable
pi - probability of the outcome xi

The law of large numbers states that the observed mean of an increasingly
large number of observations of a random variable approaches the distribution mean.
That is, as the number of observations increases, the mean of these observations will become
closer and closer to the true mean of the random variable. This does not imply, however, that
short term averages will reflect the mean.
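
A short simulation can illustrate the law of large numbers. The sketch below is an assumed example, not taken from the module: it rolls a fair six-sided die, whose distribution mean is 3.5, and prints the observed mean for increasingly many rolls. The fixed seed only makes the run repeatable.

```python
import random

rng = random.Random(0)  # fixed seed so that the run can be repeated
TRUE_MEAN = 3.5         # mean of a fair die: (1 + 2 + 3 + 4 + 5 + 6) / 6

def observed_mean_of_rolls(n):
    """Roll a fair six-sided die n times and return the mean of the rolls."""
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return sum(rolls) / n

for n in (10, 1_000, 100_000):
    m = observed_mean_of_rolls(n)
    print(n, m, abs(m - TRUE_MEAN))
```

The mean of only 10 rolls can be noticeably far from 3.5, which is the point of the last sentence above: short-term averages need not reflect the distribution mean.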


Variance
The sample variance measures the spread, or variability, of the values in a sample
about the sample mean, and is defined by:

s² = Σ ( x – x̄ )² / ( n – 1 )

where

s² - variance
x - each value in the sample
x̄ - mean of the sample values
n - number of values in the sample

Standard Deviation ( s ) - is the square root of the variance.

s = √[ Σ ( x – x̄ )² / ( n – 1 ) ] or s = √s²

Example 1:

Problem : 5, 8, and 9 are the scores obtained by 3 selected students in a particular quiz. Using
all possible samples of size r = 2 drawn without replacement, solve the following:

a. population mean
b. variance and standard deviation.

Solution

a. population mean

Step 1: Determine the number of possible samples of size r = 2 using the combination formula :

nCr = n! / [ ( n – r )! r! ]

where: nCr - number of possible samples
n – number of values in the population
r - size of each sample

3C2 = 3! / [ ( 3 – 2 )! 2! ]
= 6 / [ 1( 2 ) ]
3C2 = 3 ( there are 3 possible samples and the probability of each
is 1/3 or 0.33 )


Step 2: Calculate the mean, x̄, of each sample as tabulated below.

Sample Number    Sample Values ( x )    Mean ( x̄ )    Probability, Pr ( x̄ )
1                5 & 8                  6.5           1/3
2                5 & 9                  7             1/3
3                8 & 9                  8.5           1/3

Step 3: Compute the population mean, µ, of the sampling distribution.

µ = Σ [ x̄ • Pr ( x̄ ) ]
= 6.5(1/3) + 7(1/3) + 8.5(1/3)
= 22/3
µ = 7.333…

b. variance and standard deviation

Prepare a table that shows the sample variance and standard deviation.

Sample Number    Sample Values ( x )    Mean ( x̄ )    Sample Variance ( s² )    Sample Standard Deviation ( s )
1                5 & 8                  6.5
2                5 & 9                  7
3                8 & 9                  8.5

Compute the sample variance ( s² ) and standard deviation ( s = √s² ) of each sample.

1. 5 & 8
s² = [ (5 – 6.5)² + (8 – 6.5)² ] / ( 2 – 1 ) = 4.5
s = √4.5 = 2.12

2. 5 & 9
s² = [ (5 – 7)² + (9 – 7)² ] / ( 2 – 1 ) = 8
s = √8 = 2.83

3. 8 & 9
s² = [ (8 – 8.5)² + (9 – 8.5)² ] / ( 2 – 1 ) = 0.5
s = √0.5 = 0.71

Complete the table with obtained values.

Sample Number    Sample Values ( x )    Mean ( x̄ )    Sample Variance ( s² )    Sample Standard Deviation ( s )
1                5 & 8                  6.5           4.5                       2.12
2                5 & 9                  7             8                         2.83
3                8 & 9                  8.5           0.5                       0.71
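
The whole of Example 1 can be reproduced by listing every possible sample with Python's `itertools.combinations` and applying the `statistics` module to each one. This is a sketch for checking the table; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import combinations
from statistics import mean, stdev, variance

population = [5, 8, 9]

# All possible samples of size 2, drawn without replacement.
samples = list(combinations(population, 2))

# Mean of each sample, kept as exact fractions.
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Each sample is equally likely, so the mean of the sampling
# distribution is the plain average of the sample means.
mean_of_sample_means = sum(sample_means) / len(sample_means)

for s in samples:
    # statistics.variance and statistics.stdev divide by n - 1.
    print(s, mean(s), variance(s), round(stdev(s), 2))

print(float(mean_of_sample_means), mean(population))
```

The mean of the sampling distribution equals the mean of the population itself, 22/3.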

Example 2:

Problem : A population consists of the 5 values 11, 13, 15, 17, and 19. Compute the
following using all possible samples of size r = 3:

a. population mean

b. variance and standard deviation


Solution:

a. population mean

Step 1: Determine the number of possible samples of size r = 3 using the combination formula.

nCr = n! / [ ( n – r )! r! ]

5C3 = 5! / [ ( 5 – 3 )! 3! ]
= ( 5 x 4 x 3 x 2 x 1 ) / [ ( 2 x 1 )( 3 x 2 x 1 ) ]
5C3 = 10 ( there are 10 possible samples and the probability of each
is 1/10 or 0.10 )

Step 2: Calculate the mean, x̄, of each sample as tabulated below.

Sample Number    Sample Values ( x )    Mean ( x̄ )    Probability Pr( x̄ )
1                11, 13, 15             13            0.10
2                11, 13, 17             13.67         0.10
3                11, 13, 19             14.33         0.10
4                11, 15, 17             14.33         0.10
5                11, 15, 19             15            0.10
6                11, 17, 19             15.67         0.10
7                13, 15, 17             15            0.10
8                13, 15, 19             15.67         0.10
9                13, 17, 19             16.33         0.10
10               15, 17, 19             17            0.10


Since some of the mean values are repeated, the table below may be used to simplify the
computation of the population mean.

Mean ( x̄ )    Frequency (f)    Probability Pr( x̄ )
13            1                0.10
13.67         1                0.10
14.33         2                0.20
15            2                0.20
15.67         2                0.20
16.33         1                0.10
17            1                0.10
Total         10               1.00

Step 3: Compute the population mean, µ, of the sampling distribution.

µ = Σ [ x̄ • Pr ( x̄ ) ]

µ = 13(0.10) + 13.67(0.10) + 14.33(0.20) + 15(0.20) + 15.67(0.20) + 16.33(0.10) + 17(0.10)

µ = 1.3 + 1.367 + 2.866 + 3.0 + 3.134 + 1.633 + 1.7

µ = 15

b. variance and standard deviation

Sample Number    Sample Values ( x )    Mean ( x̄ )    Sample Variance ( s² )    Sample Standard Deviation ( s )
1                11, 13, 15             13
2                11, 13, 17             13.67
3                11, 13, 19             14.33
4                11, 15, 17             14.33
5                11, 15, 19             15
6                11, 17, 19             15.67
7                13, 15, 17             15
8                13, 15, 19             15.67
9                13, 17, 19             16.33
10               15, 17, 19             17
Compute the sample variance and standard deviation of each sample.

1. 11, 13, 15              2. 11, 13, 17

s² = 4                     s² = 9.33335

s = √s²                    s = √s²
s = √4                     s = √9.33335
s = 2                      s = 3.0551
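
As a cross-check of the two computations above, the following sketch uses `statistics.variance` and `statistics.stdev` from Python's standard library, both of which divide by n – 1. The variable names are illustrative. Because the code keeps the unrounded mean, the second variance prints as 9.3333 rather than the 9.33335 obtained from the rounded mean 13.67.

```python
from statistics import stdev, variance

# First two samples from the table above
sample_1 = [11, 13, 15]
sample_2 = [11, 13, 17]

# Sample variance (divisor n - 1) and sample standard deviation
var_1, sd_1 = variance(sample_1), stdev(sample_1)
var_2, sd_2 = variance(sample_2), stdev(sample_2)

print(var_1, round(sd_1, 4))
print(round(var_2, 4), round(sd_2, 4))
```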

Activity:

Continue computation and complete the table below.


Sample    Sample        Mean (x̄)    Sample           Sample Standard
Number    Values (x)                Variance (s²)    Deviation (s)
1         11, 13, 15    13          4                2
2         11, 13, 17    13.67       9.33335          3.0551
3         11, 13, 19    14.33
4         11, 15, 17    14.33
5         11, 15, 19    15
6         11, 17, 19    15.67
7         13, 15, 17    15
8         13, 15, 19    15.67
9         13, 17, 19    16.33
10        15, 17, 19    17

Lesson 3.4 Central Limit Theorem

The central limit theorem states that, given a sufficiently large sample size from a
population with finite variance, the sampling distribution of the sample mean is approximately
normal and its mean is approximately equal to the mean of the population. If random samples
of size n are taken from a population with a specific mean (𝝁) and standard deviation (s),
then as n increases without limit the sampling distribution of the sample mean (x̄) becomes
approximately normally distributed with a mean (𝝁) and standard deviation of

s_x̄ = s / √n

where: s_x̄ - standard deviation of the sample mean
       s - standard deviation of the population
       n - sample size


To compute the value of z, we use:

z = ( x̄ – 𝝁 ) / s_x̄

where: x̄ - sample mean
       𝝁 - population mean
       s_x̄ - standard deviation of the sample mean

Note:
1. For any sample size n, the sampling distribution of a sample mean is a normal
distribution if the original variable is normally distributed.
2. If the original variable is not normally distributed, a sample size of 30 or more is
needed to use a normal distribution to approximate the distribution of a sample mean.

Example 1:

Problem : The mean raw score of Grade 11 students in a Statistics examination was 20 with a
standard deviation of 3. If 36 students are randomly selected, find the probability
that the mean score of the students is higher than 21.
Solution:

Step 1: Compute the standard deviation of the sample mean

s = 3        n = 36

s_x̄ = s / √n
    = 3 / √36
s_x̄ = 1/2 or 0.5

Step 2: Identify the parts of the problem.

µ = 20       x̄ > 21       s_x̄ = 0.5

Step 3: Compute the z score

z = ( x̄ – 𝝁 ) / s_x̄
  = ( 21 – 20 ) / 0.5
z = 2


Step 4: Draw the graph.

Step 5: Find the area of the variable in a normal distribution table ( Area z=2 = 0.4772 ).

Area net = Area Right side – Area z = 2
         = 0.5000 – 0.4772
         = 0.0228

Therefore, the probability that the mean score of the 36 sampled students is higher than 21
is 0.0228 or 2.28%.
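
The table lookup in Steps 1 to 5 can be reproduced with `statistics.NormalDist` from Python's standard library. This is a sketch only: it assumes the standard deviation of 3 used in the solution, and the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu = 20        # population mean
s = 3          # population standard deviation used in the solution
n = 36         # sample size
x_bar = 21     # sample mean of interest

# Step 1: standard deviation of the sample mean
s_xbar = s / sqrt(n)

# Step 3: z score
z = (x_bar - mu) / s_xbar

# Step 5: area to the right of z under the standard normal curve
p_higher = 1 - NormalDist().cdf(z)

print(s_xbar, z, round(p_higher, 4))
```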

Example 2:

Problem: The average amount of salt per cup of a certain instant noodle sold in the
market is 200 mg, with a standard deviation of 10 mg. Assume that the variable is
normally distributed. If a single cup of noodles is selected, find the probability
that the amount of salt in the noodles will be more than 210 mg.
Solution:
Step 1: Compute the standard deviation of the sample mean

s = 10       n = 1

s_x̄ = s / √n = 10 / √1 = 10

Step 2: Identify the parts of the problem.

𝝁 = 200

x = 210

s_x̄ = 10


Step 3: Compute the z score

z = ( x – 𝝁 ) / s_x̄
  = ( 210 – 200 ) / 10
z = 1

Step 4: Draw the graph

Step 5: Find the area of the variable in a normal distribution table ( Area z = 1 = 0.3413 )

Area net = Area Right side – Area z = 1
         = 0.5000 – 0.3413
         = 0.1587

Therefore, the probability of obtaining a cup of noodles that contains more than 210 mg
of salt is 0.1587 or 15.87%.

Example 3:

Problem: The average consumption of rice of a rural male adult in a year is 96 kilos. If
the standard deviation is 20 kilos and the distribution is approximately normal, find
the probability that the mean of the sample will be less than 102 kilos in a year if a
sample of 49 male adults is chosen.
Solution:
Step 1: Compute the standard deviation of the sample mean.
s = 20       n = 49

s_x̄ = s / √n = 20 / √49 = 2.8571


Step 2: Identify the parts of the problem

µ = 96       x̄ = 102       s_x̄ = 2.8571

Step 3: Compute the z score

z = ( x̄ – 𝝁 ) / s_x̄
  = ( 102 – 96 ) / 2.8571
z = 2.1

Step 4: Draw the graph

Step 5: Find the area of the variable in a normal distribution table ( Area z=2.1 = 0.4821 ).

Area net = Area Left side + Area z = 2.1
         = 0.5000 + 0.4821
         = 0.9821

Therefore, the probability that the mean consumption of a sample of 49 male adults is less
than 102 kilos of rice is 0.9821 or 98.21%.

Example 4:

Problem: The average life span of TV sets manufactured by company X is 10.5 years and the
standard deviation is 1.8 years. If a random sample of 50 TV sets is chosen, find
the probability that the mean life span of the TV sets is between 10 and 11 years.


Solution:

Step 1: Compute the standard deviation of the sample mean

s = 1.8      n = 50

s_x̄ = s / √n = 1.8 / √50 = 0.2546

Step 2: Identify the parts of the problem


µ = 10.5                          µ = 10.5
x̄ = 10                            x̄ = 11
s_x̄ = 0.2546                      s_x̄ = 0.2546

Step 3: Compute the z score. Since there are two ( 2 ) values of the sample mean, we are
going to compute two values of z.

z = ( x̄ – 𝝁 ) / s_x̄               z = ( x̄ – 𝝁 ) / s_x̄
  = ( 10 – 10.5 ) / 0.2546          = ( 11 – 10.5 ) / 0.2546
z = – 1.96                        z = 1.96

Step 4: Draw the graph.


Step 5: Find the area of the variable in a normal distribution table.

Area z = – 1.96 = 0.4750 ; Area z = 1.96 = 0.4750

Area net = Area z = – 1.96 + Area z = 1.96
         = 0.4750 + 0.4750
         = 0.95

Therefore, the probability that the mean life span of the TV sets ranges from 10 to 11
years is 0.9500 or 95%.
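
When the question asks for the probability between two values of the sample mean, the two table areas can be replaced by a difference of cumulative probabilities. The sketch below uses `statistics.NormalDist` with the figures of Example 4; the variable names are illustrative. The printed probability differs from 0.95 only in the fourth decimal place because z is not rounded to 1.96.

```python
from math import sqrt
from statistics import NormalDist

mu, s, n = 10.5, 1.8, 50      # population mean, standard deviation, sample size
low, high = 10, 11            # bounds on the sample mean

s_xbar = s / sqrt(n)          # standard deviation of the sample mean
z_low = (low - mu) / s_xbar
z_high = (high - mu) / s_xbar

std_normal = NormalDist()     # mean 0, standard deviation 1
p_between = std_normal.cdf(z_high) - std_normal.cdf(z_low)

print(round(s_xbar, 4), round(z_low, 2), round(z_high, 2), round(p_between, 4))
```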


CHAPTER TEST

Solve the following:

A. Random Sampling

1. To obtain a random sample of 25, a researcher selects every 20th hamburger to determine
the fat content of the hamburger a burger store sells. Will his sample have the
characteristic of a random sample? Explain why or why not?

B. Mean, Variance and Standard Deviation

1. A population of N = 3 consists of the following values: 10, 12, and 14. Estimate the
population mean by using a sampling distribution with samples of size r = 2.
Prepare the probability distribution of the sample means of the population N = 3.

Sample Values Sample Mean

2. A population consists of N = 4 values: 4, 8, 12, and 16. Construct the sampling
distribution of the sample variances and the sample standard deviations using samples
of size r = 3. Prepare a probability distribution of the sample standard deviations.

Sample Values (x)    Sample Mean    Variance ( s² )    Std. deviation (s)


C. Central Limit Theorem

1. The average age of public jeepneys plying in Metro Manila is 15 years. Assume that the
standard deviation is 4 years. If a random sample of 64 public jeepneys are chosen, find
the probability that the mean of jeepney’s age is :

a. between 12 and 19 years

b. over 18 years


CHAPTER 4

ESTIMATION OF PARAMETERS

Objectives
The learner should be able to
1. illustrate point and interval estimations,
2. distinguish between point and interval estimation,
3. identify the appropriate form of the confidence interval estimator for the population
mean ,
4. illustrate and construct a t – distribution,
5. identify regions under the t – distribution corresponding to different t-values,
6. identify point estimator for the population proportion,
7. compute for the point estimate of the population proportion, and
8. compute for the confidence interval estimate of the population proportion .

Parameter Estimation
It refers to the process of using sample data (in reliability engineering, usually times-to-
failure or success data) to estimate the parameters of the selected distribution.

Lesson 4.1 Point Estimate

A point estimator is a rule or formula that describes how an estimate is calculated. A point
estimate of a population parameter is a single value of a statistic. An estimator is not
expected to estimate the population parameter exactly, but it should give the nearest
possible value.

A point estimate is used to estimate a population parameter, but it does not provide
information as to how close the estimate is to the population parameter. That information is
obtained by constructing an interval estimate, which is formed by subtracting and adding a
value called the margin of error from and to the point estimate.

In simple terms, any statistic can be a point estimate. A statistic is an estimator of some
parameters in a population. For example:

• The sample standard deviation, (s), is a point estimate of the population standard
deviation (σ).

• The sample mean (x̄) is a point estimate of the population mean (μ).

• The sample variance (s2) is a point estimate of the population variance(σ2).

Efficient estimator – is an estimator having the least variance.

Unbiased estimator
It is an accurate statistic that’s used to approximate a population parameter. “Accurate”
in this sense means that it’s neither an overestimate nor an underestimate. If an overestimate
or underestimate does happen, the mean of the difference is called a “bias.”


An estimator (e.g., the sample mean, x̄) is an unbiased estimator of a parameter (e.g., the
population mean, 𝝁) when the mean of the statistic’s sampling distribution is equal to the
population parameter.

A researcher can obtain unbiased estimators by avoiding bias during sampling and data
collection.
For example, to figure out the average amount people spend on food per week, it is
impossible to survey the whole population of over 100 million. So, it is more convenient to take
a random sample of around 1,000. After the survey, it was found out that the average amount
people spend per week is Php 2000 per person. Is this an unbiased estimator? Possibly. It all
depends on how the sample was taken . For example:
• Were the questions unbiased? For example, an ambiguous question like “How
much do you spend on groceries a week?” might seem simple enough. But some
people could take this to mean “How much did you spend this week on
groceries?” (if it’s the middle of the month, people might spend less) or “How
much money did you spend on your household groceries this week?” (be clear
that you’re asking per person, not per household).
• Was the sample chosen in an unbiased way (i.e. a simple random sample)?
• Have any population members been excluded? For example, if you are performing
an internet survey, you may be excluding the poorest 25% of people who do not
have internet.

Example:

Problem: In the year 2015, the municipal registrar reported that the average matrimonial age
for male person is µ = 26.8 years old. Data on April 6, 2016 showed the ages of 5
male persons getting married are 24, 28, 21, 31 and 27, while on May 3, 2016 the
ages of 8 male persons getting married are 20,33,30,28,35,21,27 and 24. Determine
the :
a. unbiased estimator
b. most efficient estimator

Solutions:

Given: population mean for male persons (𝝁) = 26.8 years old.

a. unbiased estimator
Step 1: Determine the sample mean nearest to the population mean.

Sample Mean 1 (April 6, 2016) = ( 24 + 28 + 21 + 31 + 27 ) / 5
                              = 26.2

Sample Mean 2 (May 3, 2016) = ( 20 + 33 + 30 + 28 + 35 + 21 + 27 + 24 ) / 8
                            = 27.25

Sample mean 2 of 27.25 years is nearer to the population mean of 26.8 years
(difference of 0.45) than sample mean 1 of 26.2 years (difference of 0.6). Hence, the
unbiased estimator is sample mean 2.


b. most efficient estimator : using the sample variance formula

variance ( s² ) = Σ ( x – x̄ )² / ( n – 1 )

s² ( variance 1 ) = [ ( 24 – 26.2 )² + ( 28 – 26.2 )² + ( 21 – 26.2 )² + ( 31 – 26.2 )² + ( 27 – 26.2 )² ] / ( 5 – 1 )
                  = 58.8 / 4

s² ( variance 1 ) = 14.7

s² ( variance 2 ) = [ ( 20 – 27.25 )² + ( 33 – 27.25 )² + ( 30 – 27.25 )² + ( 28 – 27.25 )² + ( 35 – 27.25 )² + ( 21 – 27.25 )² + ( 27 – 27.25 )² + ( 24 – 27.25 )² ] / ( 8 – 1 )
                  = 203.5 / 7

s² ( variance 2 ) = 29.07

Sample 1 is the more efficient estimator since its sample variance is smaller than the
sample variance of sample 2.
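
The sample means and sample variances in this example can be checked with Python's standard library. This is a sketch with illustrative variable names. Note that `statistics.variance` uses the n – 1 divisor of the formula above for both samples, which gives about 29.07 for the second sample; dividing its sum of squares by 8 instead of 7 gives 25.44.

```python
from statistics import mean, variance

population_mean = 26.8
ages_april = [24, 28, 21, 31, 27]                  # April 6, 2016
ages_may = [20, 33, 30, 28, 35, 21, 27, 24]        # May 3, 2016

mean_1, mean_2 = mean(ages_april), mean(ages_may)

# Distance of each sample mean from the population mean
diff_1 = abs(mean_1 - population_mean)
diff_2 = abs(mean_2 - population_mean)

# Sample variances with divisor n - 1
var_1, var_2 = variance(ages_april), variance(ages_may)

print(mean_1, mean_2, round(diff_1, 2), round(diff_2, 2))
print(round(var_1, 2), round(var_2, 2))
```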

Lesson 4.2. INTERVAL ESTIMATOR

Interval Estimate
It is a range (interval) of values that is likely to contain the true value of the parameter.
An interval estimate is associated with the degree of confidence.

Degree of Confidence ( 1 – α )

It is the probability that the interval estimate contains the population parameter. The
remaining probability, α, corresponds to the two tails of the normal curve distribution
shown below.


Critical Value
It is a factor used to compute the margin of error to determine the interval estimate of
the population parameter.
The central limit theorem states that the sampling distribution of a statistic will be
nearly normal if the sample size is large enough.

Lesson 4.3 Estimating a Population Parameter

Consider estimating and determining the sample size by applying proportions instead of
using the means. Let us assume that the normal distribution can be used as an approximation
to the distribution. All outcomes are classified into one of two categories, typically
referred to as success or failure. We have independent trials, and in each trial the
probability of success is denoted by p and the probability of failure is denoted by q. If the
conditions np ≥ 5 and nq ≥ 5 are both satisfied, we can use the normal distribution.

To solve for the sample proportion, we shall use:

p̂ = x / n

where  p̂ (read as p hat) - sample proportion of successes
       x - the number of successes in the sample
       n - the sample size

and q̂ = 1 – p̂

       q̂ - sample proportion of failures

Margin of Error ( E )

E = z α/2 · √[ p ( 1 – p ) / n ]

where : E – margin of error
        p - proportion of successes
        n - sample size
        z α/2 - critical value ( equal to 1.96 for 95% confidence interval, CI )

Example 1:
Problem : A survey was conducted among grade 12 students of Lyceum of Alabang, and
found out that there were 980 students out of 1600 who will pursue their study in
college.


a. Determine the proportion of students who will pursue their studies in college.

b. Find the interval estimate of proportion at 95% level of confidence of all Grade
12 student who will pursue college.

Solution:

a. the proportion of students who will pursue their studies in college.

p̂ = x / n = 980 / 1600 = 0.6125 or 61.25%

Hence, 61.25% is the proportion of students who will pursue their studies in college.
The sample proportion is the best point estimate of the population proportion.

b. interval estimate of proportion at 95% level of confidence of all Grade 12 student who will
pursue college.

Step 1: compute the value of q̂

q̂ = 1 – p̂
  = 1 – 0.6125
  = 0.3875

Step 2: determine the value of the following.

q̂ = 0.3875
p̂ = 0.6125
n = 1600
z α/2 = 1.95996

Note: z is obtained through the use of Table 2 at 95% level of confidence.


Step 3: compute the value of E, margin of error.

E = z α/2 · √[ p ( 1 – p ) / n ]
  = 1.95996 · √[ 0.6125 ( 1 – 0.6125 ) / 1600 ]
  = 1.95996 · √[ 0.6125 ( 0.3875 ) / 1600 ]
  = 1.95996 ( 0.0122 )
E = 0.0239

Therefore, the interval estimate is P = 0.6125 ± 0.0239 or 0.5886 < P < 0.6364.
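
The whole of Example 1 can be sketched in a few lines. Here the critical value is obtained from `statistics.NormalDist().inv_cdf` instead of Table 2; the variable names are illustrative and not part of the module.

```python
from math import sqrt
from statistics import NormalDist

x = 980             # students who will pursue college (successes)
n = 1600            # students surveyed (sample size)
confidence = 0.95

p_hat = x / n                       # sample proportion of successes
q_hat = 1 - p_hat                   # sample proportion of failures

# Two-tailed critical value z at alpha/2
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_crit * sqrt(p_hat * q_hat / n)    # margin of error, E
lower, upper = p_hat - margin, p_hat + margin

print(round(p_hat, 4), round(z_crit, 5), round(margin, 4))
print(round(lower, 4), round(upper, 4))
```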

Lesson 4.4 Determining Sample Size

To determine the sample size ( n ) needed to estimate the population proportion, we shall
use the following formula:

n = ( z α/2 )² · p ( 1 – p ) / E²

In the absence of p and q, we can assign the value of 0.5 for p. Since q = 1 – 0.5 or
q = 0.5, their product will be 0.25. The formula for sample size (n) is, thus,

Sample size (n) = ( z α/2 )² ( 0.25 ) / E²

Example 2:

Problem : A survey is to be conducted among Grade 12 high school students to find out who
can pursue college. Find the sample size (n) using a margin of error of 0.03 and a
confidence level of 95%. Compute the sample size if:

a. the point estimate or prior value of P is 0.6125 ( from Example 1 );

b. there is no prior information on the value of P.

Solution:

a. With prior value of P


Step 1: compute the value of q̂

q̂ = 1 – p̂
  = 1 – 0.6125
  = 0.3875

Step 2: determine the value of the following.

q̂ = 0.3875

p̂ = 0.6125

E = 0.03

Step 3: compute the sample size (n)

n = ( z α/2 )² · p ( 1 – p ) / E²

n = ( 1.95996 )² ( 0.6125 ) ( 0.3875 ) / ( 0.03 )²

n = 1013

Hence, using the prior value of P, 1013 Grade 12 students will be included in the survey.

b. With no prior information on the value of P:

Sample size (n) = ( z α/2 )² ( 0.25 ) / E²

n = ( 1.95996 )² ( 0.25 ) / ( 0.03 )²

n = 1067


Therefore, with no prior information on the value of P, the sample will be composed of
1,067 Grade 12 students.
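
Both cases of Example 2 can be computed with one small helper. This is a sketch: `sample_size` is a hypothetical function name, and the default p = 0.5 encodes the "no prior information" case described in this lesson. The results are rounded to the nearest whole number, as in the solution above; rounding up to the next whole number is also a common convention.

```python
from statistics import NormalDist

def sample_size(margin, confidence, p=0.5):
    """Sample size for estimating a proportion; p = 0.5 when no prior value is known."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z_crit ** 2 * p * (1 - p) / margin ** 2

n_prior = sample_size(0.03, 0.95, p=0.6125)   # a. with the prior value from Example 1
n_no_prior = sample_size(0.03, 0.95)          # b. with no prior information

print(round(n_prior), round(n_no_prior))
```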

Lesson 4.5 T-Distribution

When the frequency distribution of a population is normal, the t-value can be solved
using:

t = ( x̄ – µ ) / ( s / √n )

where: t-value - a number that represents the number of standard deviations a value
                 falls from the mean on a t-distribution
       x̄ - sample mean
       µ - population mean
       n - sample size
       s - standard deviation

Values of t can be obtained by locating the value of the degree of freedom, n-1. The
value of the degree of freedom is the number of scores that can vary after certain restrictions
are met.

degree of freedom for t-distribution ( df ) = n - 1

t-distribution properties and conditions :

1. the sample size must be less than or equal to 30

2. σ ( population standard deviation ) is unknown

3. the distribution should be normal

Example 3:
Problem: The scores of 6 students in a quiz are: 82, 85, 78, 75, 87, 73. For these scores:
n = 6, x̄ = 80 and s = 5.59. Construct the 95% interval estimate of the mean
score.
Solution:

1. Solve the margin of error, E.

Degree of freedom ( df ) = n -1
=6–1
= 5


From Table 2 at 95% Level of Confidence : t α/2 = 2.57058

E = t α/2 · ( s / √n )
E = 2.57058 · ( 5.59 / √6 )
E = 2.57058 · ( 5.59 / 2.4495 )
E = 2.57058 · 2.2821
E = 5.866 or 5.9

Therefore, the estimate interval is 74.1 < µ < 85.9
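
The computation in Example 3 can be sketched as follows. Python's standard library has no t-distribution table, so the critical value 2.57058 is simply copied from the text's Table 2 (df = 5, 95%); the variable names are illustrative. Because the code does not round s to 5.59 first, E prints as about 5.862 rather than 5.866, and the interval is the same to one decimal place.

```python
from math import sqrt
from statistics import mean, stdev

scores = [82, 85, 78, 75, 87, 73]
t_crit = 2.57058          # from Table 2 of the text: df = n - 1 = 5, 95% confidence

n = len(scores)
df = n - 1                # degrees of freedom
x_bar = mean(scores)      # sample mean
s = stdev(scores)         # sample standard deviation (divisor n - 1)

margin = t_crit * s / sqrt(n)          # margin of error, E
lower, upper = x_bar - margin, x_bar + margin

print(df, x_bar, round(s, 2), round(margin, 3))
print(round(lower, 1), round(upper, 1))
```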

Lesson 4.6 Degree of Confidence or Certainty


The degree of confidence or certainty is the probability that the population parameter is
within the confidence interval, usually expressed as a percentage. The most commonly
used degrees of confidence are 99% (α = 0.01), 95% (α = 0.05) and 90% (α = 0.10). Among
the three, 95% is the most used because it gives a good balance.
A critical value is the number on the borderline that separates sample statistics that are
likely to occur from those that are unlikely to occur.

These five critical values of z are summarized in the following table.

α = tail area    central area = 1 – 2α    zα
0.10             0.80                     z.10 = 1.28
0.05             0.90                     z.05 = 1.645
0.025            0.95                     z.025 = 1.96
0.01             0.98                     z.01 = 2.33
0.005            0.99                     z.005 = 2.58


Example 4:
Problem: Find the critical value if the degree of confidence is 95%.
Solution:

At 95% degree of confidence

α = 1.0 – 0.95
α = 0.05
For a two-tailed test : α/2 = 0.025, so the critical value is z.025 = 1.96

Note: Round off the confidence interval to nearest tenths.

Lesson 4.7 Margin of Error

The margin of error, E, is the maximum likely difference (with probability 1 – α)
between the observed sample mean (x̄) and the true value of the population mean (µ) when
sample data are used to estimate a population mean. The margin of error, which is also
called the maximum error of the estimate, can be determined using the formula:

E = z α/2 · ( s / √n )

where : E - margin of error

        z α/2 - critical value

        s - population standard deviation

        n - sample size

To calculate the margin of error when the population standard deviation is unknown,
replace the population standard deviation with the sample standard deviation. The
confidence interval or interval estimate can then be solved by using the formula:

confidence interval = x̄ ± E   or   x̄ – E ≤ µ ≤ x̄ + E

Example 5:

Problem: Given the body temperature, n = 105, ẍ = 98.10, and s = 0.61, for a degree of
confidence 95%, find the:

a. Margin of Error
b. Interval Estimate


Solution:

a. Margin of Error

The degree of confidence is 95%, so z α/2 = 1.95996.

E = z α/2 · ( s / √n )

E = 1.95996 · ( 0.61 / √105 )

E = 1.95996 · 0.0595

E = 0.117

b. Interval Estimate : µ = 98.10 ± 0.117

Therefore, the interval estimate is 97.983 < µ < 98.217
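
A small helper makes the margin-of-error formula reusable for other values of n, x̄ and s. This is a sketch: `z_interval` is a hypothetical name, and the critical value is taken from `statistics.NormalDist().inv_cdf` rather than from the table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, s, n, confidence=0.95):
    """Return (margin of error, lower limit, upper limit) for a mean, using z."""
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_crit * s / sqrt(n)
    return margin, x_bar - margin, x_bar + margin

# Body temperature data from Example 5
margin, lower, upper = z_interval(x_bar=98.10, s=0.61, n=105)

print(round(margin, 3), round(lower, 3), round(upper, 3))
```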


( addendum )

Calculating the Confidence Interval

Step 1: Find the number of observations ( n ), calculate their mean ( x ), and standard
deviation ( s ).
Using the example:
n = 40
x = 175
s = 20
Note:
1. We should use the standard deviation of the entire population, but in many cases
we won't know it.
2. We can use the standard deviation for the sample if we have enough observations
(at least n=30).
Step 2: Decide what Confidence Interval we want: 95% or 99% are common choices.
Then find the "Z" value for that Confidence Interval from TABLE 2.

For 95%, the Z value is 1.960 ( 1.95996 rounded off )

Step 3: Use the Z value in this formula for the Confidence Interval

x̄ ± Z · ( s / √n )

Where:
x is the mean
Z is the chosen Z-value from the table above
s is the standard deviation
n is the number of observations

And we have:
175 ± 1.960 × ( 20 / √40 )

Which is:
175cm ± 6.20cm

True Mean: is likely to be between 168.8cm and 181.2cm

• The value after the ± is called the margin of error


• The margin of error in this example is 6.20 cm


CHAPTER TEST

A. Find Z if:

1. α = 0.10

2. α = 0.01
B. Find Z for the value corresponding to a confidence level of 90%.

1. If α = 0.05, find t for a sample (n) of 25 scores.

2. if α = 0.02, find t for a sample (n) of 40 scores.

C. Find the margin of error if the confidence level is 95%

1. n = 100, ẍ = 80, s = 2.5

2. n =250, ẍ = 300, s = 4.10

D. Solve the interval estimate of the population mean.

1. n = 200, x̄ = 100, 95% confidence, s = 2.75

2. n = 800, x̄ = 200, 95% confidence, s = 3.84

E. Solve the following problems:

1. The manager of a commercial bank wants to confirm his belief that the bank has very few
customers with regular savings accounts. Based on a survey of 150 randomly selected adults,
only 30 of them have a regular savings account. For a 99% confidence level, find the
interval estimate of the proportion of adults with regular savings accounts.

2. A random sample of n = 75 observations from a quantitative population produced ẍ =29.7


and 𝑠2 = 10.8. Give the best point estimate for the population mean and calculate the margin of
error.


CHAPTER 5
TESTS OF HYPOTHESIS

Objectives
The learner should be able to :
1. illustrate null and alternative hypotheses,
2. illustrate level of significance and rejection region,
3. formulate the appropriate null and alternative hypotheses on a population mean
and proportion,
4. identify the appropriate form of the test statistic
5. compute for the test statistic value ,and
6. solve problems involving test of hypothesis .

Lesson 5.1 Hypothesis Test

A hypothesis test is a statistical test that is used to determine whether there is enough
evidence in a sample of data to infer that a certain condition is true for the entire population.
A hypothesis test examines two opposing hypotheses about a population :
a. null hypothesis ( Ho )
b. alternative hypothesis. ( Ha )

Null hypothesis
It is the statement being tested. Usually the null hypothesis is a statement of "no effect"
or "no difference".

Alternative hypothesis
It is the statement you want to be able to conclude is true. A common misconception is
that statistical hypothesis tests are designed to select the more likely of two hypotheses.
Instead, a test will remain with the null hypothesis until there is enough evidence (data) to
support the alternative hypothesis.

The general idea of hypothesis testing involves:


1. making an initial assumption.
2. collecting evidence.
3. deciding whether to reject or not reject the initial assumption based on the available
evidence

Example:
Problem: Is normal body temperature really 98.6 °F?
Solution:
Consider the population of many adults. A researcher hypothesized that the average
adult body temperature is lower than the often-advertised 98.6 degrees F. That is, the
researcher wants an answer to the question: "Is the average adult body temperature 98.6
degrees? Or is it lower?" To answer his research question, the researcher starts by assuming
that the average adult body temperature was 98.6 degrees F.


Then, the researcher went out and tried to find evidence that refutes his initial assumption. In
doing so, he selects a random sample of 130 adults. The average body temperature of the 130
sampled adults is 98.25 degrees.
Then, the researcher uses the data he collected to make a decision about his initial
assumption. It is either likely or unlikely that the researcher would collect the evidence he did
given his initial assumption that the average adult body temperature is 98.6 degrees:

▪ If it is likely, then the researcher does not reject his initial assumption that the average
adult body temperature is 98.6 degrees. There is not enough evidence to do otherwise.

▪ If it is unlikely, then:
➢ either the researcher's initial assumption is correct and he experienced a very
unusual event;
➢ or the researcher's initial assumption is incorrect.

Types of Test

1. Two-tailed test
A test to determine whether a population parameter has changed in either direction. The
null hypothesis can be rejected by observing a statistic that falls in either of the two
tails of the sampling distribution.

2. One-tailed test
It is used if either of the following conditions is satisfied:
1. the sample data are expected to come from a population that has a parameter less than
the hypothesized value (left-tailed test)
2. the sample data are expected to come from a population that has a parameter greater
than the hypothesized value (right-tailed test)

The table below may help to clearly understand the tests :

Type Null Alternative


Right-tailed H0 : μ0 = μ1 HA : μ0>μ1
Left-tailed H0 : μ0 = μ1 HA : μ0<μ1
Two-tailed H0 : μ0 = μ1 HA : μ0 ≠ μ1

Lesson 5.2 Type I and Type II Errors

We make our decision based on evidence not on 100% guaranteed proof.


Note:
➢ If we reject the null hypothesis, we do not prove that the alternative
hypothesis is true.
➢ If we do not reject the null hypothesis, we do not prove that the null
hypothesis is true.
We merely state whether there is enough evidence to behave one way or the other. This is
always true in statistics. Because of this, whatever the decision, there is always a chance
that we made an error.

Two types of errors in hypothesis testing:

Decision              Null hypothesis is True    Null hypothesis is False
Do not reject null    OK                         Type II ERROR
Reject null           Type I ERROR               OK

• Type I error - when the null hypothesis is rejected even if it is true

• Type II error - when the null hypothesis is not rejected even if it is false

Right-tailed test

Left-tailed Test


Two-tailed test

Lesson 5.3 Hypothesis Testing Procedure

To analyze the conducted study, the following procedure must be followed:

1. State the hypothesis.


It includes the null hypothesis and the alternative hypothesis.
2. Select the appropriate test statistic and level of significance.
When testing a hypothesis of a proportion, we use the z-statistic or z-test
and the formula :

z = ( p – P ) / √[ P ( 1 – P ) / n ]

where p is the sample proportion, P is the hypothesized population proportion, and n is
the sample size.

When testing a hypothesis of a mean, we use the z-statistic or the t-statistic
according to the following conditions :
a. If the population standard deviation, σ, is known and either the data is
normally distributed or the sample size n > 30, we use the normal distribution
(z-statistic).
b. When the population standard deviation, σ, is unknown and either the data is
normally distributed or the sample size is greater than 30 (n > 30), we use the
t-distribution (t-statistic).

Note: The guideline for choosing the level of significance is as follows:


1. the 0.10 level for political polling
2. the 0.05 level for consumer research projects, and
3. the 0.01 level for quality assurance work

3. State the decision rules.


The decision rules state the conditions under which the null hypothesis will be
accepted or rejected. The critical value for the test-statistic is determined by the
level of significance. The critical value is the value that divides the non-reject region
from the reject region.
4. Compute the appropriate test statistic and make the decision.

When we use the z-statistic, we use the formula :

, if there is only one sample mean, or


, if there are two sample means

When we use the t-statistic, we use the formula :


a. if there is only one sample mean

b. if there are two sample means

where: x̄1 – mean of sample 1
       x̄2 – mean of sample 2
       s1 – standard deviation of sample 1
       s2 – standard deviation of sample 2
       n1 – sample size of sample 1
       n2 – sample size of sample 2
       µ - population mean

Compare the computed test statistic with critical value.


• If the computed value is within the rejection region(s) - reject the null
hypothesis
• if not - do not reject the null hypothesis

5. Interpret the decision.


Based on the decision in Step 4, state a conclusion in the context of the
original problem.


Lesson 5.3.1 Hypothesis Test for a Proportion

The hypothesis test of a proportion is appropriate when the following conditions are met:

1. The sampling method is simple random sampling.


2. Each sample point can result in just two possible outcomes. We call one of these
outcomes a success and the other, a failure.
3. The sample includes at least 10 successes and 10 failures.
4. The population size is at least 20 times as big as the sample size.

The P-value is the probability of observing a sample statistic as extreme as the test
statistic. The test statistic is a z-score:

z = ( p – P ) / S

where : P - the hypothesized value of population proportion in the null
            hypothesis
        p - the sample proportion
        S - the standard deviation of the sampling distribution

In case the standard deviation is not given, use the formula below to obtain the
value of the standard deviation.

S = √[ P ( 1 – P ) / n ]

where : n - the sample size

Hypothesis Test: Difference between Proportions

This is a test to determine whether the difference between two proportions is


significant. The test procedure, called the two-proportion z-test, is appropriate when the
following conditions are met:
▪ The sampling method for each population is simple random sampling.
▪ The samples are independent.
▪ Each sample includes at least 10 successes and 10 failures.
▪ Each population is at least 20 times as big as its sample.

This approach consists of four steps:

1. State the null and alternative hypotheses.


Every hypothesis test requires the analyst to state a null hypothesis and an
alternative hypothesis. The table below shows three sets of hypotheses. Each makes
a statement about the difference, d, between two population proportions, P1 and P2. (In
the table, the symbol ≠ means " not equal to ").

Set    Null hypothesis    Alternative hypothesis    Number of tails
1      P1 - P2 = 0        P1 - P2 ≠ 0               2
2      P1 - P2 ≥ 0        P1 - P2 < 0               1
3      P1 - P2 ≤ 0        P1 - P2 > 0               1

The first set of hypotheses (Set 1) is an example of a two-tailed test, since an extreme
value on either side of the sampling distribution would cause a researcher to reject the null
hypothesis. The other two sets of hypotheses (Sets 2 and 3) are one-tailed tests, since an
extreme value on only one side of the sampling distribution would cause a researcher to reject
the null hypothesis.
When the null hypothesis states that there is no difference between the two population
proportions (i.e., d = 0), the null and alternative hypothesis for a two-tailed test are often stated
in the following form.

H0: P1 = P2
Ha: P1 ≠ P2

2. Formulate an analysis plan

The analysis plan describes how to use sample data to accept or reject the null
hypothesis. It should specify the following elements.

▪ Significance level. Often, researchers choose significance levels equal to 0.01, 0.05,
or 0.10; but any value between 0 and 1 can be used.

▪ Test method. Use the two-proportion z-test to determine whether the hypothesized
difference between population proportions differs significantly from the observed
sample difference.

3. Analyze sample data

Using sample data, complete the following computations to find the test statistic and its
associated P-Value.

➢ Pooled sample proportion. Since the null hypothesis states that P1 = P2, we use
a pooled sample proportion (p) to compute the standard error of the sampling
distribution.

p = ( p1 · n1 + p2 · n2 ) / ( n1 + n2 )

where: p1 - the sample proportion from population 1
       p2 - the sample proportion from population 2
       n1 - the size of sample 1
       n2 - the size of sample 2


➢ Standard Error. Compute the standard error (SE) of the sampling distribution of the
difference between two proportions.

SE = √{ p · ( 1 – p ) · [ ( 1/n1 ) + ( 1/n2 ) ] }

where: p - the pooled sample proportion
       n1 - the size of sample 1
       n2 - the size of sample 2

➢ Test Statistic. The test statistic is a z-score (z) defined by the following
equation.

z = ( p1 – p2 ) / SE

where: p1 – the proportion from sample 1
       p2 - the proportion from sample 2
       SE - the standard error of the sampling distribution

➢ P-value. The P-value is the probability of observing a sample statistic as
extreme as the test statistic.

The analysis described above is a two-proportion z-test.

4. Interpret results.
If the sample findings are unlikely, given the null hypothesis, the researcher
rejects the null hypothesis. Typically, this involves comparing the P-value to the
significance level, and rejecting the null hypothesis when the P-value is less than
the significance level.

Example:
Problem : Suppose a drug company develops a new drug designed to prevent colds. The
company states that the drug is equally effective for men and women. To test this
claim, they choose a simple random sample of 100 women and 200 men from a
population of 100,000 volunteers. At the end of the study, 38% of the women caught
a cold; and 51% of the men caught a cold. Based on these findings, can we reject
the company's claim that the drug is equally effective for men and women? Use a
0.05 level of significance.
Solution:

Step 1: State the hypotheses


Null hypothesis: P1 = P2
Alternative hypothesis: P1 ≠ P2

Note: These hypotheses constitute a two-tailed test. The null hypothesis will be rejected if
the proportion from population 1 is too big or if it is too small.


Step 2: Formulate an analysis plan.

significance level : 0.05


test method: two-proportion z-test.

Step 3: Analyze sample data.


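Using the sample data, we compute the pooled sample proportion (p), the standard
error (SE), and the z-score test statistic (z).

Pooled sample proportion:

p = (p1 ∙ n1 + p2 ∙ n2) / (n1 + n2)
p = [ (0.38)(100) + (0.51)(200) ] / (100 + 200)
p = 140 / 300
p = 0.467

Standard error (SE):

SE = √{ p ∙ (1 − p) ∙ [ (1/n1) + (1/n2) ] }
SE = √[ 0.467 ∙ (0.533) ∙ (0.01 + 0.005) ]
SE = √[ 0.2489 ∙ (0.015) ]
SE = √0.00373
SE = 0.061

Z score:

z = (p1 − p2) / SE
z = (0.38 − 0.51) / 0.061
z = −2.13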

Since we have a two-tailed test, the P-value is the probability that the z-score is less
than -2.13 or greater than 2.13. Thus, the P-value = 0.0166 + 0.0166 = 0.0332.


Note: From Table 3, when z = −2.13, the area in one tail is 0.0166.

Step 4: Interpret results.


Since the P-value (0.0332) is less than the significance level (0.05), we
reject the null hypothesis.

Note: If you use this approach on an exam, you may also want to mention why
this approach is appropriate. Specifically, the approach is appropriate because
the sampling method was simple random sampling, the samples were
independent, each population was at least 10 times larger than its sample, and
each sample included at least 10 successes and 10 failures.
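
The computations in this example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the function name two_proportion_z_test is made up for this note, and the P-value comes from the standard normal distribution in Python's statistics module rather than from Table 3, so the last digits may differ slightly from the table-based value.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(p1, n1, p2, n2):
    """Return (pooled p, SE, z, two-tailed P-value) for H0: P1 = P2."""
    # Pooled sample proportion: weighted average of the two sample proportions.
    p = (p1 * n1 + p2 * n2) / (n1 + n2)
    # Standard error of the difference between the two sample proportions.
    se = sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    # z-score test statistic.
    z = (p1 - p2) / se
    # Two-tailed P-value: probability of a z-score at least this extreme.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return p, se, z, p_value


# Example data: 38% of 100 women and 51% of 200 men caught a cold.
p, se, z, p_value = two_proportion_z_test(0.38, 100, 0.51, 200)
print(f"p = {p:.3f}, SE = {se:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```

Because the sketch does not round z to −2.13 before looking up the probability, its P-value is close to, but not exactly, the 0.0332 obtained from the table.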

Lesson 5.3.2 Hypothesis Test for a Mean

Hypothesis Test of a Mean will be conducted when the following conditions are met:
▪ The sampling method is simple random sampling.
▪ The sampling distribution is normal or nearly normal.

We can say that the sampling distribution will be approximately normally distributed if
any of the following conditions apply :
▪ The population distribution is normal.
▪ The population distribution is symmetric, unimodal, without outliers, and the sample
size is 15 or less.
▪ The population distribution is moderately skewed, unimodal, without outliers, and the
sample size is between 16 and 40.
▪ The sample size is greater than 40, without outliers.

This approach consists of four steps:

➢ State the hypotheses. Every hypothesis test requires the analyst to state a null
hypothesis and an alternative hypothesis. The table below may be used, where M is the
hypothesized value of the population mean.

Set    Null hypothesis    Alternative hypothesis    Tails
1      μ = M              μ ≠ M                     Two-tailed
2      μ ≥ M              μ < M                     One-tailed
3      μ ≤ M              μ > M                     One-tailed

➢ Formulate an Analysis Plan. The analysis plan describes how to use sample data
to accept or reject the null hypothesis. It should specify the following elements.

Significance level. Often, researchers choose significance levels equal to
0.01, 0.05, or 0.10; but any value between 0 and 1 can also be used.

Test method. Use the one-sample t-test to determine whether the
hypothesized mean differs significantly from the observed sample mean.

➢ Analyze Sample Data. Using sample data, conduct a one-sample t-test. This
involves:

Standard Error. Compute the standard error (SE) of the sampling distribution.

SE = s ∙ √{ (1/n) ∙ [ (N − n) / (N − 1) ] }

where: s - the standard deviation of the sample
N - the population size
n - the sample size

When the population size is much larger (at least 20 times larger) than the
sample size, the standard error can be approximated by:

SE = s / √n

Degrees of Freedom. The degrees of freedom (DF) are equal to the sample
size (n) minus one.

DF = n - 1.

Test Statistic. The test statistic is a t statistic (t) defined by the following
equation.

t = (x̄ − μ) / SE

where: t - the t statistic
x̄ - the sample mean
μ - the hypothesized population mean in the null hypothesis
SE - the standard error

P-value
P-value is the probability of observing a sample statistic as extreme as
the test statistic.

➢Interpret Results. This involves comparing the P-value to the significance level, and
rejecting the null hypothesis when the P-value is less than the significance level.

Example:
Problem :
An elementary school has 1000 students. The principal of the school thinks that the average
IQ of its students is at least 110. To prove her point, she administers an IQ test to 20
randomly selected students. Among the sampled students, the average IQ is 108 with a
standard deviation of 10. Based on these results, should the principal accept or reject her
original hypothesis? Assume a significance level of 0.01. (Assume that test scores in the
population of students are normally distributed.)

Solution:

Step 1: State the hypotheses. The first step is to state the null hypothesis and an alternative
hypothesis.

Null hypothesis: μ ≥ 110


Alternative hypothesis: μ < 110

Note: The hypotheses constitute a one-tailed test. The null hypothesis will be rejected
if the sample mean is too small.

Step 2: Formulate an analysis plan.

Significance level : 0.01


Test method: one-sample t-test.

Step 3: Analyze sample data.

Standard error (SE)

SE = s / √n
SE = 10 / √20
SE = 2.236

Degrees of freedom (DF) = n − 1
DF = 20 − 1
DF = 19

t-test statistic (t)

t = (x̄ − μ) / SE
t = (108 − 110) / 2.236
t = −0.894

Logic of the Analysis:


Given the alternative hypothesis (μ <110), we want to know whether the observed
sample mean is small enough to cause us to reject the null hypothesis.

The observed sample mean produced a t-test statistic of −0.894, and P(t < −0.894) =
0.19. This means we would expect to find a sample mean of 108 or smaller in 19 percent of
our samples, if the true population IQ were 110. Thus the P-value in this analysis is 0.19.

Step 4: Interpret results. Since the P-value (0.19) is greater than the significance level (0.01),
we cannot reject the null hypothesis.
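
The standard error, degrees of freedom, and t statistic of this example can be reproduced with a minimal Python sketch. The function name one_sample_t and its optional population_size argument are illustrative assumptions, not part of the lesson. The sketch covers both the finite-population form of the standard error, SE = s ∙ √{ (1/n) ∙ [ (N − n) / (N − 1) ] }, and the large-population approximation SE = s / √n used in the solution. Python's standard library has no t distribution, so the P-value still has to be read from a t table or obtained from statistical software.

```python
from math import sqrt


def one_sample_t(sample_mean, hypothesized_mean, s, n, population_size=None):
    """Return (SE, DF, t) for a one-sample t-test."""
    if population_size is None:
        # Large-population approximation: SE = s / sqrt(n).
        se = s / sqrt(n)
    else:
        # Finite-population form of the standard error.
        big_n = population_size
        se = s * sqrt((1 / n) * (big_n - n) / (big_n - 1))
    df = n - 1
    t = (sample_mean - hypothesized_mean) / se
    return se, df, t


# IQ example: sample mean 108, hypothesized mean 110, s = 10, n = 20.
se, df, t = one_sample_t(108, 110, 10, 20)
print(f"SE = {se:.3f}, DF = {df}, t = {t:.3f}")

# Same data with the finite-population form (N = 1000), for comparison.
se_fp, _, t_fp = one_sample_t(108, 110, 10, 20, population_size=1000)
print(f"finite-population SE = {se_fp:.3f}, t = {t_fp:.3f}")
```

With N = 1000 and n = 20 the two standard errors differ only slightly, which is why the approximation is acceptable here.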

Hypothesis Test : Difference between Means

This lesson explains how to conduct a hypothesis test for the difference between two
means. The test procedure, called the two-sample t-test, is appropriate when the following
conditions are met:
▪ The sampling method for each sample is simple random sampling.
▪ The samples are independent.
▪ Each population is at least 20 times larger than its respective sample.

Lesson 5.3.3 Z- test

A Z-test is any statistical test for which the distribution of the test statistic under the null
hypothesis can be approximated by a normal distribution. It is used for testing hypotheses
when the:
1. population standard deviation is known
2. sample size is at least 30

The one-sample z-test is used when there is only one sample in the experiment and both the
standard deviation and the hypothesized mean of the population are known. Likewise, the
two-sample z-test is used to compare the population means between two groups.

When the data are normally distributed, we follow these steps:

1. Formulate the null hypothesis and the alternative hypothesis.
2. Specify the level of significance; decide whether a two-tailed test or a one-tailed
test (right-tailed or left-tailed) shall be used; decide the test statistic to
be used; and find the critical value from TABLE 4.
3. Compute the value of the test statistic.
4. Graph the computed z value and the critical value, and make a decision.
5. State the conclusion.

Z- test Using One Sample Mean

Example 1:
Problem:
A researcher reported that the mean grade of Grade 11 students in statistics was 84%.
A random sample of 100 students showed a mean of 87% with a standard deviation of 4%. Is
there a significant difference between the reported mean and the sample mean? Use α = 0.05.
This problem may be solved in two ways: using either a two-tailed test or a
one-tailed test.

Solution 1: Using two-tailed test


Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ = 84
Ha: µ ≠ 84

Step 2: Specify the level of significance and decide whether two-tailed test, or one-
tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, and find the critical value from TABLE 4.

Level of significance: α = 0.05


Tailed test: two-tailed test
Test statistic: Z-test
Critical value: ±1.96 (taken from Z tabular value/ TABLE 4)

Step 3: Compute the value of the test statistic.

z = (x̄ − µ)√n / s

µ = 84
x̄ = 87
n = 100
s = 4

z = (87 − 84)√100 / 4
z = (3)(10) / 4
z = 7.5

Step 4: Graph computed z- value and critical value, and make a decision

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed z lies in the rejection region, the null hypothesis
is rejected.

Step 5: State the conclusion.


The sample does not support the reported mean grade of 84% for Grade 11 students. There is a
significant difference of 3 percentage points between the hypothesized mean and the sample mean.

Solution 2: Using one-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ = 84
Ha: µ >84 (since the sample mean is 87)

Step 2: Specify the level of significance and decide whether two-tailed test, or
one-tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, and find the critical value from TABLE 4.

Level of significance: α = 0.05


Tailed test: one-tailed test ( Right-tailed test )
Test statistic: Z-test
Critical value: + 1.645 (The value is taken from Z tabular value/ table 4, and
the positive value must be used since it conveys a right tailed test.)

Step 3: Compute the value of the test statistic.

z = (x̄ − µ)√n / s

µ = 84
x̄ = 87
n = 100
s = 4

z = (87 − 84)√100 / 4
z = (3)(10) / 4
z = 7.5

Step 4: Graph the computed z value and critical value, and make a decision

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed z lies in the rejection region, the null hypothesis
is rejected.

Step 5: State the conclusion.


The sample does not support the reported mean grade of 84% for Grade 11 students. The
sample mean is significantly higher, by 3 percentage points, than the hypothesized mean.
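
Both solutions of Example 1 can be checked with a small Python sketch. It is a sketch under stated assumptions: the helper names one_sample_z and critical_z are invented for this note, and the critical values are computed from the standard normal distribution in Python's statistics module instead of being read from TABLE 4.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(sample_mean, hypothesized_mean, s, n):
    """z = (sample mean - hypothesized mean) * sqrt(n) / s."""
    return (sample_mean - hypothesized_mean) * sqrt(n) / s


def critical_z(alpha, two_tailed):
    """Positive critical value of the standard normal distribution."""
    # A two-tailed test splits alpha between the two tails.
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)


z = one_sample_z(87, 84, 4, 100)
two_tailed_cv = critical_z(0.05, two_tailed=True)     # about 1.96
right_tailed_cv = critical_z(0.05, two_tailed=False)  # about 1.645

print(f"z = {z:.2f}")
print("two-tailed:", "reject H0" if abs(z) > two_tailed_cv else "fail to reject H0")
print("right-tailed:", "reject H0" if z > right_tailed_cv else "fail to reject H0")
```

The same two helpers apply to any one-sample z-test; only the sample mean, hypothesized mean, standard deviation, sample size, and α change.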

Example 2:
Problem :
A supermarket owner believes that the mean family income of its customers is Php
45,000 per month. Forty-nine customers were randomly selected and asked their family income.
The sample mean was Php 42,200 per month and the standard deviation was Php 2,800. Is there
enough evidence to say that the mean family income differs from Php 45,000 per month at the
1% significance level?

Solution:

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ = 45,000

Ha: µ ≠45,000

Step 2: Specify the level of significance and decide whether two-tailed test, or
one-tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, and find the critical value from table 4.

Level of significance: α = 0.01

Tailed test: two-tailed test

Test statistic: Z-test

Critical value: ±2.575 (The value is taken from Z tabular value/ table 4)

Step 3: Compute the value of the test statistic.

z = (x̄ − µ)√n / s

µ = 45,000
x̄ = 42,200
n = 49
s = 2,800

z = (42,200 − 45,000)√49 / 2,800
z = (−2,800)(7) / 2,800
z = −7

Step 4: Graph the z value and critical value, and make a decision.

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed z lies in the rejection region, the null hypothesis is
rejected.

Step 5: State the conclusion.


There is a significant difference between the hypothesized and
sample mean.

Z - Test Using Two Sample Means

Example 3:
Problem :
The average lifetimes of 120 Brand X AA batteries and 120 Brand Y AA batteries were
found to be 9.1 hours and 9.6 hours, respectively. Suppose the population standard deviations
of the lifetimes are 1.9 hours for Brand X and 2.1 hours for Brand Y. Test the hypothesis that
the two brands have the same mean lifetime using α = 0.05.

Solution 1: Using two-tailed test


Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ1 = µ2

Ha: µ1≠ µ2

Step 2: Specify the level of significance and decide whether two-tailed test or one-
tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, and find the critical value from Table 4.

Level of significance: α = 0.05


Tailed test: two-tailed test


Test statistic: Z-test
Critical value: ±1.96 (taken from Z tabular value/ Table 4)

Step 3: Compute the value of the test statistic.

z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )

This formula is used when there are two sample means.

x̄1 = 9.1
x̄2 = 9.6
n1 = 120
n2 = 120
σ1 = 1.9
σ2 = 2.1

z = (9.1 − 9.6) / √( (1.9)²/120 + (2.1)²/120 )
z = −0.5 / √(0.030 + 0.037)
z = −1.93

Step 4: Graph the computed z value and critical value, and make a decision

[Figure: standard normal curve with critical values −1.96 and +1.96; the computed
z = −1.93 falls between them.]

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed z lies in the acceptance region, the null hypothesis
is not rejected.

Step 5: State the conclusion.


There is no significant difference between the mean lifetimes of Brand X and Brand Y batteries.

Solution 2: Using one-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ1 = µ2 ; Ha: µ1< µ2

Step 2: Specify the level of significance and decide whether two-tailed test or
one-tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, and find the critical value from Table 4.

Level of significance: α = 0.05


Tailed test: one-tailed test ( Left-tailed test )
Test statistic: Z-test
Critical value: -1.645 (taken from Z tabular value/ table 4 and we
use negative since it was a left-tailed test)
Step 3: Compute the value of the test statistic.

z = (x̄1 − x̄2) / √( σ1²/n1 + σ2²/n2 )

This formula is used when there are two sample means.

x̄1 = 9.1
x̄2 = 9.6
n1 = 120
n2 = 120
σ1 = 1.9
σ2 = 2.1

z = (9.1 − 9.6) / √( (1.9)²/120 + (2.1)²/120 )
z = −0.5 / √(0.030 + 0.037)
z = −1.93


Step 4: Graph the computed z value and critical value, and make a decision

[Figure: standard normal curve with critical value −1.645; the computed z = −1.93
falls to its left.]

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed z (−1.93) is less than the critical value (−1.645), it lies in the
rejection region; therefore, the null hypothesis is rejected.

Step 5: State the conclusion.

At α = 0.05, the mean lifetime of Brand X batteries is significantly less than
that of Brand Y batteries.
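
The two-sample z statistic of Example 3 and both decisions can be checked with a minimal Python sketch. The function name two_sample_z is an illustrative assumption, and the critical values are computed from the standard normal distribution in Python's statistics module rather than read from Table 4.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """z = (mean1 - mean2) / sqrt(sd1^2 / n1 + sd2^2 / n2)."""
    return (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)


# Battery example: Brand X (9.1 h, sd 1.9) versus Brand Y (9.6 h, sd 2.1).
z = two_sample_z(9.1, 9.6, 1.9, 2.1, 120, 120)

lower_two_tailed = NormalDist().inv_cdf(0.025)  # about -1.96
lower_one_tailed = NormalDist().inv_cdf(0.05)   # about -1.645

print(f"z = {z:.2f}")
print("two-tailed:", "reject H0" if abs(z) > abs(lower_two_tailed) else "fail to reject H0")
print("left-tailed:", "reject H0" if z < lower_one_tailed else "fail to reject H0")
```

The computed z of about −1.93 lies between −1.96 and −1.645, so at α = 0.05 it falls in the acceptance region of the two-tailed test but in the rejection region of the left-tailed test.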

Lesson 5.3.4 T-test

A t-test is an analysis of two population means through the use of statistical
examination; a t-test with two samples is commonly used with small sample sizes, testing
the difference between the samples when the variances of two normal distributions are not
known.
A t-test looks at the t-statistic, the t-distribution and degrees of freedom to determine
the probability of difference between populations; the test statistic in the test is known as the
t-statistic. We may use the t-test if

• the population variance is unknown, and therefore has to be estimated from
the sample itself
• the sample size is less than 30 (n < 30)


To carry out a t-test, we follow these steps:

1. Formulate the null hypothesis and the alternative hypothesis.
2. Specify the level of significance; decide whether a two-tailed test or a one-tailed
test (right-tailed or left-tailed) shall be used; decide the test statistic to
be used; find the degrees of freedom; and find the critical value from TABLE 5.

Degrees of freedom = n − 1 (for one sample mean)
Degrees of freedom = (n1 + n2) − 2 (for two sample means)

3. Compute the value of the test statistic.
4. Graph the computed t value and the critical value, and make a decision.
5. State the conclusion.

T-test Using One Sample Mean

Example 1:
Problem:
A chemical company alleged that the average weight of its bag of chemical is 50 kg.
A sample of 25 bags was taken and revealed a mean weight of 48.1 kg with a standard
deviation of 0.9 kg. If the significance level is 1%, is there a significant difference between
the alleged mean weight and the sample mean weight of the chemical bags?

Solution 1: Using one-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ = 50 ; Ha: µ <50

Step 2: Specify the level of significance and decide whether two tailed test, or one
tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, find the degrees of freedom, and find the critical value
from Table 5.

Level of significance: α = 0.01


Tailed test: one-tailed test (left-tailed test)
Test statistic: t-test
Degrees of freedom = (n − 1) (for one sample mean)
= (25 − 1)
= 24

Critical value: −2.492 (taken from Table 5)


Step 3: Compute the value of the test statistic.

t = (x̄ − µ)√n / s

This formula is used when there is only one sample mean.

x̄ = 48.1
µ = 50
n = 25
s = 0.9

t = (48.1 − 50)√25 / 0.9
t = (−1.9)(5) / 0.9
t = −10.56

Step 4: Graph the computed t value and critical value, and make a decision

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed t lies in the rejection region, the null hypothesis is
rejected.

Step 5: State the conclusion.


There is a significant difference between the hypothesized and
sample mean.

Solution 2: Using two-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ = 50 ; Ha: µ ≠50


Step 2: Specify the level of significance and decide whether two tailed test, or one tailed
test (right-tailed test or left-tailed test) shall be used, and decide the test statistic
to be used, find the degrees of freedom, and find the critical value from table 5.

Level of significance: α = 0.01


Tailed test: two-tailed test
Test statistic: t-test
Degrees of freedom = (n − 1) (for one sample mean)
= (25 − 1)
= 24

Critical value: ±2.797 (taken from Table 5)

Step 3: Compute the value of the test statistic.

t = (x̄ − µ)√n / s

This formula is used when there is only one sample mean.

x̄ = 48.1
µ = 50
n = 25
s = 0.9

t = (48.1 − 50)√25 / 0.9
t = (−1.9)(5) / 0.9
t = −10.56

Step 4: Graph the computed t value and critical value, and make a decision

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed t lies in the rejection region, the null hypothesis is
rejected.

Step 5: State the conclusion.


There is a significant difference between the hypothesized and the
sample mean.
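
The t statistic and degrees of freedom of Example 1 can be checked with a short Python sketch. The function name one_sample_t_statistic and the constant names are illustrative assumptions. Python's standard library has no t distribution, so the critical values for 24 degrees of freedom at α = 0.01 are entered by hand from a standard t table.

```python
from math import sqrt


def one_sample_t_statistic(sample_mean, hypothesized_mean, s, n):
    """Return (t, df) with t = (sample mean - hypothesized mean) * sqrt(n) / s."""
    df = n - 1
    t = (sample_mean - hypothesized_mean) * sqrt(n) / s
    return t, df


# Chemical bag example: sample mean 48.1, hypothesized mean 50, s = 0.9, n = 25.
t, df = one_sample_t_statistic(48.1, 50, 0.9, 25)

# Critical values for df = 24 at alpha = 0.01, read from a standard t table.
LEFT_TAILED_CV = -2.492  # one-tailed, left
TWO_TAILED_CV = 2.797    # two-tailed, used as plus or minus

print(f"t = {t:.2f}, df = {df}")
print("left-tailed:", "reject H0" if t < LEFT_TAILED_CV else "fail to reject H0")
print("two-tailed:", "reject H0" if abs(t) > TWO_TAILED_CV else "fail to reject H0")
```

The computed t is so far below either critical value that the decision to reject the null hypothesis does not depend on which of the two tests is chosen.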

T-test Using Two Sample Means

Example 2:
Problem :
A study was conducted to examine the relationship between attitudes towards
mathematics and success at college-level mathematics. Twenty-two men and twenty women
were identified as being at high risk of failure. The students were asked to respond to a series
of questions, and their answers were used to obtain a math anxiety score. Summary values
appear in the table below. Test the hypothesis using a 0.05 level of significance.

Gender    n     x̄       s
Male      22    40.8    9.3
Female    20    37.5    10.2

Solution 1: Using one-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.

Ho: µ1 = µ2 ; Ha: µ1> µ2

Step 2: Specify the level of significance and decide whether two tailed test, or one
tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, find the degrees of freedom, and find the critical value
from table 5.

Level of significance: α = 0.05
Tailed test: one-tailed test (right-tailed test)
Test statistic: t-test
Degrees of freedom = (n1 + n2) − 2 (for two sample means)
= (22 + 20) − 2
= 42 − 2
= 40

Critical value: +1.684 (taken from Table 5; the positive value
is used since it is a right-tailed test)


Step 3: Compute the value of the test statistic.

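Formula for two sample means:

t = (x̄1 − x̄2) / √{ [ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ] ∙ [ (n1 + n2) / (n1 ∙ n2) ] }

x̄1 = 40.8
x̄2 = 37.5
n1 = 22
n2 = 20
s1 = 9.3
s2 = 10.2

t = (40.8 − 37.5) / √{ [ ((22 − 1)(9.3)² + (20 − 1)(10.2)²) / (22 + 20 − 2) ] ∙ [ (22 + 20) / ((22)(20)) ] }
t = 3.3 / √[ (94.83)(0.0955) ]
t = 3.3 / 3.01
t = 1.1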

Step 4: Graph computed t value and critical value, and make a decision .

[Figure: t distribution curve with critical value +1.684; the computed t = 1.1
falls to its left.]

Since the computed t lies in the acceptance region, the null hypothesis
is not rejected.

Step 5: State the conclusion.


There is not enough evidence that the mean math anxiety score of the men is
higher than that of the women.

Solution 2: Using two-tailed test

Step 1: Formulate the null hypothesis and the alternative hypothesis.


Ho: µ1 = µ2
Ha: µ1 ≠ µ2

Step 2: Specify the level of significance and decide whether two tailed test, or one
tailed test (right-tailed test or left-tailed test) shall be used, and decide the test
statistic to be used, find the degrees of freedom, and find the critical value
from table 5.

Level of significance: α = 0.05
Tailed test: two-tailed test
Test statistic: t-test
Degrees of freedom = (n1 + n2) − 2 (for two sample means)
= (22 + 20) − 2
= 42 − 2
= 40

Critical value: ±2.021 (taken from Table 5; one is positive and the other
is negative since it is two-tailed)

Step 3: Compute the value of the test statistic.

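Formula for two sample means:

t = (x̄1 − x̄2) / √{ [ ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2) ] ∙ [ (n1 + n2) / (n1 ∙ n2) ] }

x̄1 = 40.8
x̄2 = 37.5
n1 = 22
n2 = 20
s1 = 9.3
s2 = 10.2

t = (40.8 − 37.5) / √{ [ ((22 − 1)(9.3)² + (20 − 1)(10.2)²) / (22 + 20 − 2) ] ∙ [ (22 + 20) / ((22)(20)) ] }
t = 3.3 / √[ (94.83)(0.0955) ]
t = 3.3 / 3.01
t = 1.1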

Step 4: Graph the computed t value and critical value, and make a decision.

[Figure: t distribution curve with critical values −2.021 and +2.021; the computed
t = 1.1 falls between them.]

Note: The shaded part is the rejection region and the unshaded part is the
acceptance region.

Since the computed t lies in the acceptance region, the null hypothesis
is not rejected.

Step 5: State the conclusion.


There is no significant difference between the mean math anxiety scores of
the men and the women.
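
The pooled two-sample t statistic of Example 2 can be checked with a minimal Python sketch. The function name pooled_two_sample_t is an illustrative assumption. As before, the critical values for 40 degrees of freedom are taken from the t table, because Python's standard library has no t distribution.

```python
from math import sqrt


def pooled_two_sample_t(mean1, mean2, s1, s2, n1, n2):
    """Return (t, df) for the pooled-variance two-sample t-test."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    # Standard error of the difference between the two sample means.
    se = sqrt(pooled_variance * (n1 + n2) / (n1 * n2))
    return (mean1 - mean2) / se, df


# Math anxiety example: men (n = 22) versus women (n = 20).
t, df = pooled_two_sample_t(40.8, 37.5, 9.3, 10.2, 22, 20)

# Critical values for df = 40 at alpha = 0.05, read from the t table.
RIGHT_TAILED_CV = 1.684
TWO_TAILED_CV = 2.021

print(f"t = {t:.2f}, df = {df}")
print("right-tailed:", "reject H0" if t > RIGHT_TAILED_CV else "fail to reject H0")
print("two-tailed:", "reject H0" if abs(t) > TWO_TAILED_CV else "fail to reject H0")
```

Writing the pooled variance as a separate step makes it easier to see that the formula is the difference of the sample means divided by its standard error.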


CHAPTER TEST

Apply the appropriate hypothesis-testing steps and procedures to the following research
problems.

1. A supermarket owner believes that the mean income of its customers is Php 50,000 per
month. One hundred customers are randomly selected and asked their monthly income.
The sample mean is Php 48,500 per month and the standard deviation is Php 3,200. Is there
sufficient evidence to indicate that the mean income of the customers of the supermarket is
Php 50,000 per month? Use α = 0.05.


2. It is reported that the average monthly salary of accounting graduates in the accounting
field is Php 18,000. A dean of a certain university conducted a survey of 60 accounting
graduates and found their average salary to be Php 20,500 per month with a standard deviation
of Php 1,500 per month. Using α = 0.05, is there a significant difference between the
reported average salary and the average salary found in the survey?


3. A prospective MBA student was asked to estimate the difference in the monthly salaries of
professors in private and state colleges. He claimed that the difference in the starting
salaries of MBA graduates of the two colleges was relevant. An independent study of
simple random samples of the most recent MBA graduates of both colleges revealed the
following statistics:

Colleges    Mean          Standard Deviation    Sample Size
Private     Php 35,000    Php 1,800             53
State       Php 32,000    Php 1,650             49


4. A distributor claims that the average strength of the brand X rope exceeds the average
strength of the brand Y rope. To test this claim, 25 pieces of each brand are tested under
similar conditions. Brand X had an average strength of 90.7 kilograms with a standard
deviation of 7.82 kilograms, while brand Y had an average strength of 93.7 kilograms with
a standard deviation of 6.75 kilograms. Test whether the claim of the distributor is correct
at the 5% level of significance.


5. Job satisfaction as a function of work schedule was investigated in different factories. In
the first factory, the employees are on a fixed shift system, while in the second factory, the
workers have a rotating shift system. Using the data in the table below, determine if there is
a significant difference in job satisfaction between the two groups of workers. Use α = 0.01.

Shift Schedule    Mean    Standard deviation    Sample size
Fixed             7.43    2.42                  23
Rotating          6.18    2.15                  29


CHAPTER 6

CORRELATION AND REGRESSION ANALYSIS

Objectives
The learner should be able to
1. construct a scatter plot,
2. describe shape, trend, and variation based on scatter plot,
3. estimate strength of association between the variables based on scatter plot,
4. calculate the Pearson’s sample correlation coefficient,
5. solve problem involving correlation analysis,
6. identify the independent and dependent variables,
7. draw the best-fit line on a scatter plot,
8. calculate the slope and y-intercept of the regression line,
9. predict the value of the dependent variable given the value of the independent
variable, and
10. solve problems involving regression analysis.

Correlation analysis is used to quantify the association between two continuous
variables (e.g., between an independent and a dependent variable or between two
independent variables).
Regression analysis is a related technique to assess the relationship between an
outcome variable and one or more risk factors or confounding variables.
The outcome variable is also called the response or dependent variable and the risk
factors and confounders are called the predictors, or explanatory or independent variables.
In regression analysis, the dependent variable is denoted "y" and the independent
variables are denoted by "x".
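
As a preview of these two techniques, the following Python sketch computes Pearson's sample correlation coefficient and the least-squares regression line for a small data set. It is only an illustration: the data (hours of revision and test marks for five students) are hypothetical values made up for this sketch, and the function names pearson_r and least_squares_line are assumptions rather than part of the module.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's sample correlation coefficient between xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the regression line y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: x = hours of revision, y = test mark.
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 70, 74]

r = pearson_r(hours, marks)
slope, intercept = least_squares_line(hours, marks)
print(f"r = {r:.3f}")
print(f"predicted mark = {intercept:.1f} + {slope:.1f} * hours")
```

Here the independent variable x is the hours of revision and the dependent variable y is the test mark; a value of r close to +1 indicates a strong positive linear association.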

Lesson 6.1 Types of Variables

Ambiguities in Classifying a Type of Variable

In some cases, the measurement scale for data is ordinal, but the variable is treated as
continuous. For example, a Likert scale that contains five values - strongly agree, agree,
neither agree nor disagree, disagree, and strongly disagree - is ordinal. However, where a
Likert scale contains seven or more values - strongly agree, moderately agree, agree, neither
agree nor disagree, disagree, moderately disagree, and strongly disagree - the underlying
scale is sometimes treated as continuous (although where you should do this is a cause of
great dispute).
It is worth noting that how we categorize variables is somewhat of a choice. Whilst we
categorized gender as a dichotomous variable (you are either male or female), social scientists
may disagree with this, arguing that gender is a more complex variable involving more than
two distinctions, but also including measurement levels like gender queer, intersex and
transgender. At the same time, some researchers would argue that a Likert scale, even with
seven values, should never be treated as a continuous variable.


Dependent and Independent Variables

Independent Variable
Sometimes called an experimental or predictor variable, it is a variable that is being
manipulated in an experiment in order to observe the effect on a dependent variable.

Dependent Variable
It is sometimes called an outcome variable. The dependent variable is simply a
variable that is dependent on an independent variable(s).

All experiments examine some kind of variable(s). A variable is not only something that
we measure, but also something that we can manipulate and something we can control for.

The dependent variable is just what its name suggests: it depends upon some factor
that you, the researcher, control. For example:

• How well you perform in a race depends on your training.
• How much you weigh depends on your diet.
• How much you earn depends upon the number of hours you work.

Whatever event you are expecting to change is always the dependent variable. In
the first example above, race performance is the variable you would expect to change if you
changed your training, so that is the dependent variable. In the second example, the dependent
variable is weight, and in the third example the dependent variable is the amount earned.

Example:
Imagine that a tutor asks 100 students to complete a math test. The tutor wants to
know why some students perform better than others. Whilst the tutor does not know the
answer to this, she thinks that it might be because of two reasons: (1) some students spend
more time revising for their test; and (2) some students are naturally more intelligent than
others. As such, the tutor decides to investigate the effect of revision time and intelligence on
the test performance of the 100 students. The dependent and independent variables for the
study are:
Dependent Variable: Test Mark (measured from 0 to 100)
Independent Variables: Revision time (measured in hours) Intelligence (measured
using IQ score)

Categorical and Continuous Variables

Categorical variables are also known as discrete or qualitative variables. Categorical
variables can be further categorized as nominal, ordinal or dichotomous.

• Nominal variables are variables that have two or more categories, but which do not
have an intrinsic order. For example, a real estate agent could classify their types of
property into distinct categories such as houses, condos, co-ops or bungalows. So
"type of property" is a nominal variable with 4 categories called houses, condos, co-ops
and bungalows. Of note, the different categories of a nominal variable can also be
referred to as groups or levels of the nominal variable. Another example of a nominal
variable would be classifying where people live in the USA by state. In this case there
will be many more levels of the nominal variable (50 in fact).
• Dichotomous variables are nominal variables which have only two categories or levels.
For example, if we were looking at gender, we would most probably categorize
somebody as either "male" or "female". This is an example of a dichotomous variable
(and also a nominal variable). Another example might be if we asked a person if they
owned a mobile phone. Here, we may categorise mobile phone ownership as either
"Yes" or "No". In the real estate agent example, if type of property had been classified
as either residential or commercial then "type of property" would be a dichotomous
variable.
• Ordinal variables are variables that have two or more categories just like nominal
variables, except that the categories can also be ordered or ranked. So if you asked someone
if they liked the policies of the Democratic Party and they could answer either "Not very
much", "They are OK" or "Yes, a lot" then you have an ordinal variable. Why? Because
you have 3 categories, namely "Not very much", "They are OK" and "Yes, a lot" and
you can rank them from the most positive (Yes, a lot), to the middle response (They
are OK), to the least positive (Not very much). However, whilst we can rank the levels,
we cannot place a "value" on them; we cannot say that "They are OK" is twice as
positive as "Not very much" for example.

Continuous variables are also known as quantitative variables. Continuous variables can
be further categorized as either interval or ratio variables.

• Interval variables are variables whose central characteristic is that they can be
measured along a continuum and they have a numerical value (for example,
temperature measured in degrees Celsius or Fahrenheit). So the difference between
20°C and 30°C is the same as that between 30°C and 40°C. However, temperature measured
in degrees Celsius or Fahrenheit is NOT a ratio variable.
• Ratio variables are interval variables, but with the added condition that 0 (zero) of the
measurement indicates that there is none of that variable. So, temperature measured
in degrees Celsius or Fahrenheit is not a ratio variable because 0C does not mean
there is no temperature. However, temperature measured in Kelvin is a ratio variable
as 0 Kelvin (often called absolute zero) indicates that there is no temperature
whatsoever. Other examples of ratio variables include height, mass, distance and
many more. The name "ratio" reflects the fact that you can use the ratio of
measurements. So, for example, a distance of ten meters is twice the distance of 5
meters.

Experimental and Non-Experimental Research

Experimental research: In experimental research, the aim is to manipulate an
independent variable(s) and then examine the effect that this change has on a dependent
variable(s). Since it is possible to manipulate the independent variable(s), experimental
research has the advantage of enabling a researcher to identify a cause and effect between
variables. For example, take 100 students completing a math exam where the dependent
variable is the exam mark (measured from 0 to 100), and the independent variables are
revision time (measured in hours) and intelligence (measured using IQ score). Here, it would be
possible to use an experimental design and manipulate the revision time of the students. The
tutor could divide the students into two groups, each made up of 50 students. In "group one",
the tutor could ask the students not to do any revision. Alternatively, "group two" could be asked
to do 20 hours of revision in the two weeks prior to the test. The tutor could then compare the
marks that the students achieved.

Non-experimental research: In non-experimental research, the researcher does not
manipulate the independent variable(s). This is not to say that it is impossible to do so, but it will
either be impractical or unethical to do so. For example, a researcher may be interested in the
effect of illegal, recreational drug use (the independent variable(s)) on certain types of behavior
(the dependent variable(s)). However, while possible, it would be unethical to ask individuals to
take illegal drugs in order to study what effect this had on certain behaviors. As such, a
researcher could ask both drug and non-drug users to complete a questionnaire that had been
constructed to indicate the extent to which they exhibited certain behaviors. While it is not
possible to identify the cause and effect between the variables, we can still examine the
association or relationship between them. In addition to understanding the difference between
dependent and independent variables, and experimental and non-experimental research, it is
also important to understand the different characteristics amongst variables.

Lesson 6.2 Nature of Bivariate Analysis

Bivariate analysis means the analysis of bivariate data. It is one of the simplest forms
of statistical analysis used to determine if there is a relationship between two sets of values. It
usually involves the variables X and Y.

• Univariate analysis is the analysis of one (“uni”) variable.
• Bivariate analysis is the analysis of exactly two variables.
• Multivariate analysis is the analysis of more than two variables.

The results from bivariate analysis can be stored in a two-column data table.

Example:

You might want to find out the relationship between the age of the students and their
academic achievement. The age would be your independent variable, X and the academic
achievement would be your dependent variable, Y.

Student     Age     Academic Achievement
1           15      85
2           16      86
3           14      89
4           15      84
5           18      79


Lesson 6.3 Scatter plot

Scatter Plot
It is a type of plot or mathematical diagram using Cartesian coordinates to display values
for typically two variables for a set of data. If the points are color-coded, one additional variable
can be displayed. The data is displayed as a collection of points, each having the value of one
variable determining the position on the horizontal axis and the value of the other variable
determining the position on the vertical axis.

A scatter plot can be used either when one continuous variable is under the control
of the experimenter and the other depends on it, or when both continuous variables are
independent. If a parameter exists that is systematically incremented and/or decremented by
the other, it is called the control parameter or independent variable and is customarily plotted
along the horizontal axis. The measured or dependent variable is customarily plotted along the
vertical axis.
A scatter plot can suggest various kinds of correlations between variables with a
certain confidence interval.

Example:
Plotting Weight vs. Height. Weight would be on the y-axis and height would be on the
x-axis. Correlations may be positive (rising), negative (falling), or null (uncorrelated).

Pattern of dots

We can consider that there is a:

Positive correlation if the pattern of dots slopes from lower left to upper right.


Negative correlation if the pattern of dots slopes from upper left to lower right.

No correlation if the pattern of dots has no definite slope.

Lesson 6.4 Best-Fit Line

A line of best fit (or "trend" line) is a straight line that best represents the data on a
scatter plot. This line may pass through some of the points, none of the points, or all of the
points. A line of best fit can be drawn in order to study the relationship between the variables.
An equation for the correlation between the variables can be determined by established best-fit
procedures. For a linear correlation, the best-fit procedure is known as linear regression and is
guaranteed to generate a correct solution in a finite time. No universal best-fit procedure is
guaranteed to generate a correct solution for arbitrary relationships. A scatter plot is also very
useful when we wish to see how two comparable data sets agree with each other. In this case,
an identity line, i.e., a y = x line, or a 1:1 line, is often drawn as a reference. The more the
two data sets agree, the more the scatters tend to concentrate in the vicinity of the identity line;
if the two data sets are numerically identical, the scatters fall on the identity line exactly.


Lesson 6.5 Pearson’s Correlation Coefficient

Correlation between sets of data is a measure of how well they are related. The most
common measure of correlation in statistics is the Pearson Correlation. The full name is the
Pearson Product Moment Correlation or PPMC. It shows the linear relationship between
two sets of data. In simple terms, it answers the question: Can I draw a line graph to
represent the data? Two letters are used to represent the Pearson correlation: the Greek letter
rho (ρ) for a population and the letter “r” for a sample. It tells you whether there is a relationship
between the variables. To compute the value of the Pearson Correlation, we have the formulas:

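Formula 1:

$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where: r - the Pearson correlation coefficient
n - the number of paired observations
x - the independent variable
y - the dependent variable

Formula 2:

$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$

where: $\bar{x}$ - mean of the x-values
$\bar{y}$ - mean of the y-values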
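
The two formulas can be checked against each other numerically. The following is a minimal Python sketch, assuming Formula 1 is the usual column-total form (built from n, Σx, Σy, Σxy, Σx², Σy²) and Formula 2 is the deviation form. The function names are illustrative only, and the data are the age and academic achievement values from the example in Lesson 6.2.

```python
from math import sqrt


def pearson_from_sums(x, y):
    """Column-total form: uses n, sum(x), sum(y), sum(xy), sum(x^2), sum(y^2)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def pearson_from_deviations(x, y):
    """Deviation form: uses the distances of each value from its mean."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    numerator = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    denominator = sqrt(
        sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y)
    )
    return numerator / denominator


# Age (x) and academic achievement (y) of the five students in Lesson 6.2
ages = [15, 16, 14, 15, 18]
achievement = [85, 86, 89, 84, 79]

print(round(pearson_from_sums(ages, achievement), 4))
print(round(pearson_from_deviations(ages, achievement), 4))
```

Both functions should print the same value, since the two formulas are algebraically equivalent. The column-total form is usually easier to use with a hand-made table.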

The result will be between -1 and 1. You will rarely see exactly 0, -1 or 1 as a result; you will
usually get a number somewhere in between those values. The closer the value of r gets to zero,
the greater the variation of the data points around the line of best fit. To interpret the obtained
results, the table below may be used.

Possible result     Interpretation
0.5 to 1.0          High positive correlation
0.3 to 0.5          Medium positive correlation
0.01 to 0.3         Low positive correlation
-0.5 to -1.0        High negative correlation
-0.3 to -0.5        Medium negative correlation
-0.01 to -0.3       Low negative correlation
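
The interpretation table can be written as a small lookup. This is only a sketch under two assumptions that the table itself does not state: a value that falls exactly on a shared boundary (0.3 or 0.5) is given the stronger label, and a value whose magnitude is below 0.01 is reported as no correlation. The function name interpret_r is hypothetical.

```python
def interpret_r(r):
    """Return the label from the interpretation table for a Pearson r value."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and 1")
    strength = abs(r)
    if strength >= 0.5:        # assumption: boundary 0.5 counts as High
        label = "High"
    elif strength >= 0.3:      # assumption: boundary 0.3 counts as Medium
        label = "Medium"
    elif strength >= 0.01:
        label = "Low"
    else:                      # assumption: below 0.01 is treated as none
        return "No correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(interpret_r(0.5108))
print(interpret_r(-0.4241))
```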


Example 1:
Problem :
Researchers want to know if there is a significant relationship between the ages of persons
and their glucose levels. They use six (6) persons as their sample and obtain the data below.

Samples Age (s) Glucose Level


1 40 98
2 25 59
3 36 83
4 45 70
5 50 90
6 61 85

Solution :

Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².

Samples   Age (x)   Glucose Level (y)   xy   x²   y²

1 40 98
2 25 59
3 36 83
4 45 70
5 50 90
6 61 85

Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 40 × 98 =
3920.

Samples   Age (x)   Glucose Level (y)   xy   x²   y²

1 40 98 3920
2 25 59 1475
3 36 83 2988
4 45 70 3150
5 50 90 4500
6 61 85 5185


Step 3: Take the square of the numbers in the x column, and put the result in the x² column.

Samples   Age (x)   Glucose Level (y)   xy   x²   y²

1 40 98 3920 1600
2 25 59 1475 625
3 36 83 2988 1296
4 45 70 3150 2025
5 50 90 4500 2500
6 61 85 5185 3721

Step 4: Take the square of the numbers in the y column, and put the result in the y² column.

Samples   Age (x)   Glucose Level (y)   xy   x²   y²

1 40 98 3920 1600 9604
2 25 59 1475 625 3481
3 36 83 2988 1296 6889
4 45 70 3150 2025 4900
5 50 90 4500 2500 8100
6 61 85 5185 3721 7225

Step 5: Add up all of the numbers in the columns and put the result in the bottom row. The
Greek letter sigma (Σ) is a short way of saying “sum of” or “summation of”.

Samples   Age (x)   Glucose Level (y)   xy   x²   y²

1 40 98 3920 1600 9604
2 25 59 1475 625 3481
3 36 83 2988 1296 6889
4 45 70 3150 2025 4900
5 50 90 4500 2500 8100
6 61 85 5185 3721 7225

∑ 257 485 21218 11767 40199


Step 6: Substitute the values obtained into the formula and compute.

The result is 0.5108, which means the variables have a high positive correlation.
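
The substitution can be written out as follows. This is a worked sketch that assumes the column-total form of the Pearson formula, which is the form the xy, x² and y² columns are built for, with n = 6 and the totals from Step 5.

```latex
r = \frac{n\sum xy - \sum x \sum y}
         {\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]
                \left[n\sum y^2 - \left(\sum y\right)^2\right]}}
  = \frac{6(21218) - (257)(485)}
         {\sqrt{\left[6(11767) - 257^2\right]\left[6(40199) - 485^2\right]}}
  = \frac{127308 - 124645}{\sqrt{(4553)(5969)}}
  = \frac{2663}{\sqrt{27176857}}
  \approx 0.5108
```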

Example 2:
Problem :
Calculate the Pearson correlation coefficient of the obtained scores by 5 students in
algebra and trigonometry as given below:

Algebra 15 16 12 10 8
Trigonometry 18 11 10 20 17

Solution:
Complete the table by following steps 1 to 5.
x       y       xy      x²      y²
15      18      270     225     324
16      11      176     256     121
12      10      120     144     100
10      20      200     100     400
8       17      136     64      289
∑x = 61   ∑y = 76   ∑xy = 902   ∑x² = 789   ∑y² = 1234


Step 6: Substitute the values obtained into the formula and compute.

The result is – 0.4241, which means the variables have a medium negative correlation.
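
The arithmetic of Step 6 can be checked with a few lines of Python. This sketch assumes the column-total form of the Pearson formula and takes n = 5 and the totals directly from the completed table; the variable names are illustrative.

```python
from math import sqrt

# Totals from the completed table for Example 2 (n = 5 students)
n = 5
sum_x, sum_y = 61, 76
sum_xy, sum_x2, sum_y2 = 902, 789, 1234

numerator = n * sum_xy - sum_x * sum_y        # 5(902) - (61)(76) = -126
x_part = n * sum_x2 - sum_x ** 2              # 5(789) - 61^2 = 224
y_part = n * sum_y2 - sum_y ** 2              # 5(1234) - 76^2 = 394
r = numerator / sqrt(x_part * y_part)

print(round(r, 4))  # -0.4241
```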


Activity:

1. Evaluate the Pearson correlation coefficient of the following values for x and y:

x 5 6 4 2
y 7 3 9 8

2. The scores of 6 pupils in two subjects, Physics and Chemistry, are given below.
Calculate the coefficient of correlation.

Score Pupil A Pupil B Pupil C Pupil D Pupil E Pupil F

Chemistry 45 53 67 40 35 50

Physics 68 76 70 64 54 66


Lesson 6.6 Regression

Simple regression is used to examine the relationship between one dependent and one
independent variable. After performing an analysis, the regression statistics can be used to
predict the dependent variable when independent variable is known. Regression goes beyond
correlation by adding prediction capabilities.
People use regression on an intuitive level every day, such as:
• in business, a well-dressed man is thought to be financially successful;
• a mother knows that more sugar in her children’s diet results in higher energy levels; and
• the ease of waking up in the morning often depends on how late you went to bed the
night before.

The regression line ( known as the least squares line ) is a plot of the expected value of
the dependent variable for all values of the independent variable. Technically, it is the line that
minimizes the squared residuals. The regression line is the one that best fits the data on a
scatter plot.
Using the regression equation, the dependent variable may be predicted from the
independent variable. The slope of the regression line ( b ) is defined as the rise divided by
the run. The y-intercept ( a ) is the point on the y-axis where the regression line would intercept
the y-axis. The slope and y-intercept are incorporated into the regression equation. The
intercept is usually called the constant, and the slope is referred to as the coefficient. Since the
regression model is usually not a perfect predictor, there is also an error term in the equation.
In the regression equation, y is always the dependent variable and x is always the
independent variable. Here are three equivalent ways to mathematically describe a linear
regression model :
y = intercept + ( slope × x ) + error

y = constant + ( coefficient × x ) + error

y = a + bx + e
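
To make the model y = a + bx + e concrete, the sketch below estimates a and b by least squares and then computes the error term e for each point. It assumes the standard least-squares formulas b = (nΣxy − ΣxΣy) / (nΣx² − (Σx)²) and a = (Σy − bΣx) / n, which are not spelled out in this lesson, and it reuses the age and glucose data from Example 1 of Lesson 6.5. The function name is hypothetical.

```python
def least_squares_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # intercept
    return a, b


# Age (x) and glucose level (y) from Example 1 of Lesson 6.5
ages = [40, 25, 36, 45, 50, 61]
glucose = [98, 59, 83, 70, 90, 85]

a, b = least_squares_line(ages, glucose)

# The error term e is what is left over after the line's prediction
residuals = [yi - (a + b * xi) for xi, yi in zip(ages, glucose)]

print(f"y = {a:.3f} + {b:.3f}x")
print([round(e, 2) for e in residuals])
```

For a least-squares line the residuals always add up to zero (apart from rounding), which is a quick way to check the computed a and b.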

The significance of the slope of the regression line is determined from the t-statistic. The
significance is the probability that the observed correlation coefficient occurred by chance if the
true correlation is zero. Some researchers prefer to report the F-ratio instead of the t-statistic.
The F-ratio is equal to the t-statistic squared.
The t-statistic for the significance of the slope is essentially a test to determine if the
regression model ( equation ) is usable. If the slope is significantly different from zero, then we
can use the regression model to predict the dependent variable for any value of the
independent variable.
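
The lesson does not give the formula for this t-statistic. A minimal sketch, assuming the standard result for simple regression with one independent variable, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom, is shown below using r = 0.5108 and n = 6 from Example 1 of Lesson 6.5. The function name is hypothetical, and the critical value still has to be read from a t table.

```python
from math import sqrt


def slope_t_statistic(r, n):
    """t-statistic for testing whether the slope (or correlation) is zero.

    Assumes simple linear regression with one independent variable.
    Returns (t, degrees_of_freedom).
    """
    if n <= 2:
        raise ValueError("at least 3 paired observations are needed")
    if abs(r) >= 1:
        raise ValueError("r must be strictly between -1 and 1")
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return t, n - 2


t, df = slope_t_statistic(0.5108, 6)
f_ratio = t ** 2  # the F-ratio equals the t-statistic squared

print(round(t, 3), df, round(f_ratio, 3))
```

The computed t is then compared with the critical value for the chosen level of significance and n − 2 degrees of freedom.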

Slope ( m ) Formula:

$$m = \frac{\text{rise}}{\text{run}} = \frac{\Delta y}{\Delta x} = \frac{y_2 - y_1}{x_2 - x_1}$$


Forms of Linear Equation

1. Slope-Intercept form: y = mx + b ( m – slope ; b – y-intercept )

2. Point-Slope form: ( y – y1 ) = m ( x – x1 )

3. Standard form: Ax + By = C ( A , B , and C are constants )

4. General form: Ax + By + C = 0

5. Intercept form: x/a + y/b = 1 ( a is the x-intercept and b is the y-intercept )
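
Each of these forms can be rewritten in slope-intercept form by solving for y. The sketch below shows the conversions in Python; the function names are illustrative and each function returns the pair (m, b) of y = mx + b. It assumes a non-vertical line (B ≠ 0, a ≠ 0, and x1 ≠ x2).

```python
def from_two_points(x1, y1, x2, y2):
    """Point-slope idea: slope from two points, then solve for the y-intercept."""
    m = (y2 - y1) / (x2 - x1)
    b = y1 - m * x1
    return m, b


def from_standard(A, B, C):
    """Ax + By = C  ->  y = (-A/B)x + C/B."""
    return -A / B, C / B


def from_general(A, B, C):
    """Ax + By + C = 0  ->  y = (-A/B)x - C/B."""
    return -A / B, -C / B


def from_intercepts(a, b):
    """x/a + y/b = 1  ->  y = (-b/a)x + b."""
    return -b / a, b


# The same line, y = -(3/4)x + 3, written in four different ways
print(from_two_points(0, 3, 4, 0))
print(from_standard(3, 4, 12))
print(from_general(3, 4, -12))
print(from_intercepts(4, 3))
```

All four calls describe one line, so each should report a slope of -0.75 and a y-intercept of 3.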

Lesson 6.7 Slope and Intercept of the Regression Line

The slope indicates the steepness of a line and the intercept indicates the location
where it intersects an axis. The slope and the intercept define the linear relationship between
two variables, and can be used to estimate an average rate of change. The greater the
magnitude of the slope, the steeper the line and the greater the rate of change.
By examining the equation of a line, you can quickly discern its slope and y-intercept
(where the line crosses the y-axis).

The slope is positive. When x increases by 2, y increases by 1. The y-intercept is 2.


y = -(3/4)x + 3

The slope is negative 3/4. When x increases by 4, y decreases by 3. The y-intercept is 3.

The slope is 0. When x increases by 1, y neither increases nor decreases. The y-intercept is 2.

Usually, this relationship can be represented by the equation y = b₀ + b₁x, where b₀ is
the y-intercept and b₁ is the slope.


Example :
Problem:
A company determines that job performance for employees in a production department
can be predicted using the regression model y = 130 + 4.3x, where x is the hours of in-house
training they received (from 0 to 20) and y is their score on a job skills test. The value of the y-
intercept (130) indicates the average job skill score for an employee with no training. The value
of the slope (4.3) indicates that for each hour of training, the job skill score increases, on
average, by 4.3 points.
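
The model in this example can be turned into a small prediction function. The sketch below assumes that predictions are only meaningful inside the stated range of 0 to 20 training hours, so it refuses values outside that range; the function name is hypothetical.

```python
def predict_job_score(training_hours):
    """Predicted job skills score from y = 130 + 4.3x, for 0 <= x <= 20 hours."""
    if not 0 <= training_hours <= 20:
        raise ValueError("the model was described only for 0 to 20 hours")
    intercept = 130   # average score with no training
    slope = 4.3       # average gain in score per hour of training
    return intercept + slope * training_hours


for hours in (0, 10, 20):
    print(hours, round(predict_job_score(hours), 1))
```

An employee with 10 hours of training is therefore predicted to score about 130 + 4.3(10) = 173 points.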

Lesson 6.8 Regression analysis

Regression analysis is a statistical process for estimating the relationships among
variables. It includes many techniques for modelling and analyzing several variables, when the
focus is on the relationship between a dependent variable and one or more independent
variables. More specifically, regression analysis helps one understand how the typical value
of the dependent variable changes when any one of the independent variables is varied, while
the other independent variables are held fixed. Regression Analysis estimates the conditional
expectation of the dependent variable given the independent variables – that is, the average
value of the dependent variable when the independent variables are fixed. In all cases, the
estimation target is a function of the independent variables called the regression function.

Activity:

Materials: Graphing paper, Pencil, spaghetti strand

Can we predict the number of total calories based upon the total fat grams?

a. Predict the total calories based upon 22 grams of fat
b. Predict the total calories based upon 18 grams of fat
c. Predict the total calories based upon 26 grams of fat
d. Predict the total calories based upon 33 grams of fat
e. Predict the total calories based upon 7 grams of fat

Sandwich     Total Fat (g)     Total Calories
Hamburger 9 260
Cheeseburger 13 320
Quarter Pounder 21 420
Quarter Pounder with Cheese 30 530
Big Mac 31 560
Arch Sandwich Special 31 550
Arch Special with Bacon 34 590
Crispy Chicken 25 500
Fish Fillet 28 560
Grilled Chicken 20 440
Grilled Chicken Light 5 300


Solution:
1. Prepare a scatter plot of the data on graph paper.

2. Using a strand of spaghetti, position the spaghetti so that the plotted points are as close
to the strand as possible.
3. Find two points that you think will be on the "best-fit" line.
4. We are choosing the points (9, 260) and (30, 530). (You may choose different points.)
5. Calculate the slope of the line through your two points (rounded to three decimal places).

Note: The formula of the slope is

$$m = \frac{y_2 - y_1}{x_2 - x_1} = \frac{530 - 260}{30 - 9} = \frac{270}{21} \approx 12.857$$

6. Write the equation of the line using the Point-Slope form.

y – y1 = m(x – x1)
y – 260 = 12.857(x – 9)
y = 12.857(x – 9) + 260

7. This equation can now be used to predict information that was not plotted in the scatter
plot.
Question a: Predict the total calories based upon 22 grams of fat.

y = 12.857 (x – 9) + 260
y = 12.857 (22 – 9) + 260
y = 12.857 (13) + 260
y = 167.141 + 260
y = 427.141 calories


Question b: Predict the total calories based upon 18 grams of fat.

y = 12.857(x – 9) + 260
y = 12.857 (18 – 9) +260
y = 12.857 (9) +260
y = 115.713 + 260
y = 375.713 calories

Question c: Predict the total calories based upon 26 grams of fat.

y = 12.857(x – 9) + 260
y = 12.857 (26 – 9) +260
y = 12.857 (17) +260
y = 218.569 + 260
y = 478.569 calories
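
The hand computations for Questions a to c can be checked with a short Python sketch. It assumes the same two chosen points, (9, 260) and (30, 530), and rounds the slope to three decimal places as the solution does; the function name is illustrative.

```python
def line_through(p1, p2, decimals=3):
    """Return the rounded slope and a predictor built from point-slope form."""
    (x1, y1), (x2, y2) = p1, p2
    m = round((y2 - y1) / (x2 - x1), decimals)   # m = (y2 - y1) / (x2 - x1)

    def predict(x):
        # y = m(x - x1) + y1
        return m * (x - x1) + y1

    return m, predict


slope, predict_calories = line_through((9, 260), (30, 530))
print("slope:", slope)

for fat_grams in (22, 18, 26):   # Questions a, b and c
    print(fat_grams, "g of fat:", round(predict_calories(fat_grams), 3), "calories")
```

Choosing a different pair of points on the spaghetti line changes the slope and therefore the predictions, which is why answers from this activity can differ slightly from one learner to another.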

Question d: Predict the total calories based upon 33 grams of fat.

Question e: Predict the total calories based upon 7 grams of fat.


CHAPTER TEST

Problem solving.

1. To interpret the relationship between years of education and salary potential, 5 persons
were surveyed. The results obtained on their number of years of higher education (college
degree and higher) and their monthly salaries are shown below. Compute the Pearson
Product Moment Correlation Coefficient and interpret the relationship between the variables.

Employee     Salary (PHP)     Years of Higher Education
A 21,400 4
B 15,300 3
C 27,400 5
D 45,000 8
E 26,600 5


2. A financial analyst believes that the interest rate on bonds is inversely related to the interest
rate on loans. Hence, bonds perform well when lending rates are down and vice versa. The
results of the observation are shown in the table below. Find the slope and y-intercept from the
data and predict the interest rate on bonds (%) when the interest rate on loans (%) is
a. 7
b. 11
c. 12
Interest Rate on Loan (%) Interest Rate on Bond (%)
10 6
5 9
8 7
6 8
8 6


Tables

Table 1: Areas Under Normal Curve


Table 2: Level of Confidence


Table 3: P Value Table


Table 4: Tabular value of Z at indicated levels of significance (α)

Test/α        0.005    0.01     0.05     0.1
One-tailed    ±2.58    ±2.33    ±1.645   ±1.28
Two-tailed    ±2.81    ±2.575   ±1.96    ±1.645

Table 5: Critical values of t distribution


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 1 : Exercise

I. True or False. Write C if the statement is true and W if it is false.


______1. A sample is only a part of a population.
______2. The sum of probabilities could exceed 1.
______3. The volume of a liquid is an example of a continuous variable.
______4. Discrete variables are variables that are measurable.
______5. Standard deviation is the square root of the variance.
II. Identify which of the following are discrete or continuous variables. Write D for discrete
and C for continuous.
______1. number of printing mistakes in a book
______2. number of siblings of an individual
______3. age of a person
______4. profit earned by the company
______5. distance travelled by a car
III. In a coin toss experiment, how many coins were tossed together if there are:
______1. 128 possible outcomes
______2. 8 possible outcomes
______3. 256 possible outcomes
______4. 16 possible outcomes
______5. 1024 possible outcomes
IV. Three coins are tossed simultaneously. Find the probability of :
______1. at least 2 tails
______2. no tail
______3. 1 head
______4. 1 tail and 2 heads
______5. 2 heads and 2 tails
V. The number of adults living in homes on a randomly selected barangay in Tunasan is
described by the following probability distribution.

Number of adults Probability


x Pr ( x )
1 0.25
2 0.5
3 0.15
4 or more ?
Answer the following
________1. What is the probability that 4 or more adults reside at a randomly selected home?
________2. What is the mean of the probability distribution?
________3. What is the variance?
________4. What is the standard deviation?
( Use the back of this page for the solution )


Solution :


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 2 : Exercise

I. True or False. Write C if the statement is true and W if it is false.


______1. The area under normal curve is always positive.
______2. The shape of a normal curve distribution depends on the values of the mean and
the variance.
______3. Values of the areas are added if the z scores are located on the same side of the
distribution.
______4. In Pr ( z ≥ 0.5 ), the area is located on the left side of the normal curve.
______5. The area under the normal curve indicates probability.
II. Find the area between the following z-scores.
________1. z = -1 and z = 1
________2. z = 0.7 and z = 2.3
________3. z = - 2.75 and z = - 1.58
________4. z = 0 and z = 1.95
________5. z = 0.21 and z = - 1.65
III. Convert the following values into standard scores using the given mean ( x̄ ) and
standard deviation ( s ).
______1. x = 10 ; x̄ = 12 ; s = 4
______2. x = 75 ; x̄ = 60 ; s = 20
______3. x = 45 ; x̄ = 42 ; s = 5
______4. x = 250 ; x̄ = 255 ; s = 2
______5. x = 28 ; x̄ = 24 ; s = 2.5
IV. Draw the graph of the following probability.
1. Pr ( - 1.8 ≤ z ≤ 2.7 )

2. Pr ( 1.2 ≤ z ≤ 2.8 )

3. Pr ( z ≤ 1.5 )


V. Word problems.
1. Scores on a history test have an average of 80 with a standard deviation of 6. If there were
50 students who took the test on this subject, how many students got a score of at least 75 ?

2. The weight of chocolate bars from a particular chocolate factory has a mean of 8 ounces with
standard deviation of 0.1 ounce. What is the z-score corresponding to a weight of 8.17
ounces?

3. Books in the library are found to have an average length of 350 pages with a standard
deviation of 100 pages. If there are 10,000 books in the library , how many books have a
corresponding length of at least 80 pages?

4. The temperature is recorded at 60 airports in a region. The average temperature is 67
degrees Fahrenheit with a standard deviation of 5 degrees. How many airports have a
temperature of 68 degrees and above?

5. The mean growth of the thickness of trees in a forest is found to be 0.5 cm/year with a
standard deviation of 0.1 cm/year. What is the z-score corresponding to 1 cm/year?


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 3 : Exercise

I. True or False. Write C if the statement is true and W if it is false.


______1. Stratified random sampling is best done for a heterogeneous population.
______2. A sample is a subset of a population.
______3. In choosing the sample, the researcher should be objective in order to get reliable
information.
______4. Simple random sampling is appropriate when the population from where the
sample is taken is homogeneous.
______5. Stratified random sampling has an increased accuracy at a given cost than simple
random sampling.
II. Identification. Write the correct answer in the blank.
__________________1. In stratified random sampling, it means sub-group.
__________________2. It is a characteristic of a population.
__________________3. It is the square root of the variance.
__________________4. It is the basic sampling technique.
__________________5. A group of phenomena that have something in common.
__________________6. It is a characteristic of a sample.
__________________7. It is the likelihood that a certain event will occur.
__________________8. It is a measure of the spread of the distribution.
__________________9. It is the process of acquiring a section of the population for a
study.
_________________10. It is the weighted average of the possible values.
III. Compute the number of sample values with the following given:
1. n = 4 ; r = 2

2. n = 6 ; r = 3

3. n = 8 ; r = 2

4. n = 9 ; r = 3

5. n = 10 ; r = 4


IV. Problem solving.


A population consists of 4 values such as 12, 14, 16, and 18. Compute the following
when r = 2 :
a. number of sample values or combinations
b. population mean
c. sample variance
d. standard deviation
e. Fill up the table below based on the computed values.

Sample number   Sample value ( x )   Mean ( x̄ )   Sample variance ( s² )   Std. deviation ( s )

Solution:


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 4 : Exercise

I. Identification. Write the correct answer in the blank.


__________________1. a factor used to compute the margin of error
__________________2. a point estimate of the population mean
__________________3. the difference between the observed sample mean and the true
value of the population
__________________4. a range of values that is likely to contain the true value of the
parameter
__________________5. a point estimate of the population variance
__________________6. a symbol that denotes the probability of success
__________________7. a measure which determines if the population parameter is within
the interval
__________________8. a point estimate of the population standard deviation
__________________9. a rule that describes an estimate
_________________10. refers to the process of using sample data to estimate the
parameters of the selected distribution
II. Compute the probability of success ( p ) and probability of failure based on the following
given.
1. 20 out of 80 respondents agreed on death penalty

2. 25 out of 40 students were present in the class

III. Find the value of the unknown with the following given:

1. α = _____________ if CI ( confidence interval ) = 80 %

2. Z 0.01 = ___________ if α = 0.01

3. CI = _____________ if Z 0.05 = 1.645

4. α = _____________ if CI = 0.95

5. CI =_____________ if 2α = 0.01


IV. Problem solving.


1. There are hundreds of mangoes on the trees and we want to know if they are
big enough. 46 mangoes were randomly chosen and the following data were
obtained: ( use confidence interval of 95 % )
x̄ = 86
s = 6.2
Find : a. margin of error
b. true mean

2. Lyceum of Alabang P.E. department wants to calculate the proportion of students who have
attended a women’s basketball game at the college. They use student email addresses,
randomly choose 220 students, and email them. Of the 145 who responded, 22 had attended
a women’s basketball game.

a. What is the sample proportion of students who have attended a women’s basketball
game?

b. What is the sample proportion of students who have not attended a women’s basketball
game ?

c. Can a normal distribution be used to model the sampling distribution for the sample
proportion ? Explain.


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 5 : Exercise

I. True or False. Write C if the statement is correct and W if it is false.


______1. T – test is used when the sample size is more than 30.
______2. The null hypothesis ( Ho ) is accepted when the computed t-test value falls at the
shaded region of the normal distribution curve covered by the critical values.
______3. Level of significance is equal to 1 minus level of confidence ( α = 1 – C ).
______4. In a two tailed test, the alternative hypothesis ( Ha ) uses the inequality symbol of
< or > .
______5. T – test is used when the population variance ( σ² ) is unknown and therefore has
to be estimated from the sample itself.
______6. The null hypothesis ( Ho ) and the alternative hypothesis ( Ha ) are mathematical
opposites.
______7. The null hypothesis is the statement being tested.
______8. Type I error is rejecting a true null hypothesis.
______9. If the computed t-value is positive, it is located at the left side of the normal
distribution curve.
_____10. The tailed test for Ha: µ < 50 is a right-tailed test.

II. Determine if z-test or t-test is appropriate for the following given. Write Z or T in the
blank before each number. If neither of the two tests is applicable, write X.

________1. s = 2.5 ; n = 50
________2. n = 15 ;
________3. s² = unknown ; n = 25
________4. s = 16 ; n = 20
________5. s = 36 ; n = 30

III. Compute for the degrees of freedom based on the following given.
1. df = ___________ if n1 = 16 and n2 = 20

2. df = ___________ if n = 28

3. df = ___________ if n1 = 24 and n2 = 26

IV. Find the critical value based on the following given.


1. cv = ___________ if α = 0.01 , left-tailed test, t-test statistic, n = 24

2. cv = ___________ if α = 0.0025, two-tailed test, t-test statistic, n1 = 22, n2 = 20


V. Problem solving.
1. Average heart rate for Americans is 72 beats/minute. A group of 25 individuals
participated in an aerobics fitness program to lower their heart rate. After six months the
group was evaluated to identify if the program had significantly slowed their heart rate. The mean
heart rate for the group was 69 beats/minute with a standard deviation of 6.5. Was the
aerobics program effective in lowering heart rate?
Answer the following:
a. Ho: _______________________________
b. Ha: _______________________________
c. α = ________________________________
d. Test statistic ________________________
e. Tailed test __________________________
f. degrees of freedom ____________________
g. critical value _________________________
h. computed t-value _____________________
i. Graph

j. Conclusion: ______________________________________________________________

2. The amount of a certain trace element in blood is known to vary with a standard deviation
of 14.1 ppm (parts per million) for male blood donors and 9.5 ppm for female donors.
Random samples of 75 male and 50 female donors yield concentration means of 28 and 33
ppm, respectively. What is the likelihood that the population means of concentrations of the
element are the same for men and women?
Answer the following:
a. Ho: _______________________________
b. Ha: _______________________________
c. α = ________________________________
d. Test statistic ________________________
e. Tailed test __________________________
f. degrees of freedom ____________________
g. critical value _________________________
h. computed t-value _____________________
i. Graph

j. Conclusion: ______________________________________________________________


Name: ____________________________________ Score____________


Grade / Section : ____________________________ Date : ___________

Chapter 6 : Exercise

I. True or False. Write C if the statement is correct and W if it is false.


______1. The letter x represents the independent variable.
______2. Height is a continuous variable.
______3. A line of best fit is drawn in order to study the relationship between the
variables.
______4. Bivariate analysis is the analysis of exactly two variables.
______5. In a positive correlation, the linear graph rises from lower right to upper left.
______6. A Grade 8 student observes that he gets a high score in the test when he
studies longer. The independent variable here is getting a high score.
______7. If the pattern of dots has no definite slope, we could say that there is no correlation
between the dependent and independent variables.
______8. The closer the value of Pearson’s correlation coefficient ( r ) to zero, the
stronger is the correlation or relationship between two variables.
______9. Continuous variable is also known as quantitative variable.
_____10. A positive correlation is when the value of the independent variable
increases, the corresponding value of the dependent variable decreases.

II. Find the slope and y-intercept with the following given.

1. 2y = 6x + 4 m = ___________ y-intercept = __________

2. ( 4, 6 ) and ( 2, 18 ) m = ___________ y-intercept = __________

3. – 10x + 5y + 20 = 0 m = ___________ y-intercept = __________

4. 2x/3 + y/4 = 1 m = ___________ y-intercept = ___________


III. Problem solving.

1. Find the Pearson’s correlation coefficient ( r ) using the following data ( α = 0.02 ;
two-tailed test ) and state the correlation.

Samples x y xy x² y²
1 2 6
2 7 16
3 5 11
Σ


2. You have to examine the relationship between the age and price for used cars sold in the
last year by a car dealership company.

Here is the table of the data:

Car Age ( in years ) Price ( in dollars )


4 6300
5 4500
7 4200
8 4100
10 2100
12 2200

Note : Use points ( 5, 4500 ) and ( 7, 4200 ) as basis for the computation of the slope.

Find :
a. Predict the price when the car age is 6 years.
b. Predict the price when the car age is 9 years.
c. Predict the price when the car age is 15 years.