LEARNING GUIDE
Week No.: __6__

TOPIC: Distribution Shapes


• Properties of normal distributions
• Standardizing normal distributions
• Skewness and kurtosis
• Central Limit Theorem

EXPECTED COMPETENCIES: At the end of this lesson, you must have:
1. classified various distribution shapes;
2. identified the characteristics of a normal distribution;
3. standardized the distribution;
4. solved for the z-score and probability values;
5. categorized curves with regards to skewness and kurtosis; and
6. explained the Central Limit Theorem.

CONTENT/TECHNICAL INFORMATION

Most likely, you are familiar with the “COVID” curves below. In these curves, the x-
axis represents the number of infections. Their shapes look like a normal distribution curve,
but they are not. We will present to you the characteristics of the normal curve and you will
determine the reason why we cannot consider the figures below as normal curves.

Figure 1
The “COVID” Curve

(Source: The Economist)

This module is a property of Technological University of the Philippines Visayas intended


for EDUCATIONAL PURPOSES ONLY and is NOT FOR SALE NOR FOR REPRODUCTION.

Distribution Shapes
Let us suppose that in a certain program, there are six sections. Look at the shapes of
the distributions of each of these sections. Take note that this is NOT an accurate histogram
because the first bar should start with 5 on the x-axis. This graph overlapped the values of
zero and five on the origin. Besides, the x-axis should use the class boundaries.
Figure 2
Bell-shaped

Figure 3
Left skewed

Figure 4
Right skewed

Note: Histograms and values were generated by
https://www.socscistatistics.com/descriptive/histograms/default.aspx


Figure 5
Uniform

Figure 6
J-shaped

Figure 7
Reverse J-shaped

Figure 8
Bimodal

(Source: Bluman, 2012)

Figure 9
U-shaped

(Source: Bluman)


The first distribution shape is bell-shaped. It is also known as the normal
distribution or Gaussian distribution, named after Carl Friedrich Gauss (1777-1855),
who derived its equation. You can see this curve on the encircled portion of the bill in Figure
10, which honors Gauss.

Figure 10
The German bill that displays Gauss and the normal distribution

(Source: banknotes.com)

We will focus first on the normal distribution. Bluman (2012) defines the normal
distribution as a continuous, symmetric, bell-shaped distribution of a variable. The normal
distribution is theoretical, so no variable fits it perfectly; still, many variables are
approximately normally distributed because their deviations from the normal curve are small.
According to Montgomery and Runger (2014), the normal distribution is the most widely used
model for a continuous measurement. An example is an automotive engineer who plans to study
the average pull-off force measurements from several connectors. Replicates of this random
experiment will tend to produce a normal distribution.

Characteristics of a normal distribution

1. A normal distribution curve is bell-shaped.


See Figures 2, 10, and 11. You can see that the curve is like a bell. However, not all
bell-shaped curves are normal distributions, as Figure 1 shows.

2. The mean, median and mode are equal and are located at the center of the
distribution.
The normal distribution curve below (figure 10) has a mean, median and mode which
are all equal to 28. If we are going to solve for the mean of all scores, it will be 28.
You can see that 28 is also at the middle of the numbers when arranged from lowest
to highest, which is the median. The mode is the highest point in the distribution.

Figure 10
The normal distribution


3. A normal distribution curve is unimodal (it has only one mode).


The mode is the highest point in the distribution. Contrast this with Figures 8
and 9, which have two modes or two high points in one distribution.

4. The curve is symmetric about the mean, that is, its shape is the same on both sides
of a vertical line passing through the center. (Figure 10)

5. The curve is continuous; there are no gaps or holes. For each value of X, there is a
corresponding value of Y.
The normal distributions shown earlier came from quiz scores.
Scores are treated as continuous because they measure how much knowledge
a group of students has attained. Discrete variables have different distributions, as we
have discussed in Learning Guide 2.

6. The curve is asymptotic. It never touches the x axis. Theoretically, no matter how
far the curve extends in either direction, it never meets the x axis—but it gets
increasingly closer.

7. The total area under a normal distribution curve is equal to 1.00, or 100%. This fact
may seem unusual, since the curve never touches the x axis, but one can prove it
mathematically by using calculus.
We use a table of values to identify the area under the normal curve (See table 1 on
the next page.)

8. The area under the part of a normal curve that lies within 1 standard deviation of the
mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or
95%; and within 3 standard deviations, about 0.997, or 99.7%. See figure 11.

Figure 11
The empirical (or 68-95-99.7) rule

(Source: Triola, 2010)
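
The percentages in the empirical rule can be checked numerically. The short sketch below is only an illustration, assuming Python 3.8 or later, whose standard library provides `statistics.NormalDist`; the helper name `area_within` is ours and not part of any textbook.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

def area_within(k: float) -> float:
    """Area under the standard normal curve between -k and +k."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    # Prints the area within k standard deviations of the mean.
    print(f"within {k} standard deviation(s): {area_within(k):.4f}")
```

The three printed areas should be close to 0.68, 0.95, and 0.997, which is where the name "68-95-99.7 rule" comes from.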


Standardizing the Normal Distribution


To identify the area under the normal curve easily, we standardize the distribution. The
standard normal distribution is a normal probability distribution with a mean of 0 and a
standard deviation of 1. The total area under its density curve is equal to 1.

Formulas

General form: $z = \dfrac{\text{value} - \text{mean}}{\text{standard deviation}}$

Population data: $z = \dfrac{x - \mu}{\sigma}$

Sample data: $z = \dfrac{x - \bar{x}}{s}$

Round off the z-score to two decimal places.

Figure 12
Converting to a standard normal distribution

Figure 13 shows how to get the area under the normal curve from the table. The green example
shows that if z = 0.21, then the area to the left of z is 0.5832 or 58.32%. The red
example shows that if z = 1.27, the area to the left of z is 0.8980 or 89.80%.

Figure 13
Finding the area under the normal curve

(Source: Triola, 2010)
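
If a printed table is not at hand, the same look-up can be sketched in code. This is a minimal illustration, assuming Python 3.8 or later; `z_score` and `area_left_of` are hypothetical helper names. It shows that the table areas 0.5832 and 0.8980 correspond to z = 0.21 and z = 1.27.

```python
from statistics import NormalDist

def z_score(value: float, mean: float, sd: float) -> float:
    """Standardize a value and round to two decimals, as the table expects."""
    return round((value - mean) / sd, 2)

def area_left_of(z: float) -> float:
    """Cumulative area under the standard normal curve to the left of z."""
    return NormalDist().cdf(z)

# The two areas read from the table in Figure 13.
print(f"z = 0.21 -> area = {area_left_of(0.21):.4f}")
print(f"z = 1.27 -> area = {area_left_of(1.27):.4f}")
```

Like the table, `area_left_of` always returns the area to the left of z, so areas to the right or between two z-scores are obtained by subtraction.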

Figure 14
Interpreting z-scores

(Source: Triola, 2010)


Figure 14 shows that whenever a value is less than the mean, its corresponding
z-score is negative. A value is considered unusual if its z-score is less than -2 or greater than 2.

Example 1
Suppose that the current measurements in a strip of wire are assumed to follow a normal
distribution with a mean of 9 milliamperes and a variance of 4 (milliamperes)². What is
the probability that a measurement (a) is below 8 milliamperes, (b) is between 8 and 12
milliamperes, and (c) exceeds 12 milliamperes?

Given:
Let x denote the current in milliamperes.
$\bar{x} = 9$, $s^2 = 4$
Required:
a. P(x < 8)   b. P(8 < x < 12)   c. P(x > 12)

Formula: $z = \dfrac{x - \bar{x}}{s}$

Since the given is expressed as a variance, and the variance is the square
of the standard deviation, the standard deviation is $s = \sqrt{4} = 2$.

Solution:

a. $z_1 = \dfrac{x - \bar{x}}{s} = \dfrac{8 - 9}{2} = \dfrac{-1}{2} = -0.5$

Since our z-score is negative, let us take the value from the table (Figure 18)
on the next page. P(z < -0.5) = 0.3085 or 30.85%
To illustrate this, we can see in Figure 15 that -0.5 is to the left of zero, between 0 and -1. We
shade the left portion because we are interested in the values that are less than -0.5.
To make it easy for you, the shading follows the direction in which the inequality symbol
points. For example, in less than (<), the pointed end is on the left. Or simply: for less
than, shade the left part.

Figure 15
P (z < -0.5) = 30.85%

Note: Revised using Paint. Original image is from Triola


b. P(8 < x < 12). We are looking for the values between 8 and 12. Since we already know
the value of P(x < 8), let us find the value of P(x < 12). Take note that
the areas in the normal distribution table are always measured to the left of z. (See Figure 19.)

$z_2 = \dfrac{x - \bar{x}}{s} = \dfrac{12 - 9}{2} = \dfrac{3}{2} = 1.5$; P(z < 1.5) = 0.9332 = 93.32%

P(8 < x < 12) = P(-0.5 < z < 1.5) = 93.32% – 30.85% = 62.47%

Figure 16
P (-0.5 < z < 1.5) = 62.47%

Note: Revised using Paint. Original image is from Triola

c. P(x > 12). Since we know that P(x < 12) = P(z < 1.5) = 0.9332, we subtract
it from 1, because the total area under the normal curve
is 1 or 100%. Thus, P(z > 1.5) = 1 – P(z < 1.5) = 100% – 93.32% = 6.68%.

Figure 17
P (z > 1.5) = 6.68%

Note: Revised using Paint. Original image is from Triola

Therefore, the probability that a measurement (a) is below 8 milliamperes is 30.85%,
(b) is between 8 and 12 milliamperes is 62.47%, and (c) exceeds 12 milliamperes is 6.68%.
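
As a cross-check on Example 1, the three probabilities can be computed directly, without converting to z-scores first. This is a sketch only, assuming Python 3.8 or later; the variable names are ours. Software values may differ from table values in the fourth decimal place because the table rounds z to two decimals.

```python
from statistics import NormalDist

# Current in milliamperes: mean 9, variance 4, so standard deviation 2.
current = NormalDist(mu=9.0, sigma=4.0 ** 0.5)

p_below_8 = current.cdf(8)                     # (a) P(x < 8)
p_between = current.cdf(12) - current.cdf(8)   # (b) P(8 < x < 12)
p_above_12 = 1 - current.cdf(12)               # (c) P(x > 12)

for label, p in [("a", p_below_8), ("b", p_between), ("c", p_above_12)]:
    print(f"({label}) {p:.4f}")

# The three regions cover the whole curve, so the areas add up to 1.
print(f"total = {p_below_8 + p_between + p_above_12:.4f}")
```

A useful habit is the last line: when the parts of a problem split the whole curve into non-overlapping regions, the answers must add up to 100%.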


Figure 18
The standard normal distribution part 1

(Source: Bluman)


Figure 19
The standard normal distribution part 2

(Source: Bluman)

Example 2
The line width for semiconductor manufacturing is assumed to be normally distributed with
a mean of 0.4 micrometer and a standard deviation of 0.04 micrometer. (a) What is the
probability that a line width is greater than 0.52 micrometer? (b) What is the probability
that a line width is between 0.32 and 0.35 micrometer? (c) The line width of 90% of samples
is below what value?


Given:
Let x denote the line width in micrometers.
$\mu = 0.4$, $\sigma = 0.04$
Required:
a. P(x > 0.52)   b. P(0.32 < x < 0.35)   c. the value below which the line width of 90% of samples falls

Formula: $z = \dfrac{x - \mu}{\sigma}$

a. $z_1 = \dfrac{x - \mu}{\sigma} = \dfrac{0.52 - 0.4}{0.04} = \dfrac{0.12}{0.04} = 3.0$

P (z > 3.0) = 1 – P(z < 3.0) = 1 – 0.9987 = 0.0013 = 0.13%

Figure 20
P (z > 3.0) = 0.13%

Note: Revised using Paint. Original image is from Triola (2010)

b. P(0.32 < x < 0.35)

$z_2 = \dfrac{0.32 - 0.4}{0.04} = \dfrac{-0.08}{0.04} = -2$ and $z_3 = \dfrac{0.35 - 0.4}{0.04} = \dfrac{-0.05}{0.04} = -1.25$

P(z < -2) = 0.0228 and P(z < -1.25) = 0.1056

P(0.32 < x < 0.35) = P(-2 < z < -1.25) = 0.1056 – 0.0228 = 0.0828 = 8.28%

Figure 21
P (-2 < z < -1.25) = 8.28%

Note: Revised using Paint. Original image is from Triola (2010)


c. The value below which the line width of 90% of samples falls

To answer this, we look for an area of 0.9000 in the body of the normal distribution table.
There is no exact entry of 0.9000; therefore, we take the nearest entry, 0.8997, which
corresponds to z = 1.28, as shown in the table excerpt below.

(Source: Bluman)

Then we substitute this into our formula, $z = \dfrac{x - \mu}{\sigma}$:

$1.28 = \dfrac{x - 0.4}{0.04}$

$1.28(0.04) = x - 0.4$

$0.0512 + 0.4 = x$

$x = 0.4512$

Therefore, the line width of 90% of samples of the semiconductor is below 0.4512 micrometer.

Thus, we can use the derived formula $x = z\sigma + \mu$ to find the value of x.
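
The three parts of Example 2 can also be sketched in code. This is an illustration only, assuming Python 3.8 or later; `inv_cdf` is the standard-library method that goes from an area back to a value, which is the software counterpart of reading the table "inside out". The variable names are ours.

```python
from statistics import NormalDist

# Line width in micrometers: mean 0.4, standard deviation 0.04.
width = NormalDist(mu=0.4, sigma=0.04)

p_a = 1 - width.cdf(0.52)                 # (a) P(x > 0.52)
p_b = width.cdf(0.35) - width.cdf(0.32)   # (b) P(0.32 < x < 0.35)
x_90 = width.inv_cdf(0.90)                # (c) value with 90% of the area below it

# Table-based answer for (c), using the rounded z = 1.28 and x = z*sigma + mu.
x_90_table = 1.28 * 0.04 + 0.4

print(f"(a) {p_a:.4f}")
print(f"(b) {p_b:.4f}")
print(f"(c) software: {x_90:.4f}, table: {x_90_table:.4f}")
```

The software answer for (c) is slightly larger than the table answer because the table forces us to use z = 1.28, whose area is a little below 0.9000.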

Determining Normality (Bluman)


While many of the distributions are normally distributed, there are distributions that
are skewed. Since many statistical methods require that the distribution of values should be
normally or approximately normally shaped, it is important to check the normality of the data
distribution.

There are many ways to check for normality. You can draw a histogram for the data
and check its shape. You may also use the Pearson coefficient of skewness (PC), otherwise
known as Pearson’s index of skewness, with the formula:

$PC = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$

If the index is less than -1, then the data is significantly left skewed (Figures 3 and 22).
If the index is greater than +1, then it is significantly right skewed (Figures 4 and 22). It is also
important to check the data set for possible outliers, that is, extremely small or extremely
large data values in the set. If you have outliers, you may remove them, as “data
cleaning” is part of statistical analysis. But you need to state in your methodology that you
removed the outliers and explain the consequence for your data set.
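
Pearson's index of skewness is simple enough to compute by hand, but a short sketch may help when the data set is long. The code below is only an illustration: the function names and the sample scores are hypothetical, and the sample standard deviation is assumed for the denominator.

```python
from statistics import mean, median, stdev

def pearson_skewness_index(data):
    """Pearson's index of skewness: 3 * (mean - median) / standard deviation."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def describe_skewness(pc):
    """Interpret the index using the cutoffs of -1 and +1."""
    if pc < -1:
        return "significantly left skewed"
    if pc > 1:
        return "significantly right skewed"
    return "not significantly skewed"

# Hypothetical quiz scores, used only for demonstration.
scores = [1, 1, 1, 2, 2, 3, 20]
pc = pearson_skewness_index(scores)
print(f"PC = {pc:.2f}: {describe_skewness(pc)}")
```

A positive index means the mean lies above the median (a longer right tail), while a negative index means the mean lies below the median (a longer left tail).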

Figure 22
Normal and skewed distributions

(Source: Triola)

Figure 22 shows the major differences between the normal and skewed distributions. The
“tail” of the distribution determines its skewness. Figure 22a has a longer left tail, so it
is skewed to the left, while Figure 22c has a longer right tail and is referred to as skewed
to the right. As previously stated, the mean, median and mode of a normal distribution are
equal (Figure 22b).

Skewed Distributions
Skewness – asymmetry with respect to a histogram of data or a probability distribution
(Montgomery and Runger). The distribution is significantly skewed if its index of
skewness is less than -1 or greater than +1.
1. Skewed to the left or negatively skewed distribution (Figure 22a) has a longer left tail.
This can happen if many of the values in the data set are high. Another basis
is the relationship between the mean and the median. In a negatively skewed distribution, the
mean is less than the median. Our reference point is the median because it is the most
stable measure of central tendency.
For example, a test where many of the students get high scores will result in
a negatively skewed distribution. Another example is the age distribution of a country
wherein most of the population is elderly.

2. Skewed to the right or positively skewed distribution (Figure 22c) has a longer right
tail. This is true if many of the values in the data set are low. This time, the mean is
higher than the median, because the few high values pull the mean upward. For example,
in a test where most of the students got low scores, the distribution is skewed to the right.


Figure 23
Finding the skewness using Microsoft Excel

Things to remember when using Microsoft Excel in getting the skewness:

1. You need to identify your data source, whether it is from a population or from a sample.
We use the command “SKEW” for sample data and “SKEW.P” for population data.
They yield different results.
2. If each column comes from a distinct group, take the skewness one at a time. That is,
a1:a40 should be taken first before you proceed to b1:b40, using the same procedure,
if they are from different population groups.
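
To see why the sample and population commands give different results, the two formulas can be sketched side by side. The code below is an illustration that follows the formulas Microsoft documents for SKEW and SKEW.P; the function names and the data values are ours and are used only for demonstration.

```python
from statistics import mean, pstdev, stdev

def skew_sample(data):
    """Sample skewness (the formula documented for Excel's SKEW)."""
    n = len(data)
    m, s = mean(data), stdev(data)          # s uses the n - 1 divisor
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def skew_population(data):
    """Population skewness (the formula documented for Excel's SKEW.P)."""
    n = len(data)
    m, sigma = mean(data), pstdev(data)     # sigma uses the n divisor
    return sum(((x - m) / sigma) ** 3 for x in data) / n

# Demonstration values only.
values = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
print(f"sample skewness:     {skew_sample(values):.4f}")
print(f"population skewness: {skew_population(values):.4f}")
```

Both results have the same sign, but the sample version is larger in magnitude because it corrects for the small number of observations. The gap shrinks as the data set grows.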

Kurtosis – the measure of the degree to which a unimodal distribution is peaked. Kurtosis is
based on the fourth moment of the distribution. We can find the kurtosis of a
distribution using Microsoft Excel by following the steps for finding the
skewness, but this time using the command name “KURT”, which reports kurtosis
relative to the normal distribution (so a normal distribution scores about zero).

1. Leptokurtic – random variables with a positive kurtosis. Lepto means slender.
For easier recall, lepto can be associated with lantyog.

2. Mesokurtic – technically defined as having a kurtosis of zero or nearly zero.
Examples are the normal distribution and binomial distribution
(Glen, n.d.)

3. Platykurtic – random variables with a negative kurtosis. Platy means broad.
For easier recall, platy can be associated with putot or pandak.
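
The three categories can be illustrated with a small sketch. Note the assumptions: this code uses the plain population formula (fourth standardized moment minus 3), which is not identical to Excel's KURT because KURT adds a small-sample correction, and the `tolerance` that decides what counts as "nearly zero" is a hypothetical cutoff chosen only for demonstration.

```python
from statistics import mean

def excess_kurtosis(data):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(data)
    m = mean(data)
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3

def classify_kurtosis(k, tolerance=0.5):
    """Label a kurtosis value; `tolerance` is an illustrative cutoff."""
    if k > tolerance:
        return "leptokurtic"
    if k < -tolerance:
        return "platykurtic"
    return "mesokurtic"

flat = list(range(1, 101))           # evenly spread values, no peak
peaked = [0] * 50 + [-10, 10]        # most values at the center, two far out

for name, data in [("flat", flat), ("peaked", peaked)]:
    k = excess_kurtosis(data)
    print(f"{name}: kurtosis = {k:.2f} -> {classify_kurtosis(k)}")
```

The flat data set gives a negative value (platykurtic), while the data set with a sharp center and heavy tails gives a positive value (leptokurtic).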


Figure 24
Platykurtic, Mesokurtic, and Leptokurtic Distributions

Note: Image from Emory Oxford College; edited using Paint

Why study the concept of skewness and kurtosis?


One of the assumptions in using parametric tests is the normality of data. Skewness
and kurtosis (heaviness of the tails or peakedness of the distribution) describe how far a
data distribution departs from the normal shape.

The Central Limit Theorem


If $X_1, X_2, \ldots, X_n$ is a random sample of size n taken from a population (either finite or
infinite) with mean $\mu$ and finite variance $\sigma^2$, and if $\bar{x}$ is the sample mean, then the
limiting form of the distribution of

$z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$

as $n \to \infty$ is the standard normal distribution.

Important consideration in using the central limit theorem:


1. When the original variable is normally distributed, the distribution of the sample
means will be normally distributed, for any sample size n.
2. When the distribution of the original variable might not be normal, a sample
size of 30 or more is needed to use a normal distribution to approximate the
distribution of the sample means. The larger the sample, the better the
approximation will be.
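
A small simulation can make the theorem concrete. The sketch below is an illustration under our own assumptions: the population is exponential (clearly right skewed, with mean 1 and standard deviation 1), each sample has n = 30 values, and the function name and seed are arbitrary choices.

```python
import random
from statistics import mean, stdev

def simulate_sample_means(n, num_samples, seed=0):
    """Draw `num_samples` samples of size `n` from a skewed (exponential)
    population and return the list of their sample means."""
    rng = random.Random(seed)   # fixed seed so the run can be repeated
    return [
        mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(num_samples)
    ]

n = 30
means = simulate_sample_means(n, num_samples=5000, seed=6)

# The theorem says the sample means center on mu with spread sigma / sqrt(n).
print(f"mean of sample means: {mean(means):.3f} (population mean is 1)")
print(f"sd of sample means:   {stdev(means):.3f} (sigma/sqrt(n) is {1 / n ** 0.5:.3f})")
```

Even though the population is far from normal, the sample means cluster around the population mean, and their standard deviation is close to σ/√n. A histogram of `means` would look approximately bell-shaped.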

Figure 25
The Central Limit Theorem

(Source: Devore, 2012)


Figure 25 is an illustration of the central limit theorem as explained by Devore (2012).


According to the central limit theorem, when n is large and we wish to calculate a probability
such as P(a < 𝑥̅ < b), we need only assume that 𝑥̅ is normal, standardize it, and use the normal
table of values. The resulting answer will be approximately correct. The exact answer could
be obtained only by first finding the distribution of 𝑥̅ , so that this theorem provides a truly
impressive shortcut. The proof of the theorem involves much advanced mathematics which
is beyond the scope of our lesson.

PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take its picture and email to
your teacher. Subject: Midterm Progress 1 Filename: CYS, Surname, First Name

Name: ___________________________________ Course, Year & Section: _____

I. Identify the shape of the following distributions: (5 points)

1. _______________________

2. _______________________

3. _______________________

4. _______________________

5. _______________________
Note: Images from Emory Oxford College

II. Match column A with column B. Write the CAPITAL letter on the space
provided before each number. (5 points)
Column A Column B
______1. The left side is a mirror image A. Asymptotic
of its right side B. Continuous
______2. The tails approximate the x-axis C. Kurtosis
but they do not meet. D. One-modal
______3. There are no gaps. E. Skewness
______4. The peakedness or flatness F. Symmetric
of a distribution G. Unimodal
______5. Has only one mode


III. Show neat and complete solution. (20 points)

The average fuel efficiency of U.S. light vehicles (cars, SUVs, minivans, vans,
and light trucks) for 2005 was 21 miles per gallon (mpg). If the standard deviation
of the population was 2.8 mpg and the gas ratings were normally distributed, (a) what
is the probability that the mean fuel efficiency for a random sample of 25 light vehicles is
under 18 mpg? (b) between 20 and 24 mpg?

Given:

Required:

Solution:

a.

b.

IV. Why is it that the “COVID curve” cannot be considered as a normal
distribution? (Content: 8 points, organization of ideas: 2 points)

V. What are the important concepts about the Central Limit Theorem? (10 points)


REFERENCES

Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.

Devore, J. (2012). Probability and Statistics for Engineering and the Sciences (8th Ed.).
Brooks/Cole, Cengage Learning.

Glen, S. (n.d.). Kurtosis: Definition, Leptokurtic, Platykurtic. StatisticsHowTo.com

Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers
(6th Ed.). John Wiley & Sons, Inc.

Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/

Wattkins, J. (n.d.). An Introduction to the Science of Statistics: From Theory to
Implementation.


LEARNING GUIDE

Week No.: __7__

TOPIC: Sampling Techniques


• Probability Sampling
• Non-Probability Sampling
• Getting the Sample Size

EXPECTED COMPETENCIES. At the end of this lesson, you must have:


1. explained the general concepts of estimating the parameters of a population
or probability distribution;
2. described the role of sampling distributions in inferential statistics;
3. differentiated probability from non-probability sampling; and
4. solved sample sizes using the appropriate formula.

CONTENT/TECHNICAL INFORMATION
What comes to your mind about the picture below? You are right! This is about a
population (the larger circle) and getting a sample from it (the smaller circle). According to
Ott and Longnecker (2004), a population is the set of all measurements of interest to the
sample collector, while a sample is any subset of measurements selected from the population.
Singh and Masuku (2014) note that sampling is concerned with the selection of a subset of
individuals from a population to estimate the characteristics of the whole population.

Figure 1
Sampling Procedure

Basic Concepts

Sampling distribution – the probability distribution of a statistic

Point estimate – a point estimate of some population parameter $\theta$ is a single numerical
value $\hat{\theta}$ of a statistic $\hat{\Theta}$
Point estimator – the statistic $\hat{\Theta}$ of the point estimate
Sampling error – the difference between the sample measure and the
corresponding population measure, since the sample is not a perfect
representation of the population.


Montgomery and Runger (2014) describe statistical inference as the use of statistical
methods to make decisions and draw conclusions about populations. It has two
areas: parameter estimation and hypothesis testing. Ott and Longnecker define parameter
estimation as making inferences about parameters where one predicts the value of the
population parameter. We will discuss parameter estimation in this session; hypothesis
testing will be discussed in Week 8.

Examples of parameter estimation:


1. By Montgomery and Runger
a. Suppose that an engineer is analyzing the tensile strength of a component used
in an air frame to assess the overall structural integrity of the airplane.
Variability is naturally present in the individual components due to the
differences in the batches of raw material used, manufacturing processes, and
measurement procedures. In practice, the engineer will use sample data to
compute a number that is in some sense a reasonable value of the true
population mean.

b. Suppose that the random variable X is normally distributed with an unknown
mean μ. The sample mean is a point estimator of the unknown population
mean μ. For example, you do not know the population mean but you are able
to get these values from the sample: x1 = 26, x2 = 28, x3 = 30, and x4 = 32. The
mean $\bar{x} = 29$ is the point estimate of μ.

2. By Rice (2007)
a. The normal or Gaussian distribution involves two parameters, μ and σ, where
μ is the mean and σ² is the variance of the distribution.

$f(x \mid \mu, \sigma) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}, \quad -\infty < x < \infty$

An example is the random fluctuations of current across a muscle membrane,
which produce a bell curve when plotted.

b. The gamma distribution depends on two parameters, $\alpha$ and $\lambda$:

$f(x \mid \alpha, \lambda) = \dfrac{1}{\Gamma(\alpha)}\, \lambda^{\alpha} x^{\alpha-1} e^{-\lambda x}, \quad 0 < x < \infty$

An example is the amount of rainfall from different storms.

Examples of point estimates (Montgomery and Runger)

• For $\mu$, the estimate is $\hat{\mu} = \bar{x}$, the sample mean.
• For $\sigma^2$, the estimate is $\hat{\sigma}^2 = s^2$, the sample variance.
• For $p$, the estimate is $\hat{p} = x/n$, the sample proportion, where x is the number of items
in a random sample of size n that belong to the class of interest.
• For $\mu_1 - \mu_2$, the estimate is $\hat{\mu}_1 - \hat{\mu}_2 = \bar{x}_1 - \bar{x}_2$, the difference between the sample
means of two independent random samples.
• For $p_1 - p_2$, the estimate is $\hat{p}_1 - \hat{p}_2 = x_1/n_1 - x_2/n_2$, the difference between two sample
proportions computed from two independent random samples.
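
The point estimates listed above can be computed with a few lines of code. This is a minimal sketch: the sample values 26, 28, 30, and 32 come from the earlier example, while the counts used for the proportion (12 items out of 40) and the helper name `sample_proportion` are hypothetical.

```python
from statistics import mean, variance

# Sample values from the earlier point-estimate example.
sample = [26, 28, 30, 32]

mu_hat = mean(sample)           # point estimate of the population mean
sigma2_hat = variance(sample)   # sample variance s^2 (divides by n - 1)

def sample_proportion(x, n):
    """Point estimate of p: x items of interest in a sample of size n."""
    return x / n

# Hypothetical counts: 12 of 40 sampled items belong to the class of interest.
p_hat = sample_proportion(12, 40)

print(f"mean estimate:       {mu_hat}")
print(f"variance estimate:   {sigma2_hat:.3f}")
print(f"proportion estimate: {p_hat}")
```

Note that `variance` divides by n - 1, which matches the sample variance s²; the standard library's `pvariance` divides by n and is meant for a whole population.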


Common Sampling Techniques (Bluman; Singh & Masuku; Triola)

An unbiased sample, that is, a sample chosen at random from the population, is important.
If the sample is not randomly selected and is based on judgment or is flawed in some other way,
statistical methods will not work properly and will lead to incorrect decisions.

Reasons for using a sample instead of the population:

1. It saves the researcher’s time and money.
2. It enables the researcher to get information that he or she might not be able to obtain
otherwise.
a. In getting blood, only a drop is taken as a sample.
b. In finding the strength of a cable, only a few cables are tested.
3. It enables the researcher to get more detailed information about a certain subject.

A. Probability Sampling Methods


1. Random Sampling – a method of obtaining the sample by using chance
methods in such a way that every member of the population has an equal
chance of being selected.
a. Lottery method – number each element of the population and then
place the numbers on cards. Place the cards in a hat or fishbowl,
mix them, and then select the sample by drawing the cards. When
using this procedure, researchers must ensure that the numbers are
well mixed.
b. Use of random numbers – the theory behind random numbers is
that each digit, 0 through 9, has an equal probability of occurring.

Figure 2
Generating random numbers from Microsoft Excel

Note: Generated from Microsoft Excel and edited using Paint


Figure 2 shows the steps for assigning random numbers to the students. Suppose
the population size is 20 and the sample size is 5. This is only for the sake of demonstration.
We will have a detailed lesson on several methods of getting the sample size.
1. Assign a distinct number to each student. This will preserve their anonymity.
2. Generate random numbers by using the command “=rand()” in the second
column.
3. Add another column where you are going to paste the values of the random
numbers. This is because the numbers in column B will keep on changing values.
4. Highlight column B, or the random numbers, and click “copy”. Then position the
cursor on column C and click “paste values”.
5. You may delete column B. Highlight the new column B. Then click “Sort and
Filter”. Click “Sort Smallest to Largest”.
6. Click “Expand the selection”.
7. Since you only need a sample of 5, the first 5 numbers, which correspond to
persons, will be your respondents or participants.
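
The same idea (attach a random number to each student, sort by it, and take the first few) can be sketched outside Excel. The code below is only an illustration; the function name and the seed are our own choices, and Python's `random.sample` would give an equivalent result in one call.

```python
import random

def simple_random_sample(population_size, sample_size, seed=None):
    """Pick `sample_size` student numbers out of 1..population_size."""
    rng = random.Random(seed)
    student_numbers = list(range(1, population_size + 1))
    # Pair each student number with a random key, like =rand() in column B.
    keyed = [(rng.random(), number) for number in student_numbers]
    # Sort by the random key, like "Sort Smallest to Largest".
    keyed.sort()
    # The first `sample_size` students after sorting form the sample.
    return [number for _, number in keyed[:sample_size]]

chosen = simple_random_sample(population_size=20, sample_size=5, seed=7)
print("selected student numbers:", chosen)
```

Because every student receives a random key from the same generator, each one has an equal chance of landing in the first five positions.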

2. Systematic Sampling – number each subject or respondent of the
population and select every kth subject, where k = population size ÷ sample
size. For example, there are 20 students and you need 5 respondents. Then
k = 20 ÷ 5 = 4. You can use a die to identify the starting point. For example, the
starting point is 3. This is how you are going to do it.

Step 1. Assign a number to each member of the population.

1 2 3 4 5 6 7 8 9 10

11 12 13 14 15 16 17 18 19 20

Step 2. Determine the starting point using a die. Example: 3, with k = 4.

Step 3. Mark the student numbers of those who will be included.

1 2 [3] 4 5 6 [7] 8 9 10

[11] 12 13 14 [15] 16 17 18 [19] 20

We start with number 3; then, adding 4, we have 7, and so on. The students
who will be your participants or respondents have the following
corresponding numbers: 3, 7, 11, 15, and 19.
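
The counting in this example can be sketched as a short function. This is an illustration only; the function name is ours, and the check that the starting point does not exceed k is our own assumption, added so that the selection never runs past the end of the list.

```python
def systematic_sample(population_size, sample_size, start):
    """Select every k-th member, beginning at `start` (1-based numbering)."""
    k = population_size // sample_size   # sampling interval
    if not 1 <= start <= k:
        # Assumption: a starting point above k would run past the population.
        raise ValueError(f"start must be between 1 and {k}")
    return list(range(start, population_size + 1, k))[:sample_size]

print(systematic_sample(population_size=20, sample_size=5, start=3))
# [3, 7, 11, 15, 19]
```

With a die, this means re-rolling when the result is greater than k (here, a 5 or a 6).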

3. Stratified Sampling – divide the population into groups or strata according
to some characteristic that is important to the study, then randomly
select from each group.

For example, the population is 100, with 60 males and 40 females. The
prescribed sample size is 80 and your grouping will be based on sex.

Males: 60 ÷ 100 = 0.6; 0.6 (80) = 48
Females: 40 ÷ 100 = 0.4; 0.4 (80) = 32

Thus, our sample will consist of 48 males and 32 females, which add up to 80.
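
The proportional split in this example can be sketched as follows. The function name is hypothetical, and rounding each share to the nearest whole number is our own assumption; with other figures the rounded shares may not add up exactly to the target, so the total should be checked.

```python
def proportional_allocation(strata_sizes, sample_size):
    """Split `sample_size` across strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    return {
        name: round(size / total * sample_size)
        for name, size in strata_sizes.items()
    }

allocation = proportional_allocation({"male": 60, "female": 40}, sample_size=80)
print(allocation)                  # {'male': 48, 'female': 32}
print(sum(allocation.values()))    # 80
```

Once the number per stratum is known, the members within each stratum are still chosen at random.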


4. Cluster Sampling – divide the population area into sections (or clusters).
Then randomly select some of those clusters, and choose all members
from the selected clusters. For example, a researcher will conduct a study
about TUP. Using cluster sampling, s/he will randomly select, say, two
campuses. If, for example, the Talisay and Taguig campuses come out of the
random selection, then all students of Talisay and Taguig will be part of
the study. The results will be taken to represent the whole TUP
system.

5. Multistage Sampling – collect data by using some combination of the


basic sampling methods. In a multistage sample design, pollsters select a
sample in different stages, and each stage might use different methods of
sampling.

B. Non-Probability Sampling Methods


1. Convenience Sampling – an example is to simply ask “the person on the
street”. Another example is to ask by radio, television, mail, or social
media without proper randomization procedures. Selecting people
haphazardly may yield results that do not reflect the real scenario.

Figure 3
Convenience sampling

(Source: Triola)

2. Quota Sampling – the population is segmented into mutually exclusive
subgroups. Then the researcher selects the respondents from each subgroup by
personal judgment, which can be biased.

3. Purposive Sampling – sampling units are selected according to the
purpose of the study. This can give biased estimates and should only be used for
specific purposes.

4. Snowball Sampling – sampling by referral. This is used for cases in
which finding participants is difficult.


Sample Size Criteria (Israel, 2003)

1. Level of precision – sometimes called the sampling error
– the range in which the true value is estimated to be
– ranges are expressed in percentage points. Example: ±5%
– For example, if Social Weather Stations reports that 60% of
respondents gave a certain answer, with a precision of ±5%, then the
true population value is estimated to lie between 55% and 65%.

2. Confidence level – sometimes called the risk level
– When a population is repeatedly sampled, the average value of
the attribute obtained by these samples is normally distributed
about the true value. In a normal distribution, approximately
95% of the sample values are within 2 standard deviations of
the true population value.

3. Degree of variability – the distribution of the attributes in the population. The more
heterogeneous a population, the larger the sample
size needed to obtain a given level of precision.
– A proportion of 50% indicates maximum variability and is
often used in determining a more conservative sample size.

Strategies for Determining the Sample Size (Bluman; Israel)

1. Using a census for a small population – A census eliminates sampling error and
provides data on all individuals in the population. This gives high data precision.
2. Using the sample size of a similar study – take caution regarding the sampling
method used by the study you plan to follow.
3. Using formulas to calculate the sample size
Bluman (2012, p. 364) suggests that “If necessary, round the answer up to obtain a
whole number. That is, if there is any fraction or decimal portion in the answer, use
the next whole number for sample size n.”

a. Yamane (1967) – popularly known as Slovin’s formula. Tejada and Punzalan
(2012) stressed that we can only use this formula when two assumptions
are met: first, the confidence coefficient is 95%, and
second, the population proportion is close to 0.5. The formula is:

$n = \frac{N}{1 + Ne^2}$, where N is the population size and e is the margin of error.

For example: Find the sample size if there are 100 persons in a group.
Let e = 0.05

$n = \frac{N}{1 + Ne^2} = \frac{100}{1 + 100(0.05)^2} = 80$
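
The arithmetic of Slovin’s formula can be checked with a short Python sketch. The function name `slovin_sample_size` is only an illustrative assumption and does not come from the cited sources. The result is rounded up to the next whole number, following the rounding rule quoted from Bluman.

```python
import math

def slovin_sample_size(population_size: int, margin_of_error: float) -> int:
    """Yamane/Slovin sample size n = N / (1 + N e^2), rounded up."""
    n = population_size / (1 + population_size * margin_of_error ** 2)
    # Round to 9 decimals first so floating-point noise does not
    # push an exact whole-number result up by one.
    return math.ceil(round(n, 9))

# Worked example from the text: N = 100, e = 0.05
print(slovin_sample_size(100, 0.05))  # prints 80
```

Remember that the formula is only appropriate when the two assumptions stressed by Tejada and Punzalan hold.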

b. Formula for calculating a sample for proportions (Cochran, 1963).

$n_0 = \frac{Z^2 p q}{e^2}$


where: n0 – sample size
Z – the abscissa of the normal curve that cuts off an area α at the tails
p – the estimated proportion
q = 1 – p
e – the desired level of precision
*Z values can be found in tables that contain the area under the
normal curve
** Confidence level = 1 – α = 95%

Example: Suppose we wish to evaluate the province’s extension program
where farmers were encouraged to adopt a new practice.
Assume that there is a large population but that the variability
in the proportion is unknown. Therefore, assume
maximum variability (p = 0.5). Moreover, we want a 95%
confidence level and ±5% precision.
Note: z-value is from the normal distribution table

$n_0 = \frac{Z^2 p q}{e^2} = \frac{(1.96)^2 (0.5)(0.5)}{(0.05)^2} = 384.16 \approx 385$ farmers

In most cases in statistics, we round off. However,
when determining sample size, we always round up to the
next whole number.
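
As a hedged illustration, the proportion formula can be sketched in Python as follows. The function name `cochran_sample_size` and its parameter names are assumptions made for this example only.

```python
import math

def cochran_sample_size(z: float, p: float, e: float) -> int:
    """Sample size for a proportion: n0 = Z^2 * p * q / e^2, rounded up."""
    q = 1 - p
    n0 = (z ** 2) * p * q / (e ** 2)
    # Round to 9 decimals to remove floating-point noise, then round up.
    return math.ceil(round(n0, 9))

# Farmers example: 95% confidence (z = 1.96), p = 0.5, precision of 5%
print(cochran_sample_size(z=1.96, p=0.5, e=0.05))  # prints 385
```

Trying other values of p (for example 0.3) shows that p = 0.5 gives the largest, most conservative sample size.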

c. Formula for the minimum sample size needed for an interval estimate of the
population mean (Bluman)

$n = \left( \frac{z_{\alpha/2} \cdot \sigma}{e} \right)^2$

Example: A scientist wishes to estimate the average depth of a river. He
wants to be 99% confident that the estimate is accurate within
2 feet. From a previous study, the standard deviation of the
depths measured was 4.33 feet.

Given: z (from table of values) = 2.58; e = 2; σ = 4.33

$n = \left[ \frac{2.58(4.33)}{2} \right]^2 = 31.2 \approx 32$ measurements

Round the value 31.2 up to 32. Therefore, to be 99%
confident that the estimate is within 2 feet of the true mean
depth, the scientist needs a sample of at least 32
measurements.
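
The same computation can be sketched in Python. The function name `sample_size_for_mean` is hypothetical and is used here only to mirror the formula for the population mean.

```python
import math

def sample_size_for_mean(z: float, sigma: float, e: float) -> int:
    """Minimum n for estimating a mean: n = (z * sigma / e)^2, rounded up."""
    n = (z * sigma / e) ** 2
    # Round to 9 decimals to remove floating-point noise, then round up.
    return math.ceil(round(n, 9))

# River depth example: 99% confidence (z = 2.58), sigma = 4.33 feet, e = 2 feet
print(sample_size_for_mean(z=2.58, sigma=4.33, e=2))  # prints 32
```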


PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take its picture and email to
your teacher. Subject: Midterm Progress 2 Filename: CYS, Surname, First Name

I. Evaluation. Show neat and complete solution on the following:


1. Solve for the sample size, assuming that the confidence coefficient
is 95% and the population proportion is close to 0.5. (15 points)
a. Suppose the school has the following population per year level:
First year – 205
Second year – 220
Third year – 180
Fourth year – 165

Use the appropriate probability sampling for this population.

Population =

Sample size =

First year: n =

Second year: n =

Third year: n =

Fourth year: n =

b. Use systematic sampling to find the respondents of the study. Let
2 be the starting point. (10 points)

1 2 3 4 5 6 7 8 9 10

11 12 13 14 15 16 17 18 19 20

21 22 23 24 25 26 27 28 29 30

31 32 33 34 35 36 37 38 39 40

N = ________ n= k=


II. Discussion: Use the space provided for your answer. Please do not use extra sheet
of paper. For each item, you will be graded using the following criteria:
Content – 10 points
Organization of ideas – 3 points
Spelling and grammar – 2 points

1. Give two examples of parameters and explain each in terms of point
estimates and point estimators.

2. Why is it important for the researcher to use probability sampling methods?


REFERENCES

Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.

Cochran, W. (1977). Sampling Techniques (3rd Ed.). John Wiley and Sons, Inc.

Israel, G. (2003). Determining the Sample Size. http://edis.ifas.ufl.edu

James Cook University Australia (n.d.). Basic Statistics: Sample vs Population Distributions.

Ott, L. & Longnecker, M. (2004). A First Course in Statistical Methods. Thomson-Brooks/Cole.

Rice, J. (2007). Mathematical Statistics and Data Analysis (3rd Ed.). Thomson Learning,
Inc.

Singh, A. & Masuku, M. (2014). Sampling Techniques & Determination of Sample Size in
Applied Statistics Research: An Overview. International Journal of Economics,
Commerce and Management. 2(11). 1 – 22.

Tejada, J. & Punzalan, R. (2012). On the Misuse of Slovin’s Formula. The Philippine
Statistician. 61(1). 129 – 136.

Yamane, T. (1967). Elementary Sampling Theory. Prentice Hall, Inc.


LEARNING GUIDE

Week No.: __8__

TOPIC: Hypothesis Testing


• Definition
• Null and alternative hypothesis
• p-value
• test types (one-tailed or two-tailed)
• type 1 and type 2 errors

EXPECTED COMPETENCIES. At the end of the lesson, you must have:


1. defined hypothesis testing;
2. enumerated the components of a formal hypothesis testing method;
3. formulated the mathematical form of hypotheses;
4. evaluated problems presented in hypothesis testing;
5. made decisions on what to do with the null hypothesis; and
6. interpreted the implications of the decisions made.

CONTENT/TECHNICAL INFORMATION

Figure 1
Basics of Hypothesis Testing

(Source: https://www.psychologywizard.net/hypotheses-ao1-ao2.html)


In the previous learning guide, we discussed estimation as one of the areas of
inferential statistics. This time, we are going to talk about hypothesis testing as another area
of inferential statistics. As many would say, a hypothesis is an educated guess or assumption.
Hypothesis is singular and its plural form is hypotheses. We need to test the hypothesis in
order to generalize from the sample to the population.

Figure 1 shows a comical way of describing hypothesis testing. How you construct the
hypothesis determines which tail of the distribution the test uses. That is why Figure 1
shows that if the hypothesis is “A happy dog will eat more”, the test is one-tailed. This is
because the word “more” is associated with the right tail of the distribution. This also signifies the
direction of your study. On the other hand, if the hypothesis is “Mood affects the appetite of
dogs”, there is no distinct direction for the test because mood may either increase or decrease
the appetite. Therefore, this is a two-tailed test.

Concepts (Bluman, 2012; Triola, 2010)

Statistical hypothesis – conjecture about a population parameter.
– may or may not be true

Null hypothesis – a statistical hypothesis that states that there is no difference
between a parameter and a specific value, or that there is no
difference between two parameters.
– denotes a condition of equality
– symbolized by H0
– tested directly (either rejected or not rejected)

Alternative hypothesis – statement that the parameter has a value that differs
from the null hypothesis.
– otherwise known as the research hypothesis
– symbolized by H1 (Some use Ha or HA)
– symbolic forms use one of these symbols: ≠, <, >

One-tailed test – indicates that the null hypothesis should be rejected when the
test value is in the critical region on one side of the mean.
• Left-tailed
• Right-tailed

Two-tailed test – indicates that the null hypothesis should be rejected when the
test value is in either of the two critical regions.

Let us have the following examples. Write each of the following hypotheses in
symbolic form. Identify whether it is null or alternative. If null, give its
alternative form, and vice versa. Lastly, identify the test type.
1. The mean burning rate of the airplane propellant is equal to 50 cm/second.
2. The mean lifetime of a battery is greater than 36 months.
3. The mean grade of the students is at most 7.5.


Answers:
1. The mean burning rate of the airplane propellant is equal to 50 cm/second.
• Symbolic form: μ = 50 – null hypothesis
• Alternative hypothesis: μ ≠ 50
• Test type: two-tailed

2. The mean lifetime of a battery is greater than 36 months.
• Symbolic form: μ > 36 – alternative hypothesis
• Null hypothesis: μ ≤ 36
• Test type: right-tailed (the inequality sign in the alternative hypothesis points to the right)

3. The mean grade of the students is at most 7.5.
• Symbolic form: μ ≤ 7.5 – null hypothesis
• Alternative hypothesis: μ > 7.5
• Test type: right-tailed (the inequality sign in the alternative hypothesis points to the right)
Note that “at most” means “less than or equal to 7.5”

Concepts

Statistical test – uses the data obtained from a sample to decide whether the
null hypothesis should be rejected.
Test value - the numerical value obtained from a statistical test
Level of significance – the maximum probability of committing a type I error.
– symbolized by α
Critical value – separates the critical region from the noncritical region.
– symbolized by C.V.
Critical or rejection region – is the range of values of the test statistic that
cause us to reject the null hypothesis

Figure 2
Critical/ rejection regions of a two-tailed test

(Source: Triola)


Figure 3
Critical/ rejection region of a left-tailed test

(Source: Triola)

Figure 4
Critical/ rejection region of a right-tailed test

(Source: Triola)

Figures 2, 3, and 4 show the rejection regions. The rejection region starts at the critical
value. Then, it extends in the direction set by the alternative hypothesis. If the alternative
hypothesis uses words such as less than, is below, is lower than, is shorter than, is smaller
than or is reduced from, then the alternative hypothesis is left-tailed. Meanwhile, if it uses
greater than, is above, is higher than, is longer than, is bigger than, or is increased to, then it
is right-tailed. Words such as is different from or is not equal to indicate a two-tailed test.
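
A minimal Python sketch of how the critical value and the rejection region depend on the test type is given below, assuming a z-test based on the standard normal distribution. The helper names `critical_values` and `in_rejection_region` are hypothetical. Note that software reports the one-tailed 0.05 critical value as about 1.645, which printed tables often round to 1.65.

```python
from statistics import NormalDist

def critical_values(alpha: float, tail: str) -> tuple:
    """Critical z value(s) for a 'left', 'right', or 'two' tailed test."""
    z = NormalDist()  # standard normal distribution
    if tail == "left":
        return (z.inv_cdf(alpha),)
    if tail == "right":
        return (z.inv_cdf(1 - alpha),)
    if tail == "two":
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    raise ValueError("tail must be 'left', 'right', or 'two'")

def in_rejection_region(test_value: float, alpha: float, tail: str) -> bool:
    """True when the test value falls in the critical (rejection) region."""
    cv = critical_values(alpha, tail)
    if tail == "left":
        return test_value < cv[0]
    if tail == "right":
        return test_value > cv[0]
    return test_value < cv[0] or test_value > cv[1]

for tail in ("left", "right", "two"):
    print(tail, [round(v, 3) for v in critical_values(0.05, tail)])
```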
In hypothesis testing, we have only two possible decisions:
1. Reject the null hypothesis; or
2. Fail to reject the null hypothesis.
“Fail to reject the null hypothesis” is more appropriate than “accept the null
hypothesis” because the available evidence is simply not strong enough to warrant rejection
of the null hypothesis. We are not proving the null hypothesis (Triola; Bluman).


Possible Outcomes of Hypothesis Testing


Hypothesis testing involves risk. The risk depends on the true nature of the null hypothesis and
the decision made by the researcher. Let us have this example from Montgomery and
Runger:
Suppose that an engineer is designing an air crew escape system that consists
of an ejection seat and a rocket motor that powers the seat. The rocket motor contains
a propellant, and for the ejection seat to function properly, the propellant should have
a mean burning rate of 50 cm/second. If the burning rate is too low, the ejection seat
may not function properly, leading to an unsafe ejection and possible injury of the
pilot. Higher burning rates may imply instability in the propellant or an ejection seat
that is too powerful, again leading to possible pilot injury.

Suppose that we are interested in the burning rate of the solid propellant.
Burning rate is a random variable that can be described by a probability distribution.
Suppose that our interest focuses on the mean burning rate (a parameter of this
distribution). Specifically, we are interested in deciding whether the mean burning
rate is 50 centimeters per second or not.

Null hypothesis: The mean burning rate of the propellant is 50cm/s.

Table 1
Outcomes of Hypothesis Testing

                     Nature of the null hypothesis
Decision             H0 is true (μ = 50 cm/sec)          H0 is false (μ ≠ 50 cm/sec)
Reject H0            Type 1 error (α): plane accident    Correct decision
Fail to reject H0    Correct decision                    Type 2 error (β): plane accident
(Note: Revised from Bluman, p. 404)

This example shows how important it is to evaluate both the accuracy of the
hypothesis and the correctness of the decision made. Both are important in the study. In this
case, the safety of the airplane crew and the passengers is at stake.

Concepts

Type I error (α) – rejecting a true null hypothesis
Type II error (β) – failing to reject a false null hypothesis
Level of Significance – When the null hypothesis is rejected at a specific
significance level, it can be concluded that the difference is probably
not due to chance and thus is statistically significant.
Note: Statistical significance is different from practical significance.
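
To make the meaning of α more concrete, here is a small simulation sketch. It repeatedly samples burning rates from a population in which the null hypothesis μ = 50 is true and counts how often a two-tailed z-test wrongly rejects it. The population standard deviation (2.0), the sample size (25), the number of trials, and the seed are all hypothetical values chosen only for illustration.

```python
import random
from statistics import NormalDist

def false_rejection_rate(mu0=50.0, sigma=2.0, n=25, alpha=0.05,
                         trials=10000, seed=1):
    """Share of samples that reject H0 even though H0 (mu = mu0) is true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed critical value
    standard_error = sigma / n ** 0.5
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 really is true.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (sum(sample) / n - mu0) / standard_error
        if abs(z) > z_crit:
            rejections += 1  # a Type I error
    return rejections / trials

print(false_rejection_rate())
```

The printed proportion should be close to α = 0.05, which is why α is described as the maximum probability of committing a Type I error.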


Let us have another example. Say our research hypothesis is that a newly developed
vaccine significantly decreases cases of COVID-19 (Coronavirus Disease 2019).
H0: The vaccine is not effective
H1: The vaccine is effective

Table 2
Outcomes of Hypothesis Testing

                                 Nature of the null hypothesis
Decision                         H0 is true                        H0 is false
                                 (The vaccine is not effective)    (The vaccine is effective)
Reject H0                        Type 1 error (α)                  Correct decision
(Mass produce the vaccine and    (Expenses and false hope)         (COVID-19 crisis will stop)
recommend its use)
Fail to reject H0                Correct decision                  Type 2 error (β)
(Do not produce)                 (Savings and safety)              (Deprivation of medicine to many)

Figure 5
Wording of the Conclusion

(Source: Triola)


Methods used to test hypotheses (Bluman)


A. The traditional method
B. The p-value method
C. The confidence interval method

Steps in Hypothesis Testing


1. Identify the claim and state the hypotheses.
2. Identify the level of significance and find the critical value.
3. Compute the test value.
4. Decide (reject or fail to reject H0)
5. Summarize the results. Make a conclusion.

A. Hypothesis Testing Using Traditional Method

A researcher wishes to see if the mean number of days that a basic, low-price,
small automobile sits on a dealer’s lot is 29. A sample of 30 automobile
dealers has a mean of 30.1 days for basic, low-price, small automobiles. At α =
0.05, test the claim that the mean time is greater than 29 days. The standard
deviation of the population is 3.8 days.

Given: μ = 29; x̄ = 30.1; n = 30; α = 0.05; σ = 3.8

1. Identify the claim and state the hypotheses.

Claim: The mean time is greater than 29 days.

H0: The mean time is equal to 29 days. μ = 29
(Strictly speaking, H0 is μ ≤ 29. However, books use the = sign only to
simplify.)

H1: The mean time is greater than 29 days. μ > 29

Test type: right-tailed

2. Identify the level of significance and find the critical value.

α = 0.05; C.V. = 1.65


3. Compute the test value.

$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} = \frac{30.1 - 29}{3.8 / \sqrt{30}} = 1.59$

4. Decide
Fail to reject H0 because z = 1.59 is less than the critical value of 1.65.
5. Summarize the results. Make a conclusion.
There is not enough evidence to support the claim that the mean time a basic, low-price,
small automobile sits on a dealer’s lot is greater than 29 days.
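
The steps of the traditional method for this example can be sketched in Python as follows. This is only an illustration that uses the standard library; the variable names are assumptions. The computed critical value is about 1.645, which tables round to 1.65.

```python
from math import sqrt
from statistics import NormalDist

# Given values from the example
mu0, x_bar, n, sigma, alpha = 29, 30.1, 30, 3.8, 0.05

# Step 2: critical value for a right-tailed test
critical_value = NormalDist().inv_cdf(1 - alpha)

# Step 3: test value
z = (x_bar - mu0) / (sigma / sqrt(n))

# Step 4: decision (reject H0 only if z falls in the right-tail critical region)
reject_h0 = z > critical_value

# The right-tailed p-value leads to the same decision
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(critical_value, 3), reject_h0)  # prints 1.59 1.645 False
```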

B. p-Value Method for Hypothesis Testing

The p-value (or probability value) is the probability of getting a sample
statistic (such as the mean) or a more extreme sample statistic in the direction
of the alternative hypothesis when the null hypothesis is true.

Decision Rule when Using a p-value

If p-value ≤ α, reject the null hypothesis.
If p-value > α, do not reject the null hypothesis.

Example 1:
Suppose the mean grade of 9 randomly selected males is 7.3 while the mean
grade of 9 females is 6.9. Does this mean that males perform better than females?

Steps in Hypothesis Testing

1. Identify the claim and state the hypotheses.
Let µ1 be the mean grade of male students and µ2 be the mean grade of female students.
H1: µ1 > µ2 – This is the claim and is the alternative hypothesis. Right-tailed test.
H0: There is no significant difference in the mean grades of male and female
students. µ1 = µ2 (Strictly speaking, H0 would be µ1 ≤ µ2)

2. Find the p-value

We will use Microsoft Excel in doing this. As to the statistical tool, we will
use t-Test: Two-Sample Assuming Equal Variances. It is two-sample because
the data come from two populations, male and female. We are assuming equal variances
because there is a minimal difference between the variances of the two groups.

We will discuss more about the statistical tools in the 10th week.


3. Decide
Since the p-value from the Excel output is greater than 0.05, we will not reject the null hypothesis.
Decision: Fail to reject the null hypothesis.


Figure 6
Finding the p-value of the t-test using Microsoft Excel

1. After encoding the grades of the students, click “Data”.
2. Click “Data Analysis”.
3. Click the statistical tool that is appropriate to your problem.
4. Click OK.
5. Specify the data range by either clicking then dragging the data or by filling out the data range.
6. Set Alpha to the significance level required. It is easier to use a “New Worksheet Ply”, so tick it.
7. Click OK and go to the new sheet generated by Microsoft Excel.
8. We use the “t Critical one-tail” because our alternative hypothesis is right-tailed (greater than).


4. Summarize the results. Make a conclusion.
The sample data do not support the claim that male students perform better than
female students. There is no significant difference in the mean grades of male
and female students.

Example 2 (Revised from Montgomery and Runger)

An electrical engineer must design a circuit to deliver the maximum amount of current to
a display tube to achieve sufficient image brightness. Within her allowable design constraints,
she has developed two candidate circuits and tests prototypes of each. Assume that the data
are randomly selected and normally distributed. The resulting data in milliamperes are as
follows:
Circuit 1: 250, 253, 259, 270, 280, 290, 267, 280
Circuit 2: 240, 238, 243, 228, 226, 227, 208, 212
Test the hypothesis that there is no significant difference in the means of the two groups.
1. Claim: There is no significant difference in the means of the two groups of
circuits.
Let µ1 be the mean of circuit 1 and µ2 be the mean of circuit 2
H0: µ1 = µ2
There is no significant difference in the means of the two circuits.
H1: µ1 ≠ µ2
There is a significant difference in the means of the two circuits.
2. Find the p-value

p = 0.0000285
3. Decide
Since p is less than 0.05, we reject the null hypothesis.

4. Conclude
There is sufficient evidence to warrant rejection of the claim that there is no
significant difference in the means of the two groups of circuits.
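
For readers without Microsoft Excel, the same two-sample t-test assuming equal variances can be sketched with the Python standard library. This is not the author’s procedure; it is a minimal sketch in which the helper names are hypothetical and the p-value is approximated by numerically integrating the t density with Simpson’s rule.

```python
import math
from statistics import mean, variance

circuit_1 = [250, 253, 259, 270, 280, 290, 267, 280]
circuit_2 = [240, 238, 243, 228, 226, 227, 208, 212]

def pooled_t_statistic(a, b):
    """Return (t, df) for the two-sample t-test assuming equal variances."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / standard_error, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coeff * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p_value(t, df, steps=20000):
    """P(|T| >= |t|), from Simpson's rule on the t density over [0, |t|]."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return max(0.0, 1 - 2 * area_0_to_t)

t_stat, df = pooled_t_statistic(circuit_1, circuit_2)
p_value = two_tailed_p_value(t_stat, df)
print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.7f}")
print("Reject H0" if p_value <= 0.05 else "Fail to reject H0")
```

The two-tailed p-value is used because H1 states only that the means are different, without a direction.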


C. Hypothesis Testing Using Confidence Interval Method

Confidence level – the confidence level of an interval estimate of a parameter is the
probability that the interval estimate will contain the parameter, assuming that a large number
of samples are selected and that the estimation process on the same parameter
is repeated.

Confidence interval – a specific interval estimate of a parameter determined
by using data obtained from a sample and by using the specific confidence
level of the estimate.

90%, 95% and 99% – commonly used confidence levels

The central limit theorem states that when the sample size is large,
approximately 95% of the sample means taken from a population with the same
sample size will fall within ±1.96 standard errors of the population mean, that
is, $\mu \pm 1.96 \left( \frac{\sigma}{\sqrt{n}} \right)$.

Let us have this example by Bluman:


A researcher wishes to estimate the number of days it takes an
automobile dealer to sell a Chevrolet Aveo. A sample of 50 cars had a mean time
on the dealer’s lot of 54 days. Assume the population standard deviation to be
6.0 days. Find the best point estimate of the population mean and the 95%
confidence interval of the population mean.
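The best point estimate of the mean is 54 days. For the 95% confidence
interval use z = 1.96.

$\bar{x} - z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right) < \mu < \bar{x} + z_{\alpha/2} \left( \frac{\sigma}{\sqrt{n}} \right)$

$54 - 1.96 \left( \frac{6.0}{\sqrt{50}} \right) < \mu < 54 + 1.96 \left( \frac{6.0}{\sqrt{50}} \right)$

$54 - 1.7 < \mu < 54 + 1.7$

$52.3 < \mu < 55.7$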
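
The interval can be verified with a few lines of Python. The function name `z_confidence_interval` is an assumption made for this sketch, and it applies only when the population standard deviation is known.

```python
from math import sqrt

def z_confidence_interval(x_bar: float, sigma: float, n: int, z: float):
    """Return (lower, upper) limits of x_bar ± z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Chevrolet Aveo example: mean of 54 days, sigma = 6.0, n = 50, z = 1.96
lower, upper = z_confidence_interval(54, 6.0, 50, 1.96)
print(round(lower, 1), round(upper, 1))  # prints 52.3 55.7
```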

Reporting the CI indicates precision, provides information about the
magnitude of effect and facilitates meta-analytic thinking (Hirpara et al., 2015;
Zhang & Zhang, 2012). A narrower CI width denotes a more precise estimate and
sufficient power (Hirpara; Tan & Tan, 2010; Zhang, 2008). The confidence interval
and p-value provide complementary information about practical significance
and statistical probability, respectively (Hirpara).


PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take its picture and email to
your teacher. Subject: Midterm Progress 3 Filename: CYS, Surname, First Name

I. Define the following terms (10 points)

1. Hypothesis testing

2. Alternative hypothesis

3. Null hypothesis

4. p-Value

5. Confidence interval

II. What are the components of a formal hypothesis testing using traditional
method? (5 points)

1. _____________________________________________________
2. _____________________________________________________
3. _____________________________________________________
4. _____________________________________________________
5. _____________________________________________________

III. Follow the steps in hypothesis testing using the p-value method. The output of
Microsoft Excel is shown in the last part. (25 points)

Criteria: Accuracy of the terms – 20 points
Organization of ideas, grammar, spelling and syntax – 3 points
Neatness – 2 points


Two machines are used for filling plastic bottles with a net volume of 16.0 ounces.
The fill volume can be assumed to be normal with standard deviations σ1 = 0.020 and σ2 =
0.025 ounces. A member of the quality engineering staff suspects that both machines fill to
the same mean net volume, whether or not this volume is 16.0 ounces. A random sample of 10
bottles is taken from the output of each machine. (Montgomery and Runger, p. 381)

Here is the Microsoft Excel Report from the data given on this problem.


REFERENCES

Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.

Hirpara, N., Jain, S., & Gupta, A. (2015). Interpreting Research Findings with Confidence
Interval. Journal of Orthodontics & Endodontics Volume (1)8. 1-4.

Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers
(6th Ed.). John Wiley & Sons, Inc.

Tan, S. H. & Tan, S. B. (2010). The Correct Interpretation of Confidence Intervals.
https://doi.org/10.1177/201010581001900316

Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/

Zhang, S. C. & Zhang, S. (2012). Confidence Intervals for Low-Dimensional Parameters in
High-Dimensional Linear Models. https://arxiv.org/abs/1110.2563v2


REFERENCES

Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.

Cochran, W. (1977). Sampling Techniques (3rd Ed.). John Wiley and Sons, Inc.

Devore, J. (2012). Probability and Statistics for Engineering and the Sciences (8th Ed.).
Brooks/Cole, Cengage Learning.

Glen, S. (n.d.). Kurtosis: Definition, Leptokurtic, Platykurtic. StatisticsHowTo.com

Hirpara, N., Jain, S., & Gupta, A. (2015). Interpreting Research Findings with Confidence
Interval. Journal of Orthodontics & Endodontics Volume (1)8. 1-4.

Israel, G. (2003). Determining the Sample Size. http://edis.ifas.ufl.edu

James Cook University Australia (n.d.). Basic Statistics: Sample vs Population Distributions.

Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers
(6th Ed.). John Wiley & Sons, Inc.

Ott, L. & Longnecker, M. (2004). A First Course in Statistical Methods. Thomson-Brooks/Cole.

Rice, J. (2007). Mathematical Statistics and Data Analysis (3rd Ed.). Thomson Learning,
Inc.

Singh, A. & Masuku, M. (2014). Sampling Techniques & Determination of Sample Size in
Applied Statistics Research: An Overview. International Journal of Economics,
Commerce and Management. 2(11). 1 – 22.

Tan, S. H. & Tan, S. B. (2010). The Correct Interpretation of Confidence Intervals.
https://doi.org/10.1177/201010581001900316

Tejada, J. & Punzalan, R. (2012). On the Misuse of Slovin’s Formula. The Philippine
Statistician. 61(1). 129 – 136.

Triola, M. (2012). Elementary Statistics. Pearson. http://www.imathas.com/triola/

Watkins, J. (n.d.). An Introduction to the Science of Statistics: From Theory to
Implementation.

Yamane, T. (1967). Elementary Sampling Theory. Prentice Hall, Inc.

Zhang, S. C. & Zhang, S. (2012). Confidence Intervals for Low-Dimensional Parameters in
High-Dimensional Linear Models. https://arxiv.org/abs/1110.2563v2


ABOUT THE AUTHOR

Lucille S. Arcedas, Ph.D. is a TUPV faculty member
from the Basic Arts and Science Department. She started her
teaching career in June 2001 after obtaining a degree in
Bachelor of Secondary Education major in Mathematics from
Kabankalan Catholic College. She received her Master of
Education major in Mathematics degree from the University of
Saint La Salle, Bacolod City. In 2007, she was one of the Filipinos
chosen to be a scholar of the Ford Foundation
International Fellowships Program. This enabled her to earn a
degree in Master in Professional Studies (Applied Statistics) at
Cornell University, a member of the Ivy League in Ithaca, New
York, United States of America. She earned the degree of Doctor of Philosophy in Science
Education major in Mathematics from West Visayas State University, Iloilo City as a scholar
of the Department of Science and Technology – Science Education Institute – Capacity
Building Program for Science and Math Education.

You may contact her through email (lucille_arcedas@tup.edu.ph).
