LEARNING GUIDE
Week No.: __6__
EXPECTED COMPETENCIES: At the end of this lesson, you must have:
1. classified various distribution shapes;
2. identified the characteristics of a normal distribution;
3. standardized the distribution;
4. solved for the z-score and probability values;
5. categorized curves with regard to skewness and kurtosis; and
6. explained the Central Limit Theorem.
CONTENT/TECHNICAL INFORMATION
Most likely, you are familiar with the “COVID” curves below. In these curves, the x-axis represents the number of infections. Their shapes look like a normal distribution curve, but they are not normal curves. We will present the characteristics of the normal curve, and you will determine why the figures below cannot be considered normal curves.
Figure 1
The “COVID” Curve
Distribution Shapes
Let us suppose that in a certain program, there are six sections. Look at the shapes of the distributions of each of these sections. Note that this is NOT an accurate histogram: the first bar should start at 5 on the x-axis, but this graph places both 0 and 5 at the origin. In addition, the x-axis should be labeled with the class boundaries.
Figure 2
Bell-shaped
Figure 3
Left skewed
Figure 4
Right skewed
Figure 5
Uniform
Figure 6
J-shaped
Figure 7
Reverse J-shaped
Figure 8
Bimodal
Figure 9
U-shaped
The first distribution shape is bell-shaped, which is also known as the normal distribution or Gaussian distribution. It is named after Carl Friedrich Gauss (1777-1855), who derived its equation. You can see this curve in the encircled portion of the banknote in Figure 10, which honors Gauss.
Figure 10
The German bill that displays Gauss and the normal distribution
(Source: banknotes.com)
We will focus first on the normal distribution. Bluman (2012) defines the normal distribution as a continuous, symmetric, bell-shaped distribution of a variable. Although the normal distribution is a theoretical model that few real variables fit perfectly, many variables are approximately normally distributed. According to Montgomery and Runger (2014), the normal distribution is the most widely used model for the distribution of a continuous measurement. For example, an automotive engineer may plan a study of the average pull-off force measured from several connectors; the measurements from replicates of this random experiment can be modeled by a normal distribution.
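The properties of the normal distribution are as follows (Bluman, 2012):
1. The curve is bell-shaped.
2. The mean, median and mode are equal and are located at the center of the distribution.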
The normal distribution curve below (Figure 10) has a mean, median and mode that are all equal to 28. If we compute the mean of all the scores, we get 28. Notice that 28 is also in the middle of the scores when they are arranged from lowest to highest, so it is the median. The mode is the score at the highest point (peak) of the distribution.
3. The curve is unimodal; that is, it has only one mode.
Figure 10
The normal distribution
4. The curve is symmetric about the mean, that is, its shape is the same on both sides
of a vertical line passing through the center. (Figure 10)
5. The curve is continuous; there are no gaps or holes. For each value of X, there is a
corresponding value of Y.
The normal distributions shown earlier came from quiz scores. We treat scores as continuous because they measure how much knowledge a group of students has attained. Discrete variables have different distributions, as discussed in Learning Guide 2.
6. The curve is asymptotic. It never touches the x axis. Theoretically, no matter how
far the curve extends in either direction, it never meets the x axis—but it gets
increasingly closer.
7. The total area under a normal distribution curve is equal to 1.00, or 100%. This fact
may seem unusual, since the curve never touches the x axis, but one can prove it
mathematically by using calculus.
We use a table of values to find the area under the normal curve (see Table 1 on the next page).
8. The area under the part of a normal curve that lies within 1 standard deviation of the
mean is approximately 0.68, or 68%; within 2 standard deviations, about 0.95, or
95%; and within 3 standard deviations, about 0.997, or 99.7%. See figure 11.
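The percentages in property 8 can be checked numerically. The sketch below is our own illustration, not part of the lesson's method: it assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` class provides the cumulative area through its `cdf` method.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1
std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k standard deviations."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {area_within(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The exact areas round to 68%, 95% and 99.7%, which is why the rule is stated with "approximately".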
Figure 11
The empirical (or 68-95-99.7) rule
Figure 12
Converting to a standard normal distribution
Figure 13 shows how to find the area under the standard normal curve to the left of a given z value. The green example shows that if z = 0.21, the area is 0.5832 or 58.32%. The red example shows that if z = 1.27, the area is 0.8980 or 89.80%.
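The table lookups in Figure 13 can be reproduced in code. This sketch assumes Python 3.8+ and its standard-library `statistics.NormalDist`, and rounds to four decimal places like a printed z-table; note that an area of 0.5832 corresponds to z = 0.21, while z = 2.01 gives 0.9778. The helper name `table_area` is our own.

```python
from statistics import NormalDist

def table_area(z):
    """Area to the LEFT of z under the standard normal curve,
    rounded to 4 decimal places the way a printed z-table is."""
    return round(NormalDist().cdf(z), 4)

print(table_area(0.21))  # 0.5832
print(table_area(1.27))  # 0.898  (that is, 0.8980)
print(table_area(2.01))  # 0.9778
```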
Figure 13
Finding for the area under the normal curve
Figure 14
Interpreting z-scores
Figure 14 shows that whenever a value is less than the mean, its corresponding z-score is negative, and whenever it is greater than the mean, its z-score is positive. A value is considered unusual if its z-score is less than -2 or greater than 2.
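The standardization behind Figures 12 and 14 is z = (x − mean) / standard deviation. The sketch below applies it together with the lesson's "less than -2 or greater than 2" rule for unusual values. The helper names are our own, and the standard deviation of 4 paired with the mean of 28 is a hypothetical value chosen only for illustration.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Rule of thumb from the lesson: z below -2 or above 2 is unusual."""
    return z < -2 or z > 2

# Hypothetical scores with mean 28 and an assumed standard deviation of 4
for x in (18, 26, 34):
    z = z_score(x, mean=28, sd=4)
    print(x, z, is_unusual(z))
# 18 -2.5 True
# 26 -0.5 False
# 34 1.5 False
```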
Example 1
Suppose that the current measurements in a strip of wire are assumed to follow a normal distribution with a mean of 9 milliamperes and a variance of 4 (milliamperes)². What is the probability that a measurement (a) is below 8 milliamperes, (b) is between 8 and 12 milliamperes, and (c) exceeds 12 milliamperes?
Given:
Let the current be measured in milliamperes.
x̄ = 9, s² = 4
Required:
a. P(x < 8)   b. P(8 < x < 12)   c. P(x > 12)
Formula:
$$ z = \frac{x - \bar{x}}{s} $$
Since we are given the variance, and the variance is the square of the standard deviation, the standard deviation is s = √4 = 2.
Solution:
a. $$ z_1 = \frac{x - \bar{x}}{s} = \frac{8 - 9}{2} = \frac{-1}{2} = -0.5 $$
Since our z-score is negative, let us take the value from the table in Figure 18 on the next page: P(z < -0.5) = 0.3085 or 30.85%.
To illustrate this, Figure 15 shows that -0.5 lies to the left of zero, between -1 and 0. We shade the left portion because we are interested in the scores that are less than -0.5. To make this easy, let the shading follow the direction of the inequality sign: in "less than" (<), the pointed end faces left, so shade the left part.
Figure 15
P (z < -0.5) = 30.85%
Note: Revised using Paint. Original image is from Triola
b. P(8 < x < 12). We are looking for the values between 8 and 12. Since we already know P(x < 8), let us find P(x < 12). Note that the areas in the normal distribution table are always the areas to the left of z (see Figure 19).
$$ z_2 = \frac{x - \bar{x}}{s} = \frac{12 - 9}{2} = \frac{3}{2} = 1.5; \quad P(z < 1.5) = 0.9332 = 93.32\% $$
P(8 < x < 12) = P(-0.5 < z < 1.5) = 93.32% − 30.85% = 62.47%
Figure 16
P (-0.5 < z < 1.5) = 62.47%
c. P(x > 12). We already know that P(x < 12) = P(z < 1.5) = 0.9332, so we subtract it from 1. We subtract from one because the total area under the normal curve is 1 or 100%. Thus, P(z > 1.5) = 1 − P(z < 1.5) = 100% − 93.32% = 6.68%.
Figure 17
P (z > 1.5) = 6.68%
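The three answers of Example 1 can be cross-checked without a table. This sketch, assuming Python 3.8+ and its `statistics.NormalDist` class, builds the distribution directly from the given mean and the standard deviation √4 = 2, and works with the original milliampere values, so no manual z conversion is needed.

```python
from statistics import NormalDist

# Example 1: current ~ Normal(mean 9 mA, variance 4 mA^2), so SD = sqrt(4) = 2 mA
current = NormalDist(mu=9, sigma=4 ** 0.5)

p_below_8 = current.cdf(8)                    # (a) P(x < 8)
p_between = current.cdf(12) - current.cdf(8)  # (b) P(8 < x < 12)
p_above_12 = 1 - current.cdf(12)              # (c) P(x > 12): total area 1 minus left area

print(f"(a) {p_below_8:.4f}")   # (a) 0.3085
print(f"(b) {p_between:.4f}")   # (b) 0.6247
print(f"(c) {p_above_12:.4f}")  # (c) 0.0668
```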
Figure 18
The standard normal distribution part 1
(Source: Bluman)
Figure 19
The standard normal distribution part 2
(Source: Bluman)
Example 2
The line width for semiconductor manufacturing is assumed to be normally distributed with a mean of 0.4 micrometer and a standard deviation of 0.04 micrometer. (a) What is the probability that a line width is greater than 0.52 micrometer? (b) What is the probability that a line width is between 0.32 and 0.35 micrometer? (c) Below what value do the line widths of 90% of samples fall?
Given:
Let the line width be measured in micrometers.
μ = 0.4, σ = 0.04
Required:
a. P(x > 0.52)   b. P(0.32 < x < 0.35)   c. the value below which the line widths of 90% of samples fall
Formula:
$$ z = \frac{x - \mu}{\sigma} $$
a. $$ z_1 = \frac{x - \mu}{\sigma} = \frac{0.52 - 0.4}{0.04} = \frac{0.12}{0.04} = 3.0 $$
P(x > 0.52) = P(z > 3.0) = 1 − 0.9987 = 0.0013 or 0.13%
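Figure 20
P (z > 3.0) = 0.13%
b. $$ z_1 = \frac{0.32 - 0.4}{0.04} = -2.0, \quad z_2 = \frac{0.35 - 0.4}{0.04} = -1.25 $$
P(0.32 < x < 0.35) = P(-2.0 < z < -1.25) = 0.1056 − 0.0228 = 0.0828 or 8.28%
Figure 21
P (-2.0 < z < -1.25) = 8.28%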
c. To answer this, we look for an area of 0.9000 in the normal distribution table. There is no exact value of 0.9000, so we take the area nearest to it, 0.8997, which corresponds to z = 1.28, as shown in Figure 22.
(Source: Bluman)
Then we substitute this into our formula, $z = \frac{x - \mu}{\sigma}$:
$$ 1.28 = \frac{x - 0.4}{0.04} $$
$$ x = 1.28(0.04) + 0.4 = 0.0512 + 0.4 = 0.4512 $$
Therefore, the line widths of 90% of the semiconductor samples are below 0.4512 micrometer. In general, solving the z formula for x gives x = zσ + μ, which we can use to find the value of x directly.
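Example 2 can be checked the same way. In this sketch (Python 3.8+, `statistics.NormalDist`), `inv_cdf(0.90)` returns the exact z with 90% of the area to its left, about 1.2816, instead of the table's nearest value 1.28, so part (c) comes out as 0.4513 rather than 0.4512. Part (b) comes out as 0.0829; tiny differences from table-based answers come from rounding in the table.

```python
from statistics import NormalDist

# Example 2: line width ~ Normal(mean 0.4 um, SD 0.04 um)
width = NormalDist(mu=0.4, sigma=0.04)

p_a = 1 - width.cdf(0.52)                # (a) P(x > 0.52)
p_b = width.cdf(0.35) - width.cdf(0.32)  # (b) P(0.32 < x < 0.35)
z_90 = NormalDist().inv_cdf(0.90)        # exact z with 90% of the area to its left
x_90 = z_90 * width.stdev + width.mean   # (c) x = z * sigma + mu

print(f"(a) {p_a:.4f}")                       # (a) 0.0013
print(f"(b) {p_b:.4f}")                       # (b) 0.0829
print(f"(c) z = {z_90:.4f}, x = {x_90:.4f}")  # (c) z = 1.2816, x = 0.4513
```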
There are many ways to check for normality. You can draw a histogram of the data and check its shape. You may also use the Pearson coefficient of skewness (PC), also known as Pearson’s index of skewness:
$$ PC = \frac{3(\text{mean} - \text{median})}{\text{standard deviation}} $$
If the index is less than −1, the data are significantly left skewed (Figures 3 and 22). If the index is greater than 1, the data are significantly right skewed (Figures 4 and 22). It is also important to check the data set for possible outliers, that is, extremely small or extremely large data values. If you find outliers, you may remove them, since “data cleaning” is part of statistical analysis. However, you need to state in your methodology that you removed the outliers and explain the consequences for your data set.
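Pearson's index can be computed with Python's standard `statistics` module. This is a sketch under two assumptions: the data set is hypothetical, and the standard deviation is the sample standard deviation s (`statistics.stdev`). The cut-offs are the lesson's −1 and +1.

```python
from statistics import mean, median, stdev

def pearson_skewness(data):
    """Pearson's index of skewness: PC = 3(mean - median) / s, with s the sample SD."""
    return 3 * (mean(data) - median(data)) / stdev(data)

def describe_skew(pc):
    """Apply the lesson's cut-offs of -1 and +1."""
    if pc < -1:
        return "significantly left skewed"
    if pc > 1:
        return "significantly right skewed"
    return "not significantly skewed"

scores = [2, 3, 3, 4, 4, 4, 5, 15, 20]  # hypothetical data with two large values
pc = pearson_skewness(scores)
print(round(pc, 4), describe_skew(pc))  # 1.2649 significantly right skewed
```

Here the two large values pull the mean (about 6.67) above the median (4), which is exactly the right-skew pattern described in the text.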
Figure 22
Normal and skewed distributions
(Source: Triola)
Figure 22 shows the major differences between the normal and skewed distributions. The “tail” of the distribution determines its skewness. Figure 22a has a longer left tail, so it is skewed to the left, while Figure 22c has a longer right tail and is referred to as skewed to the right. As previously stated, the mean, median and mode of a normal distribution are equal (Figure 22b).
Skewed Distributions
Skewness – asymmetry with respect to a histogram of data or a probability distribution (Montgomery and Runger). As a rule of thumb, a distribution is significantly skewed if its skewness index is less than −1 or greater than 1.
1. A distribution that is skewed to the left, or negatively skewed (Figure 22a), has a longer left tail. This can happen if many of the values in the data set are high. Another basis is the relationship between the mean and the median: in a negatively skewed distribution, the mean is less than the median. Our reference point is the median because, unlike the mean, it is not pulled by extreme values.
For example, a test in which many of the students get high scores will result in a negatively skewed distribution. Another example is a country where most of the population is elderly.
2. A distribution that is skewed to the right, or positively skewed (Figure 22c), has a longer right tail. This happens if many of the values in the data set are low. This time, the mean is higher than the median, because the few high scores pull the mean upward. For example, in a test where most of the students got low scores, the distribution is skewed to the right.
Figure 23
Finding the skewness using Microsoft Excel
Kurtosis – the measure of the degree to which a unimodal distribution is peaked. Kurtosis is based on the fourth moment about the mean. We can find the kurtosis of a distribution in Microsoft Excel by following the same steps used for skewness, but this time using the function “KURT”.
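Excel's SKEW and KURT functions use bias-adjusted sample formulas. The sketch below reproduces those formulas in Python so you can see what the spreadsheet computes; the function names `excel_skew` and `excel_kurt` are our own. KURT reports excess kurtosis, so a value near 0 suggests a mesokurtic (normal-like) peak, a positive value a leptokurtic one, and a negative value a platykurtic one.

```python
from statistics import mean, stdev

def excel_skew(data):
    """Sample skewness with the formula Excel's SKEW uses:
    n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3)."""
    n = len(data)
    if n < 3:
        raise ValueError("SKEW needs at least 3 values")
    m, s = mean(data), stdev(data)
    return n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in data)

def excel_kurt(data):
    """Excess kurtosis with the formula Excel's KURT uses (roughly 0 for normal-shaped data)."""
    n = len(data)
    if n < 4:
        raise ValueError("KURT needs at least 4 values")
    m, s = mean(data), stdev(data)
    fourth = sum(((x - m) / s) ** 4 for x in data)
    return (n * (n + 1) / ((n - 1) * (n - 2) * (n - 3)) * fourth
            - 3 * (n - 1) ** 2 / ((n - 2) * (n - 3)))

print(excel_skew([1, 2, 2, 3, 10]))  # positive: longer right tail
print(excel_kurt([1, 2, 3, 4, 5]))   # about -1.2: flatter than normal (platykurtic)
```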
Figure 24
Platykurtic, Mesokurtic, and Leptokurtic Distributions
Figure 25
The Central Limit Theorem
PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take a picture of it, and email it to your teacher. Subject: Midterm Progress 1 Filename: CYS, Surname, First Name
1. _______________________
2. _______________________
3. _______________________
4. _______________________
5. _______________________
Note: Images from Emory Oxford College
II. Match Column A with Column B. Write the CAPITAL letter in the space provided before each number. (5 points)
Column A
______1. The left side is a mirror image of its right side.
______2. The tails approach the x-axis but never meet it.
______3. There are no gaps.
______4. The peakedness or flatness of a distribution
______5. Has only one mode
Column B
A. Asymptotic
B. Continuous
C. Kurtosis
D. One-modal
E. Skewness
F. Symmetric
G. Unimodal
The average fuel efficiency of U.S. light vehicles (cars, SUVs, minivans, vans, and light trucks) for 2005 was 21 miles per gallon (mpg). If the standard deviation of the population was 2.8 mpg and the gas ratings were normally distributed, (a) what is the probability that the mean fuel efficiency of a random sample of 25 light vehicles is under 18 mpg? (b) What is the probability that it is between 20 and 24 mpg?
Given:
Required:
Solution:
a.
b.
V. What are the important concepts about the Central Limit Theorem? (10 points)
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.
Devore, J. (2012). Probability and Statistics for Engineering and the Sciences (8th Ed.). Brooks/Cole, Cengage Learning.
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers (6th Ed.). John Wiley & Sons, Inc.
LEARNING GUIDE
CONTENT/TECHNICAL INFORMATION
What comes to your mind when you look at the picture below? You are right! It is about a population (the larger circle) and getting a sample from it (the smaller circle). According to Ott and Longnecker (2004), a population is the set of all measurements of interest to the sample collector, while a sample is any subset of measurements selected from the population. Singh and Masuku (2014) note that sampling is concerned with the selection of a subset of individuals from a population to estimate the characteristics of the whole population.
Figure 1
Sampling Procedure
Basic Concepts
Montgomery and Runger (2014) explain that we use statistical inference when we use statistical methods to make decisions and draw conclusions about a population. It has two areas: parameter estimation and hypothesis testing. Ott and Longnecker define parameter estimation as making inferences about parameters, in which one predicts the value of the population parameter. We will discuss parameter estimation in this session; hypothesis testing will be discussed in Week 8.
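2. According to Rice (2007):
a. The normal or Gaussian distribution involves two parameters, μ and σ, where μ is the mean and σ² is the variance of the distribution. Its density function is
$$ f(x \mid \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}}, \quad -\infty < x < \infty $$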
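Rice's density formula translates directly into code. This is a sketch for checking values of f(x | μ, σ) by hand; `normal_pdf` is our own helper name, and Python's `statistics.NormalDist(mu, sigma).pdf` (Python 3.8+) computes the same quantity, which the last line confirms.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Height of the normal curve: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

print(round(normal_pdf(0, 0, 1), 4))                   # 0.3989, the peak of the standard normal curve
print(normal_pdf(25, 28, 4) == normal_pdf(31, 28, 4))  # True: symmetric about the mean
print(math.isclose(normal_pdf(1.3, 0.4, 2.0), NormalDist(0.4, 2.0).pdf(1.3)))  # True
```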
Figure 2
Generating random numbers from Microsoft Excel
Figure 2 shows how to assign random numbers to the students. Suppose the population size is 20 and the sample size is 5; these numbers are for demonstration only. We will have a detailed lesson on several methods of determining the sample size.
1. Assign a distinct number to each student. This keeps the students anonymous.
2. Generate random numbers in the second column using the function “=rand()”.
3. Add another column into which you will paste the values of the random numbers. This is needed because the numbers in column B keep changing every time the worksheet recalculates.
4. Highlight column B (the random numbers) and click “copy”. Then position the cursor in column C and click “paste values”.
5. You may now delete column B. Highlight the new column B, then click “Sort and Filter” and choose “Sort Smallest to Largest”.
6. Click “Expand the selection”.
7. Since you only need a sample of 5, the persons corresponding to the first 5 numbers will be your respondents or participants.
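The spreadsheet procedure above (attach a random number to each person, freeze the numbers, sort, keep the first n) can be sketched in Python. `random_sample_by_sorting` is a hypothetical helper name; in practice the standard-library call `random.sample(students, 5)` draws a simple random sample in one step.

```python
import random

def random_sample_by_sorting(population_ids, sample_size, seed=None):
    """Mimic the spreadsheet steps: tag each ID with a random number (like =rand()),
    freeze those numbers, sort from smallest to largest, and keep the first IDs."""
    rng = random.Random(seed)
    tagged = [(rng.random(), pid) for pid in population_ids]  # random values are generated once
    tagged.sort()                                              # "Sort Smallest to Largest"
    return [pid for _, pid in tagged[:sample_size]]

students = list(range(1, 21))  # population of 20 students numbered 1 to 20
print(random_sample_by_sorting(students, 5, seed=7))
```

Fixing the seed plays the role of "paste values": the same seed always gives the same sample, so the selection can be reproduced.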
For example, suppose the population is 100, with 60 males and 40 females. The prescribed sample size is 80, and the grouping will be based on sex.
4. Cluster Sampling – divide the population area into sections (or clusters), randomly select some of those clusters, and then choose all members from the selected clusters. For example, suppose a researcher will conduct a study about TUP. Using cluster sampling, the researcher randomly selects, say, two campuses. If the Talisay and Taguig campuses come out of the random selection, then all students of Talisay and Taguig will be part of the study, and the results will be taken to represent the whole TUP system.
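The stratified and cluster examples above can be sketched in code. Two assumptions are labeled here: the stratified example is treated as proportional allocation (each group's share of the sample equals its share of the population), which the lesson does not state explicitly, and the campus names other than Talisay and Taguig, as well as all student IDs, are hypothetical. `proportional_allocation` and `cluster_sample` are our own helper names.

```python
import random

def proportional_allocation(strata_sizes, n):
    """Split a total sample size n across strata in proportion N_h / N (may be fractional)."""
    N = sum(strata_sizes.values())
    return {name: n * size / N for name, size in strata_sizes.items()}

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose num_clusters clusters, then take EVERY member of each chosen cluster."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    return [member for name in chosen for member in clusters[name]]

# Stratified example from the lesson: 60 males, 40 females, prescribed n = 80
print(proportional_allocation({"male": 60, "female": 40}, 80))
# {'male': 48.0, 'female': 32.0}

# Hypothetical campuses and student IDs, for illustration only
campuses = {
    "Campus A": ["A1", "A2", "A3"],
    "Taguig": ["T1", "T2"],
    "Talisay": ["S1", "S2", "S3", "S4"],
    "Campus B": ["B1", "B2"],
}
print(cluster_sample(campuses, 2, seed=1))
```

The design difference shows in the code: stratified sampling draws from every group, while cluster sampling draws whole groups and skips the rest.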
Figure 3
Convenience sampling
(Source: Triola)
3. Degree of variability – the distribution of the attributes in the population. The more heterogeneous a population, the larger the sample size needed to obtain a given level of precision.
– A proportion of 50% indicates maximum variability and is often used to determine a more conservative sample size.
For example, find the sample size if there are 100 persons in a group. Let e = 0.05.
$$ n = \frac{N}{1 + Ne^2} = \frac{100}{1 + 100(0.05)^2} = 80 $$
$$ n_0 = \frac{Z^2 p q}{e^2} $$
$$ n_0 = \frac{Z^2 p q}{e^2} = \frac{(1.96)^2(0.5)(0.5)}{(0.05)^2} = 384.16 \approx 385 \text{ farmers} $$
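c. Formula for the minimum sample size needed for an interval estimate of the population mean (Bluman):
$$ n = \left( \frac{z_{\alpha/2} \cdot \sigma}{e} \right)^2 $$
$$ n = \left[ \frac{2.58(4.33)}{2} \right]^2 = 31.2 \approx 32 $$
Since a sample size must be a whole number, 31.2 is rounded up to 32.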
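The three sample-size formulas above can be evaluated with a few lines of Python. This is a sketch: the helper names are our own, and every result is rounded up, because rounding down would give a sample slightly too small for the stated margin of error. The reference list includes Tejada and Punzalan (2012) on the misuse of Slovin's formula, so apply that formula with care.

```python
import math

def round_up(value):
    """Round a sample size UP to a whole number; rounding to 9 decimals first strips float noise."""
    return math.ceil(round(value, 9))

def slovin(N, e):
    """Slovin's formula: n = N / (1 + N e^2)."""
    return round_up(N / (1 + N * e ** 2))

def cochran_n0(z, p, e):
    """Cochran's formula for a proportion: n0 = z^2 p q / e^2, with q = 1 - p."""
    return round_up(z ** 2 * p * (1 - p) / e ** 2)

def n_for_mean(z, sigma, e):
    """Minimum n to estimate a mean within margin e: n = (z * sigma / e)^2."""
    return round_up((z * sigma / e) ** 2)

print(slovin(100, 0.05))            # 80
print(cochran_n0(1.96, 0.5, 0.05))  # 385  (384.16 rounded up)
print(n_for_mean(2.58, 4.33, 2))    # 32   (31.2 rounded up)
```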
PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take a picture of it, and email it to your teacher. Subject: Midterm Progress 2 Filename: CYS, Surname, First Name
Population =
Sample size =
First year: n =
Second year: n =
Third year: n =
Fourth year: n =
1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20
21 22 23 24 25 26 27 28 29 30
31 32 33 34 35 36 37 38 39 40
N = ________ n= k=
II. Discussion: Use the space provided for your answer. Please do not use an extra sheet of paper. For each item, you will be graded using the following criteria:
Content – 10 points
Organization of ideas – 3 points
Spelling and grammar – 2 points
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The McGraw-
Hill Companies, Inc.
Cochran, W. (1977). Sampling Techniques (3rd Ed.). John Wiley and Sons, Inc.
James Cook University Australia (n.d.). Basic Statistics: Sample vs Population Distributions.
Rice, J. (2007). Mathematical Statistics and Data Analysis (3rd Ed.). Thomson Learning,
Inc.
Singh, A. & Masuku, M. (2014). Sampling Techniques & Determination of Sample Size in
Applied Statistics Research: An Overview. International Journal of Economics,
Commerce and Management. 2(11). 1 – 22.
Tejada, J. & Punzalan, R. (2012). On the Misuse of Slovin’s Formula. The Philippine
Statistician. 61(1). 129 – 136.
LEARNING GUIDE
CONTENT/TECHNICAL INFORMATION
Figure 1
Basics of Hypothesis Testing
(Source: https://www.psychologywizard.net/hypotheses-ao1-ao2.html)
In the previous learning guide, we discussed estimation as one of the areas of inferential statistics. This time, we are going to talk about hypothesis testing, the other area of inferential statistics. As many would say, a hypothesis is an educated guess or assumption. Hypothesis is singular; its plural form is hypotheses. We need to test the hypothesis in order to generalize our findings to the population.
Alternative hypothesis – statement that the parameter has a value that differs
from the null hypothesis.
– otherwise known as the research hypothesis
– symbolized by H1 (Some use Ha or HA)
– symbolic forms use one of these symbols: ≠, <, >
One-tailed test – indicates that the null hypothesis should be rejected when the
test value is in the critical region on one side of the mean.
• Left-tailed
• Right-tailed
Two-tailed test – indicates that the null hypothesis should be rejected when the
test value is in either of the two critical regions.
Consider the following examples. Write each hypothesis in symbolic form and identify whether it is the null or the alternative hypothesis. If it is the null hypothesis, give its alternative form, and vice versa. Lastly, identify the test type.
1. The mean burning rate of the airplane propellant is equal to 50 cm/second.
2. The mean lifetime of a battery is greater than 36 months.
3. The mean grade of the students is at most 7.5.
Answers:
1. The mean burning rate of the airplane propellant is equal to 50 cm/second.
• Symbolic form: μ = 50 – null hypothesis
• Alternative hypothesis: μ ≠ 50
• Test type: two-tailed
Concepts
Statistical test – uses the data obtained from a sample to decide whether the
null hypothesis should be rejected.
Test value - the numerical value obtained from a statistical test
Level of significance – the maximum probability of committing a type I error.
– symbolized by α
Critical value – separates the critical region from the noncritical region.
– symbolized by C.V.
Critical or rejection region – the range of values of the test statistic that causes us to reject the null hypothesis
Figure 2
Critical/ rejection regions of a two-tailed test
(Source: Triola)
Figure 3
Critical/ rejection region of a left-tailed test
(Source: Triola)
Figure 4
Critical/ rejection region of a right-tailed test
(Source: Triola)
Figures 2, 3, and 4 show the rejection regions. The rejection region starts at the critical value and extends in the direction set by the alternative hypothesis. If the alternative hypothesis uses words such as is less than, is below, is lower than, is shorter than, is smaller than, or is reduced from, the test is left-tailed. If it uses is greater than, is above, is higher than, is longer than, is bigger than, or is increased to, the test is right-tailed. Words such as is different from or is not equal to indicate a two-tailed test.
In testing hypothesis, we have only two decisions:
1. Reject the null hypothesis; or
2. Fail to reject the null hypothesis.
Failing to reject the null hypothesis is more appropriate than “accepting the null hypothesis” because it only means that the available evidence is not strong enough to warrant rejection of the null hypothesis; we are not proving the null hypothesis (Triola; Bluman).
Suppose that we are interested in the burning rate of the solid propellant.
Burning rate is a random variable that can be described by a probability distribution.
Suppose that our interest focuses on the mean burning rate (a parameter of this
distribution). Specifically, we are interested in deciding whether the mean burning
rate is 50 centimeters per second or not.
Table 1
Outcomes of Hypothesis Testing
Decision | H0 is true (μ = 50 cm/sec) | H0 is false (μ ≠ 50 cm/sec)
This example shows how important it is to evaluate the accuracy of the hypothesis and the correctness of the decision made. Both are important in the study; in this case, the safety of the airplane crew and passengers is at stake.
Concepts
Consider another example. Suppose our research hypothesis states that a newly developed vaccine significantly reduces COVID-19 (Coronavirus Disease 2019) infections.
H0: The vaccine is not effective
H1: The vaccine is effective
Table 2
Outcomes of Hypothesis Testing
Decision | H0 is true (The vaccine is not effective) | H0 is false (The vaccine is effective)
Figure 5
Wording of the Conclusion
(Source: Triola)
A researcher wishes to see if the mean number of days that a basic, low-price, small automobile sits on a dealer’s lot is 29. A sample of 30 automobile dealers has a mean of 30.1 days for basic, low-price, small automobiles. At α = 0.05, test the claim that the mean time is greater than 29 days. The standard deviation of the population is 3.8 days.
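The dealer-lot example can be worked with the traditional method. The sketch below assumes the hypotheses H0: μ = 29 and H1: μ > 29 (a right-tailed test, since the claim is "greater than 29 days") and uses the z test because the population standard deviation is known; `right_tailed_z_test` is our own helper name, not a library function, and it relies on Python 3.8+ `statistics.NormalDist`.

```python
import math
from statistics import NormalDist

def right_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Traditional method: compare the test value z with the critical value of a right-tailed test."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))  # test value
    critical = NormalDist().inv_cdf(1 - alpha)        # C.V., about 1.645 when alpha = 0.05
    p_value = 1 - NormalDist().cdf(z)                 # area to the right of z
    decision = "reject H0" if z > critical else "fail to reject H0"
    return z, critical, p_value, decision

z, cv, p, decision = right_tailed_z_test(sample_mean=30.1, mu0=29, sigma=3.8, n=30)
print(f"z = {z:.2f}, C.V. = {cv:.3f}, p = {p:.4f}, decision: {decision}")
```

With these numbers, z is about 1.59, which does not exceed the critical value of about 1.645, so the sketch reports "fail to reject H0"; the p-value method reaches the same decision because p is above 0.05.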
Example 1:
Suppose the mean grade of 9 randomly selected males is 7.3, while the mean grade of 9 females is 6.9. Does this mean that males perform better than females?
Figure 6
Finding the p-value of the t-test using Microsoft Excel
p = 0.0000285
3. Decide
Since p is less than 0.05, let us reject the null hypothesis.
4. Conclude
There is sufficient evidence to warrant rejection of the claim that there is no
significant difference in the means of the two groups of circuits.
The central limit theorem implies that when the sample size is large, approximately 95% of the means of samples of the same size taken from a population will fall within 1.96 standard errors of the population mean, that is, within
$$ \mu \pm 1.96 \left( \frac{\sigma}{\sqrt{n}} \right) $$
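A quick simulation can show this 95% statement in action. The sketch below assumes a deliberately skewed (exponential) population with μ = 1 and σ = 1 and samples of size n = 40, all chosen only for illustration; the printed share should come out close to, but not exactly, 0.95.

```python
import math
import random

# Hypothetical, deliberately non-normal population: exponential with mean 1 and SD 1
mu, sigma = 1.0, 1.0
n = 40                                 # size of each sample
standard_error = sigma / math.sqrt(n)  # SD of the sample means

rng = random.Random(42)                # fixed seed so the run is repeatable
trials = 10_000
inside = 0
for _ in range(trials):
    sample_mean = sum(rng.expovariate(1.0) for _ in range(n)) / n
    if abs(sample_mean - mu) <= 1.96 * standard_error:
        inside += 1

print(f"share of sample means within mu ± 1.96 SE: {inside / trials:.3f}")
```

Even though individual values come from a skewed population, the sample means cluster around μ as the theorem describes.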
PROGRESS CHECK
This serves as your answer sheet. Kindly answer the following, take a picture of it, and email it to your teacher. Subject: Midterm Progress 3 Filename: CYS, Surname, First Name
1. Hypothesis testing
2. Alternative hypothesis
3. Null hypothesis
4. p-Value
5. Confidence interval
II. What are the components of a formal hypothesis test using the traditional method? (5 points)
1. _____________________________________________________
2. _____________________________________________________
3. _____________________________________________________
4. _____________________________________________________
5. _____________________________________________________
III. Follow the steps in hypothesis testing using the p-value method. The Microsoft Excel output is shown in the last part. (25 points)
Two machines are used for filling plastic bottles with a net volume of 16.0 ounces. The fill volume can be assumed to be normal, with standard deviations σ₁ = 0.020 and σ₂ = 0.025 ounces. A member of the quality engineering staff suspects that both machines fill to the same mean net volume, whether or not this volume is 16.0 ounces. A random sample of 10 bottles is taken from the output of each machine. (Montgomery and Runger, p. 381)
Here is the Microsoft Excel Report from the data given on this problem.
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.
Hirpara, N., Jain, S., & Gupta, A. (2015). Interpreting Research Findings with Confidence
Interval. Journal of Orthodontics & Endodontics Volume (1)8. 1-4.
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers (6th Ed.). John Wiley & Sons, Inc.
REFERENCES
Bluman, A. (2012). Elementary Statistics: A Step by Step Approach (8th Ed.). The
McGraw-Hill Companies, Inc.
Cochran, W. (1977). Sampling Techniques (3rd Ed.). John Wiley and Sons, Inc.
Devore, J. (2012). Probability and Statistics for Engineering and the Sciences (8th Ed.). Brooks/Cole, Cengage Learning.
Hirpara, N., Jain, S., & Gupta, A. (2015). Interpreting Research Findings with Confidence
Interval. Journal of Orthodontics & Endodontics Volume (1)8. 1-4.
James Cook University Australia (n.d.). Basic Statistics: Sample vs Population Distributions.
Montgomery, D., & Runger, G. (2014). Applied Statistics and Probability for Engineers (6th Ed.). John Wiley & Sons, Inc.
Rice, J. (2007). Mathematical Statistics and Data Analysis (3rd Ed.). Thomson Learning,
Inc.
Singh, A. & Masuku, M. (2014). Sampling Techniques & Determination of Sample Size in
Applied Statistics Research: An Overview. International Journal of Economics,
Commerce and Management. 2(11). 1 – 22.
Tejada, J. & Punzalan, R. (2012). On the Misuse of Slovin’s Formula. The Philippine
Statistician. 61(1). 129 – 136.