MS2 CHP 1-10 by Mark Yu

MODULE BUSINESS STATISTICS
CHAPTER 10: Regression and Correlation

OBJECTIVES :
After reading this chapter, you should be able to:
 Explain regression and correlation.

 Interpret a positive or negative correlation.
 Calculate the Pearson’s correlation coefficient.
 Interpret the degree of relationship between two variables.
When investigating the relationship between two or more numeric variables, it is
important to know the difference between correlation and regression. The similarities, differences,
advantages, and disadvantages of these tools are discussed here along with examples of each.
Correlation quantifies the direction and strength of the relationship between two numeric
variables, X and Y, and always lies between -1.0 and 1.0. Simple linear regression relates X to Y
through an equation of the form Y = a + bX.
Key similarities
 Both quantify the direction and strength of the relationship between two numeric
variables.
 When the correlation (r) is negative, the regression slope (b) will be negative.
 When the correlation is positive, the regression slope will be positive.
 The correlation squared (r² or R²) has special meaning in simple linear regression. It
represents the proportion of variation in Y explained by X.
Key differences
 Regression models how Y changes as X changes, treating X as the predictor and Y as the
response, so the results of the analysis will change if X and Y are swapped. With correlation,
the X and Y variables are interchangeable.
 Regression assumes X is fixed with no error, such as a dose amount or temperature
setting. With correlation, X and Y are typically both random variables*, such as height
and weight or blood pressure and heart rate.
 Correlation is a single statistic, whereas regression produces an entire equation.
*The X variable can be fixed with correlation, but confidence intervals and statistical tests are no
longer appropriate. Typically, regression is used when X is fixed.
Key advantage of correlation
 Correlation is a more concise (single value) summary of the relationship between two
variables than regression. As a result, many pairwise correlations can be viewed together
in one table.
Key advantage of regression
 Regression provides a more detailed analysis, including an equation that can be
used for prediction and/or optimization.
LESSON 10.1 Correlation Coefficient

Correlation coefficients are used in statistics to measure how strong a relationship is
between two variables. There are several types of correlation coefficient, but the most popular is
Pearson’s. Pearson’s correlation (also called Pearson’s R) is a correlation coefficient commonly
used in linear regression. If you’re starting out in statistics, you’ll probably learn about
Pearson’s R first. In fact, when anyone refers to the correlation coefficient, they are usually
talking about Pearson’s.
Correlation coefficient formulas are used to find how strong a relationship is between data. The
formulas return a value between -1 and 1, where:
 1 indicates a perfect positive linear relationship.
 -1 indicates a perfect negative linear relationship.
 A result of zero indicates no linear relationship at all.
Graphs showing a correlation of -1, 0 and +1
A correlation coefficient of 1 means that for every increase in one variable, there
is an increase of a fixed proportion in the other. For example, shoe sizes go up in (almost)
perfect correlation with foot length.
A correlation coefficient of -1 means that for every increase in one variable,
there is a decrease of a fixed proportion in the other. For example, the amount of gas in
a tank decreases in (almost) perfect correlation with speed.
Zero means that an increase in one variable is not associated with any consistent increase or
decrease in the other. The two just aren’t linearly related.
The absolute value of the correlation coefficient gives us the relationship strength. The larger the
number, the stronger the relationship. For example, |-.75| = .75, which has a stronger relationship
than .65.
Types of correlation coefficient formula.
There are several types of correlation coefficient formulas. One of the most commonly used in
stats is Pearson’s correlation coefficient formula. If you’re taking a basic stats class, this is the
one you’ll probably use.
Pearson correlation coefficient - Correlation between sets of data is a measure of how well they
are related. The full name of the Pearson correlation is the Pearson Product Moment Correlation
(PPMC). It shows the linear relationship between two sets of data. In simple terms, it answers
the question: can I draw a straight line through a plot of the data? Two letters are used to
represent the Pearson correlation: Greek letter rho (ρ) for a population and the letter “r” for a
sample.
The formula comes in two forms: the sample correlation coefficient and the
population correlation coefficient.
Sample correlation coefficient
$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
s_x and s_y are the sample standard deviations, and s_xy is the sample covariance.
Population correlation coefficient
$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$
The population correlation coefficient uses σx and σy as the population standard deviations, and
σxy as the population covariance.
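As a rough illustration of the sample formula, the sketch below computes the sample covariance and the two sample standard deviations by hand and then divides them. The function name `sample_correlation` and the five data pairs are made up for illustration; they do not come from this module.

```python
import math

def sample_correlation(xs, ys):
    """Pearson's r computed as s_xy / (s_x * s_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample covariance and sample standard deviations (divide by n - 1).
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    s_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return s_xy / (s_x * s_y)

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = sample_correlation(xs, ys)
print(round(r, 4))
```

The population version has the same shape; it would divide by n instead of n - 1 in all three places, and the ratio comes out the same.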
LESSON 10.2 Testing Correlation Coefficient
USING PEARSON’S CORRELATION COEFFICIENT
Example: Find the value of the correlation coefficient from the following table:
SUBJECT  AGE X  GLUCOSE LEVEL Y
1        43     99
2        21     65
3        25     79
4        42     75
5        57     87
6        59     81
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY     X²     Y²
1        43     99
2        21     65
3        25     79
4        42     75
5        57     87
6        59     81
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99
= 4,257.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY
1        43     99               4257
2        21     65               1365
3        25     79               1975
4        42     75               3150
5        57     87               4959
6        59     81               4779
Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY     X²
1        43     99               4257   1849
2        21     65               1365   441
3        25     79               1975   625
4        42     75               3150   1764
5        57     87               4959   3249
6        59     81               4779   3481
Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY     X²     Y²
1        43     99               4257   1849   9801
2        21     65               1365   441    4225
3        25     79               1975   625    6241
4        42     75               3150   1764   5625
5        57     87               4959   3249   7569
6        59     81               4779   3481   6561
Step 5: Add up all of the numbers in the columns and put the result at the bottom of the
column. The Greek letter sigma (Σ) is a short way of saying “sum of.”
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY     X²     Y²
1        43     99               4257   1849   9801
2        21     65               1365   441    4225
3        25     79               1975   625    6241
4        42     75               3150   1764   5625
5        57     87               4959   3249   7569
6        59     81               4779   3481   6561
Σ        247    486              20485  11409  40022
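Step 6: Use the following correlation coefficient formula.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
The answer is: 2868 / 5413.27 = 0.529809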
From our table:
 Σx = 247
 Σy = 486
 Σxy = 20,485
 Σx² = 11,409
 Σy² = 40,022
 n is the sample size, in our case = 6
The correlation coefficient =
$$r = \frac{6(20{,}485) - (247 \times 486)}{\sqrt{\left[6(11{,}409) - 247^2\right] \times \left[6(40{,}022) - 486^2\right]}} = \frac{2{,}868}{\sqrt{7{,}445 \times 3{,}936}} = 0.5298$$
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which
means the variables have a moderate positive correlation.
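The six steps above can be checked with a short script. This is only a sketch of the same computational formula, using the age and glucose values from the table; the function name `pearson_r` and the variable names are chosen here for illustration.

```python
import math

ages = [43, 21, 25, 42, 57, 59]      # x column from the table
glucose = [99, 65, 79, 75, 87, 81]   # y column from the table

def pearson_r(xs, ys):
    """Pearson's r from the column sums, as in Steps 1 to 6."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = pearson_r(ages, glucose)
print(round(r, 4))  # 0.5298
```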
LESSON 10.3 Simple Linear Regression

Etymology
“Linear” means line. The word “regression” came from a 19th-century scientist, Sir Francis
Galton, who coined the term “regression toward mediocrity” (in modern language,
that’s regression to the mean). He used the term to describe how nature tends
to dampen excess physical traits from generation to generation (like extreme height).
Why use Linear Relationships?

Linear relationships, i.e. lines, are easier to work with, and many phenomena are approximately
linear over the range of interest. If variables aren’t linearly related, a mathematical
transformation can sometimes turn that relationship into a linear one, so that it’s easier for the
researcher (i.e. you) to understand.
What is Simple Linear Regression?

You’re probably familiar with plotting line graphs with one X axis and one Y axis. The X
variable is sometimes called the independent variable and the Y variable is called the dependent
variable. Simple linear regression plots one independent variable X against one dependent
variable Y. Technically, in regression analysis, the independent variable is usually called
the predictor variable and the dependent variable is called the criterion variable. However, many
people just call them the independent and dependent variables. More advanced regression
techniques (like multiple regression) use multiple independent variables.
Regression analysis can result in linear or nonlinear graphs. A linear regression is where the
relationships between your variables can be described with a straight line. Non-linear
regressions produce curved lines.
Simple linear regression for the amount of rainfall per year.
Regression analysis is almost always performed by a computer program, as the equations are
extremely time-consuming to perform by hand.
The Linear Regression Equation

Linear regression is a way to model the relationship between two variables. You might also
recognize the equation as the slope formula. The equation has the form Y= a + bX, where Y is
the dependent variable (that’s the variable that goes on the Y axis), X is the independent variable
(i.e. it is plotted on the X axis), b is the slope of the line and a is the y-intercept.
The first step in finding a linear regression equation is to determine if there is a relationship
between the two variables. This is often a judgment call for the researcher. You’ll also need a list
of your data in x-y format (i.e. two columns of data—independent and dependent variables).
Warnings:
1. Just because two variables are related, it does not mean that one causes the other. For
example, although there is a relationship between high GRE scores and better performance
in grad school, it doesn’t mean that high GRE scores cause good grad school performance.
2. If you try to find a linear regression equation for a set of data (especially
through an automated program like Excel or a TI-83), you will find one, but it does not
necessarily mean the equation is a good fit for your data. One technique is to make a scatter
plot first, to see if the data roughly fits a line before you try to find a linear regression
equation.
How to Find a Linear Regression Equation: Steps

Step 1: Make a chart of your data, filling in the columns in the same way as you would fill in the
chart if you were finding the Pearson’s Correlation Coefficient.
SUBJECT  AGE X  GLUCOSE LEVEL Y  XY     X²     Y²
1        43     99               4257   1849   9801
2        21     65               1365   441    4225
3        25     79               1975   625    6241
4        42     75               3150   1764   5625
5        57     87               4959   3249   7569
6        59     81               4779   3481   6561
Σ        247    486              20485  11409  40022
From the above table, Σx = 247, Σy = 486, Σxy = 20485, Σx² = 11409, Σy² = 40022. n is the
sample size (6, in our case).
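Step 2: Use the following equations to find a and b.
$$a = \frac{\left(\sum y\right)\left(\sum x^2\right) - \left(\sum x\right)\left(\sum xy\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
$$b = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2}$$
a = 65.1416
b = .385225
Find a:
 ((486 × 11,409) – (247 × 20,485)) / (6(11,409) – 247²)
 484,979 / 7,445
 = 65.14
Find b:
 (6(20,485) – (247 × 486)) / (6(11,409) – 247²)
 (122,910 – 120,042) / (68,454 – 61,009)
 2,868 / 7,445
 = .385225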
Step 3: Insert the values into the equation.

y’ = a + bx
y’ = 65.14 + .385225x
* Note that this example has only a moderate correlation coefficient (r ≈ 0.53), and therefore
the equation wouldn’t be very good at predicting anything.
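The intercept and slope found above can be reproduced with a few lines of code. The sketch below applies the same column-sum formulas to the age and glucose data; `fit_line` and `predict` are names chosen for illustration, and the age of 50 passed to `predict` is a hypothetical input, not a value from the table.

```python
ages = [43, 21, 25, 42, 57, 59]      # x column from the table
glucose = [99, 65, 79, 75, 87, 81]   # y column from the table

def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y' = a + b*x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

a, b = fit_line(ages, glucose)
print(round(a, 2), round(b, 6))  # 65.14 0.385225

def predict(x):
    """Predicted glucose level for a given age, using the fitted line."""
    return a + b * x

print(predict(50))  # hypothetical age, for illustration only
```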
For further explanation and examples, see the following links:
 https://www.statisticshowto.com/probability-and-statistics/correlation-coefficient-formula/
 https://www.statisticshowto.com/probability-and-statistics/regression-analysis/find-a-linear-regression-equation/
 https://www.youtube.com/watch?v=11c9cs6WpJU
 https://www.youtube.com/watch?v=jf-SIOFUuEo
 https://www.youtube.com/watch?v=27ywsOzDzJM
 https://www.youtube.com/watch?v=aztcS-3MwH0
CHAPTER 9: Test of Hypothesis

OBJECTIVES :

 Explain hypothesis and hypothesis testing.
 Explain how to reject or accept null hypothesis.
 Apply an appropriate statistical test for the hypothesis.
 Generate the procedure for hypothesis testing.
Hypothesis testing is a statistical technique that is used in a variety of situations. Though
the technical details differ from situation to situation, all hypothesis tests use the same core set of
terms and concepts. The following descriptions of common terms and concepts refer to a
hypothesis test in which the means of two populations are being compared.
One of the main purposes of statistics is to test a hypothesis. For example, you might run an
experiment and find that a certain drug is effective at treating headaches. But if you can’t repeat
that experiment, no one will take your results seriously. A good example of this was the cold
fusion discovery, which faded into obscurity because no one was able to duplicate the results.
Lesson 9.1 : Basic Concepts of Hypothesis Testing

A hypothesis is an educated guess about something in the world around you. It should be
testable, either by experiment or observation. For example:
 A new medicine you think might work.
 A way of teaching you think might be better.
 A possible location of new species.
 A fairer way to administer standardized tests.
It can really be anything at all as long as you can put it to the test.
What is a Hypothesis Statement?
If you are going to propose a hypothesis, it’s customary to write a statement. Your statement will
look like this:
“If I…(do this to an independent variable)….then (this will happen to the dependent variable).”
For example:
 If I (decrease the amount of water given to herbs) then (the herbs will increase in size).
 If I (give patients counseling in addition to medication) then (their overall depression
scale will decrease).
 If I (give exams at noon instead of 7) then (student test scores will improve).
 If I (look in this certain location) then (I am more likely to find new species).
A good hypothesis statement should:
 Include an “if” and “then” statement (according to the University of California).

 Include both the independent and dependent variables.
 Be testable by experiment, survey or other scientifically sound technique.

 Be based on information in prior research (either yours or someone else’s).
 Have design criteria (for engineering or programming projects).
What is Hypothesis Testing?
$$Z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Where:
Z = z score
x̄ = sample mean
μ0 = population mean under the null hypothesis
σ = population standard deviation
n = total number of observations
Hypothesis testing in statistics is a way for you to test the results of a survey or experiment to see
if you have meaningful results. You’re basically testing whether your results are valid by
figuring out the odds that your results have happened by chance. If your results may have
happened by chance, the experiment won’t be repeatable and so has little use.
Hypothesis testing can be one of the most confusing aspects for students, mostly because before
you can even perform a test, you have to know what your null hypothesis is. Often, those tricky
word problems that you are faced with can be difficult to decipher. But it’s easier than you think;
all you need to do is:
1. Figure out your null hypothesis,

2. State your null hypothesis,
3. Choose what kind of test you need to perform,
4. Either support or reject the null hypothesis.
Null hypothesis - the null hypothesis is a clear statement about the relationship between two (or
more) statistical objects. These objects may be measurements, distributions, or categories.
Typically, the null hypothesis, as the name implies, states that there is no relationship. In the
case of two population means, the null hypothesis might state that the means of the two
populations are equal.
Alternative hypothesis - Once the null hypothesis has been stated, it is easy to construct
the alternative hypothesis. It is essentially the statement that the null hypothesis is false. In our
example, the alternative hypothesis would be that the means of the two populations are not equal.
Significance - The significance level is a measure of the statistical strength of the hypothesis
test. It is often characterized as the probability of incorrectly concluding that the null hypothesis
is false. The significance level is something that you should specify up front. In applications, the
significance level is typically one of three values: 10%, 5%, or 1%. A 1% significance level
represents the strongest test of the three, because it demands the most evidence before the null
hypothesis is rejected. For this reason, 1% is described as a more stringent significance level than
10%.
Power - Related to significance, the power of a test measures the probability of correctly
concluding that the null hypothesis is false, that is, of rejecting it when it really is false. Power
is not something that you can choose directly. It is
determined by several factors, including the significance level you select, the sample size, and
the size of the difference between the things you are trying to compare. Unfortunately, a
stringent significance level and high power pull against each other. Making the significance level
more stringent (for example, moving from 5% to 1%) decreases power. This makes it difficult to
design experiments that have both a very stringent significance level and high power.
Test statistic - The test statistic is a single measure that captures the statistical nature of the
relationship between observations you are dealing with. The test statistic depends fundamentally
on the number of observations that are being evaluated. It differs from situation to situation.
Distribution of the test statistic - The whole notion of hypothesis testing rests on the ability to
specify (exactly or approximately) the distribution that the test statistic follows. In the case of this
example, the difference between the means will be approximately normally distributed
(assuming there are a relatively large number of observations).
One-tailed vs. two-tailed tests - Depending on the situation, you may want (or need) to employ
a one- or two-tailed test. These tails refer to the right and left tails of the distribution of the test
statistic. A two-tailed test allows for the possibility that the test statistic is either very large or
very small (negative is small). A one-tailed test allows for only one of these possibilities.
In an example where the null hypothesis states that the two population means are equal, you need
to allow for the possibility that either one could be larger than the other. The test statistic could
be either positive or negative. So, you employ a two-tailed test. The alternative hypothesis might
have been more specific, namely that the mean of population 1 is larger than the mean of
population 2. In that case, you don’t need to account statistically for the situation where the first
mean is smaller than the second. So, you would employ a one-tailed test.
Critical value - The critical value in a hypothesis test is based on two things: the distribution of
the test statistic and the significance level. The critical value(s) are the point(s) in the test
statistic distribution that give the tail(s) of the distribution an area (meaning probability) exactly
equal to the significance level that was chosen.
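For a test statistic that follows the standard normal distribution, critical values can be looked up in code instead of a z-table. The sketch below uses `NormalDist` from Python's standard `statistics` module; the two helper names are chosen here for illustration, and the 5% level is just the common default.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def upper_tail_critical(alpha):
    """z value that leaves an area of alpha in the right tail."""
    return standard_normal.inv_cdf(1 - alpha)

def two_tailed_critical(alpha):
    """(lower, upper) z values that leave alpha/2 in each tail."""
    upper = standard_normal.inv_cdf(1 - alpha / 2)
    return -upper, upper

print(round(upper_tail_critical(0.05), 3))  # 1.645
lower, upper = two_tailed_critical(0.05)
print(round(lower, 2), round(upper, 2))     # -1.96 1.96
```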
Decision - Your decision to reject or fail to reject the null hypothesis is based on comparing the
test statistic to the critical value. If the test statistic is more extreme than the critical value (that
is, it falls in the tail beyond it), you should reject the
null hypothesis. In this case, you would say that the difference between the two population
means is significant. Otherwise, you fail to reject the null hypothesis.
P-value - The p-value of a hypothesis test gives you another way to evaluate the null hypothesis.
The p-value represents the smallest significance level at which your particular test statistic would
justify rejecting the null hypothesis. For example, if you have chosen a significance level of 5%,
and the p-value turns out to be .03 (or 3%), you would be justified in rejecting the null
hypothesis, because 3% is below 5%.
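To make the p-value idea concrete, the sketch below converts a z test statistic into a p-value using the standard normal distribution. The function names are chosen for illustration, and the test statistic of 1.88 is a hypothetical value picked so that the p-value lands near the 3% mentioned above.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """P(Z >= z) for a right-tailed test on a standard normal statistic."""
    return 1 - NormalDist().cdf(z)

def two_tailed_p_value(z):
    """P(|Z| >= |z|) for a two-tailed test."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

alpha = 0.05
z = 1.88  # hypothetical test statistic, for illustration only
p = upper_tail_p_value(z)
print(round(p, 3))   # 0.03
print(p < alpha)     # True: reject the null hypothesis at the 5% level
```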
Lesson 9.2 : Hypothesis Testing Examples

Example #1: Basic Example
A researcher thinks that if knee surgery patients go to physical therapy twice a week (instead of 3
times), their recovery period will be longer. Average recovery time for knee surgery patients is
8.2 weeks.
The hypothesis statement in this question is that the researcher believes the average recovery
time is more than 8.2 weeks. It can be written in mathematical terms as:
H1: μ > 8.2
Next, you’ll need to state the null hypothesis. That’s
what will happen if the researcher is wrong. In the above example, if the researcher is wrong
then the recovery time is less than or equal to 8.2 weeks. In math, that’s:
H0: μ ≤ 8.2
Rejecting the null hypothesis
Ten or so years ago, we believed that there were 9 planets in the solar system. Pluto was demoted
as a planet in 2006. The null hypothesis of “Pluto is a planet” was replaced by “Pluto is not a
planet.” Of course, rejecting the null hypothesis isn’t always that easy—the hard part is usually
figuring out what your null hypothesis is in the first place.
Hypothesis Testing Examples (One Sample Z Test)
The one sample z test isn’t used very often (because we rarely know the actual
population standard deviation). However, it’s a good idea to understand how it works as it’s one
of the simplest tests you can perform in hypothesis testing. In English class you had to learn the
basics (like grammar and spelling) before you could write a story; think of one sample z tests as
the foundation for understanding more complex hypothesis testing. This lesson contains two
hypothesis testing examples for one sample z-tests.
A principal at a certain school claims that the students in his school are of above average
intelligence. A random sample of thirty students’ IQ scores has a mean of 112.5. Is there
sufficient evidence to support the principal’s claim? The mean population IQ is 100 with
a standard deviation of 15.
Step 1: State the Null hypothesis. The accepted fact is that the population mean is 100, so: H0:
μ=100.
Step 2: State the Alternate Hypothesis. The claim is that the students have above average IQ
scores, so:
H1: μ > 100.
The fact that we are looking for scores “greater than” a certain point means that this is a one-
tailed test.
Step 3: Draw a picture to help you visualize the problem.
Step 4: State the alpha level. If you aren’t given an alpha level, use 5% (0.05).
Step 5: Find the rejection region area (given by your alpha level above) from the z-table. An area
of .05 is equal to a z-score of 1.645.
Step 6: Find the test statistic using this formula:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
For this set of data: z = (112.5 – 100) / (15/√30) = 4.56.
Step 7: If Step 6 is greater than Step 5, reject the null hypothesis. If it’s less than Step 5, you
cannot reject the null hypothesis. In this case, it is greater (4.56 > 1.645), so you can reject the
null.
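The same right-tailed test can be run in a few lines. This is a sketch of the steps above, not a general-purpose routine; `one_sample_z` is a name chosen for illustration, and the inputs are the figures used in the calculation (sample mean 112.5, population mean 100, standard deviation 15, n = 30).

```python
import math
from statistics import NormalDist

def one_sample_z(sample_mean, null_mean, sigma, n):
    """z = (x-bar - mu0) / (sigma / sqrt(n))."""
    return (sample_mean - null_mean) / (sigma / math.sqrt(n))

alpha = 0.05
z = one_sample_z(112.5, 100, 15, 30)
critical = NormalDist().inv_cdf(1 - alpha)   # right-tail critical value

print(round(z, 2), round(critical, 3))  # 4.56 1.645
reject_null = z > critical
print(reject_null)                      # True
```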
One Sample Hypothesis Testing Examples: #3
Blood glucose levels for obese patients have a mean of 100 with a standard deviation of 15. A
researcher thinks that a diet high in raw cornstarch will have a positive or negative effect on
blood glucose levels. A sample of 30 patients who have tried the raw cornstarch diet have a mean
glucose level of 140. Test the hypothesis that the raw cornstarch had an effect.
Step 1: State the null hypothesis: H0: μ = 100
Step 2: State the alternate hypothesis: H1: μ ≠ 100
Step 3: State your alpha level. We’ll use 0.05 for this example. As this is a two-tailed test, split
the alpha into two.
0.05/2 = 0.025
Step 4: Find the z-score associated with your alpha level. You’re looking for the area in one tail
only. The z-score for an area of 0.975 (1 – 0.025 = 0.975) is 1.96. As this is a two-tailed test, you
would also be considering the left tail (z = –1.96).
Step 5: Find the test statistic using this formula:
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
z = (140 – 100) / (15/√30) = 14.61.
Step 6: If Step 5 is less than -1.96 or greater than 1.96 (Step 4), reject the null hypothesis. In this
case, it is greater, so you can reject the null.
*This process is made much easier if you use a TI-83 or Excel to calculate the test statistic and
look up the critical value.
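For the two-tailed cornstarch example, the only change from a one-tailed test is that alpha is split between the two tails and the absolute value of the test statistic is compared with the critical value. A minimal sketch, with variable names chosen for illustration:

```python
import math
from statistics import NormalDist

alpha = 0.05
sample_mean, null_mean, sigma, n = 140, 100, 15, 30

z = (sample_mean - null_mean) / (sigma / math.sqrt(n))
critical = NormalDist().inv_cdf(1 - alpha / 2)   # alpha/2 in each tail

# Reject when z falls below -critical or above +critical.
reject_null = abs(z) > critical
print(round(z, 2), round(critical, 2), reject_null)  # 14.61 1.96 True
```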
 https://www.statisticshowto.com/probability-and-statistics/hypothesis-testing/
 https://www.youtube.com/watch?v=VK-rnA3-41c
 https://www.youtube.com/watch?v=zR2QLacylqQ
 https://www.youtube.com/watch?v=zJ8e_wAWUzE
 https://www.youtube.com/watch?v=FU9UR9XVZwc
CHAPTER 8: Probabilities
OBJECTIVES :
 Organize values/data by using Venn diagram.

 Differentiate permutation from combination.
 Calculate the probability of an event.
Lesson 8.1: Factorial Notations

Factorial Notation, Formula, and Basic Examples
When you first encountered an algebra problem with an exclamation mark “!”, you probably
thought it was a trick question. You didn’t know how to handle it because you had no idea what
it meant. As you know, symbols in math are everything. The key is to recognize that each
mathematical symbol has an implied meaning. Most of the time, it suggests some kind of
operation that tells us what to do with a number. The best way to understand how it works is to
look at a specific example.
Suppose you are asked to evaluate 5!, which is read as “five factorial”.
You can approach this in two ways.
Two Ways to Evaluate the Factorial of a Number
 Counting Down Method:
Start with the number 5, and count down until you reach 1. Then multiply those numbers to get
the answer.
5! = 5 × 4 × 3 × 2 × 1 = 120
 Counting Up Method:
Or, you may do it the other way around. Begin by counting from 1 until you reach the target
number which in this case is 5. Multiply those factors to obtain the answer.
5! = 1 × 2 × 3 × 4 × 5 = 120
So here’s the general formula of factorial that I think you need to remember. It doesn’t matter
which one you use to solve a problem, the answer will come out the same. However, the first one
is the “preferred” way so ask your teacher if you’re not sure.
Two Ways to Expand the Factorial of the Variable n Written as n!
n! = n × (n – 1) × (n – 2) × . . . × 3 × 2 × 1
n! = 1 × 2 × 3 × . . . × (n – 2) × (n – 1) × n
Before we go over some worked examples, remember the special rule that “zero factorial is equal
to one”: 0! = 1.
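Both ways of expanding a factorial translate directly into loops. The sketch below is illustrative only; `factorial_down` and `factorial_up` are names made up here, and Python's built-in `math.factorial` is shown as a cross-check.

```python
import math

def factorial_down(n):
    """Counting down: n × (n - 1) × ... × 2 × 1."""
    product = 1
    for k in range(n, 0, -1):
        product *= k
    return product

def factorial_up(n):
    """Counting up: 1 × 2 × ... × (n - 1) × n."""
    product = 1
    for k in range(1, n + 1):
        product *= k
    return product

print(factorial_down(5), factorial_up(5))  # 120 120
print(factorial_down(0))                   # 1, since nothing is multiplied
print(math.factorial(5))                   # 120
```

Note that the special rule 0! = 1 falls out naturally here: with no factors to multiply, the running product stays at its starting value of 1.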
Examples of How to Evaluate Factorials involving Whole Numbers
Example 1: Evaluate the factorial expression 6!.
If you decide to use the descending format of whole numbers, count down from six until one
then get their product. That’s all there is to it.
6! = 6 × 5 × 4 × 3 × 2 × 1 = 720
This is considered the “full expansion” of 6! because we list all its factors, that is,
starting from the given number 6 and decreasing by 1 at every step until we reach the
number 1.
Example 2: Evaluate the factorial expression 7!.

This next example is intended to illustrate that you can easily solve a factorial problem by using
the value from the previous calculation. You don’t have to always write out all the factors
because it can become tedious and redundant in no time.
To solve for 7!, I will expand the expression until I see six factorial, 6!, because we already
know its value which is 6! = 720.
7! = 7 × 6! = 7 × 720 = 5,040
Since we don’t list all the factors of 7!, we may consider this a “partial expansion”.
Some calculators such as TI-84 have the capability to quickly compute for the factorial of any
number. The command is usually located under the Probability menu.
Example 3: Simplify the expression 5! + 3!.

I see some students commit this common error when dealing with the addition of factorials. They
treat this problem just like adding regular variables, i.e. 5x + 3x = 8x. That’s absolutely incorrect.
 Wrong Solution:
5! + 3! = (5 + 3)! = 8!
 Correct Solution:
What you should do instead is to evaluate each factorial separately then add them together.
5! + 3! = 120 + 6 = 126
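In place of a calculator, Examples 2 and 3 can be checked with Python's standard `math.factorial`. The variable names below are chosen for illustration.

```python
import math

# Example 2: partial expansion, 7! = 7 × 6!
partial = 7 * math.factorial(6)
print(partial, math.factorial(7))   # 5040 5040

# Example 3: evaluate each factorial separately, then add.
correct = math.factorial(5) + math.factorial(3)
wrong = math.factorial(5 + 3)       # the common mistake: treating it as 8!
print(correct, wrong)               # 126 40320
```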
Factorial Notation
We are all familiar with multiplication. The factorial notation is a symbol that we use to represent
a multiplication operation. But it is more than just a symbol. In the space below we will see what
the factorial notation is and how we can use it to make our calculations easier. Let us begin with
the introduction of the factorial and then we will see some solved examples of the same.
The factorial notation comes in handy when you are arranging objects. Consider the following
scenario, which we shall use to define and introduce this notation. Suppose you have ten
balls. Each ball has a number marked on it. You also have ten slots that you have to fill with the
balls. How many different ways can you fill these slots?
The first slot can be filled in 10 ways because you have 10 different balls to fill it with. You can
fill the second slot in 9 ways, since one of the balls has already been used. Similarly, we can fill
the next slot in 8 ways and so on. What is the total number of ways we can arrange these 10 balls
in ten slots? This follows from the fundamental principle of counting. The total number of
ways is 10×9×8×7×6×5×4×3×2×1 = 3,628,800.
For all such arrangements, we will see a similar pattern of multiplication. For example, for any
number ‘n’, we get n×(n-1)×(n-2)×(n-3)×…×3×2×1. This is where we use the
factorial notation. We define the factorial of a positive integer as the product of the integer with
all the positive integers less than it, all the way down to 1.
We define the factorial of a number as the product of consecutive descending natural numbers and
represent it by !. For example, the factorial of 4 or 4! = 4×3×2×1. Similarly the factorial of 7 or 7!
= 7×6×5×4×3×2×1. In the same way, we can find the factorials of all the positive integers. By
convention, we define the factorial of 0 to be 1, so 0! = 1. Also 1!
= 1. Then 2! = 2×1 = 2 and 3! = 3×2×1 = 6. Likewise, 4! = 4×3×2×1 = 24 and 5! = 5×4×3×2×1
= 120.
Example 1: Aman has 10 balls that have different numbers on them. He makes all the possible
arrangements for the 10 different balls. Shoaib has 6 balls that he arranges in all the possible
orders. What is the ratio of the number of arrangements that Aman makes to the number of
arrangements that Shoaib makes?
Answer: We know that the number of arrangements that we can make for any ‘n’ number of
objects is given by n factorial or n!. Since Aman is arranging 10 objects, he will be able to do it in
10! ways. Similarly, Shoaib has 6 different objects that he will arrange in 6! ways. The ratio will
simply be = 10!/6!
We can write this as (10×9×8×7×6×5×4×3×2×1)/6!
In other words, we can say that the ratio is = (10×9×8×7×6!)/6! = (10×9×8×7) /1 = 5040:1
Example 2: Evaluate the following: (i) 14!/8! (ii) 12!/(3! × 5!)
Answer: (i) The calculations with factorials can be difficult. We should try to rewrite the
numerator or the denominator so that a factorial term cancels. For example, in the first
part we can write:
(14×13×12×11×10×9×8!)/8! = 14×13×12×11×10×9 = 2,162,160.
(ii) We shall use the same method to simplify the second part. Here we have 12!/(3! × 5!), which
can be written as:
(12×11×10×9×8×7×6×5!)/((3×2×1) × 5!) = (12×11×10×9×8×7×6)/6 = 665,280
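The cancelling trick used in these examples amounts to multiplying only the factors that survive. The sketch below shows one way to express that; `falling_product` is a helper name made up for illustration, not a standard function.

```python
import math

def falling_product(n, stop):
    """n × (n - 1) × ... × (stop + 1), which equals n! / stop! after cancelling."""
    product = 1
    for k in range(n, stop, -1):
        product *= k
    return product

print(falling_product(10, 6))                        # 5040    (10!/6!)
print(falling_product(14, 8))                        # 2162160 (14!/8!)
print(falling_product(12, 5) // math.factorial(3))   # 665280  (12!/(3! × 5!))
```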
Examples of Arrangements
Example 3: A compact disc has 10 songs. The random play feature will play all of these songs in
an unknown permutation (i.e., in order, without repeats).
(a) How many possible permutations are there of these 10 songs?
(b) If you select only your 8 favourite tracks and then use the random play, then how many
possible arrangements will there be of these 8 songs?
(c) If you only have time to hear 4 songs on random play, then how many possibilities are there
for a playing of 4 different songs from the entire CD?
Solution. (a) Since all 10 songs are to be arranged in order without repeats (i.e., permuted), there
are 10! = 3,628,800 possibilities.
(b) Now only 8 songs are to be permuted, so there are 8! = 40,320 possibilities.
(c) Now 4 songs chosen from a set of 10 are to be listed in order without repeats. So now there are
10×9×8×7 = 5,040 possibilities. Note that the value in (c) also is given by
(10×9×8×7×6×5×4×3×2×1)/(6×5×4×3×2×1) = 10!/6!
The 10! in the numerator comes from the total number of 10 songs. The 6! in the denominator
comes from the number of unused songs in the list.
Example 4: In a psychological word association test, a computer will randomly pick a letter from
the alphabet (A – Z) without repeating letters. The subject will have to say the first word coming
to mind that starts with that letter. If the test goes on for 16 letters, then how many possibilities are
there for the list of letters?
Answer: Because letters are not being repeated, we compute the number of choices by
26×25×24×. . .× until we have multiplied 16 terms together. It would be much easier to use
26!/10! ≈ 1.111363×10^20 (which is a lot of possibilities). The 26! comes from the total set of 26
letters, and 10! comes from the number of 10 unused letters in the list.
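Examples 3 and 4 both count ordered lists without repeats, so they can be reproduced with `math.perm` (available in Python 3.8 and later). This is only a sketch for checking the arithmetic; the variable names are illustrative.

```python
from math import factorial, perm

all_ten_songs = perm(10)       # perm(n) with one argument returns n!
eight_favourites = perm(8)
four_of_ten = perm(10, 4)      # 10 x 9 x 8 x 7
sixteen_letters = perm(26, 16) # 26 x 25 x ... x 11

print(all_ten_songs, eight_favourites, four_of_ten)  # 3628800 40320 5040
print(sixteen_letters == factorial(26) // factorial(10))  # True
print(f"{sixteen_letters:.6e}")  # 1.111363e+20
```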
Lesson 8.2: Permutations and Combinations

Permutations follow directly from the fundamental principle of counting. A
permutation is one of the different ways in which we can arrange a number of objects.
Since objects are counted with the natural numbers (the counting numbers), the
number of permutations is always a positive integer. Here we will introduce the
formulae for permutations and see how we can use them to solve typical exam questions.
Given a set of N distinct objects, a permutation is an arrangement of the entire set in order without
repetitions. There are N! ways to permute the entire set. The value N! is called “N factorial” and
is computed by:
N! = N × (N – 1) × (N – 2) × . . . × 1.
This gives the number of permutations for N objects taken all at a time. Suppose a set has N
distinct objects and we wish to make a list of ‘k’ of these objects (in order without repeats). For
example, from a group of 32 balls, we need 3 balls: one for slot number 1, one for slot number 2
and one for slot number 3. How many choices are possible?
Answer: We are listing 3 without repeats from a group of 32, so there are 32 × 31 × 30 = 29,760
possible choices. Notice that the number of choices also can be computed by 32!/29! ; but in this
case, it is easier to use 32 × 31×30. However, if we were arranging a larger portion of the set, then
it would be more convenient to use the factorial notation.
The General Formula Of Permutations
In the above example, we saw that if we arrange objects chosen from a group of 32 into 3 ordered
slots, then the number of arrangements can be written as 32!/29!.
We can write this as:
32!/(32 – 3)!. If we generalise this, we can see that if we have ‘n’ objects taken ‘r’ at a time,
the total number of permutations is equal to n!/(n – r)!. This is the general formula for
permutations. The expression P(n, r), also written nPr, is calculated by:
P(n, r) = n!/(n – r)!
Note that the denominator is the factorial of the difference n – r, not of r.
For example, P(14, 6) = 14!/(14-6)! = 14!/8! = 2,162,160.

The formula for P(n, r ) gives the number of ways to permute a group of r objects selected from
the larger group of n objects.
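As a rough check of the formula P(n, r) = n!/(n – r)!, the sketch below compares it with an explicit listing of arrangements for a small case. The function name `count_permutations` is an illustrative choice, not something defined in the module.

```python
from itertools import permutations
from math import factorial

def count_permutations(n: int, r: int) -> int:
    """Number of ordered selections of r objects from n distinct objects."""
    if not 0 <= r <= n:
        raise ValueError("require 0 <= r <= n")
    return factorial(n) // factorial(n - r)

# Brute-force check on a small case: list every ordered pair from 4 letters.
listed = list(permutations("ABCD", 2))

print(count_permutations(4, 2), len(listed))  # 12 12
print(count_permutations(14, 6))              # 2162160
print(count_permutations(32, 3))              # 29760
```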
Remember that in permutations, the order does matter. This means that if we have two letters, say
A and B, then AB and BA count as two different permutations. The formulas above also treat every
object as distinguishable. Let us see some more examples:
For example, if we have two balls that look identical but that we have marked as ‘a1’ and ‘a2’,
then instead of one arrangement we count two (a1 a2 and a2 a1), because in permutations the order matters.
Example 1: Suppose you want to arrange your English, Hindi, Mathematics,
History, Geography and Science books on a shelf. In how many ways can you do it?
Answer: Here we have to arrange 6 books. As we know, the number of permutations of n
objects is n! = n × (n – 1) × (n – 2) × … × 2 × 1.
Here n = 6 and therefore the number of permutations is 6! = 6 × 5 × 4 × 3 × 2 × 1 = 720. Therefore the number
of ways we can arrange the six books on the shelf = 720.
Example 2: Suppose you have 6 different happy birthday cards and you want to send one card to
each of 4 of your friends. In how many ways can you send these cards?
Answer: Here we have to find the number of permutations of 4 objects out of 6 objects. In other
words, we have to count the number of permutations of six objects taken 4 at a time. This can be
done as follows:
This number is 6(6 – 1)(6 – 2)(6 – 3) = 6 × 5 × 4 × 3 = 360. We can also do this in an easy way as below:
6P4 = 6!/(6 – 4)! = 6!/2! = 6 × 5 × 4 × 3 = 360. Therefore, cards can be sent in 360 ways.
Example 3: In a library, there are 4 books on fairy tales, 5 novels and 3 books on
plays. In how many ways can you arrange these so that the books on fairy tales are together in one
place, the novels are together and the plays are also together? The requirement is that these books
should be in a specific order, i.e., books on fairy tales before novels, and novels before plays.
Answer: There are 4 books on fairy tales and they have to be put together. They can be arranged
in 4! ways.
Similarly, there are 5 novels. They can be arranged in 5! ways. And there are 3 books on plays.
They can be arranged in 3! ways. So, by the counting principle all of them together can be
arranged in 4! × 5! × 3! ways = 17280 ways.
Type II
Example 4: In the above example what is the number of permutations if the books are not to be
kept in order?
Answer: Whenever you are asked to keep a particular class of objects together, a convenient trick
is to sort of glue them together in your head and treat them as one object. First, we consider the
books on fairy tales, novels and plays as single objects.
These three objects i.e the one group of fairy tale books, the one group of novels and the one
group of plays can be arranged in 3! ways = 6 ways.
Let us fix one of these 6 arrangements. This may give us a specific order, say, novels → fairy
tales → plays. Given this order, the books on the same subject can be arranged as follows. In
other words, now we have to count the internal permutations. The 4 books on fairy tales can be
arranged among themselves in 4! = 24 ways.
The 5 novels can be arranged in 5! = 120 ways. The 3 plays can be arranged in 3! = 6 ways.
For a given order, the books can be arranged in 24×120×6 = 17280 ways.
Therefore, for all the 6 possible orders the books can be arranged in 6×17280 = 103680 ways.
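The "glue the groups together" argument of Examples 3 and 4 can be sketched in code. The helper `grouped_arrangements` and the small five-book shelf used for the brute-force check are hypothetical illustrations, not part of the module.

```python
from itertools import groupby, permutations
from math import factorial

def grouped_arrangements(sizes, fixed_group_order):
    """Count shelf arrangements in which each subject stays together."""
    within = 1
    for size in sizes:
        within *= factorial(size)  # internal permutations of each group
    group_orders = 1 if fixed_group_order else factorial(len(sizes))
    return group_orders * within

print(grouped_arrangements([4, 5, 3], fixed_group_order=True))   # 17280
print(grouped_arrangements([4, 5, 3], fixed_group_order=False))  # 103680

# Brute-force check on a smaller shelf: 2 fairy tales, 2 novels, 1 play.
books = ["F1", "F2", "N1", "N2", "P1"]
together = sum(
    1
    for shelf in permutations(books)
    # every subject is contiguous exactly when there are 3 runs of subjects
    if len([subject for subject, _ in groupby(b[0] for b in shelf)]) == 3
)
print(together, grouped_arrangements([2, 2, 1], fixed_group_order=False))  # 24 24
```

The brute-force count agrees with the formula on the small case, which supports multiplying the number of group orders by the internal permutations.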
Combination
The number of combinations is the number of ways in which we can select a group of objects
from a set when the order of selection does not matter. For example, if you have ‘n’ objects, in
how many ways can you select all ‘n’ of them? If the order were taken into consideration, the
answer would be the number of permutations, n!. But since the order doesn’t matter, there is only
one way to do it: if you have to select ‘n’ objects taking ‘n’ at a time, there is only one selection.
How about something smaller than ‘n’? Let us see this with the help of an example. Consider that
there are 4 objects and you have to select 2 objects from them. Then how many selections can you
make? You can pick the first two, the second two, the middle two, the first and the last and so on. If
you count, you will find that there are exactly 6 ways to do it.
Thus, combinations are just permutations where the order is not taken into account. So the
number of permutations is never smaller than the number of combinations, and it is strictly
greater whenever two or more objects are selected. Using the
definition of permutations, we can get the combination formula. Let us see how!
Combination Formula
Let us say that we have 10 items out of which we will have to select 2 items. How many
arrangements can we make? The number of arrangements will be given by = 10P2 = 90. So there
are 90 arrangements that we can make from 10 objects if we take 2 at a time. What if the order of
the arrangement was not taken into account? For example, we mark one object A and the other B.
Then if AB and BA are considered as one arrangement, we say that order doesn’t matter. What
will be the number of arrangements in such a case? In that case, it will be the number of ways we
can select two items out of a group of 10 items.
To get that, we need to cancel the arrangements that are generated only because of order.
For example, any 2 objects can be arranged in 2! (2 factorial) ways, so each selection has been
counted 2! times. So we need to divide by 2!. Thus the number of ways in which we can “select” 2
items from a group of 10 items = 10P2/2! = 90/2 = 45. This is the combination formula.
In general, we say that if we have a group of ‘n’ objects out of which we make a selection taking
‘r’ objects at a time, then the number of such selections is given by nPr/r!.
This is known as the combination formula. We represent the combination formula as
nCr = n!/(r!(n – r)!)
Other names for it are ‘n choose r’ or ‘binomial coefficient’.
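The relationship nCr = nPr/r! can be sketched with the standard library (`math.perm` and `math.comb` require Python 3.8 or later). The function name `count_combinations` is illustrative only.

```python
from itertools import combinations
from math import comb, factorial, perm

def count_combinations(n: int, r: int) -> int:
    """nCr computed as nPr / r!, following the derivation in the text."""
    return perm(n, r) // factorial(r)

# The "4 objects, choose 2" case from the text, listed explicitly.
pairs = list(combinations("ABCD", 2))
print(pairs)
print(len(pairs), count_combinations(4, 2), comb(4, 2))  # 6 6 6

# 10 items taken 2 at a time: 90 ordered arrangements, 45 unordered selections.
print(perm(10, 2), count_combinations(10, 2))            # 90 45
```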
Let us see real-world applications of combination formula.
Example 1: Find the number of subsets of the set {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11} having 4
elements.
A) 340 B) 430 C) 330 D) 430
Answer: C. The set has 11 elements. Any subset that we form has to have 4 elements from the
set. Here the order of choosing the elements doesn’t matter. The set { 1, 2, 3, 4} is the same as {4,
3, 2, 1}. Therefore, this is a problem in combinations.
We can do this by using the combination formula as:
11C4 = 11!/(4!(11 – 4)!) = 11!/(4! × 7!) = (11 × 10 × 9 × 8)/(4 × 3 × 2 × 1) = 330 ways.
Example 2: The Indian Cricket team consists of 16 players. It includes 2 wicketkeepers and 5
bowlers. In how many ways can you select a cricket team of eleven players if you have to select 1
wicketkeeper and at least 4 bowlers?
A) 1024 B) 1028 C) 1092 D) 1084
Answer: C. If we have to select a team of 11 players from a roster of 16 players then the total
number of ways would be 16C11. But here we have to select 11 players including 1 wicketkeeper
and 4 bowlers or 1 wicketkeeper and 5 bowlers.
Note that there are a total of 2 wicketkeepers and 5 bowlers to choose from, which leaves
16 – 2 – 5 = 9 other players. So the number of ways of selecting 1 wicketkeeper, 4 bowlers and
6 other players =
2C1 × 5C4 × 9C6 = 2 × 5 × 84 = 840.
Furthermore, the number of ways of selecting 1 wicketkeeper, 5 bowlers and 5 other players =
2C1 × 5C5 × 9C5 = 2 × 1 × 126 = 252.
Therefore, the total number of ways of selecting the team = 840 + 252 = 1092.
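The case split in the cricket example (4 bowlers or 5 bowlers) can be written as a short loop. This is a sketch of the arithmetic only; the variable names are illustrative, and it assumes exactly one wicketkeeper is selected, as in the worked answer.

```python
from math import comb

wicketkeepers, bowlers, others = 2, 5, 9  # 16 players in total
team_size = 11

total = 0
for n_bowlers in (4, 5):  # "at least 4 bowlers" means 4 or 5
    n_others = team_size - 1 - n_bowlers  # one slot goes to the wicketkeeper
    ways = comb(wicketkeepers, 1) * comb(bowlers, n_bowlers) * comb(others, n_others)
    print(n_bowlers, "bowlers:", ways)
    total += ways

print("total:", total)  # total: 1092
```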
Lesson 8.3: Probabilities

Probability
How likely something is to happen.
Many events can't be predicted with total certainty. The best we can say is how likely they are to
happen, using the idea of probability.
Tossing a Coin
When a coin is tossed, there are two possible outcomes:
 heads (H) or
 tails (T)
We say that the probability of the coin landing H is ½, and the probability of the coin
landing T is ½.
Throwing Dice
When a single die is thrown, there are six possible outcomes: 1, 2, 3, 4, 5, 6.
The probability of any one of them is 1/6.
Probability
In general:
$$\text{Probability of an event happening} = \frac{\text{Number of ways it can happen}}{\text{Total number of outcomes}}$$
Example: the chances of rolling a "4" with a die
Number of ways it can happen: 1 (there is only 1 face with a "4" on it)
Total number of outcomes: 6 (there are 6 faces altogether)
So the probability = 1/6
Example: there are 5 marbles in a bag: 4 are blue, and 1 is red. What is the probability that a blue
marble gets picked?
Number of ways it can happen: 4 (there are 4 blues)
Total number of outcomes: 5 (there are 5 marbles in total)
So the probability = 4/5 = 0.8
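The ratio "ways it can happen" over "total outcomes" can be kept exact with Python's `fractions.Fraction`. The helper name `probability` is an illustrative choice, and the sketch assumes all outcomes are equally likely.

```python
from fractions import Fraction

def probability(ways: int, total_outcomes: int) -> Fraction:
    """Probability of an event when all outcomes are equally likely."""
    if total_outcomes <= 0 or not 0 <= ways <= total_outcomes:
        raise ValueError("need 0 <= ways <= total_outcomes and total_outcomes > 0")
    return Fraction(ways, total_outcomes)

p_four = probability(1, 6)  # rolling a "4" with one die
p_blue = probability(4, 5)  # picking a blue marble

print(p_four, p_blue, float(p_blue))  # 1/6 4/5 0.8
```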
Probability Line
We can show probability on a Probability Line:
Probability is always between 0 and 1
Probability is Just a Guide
Probability does not tell us exactly what will happen, it is just a guide
Example: toss a coin 100 times, how many Heads will come up?
Probability says that heads have a ½ chance, so we can expect 50 Heads.
But when we actually try it we might get 48 heads, or 55 heads ... or anything really, but in most
cases it will be a number near 50.
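The idea that probability is only a guide can be illustrated with a small simulation. The sketch below is an illustration under assumptions: the seed value is arbitrary and the function name `count_heads` is hypothetical.

```python
import random

def count_heads(tosses: int, rng: random.Random) -> int:
    """Simulate fair coin tosses and count how many land heads."""
    return sum(rng.random() < 0.5 for _ in range(tosses))

rng = random.Random(2024)  # fixed, arbitrary seed so the run is repeatable
results = [count_heads(100, rng) for _ in range(10)]
average = sum(results) / len(results)

print(results)  # ten counts, each typically close to 50
print(average)
```

Individual runs differ from 50, but the counts cluster around it, which is what the expected value describes.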
Some words have special meaning in Probability:
Experiment: a repeatable procedure with a set of possible results.
Example: Throwing dice
We can throw the dice again and again, so it is repeatable.
The set of possible results from any single throw is {1, 2, 3, 4, 5, 6}
Outcome: A possible result of an experiment.
Example: Getting a "6"
Sample Space: all the possible outcomes of an experiment.
Example: choosing a card from a deck
There are 52 cards in a deck (not including Jokers)
So the Sample Space is all 52 possible cards: {Ace of Hearts, 2 of Hearts, etc... }
The Sample Space is made up of Sample Points:
Sample Point: just one of the possible outcomes

Example: Deck of Cards
 the 5 of Clubs is a sample point

 the King of Hearts is a sample point
"King" is not a sample point. There are 4 Kings, so that is 4 different sample points.
Example: Throwing dice
There are 6 different sample points in the sample space.
Event: one or more outcomes of an experiment

Example Events:
An event can be just one outcome:
 Getting a Tail when tossing a coin

 Rolling a "5"
An event can include more than one outcome:
 Choosing a "King" from a deck of cards (any of the 4 Kings)

 Rolling an "even number" (2, 4 or 6)
Hey, let's use those words, so you get used to them:
Example:
Alex wants to see how many times a "double" comes up when throwing 2 dice.
The Sample Space is all possible Outcomes (36 Sample Points):
{1,1} {1,2} {1,3} {1,4} ... {6,3} {6,4} {6,5} {6,6}
The Event Alex is looking for is a "double", where both dice have the same number. It is made
up of these 6 Sample Points:
{1,1} {2,2} {3,3} {4,4} {5,5} and {6,6}
These are Alex's Results:
Experiment    Is it a Double?
{3,4}         No
{5,1}         No
{2,2}         Yes
{6,3}         No
...           ...
After 100 Experiments, Alex has 19 "double" Events ... is that close to what you would expect?
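To judge whether 19 doubles in 100 throws is close to what we would expect, the sample space can be enumerated directly. This is a minimal sketch with illustrative variable names.

```python
from fractions import Fraction
from itertools import product

# All ordered results of throwing two dice: 36 sample points.
sample_space = list(product(range(1, 7), repeat=2))

# The event "double": both dice show the same number.
doubles = [point for point in sample_space if point[0] == point[1]]

p_double = Fraction(len(doubles), len(sample_space))
expected_in_100 = 100 * p_double

print(len(sample_space), len(doubles), p_double)  # 36 6 1/6
print(float(expected_in_100))                     # about 16.7
```

With a probability of 1/6, about 17 doubles are expected in 100 experiments, so Alex's 19 is reasonably close.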
 https://www.toppr.com/guides/quantitative-aptitude/permutation-and-
combination/factorial-notation/
combination/permutations/
combination/combination/
 https://www.mathsisfun.com/data/probability.html
 https://youtu.be/SFPGVTThJNk
 https://youtu.be/ZxV-kf0yBss
 https://youtu.be/hZxnzfnt5v8
 https://youtu.be/saO1yLxd1p8
MODULE QUANTITATIVE METHOD
CHAPTER 7: Measures of Symmetry and Peakedness
OBJECTIVES :

• Differentiate skewness from kurtosis.
• Examine the skewness of the distribution.
• Examine the kurtosis of the distribution.
Lesson 7.1 : Skewness

• Skewness refers to the symmetry or asymmetry of the frequency distribution.
• The direction of the long tail of the distribution indicates the direction of the skewness.
THE SHAPES OF DISTRIBUTIONS
SYMMETRIC DISTRIBUTION
A frequency distribution is symmetric when a vertical line can be drawn through the middle of a
graph of the distribution and the resulting halves are approximately mirror images.
Examples of symmetric distributions are IQ scores and heights of adult males.
UNIFORM DISTRIBUTION
A frequency distribution is uniform (or rectangular) when all entries, or classes, in the
distribution have equal or approximately equal frequencies. A uniform distribution is also
symmetric.
SKEWED LEFT DISTRIBUTION
▪ A frequency distribution is skewed when the “tail” of the graph elongates more to one
side than to the other.
▪ A distribution is skewed left (negatively skewed) when its tail extends to the left.
For example, a negatively skewed distribution results if the majority of students score very high
on an examination. These scores will tend to cluster to the right of the distribution.
SKEWED RIGHT DISTRIBUTION
A distribution is skewed right (positively skewed) when its tail extends to the right.
For example, if an instructor gave an examination and most of the students did poorly, their scores
would tend to cluster on the left side of the distribution. A few high scores would constitute the
tail of the distribution, which would be on the right side.
Pearson’s Coefficient of Skewness
• For a perfectly symmetrical distribution the value of sk = 0
• If the value of sk > 0 then the frequency polygon is skewed to the right.
• If the value of sk < 0 then the frequency polygon is skewed to the left.
• There is no lower or upper limit for Pearson’s skewness coefficient.
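The sign rules above can be tried out on small data sets. The sketch below is an illustration under an assumption: it uses the widely taught form sk = 3(mean – median)/s, which may differ from the exact formula used in this module (a mode-based form, (mean – mode)/s, is also common). The data sets are hypothetical.

```python
from statistics import mean, median, stdev

def pearson_skewness(scores):
    """Assumed form: sk = 3 * (mean - median) / s, with s the sample SD."""
    s = stdev(scores)  # sample standard deviation (n - 1 in the denominator)
    if s == 0:
        raise ValueError("all scores are equal; skewness is undefined")
    return 3 * (mean(scores) - median(scores)) / s

right_tailed = [2, 3, 3, 4, 4, 5, 12]   # hypothetical scores, long right tail
symmetric = [1, 2, 3, 4, 5]             # hypothetical symmetric scores
left_tailed = [1, 8, 9, 9, 10, 10, 11]  # hypothetical scores, long left tail

for name, data in [("right", right_tailed), ("symmetric", symmetric), ("left", left_tailed)]:
    print(name, round(pearson_skewness(data), 3))
```

The right-tailed set gives a positive value, the symmetric set gives zero, and the left-tailed set gives a negative value, matching the bullets above.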
Examples:
Using the tables below, find the coefficient of skewness of the scores and make an analysis.
a) b)
A 50-Item Test in Statistics A 60-Item Test in Physics
The following values were calculated from the Achievement Test Results of the Experimental
group and the Control group in Mathematics 10:
INTERPRETATION:
The skewness value of the experimental group is 0.374 while that of the control group is 0.739.
Both values indicate positive skewness, which means that the scores of both groups of
student respondents tend to be low. However, the coefficient of skewness of the experimental
group is lower than that of the control group. This implies that the score distribution of the
experimental group is closer to symmetric than that of the control group.
Lesson 7.2 : Kurtosis

Kurtosis
• Kurtosis is from the Greek word kyrtos or kurtos, meaning bulging.
• In statistics, kurtosis is a statistical measure used to describe the distribution
of observed data around the mean.
It measures the relative peakedness or flatness of a distribution as compared to the normal
distribution, which has a kurtosis of 3 (equivalently, an excess kurtosis of zero).
Types of kurtosis
1. Leptokurtic distributions are those where values cluster heavily or pile up in the center.
These are tall distributions with narrow humps and long, heavy tails. The kurtosis is
greater than 3 (positive excess kurtosis), which denotes a high degree of peakedness.
2. Mesokurtic distributions are intermediate distributions which are neither too peaked nor too
flat. The values are moderately distributed about the center. The kurtosis is three (kurtosis = 3).
3. Platykurtic distributions are flat distributions with values more evenly distributed about the
center, with broad humps and short tails. The kurtosis is less than 3 (negative excess
kurtosis), which denotes a low degree of peakedness.
• If one distribution (green curve) is more peaked than another, then it is more leptokurtic.
• If it is less peaked (red curve), then it is said to be more platykurtic.
• The normal curve (blue curve) is mesokurtic.
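The three types can be told apart by comparing a computed kurtosis with 3. The sketch below is an illustration under an assumption: it uses the moment coefficient m4/m2² with population moments, which may not be the exact formula used for the worked examples in this module. The data sets and function names are hypothetical.

```python
def kurtosis(values):
    """Moment coefficient of kurtosis m4 / m2**2 (equal to 3 for a normal curve)."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # second central moment
    m4 = sum((x - m) ** 4 for x in values) / n  # fourth central moment
    if m2 == 0:
        raise ValueError("all values are equal; kurtosis is undefined")
    return m4 / m2 ** 2

def classify(k, tolerance=1e-9):
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

flat = [1, 2, 3, 4, 5, 6, 7, 8]            # hypothetical, evenly spread
peaked = [5, 5, 5, 5, 5, 5, 5, 5, 0, 10]   # hypothetical, piled up at the center

print(round(kurtosis(flat), 4), classify(kurtosis(flat)))  # 1.7619 platykurtic
print(kurtosis(peaked), classify(kurtosis(peaked)))        # 5.0 leptokurtic
```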
Example:
Eight selected students were asked to report the total number of hours they spent on the Internet
last week. Find the coefficient of skewness and kurtosis of the collected data below.
Solution:
Interpretation:
This indicates that there is a slight positive skewness in the distribution of hours they spent on the
Internet.
Solution:
INTERPRETATION
The kurtosis value of 1.31 is less than 3; thus, the distribution of hours they spent on the Internet is
platykurtic.
Example:
Solution:
INTERPRETATION
The kurtosis value of 1.94 is less than 3; hence, the score distribution is platykurtic. It means
that the students are spread fairly evenly across the score intervals.
▪ https://www.youtube.com/watch?v=XSSRrVMOqlQ
▪ https://www.youtube.com/watch?v=U0NZu6f5TMI&t=10s
▪ https://www.youtube.com/watch?v=TM033GCU-SY&t=364s
▪ https://www.youtube.com/watch?v=HnMGKsupF8Q
• https://www.itl.nist.gov/div898/handbook/eda/section3/eda35b.htm#:~:text=Skewness%20is%20a%20measure%20of,precisely%2C%20the%20lack%20of%20symmetry.&text=Kurtosis%20is%20a%20measure%20of,have%20heavy%20tails%2C%20or%20outliers.
• https://www.mathsisfun.com/data/standard-normal-distribution.html
CHAPTER 6: THE NORMAL DISTRIBUTION

OBJECTIVES:
• Calculate normal probability distribution.

• Explain how the normal distribution plays a central role in statistical inference.
• Differentiate the types of probability distribution.
Lesson 6.1: The Normal Curve

The Normal Distribution
Standard deviation is a statistic that characterizes a distribution of scores. It increases
in direct proportion as the scores spread out more widely: the larger the standard deviation, the
wider the spread of scores.
The meaning of standard deviation is best defined by normal distribution of scores. The
normal distribution is illustrated by the normal curve. Normal curve is a symmetrical curve
having a bell-like shape.
The total area under the normal curve represents all of the scores in a normal distribution.
In such a curve, the mean, the median, the mode are identical, so the mean falls at the exact
center of the curve. The curve has no boundaries in either direction, for the curve never touches
the baseline no matter how far it is extended. The curve is a curve of probability, not of certainty.
Percentage under the Normal Curve
Since the curve is symmetrical, the following holds true for both sides of the mean.
Approximately 68.27% of the scores lie between +1sd and -1sd. Furthermore, about 13.59% of
the scores fall between 1sd and 2sd. Nearly all of the scores in a normal distribution (about
99.73%) lie within the mean plus or minus 3 standard deviations.
If a set of scores is normally distributed, one can interpret any particular score if he
knows how far, in standard deviation units, it is from the mean.
Example
The mean of a normal distribution is 41 and the standard deviation is 11. How does an
individual’s score of 52 compare with all the other scores? How about the score of 19?
If a person’s score is 52, which is 1 standard deviation above the mean, then about 84.13% of all
the other scores in the distribution lie below his score. If a person’s score is 19, which is 2
standard deviations below the mean, then about 97.72% of all the scores in the distribution fall
above his score.
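The percentages in this example can be reproduced with `statistics.NormalDist` from Python's standard library (Python 3.8 or later). This is a sketch for checking the figures; the variable names are illustrative.

```python
from statistics import NormalDist

scores = NormalDist(mu=41, sigma=11)

below_52 = scores.cdf(52)                       # score 1 sd above the mean
above_19 = 1 - scores.cdf(19)                   # score 2 sd below the mean
within_1sd = scores.cdf(52) - scores.cdf(30)    # between -1 sd and +1 sd
from_1_to_2sd = scores.cdf(63) - scores.cdf(52) # between +1 sd and +2 sd

print(f"{below_52:.4f} {above_19:.4f}")         # 0.8413 0.9772
print(f"{within_1sd:.4f} {from_1_to_2sd:.4f}")  # 0.6827 0.1359
```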
PRACTICAL APPLICATIONS OF THE NORMAL CURVE
In the field of educational research, there are a number of practical applications of the normal
curve, among which are:
1. To calculate the percentile rank of scores in a distribution.
2. To normalize a frequency distribution, which is an important process in standardizing a
psychological test or inventory.
3. To test the significance of observed measures in experiments, relating them to the chance
fluctuations or errors that are inherent in the process of sampling and generalizing about the
population from which the samples are drawn (BEST, 1990).
Lesson 6.2 Standard Normal Curve and Standard Scores
STANDARD SCORES
Researchers are often interested in seeing how one person’s score compares with another’s. To
determine this, researchers convert raw scores to derived scores such as standard scores.
Standard scores use a common scale to indicate how an individual compares to other
individuals in a group. These scores are particularly helpful in comparing an individual’s relative
position on different instruments. (Fraenkel, 1994)
TWO MOST FREQUENTLY USED STANDARD SCORES
z scores. These standard scores tell how far a raw score is from the mean in standard deviation
units. The formula is
$$z = \frac{x - \bar{x}}{s}$$
where:
x = any raw score
x̄ = mean
s = standard deviation of the score distribution
t scores. These are z scores that are expressed in another way. The formula is:
$$t = 50 + 10\left(\frac{x - \bar{x}}{s}\right) \quad \text{or} \quad t = 50 + 10z$$
Example:
1. Ryan got a grade of 86 on the final examination in English for which the mean grade was
76 and the standard deviation was 10. On the final examination in Mathematics for which
the mean grade was 83 and the standard deviation was 16, he received a grade of 91. In
which subject was his relative standing higher?
Interpretation:
The score of Ryan in English is 1 unit standard deviation above the mean. His score in
Mathematics is 0.5 unit standard deviation above the mean. Thus, his relative standing was
higher in English.
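Ryan's comparison can be reproduced with two small functions that follow the z-score and t-score formulas. The function names are illustrative choices.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Distance of a raw score from the mean, in standard deviation units."""
    return (x - mean) / sd

def t_score(x: float, mean: float, sd: float) -> float:
    """t = 50 + 10z, a rescaled z score."""
    return 50 + 10 * z_score(x, mean, sd)

z_english = z_score(86, 76, 10)
z_math = z_score(91, 83, 16)

print(z_english, z_math)                         # 1.0 0.5
print(t_score(86, 76, 10), t_score(91, 83, 16))  # 60.0 55.0
print("English" if z_english > z_math else "Mathematics")  # English
```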
2. Prof. Miko wanted to get a student’s equally weighted mean achievement on Math test and
Science test. The data are shown below.
Interpretation:
The mean standard score of -1.75 indicates that on an equally weighted mean, the performance
of the student was fairly consistent that is 2 standard deviations below the mean in Math and 1.5
standard deviations below the mean in Science.
Lesson 6.3 Table of Areas Under the Normal Curve

Computing Normal Probabilities
Example:
If a random variable has the standard normal distribution, what is the probability that it will
take on a value between zero and 1.27?
Thus, the area between zero and 1.27 is 0.3980 or 39.80%.
Example:
Find the area under the standard normal distribution curve between 0.49 and 1.23.
Hence, the area under the curve between 0.49 and 1.23 is 0.3907 − 0.1879 = 0.2028 or 20.28%.
Example:
Find the area under the standard normal distribution curve Between -0.79 and 1.43
Hence, the area under the curve between -0.79 and 1.43 is 0.2852 + 0.4236 = 0.7088 or 70.88%.
Example:
Find the area under the standard normal distribution curve to the right of z =1.96.
Hence, the area under the curve to the right of 1.96 is 0.5 − 0.4750 = 0.0250 or 2.50%.
Example:
Find the area under the standard normal distribution curve to the left of z = 0.5
Therefore, the area under the standard normal distribution curve to the left of 0.5 is
0.5 + 0.1915 = 0.6915 or 69.15%.
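The five table look-ups above can be cross-checked with the standard normal cumulative distribution function. The sketch uses `statistics.NormalDist` (Python 3.8 or later); the helper `area_between` is an illustrative name. Small differences in the fourth decimal place can appear because the printed table is rounded.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(a: float, b: float) -> float:
    """Area under the standard normal curve between z = a and z = b."""
    return Z.cdf(b) - Z.cdf(a)

areas = {
    "0 to 1.27": area_between(0, 1.27),
    "0.49 to 1.23": area_between(0.49, 1.23),
    "-0.79 to 1.43": area_between(-0.79, 1.43),
    "right of 1.96": 1 - Z.cdf(1.96),
    "left of 0.5": Z.cdf(0.5),
}

for label, area in areas.items():
    print(f"{label}: {area:.4f}")
```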
APPLICATION OF THE NORMAL CURVE
• Continuous Random Variable (measurable)

• Discrete Random Variable (Countable)
Continuous Random Variable (measurable)
The scores of the grade six pupils have a mean of 5.23 and standard deviation of 0.25.
• What percentage of all these scores are lower than 6?
• What percentages of these scores are between 5 and 6?
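One way to approach these two questions is sketched below with `statistics.NormalDist`. It assumes the scores are normally distributed, as the section heading implies; the variable names are illustrative, and the results may differ slightly from table-based answers because of rounding of z.

```python
from statistics import NormalDist

scores = NormalDist(mu=5.23, sigma=0.25)

p_below_6 = scores.cdf(6)                  # z = (6 - 5.23) / 0.25 = 3.08
p_5_to_6 = scores.cdf(6) - scores.cdf(5)   # z from -0.92 to 3.08

print(f"lower than 6: {p_below_6:.2%}")
print(f"between 5 and 6: {p_5_to_6:.2%}")
```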
Discrete Random Variable (Countable)
Yedha is a Home Economics teacher. She knows from experience that the number of budget
meals she sells each day is a random variable having approximately a normal distribution with
the mean equal to 30.2 and standard deviation equal to 4.5. What are the probabilities that on any
given day she will sell exactly 25 budget meals? At most 25 budget meals?
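Because the number of meals is a count, a common approach is to apply a continuity correction, treating "exactly 25" as the interval from 24.5 to 25.5 and "at most 25" as everything below 25.5. The sketch below assumes that approach, which may not match the worked solution in this module; the variable names are illustrative.

```python
from statistics import NormalDist

meals = NormalDist(mu=30.2, sigma=4.5)

# Continuity correction (an assumption): a count of 25 covers 24.5 to 25.5.
p_exactly_25 = meals.cdf(25.5) - meals.cdf(24.5)
p_at_most_25 = meals.cdf(25.5)

print(f"exactly 25: {p_exactly_25:.4f}")
print(f"at most 25: {p_at_most_25:.4f}")
```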
Continuous Random Variable (measurable)
A survey indicates that for each trip to Robinsons supermarket, a shopper spends a mean of 45
minutes with a standard deviation of 15 minutes in the store. The lengths of time spent in the
store are normally distributed and are represented by the variable x. A shopper enters the store.
a. Find the probability that the shopper will be in the store for each interval of time listed.
b. Interpret your answer when 150 shoppers enter the store. How many shoppers would you
expect to be in the store for each interval of time listed below?
1 . Between 25 and 55 minutes 2. More than 40 minutes
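Both parts of the shopper problem can be sketched with `statistics.NormalDist`: first the probability for each interval, then the expected count out of 150 shoppers. The variable names are illustrative, and table-based answers may differ slightly because z is rounded to two decimals when using the table.

```python
from statistics import NormalDist

time_in_store = NormalDist(mu=45, sigma=15)  # minutes
shoppers = 150

# a. Probabilities for each interval of time.
p_25_to_55 = time_in_store.cdf(55) - time_in_store.cdf(25)
p_over_40 = 1 - time_in_store.cdf(40)

# b. Expected number of shoppers = number of shoppers x probability.
expected_25_to_55 = shoppers * p_25_to_55
expected_over_40 = shoppers * p_over_40

print(f"between 25 and 55 minutes: {p_25_to_55:.4f}, about {expected_25_to_55:.0f} shoppers")
print(f"more than 40 minutes: {p_over_40:.4f}, about {expected_over_40:.0f} shoppers")
```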
Standardized Normal Distribution Table
Z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359
0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753
0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141
0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517
0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879
0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224
0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549
0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852
0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133
0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389
1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621
1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830
1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015
1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177
1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319
1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441
1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545
1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633
1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706
1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767
2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817
2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857
2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890
2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916
2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936
2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952
2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964
2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974
2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981
2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986
3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990
▪ https://www.youtube.com/watch?v=xgQhefFOXrM
▪ https://www.youtube.com/watch?v=iiRiOlkLa6A
▪ https://www.youtube.com/watch?v=2tuBREK_mgE&t=41s
▪ https://www.youtube.com/watch?v=p_KApjpyBHE&t=13s
• https://www.investopedia.com/terms/n/normaldistribution.asp#:~:text=A%20normal%20distribution%20is%20the,all%20symmetrical%20distributions%20are%20normal.
• https://www.mathsisfun.com/data/standard-normal-distribution.html
CHAPTER 5: Measures of Dispersion or Variability

• Explain the meaning of variability
• Interpret the dispersion of scores.
• Differentiate range from quartile deviation
• Differentiate mean absolute deviation from standard deviation
• Calculate the range, mean absolute deviation, quartile deviation, variance and
standard deviation.
• A measure of variation is a single value that is used to describe the spread of the
distribution.
• Dispersion (or spread) refers to the extent to which the data values of a numeric random
variable are scattered about their central location value.
• A measure of central tendency alone does not uniquely describe a distribution.
LESSON 5.1 Range, Mean Absolute Deviation and Quartile Deviation
RANGE FOR UNGROUPED AND GROUPED DATA
Range (R) - The mathematical difference between the highest (maximum) value and lowest
(minimum) value in a data set.
Formula: R = HV – LV
Properties of the Range
• It is quick and easy to understand.

• It is a rough estimation of dispersion/variability.
• It is easily affected by the extreme scores.

• The larger the value of the range, the more dispersed the observations are.
RANGE for UNGROUPED DATA
Analysis:
The range of Section A (10) is greater than the range of Section B (8). It implies that the scores
in Section A are more spread out than the scores in Section B, or that the scores in Section B are less
scattered than the scores in Section A.
RANGE for GROUPED DATA
INTERPRETATION OF RANGE VALUE
• When the range value is large, the scores in the distribution are more dispersed,
widespread, or heterogeneous.
• When the range value is small, the scores in the distribution are less dispersed, less
scattered, or homogeneous.
INTERQUARTILE RANGE, QUARTILE DEVIATION, AND MIDHINGE FOR UNGROUPED AND GROUPED DATA
Interquartile Range (IQR)
• It gives the range of the middle portion (about half) of the data.
• It is the difference between the third quartile and the first quartile. In symbol,
IQR = Q3 − Q1
Properties of Interquartile Range
➢ Reduces the influence of extreme values.
➢ Not as easy to calculate as the Range.
➢ Only considers the middle 50% of the scores in the distribution
➢ The point of dispersion is the median value
Box-and-Whisker Plot
➢ It is a graph of a data set obtained by drawing a horizontal line from the minimum data
value to Q1, drawing a horizontal line from Q3 to the maximum data value, and drawing a
box whose vertical sides pass through Q1 and Q3 with a vertical line inside the box
passing through the median or Q2.
Quartile Deviation (QD)
It is based on the range of the middle 50% of the scores, instead of the range of the entire set.
• It indicates the distance we need to go above and below the median to include midhinge or
approximately the middle 50% of the scores. In symbol,
$$QD = \frac{Q_3 - Q_1}{2} = \frac{IQR}{2}$$
Midhinge (MH)
• It is used to overcome potential problems introduced by extreme values (or outliers) in the data
set.
$$MH = \frac{Q_1 + Q_3}{2}$$
Outlier
• It is a data entry that is far removed from the other entries in the data set.
• It is an extremely high or an extremely low data value when compared with the rest of the data
values.
IQR, QD and MH for UNGROUPED DATA
Example:
Guidelines
Using the Interquartile Range to Identify Outliers
Q1 – 1.5(IQR) < x < Q3 + 1.5(IQR)
Any data entry x that falls outside this interval is an outlier.
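The guideline above can be sketched in Python as follows. This is a minimal illustration, not the module's own worked example: the scores are hypothetical, and the quartiles come from the standard library's `statistics.quantiles`, whose default method may differ slightly from the quartile convention used elsewhere in this module.

```python
from statistics import quantiles

def iqr_outliers(data):
    """Return (q1, q3, iqr, outliers) using the 1.5 * IQR fences."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _q2, q3 = quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    # A value is an outlier when it is NOT strictly inside the fences.
    outliers = [x for x in data if not (lower_fence < x < upper_fence)]
    return q1, q3, iqr, outliers

# Hypothetical scores with one extreme value.
scores = [6, 8, 10, 12, 12, 14, 15, 16, 45]
q1, q3, iqr, outliers = iqr_outliers(scores)
print(q1, q3, iqr, outliers)
```

With these scores only the value 45 lies outside the fences, so it is the only entry flagged as an outlier.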
OUTLIER for UNGROUPED DATA
Mean Deviation For Ungrouped And Grouped Data
Mean Deviation
• It measures the average deviation of the scores from arithmetic mean.
• It gives equal weight to the deviation of every score in the distribution.
MEAN DEVIATION for UNGROUPED DATA
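As a rough sketch of the definition above, the mean deviation of ungrouped data can be computed as the average of the absolute deviations from the arithmetic mean. The formula MD = Σ|x − x̄| / n is the usual textbook form and is assumed here, and the scores are illustrative only.

```python
def mean_deviation(scores):
    """Average absolute deviation of the scores from their arithmetic mean."""
    n = len(scores)
    mean = sum(scores) / n
    # Every score contributes its absolute distance from the mean with equal weight.
    return sum(abs(x - mean) for x in scores) / n

# Illustrative scores (assumed data).
scores = [8, 9, 9, 9, 10]
md = mean_deviation(scores)
print(md)
```

Here the mean is 9, the absolute deviations are 1, 0, 0, 0 and 1, and their average is 0.4.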
MEAN DEVIATION for GROUPED DATA
Lesson 5.2 : Variance and Standard Deviation for Ungrouped Data
Variance
• It is the square of the standard deviation and is also known as the mean square.
• It is an important measure of variation.
• It shows variation about the mean.
Standard Deviation
• The most commonly used indicator of the degree of dispersion and also the most
dependable measure for estimating the variability in a total population.
• It may be referred to as the root-mean-square of the deviations from the arithmetic
mean.
• The standard deviation is also the square root of the variance.
Two Methods in Computing the Variance and Standard Deviation for Ungrouped Data:
Working Formula
• It is the sum of squares of the deviations about the mean divided by the number of cases
minus one (n – 1), or the degrees of freedom.
s² = Σ(x − x̄)² / (n − 1)
Example:
n = 10
x̄ = Σx / n = 346 / 10 = 34.6
Σ(x − x̄)² = 830.4
s² = 830.4 / (10 − 1) ≈ 92.27, so s = √92.27 ≈ 9.61
Machine Formula
• It is obtained by subtracting the correction factor (CF) from the sum of the squares of the
scores and dividing the difference by n – 1.
Hence, according to this method, the variance is
s² = (Σx² − CF) / (n − 1), where CF = (Σx)² / n
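The two methods should always agree. A minimal Python sketch of both is shown below, assuming the correction factor is CF = (Σx)² / n. The scores are hypothetical and the function names are chosen only for this illustration.

```python
from math import sqrt

def variance_working(scores):
    """Working formula: sum of squared deviations about the mean over n - 1."""
    n = len(scores)
    mean = sum(scores) / n
    return sum((x - mean) ** 2 for x in scores) / (n - 1)

def variance_machine(scores):
    """Machine formula: (sum of x^2 minus correction factor) over n - 1."""
    n = len(scores)
    sum_of_squares = sum(x * x for x in scores)
    correction_factor = sum(scores) ** 2 / n   # assumed definition of CF
    return (sum_of_squares - correction_factor) / (n - 1)

# Hypothetical scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
var_working = variance_working(scores)
var_machine = variance_machine(scores)
print(var_working, var_machine, sqrt(var_working))
```

The machine formula avoids computing each deviation separately, which is why it is convenient for hand or calculator work.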
Lesson 5.3 : Variance and Standard Deviation for Grouped Data
Variance and Standard Deviation for Grouped Data
In getting the variance and standard deviation using the midpoint method, the steps are as follows:
1. Multiply the frequency (f) by the midpoint (X) to get fX, and add all the fX values to get
the sum of fX, or ΣfX.
2. Multiply fX by X to get fX², and add all the fX² values to get ΣfX².
3. Compute the variance and standard deviation from grouped data using the formulas below.
s² = [n ΣfX² − (ΣfX)²] / (n² − n)
s = √{ [n ΣfX² − (ΣfX)²] / (n² − n) }
Computation of the Variance and Standard Deviation
Interpretation of Standard Deviation
The most widely accepted interpretation of the standard deviation uses the range from one standard
deviation below the mean (-1s) to one standard deviation above the mean (+1s). Consider the
data of a table where the arithmetic mean is 72.4 and the standard deviation is 13.70.
The distance from -1s to +1s on the scale of measurement is 58.7 to 86.10 cycles. Within these
limits are scores of 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 79,
80, 81, 82, 83, 84, 85 and 86.
Example
Consider the scores of 30 Grade 10 students in English Exam out of 50 points.
Mean = 23
sd = 9.52
Comparing Standard Deviation
CHAPTER 4: MEASURES OF POSITION

OBJECTIVE:
➢ Solve the quartiles, deciles and percentiles of grouped and ungrouped data.
➢ Explain quartile, decile and percentile.
➢ Calculate the quartile, decile and percentile (grouped and ungrouped data).
A measure of position is a method by which the position that a particular data value has within a
given data set can be identified.
QUANTILES
It is a score distribution where the scores are divided into different equal parts.
There are three kinds of quantiles.
➢ Quartile
➢ Decile
➢ Percentile
LESSON 4.1 Quartiles, Deciles and Percentiles for Ungrouped Data
QUARTILES
DECILES
PERCENTILES
QUANTILE OF UNGROUPED DATA
Formula to be used:
William Mendenhall and Terry Sincich developed the following steps for solving quantiles of
ungrouped data.
Steps in solving quantiles for ungrouped data using the Mendenhall and Sincich method
➢ First, arrange the data in ascending or descending order. Then find n.
➢ Second, locate the position of the score in the distribution using the Mendenhall and Sincich
method.
➢ Third, if the resulting rank has a decimal part, linear interpolation is needed.
Steps of Interpolation
➢ Step 1: Arrange the score in ascending order.
➢ Step 2: Locate the position of the score in the distribution (using the given formula in
finding the location of the score)
➢ Step 3: If the result is a decimal number, proceed for the interpolation.
➢ Step 4: Find the difference between the two values between which Qk, Dk or Pk is situated.
➢ Step 5: Multiply the result in Step 4 by the decimal part obtained in Step 3.
➢ Step 6: Add the result in Step 5 to the smaller of the two values in Step 4.
EXAMPLE:
Using the data 6, 8, 10, 12, 12, 14, 15, 16, 20, find the 1st quartile, 6th decile and 65th
percentile.
Solution:
6, 8, 10,12, 12, 14, 15, 16, 20
1st quartile Linear Interpolation:
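The interpolation steps can be sketched in Python as shown below. The position formula is not reproduced in this text, so the sketch assumes the commonly used positions k(n + 1)/4 for quartiles, k(n + 1)/10 for deciles and k(n + 1)/100 for percentiles. If the module's formula differs, only the `position` line needs to change. The function name is hypothetical.

```python
def quantile_ungrouped(data, k, parts):
    """k-th quantile of ungrouped data split into `parts` equal parts.

    parts = 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    ordered = sorted(data)                       # Step 1: ascending order
    n = len(ordered)
    position = k * (n + 1) / parts               # Step 2: assumed position formula
    if position < 1 or position > n:
        raise ValueError("position falls outside the data")
    whole = int(position)
    fraction = position - whole                  # Step 3: decimal part, if any
    if fraction == 0:
        return ordered[whole - 1]                # exact rank, no interpolation
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + fraction * (upper - lower)    # Steps 4 to 6

data = [6, 8, 10, 12, 12, 14, 15, 16, 20]
q1 = quantile_ungrouped(data, 1, 4)
d6 = quantile_ungrouped(data, 6, 10)
p65 = quantile_ungrouped(data, 65, 100)
print(q1, d6, p65)
```

Under this assumption, Q1 sits at position 2.5 (halfway between 8 and 10), D6 at position 6 (the value 14), and P65 at position 6.5 (halfway between 14 and 15).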
LESSON 4.2 Quartiles, Deciles and Percentiles for Grouped Data
Where:
Qk = quartile
k = quartile location {1,2,3}
n = sample size
LB = lower boundary of the quartile class
f = frequency of the quartile class
cf = less than cumulative frequency before the quartile class
i = class interval
Example:
The frequency distribution below shows the scores of 40 students on a 60-item test in a
mathematics class. Find the 1st quartile.
CL f <cf
10-14 5 5
15-19 2 7
20-24 3 10
25-29 5 15
30-34 2 17
35-39 9 26
40-44 6 32
45-49 3 35
50-54 5 40
n=40
Solution:
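A minimal Python sketch of the computation is given below. The formula itself is not reproduced in this text, so the sketch assumes the usual form implied by the symbol list, Qk = LB + ((kn/4 − cf) / f) · i, and the analogous forms with kn/10 for deciles and kn/100 for percentiles. The function name and the tuple layout are choices made for this illustration.

```python
def grouped_quantile(classes, k, parts):
    """k-th quantile of grouped data.

    classes: list of (lower_limit, upper_limit, frequency) in ascending order.
    parts:   4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts                     # kn/4, kn/10 or kn/100
    width = classes[1][0] - classes[0][0]      # class interval i (equal widths assumed)
    cf_before = 0                              # <cf before the quantile class
    for lower, _upper, f in classes:
        if cf_before + f >= target:
            lower_boundary = lower - 0.5       # assumes integer class limits
            return lower_boundary + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("target position exceeds the total frequency")

scores = [(10, 14, 5), (15, 19, 2), (20, 24, 3), (25, 29, 5), (30, 34, 2),
          (35, 39, 9), (40, 44, 6), (45, 49, 3), (50, 54, 5)]
q1 = grouped_quantile(scores, 1, 4)
print(q1)
```

For the 1st quartile, kn/4 = 10, so the quartile class is 20-24 with LB = 19.5, cf = 7, f = 3 and i = 5. The same function applies to the decile and percentile examples by passing `parts=10` or `parts=100`.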
Chapter 4 5
Page 1 of 9
DECILES FOR GROUPED DATA
Where:
Dk = Decile
k = decile location {1,2,3,…, 9}
n = sample size
LB = lower boundary of the decile class
f = frequency of the decile class
cf = less than cumulative frequency before the decile class
i = class interval
Example:
The frequency distribution below shows the scores of 40 students on a 60-item test in a
mathematics class. Find the 5th decile.
CL f
10-14 5
15-19 2
20-24 3
25-29 5
30-34 2
35-39 9
40-44 6
45-49 3
50-54 5
n=40
Where:
Pk = Percentile
k = percentile location {1,2,3,…, 99}
n = sample size
LB = lower boundary of the Percentile class
f = frequency of the percentile class
cf = less than cumulative frequency before the percentile class
i = class interval
Example:
Using the same frequency distribution of the scores of 40 students on a 60-item test, find the
85th percentile.
Solution:
MODULE QUANTITATIVE METHODS
CHAPTER 3: MEASURES OF CENTRAL TENDENCY

OBJECTIVES:
Any data set can be characterized by measuring its central tendency. A measure of
central tendency, commonly referred to as an average, is a single value that represents a data
set. Its purpose is to locate the center of the data set.
LESSON 3.1 MEAN

The arithmetic mean, often called simply the mean, is the most frequently used measure of
central tendency. The mean is the only common measure in which all values play an equal role,
meaning that to determine its value you need to consider every value in the data set.
It is found by adding the data values and dividing the sum by the total number of data values.
PROPERTIES OF MEAN
1. A set of data has only one mean.
2. Mean can be applied for interval and ratio data.
3. All values in the data set are included in computing the mean.
4. The mean is very useful in comparing two or more data sets.
5. Mean is affected by extremely small or large values in a data set.
6. The mean cannot be computed for the data in a frequency distribution with an open-
ended class.
SAMPLE MEAN FOR UNGROUPED DATA
Formula:
x̄ = ΣX / n
Where:
x̄ = sample mean
X = the value of any particular observations or measurement.
Σ X = Sum of all observations
n = total number of values in sample
Example:
Five judges give their scores on the performance of a gymnast as follows: 8, 9, 9, 9, and 10.
Find the mean score of the gymnast.
Solution:
x̄ = ΣX / n = (8 + 9 + 9 + 9 + 10) / 5 = 45 / 5 = 9
Therefore, the mean score of the gymnast is 9.
SAMPLE MEAN FOR GROUPED DATA
Formula:
x̄ = ΣfM / n
Where:
x̄ = sample mean
f = frequency
M = midpoint
Σ fM = sum of all the products of f and the midpoints
n = total number of values in the sample.
Example:
CLASS LIMITS f M fM
46-50 2 48 96
51-55 3 53 159
56-60 3 58 174
61-65 4 63 252
66-70 6 68 408
71-75 9 73 657
76-80 6 78 468
81-85 5 83 415
86-90 4 88 352
91-95 5 93 465
96-100 3 98 294
TOTAL 50 3,740
Solution:
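The solution can be checked with a short Python sketch that uses the class limits and frequencies from the table above and assumes the formula x̄ = ΣfM / n. The function name and tuple layout are choices made for this illustration.

```python
def grouped_mean(table):
    """table: list of (lower_limit, upper_limit, frequency) rows."""
    n = sum(f for _, _, f in table)
    # The midpoint M of each class is the average of its two limits.
    total_fm = sum(f * (lo + hi) / 2 for lo, hi, f in table)
    return total_fm / n, total_fm, n

table = [(46, 50, 2), (51, 55, 3), (56, 60, 3), (61, 65, 4), (66, 70, 6),
         (71, 75, 9), (76, 80, 6), (81, 85, 5), (86, 90, 4), (91, 95, 5),
         (96, 100, 3)]
mean, total_fm, n = grouped_mean(table)
print(total_fm, n, mean)
```

The totals match the table (ΣfM = 3,740 and n = 50), so the mean is 3,740 / 50 = 74.8.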
LESSON 3.2 MEDIAN

The median of a data set is the measure of center that is the middle value when the original data
values are arranged in order of increasing (or decreasing) magnitude
PROPERTIES OF MEDIAN
1. The median is unique, there is only one median for a set of data.
2. The median is found by arranging the set of data from lowest to highest (or highest to
lowest) and getting the value of the middle observation.
3. Median is not affected by extremely small or large values.
4. Median can be computed for an open-ended frequency distribution.
5. Median can be applied for ordinal, interval, and ratio data.
MEDIAN FOR UNGROUPED DATA
To determine the value of the median for ungrouped data, we need to consider two rules:
1. If n is odd, the median is the middle-ranked value.
2. If n is even, the median is the average of the two middle-ranked values.
Median (rank value) = (n + 1) / 2
Where:
n = the sample size.
Example:
Find the median of the scores of 11 HRM 1st year students in their midterm examination. The
data set is 92, 89, 87, 93, 94, 90, 88, 84, 90, 85, 82.
Solution:
Arrangement of the data set: 82, 84, 85, 87, 88, 89, 90, 90, 92, 93, 94
n = 11 (odd)
Median (rank value) = (n + 1) / 2 = (11 + 1) / 2 = 6
The 6th value in the arranged data is 89. Hence, the median is 89.
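The two rules for the median of ungrouped data can be sketched in Python as follows. The function name is hypothetical, and the second data set is an assumed example added only to show the even case.

```python
def median_ungrouped(data):
    """Median of ungrouped data following the odd/even rules."""
    ordered = sorted(data)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: rank (n + 1) / 2, which is zero-based index n // 2.
        return ordered[middle]
    # Even n: average of the two middle-ranked values.
    return (ordered[middle - 1] + ordered[middle]) / 2

scores = [92, 89, 87, 93, 94, 90, 88, 84, 90, 85, 82]
median_odd = median_ungrouped(scores)
median_even = median_ungrouped([1, 2, 3, 4])   # assumed data for the even case
print(median_odd, median_even)
```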
MEDIAN FOR GROUPED DATA
The median from grouped data in the form of a frequency distribution is applicable when the number
of cases is 30 or more. The concept is to determine the value that has 50 percent (50%) of the
cases above it and the other half below it.
Example
Determine the median of a frequency distribution on the ages of 50 people taking travel hours.
Solution:
Class Limits f cf
18-26 3 3
27-35 5 8
36-44 9 17
45-53 14 31
Median position = N / 2 = 50 / 2 = 25
LESSON 3.3 MODE

The mode of a data set is the value that occurs with the greatest frequency.
Like the median and unlike the mean, the mode is not affected by extreme values in a data set. A
data set may not contain any mode if none of the values is “most typical”.
A data set that has only one value that occurs with the greatest frequency is
said to be unimodal. If the data set has two values with the same greatest frequency, both values
are considered modes and the data set is bimodal. If a data set has more than two modes, the
data set is said to be multimodal. When all the values in a data set occur with the same
frequency, the data set is said to have no mode.
PROPERTIES OF MODE
1. The mode is found by locating the most frequently occurring value.
2. The mode is the easiest average to compute.
3. There can be more than one mode or even no mode in any given data set.
4. Mode is not affected by extremely small or large values.
5. Mode can be applied for nominal, ordinal, interval and ratio data.
MODE FOR UNGROUPED DATA
Example:
Consider the heights in inches of 10 basketball players.
70, 70, 71, 71, 72, 72, 72, 72, 75, 75
The mode is 72, this implies that the most frequent height among the 10 basketball players is 72
inches.
MODE FOR GROUPED DATA
The mode from grouped data in the form of a frequency distribution is applicable when the number of
cases (N) is 30 or more. The modal class is the class with the highest frequency.
Formula:
Example:
Determine the mode of a frequency distribution on the ages of 50 people taking travel hours.
Class Limits f cf
18-26 3 3
27-35 5 8
36-44 9 17
45-53 14 31
54-62 11 42
63-71 6 48
72-80 2 50
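The grouped-mode formula is not reproduced in this text. The Python sketch below therefore assumes one common textbook form, Mode = LB + (d1 / (d1 + d2)) · i, where d1 is the modal frequency minus the frequency of the class before it and d2 is the modal frequency minus the frequency of the class after it. The module's own formula may differ, the function name is hypothetical, and the table includes the class 45-53 (f = 14) from the same age distribution in the median example.

```python
def grouped_mode(table):
    """table: list of (lower_limit, upper_limit, frequency) in ascending order."""
    freqs = [f for _, _, f in table]
    m = freqs.index(max(freqs))                       # index of the modal class
    f_prev = freqs[m - 1] if m > 0 else 0
    f_next = freqs[m + 1] if m < len(freqs) - 1 else 0
    d1 = freqs[m] - f_prev
    d2 = freqs[m] - f_next
    width = table[1][0] - table[0][0]                 # class interval i (equal widths assumed)
    lower_boundary = table[m][0] - 0.5                # assumes integer class limits
    return lower_boundary + d1 / (d1 + d2) * width

ages = [(18, 26, 3), (27, 35, 5), (36, 44, 9), (45, 53, 14),
        (54, 62, 11), (63, 71, 6), (72, 80, 2)]
modal_age = grouped_mode(ages)
print(modal_age)
```

With the modal class 45-53, d1 = 14 − 9 = 5 and d2 = 14 − 11 = 3, so the sketch gives 44.5 + (5 / 8)(9) = 50.125.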
MIDRANGE
The midrange is the average of the lowest and highest values in a data set. It can be
computed using the formula:
Midrange = (X lowest + X highest) / 2
PROPERTIES OF MIDRANGE
• The midrange is easy to compute

• The midrange gives the midpoint
• The midrange is unique
• Midrange is affected by extremely small or large values
• Midrange can be applied for interval and ratio data.
Example:
Find the midrange of the ages of 9 middle-management employees of a certain company. The
ages are 53, 45, 59, 48, 54, 46, 51, 58, and 55.
Solution:
Xlowest = 45 and Xhighest = 59
Midrange = (Xlowest + Xhighest) / 2 = (45 + 59) / 2 = 104 / 2 = 52
Therefore, the midrange age is 52.
CHAPTER 2: FREQUENCY DISTRIBUTION AND
GRAPHICAL METHODS

OBJECTIVE:
➢ Construct group frequency distribution.
➢ Describe different graphing frequency distribution.
➢ Construct a graphing frequency distribution.
Lesson 2.1 Constructing Frequency Distribution Table
In statistics, a frequency distribution is a list, table or graph that displays the frequency of
various outcomes in a sample. Each entry in the table contains the frequency or count of the
occurrences of the values within a particular group or interval.
Raw data are data collected in an investigation that have not yet been organized systematically.
When raw data are presented in the form of a frequency distribution, they are called grouped data.
One way of presenting raw data is the frequency table. When the data are arranged in
tabular form by their frequencies, the table is called a frequency table. The arrangement itself is
called a frequency distribution.
It is difficult to see patterns by scanning a mass of numerical data unless the data
are organized into a frequency distribution table, from which generalizations can be readily
drawn. The construction of a frequency distribution consists essentially of three steps:
1) Deciding on a set of groupings called classes,

2) Sorting or tallying the data into classes, and
3) Counting the number of tallies in each class called frequency
Rules in the construction of Frequency Distribution
1. We seldom use fewer than 5 or more than 15 classes. It is impractical to group a
thousand measurements into 4 classes or to group 10 observations into 7 classes.
2. Whenever possible, we make the classes cover equal ranges of values and make the ranges
multiples of numbers that are easy to work with. Open classes, such as
“less than” or “more than” classes, should be avoided.
3. We make sure that each item goes into only one class. This means that classes should not overlap.
4. In the final presentation of the table, the tally is usually omitted.
In deciding the number of classes, the statisticians Freund and Simon suggested the following:
Suggested Class Interval = (Highest observed value − Lowest observed value) / Number of classes
However, if we cannot decide on the number of classes to be used, the suggested formula is:
Suggested Class Interval = (Highest observed value − Lowest observed value) / (1 + 3.322 log N)
Where N denotes the number of observations.
Raw Data (Array of numbers arranged in smallest to largest)
18 26 34 36 38 41 43 44
45 50 50 51 52 52 53 53
54 54 55 58 58 59 60 60
61 61 62 62 62 62 63 63
66 66 66 71 71 77 79 80
For example, using the data in the array of numbers above, the class interval is
(80 − 18) / (1 + 3.322 log 40) = 62 / 6.322 = 9.8 or 10, the approximate size of the class interval.
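The same arithmetic can be reproduced with a short Python sketch. It assumes that "log" means the base-10 logarithm, which is what yields the denominator 6.322 used above. The function name is chosen only for this illustration.

```python
from math import log10

def suggested_class_interval(highest, lowest, n):
    """(highest - lowest) / (1 + 3.322 log N), with log taken to base 10."""
    number_of_classes = 1 + 3.322 * log10(n)
    return (highest - lowest) / number_of_classes

width = suggested_class_interval(80, 18, 40)
print(round(width, 1), round(width))
```

Rounding the result to the nearest convenient whole number gives a class size of 10.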
We note that:
1. Because this value is only approximate, the number of class intervals may be more than 10 or
fewer than 10. If the highest value in the array of numbers is not yet included in the last class
interval, we add more intervals until all the scores or items in the list of raw data are
included.
2. In Statistics, the expression 1+3.322 log N is known as Sturges’ formula.
Each category or class has two limits: a lower stated class limit and an upper stated class limit.
A common practice is to let the lower limit of the first class be a number below the lowest
observation and to make all the classes equal in size. A convenient value to start
the first class is 10, or we may start with the smallest value of the array of numbers. Thus, the first
class would be 10-19. The resulting frequency distribution is given in the table below.
Class  Tally  Frequency  Cumulative Frequency (<cf)  Lower Boundary
10-19 I 1 1 9.5
20-29 I 1 2 19.5
30-39 III 3 5 29.5
40-49 IIII 4 9 39.5
50-59 IIIII IIIII III 13 22 49.5
60-69 IIIII IIIII III 13 35 59.5
70-79 IIII 4 39 69.5
80-89 I 1 40 79.5
Total 40
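The table above can be rebuilt from the raw data with a short Python sketch. The starting value 10 and class size 10 follow the text, the lower boundary is taken as the lower limit minus 0.5, and the function name is hypothetical.

```python
raw = [18, 26, 34, 36, 38, 41, 43, 44,
       45, 50, 50, 51, 52, 52, 53, 53,
       54, 54, 55, 58, 58, 59, 60, 60,
       61, 61, 62, 62, 62, 62, 63, 63,
       66, 66, 66, 71, 71, 77, 79, 80]

def frequency_table(data, start, width):
    """Return rows of (class label, frequency, <cf, lower boundary)."""
    if start > min(data):
        raise ValueError("the first class must start at or below the lowest value")
    rows = []
    cumulative = 0
    lower = start
    # Keep adding classes until every observation has been counted.
    while cumulative < len(data):
        upper = lower + width - 1
        freq = sum(1 for x in data if lower <= x <= upper)
        cumulative += freq
        rows.append((f"{lower}-{upper}", freq, cumulative, lower - 0.5))
        lower += width
    return rows

rows = frequency_table(raw, start=10, width=10)
for row in rows:
    print(row)
```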
True Limits and Class Marks
A point that represents the halfway point between successive classes is called a true limit or a
class boundary. It is obtained by adding the upper limit of one class to the lower limit of the
next class and dividing the sum by 2. The table below shows the true limits of the classes given in
the previous example. Note that the upper boundary of one class is the lower boundary of the next
class. Thus,
(19 + 20) / 2 = 19.5
A class mark is the midpoint of a class. It is determined by going halfway between the stated
class limits or the class boundaries. To obtain the class mark, the lower and upper stated class
limits or class boundaries are added and divided by two. Class marks are used to construct a
frequency polygon, which will be discussed in the graphical representation of data. Thus,
(10 + 19) / 2 = 14.5
is the class mark of the first class.
Class Limits, Class Boundaries and Class Marks
Stated Classes  Lower Limit  Upper Limit  Lower Boundary  Upper Boundary  Class Mark
10-19 10 19 9.5 19.5 14.5
20-29 20 29 19.5 29.5 24.5
30-39 30 39 29.5 39.5 34.5
40-49 40 49 39.5 49.5 44.5
50-59 50 59 49.5 59.5 54.5
60-69 60 69 59.5 69.5 64.5
70-79 70 79 69.5 79.5 74.5
80-89 80 89 79.5 89.5 84.5
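Each row of the table follows from the stated limits alone. A minimal Python sketch is shown below. It assumes the gap between the upper limit of one class and the lower limit of the next is 1, as in this example, and the function name is hypothetical.

```python
def class_summary(lower, upper, gap=1):
    """Return (lower boundary, upper boundary, class mark) for one class.

    gap is the distance between one class's upper limit and the next
    class's lower limit (1 for integer limits such as 19 and 20).
    """
    lower_boundary = lower - gap / 2
    upper_boundary = upper + gap / 2
    class_mark = (lower + upper) / 2
    return lower_boundary, upper_boundary, class_mark

for lower in range(10, 90, 10):
    upper = lower + 9
    print(f"{lower}-{upper}", class_summary(lower, upper))
```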
Lesson 2.2 Graphical Representation of Data

Graphical Representation is a way of analyzing numerical data. It exhibits the relation
between data, ideas, information and concepts in a diagram. It is easy to understand and it is one
of the most important learning strategies. It always depends on the type of information in a
particular domain.
There are different types of graphical representation. Some of them are as follows;
Line Graphs – Linear graphs are used to display the continuous data and it is useful for
predicting the future events over time.
Bar Graphs – Bar Graph is used to display the category of data and it compares the data using
solid bars to represent the quantities.
Histograms – The graph that uses bars to represent the frequency of numerical data that are
organized into intervals. Since all the intervals are equal and continuous, all the bars have the
same width.
Line Plot – It shows the frequency of data on a given number line. An ‘x’ is placed above the
number line each time that data value occurs.
Frequency Table – The table shows the number of pieces of data that falls within the given
interval.
Circle Graph – Also known as a pie chart, it shows the relationships of the parts to the whole.
The whole circle represents 100%, and each category occupies a sector proportional to its
percentage, such as 15% or 56%.
Stem and Leaf Plot – In a stem and leaf plot, the data are organized from the least value to the
greatest value. The digits in the lowest place value form the leaves and the digits in the next
place value form the stems.
Box and Whisker Plot – The plot diagram summarizes the data by dividing into four parts. Box
and whisker show the range (spread) and the middle (median) of the data.
General Rules for Graphical Representation of Data
There are certain rules to effectively present the data and information in the graphical
representation. They are:
Suitable Title: Make sure that the appropriate title is given to the graph which indicates the
subject of the presentation.
Measurement Unit: Mention the measurement unit in the graph
Proper Scale: To represent the data in an accurate manner, choose a proper scale.
Index: Index the appropriate colors, shades, lines, design in the graphs for better understanding
Data Sources: Include the source of information wherever it is necessary at the bottom of the
graph.
Keep it Simple: Construct a graph in an easy way that everyone can understand.
Neat: Choose the correct size, lettering, colors etc. in such a way that the graph should be a
visual aid for the presentation of information.
Generally, a frequency distribution is represented using the following methods:
▪ Histogram
▪ Smoothed frequency graph
▪ Pie diagram
▪ Cumulative or ogive frequency graph
▪ Frequency Polygon
Frequency Polygon
A frequency polygon is a graphical form of representation of data. It is used to depict the
shape of the data and to depict trends. It is usually drawn with the help of a histogram but can be
drawn without it as well. A histogram is a series of rectangular bars with no space between them
and is used to represent frequency distributions.
Steps to Draw a Frequency Polygon
• Mark the class intervals for each class on the horizontal axis. We will plot the frequency
on the vertical axis.
• Calculate the classmark for each class interval. The formula for class mark is:
Classmark = (Upper limit + Lower limit) / 2
• Mark all the class marks on the horizontal axis. It is also known as the mid-value of every
class.
• Corresponding to each class mark, plot the frequency as given to you. The height always
depicts the frequency. Make sure that the frequency is plotted against the class mark and not
the upper or lower limit of any class.
• Join all the plotted points using a line segment. The curve obtained will be kinked.
• This resulting curve is called the frequency polygon.

Note that the above method is used to draw a frequency polygon without drawing a histogram.
You can also draw a histogram first by drawing rectangular bars against the given class intervals.
After this, you must join the midpoints of the bars to obtain the frequency polygon. Remember
that the bars will have no spaces between them in a histogram.
Example:
Construct a frequency polygon using the data given below:
Test Scores Frequency
49.5-59.5 5
59.5-69.5 10
69.5-79.5 30
79.5-89.5 40
89.5-99.5 15
Answer: We first need to calculate the cumulative frequency from the frequencies given.
Test Scores Frequency Cumulative Frequency

49.5-59.5 5 5
59.5-69.5 10 15
69.5-79.5 30 45
79.5-89.5 40 85
89.5-99.5 15 100
We now start by plotting the class marks such as 54.5, 64.5, 74.5 and so on till 94.5. Note that
we will also plot the previous and next class marks to start and end the polygon, i.e., we plot
44.5 and 104.5 as well.
Then, the frequencies corresponding to the class marks are plotted against each class mark.
The frequencies for class marks 44.5 and 104.5 are zero, so these two points
touch the x-axis. They are used only to give a closed shape to the polygon. The
polygon looks like this:
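The points of the polygon can be listed with a short Python sketch before they are drawn. The sketch only computes the coordinates and leaves the plotting to any charting tool. The function name is hypothetical.

```python
def polygon_points(classes, freqs):
    """Return the (class mark, frequency) points of a frequency polygon.

    classes: list of (lower boundary, upper boundary) with equal widths.
    One extra zero-frequency point is added at each end to close the shape.
    """
    width = classes[0][1] - classes[0][0]
    marks = [(lo + hi) / 2 for lo, hi in classes]
    points = [(marks[0] - width, 0)]            # class mark before the first class
    points += list(zip(marks, freqs))
    points.append((marks[-1] + width, 0))       # class mark after the last class
    return points

classes = [(49.5, 59.5), (59.5, 69.5), (69.5, 79.5), (79.5, 89.5), (89.5, 99.5)]
freqs = [5, 10, 30, 40, 15]
points = polygon_points(classes, freqs)
print(points)
```

Joining these points in order with line segments gives the frequency polygon for the test scores.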
Histogram
▪ Grouped data are often represented graphically by histograms. A histogram consists of
rectangles, each of which has breadth equal or proportional to the size of the corresponding
class interval, and height equal or proportional to the corresponding frequency. In a
histogram, consecutive rectangles have a common side. For this, the class intervals are
made overlapping (continuous) in all cases.
Method of constructing a histogram:
▪ Step I: Observe the class intervals of the distribution. If they are nonoverlapping
(discontinuous), Change them into overlapping (continuous) classes.
▪ Step II: Locate the class boundaries on the x-axis (horizontal axis).
▪ Step III: Construct a vertical rectangle on each line segment representing a class interval
such that the height of the rectangle represents frequency of the class interval.
▪ Step IV: Put a kink mark (N) on the horizontal axis, between the vertical axis and the
first rectangle, if the leftmost rectangle does not have the vertical axis on its side.
▪ Note: For drawing graphs, a scale of representation is required. Unless given, the choice
of scale is made to suit the data.
▪ Different scales can be taken for the two axes.
▪ If the scale for the x-axis is “1 mm = an interval of 5”, then the class interval 20 – 40 will
be shown by a 4-mm-long line segment on the x-axis.
▪ If the scale for the y-axis is “1 mm = frequency 1” (i.e., a frequency of 1 is denoted by 1
mm), then the frequency 10 will be shown by a 1-cm-long line segment on the y-axis.
Example: Construct a histogram for the following frequency distribution.
Height (in cm) Number of children

101 – 110 15
111 – 120 18
121 – 130 12
131 – 140 6
141 -150 9
Solution:
Here, the distribution is discontinuous. So, first we write the frequency distribution with
overlapping intervals to make it continuous.
Height (in cm) Number of children

100.5-110.5 15
110.5–120.5 18
120.5–130.5 12
130.5–140.5 6
140.5-150.5 9
Following the above-mentioned steps, the histogram will be as shown below.
Scale: On the x-axis, 1 cm = height of 10 cm. On the y-axis, 0.5 cm = frequency 3
Ogive Curve
The Ogive is defined as the frequency distribution graph of a series. The Ogive is a graph
of a cumulative distribution, which shows data values on the horizontal axis and either
the cumulative frequencies, the cumulative relative frequencies, or the cumulative percent
frequencies on the vertical axis. Cumulative frequency is defined as the sum of all the previous
frequencies up to the current point. The Ogive curve helps in finding how many of the data
values fall within a certain range. Create the Ogive by plotting the point corresponding to the
cumulative frequency of each class interval. Statisticians often use the Ogive curve to present
data pictorially. It helps in estimating the number of observations that are less than or equal to
a particular value.
Ogive Graph
The graphs of the frequency distribution are frequency graphs that are used to exhibit the
characteristics of discrete and continuous data. Such figures are more appealing to the eye than
the tabulated data. They facilitate the comparative study of two or more frequency
distributions, since we can relate the shape and pattern of the distributions. The two
types of Ogives are
• Less than Ogive

• Greater than or more than Ogive
The graph given below represents less than and the greater than Ogive curve. The rising
curve (Brown Curve) represents the less than Ogive, and the falling curve (Green Curve)
represents the greater than Ogive.
Less than Ogive
The frequencies of all preceding classes are added to the frequency of a class. This series is
called the less than cumulative series. It is constructed by adding the first-class frequency to the
second-class frequency and then to the third-class frequency and so on. The downward
cumulation results in the less than cumulative series.
Greater than or More than Ogive
The frequencies of the succeeding classes are added to the frequency of a class. This series is
called the more than or greater than cumulative series. It is constructed by subtracting the
first-class frequency from the total, then the second-class frequency from that result, then the
third-class frequency, and so on. The upward cumulation results in the greater than or more than
cumulative series.
Ogive Chart
An Ogive Chart is a curve of the cumulative frequency distribution or cumulative relative
frequency distribution. For drawing such a curve, the frequencies must be expressed as a
percentage of the total frequency. Then, such percentages are cumulated and plotted as in the
case of an Ogive. Here, the steps for constructing the less than and greater than Ogive are given.
How to Draw Less Than Ogive Curve?
• Draw and mark the horizontal and vertical axes.

• Take the cumulative frequencies along the y-axis (vertical axis) and the upper-class limits
on the x-axis (horizontal axis).
• Against each upper-class limit, plot the cumulative frequencies.
• Connect the points with a continuous curve.
How to Draw Greater than or More than Ogive Curve?
• Draw and mark the horizontal and vertical axes.

• Take the cumulative frequencies along the y-axis (vertical axis) and the lower-class limits
on the x-axis (horizontal axis).
• Against each lower-class limit, plot the cumulative frequencies
• Connect the points with a continuous curve.
Example:
Construct the more than cumulative frequency table and draw the Ogive for the below-given
data.
Marks 1-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80
Frequency 3 8 12 14 10 6 5 2
Solution:
“More than” Cumulative Frequency Table:
Marks  Frequency  More than Cumulative Frequency
More than 1 3 60
More than 11 8 57
More than 21 12 49
More than 31 14 37
More than 41 10 23
More than 51 6 13
More than 61 5 7
More than 71 2 2
Plotting an Ogive:
Plot the points with coordinates such as (70.5, 2), (60.5, 7), (50.5, 13), (40.5, 23), (30.5, 37),
(20.5, 49), (10.5, 57), (0.5, 60).
An Ogive is connected to a point on the x-axis, that represents the actual upper limit of the last
class, i.e.,( 80.5, 0)
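The cumulative column and the plotted points can be reproduced with a short Python sketch. The lower boundary of each class is taken as its lower limit minus 0.5, as in the coordinates above, and the function name is hypothetical.

```python
def more_than_cumulative(freqs):
    """More-than cumulative frequencies: start from the total and subtract
    each class frequency in turn."""
    remaining = sum(freqs)
    result = []
    for f in freqs:
        result.append(remaining)
        remaining -= f
    return result

lower_limits = [1, 11, 21, 31, 41, 51, 61, 71]
freqs = [3, 8, 12, 14, 10, 6, 5, 2]

cf = more_than_cumulative(freqs)
# Plot each cumulative frequency against the lower boundary of its class.
points = [(lo - 0.5, c) for lo, c in zip(lower_limits, cf)]
points.append((80.5, 0))      # closing point at the upper boundary of the last class
print(cf)
print(points)
```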
Scale: x-axis, 1 cm = 10 marks; y-axis, 1 cm = 10 c.f.
More than the Ogive Curve:
CHAPTER 1: BASIC CONCEPTS OF STATISTICS

OBJECTIVES :
• Recognize statistics.
• Explain statistics.
• Differentiate the descriptive and inferential statistics
• Classify sampling techniques.
• Differentiate random sampling and non-random sampling.
• Distinguish levels of measurements
What is Statistics?
STATISTICS is the science of
• planning studies and experiments,
• collecting,
• organizing,
• presenting,
• analyzing,
• interpreting, and
• drawing conclusions based on the data.
LESSON 1.1 Definitions of Terms

• Collection of data refers to the process of obtaining information.
• Organization of data refers to determining (after a calculation, investigation,
experiment, survey, or study) the manner of presenting the data in tables, graphs, or charts so that
logical and statistical conclusions can be drawn from the collected measurements.
• Analysis of data refers to the process of extracting from the given data relevant information
from which numerical description can be formed.
• Interpretation of data refers to the task of drawing conclusions from the analyzed data.
BRANCHES OF STATISTICS
• Descriptive statistics is the branch of statistics that involves the collection, organization,
presentation, summarization or analysis of data.
• Inferential statistics is the branch of statistics that involves using a sample to interpret and
draw conclusions about a population. A basic tool in the study of inferential
statistics is probability. An area of inferential statistics called hypothesis testing is a
decision-making process for evaluating claims about a population, based on information obtained
from samples.
LESSON 1.2 Classifications of Variables and Data
Variable (or Response Variable)
- A characteristic or attribute of interest about each individual element of a population
or sample that can assume different values.
Example #1:
A student’s age at entrance into college, the color of the student’s hair, the student’s
height, and the student’s weight are four variables.
Data
- It is the collection of observations.

- It consists of information coming from observations (realized value of a variable),
counts, measurements, or responses.
- It is the set of values collected from the variable from each of the elements that
belong to the sample. Once all the data are collected, it is common practice to refer to
the set of data as the sample.
Example #2:
The set of 30 heights gathered from 30 students is an example of a set of data.
Data Value
- The value of the variable associated with one element of a population or sample. This
value may be a number, a word, or a symbol.
Example #3:
Angelo entered college at age “23,” his hair is “brown,” he is “71 inches” tall, and he
weighs “183 pounds.” These four data values are the values for the four variables as applied to
Angelo.
Data sets are called populations and samples.
Population
- Collection of all outcomes, responses, measurements, or counts that are of interest.
- Consists of all subjects (human or otherwise) that are being studied.
Sample
- A sample is a subset, or part, of a population.
- A sample is a group of subjects selected from a population.
Elementary unit or Element
- It is a member of the population whose measurement on the variable of interest is what
we wish to examine.
Experiment
- A planned activity whose results yield a set of data.

- An experiment includes the activities for both selecting the elements and obtaining
the data values.
EXPERIMENTAL CLASSIFICATION
A researcher may classify variables according to the function they serve in the experiment.
• Independent variables are variables controlled by the experimenter/researcher and expected
to have an effect on the behavior of the subjects. The independent variable is also called the
explanatory variable.
• The dependent variable is some measure of the behavior of subjects and is expected to be
influenced by the independent variable. The dependent variable is also called the outcome variable.
Example #4:
In the sit-up study, the researchers gave the groups two different types of instructions, general
and specific. Hence, the independent variable is the type of instruction. The dependent variable,
then, is the resultant variable, that is, the number of sit-ups each group was able to perform after
four days of exercise.
Parameter
- A numerical description of a population characteristic.

- A numerical value summarizing all the data of an entire population.
Statistic
- A numerical description of a sample characteristic.

- A numerical value summarizing the sample data.
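The difference between a parameter and a statistic can be illustrated with a short Python sketch. The list of heights below is hypothetical and is used only for illustration; the point is that the same summary (here, the mean) is a parameter when computed on the whole population and a statistic when computed on a sample.

```python
import random
import statistics

# Hypothetical population: heights (in inches) of 10 students.
population_heights = [62, 64, 65, 66, 67, 68, 69, 70, 71, 73]

# Parameter: a numerical value summarizing the entire population.
population_mean = statistics.mean(population_heights)

# Statistic: a numerical value summarizing a sample from that population.
rng = random.Random(0)  # fixed seed so the draw is repeatable
sample_heights = rng.sample(population_heights, 4)
sample_mean = statistics.mean(sample_heights)

print("Population mean (parameter):", population_mean)
print("Sample mean (statistic):", sample_mean)
```

The parameter is fixed for a given population, while the statistic changes from one sample to another.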
SYMBOLIC NOTATION FOR SAMPLE AND POPULATION MEASURES
RELATIONSHIPS AMONG PROBABILITY, STATISTICS, POPULATION, AND SAMPLE
Example #5:
SOURCE OF DATA
Primary data are data documented by the primary source. The data collectors themselves
documented this data.
Example #6: census, sample survey, experiment
Secondary data are data documented by a secondary source. An individual/agency, other than
the data collectors, documented this data.
Example #7: books, journals, magazines, theses
DATA COLLECTION METHODS
1. SURVEYS
- It is a method of collecting data on the variable of interest by asking people questions.

When the data come from asking all the people in the population, the study is called a census*.
On the other hand, when the data come from asking a sample of people selected from a well-defined
population, the study is called a sample survey.
(*Census or Registration requires the enactment of law to take effect for it needs the
participation of a large, if not the entire, population.)
DIFFERENT METHODS OF COMMUNICATION
a) Personal Interview - It is referred to as the direct method of gathering data since it requires a
face-to-face inquiry with the respondent.
b) Self-Administered Questionnaire - It is a list of questions that the respondent answers on
their own. There is no face-to-face contact.
2. OBSERVATION
- It is a method of collecting data on the phenomenon of interest by recording the
observations made about the phenomenon as it actually happens.
- It makes use of the different human senses in gathering information.
- It is useful in studying the reactions and behavior of individuals or groups of
persons/objects in a given situation or environment as it happens.
3. EXPERIMENTATION
- It is a method of collecting data where there is direct human intervention on the
conditions that may affect the values of the variables of interest.
- It is conducted in laboratories where specimens are subjected to some aspects of control
to find out cause and effect relationships.
TYPES OF DATA
Qualitative, or Attribute, or Categorical Variable
• It consists of attributes, labels, or nonnumerical entries.
• A variable that describes or categorizes an element of a population.
▪ Dichotomous
▪ Trichotomous
▪ Multinomous
Quantitative, or Numerical Variable
• A variable that quantifies an element of a population.
• It consists of numerical measurements or counts and can be ordered or ranked.
DISCRETE VARIABLES
- Assume values that can be counted.
- Can be assigned values such as 0, 1, 2, 3 and are said to be countable.
Example #8:
Examples of discrete variables are the number of children in a family, the number of students in a
classroom, and the number of calls received by a switchboard operator each day for a month.
CONTINUOUS VARIABLES
- Can assume an infinite number of values in an interval between any two specific
values. They are obtained by measuring. They often include fractions and decimals.
Example #9:
SCALE OF MEASUREMENTS
Measurement - It is the process of determining the value or label of the variable based on what
has been observed.
Nominal Level of Measurement
• Data are qualitative only.

• Data at this level are categorized using
names, labels, or qualities. No mathematical
computations can be made at this level.
Example #10 (on the right side)
Ordinal Level of Measurement
- Data are qualitative or quantitative.
- Data at this level can be arranged in order, or ranked, but differences between data
entries are not meaningful.
Example #11: (on the right side)
Interval Level of Measurement
- Data can be ordered, and meaningful differences between data entries can be calculated.
- At the interval level, a zero entry simply represents a position on a scale; the entry is
not an inherent zero. Note: An inherent zero is a zero that implies “none.”
Ratio level of measurement
- Data are similar to data at the interval level, with the added property that a zero entry
is an inherent zero.
- A ratio of two data values can be formed so that one data value can be meaningfully
expressed as a multiple of another.
The levels of measurement are distinguished by which of the following properties they
satisfy:
a) The numbers in the system are used to classify a person/object into distinct, nonoverlapping,
and complete/exhaustive categories.
b) The system arranges categories according to magnitude/degree.
c) The system has a fixed unit of measurement representing a set size throughout the scale;
and
d) The system has an absolute zero.
• Ratio level of measurement satisfies a, b, c, and d.
• Interval level of measurement satisfies only a, b, and c.
• Ordinal level of measurement satisfies only a and b.
• Nominal level of measurement satisfies only a.
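The mapping between properties a to d and the four levels can be written as a small lookup table. The sketch below is only an illustration in Python; the names PROPERTIES, LEVELS, and classify_level are made up for this example and the property descriptions are shortened paraphrases of the list above.

```python
# Properties a-d from the list above (descriptions shortened).
PROPERTIES = {
    "a": "classifies into distinct, exhaustive categories",
    "b": "arranges categories by magnitude",
    "c": "has a fixed unit of measurement",
    "d": "has an absolute zero",
}

# Which properties each level satisfies, as stated in the text.
LEVELS = {
    "nominal": {"a"},
    "ordinal": {"a", "b"},
    "interval": {"a", "b", "c"},
    "ratio": {"a", "b", "c", "d"},
}

def classify_level(satisfied):
    """Return the level whose property set matches `satisfied` exactly."""
    for level, props in LEVELS.items():
        if props == set(satisfied):
            return level
    raise ValueError("no level of measurement matches these properties")

print(classify_level(["a", "b", "c"]))  # interval
```

For instance, a variable that can be categorized, ordered, and measured in fixed units but has no absolute zero matches a, b, and c, so it is at the interval level.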
TYPES OF DATA AND MEASUREMENT SCALES
METHODS OF PRESENTATION OF DATA
1. Textual method
- This method presents the collected data in narrative or paragraph form.
2. Tabular method
- This method presents the collected data in a table, arranged in orderly rows and
columns for an easier and more comprehensive comparison of figures.
3. Graphical method
- This method presents the collected data in visual or pictorial form to get a clear view of
the data (e.g. histogram, pie chart, Pareto chart, pictograph, etc.).
LESSON 1.3 SAMPLING TECHNIQUES
Census - It is a count or measure of an entire population. Taking a census provides complete
information, but it is often costly and difficult to perform.
Sampling - It refers to the process of selecting individuals from the target population.
Sampling frame - A list of all elements or other units containing the elements or members in a
population.
SAMPLING TECHNIQUES
➢ Probability Sampling
➢ Nonprobability Sampling
Probability Sampling or Random Sampling is a process in which every member of the population
has a known chance (in the simplest designs, an equal chance) of being selected.
• Types of Probability sampling
○ Simple Random Sampling
○ Systematic Sampling
○ Stratified Sampling
○ Cluster Sampling
○ Multistage Sampling
Simple Random Sampling - It is a process of selecting a sample of size n from the population
using random numbers or a lottery.
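A simple random sample can be drawn with Python's standard random module. This is a minimal sketch: the sampling frame of 30 student ID numbers and the function name simple_random_sample are assumptions made for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    return rng.sample(population, n)

# Hypothetical sampling frame: student ID numbers 1 through 30.
frame = list(range(1, 31))
sample = simple_random_sample(frame, 5, seed=42)
print(sample)
```

Here random.sample plays the role of the lottery: it selects without replacement, so no student can appear twice in the sample.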
Systematic Sampling - A systematic sample is a sample in which each member of the
population is assigned a number. The members of the population are ordered in some way, a
starting number is randomly selected, and then sample members are selected at regular intervals
from the starting number. (For instance, every 3rd, 5th, or 100th member is selected.)
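The procedure just described (random start, then every k-th member) can be sketched in Python as follows. The frame of 30 numbered members, the interval k = 5, and the function name systematic_sample are assumptions chosen for illustration.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a random start among the first k members, then take every k-th member."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return population[start::k]

# Hypothetical ordered population: members numbered 1 through 30.
frame = list(range(1, 31))
sample = systematic_sample(frame, 5, seed=1)
print(sample)
```

With 30 members and an interval of 5, the sample always contains 6 members, each exactly 5 positions after the previous one.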
Stratified Sampling - A stratified sample is a sample obtained by dividing the population into
subgroups, called strata, whose members share a common characteristic, and then randomly
selecting members from each stratum for the sample.
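Stratified sampling can be sketched in Python by grouping the population into strata and then drawing a simple random sample within each one. The student records, the year-level strata, and the function name stratified_sample below are hypothetical; taking the same number from each stratum is only one possible allocation.

```python
import random
from collections import defaultdict

def stratified_sample(members, stratum_of, n_per_stratum, seed=None):
    """Group members into strata, then randomly draw n_per_stratum from each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in members:
        strata[stratum_of(member)].append(member)
    sample = []
    for label in sorted(strata):  # fixed order keeps the result repeatable
        sample.extend(rng.sample(strata[label], n_per_stratum))
    return sample

# Hypothetical population: (student_id, year_level) pairs, 10 per year level.
students = [(i, "Year " + str(1 + i % 3)) for i in range(1, 31)]
sample = stratified_sample(students, lambda s: s[1], 2, seed=7)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which simple random sampling alone does not ensure.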
Cluster Sampling - Here the population is divided into groups called clusters by some means,
such as geographic area or schools in a large school district. Then the researcher randomly
selects some of these clusters and uses all members of the selected clusters as the subjects of the
sample.
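In contrast to stratified sampling, cluster sampling randomly selects whole groups and keeps every member of the selected groups. The sketch below illustrates this in Python; the schools, their members, and the function name cluster_sample are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose n_clusters clusters and keep ALL of their members."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members

# Hypothetical clusters: schools in a district and their students.
schools = {
    "School A": ["A1", "A2", "A3"],
    "School B": ["B1", "B2"],
    "School C": ["C1", "C2", "C3", "C4"],
    "School D": ["D1", "D2"],
}
chosen, sample = cluster_sample(schools, 2, seed=3)
print(chosen)
print(sample)
```

Note that the sample size is not fixed in advance: it depends on how large the selected clusters happen to be.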
Multistage Sampling - A sample design in which the elements of the sampling frame are
subdivided and the sample is obtained by using a combination of methods. This is usually
used for national, regional, provincial, or country-level studies.
Nonprobability sampling or nonrandom sampling
- It is a sampling procedure in which samples are selected in a deliberate manner with little or
no attention to randomization.
- Some segments of the population have no chance of being selected or included
in the sample, or their chance of selection cannot be specified.
Types of Nonprobability sampling
○ Convenience Sampling
○ Purposive Sampling
○ Quota Sampling
○ Snowball Sampling
○ Networking Sampling
Convenience Sampling - A convenience sample consists only of available members of the
population.
Purposive Sampling - It is also called judgment sampling. The sampling units are selected
personally or subjectively by the researcher, who attempts to obtain a sample that appears to be
representative of the population.
Quota Sampling - In this method, the researcher determines the sample size (quota) that should
be filled. The basic idea is to set a target number of completed interviews with specified
subgroups of the population of interest.
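The idea of filling a target number per subgroup can be sketched in Python. The respondents, the subgroups, the quotas, and the function name quota_sample below are hypothetical; respondents are taken in the order they become available, which is why this is a nonprobability method.

```python
def quota_sample(respondents, group_of, quotas):
    """Accept respondents in arrival order until each subgroup's quota is filled."""
    remaining = dict(quotas)  # copy so the caller's quotas are not modified
    sample = []
    for respondent in respondents:
        group = group_of(respondent)
        if remaining.get(group, 0) > 0:
            sample.append(respondent)
            remaining[group] -= 1
    return sample

# Hypothetical respondents in the order they were interviewed.
arrivals = [
    ("R1", "female"), ("R2", "male"), ("R3", "male"),
    ("R4", "female"), ("R5", "male"), ("R6", "female"),
]
sample = quota_sample(arrivals, lambda r: r[1], {"female": 2, "male": 2})
print(sample)  # R1, R2, R3, R4 are accepted; R5 and R6 arrive after the quotas are full
```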
Snowball Sampling - It involves starting a process with one individual or group and using their
contacts to develop the sample, hence “snowball”.
Networking Sampling - This is used to find socially devalued urban populations such as
addicts, alcoholics, child abusers and criminals, because they are usually “hidden from
outsiders.”
Reference:
▪ https://youtu.be/SFPGVTThJNk
▪ https://youtu.be/ZxV-kf0yBss
▪ https://youtu.be/hZxnzfnt5v8
▪ https://youtu.be/saO1yLxd1p8