Introduction:
Central tendency is the point about which the scores tend to cluster. It is the center of concentration of scores in any set of data. It is a single number that represents the general level of performance of a group.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. calculate the mean, mode, median and range for a set of discrete data
2. determine the appropriate measure of central tendency for a given set of
data.
3. discuss the characteristics and uses of mean, median and mode.
Course Material:
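Arithmetic Mean ($\bar{x}$) is the average of a set of data. It is the center of gravity of a distribution. (Ungrouped data)
$\bar{x} = \dfrac{\sum x}{n}$
where x = the scores and n = the number of scores.
Example:
65 55 89 56 35 14 56 55 87 45 92
$\bar{x} = \dfrac{649}{11} = 59$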
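A minimal Python sketch of the ungrouped-mean formula, applied to the eleven scores above. The helper name `mean` is hypothetical and only illustrates the formula.

```python
def mean(scores):
    """Arithmetic mean of ungrouped data: sum of the scores divided by n."""
    return sum(scores) / len(scores)


scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(sum(scores), len(scores), mean(scores))  # 649 11 59.0
```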
Grouped Data
$\bar{x} = \dfrac{\sum fx}{n}$
Where:
f = frequency
x = class marks
n = number of samples
Example:
Class Interval   f    x    fx
80-84            1    82   82
75-79            1    77   77
70-74            1    72   72
65-69            4    67   268
60-64            4    62   248
55-59            7    57   399
50-54            6    52   312
45-49            6    47   282
40-44            6    42   252
35-39            3    37   111
30-34            0    32   0
25-29            1    27   27
                 n = 40    Σfx = 2130
$\bar{x} = \dfrac{2130}{40} = 53.25$
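The grouped-mean computation can be sketched in Python as follows. The class limits and frequencies are copied from the table; the helper name `grouped_mean` is hypothetical, and the class mark is assumed to be the midpoint of the stated limits (for example, (80 + 84) / 2 = 82).

```python
# (lower limit, upper limit, frequency) for each class in the table
classes = [
    (80, 84, 1), (75, 79, 1), (70, 74, 1), (65, 69, 4),
    (60, 64, 4), (55, 59, 7), (50, 54, 6), (45, 49, 6),
    (40, 44, 6), (35, 39, 3), (30, 34, 0), (25, 29, 1),
]


def grouped_mean(classes):
    """Grouped mean: sum of f * x divided by n, with x the class mark."""
    n = sum(f for _, _, f in classes)
    total_fx = sum(f * (lo + hi) / 2 for lo, hi, f in classes)
    return total_fx / n


print(grouped_mean(classes))  # 53.25
```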
Median ($\tilde{x}$) is a positional value. It is the midpoint of the distribution when the data are ranked according to size. (Ungrouped data)
In order to calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark, in this case 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what happens when you have an even number of scores? What if you had only 10 scores? In that case, take the two middle scores and average them. Consider the example below:
65 55 89 56 35 14 56 55 87 45
14 35 45 55 55 56 56 65 87 89
Now we take the 5th and 6th scores in the ranked data set and average them: (55 + 56)/2 = 55.5, so the median is 55.5.
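The ranking-and-middle rule above, for both odd and even numbers of scores, can be sketched in Python. The helper name `median` is hypothetical.

```python
def median(scores):
    """Median of ungrouped data: the middle ranked score, or the average of the two middle scores."""
    ranked = sorted(scores)  # arrange in order of magnitude, smallest first
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]  # odd n: the single middle score
    return (ranked[mid - 1] + ranked[mid]) / 2  # even n: average of the two middle scores


print(median([65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]))  # 56
print(median([65, 55, 89, 56, 35, 14, 56, 55, 87, 45]))  # 55.5
```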
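Grouped Data
$\tilde{x} = L + \left(\dfrac{\frac{n}{2} - cf}{f}\right) i$
Where:
L = lower real limit of the median class
cf = cumulative less than frequency of the class just below the median class
f = frequency of the median class
n = sample size
i = class size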
Example
Class Interval   Frequency   Lower Class Boundary   ≤ CF
80-84            1           79.5                   40
75-79            1           74.5                   39
70-74            1           69.5                   38
65-69            4           64.5                   37
60-64            4           59.5                   33
55-59            7           54.5                   29
50-54            6           49.5                   22
45-49            6           44.5                   16
40-44            6           39.5                   10
35-39            3           34.5                   4
30-34            0           29.5                   1
25-29            1           24.5                   1
n = 40
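Since $\frac{n}{2} = \frac{40}{2} = 20$, find the position of 20 in the cumulative frequency column. It falls in the class with cumulative frequency 22, which is the interval 50-54.
Using the interval (50-54)
Given:
L = 49.5
cf = 16
f = 6
n = 40
i = 5
$\tilde{x} = 49.5 + \left(\dfrac{20 - 16}{6}\right)(5) = 49.5 + 3.33 = 52.83$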
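The steps above (locate n/2 in the cumulative frequencies, then interpolate within the median class) can be sketched in Python. The helper name `grouped_median` is hypothetical; the sketch assumes the classes are listed from lowest to highest scores and that class limits are whole numbers, so the lower real limit is the lower limit minus 0.5.

```python
# (lower limit, upper limit, frequency), listed from lowest to highest scores
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6),
    (45, 49, 6), (50, 54, 6), (55, 59, 7), (60, 64, 4),
    (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]


def grouped_median(classes, i):
    """Median = L + ((n/2 - cf) / f) * i, using the class that contains n/2."""
    n = sum(f for _, _, f in classes)
    half = n / 2
    cf = 0  # cumulative "less than" frequency of the classes below the current one
    for lo, _, f in classes:
        if cf + f >= half:  # n/2 falls inside this class
            L = lo - 0.5  # lower real limit (class boundary)
            return L + (half - cf) / f * i
        cf += f
    raise ValueError("empty frequency table")


print(round(grouped_median(classes, i=5), 2))  # 52.83
```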
Mode ($\hat{x}$) is a frequency value. It is the value that occurs most frequently. (Ungrouped data)
Suppose we have the data below. What is the mode?
65 55 89 56 35 14 55 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 55 56 65 87 89 92
The mode is 55, since it occurs three times, more often than any other score.
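Counting how often each score occurs gives the mode directly. This is a minimal sketch with a hypothetical helper `modes`; it returns a list so that data with several equally frequent values (multimodal data) are also handled.

```python
from collections import Counter


def modes(scores):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(scores)
    top = max(counts.values())
    return sorted(value for value, count in counts.items() if count == top)


print(modes([65, 55, 89, 56, 35, 14, 55, 55, 87, 45, 92]))  # [55]
```

Python 3.8 and later also provide `statistics.multimode`, which returns the most common values in order of first appearance.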
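Grouped Data
$\hat{x} = L + \left(\dfrac{du}{du + dl}\right) i$
Where:
L = lower real limit of the modal class
du = frequency of the modal class minus the frequency of the adjacent class with lower scores
dl = frequency of the modal class minus the frequency of the adjacent class with higher scores
i = class size
Example
Class Interval   Frequency
80-84            1
75-79            1
70-74            1
65-69            4
60-64            4
55-59            7
50-54            6
45-49            6
40-44            6
35-39            3
30-34            0
25-29            1
n = 40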
Since the highest frequency in the given grouped data is 7, the modal class is 55-59.
Solution
Given:
L = 54.5
du = (7 - 6) = 1
dl = (7 - 4) = 3
i = 5
$\hat{x} = 54.5 + \left(\dfrac{1}{1 + 3}\right)(5) = 54.5 + 1.25 = 55.75$
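The grouped-mode steps can be sketched in Python. The helper name `grouped_mode` is hypothetical; the sketch assumes the classes are listed from lowest to highest scores, that class limits are whole numbers (lower real limit = lower limit minus 0.5), and it takes du and dl from the neighbouring classes exactly as in the solution (du = 7 - 6, dl = 7 - 4). It would divide by zero if both neighbours had the same frequency as the modal class.

```python
# (lower limit, upper limit, frequency), listed from lowest to highest scores
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6),
    (45, 49, 6), (50, 54, 6), (55, 59, 7), (60, 64, 4),
    (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]


def grouped_mode(classes, i):
    """Mode = L + (du / (du + dl)) * i for the class with the highest frequency."""
    freqs = [f for _, _, f in classes]
    k = freqs.index(max(freqs))  # modal class (the first one if tied)
    f_lower = freqs[k - 1] if k > 0 else 0  # neighbour with lower scores
    f_higher = freqs[k + 1] if k + 1 < len(freqs) else 0  # neighbour with higher scores
    du = freqs[k] - f_lower
    dl = freqs[k] - f_higher
    L = classes[k][0] - 0.5  # lower real limit of the modal class
    return L + du / (du + dl) * i


print(grouped_mode(classes, i=5))  # 55.75
```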
Watch:
Mode, Median, Mean, Range, and Standard Deviation
https://www.youtube.com/watch?v=mk8tOD0t8M0
40 47 50 61 62 65 30 25 24 61
29 32 35 46 54 60 60 60 43 50
59 34 36 48 31 30 57 48 44 53
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 2– MEASURES OF DISPERSION
Introduction:
A measure of dispersion is a single number that describes how widely the data are scattered or how closely they are bunched. It is also called a measure of variability or a measure of spread.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. describe the range, variance and standard deviation
2. calculate the range, variance and standard deviation
Course Material:
MEASURES OF DISPERSION
a. Range is the simplest measure of dispersion. It is equal to the difference between the highest score and the lowest score of the set of scores. It involves only the two extremes in a distribution.
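As a short sketch, the range of the eleven scores used in the central tendency lesson is the highest score minus the lowest score. The helper name `score_range` is hypothetical.

```python
def score_range(scores):
    """Range: highest score minus lowest score."""
    return max(scores) - min(scores)


scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(score_range(scores))  # 78
```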
b. Variance is a measure of variability that considers the position of each
observation relative to the mean of the set of scores.
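Ungrouped data
SAMPLE VARIANCE
$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$
Where:
x = scores
$\bar{x}$ = sample mean
n = sample size
POPULATION VARIANCE
$\sigma^2 = \dfrac{\sum (x - \mu)^2}{N}$
Where:
x = scores
μ = population mean
N = population size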
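x    $x - \bar{x}$    $(x - \bar{x})^2$
2    2 - 6 = -4       16
3    3 - 6 = -3       9
4    4 - 6 = -2       4
5    5 - 6 = -1       1
6    6 - 6 = 0        0
8    8 - 6 = 2        4
10   10 - 6 = 4       16
10   10 - 6 = 4       16
$\sum x = 48$, n = 8, $\bar{x} = \dfrac{48}{8} = 6$, $\sum (x - \bar{x})^2 = 66$
$s^2 = \dfrac{66}{8 - 1} \approx 9.43$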
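The table's arithmetic can be checked with a short Python sketch of both variance formulas. The helper names are hypothetical; the data are the eight scores in the table (mean 6).

```python
def sample_variance(xs):
    """Sample variance: sum of squared deviations divided by n - 1."""
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def population_variance(xs):
    """Population variance: sum of squared deviations divided by N."""
    N = len(xs)
    mu = sum(xs) / N
    return sum((x - mu) ** 2 for x in xs) / N


heights = [2, 3, 4, 5, 6, 8, 10, 10]
print(round(sample_variance(heights), 2))  # 9.43
print(population_variance(heights))  # 8.25
```

The only difference between the two is the divisor: n - 1 for a sample, N for a whole population.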
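Grouped data
SAMPLE VARIANCE
$s^2 = \dfrac{\sum f(m - \bar{x})^2}{n - 1}$
Where:
f = frequency
m = class mark
$\bar{x}$ = sample mean
n = sample size
POPULATION VARIANCE
$\sigma^2 = \dfrac{\sum f(m - \mu)^2}{N}$
Where:
f = frequency
m = class mark
μ = population mean
N = population size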
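A minimal sketch of the grouped-variance formulas, applied (as an illustration only) to the grouped frequency table from the central tendency lesson. The helper name `grouped_variance` is hypothetical. As a cross-check, the standard `statistics` module is run on an expanded list that places every score at its class mark, which is exactly what the grouped formula assumes.

```python
import statistics

# (lower limit, upper limit, frequency) from the grouped-data table of the previous lesson
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6),
    (45, 49, 6), (50, 54, 6), (55, 59, 7), (60, 64, 4),
    (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]


def grouped_variance(classes, sample=True):
    """Variance of grouped data from class marks m and frequencies f."""
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (class mark m, f)
    mean = sum(f * m for m, f in marks) / n
    ss = sum(f * (m - mean) ** 2 for m, f in marks)  # sum of f(m - mean)^2
    return ss / (n - 1) if sample else ss / n


expanded = [(lo + hi) / 2 for lo, hi, f in classes for _ in range(f)]
print(grouped_variance(classes), statistics.variance(expanded))
print(grouped_variance(classes, sample=False), statistics.pvariance(expanded))
```

For this table the sum of f(m - x̄)² is 5387.5, so the population form gives 5387.5 / 40 = 134.6875.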
c. Standard Deviation is the positive square root of the variance. It is so named because it provides a standard unit for measuring the distance of the various scores from the mean.
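Ungrouped data
SAMPLE STANDARD DEVIATION
$s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$
Where:
x = scores
$\bar{x}$ = sample mean
n = sample size
POPULATION STANDARD DEVIATION
$\sigma = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$
Where:
x = scores
μ = population mean
N = population size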
Example: A student was investigating the effect of synthetic fertilizer on the growth of peanut seedlings. A random sample of these seedlings yielded the heights below, in inches. Find the standard deviation.
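x    $x - \bar{x}$    $(x - \bar{x})^2$
2    2 - 6 = -4       16
3    3 - 6 = -3       9
4    4 - 6 = -2       4
5    5 - 6 = -1       1
6    6 - 6 = 0        0
8    8 - 6 = 2        4
10   10 - 6 = 4       16
10   10 - 6 = 4       16
$s = \sqrt{\dfrac{66}{8 - 1}} \approx 3.07$ inches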
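A short Python sketch of the sample standard deviation for the seedling heights, with the standard library's `statistics.stdev` (which also divides by n - 1) as a cross-check. The helper name `sample_sd` is hypothetical.

```python
import math
import statistics


def sample_sd(xs):
    """Sample standard deviation: square root of the sample variance (n - 1 divisor)."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))


heights = [2, 3, 4, 5, 6, 8, 10, 10]  # seedling heights in inches
print(round(sample_sd(heights), 2))  # 3.07
print(round(statistics.stdev(heights), 2))  # 3.07
```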
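Grouped data
SAMPLE STANDARD DEVIATION
$s = \sqrt{\dfrac{\sum f(m - \bar{x})^2}{n - 1}}$
Where:
f = frequency
m = class mark
$\bar{x}$ = sample mean
n = sample size
POPULATION STANDARD DEVIATION
$\sigma = \sqrt{\dfrac{\sum f(m - \mu)^2}{N}}$
Where:
f = frequency
m = class mark
μ = population mean
N = population size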
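The grouped standard deviation is the square root of the grouped variance. This sketch reuses, purely as an illustration, the frequency table from the central tendency lesson; the helper name `grouped_sd` is hypothetical, and `statistics.stdev` on scores placed at their class marks serves as a cross-check.

```python
import math
import statistics

# (lower limit, upper limit, frequency) from the grouped-data table of the previous lesson
classes = [
    (25, 29, 1), (30, 34, 0), (35, 39, 3), (40, 44, 6),
    (45, 49, 6), (50, 54, 6), (55, 59, 7), (60, 64, 4),
    (65, 69, 4), (70, 74, 1), (75, 79, 1), (80, 84, 1),
]


def grouped_sd(classes, sample=True):
    """Standard deviation of grouped data: square root of the grouped variance."""
    n = sum(f for _, _, f in classes)
    marks = [((lo + hi) / 2, f) for lo, hi, f in classes]  # (class mark m, f)
    mean = sum(f * m for m, f in marks) / n
    ss = sum(f * (m - mean) ** 2 for m, f in marks)
    return math.sqrt(ss / (n - 1) if sample else ss / n)


expanded = [(lo + hi) / 2 for lo, hi, f in classes for _ in range(f)]
print(grouped_sd(classes), statistics.stdev(expanded))
```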
Watch:
Measures of Variability
https://www.youtube.com/watch?v=Cx2tGUze60s
Introduction:
Measures of position, or location, include standard scores, percentiles, and quartiles. They are used to locate the relative position of a data value in the data set. For example, if a value is located at the 80th percentile, 80% of the values fall below it in the distribution and 20% of the values fall above it. The median is the value that corresponds to the 50th percentile, since one half of the values fall below it and one half fall above it.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions
Course Material:
a. Z-SCORES. The areas under the normal curve are given in terms of z-values, or z-scores. A z-score locates a value X within a sample or within a population.
FORMULA
Population: $z = \dfrac{x - \mu}{\sigma}$    Sample: $z = \dfrac{x - \bar{x}}{s}$
Where:
x = given measurement
μ = population mean
σ = population standard deviation
$\bar{x}$ = sample mean
s = sample standard deviation
Example: Raul has taken two tests in his chemistry class. He scored 72 on the first test, for which the mean of all scores was 65 and the standard deviation was 8. He received a 60 on the second test, for which the mean of all scores was 45 and the standard deviation was 12. In comparison to the other students, did Raul do better on the first test or the second test?
Solution:
First test: $z = \dfrac{72 - 65}{8} = 0.875$
Second test: $z = \dfrac{60 - 45}{12} = 1.25$
The z-scores indicate that Raul scored better on the second test than he did on the first test.
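The comparison comes down to two z-score computations, sketched here in Python with a hypothetical helper `z_score`.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


z_first = z_score(72, 65, 8)  # first test
z_second = z_score(60, 45, 12)  # second test
print(z_first, z_second)  # 0.875 1.25
print("second" if z_second > z_first else "first")  # second
```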
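b. PERCENTILES. A value x is at the pth percentile of a data set if p% of the data values are less than x:
$\text{percentile of } x = \dfrac{\text{number of data values less than } x}{\text{total number of data values}} \times 100$
Example: On a reading examination given to 900 students, Elaine’s score of 602 was higher than the scores of 576 of the students who took the examination. What is the percentile of Elaine’s score?
Solution: $\dfrac{576}{900} \times 100 = 64$, so Elaine’s score is at the 64th percentile.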
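A one-line Python sketch of the percentile computation, assuming the "share of values below the score" definition used in the Elaine example. The helper name `percentile_of` is hypothetical.

```python
def percentile_of(num_below, total):
    """Percentile of a score: number of data values below it, divided by the total, times 100."""
    return 100 * num_below / total


print(percentile_of(576, 900))  # 64.0
```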
c. QUARTILES. The quartiles of a set of data are the three numbers Q1, Q2, and Q3 that partition the ranked data into four equal groups. Q2 is the median of the data, Q1 is the median of the data values less than Q2, and Q3 is the median of the data values greater than Q2.
Where:
Q = quartile
n = number of scores
i = position
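A sketch of the median-of-halves rule described above, under the assumption that "the data values less than Q2" means the ranked values before the median position (and "greater than Q2" the values after it). Other textbooks and software use different quartile conventions, so results can differ slightly. The helper names are hypothetical, and the sketch needs at least two scores.

```python
def median_of(ranked):
    """Median of an already-sorted list."""
    n = len(ranked)
    mid = n // 2
    return ranked[mid] if n % 2 else (ranked[mid - 1] + ranked[mid]) / 2


def quartiles(data):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ranked = sorted(data)
    n = len(ranked)
    half = n // 2
    lower = ranked[:half]  # values before the median position
    upper = ranked[half + n % 2:]  # values after the median position
    return median_of(lower), median_of(ranked), median_of(upper)


print(quartiles([65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]))  # (45, 56, 87)
```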
Watch:
Percentiles, Quantiles and Quartiles in Statistics
https://www.youtube.com/watch?v=Ky7QeVgv-BA
3. On a placement examination, August scored lower than 1210 of 12,860 students
who took the exam. Find the percentile, rounded to the nearest percent, for
August’s score.
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 4– NORMAL DISTRIBUTION
Introduction:
The normal distribution, also known as a bell curve or a Gaussian distribution curve, is named for the German mathematician Carl Friedrich Gauss (1777–1855), who derived its equation. No variable fits a normal distribution perfectly, since a normal distribution is a theoretical distribution. However, a normal distribution can be used to describe many variables, because their deviations from a normal distribution are very small.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions
Course Material:
PROPERTIES OF A NORMAL CURVE
It is bell-shaped.
The tails are asymptotic to the baseline.
The mean, median, and mode of a normal curve coincide (they have the same value).
It is symmetrical about the mean.
The total area under the curve is 100%, or 1.00.
Empirical Rule
In a normal distribution, about 68% of the data lie within one standard deviation of the mean, about 95% lie within two standard deviations of the mean, and about 99.7% lie within three standard deviations of the mean.
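Python's standard `math.erf` gives exact normal-curve areas, which shows where the 68-95-99.7 figures come from: for a standard normal variable, P(-k < z < k) = erf(k/√2). The helper name `within_k_sd` is hypothetical.

```python
import math


def within_k_sd(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```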
Example. A survey of 1000 gas stations found that the price charged for a gallon of regular gas could be closely approximated by a normal distribution with a mean of 31.00PHP and a standard deviation of 1.80PHP. How many of the stations charge between 27.40PHP and 34.60PHP for a gallon of regular gas?
Solution. The 27.40PHP price is 2 standard deviations below the mean, since 31.00 - 2(1.80) = 27.40. The 34.60PHP price is 2 standard deviations above the mean, since 31.00 + 2(1.80) = 34.60. In a normal distribution, 95% of all data lie within 2 standard deviations of the mean. Therefore approximately (95%)(1000) = 950 of the stations charge between 27.40PHP and 34.60PHP for a gallon of regular gas.
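A small Python sketch of the gas-price reasoning. It takes the mean to be 31.00PHP, the value for which 27.40PHP and 34.60PHP are each exactly 2 standard deviations (2 × 1.80 = 3.60PHP) from the mean, as the solution states; that mean is an assumption of this sketch.

```python
mean, sd, stations = 31.00, 1.80, 1000  # assumed mean: both prices are then 2 SDs away
low, high = 27.40, 34.60

z_low = (low - mean) / sd  # about -2
z_high = (high - mean) / sd  # about +2
print(round(z_low, 2), round(z_high, 2))  # -2.0 2.0

# Empirical rule: about 95% of the data lie within 2 standard deviations of the mean.
print(round(0.95 * stations))  # 950
```

The exact normal area within 2 standard deviations is about 0.9545, so a z-table would give roughly 954 or 955 stations; the empirical rule's 95% is a rounded figure.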
STANDARD NORMAL DISTRIBUTION
Solution.
1. Get the z-value and express it to 2 decimal places.
The z-table
The area is 0.4772.
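Table lookups can be reproduced with `math.erf`. This sketch assumes the z-table in use lists the area between the mean (z = 0) and z, a common layout for which the entry at z = 2.00 is 0.4772. The helper name `area_mean_to_z` is hypothetical.

```python
import math


def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z."""
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


print(round(area_mean_to_z(2.00), 4))  # 0.4772
```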
Determining Probabilities
Probabilities associated with the standard normal random variable can be shown as areas under the standard normal curve.
Probability Notations
• P(a < z < b) denotes the probability that the z-score is between a and b.
• P(z > a) denotes the probability that the z-score is greater than a.
• P(z < a) denotes the probability that the z-score is less than a.
where a and b are z-score values.
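The three notations map directly onto the cumulative area to the left of z. A minimal sketch, with hypothetical helper names, built on `math.erf`:

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def p_less(a):
    return phi(a)  # P(z < a)


def p_greater(a):
    return 1 - phi(a)  # P(z > a)


def p_between(a, b):
    return phi(b) - phi(a)  # P(a < z < b)


print(round(p_between(-1, 1), 4))  # 0.6827
print(round(p_greater(2), 4))  # 0.0228
print(round(p_less(0), 4))  # 0.5
```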
Example: The entrance exam scores of incoming freshmen in a state college are normally distributed with a mean of 78 and a standard deviation of 10. What is the probability that a randomly selected student has a score
a. below 77?
1. Express the z-value to 2 decimal places.
$z = \dfrac{77 - 78}{10} = -0.10$
2. Shade the required region using the normal curve.
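For part (a), the probability is the area to the left of the score's z-value. This sketch (hypothetical helper `normal_cdf`) computes it directly; with a z-table that lists areas from the mean, the same answer is 0.5000 - 0.0398 = 0.4602.

```python
import math


def normal_cdf(x, mean, sd):
    """P(X < x) for a normally distributed X, via the standard normal curve."""
    z = (x - mean) / sd
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


z = (77 - 78) / 10
print(round(z, 2))  # -0.1
print(round(normal_cdf(77, 78, 10), 4))  # 0.4602
```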
2. The IQ scores of children in a special education class are normally distributed with μ = 95 and σ = 10.
a. What is the probability that one of the children has an IQ score below 100?
b. What is the probability that a child has an IQ score above 120?
c. What are the chances that a child has an IQ score of 90?
Introduction:
Correlation is a statistical method used to determine whether a linear relationship
between variables exists. Regression is a statistical method used to describe the nature
of the relationship between variables, that is, positive or negative, linear or nonlinear.
Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use the methods of linear regression and correlation to predict the value of a variable given certain conditions
2. advocate the use of statistical data in making important decisions
Course Material:
Pearson’s Correlation Coefficient
It measures the degree of association or closeness of relationship between
two variables. It is a measure of linear association.
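The correlation coefficient is always between -1 and +1. If you ever get an answer outside this range, you have made an error in your calculations.
Let $\bar{x}$ and $\bar{y}$ represent the respective means of the x and y values from the sample data.
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$
Simplified Formula
$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}}$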
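The capacity and efficiency data for the example are not reproduced here, so this sketch applies the simplified formula to small hypothetical data sets chosen because their answers are known: a perfect positive and a perfect negative linear relationship. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's r via the simplified (computational) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data: y rises exactly 2 units per unit of x, then falls 2 units per unit of x.
print(pearson_r([1, 2, 3, 4], [3, 5, 7, 9]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```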
Solution:
Capacity (x)   Efficiency (y)   xy   x²   y²
LINEAR REGRESSION
The method of least squares finds the line for which the sum of the squares of the vertical distances from the observed points to the line is as small as possible.
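Then
$y = a + bx$
$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$
$a = \bar{y} - b\bar{x}$
Where:
$\bar{y}$ = mean of the values of y
$\bar{x}$ = mean of the values of x
a = y-intercept
b = slope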
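A minimal least-squares sketch following the slope and intercept formulas above. The helper name `least_squares` is hypothetical, and the data are hypothetical points lying exactly on y = 1 + 2x, so the expected coefficients are known.

```python
def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y = a + b x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = sum_y / n - b * (sum_x / n)  # intercept: a = y-bar - b * x-bar
    return a, b


a, b = least_squares([1, 2, 3, 4], [3, 5, 7, 9])  # hypothetical points on y = 1 + 2x
print(a, b)  # 1.0 2.0
```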
Prediction
Having determined the equation of the line, it is now possible to predict the value
of y given x.
Example: To determine the best linear relationship between speed and capacity, examine the sample data pairs on speed and capacity shown below.
Solution/Computation:
x (speed, rpm)   y (capacity, kg/min)   xy   x²
With a mean capacity of 3.77 kg/min, a mean speed of 22.55 rpm, and a slope of b = 0.059,
a = 3.77 - 0.059(22.55)
a = 2.44
Therefore, the equation of the line is:
y = 2.44 + 0.059 x
Prediction
Determine the capacity when speed is 25 rpm. Using the equation,
y = 2.44 + 0.059 x
y = 2.44 + 0.059 (25)
y= 3.92 kg/min.
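The prediction step is a direct evaluation of the fitted line. This sketch hard-codes the lesson's coefficients a = 2.44 and b = 0.059 as defaults of a hypothetical helper `predict`.

```python
def predict(x, a=2.44, b=0.059):
    """Predicted capacity (kg/min) at speed x (rpm) from the line y = a + b x."""
    return a + b * x


print(predict(25))  # approximately 3.915, which the lesson rounds to 3.92 kg/min
```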
Watch:
The (Pearson) Correlation Coefficient Explained in One Minute: From
Definition to Formula
https://www.youtube.com/watch?v=WpZi02ulCvQ
How to Perform Simple Linear Regression by Hand
https://www.youtube.com/watch?v=GhrxgbQnEEU