
LESSON 4: STATISTICS/DATA MANAGEMENT

UNIT 1– MEASURES OF CENTRAL TENDENCY

Introduction:
Central tendency is the point about which the scores tend to cluster. It is the
center of concentration of scores in any set of data. It is a single number that represents
the general level of performance of a group.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. calculate the mean, mode, median and range for a set of discrete data
2. determine the appropriate measure of central tendency for a given set of
data.
3. discuss the characteristics and uses of mean, median and mode.

Course Material:
Arithmetic Mean ($\bar{x}$) is the average of the set of data. It is the center of gravity of a
distribution. (Ungrouped data)

$\bar{x} = \frac{\sum x}{n}$

Suppose we have the data below, what is the mean?

65 55 89 56 35 14 56 55 87 45 92
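The question above can be checked with a short Python sketch. It simply adds the scores and divides by how many there are; the variable names are illustrative and not part of the lesson.

```python
scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]

# Arithmetic mean: sum of the scores divided by the number of scores.
total = sum(scores)
n = len(scores)
mean = total / n

print(total, n, mean)  # 649 11 59.0
```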

Grouped Data

$\bar{x} = \frac{\sum fx}{n}$

Where:
f = frequency
x = class marks
n = number of samples
Example:

Class Intervals Frequency Class Mark fx

80-84 1 82 82
75-79 1 77 77
70-74 1 72 72
65-69 4 67 268
60-64 4 62 248
55-59 7 57 399
50-54 6 52 312
45-49 6 47 282
40-44 6 42 252
35-39 3 37 111
30-34 0 32 0
25-29 1 27 27
n = 40 $\sum fx$ = 2130

$\bar{x} = \frac{2130}{40} = 53.25$
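As a rough check of the grouped-data example, the sketch below multiplies each class mark by its frequency and divides the total by n. The list name `table` is an assumption made for illustration.

```python
# (class mark, frequency) pairs copied from the example table
table = [(82, 1), (77, 1), (72, 1), (67, 4), (62, 4), (57, 7),
         (52, 6), (47, 6), (42, 6), (37, 3), (32, 0), (27, 1)]

n = sum(f for _, f in table)            # total frequency
sum_fx = sum(f * x for x, f in table)   # sum of frequency times class mark
grouped_mean = sum_fx / n

print(n, sum_fx, grouped_mean)  # 40 2130 53.25
```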

Median is a positional value. It is the midpoint of the distribution when data are
ranked according to size. (Ungrouped data)
In order to calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89 92

Our median mark is the middle mark, in this case 56, the sixth score in the ordered list. It is
the middle mark because there are 5 scores before it and 5 scores after it. This works
fine when you have an odd number of scores, but what happens when you have an
even number of scores? What if you had only 10 scores? In that case, you take
the middle two scores and average them. Consider the example below:

65 55 89 56 35 14 56 55 87 45

We again rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 56 56 65 87 89

Now we take the 5th and 6th scores in the ordered data set (55 and 56) and average
them to get a median of 55.5.
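The odd and even cases described above can be sketched in Python as follows. The helper name `median_of` is hypothetical; it sorts the scores, then either picks the middle score or averages the two middle scores.

```python
def median_of(values):
    """Return the median of a list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle score.
        return ordered[mid]
    # Even count: average of the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2

eleven_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
ten_scores = eleven_scores[:-1]   # the same data without the last score

print(median_of(eleven_scores))  # 56
print(median_of(ten_scores))     # 55.5
```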

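Grouped Data

Median $= L + \left(\frac{\frac{n}{2} - cf}{f}\right) i$

Where:
L = lower real limit of the median class
cf = cumulative less-than frequency of the class just below the median class
f = frequency of the median class
n = sample size
i = class size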
Example

Class Intervals Frequency Lower Class Boundary ≤ CF

80-84 1 79.5 40
75-79 1 74.5 39
70-74 1 69.5 38
65-69 4 64.5 37
60-64 4 59.5 33
55-59 7 54.5 29
50-54 6 49.5 22
45-49 6 44.5 16
40-44 6 39.5 10
35-39 3 34.5 4
30-34 0 29.5 1
25-29 1 24.5 1
n = 40
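Since $\frac{n}{2} = \frac{40}{2} = 20$, find the position of 20 in the cumulative frequency column. It
falls under 22, which belongs to the interval 50-54.
Using the interval 50-54:
Given:

L = 49.5 (lower real limit)
cf = 16 (cumulative frequency of the class below)
f = 6
n = 40
i = 5

Median $= 49.5 + \left(\frac{20 - 16}{6}\right)(5) \approx 52.83$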
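The grouped-median steps can be sketched in Python as below. This assumes the usual interpolation formula, median = L + ((n/2 - cf) / f) * i, which is consistent with the given values; the function name `grouped_median` is hypothetical.

```python
# (lower class boundary, frequency), listed from the lowest class upward
classes = [(24.5, 1), (29.5, 0), (34.5, 3), (39.5, 6), (44.5, 6), (49.5, 6),
           (54.5, 7), (59.5, 4), (64.5, 4), (69.5, 1), (74.5, 1), (79.5, 1)]
class_size = 5

def grouped_median(classes, class_size):
    n = sum(f for _, f in classes)
    half = n / 2
    cumulative = 0  # cumulative frequency below the current class
    for lower, f in classes:
        if cumulative + f >= half:
            # This is the median class: interpolate inside it.
            return lower + (half - cumulative) / f * class_size
        cumulative += f

median = grouped_median(classes, class_size)
print(round(median, 2))  # 52.83
```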

Mode is a frequency value. It is the value that occurs most frequently. (Ungrouped
data) Suppose we have the data below, what is the mode?
65 55 89 56 35 14 55 55 87 45 92

We first need to rearrange that data into order of magnitude (smallest first):

14 35 45 55 55 55 56 65 87 89 92

The most frequent number in this set of data is 55, which occurs three times.
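A small Python sketch of the same count, using `collections.Counter` from the standard library. It returns every value tied for the highest frequency, so it also handles data with more than one mode.

```python
from collections import Counter

scores = [65, 55, 89, 56, 35, 14, 55, 55, 87, 45, 92]

counts = Counter(scores)            # value -> number of occurrences
highest = max(counts.values())      # the largest frequency
modes = sorted(v for v, c in counts.items() if c == highest)

print(modes, highest)  # [55] 3
```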


Grouped Data

Mode $= L + \left(\frac{d_u}{d_u + d_l}\right) i$

Where:

L = lower real limit of the modal class
du = difference between the highest frequency and the frequency of the next lower
interval.
dl = difference between the highest frequency and the frequency of the next higher interval.
i = class size
Example

Class Intervals Frequency

80-84 1
75-79 1
70-74 1
65-69 4
60-64 4
55-59 7
50-54 6
45-49 6
40-44 6
35-39 3
30-34 0
25-29 1
n = 40
Since the highest frequency in the given grouped data is 7, the interval 55-59 will
be used.

Solution

Given:
L = 54.5 (lower real limit)
du = (7-6) = 1
dl = (7-4) = 3
i = 5

Mode $= 54.5 + \left(\frac{1}{1 + 3}\right)(5) = 55.75$
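The given values can be combined as in the sketch below. It assumes the common grouped-mode formula, mode = L + (du / (du + dl)) * i, with du taken against the next lower interval (50-54) and dl against the next higher interval (60-64), as in the given values; treat the formula as an assumption rather than the handout's exact notation.

```python
lower_limit = 54.5     # lower real limit of the modal class 55-59
class_size = 5
f_modal = 7            # frequency of 55-59
f_lower = 6            # frequency of 50-54
f_higher = 4           # frequency of 60-64

du = f_modal - f_lower
dl = f_modal - f_higher
mode = lower_limit + du / (du + dl) * class_size

print(du, dl, mode)  # 1 3 55.75
```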

Watch:
• Mode, Median, Mean, Range, and Standard Deviation

https://www.youtube.com/watch?v=mk8tOD0t8M0

Activity/Assessment no. 10:


Answer the following questions.
1. In a selection of 15 lots of 200 automotive spare parts, the following numbers
of defective automotive parts were found: 3, 6, 2, 3, 5, 8, 1, 2, 7, 5, 5, 5, 4, 7, and 7.
Find the mean, median and modal number of defective parts.
2. The following table displays the ages of male actors when they starred in their
Oscar-winning Best Actor performances.

40 47 50 61 62 65 30 25 24 61

29 32 35 46 54 60 60 60 43 50

59 34 36 48 31 30 57 48 44 53

LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 2– MEASURES OF DISPERSION

Introduction:
A measure of dispersion is a single number that describes how the data are
scattered or how much they are bunched. It is also called a measure of variability or
measure of spread.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. describe the range, variance and standard deviation
2. calculate the range, variance and standard deviation

Course Material:
MEASURES OF DISPERSION
a. Range is the simplest measure of dispersion. It is equal to the difference of
highest score and the lowest score of the set of scores. It involves only the two
extremes in a distribution.
b. Variance is a measure of variability that considers the position of each
observation relative to the mean of the set of scores.
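Ungrouped data
SAMPLE VARIANCE

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

Where:
x = scores
$\bar{x}$ = sample mean
n = sample size

POPULATION VARIANCE

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

Where:
x = scores
$\mu$ = population mean
N = population size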

Example: A student was investigating the effect of synthetic fertilizer on the
growth of peanut seedlings. A random sample of those seedlings yielded the heights
below in inches. Find the variance.

The sample mean is $\bar{x} = \frac{48}{8} = 6$.

x $x - \bar{x}$ $(x - \bar{x})^2$
2 2-6 = -4 16
3 3-6 = -3 9
4 4-6 = -2 4
5 5-6 = -1 1
6 6-6 = 0 0
8 8-6 = 2 4
10 10-6 = 4 16
10 10-6 = 4 16

$\sum (x - \bar{x})^2 = 66$

$s^2 = \frac{66}{8 - 1} \approx 9.43$
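The table above can be reproduced with a few lines of Python. This sketch computes the deviations from the mean, squares them, and divides by n - 1 for a sample (or by n if the data were treated as a whole population); the variable names are illustrative.

```python
heights = [2, 3, 4, 5, 6, 8, 10, 10]

n = len(heights)
mean = sum(heights) / n
squared_devs = [(x - mean) ** 2 for x in heights]
sum_sq = sum(squared_devs)

sample_variance = sum_sq / (n - 1)   # divide by n - 1 for a sample
population_variance = sum_sq / n     # divide by n for a population

print(mean, sum_sq)                  # 6.0 66.0
print(round(sample_variance, 2))     # 9.43
print(population_variance)           # 8.25
```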

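Grouped data
SAMPLE VARIANCE

$s^2 = \frac{\sum f(m - \bar{x})^2}{n - 1}$

Where:
f = frequency
m = class mark
$\bar{x}$ = sample mean
n = sample size

POPULATION VARIANCE

$\sigma^2 = \frac{\sum f(m - \mu)^2}{N}$

Where:
f = frequency
m = class mark
$\mu$ = population mean
N = population size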
c. Standard Deviation is the positive square root of the variance. It
is so termed because it provides a standard unit for measuring the distance of various
scores from the mean.
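Ungrouped data
SAMPLE STANDARD DEVIATION

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Where:
x = scores
$\bar{x}$ = sample mean
n = sample size

POPULATION STANDARD DEVIATION

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Where:
x = scores
$\mu$ = population mean
N = population size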

Example: A student was investigating the effect of synthetic fertilizer on the growth of
peanut seedlings. A random sample of those seedlings yielded the heights below in
inches. Find the standard deviation.

The sample mean is $\bar{x} = \frac{48}{8} = 6$.

x $x - \bar{x}$ $(x - \bar{x})^2$
2 2-6 = -4 16
3 3-6 = -3 9
4 4-6 = -2 4
5 5-6 = -1 1
6 6-6 = 0 0
8 8-6 = 2 4
10 10-6 = 4 16
10 10-6 = 4 16

$s = \sqrt{\frac{66}{8 - 1}} \approx 3.07$
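For the standard deviation of the same heights, a minimal Python sketch takes the square root of the sum of squared deviations divided by n - 1 (sample) or n (population). The names used are illustrative only.

```python
from math import sqrt

heights = [2, 3, 4, 5, 6, 8, 10, 10]

n = len(heights)
mean = sum(heights) / n
sum_sq = sum((x - mean) ** 2 for x in heights)  # sum of squared deviations

sample_sd = sqrt(sum_sq / (n - 1))
population_sd = sqrt(sum_sq / n)

print(round(sample_sd, 2))      # 3.07
print(round(population_sd, 2))  # 2.87
```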

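Grouped data
SAMPLE STANDARD DEVIATION

$s = \sqrt{\frac{\sum f(m - \bar{x})^2}{n - 1}}$

Where:
f = frequency
m = class mark
$\bar{x}$ = sample mean
n = sample size

POPULATION STANDARD DEVIATION

$\sigma = \sqrt{\frac{\sum f(m - \mu)^2}{N}}$

Where:
f = frequency
m = class mark
$\mu$ = population mean
N = population size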
Watch:
• Measures of Variability
https://www.youtube.com/watch?v=Cx2tGUze60s

Activity / Assessment no. 11:


Answer the following.
1. In an advertising company, a random sample of 10 employees gave the
following information on the number of hours spent on consultation by prospective
clients each day: 3, 6, 4, 5, 3, 2, 2, 4, 3, and 2.
a. Find the range.

b. Find the sample mean.

c. Find the sample standard deviation.

LESSON 4: STATISTICS/DATA MANAGEMENT


UNIT 3– MEASURES OF RELATIVE POSITION

Introduction:
There are measures of position or location. These measures include standard
scores, percentiles, and quartiles. They are used to locate the relative position of a data
value in the data set. For example, if a value is located at the 80th percentile, it means
that 80% of the values fall below it in the distribution and 20% of the values fall above it.
The median is the value that corresponds to the 50th percentile, since one-half of the
values fall below it and one half of the values fall above it.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions

Course Material:
a. Z-SCORES. The areas under the normal curve are given in terms of z-values
or z-scores. A z-score locates X within either a sample or a population.
FORMULA

Z-SCORE FOR POPULATION DATA: $z = \frac{x - \mu}{\sigma}$
Z-SCORE FOR SAMPLE DATA: $z = \frac{x - \bar{x}}{s}$

Where:
x = given measurement
$\mu$ = population mean
$\sigma$ = population standard deviation
$\bar{x}$ = sample mean
s = sample standard deviation

Example: Raul has taken two tests in his chemistry class. He scored 72 on the first test,
for which the mean of all scores was 65 and the standard deviation was 8. He received a
60 on the second test, for which the mean of all scores was 45 and the standard deviation
was 12. In comparison to the other students, did Raul do better on the first test or the second
test?

Solution:

$z_{72} = \frac{72 - 65}{8} = 0.875$
$z_{60} = \frac{60 - 45}{12} = 1.25$

The z-scores indicate that Raul scored better on the second test than he did on the first
test, because his second score lies farther above the mean in standard deviation units.
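The comparison of Raul's two tests can be sketched in Python. The helper `z_score` is a hypothetical name; it subtracts the mean from the score and divides by the standard deviation.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

first_test = z_score(72, mean=65, sd=8)
second_test = z_score(60, mean=45, sd=12)

print(first_test, second_test)     # 0.875 1.25
print(second_test > first_test)    # True
```

A larger z-score means a better standing relative to the rest of the class, even though the raw score on the second test is lower.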

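b. PERCENTILE. A value x is called the pth percentile of a data set provided p%
of the data values are less than x. Given a set of data and a data value x,

Percentile of x $= \frac{\text{number of data values less than } x}{\text{total number of data values}} \cdot 100$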

Example: On a reading examination given to 900 students, Elaine’s score of 602 was
higher than the scores of 576 of the students who took the examination. What is the
percentile of Elaine’s score?

Percentile $= \frac{576}{900} \cdot 100 = 64$

Elaine’s score of 602 places her at the 64th percentile.
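A minimal Python sketch of the percentile computation, assuming the percentile is the share of scores below the given score expressed as a percent. The function name is hypothetical.

```python
def percentile_of_score(number_below, total):
    """Percent of data values that fall below the given score."""
    return 100 * number_below / total

elaine = percentile_of_score(number_below=576, total=900)
print(elaine)  # 64.0
```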

c. QUARTILES. The quartiles of a set of data are the three numbers Q1, Q2 and
Q3 that partition the ranked data into four equal groups. Q2 is the median of the data, Q1
is the median of the data values less than Q2, and Q3 is the median of the data values greater
than Q2.

Where:
= quartile
n = number of scores
i = position

Example. Given the data, 26 51 44 23 25 61 45 65 23 43 41 55. Find Q3.


Solution:
Arrange the data in ascending order: 23 23 25 26 41 43 44 45 51 55 61 65
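One way to check the quartiles is the Python sketch below. It follows the verbal definition given earlier (Q2 is the median, Q1 and Q3 are the medians of the values below and above Q2) and uses `statistics.median` from the standard library. A position-based quartile formula can give a slightly different value, so treat this as one convention rather than the handout's own computation.

```python
from statistics import median

data = [26, 51, 44, 23, 25, 61, 45, 65, 23, 43, 41, 55]
ordered = sorted(data)

q2 = median(ordered)
lower_half = [x for x in ordered if x < q2]   # values less than Q2
upper_half = [x for x in ordered if x > q2]   # values greater than Q2
q1 = median(lower_half)
q3 = median(upper_half)

print(q1, q2, q3)  # 25.5 43.5 53.0
```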

Watch:
• Percentiles, Quantiles and Quartiles in Statistics

https://www.youtube.com/watch?v=Ky7QeVgv-BA

Activity/Assessment no. 12:


Solve the following.
1. A data set has a mean of 75 and a standard deviation of 11.5. Find the z-score for
each of the following:
a.
b. x = 85
c. x = 60
d. x = 90
2.

3. On a placement examination, August scored lower than 1210 of 12,860 students
who took the exam. Find the percentile, rounded to the nearest percent, for
August’s score.
LESSON 4: STATISTICS/DATA MANAGEMENT
UNIT 4– NORMAL DISTRIBUTION

Introduction:
Normal distribution is also known as a bell curve or a Gaussian distribution curve,
named for the German mathematician Carl Friedrich Gauss (1777–1855), who derived
its equation. No variable fits a normal distribution perfectly, since a normal distribution is
a theoretical distribution. However, a normal distribution can be used to describe many
variables, because the deviations from a normal distribution are very small.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. use a variety of statistical tools to process and manage numerical data
2. advocate the use of statistical data in making important decisions

Course Material:

FREQUENCY DISTRIBUTIONS. A frequency distribution displays a data set by
dividing the data into intervals, or classes, and listing the number of data values that fall
into each interval. A relative frequency distribution lists the percent of data in each
interval.
Example.

A normal distribution of data forms a bell-shaped curve that is symmetric about a
vertical line through the mean. The y-value of each point on the curve is the percent of
the data at the corresponding x-value. The total area under the curve is 1.

PROPERTIES OF A NORMAL CURVE

• It is bell-shaped.
• The tails are asymptotic to the baseline.
• The mean, median and mode of a normal curve coincide (they have the
same value).
• It is symmetrical about the mean.
• The total area under the curve is 100% or 1.00.

Empirical Rule

In a normal distribution, about 68% of the data will be within one standard deviation
of the mean, about 95% will be within two standard deviations of the mean, and about
99.7% will be within three standard deviations of the mean.
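The three percentages in the Empirical Rule can be verified with `statistics.NormalDist` from the Python standard library (available from Python 3.8). This sketch computes the area between -k and +k standard deviations on the standard normal curve; the variable names are illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # Area under the curve between -k and +k standard deviations.
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(k, round(shares[k], 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

The rule's 68%, 95% and 99.7% are rounded versions of these areas.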

Example. A survey of 1000 gas stations found that the price charged for a gallon of
regular gas could be closely approximated by a normal distribution with a mean of
31.00PHP and a standard deviation of 1.80PHP. How many of the stations charge
between 27.40PHP and 34.60PHP for a gallon of regular gas?

Solution. The 27.40PHP per gallon price is 2 standard deviations below the mean,
since 31.00 - 2(1.80) = 27.40. The 34.60PHP price is 2 standard deviations above the
mean, since 31.00 + 2(1.80) = 34.60. In a normal distribution, about
95% of all data lie within 2 standard deviations of the mean. Therefore,
approximately (95%)(1000) = 950 of the stations charge between 27.40PHP and
34.60PHP for a gallon of regular gas.

STANDARD NORMAL DISTRIBUTION

It is the normal distribution that has a mean of 0 and a standard deviation of 1.

Example: Given the mean ($\mu$) = 50 and the standard deviation ($\sigma$) = 4 of a population of
reading scores, find the z-value that corresponds to a score of 58.

Solution.
1. Get the z-value and express it to 2 decimal places.

$z = \frac{x - \mu}{\sigma} = \frac{58 - 50}{4} = 2.00$

2. Find the area using the z-table.

From the z-table, the area between z = 0 and z = 2.00 is 0.4772.
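The z-table lookup can be cross-checked in Python. This sketch assumes the table lists the area between the mean (z = 0) and the given z, which is what the value 0.4772 corresponds to; it uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

mu, sigma, score = 50, 4, 58

z = (score - mu) / sigma
# Area between the mean (z = 0) and z: cumulative area up to z, minus the lower half.
area_from_mean = NormalDist().cdf(z) - 0.5

print(z, round(area_from_mean, 4))  # 2.0 0.4772
```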

Determining Probabilities
Probabilities associated with the standard normal random variables can be
shown as areas under the standard normal curve.

Probability Notations

• P(a < z < b) denotes the probability that the z-score is between a and b.
• P(z > a) denotes the probability that the z-score is greater than a.
• P(z < a) denotes the probability that the z-score is less than a.
where a and b are z-score values.

STEPS IN DETERMINING PROBABILITY UNDER THE NORMAL CURVE


1. Express the z-value to 2 decimal places.
2. Shade the required region under the normal curve.
3. Consult the z-table and find the area that corresponds to z.
4. Examine the graph and use probability notation to form an equation showing the
appropriate operation to get the required area.

Example: The entrance exam scores of incoming freshmen in a state college are
normally distributed with a mean of 78 and a standard deviation of 10. What is the
probability that a randomly selected student has a score

a. below 68?
1. Express the z-value to 2 decimal places.

$z = \frac{68 - 78}{10} = -1.00$

2. Shade the required region under the normal curve.

3. Consult the z-table and find the area that corresponds to z.

The area between z = 0 and z = -1.00 is 0.3413.
Area = 0.5 - 0.3413 = 0.1587
4. Examine the graph and use probability notation to form an equation
showing the appropriate operation to get the required area.
P(z < -1) = 0.1587, or 15.87%
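The three probability notations can be illustrated with the standard normal curve in Python. This is only a sketch using `statistics.NormalDist`; the values a = -1 and b = 1 are chosen to match the z-value in the worked example and are otherwise arbitrary.

```python
from statistics import NormalDist

z = NormalDist()          # standard normal: mean 0, standard deviation 1
a, b = -1.0, 1.0

p_less = z.cdf(a)                 # P(z < a)
p_greater = 1 - z.cdf(a)          # P(z > a)
p_between = z.cdf(b) - z.cdf(a)   # P(a < z < b)

print(round(p_less, 4))     # 0.1587
print(round(p_greater, 4))  # 0.8413
print(round(p_between, 4))  # 0.6827
```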
Watch:
• Standard Normal Distribution Tables, Z Scores, Probability & Empirical Rule
https://www.youtube.com/watch?v=CjF_yQ2N638

Activity/ Assessment no. 13:


Solve the following.
1. Determine each of the following areas and show these graphically. Use probability
notation in your final answer.
a. above z = 1.46
b. below z = - 0.50
c. between z = 1.00 and z = 2.20

2. The IQ scores of children in a special education class are normally distributed with $\mu$ =
95 and $\sigma$ = 10.
a. What is the probability that one of the children has an IQ score below 100?
b. What is the probability that a child has an IQ score above 120?
c. What are the chances that a child has an IQ score of 90?

LESSON 4: STATISTICS/DATA MANAGEMENT


UNIT 5– LINEAR REGRESSION AND CORRELATION

Introduction:
Correlation is a statistical method used to determine whether a linear relationship
between variables exists. Regression is a statistical method used to describe the nature
of the relationship between variables, that is, positive or negative, linear or nonlinear.

Learning Objectives:
After successful completion of this lesson, you should be able to:
1. Use the methods of linear regression and correlations to predict the value of a
variable given certain conditions
2. advocate the use of statistical data in making important decisions

Course Material:

Pearson’s Correlation Coefficient
It measures the degree of association or closeness of relationship between
two variables. It is a measure of linear association.
The correlation coefficient is always between -1 and +1. If you ever get an answer
outside this range, you have made an error in your calculations.

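Computing the Correlation Coefficient

Let $\bar{x}$ and $\bar{y}$ represent the respective means of the x and y values from the sample data.

$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$

Simplified Formula

$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{[n\sum x^2 - (\sum x)^2][n\sum y^2 - (\sum y)^2]}}$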

Example: A group of graduating Agricultural Engineering students designed,
constructed and tested a manually operated crop-grading machine as their thesis. Using
potato as a sample crop in the testing, the following data on capacity (kg/min) and
efficiency (%) of the machine were gathered for ten trials and are presented below.
Determine the correlation coefficient for these data.

Capacity and Efficiency of a Crop-Grading Machine


TRIAL CAPACITY EFFICIENCY
1 3.52 83.33
2 4.15 76.67
3 3.84 93.33
4 3.93 80.00
5 4.25 70.00
6 3.16 90.00
7 3.17 86.67
8 4.76 73.33
9 3.12 83.33
10 3.78 80.00

Solution:
Capacity (x) Efficiency (y) xy x² y²

3.52 83.33 293.32 12.39 6943.89
4.15 76.67 318.18 17.22 5878.29
3.84 93.33 358.39 14.75 8710.49
3.93 80.00 314.40 15.44 6400.00
4.25 70.00 297.50 18.06 4900.00
3.16 90.00 284.40 9.99 8100.00
3.17 86.67 274.74 10.05 7511.69
4.76 73.33 349.05 22.66 5377.29
3.12 83.33 259.99 9.73 6943.89
3.78 80.00 302.40 14.29 6400.00
Σ 37.68 816.66 3052.37 144.58 67165.53

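$n\sum xy - \sum x \sum y$ = 10(3052.37) - (37.68)(816.66) = -248.05

$n\sum x^2 - (\sum x)^2$ = 10(144.58) - (37.68)² = 26.02

$n\sum y^2 - (\sum y)^2$ = 10(67165.53) - (816.66)² = 4721.74

$r = \frac{-248.05}{\sqrt{(26.02)(4721.74)}} \approx -0.708$

The correlation coefficient of -0.708 suggests a fairly strong negative correlation
between capacity and efficiency.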
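The computation can be sketched in Python using the same sums (n, Σx, Σy, Σxy, Σx², Σy²) that appear in the solution table. The function name `pearson_r` is hypothetical. Because the code does not round the intermediate sums, its result (about -0.707) can differ in the third decimal place from the hand computation, which works with sums rounded to two decimals.

```python
from math import sqrt

capacity = [3.52, 4.15, 3.84, 3.93, 4.25, 3.16, 3.17, 4.76, 3.12, 3.78]
efficiency = [83.33, 76.67, 93.33, 80.00, 70.00, 90.00, 86.67, 73.33, 83.33, 80.00]

def pearson_r(xs, ys):
    """Pearson correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = pearson_r(capacity, efficiency)
print(round(r, 2))  # -0.71
```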

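LINEAR REGRESSION
The Method of Least Squares chooses the line for which the sum of the squares of the
vertical distances from the observed points to the line is as small as possible. The line
has the form

$y = a + bx$

where

$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

Then

$a = \bar{y} - b\bar{x}$

Where:
$\bar{y}$ = mean of the values of y
$\bar{x}$ = mean of the values of x
a = y-intercept
b = slope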

Prediction
Having determined the equation of the line, it is now possible to predict the value
of y given x.

Example: To determine the best linear relationship between speed and capacity, examine
the set of sample data pairs on speed and capacity shown below.

Speed (RPM) Capacity (kg/min)


20.45 3.52
30.86 4.15
14.40 3.84
22.50 3.93
29.14 4.25
12.45 3.16
16.25 3.17
38.00 4.76
17.49 3.12
24.00 3.78

Solution/ Computation:

x y xy x²

20.45 3.52 71.984 418.2025
30.86 4.15 128.069 952.3396
14.40 3.84 55.296 207.36
22.50 3.93 88.425 506.25
29.14 4.25 123.845 849.1396
12.45 3.16 39.342 155.0025
16.25 3.17 51.5125 264.0625
38.00 4.76 180.88 1444
17.49 3.12 54.5688 305.900
24.00 3.78 90.72 576
Σ 225.54 37.68 884.642 5678.257

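$b = \frac{10(884.642) - (225.54)(37.68)}{10(5678.257) - (225.54)^2} = \frac{348.07}{5914.28} \approx 0.059$

With a mean capacity of 3.77 kg/min and a mean speed of 22.55 rpm,
a = 3.77 - 0.059(22.55)
a = 2.44
Therefore, the equation of the line is:
y = 2.44 + 0.059x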

Prediction
Determine the capacity when speed is 25 rpm. Using the equation,

y = 2.44 + 0.059 x
y = 2.44 + 0.059 (25)
y= 3.92 kg/min.
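The slope, intercept and prediction can be reproduced with the Python sketch below. It assumes the usual least-squares formulas, b = (nΣxy - ΣxΣy) / (nΣx² - (Σx)²) and a = ȳ - b·x̄; the variable names are illustrative. Since the code keeps full precision, the predicted capacity comes out near 3.91 kg/min, slightly below the 3.92 obtained with the rounded coefficients.

```python
speed = [20.45, 30.86, 14.40, 22.50, 29.14, 12.45, 16.25, 38.00, 17.49, 24.00]
capacity = [3.52, 4.15, 3.84, 3.93, 4.25, 3.16, 3.17, 4.76, 3.12, 3.78]

n = len(speed)
sum_x, sum_y = sum(speed), sum(capacity)
sum_xy = sum(x * y for x, y in zip(speed, capacity))
sum_x2 = sum(x * x for x in speed)

# Least-squares slope and intercept of y = a + b x
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

predicted = a + b * 25   # capacity predicted at a speed of 25 rpm

print(round(b, 3), round(a, 2))  # 0.059 2.44
```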
Watch:
• The (Pearson) Correlation Coefficient Explained in One Minute: From
Definition to Formula

https://www.youtube.com/watch?v=WpZi02ulCvQ
• How to Perform Simple Linear Regression by Hand

https://www.youtube.com/watch?v=GhrxgbQnEEU

Activity/ Assessment no. 14:


Find the correlation coefficient.
1. A random sample of states yielded the following numbers of local school districts
and the corresponding numbers of secondary schools. Is there a significant
relationship between the data?
School districts 53 19 24 17 95 68
Secondary schools 50 120 180 80 134 210
2. A professor wanted to see if the students’ grades in Mathematics can be predicted
using the average grade in high school. The recorded data are given below.
Mathematics Grade 75 75 75 85 76 75 75 79 75 88
4th Year HS Average Grade 82 65 75 85 80 76 75 75 83 75
Calculate the regression equation. Estimate the Math grade when the HS
average grade is 90.
