
GEMMW

Mathematics in the Modern World

MODULE 3: DATA MANAGEMENT

Learning Outcomes:

At the end of this module, the students should be able to:

1. Solve and interpret the measures of central tendency for ungrouped data.
2. Solve and interpret the range, variance, standard deviation, coefficient of variation and
skewness.
3. Apply the correlation to determine the relationship between two variables.
4. Use linear regression to predict the value of a variable given certain conditions.
5. Use a variety of statistical tools to process and manage numerical data.

MEASURES OF CENTRAL TENDENCY

*Central Tendency – a value (or values) that represents the whole set of data.

MEAN (𝒙̅)

- computational average
- the sum of all n values divided by the total frequency

• Arithmetic Mean

$\bar{x} = \dfrac{\sum x}{n}$

Where: x represents the value of an observation
n represents the total number of observations

• Weighted Mean

$w\bar{x} = \dfrac{\sum wx}{\sum w}$

Where: x represents each of the item values
w represents the weight of each item value

$w\bar{x} = \dfrac{\sum fx}{n}$

Where: f represents the frequency
n represents the sample size

• Properties of the Mean:

1. Always a unique value in any set of data.
2. Associated with the interval or ratio data.
3. Strongly influenced by the extreme values in a set of data.
4. Most reliable measure of central tendency.

MEDIAN (𝒙̃)

- Positional average
- the centermost (middlemost) observation or value when n is odd, or the
average of the two middle values when n is even, once the data are arranged
in order (either ascending or descending)
- divides the set of data into two equal parts (half of the observations belong to the
higher 50%, while the other half belong to the lower 50% of the group)

• Properties of the Median:

1. Always a unique value in any set of data.
2. Associated with ordinal data.
3. Is not affected by extreme values.
4. A positional measure.

MODE (𝒙̂)

- Nominal average
- the most frequently occurring score in a distribution
- the observation or value which appears the greatest number of times in the set of values

• Properties of the Mode:

1. Not affected by extreme values.
2. It may not exist.
3. If the mode exists, it may not always be unique.
4. In finding the mode, we do not consider all the values in the distribution.
5. Associated with nominal data.

Examples:

Find the mean, median and mode of each of the following sets of data.

1. 17 25 34 25 27 19 24

$\bar{x} = \dfrac{17+25+34+25+27+19+24}{7} = \dfrac{171}{7} \approx 24.43$

• To get the median, first arrange the data (either ascending or descending),
then take the middlemost value (if n is odd) or the average of the two middle values
(if n is even).

$\tilde{x}$ ⇒ 17, 19, 24, 25, 25, 27, 34

$\tilde{x} = 25$

$\hat{x} = 25$

2. 40 52 50 48 56 60 37 65 40 50 65

$\bar{x} = \dfrac{40(2)+52+50(2)+48+56+60+37+65(2)}{11} = \dfrac{563}{11} \approx 51.18$

$\tilde{x}$ ⇒ 37, 40, 40, 48, 50, 50, 52, 56, 60, 65, 65

$\tilde{x} = 50$

$\hat{x}$ = 40, 50 and 65

3. 87 94 36 56 54 76 87 54 87 36

$\bar{x} = \dfrac{667}{10} = 66.7$

$\tilde{x}$ ⇒ 36, 36, 54, 54, 56, 76, 87, 87, 87, 94

$\tilde{x} = \dfrac{56+76}{2} = \dfrac{132}{2} = 66$

$\hat{x} = 87$

4. 21 23 16 15 26 27 19 24

$\bar{x} = \dfrac{171}{8} = 21.375 \approx 21.38$

$\tilde{x}$ ⇒ 15, 16, 19, 21, 23, 24, 26, 27

$\tilde{x} = \dfrac{21+23}{2} = \dfrac{44}{2} = 22$

$\hat{x}$ = no mode
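
The four examples above can be checked with a short Python sketch. This is only an illustration, assuming Python 3.8 or later (for `statistics.multimode`); the helper name `central_tendency` is made up for this handout. It follows the convention used in Example 4: when every value occurs exactly once, the data set is reported as having no mode.

```python
from statistics import mean, median, multimode


def central_tendency(data):
    """Return (mean, median, modes) for a list of ungrouped numeric data."""
    modes = multimode(data)  # every value that shares the highest frequency
    # If all values occur exactly once, multimode returns every value;
    # following Example 4, report this case as "no mode" (an empty list).
    if len(modes) == len(data):
        modes = []
    return mean(data), median(data), modes


# Example 1 from the text
x_bar, x_tilde, x_hat = central_tendency([17, 25, 34, 25, 27, 19, 24])
print(round(x_bar, 2), x_tilde, x_hat)
```

An empty list of modes corresponds to "no mode", and a list with several entries (as in Example 2) means the data are multimodal.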

• Weighted Mean

1. Suppose we are interested in computing the weighted mean grade of a BS Math student in a
certain university, where he is enrolled in 6 subjects with different unit loads, as follows:

Subject   No. of units (w)   Grade (x)   wx
1         5                  2.25        11.25
2         3                  2.75        8.25
3         4                  3.00        12.00
4         3                  1.25        3.75
5         1                  2.00        2.00
6         2                  2.00        4.00
          ∑w = 18                        ∑wx = 41.25

$w\bar{x} = \dfrac{\sum wx}{\sum w} = \dfrac{41.25}{18} \approx 2.29$

2. If 8 000 Algebra books were sold at ₱320 each, 1 500 Business Mathematics books at
₱380 each, 1 000 Mathematics of Investment books at ₱300 each and 3 500 Statistics books at
₱340 each, find the weighted mean sales (price per book sold) for the four books.

Book Title                  No. of books (w)   Price (x)   wx
Algebra                     8 000              ₱320        2 560 000
Business Mathematics        1 500              ₱380        570 000
Mathematics of Investment   1 000              ₱300        300 000
Statistics                  3 500              ₱340        1 190 000
                            ∑w = 14 000                    ∑wx = 4 620 000

$w\bar{x} = \dfrac{\sum wx}{\sum w} = \dfrac{4\,620\,000}{14\,000} = 330$, that is, ₱330.00

3. Miss Z has 21 students in a specific subject. These students were asked how often
Miss Z gives assignments. Of these students, 18 answered (4) very often, 2 answered (3)
often, 1 answered (2) seldom and nobody answered (1) never.

$w\bar{x} = \dfrac{\sum wx}{\sum w} = \dfrac{18(4)+2(3)+1(2)+0(1)}{21} = \dfrac{80}{21} \approx 3.81$ (very often)
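
The weighted mean formula, ∑wx divided by ∑w, can be sketched in a few lines of Python. The function name `weighted_mean` is only illustrative; the data are taken from the three examples above.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w*x divided by the sum of the weights w."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("the weights must not sum to zero")
    return sum(w * x for w, x in zip(weights, values)) / total_weight


# Example 1: grades (x) weighted by number of units (w)
grade_mean = weighted_mean([2.25, 2.75, 3.00, 1.25, 2.00, 2.00], [5, 3, 4, 3, 1, 2])

# Example 2: book prices (x) weighted by number of books sold (w)
price_mean = weighted_mean([320, 380, 300, 340], [8000, 1500, 1000, 3500])

# Example 3: survey responses (x) weighted by number of students (w)
response_mean = weighted_mean([4, 3, 2, 1], [18, 2, 1, 0])

print(round(grade_mean, 2), round(price_mean, 2), round(response_mean, 2))
```

Note that the divisor is always the sum of the weights (units, books sold, or respondents), not the sum of the values.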
Name: Score:
Section: Date:

Activity 1
Measures of Central Tendency

Find the mean, median and mode of the following data.

a. 21 10 36 42 39 52 30 25 26

𝑥̅ = 𝑥̃ = 𝑥̂ =

b. 21 55 25 30 26 36 42 39 36 25

𝑥̅ = 𝑥̃ = 𝑥̂ =

c. 108 120 154 118 125 164 135

𝑥̅ = 𝑥̃ = 𝑥̂ =

d. 31 21 16 15 21 27 19 18

𝑥̅ = 𝑥̃ = 𝑥̂ =

e. 87 94 36 56 54 76 87 85 68 56 78 88

𝑥̅ = 𝑥̃ = 𝑥̂ =

f. A student gets the following grades in his five subjects: 87 for Calculus, 82 for
Physics, 79 for Chemistry, 81 for English and 83 for History. Compute for his mean
grade if the weights for the five subjects are 5.0, 4.0, 4.0, 3.0 and 3.0, respectively.

𝑥̅ =

g. It was recorded that 5 brands of ballpen with tag prices of ₱7.50, ₱8.00,
₱9.00, ₱10.00 and ₱12.50 were bought by 16, 5, 4, 12 and 6 students, respectively. Find the mean
sale.

𝑥̅ =

h. Jessie Salvador, an Engineering student got 88%, 85%, 91% and 93% in four of his
subjects. What grade must he get in his fifth subject in order to obtain an average of
90%? 𝑥=
i. The table below shows the number of respondents who answered 5, 4, 3, 2 and 1 on
three questions. Compute for the weighted mean and give the mean interpretation
using the scale below:

Mean Interpretation
1.00 – 1.79 To a Very Slight Extent (VSE)
1.80 – 2.59 To a Slight Extent (SE)
2.60 – 3.39 To a Moderate Extent (ME)
3.40 – 4.19 To a Great Extent (GE)
4.20 – 5.00 To a Very Great Extent (VGE)

Question                                                                        5    4    3    2    1    wx̅    Interpretation
To what extent do you think Statistics will help you in your chosen career?    15   20   5    0    0
To what extent do you think Statistics will help you in doing research?        10   25   3    2    0
To what extent do you think Statistics will help you in real-life situations?  11   16   8    5    0

MEASURES OF VARIABILITY OR DISPERSION

The measures of variability indicate the degree or extent to which numerical values
are dispersed or spread out about the average value (mean) in a distribution. The most
commonly used measures of variations are the range, variance and standard deviation.

RANGE (R)
The range, which is the simplest measure to compute, is the difference between the highest
and the lowest values in a set of numerical data. It is a poor and unstable measure of
variation, particularly for a large number of values, because it depends only on the two
extreme values. It is the least reliable measure and should be used only when a quick
measure of variation is needed.

THE VARIANCE (s²) AND THE STANDARD DEVIATION (s)

The variance is the average of the squared deviations from the distribution’s
mean (for a sample, the sum of the squared deviations is divided by n − 1). The standard
deviation, which is the positive square root of the variance, measures
the spread or dispersion of the values about the mean of the distribution. It is the most
used measure of spread because taking the square root undoes the squaring in the variance
and expresses the deviations in their original unit, and because it is closely related to
the normal distribution. It is the most important measure of dispersion since it enables us to
determine with a great deal of accuracy where the values of the distribution are located in
relation to the mean.

The variance and the standard deviation are generally accepted measures of
dispersion, especially in discussions and presentation of reports containing basic
statistics. The standard deviation is more popularly used than the variance since its value
is expressed in the unit of observations and the mean.

Take note: The higher the standard deviation, the more spread or more dispersed the data
are. The smaller the standard deviation, the less spread and less dispersed, the
more homogeneous, more consistent or more uniform the data are.

$s^2 = \dfrac{\sum (x-\bar{x})^2}{n-1}$ or $s^2 = \dfrac{n\sum x^2-(\sum x)^2}{n(n-1)}$

$s = \sqrt{\dfrac{\sum (x-\bar{x})^2}{n-1}}$ or $s = \sqrt{\dfrac{n\sum x^2-(\sum x)^2}{n(n-1)}}$

Examples:

1. Find the value of the range, variance and standard deviation of the set of data: 17, 25,
24, 18, 20

R = HV – LV = 25 – 17 = 8 (highest value minus lowest value)

$\bar{x} = \dfrac{104}{5} = 20.8$

x      x − x̅                (x − x̅)²            x²
17     17 – 20.8 = –3.8     (–3.8)² = 14.44     289
18     18 – 20.8 = –2.8     (–2.8)² = 7.84      324
20     20 – 20.8 = –0.8     (–0.8)² = 0.64      400
24     24 – 20.8 = 3.2      (3.2)² = 10.24      576
25     25 – 20.8 = 4.2      (4.2)² = 17.64      625
∑x = 104                    ∑(x − x̅)² = 50.8    ∑x² = 2 214

$s^2 = \dfrac{\sum (x-\bar{x})^2}{n-1} = \dfrac{50.8}{5-1} = \dfrac{50.8}{4} = 12.7$ or

$s^2 = \dfrac{n\sum x^2-(\sum x)^2}{n(n-1)} = \dfrac{5(2\,214)-(104)^2}{5(5-1)} = \dfrac{254}{20} = 12.7$

$s = \sqrt{12.7} \approx 3.56$
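
As a check on Example 1, the sketch below computes the sample variance with both formulas and compares the results with Python's built-in `statistics.variance`, which also divides by n − 1. The two helper function names are illustrative only.

```python
from math import sqrt
from statistics import variance


def variance_from_deviations(data):
    """Sample variance: sum of squared deviations from the mean over (n - 1)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** 2 for x in data) / (n - 1)


def variance_shortcut(data):
    """Sample variance: [n * sum(x^2) - (sum x)^2] / [n * (n - 1)]."""
    n = len(data)
    return (n * sum(x * x for x in data) - sum(data) ** 2) / (n * (n - 1))


data = [17, 25, 24, 18, 20]
data_range = max(data) - min(data)        # highest value minus lowest value
s2_a = variance_from_deviations(data)
s2_b = variance_shortcut(data)
s = sqrt(s2_b)                            # standard deviation

print(data_range, round(s2_a, 2), round(s2_b, 2), round(s, 2))
print(round(variance(data), 2))           # library result for comparison
```

Both formulas give the same variance; the second one is often more convenient by hand because it avoids computing each deviation from the mean.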

2. Suppose two applicants, A and B, for a secretarial position were given an examination to
test and compare their typing speed (assume all other factors are equal). Each was
given nine trials, and the results (in minutes) were as follows:
A: 14 16 18 20 22 24 26 28 30
B: 18 18 20 22 24 24 24 24 24

$R_A$ = 30 – 14 = 16 and $R_B$ = 24 – 18 = 6

Secretary A            Secretary B
x        x²            x        x²
14       196           18       324
16       256           18       324
18       324           20       400
20       400           22       484
22       484           24       576
24       576           24       576
26       676           24       576
28       784           24       576
30       900           24       576
∑x = 198 ∑x² = 4 596   ∑x = 198 ∑x² = 4 412

Secretary A: $s^2 = \dfrac{n\sum x^2-(\sum x)^2}{n(n-1)} = \dfrac{9(4\,596)-(198)^2}{9(9-1)} = \dfrac{2\,160}{72} = 30$, so $s = \sqrt{30} \approx 5.48$

Secretary B: $s^2 = \dfrac{n\sum x^2-(\sum x)^2}{n(n-1)} = \dfrac{9(4\,412)-(198)^2}{9(9-1)} = \dfrac{504}{72} = 7$, so $s = \sqrt{7} \approx 2.65$

• Secretary B, who has the smaller standard deviation, is more consistent than Secretary A
in terms of performance in the typing test.
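
The comparison of the two applicants can be sketched with Python's `statistics` module, whose `variance` and `stdev` functions use the sample (n − 1) formulas. The helper name `spread` and the variable names are illustrative only.

```python
from statistics import stdev, variance


def spread(data):
    """Return the range, sample variance and sample standard deviation."""
    return {
        "range": max(data) - min(data),
        "variance": variance(data),
        "stdev": stdev(data),
    }


trials_a = [14, 16, 18, 20, 22, 24, 26, 28, 30]
trials_b = [18, 18, 20, 22, 24, 24, 24, 24, 24]

summary_a = spread(trials_a)
summary_b = spread(trials_b)

# The applicant with the smaller standard deviation is the more consistent one.
more_consistent = "A" if summary_a["stdev"] < summary_b["stdev"] else "B"
print(summary_a, summary_b, more_consistent)
```

Both applicants have the same total (198) and therefore the same mean, so comparing the standard deviations directly is reasonable here.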
Name: Score:
Section: Date:

Activity 2
Measures of Variability or Dispersion

a. The monthly number of cars sold by a car dealer from January to October for a
particular year are: 20 24 12 10 18 4 15 6 11 19.

Find the range, variance and standard deviation.

b. Sample annual salaries, in thousands of pesos, for Manila and Makati are listed.
Manila: 34 25 17 17 27 25 29 33 26
Makati: 26 23 27 28 25 26 18 26 31

*Compute for the range, variance and standard deviation; and interpret the result.
*In which area is the salary more consistent?
COEFFICIENT OF VARIATION

When a measure of absolute variability is expressed relative to another measure, such as
the mean, the resulting measure is called a measure of relative dispersion. Such measures
express the amount of variation relative to the mean.

Relative dispersion may also be used to compare the variability of sets of numerical data
whose units of measurement are different. For instance, you may compare the variability of
the ages of 9 children, whose mean age is 10 years with a standard deviation of 2 years, with
the variability of their weights, whose mean is 45 pounds with a standard deviation of
5 pounds, by calculating their measures of relative dispersion. While it is not logical to
compare the values of their standard deviations, inasmuch as they are expressed in different
units of measure, it is nevertheless reasonable to determine measures that indicate the
amounts of their variations relative to their means.

COEFFICIENT OF VARIATION (CV)

- Expresses the standard deviation as a percentage of the mean.

$CV = \dfrac{s}{\bar{x}} \times 100$

Where: s = standard deviation and $\bar{x}$ = mean

Examples:

1. A dealer sells two classes of quality lamps, A and B. Lamp A has a mean life span of
2 000 hours with a standard deviation of 200 hours, while Lamp B has a mean life span of
2 500 hours with a standard deviation of 300 hours. Compare the dispersion.

Lamp A: $CV = \dfrac{s}{\bar{x}} \times 100 = \dfrac{200}{2\,000} \times 100 = 10\%$

Lamp B: $CV = \dfrac{s}{\bar{x}} \times 100 = \dfrac{300}{2\,500} \times 100 = 12\%$

Interpretation:
• Lamp B (CV = 12%) has greater relative dispersion; its life span is more variable (more
dispersed) than that of Lamp A (CV = 10%).
• Lamp A has lesser relative dispersion; its life span is more consistent (more uniform, more
homogeneous), and in that sense Lamp A is better than Lamp B.
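
The lamp comparison can be reproduced with a small Python sketch of the CV formula. The function name `coefficient_of_variation` is illustrative only, and the check for a zero mean is an added safeguard rather than something stated in the text.

```python
def coefficient_of_variation(std_dev, mean):
    """CV: the standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return std_dev * 100 / mean


cv_lamp_a = coefficient_of_variation(200, 2000)
cv_lamp_b = coefficient_of_variation(300, 2500)

# The item with the smaller CV has less relative dispersion (more consistent).
more_consistent_lamp = "A" if cv_lamp_a < cv_lamp_b else "B"
print(cv_lamp_a, cv_lamp_b, more_consistent_lamp)
```

Because the CV has no unit, the same function can compare quantities measured in different units, such as ages in years and weights in pounds.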

2. An investor is considering the purchase of one of two stocks. The yield of Company A has
averaged Php105 per share over the past ten years, with a standard deviation of Php15 per
share. Company B has yielded an average of Php333 per share during
the same period, with a standard deviation of Php40 per share. Which company is more
consistent?

Company A: $CV = \dfrac{s}{\bar{x}} \times 100 = \dfrac{15}{105} \times 100 \approx 14.29\%$

Company B: $CV = \dfrac{s}{\bar{x}} \times 100 = \dfrac{40}{333} \times 100 \approx 12.01\%$

Interpretation:
• Company B is more consistent than Company A, since it has the smaller coefficient of variation.
Name: Score:
Section:_ Date:

Activity 3
Coefficient of Variation

1. A random sample of 10 students in a Statistics class got a mean score of 78% with a
standard deviation of 7% and a mean weight of 105 pounds with a standard deviation of
10 pounds. Determine the coefficient of variation.

2. In a barangay health center with no more than a hundred patients, a distribution of two
different units is given to compare the dispersion of weights with the dispersion of
heights. The mean height is 5.7 feet with a standard deviation of 0.9 feet and the mean
weight is 72.5 kilograms with a standard deviation of 8.1 kilograms.

3. Two employees A and B are to compare their daily routine of work. A can finish his job
with an average of 1.5 hours with a standard deviation of 0.025 hour, whereas B can
finish the job with an average of 4 hours and a standard deviation of 0.01 hour. Who is
more consistent?

4. A dealer of an electronic adaptor sells two classes of adaptor, A and B. Adaptor A has a
mean life span of 2 100 hours with a standard deviation of 150 hours, while Adaptor B
has a mean life span of 2 600 hours with a standard deviation of 200 hours. Which
adaptor has the greater relative dispersion? Which is more consistent?
SKEWNESS

Besides central tendency (average) and dispersion (variation), another statistical
measure is skewness (symmetry). Skewness (sk) is the degree of symmetry, or of
departure from symmetry, of a set of data. A skewed distribution is similar in shape to a
normal distribution except that it is not symmetrical: the left half of the polygon is not a
mirror image of the right half.

$sk = \dfrac{3(\bar{x}-\tilde{x})}{s}$

Where: $\bar{x}$ = mean, $\tilde{x}$ = median and s = standard deviation

Shapes commonly observed:

1. Normal Distribution or Symmetrical


- bell–shaped curve
- the mean is equal to the median and mode
- sk = 0

2. Positively Skewed
- skewed to the right (longer right tail)
- the mean is greater than the median and mode
- sk > 0

3. Negatively Skewed
- skewed to the left (longer left tail)
- the mean is less than the median and mode
- sk < 0

Examples:

1. Determine the coefficient of skewness for each of the following:

i. $\bar{x} = 40$, $\tilde{x} = 38$, s = 4

$sk = \dfrac{3(\bar{x}-\tilde{x})}{s} = \dfrac{3(40-38)}{4} = 1.5$ (positively skewed)

ii. $\bar{x} = 320$, $\tilde{x} = 350$, s = 40

$sk = \dfrac{3(\bar{x}-\tilde{x})}{s} = \dfrac{3(320-350)}{40} = -2.25$ (negatively skewed)

iii. $\bar{x} = 70$, $\tilde{x} = 70$, s = 10

$sk = \dfrac{3(\bar{x}-\tilde{x})}{s} = \dfrac{3(70-70)}{10} = 0$ (symmetrical)

2. A physician conducted medical research on the spread of cancer using a
group of patients. The results reveal that the mean is 70 days with a standard deviation of
44 days and a median of 65 days. What is the coefficient of skewness?

$sk = \dfrac{3(\bar{x}-\tilde{x})}{s} = \dfrac{3(70-65)}{44} \approx 0.34$ (positively skewed)
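
The skewness examples above can be verified with the following Python sketch. The function names are illustrative only, and the description follows the sign rules listed earlier (sk > 0, sk < 0, sk = 0).

```python
def skewness_coefficient(mean, median, std_dev):
    """Coefficient of skewness: sk = 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("the standard deviation must be positive")
    return 3 * (mean - median) / std_dev


def describe_skewness(sk):
    """Describe the shape of the distribution from the sign of sk."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetrical"


for mean, median, s in [(40, 38, 4), (320, 350, 40), (70, 70, 10), (70, 65, 44)]:
    sk = skewness_coefficient(mean, median, s)
    print(round(sk, 2), describe_skewness(sk))
```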
Name: Score:
Section:_ Date:

Activity 4
Skewness

1. Determine the coefficient of skewness for each of the following sets of data and describe
the result.
a. 𝑥̅ = 50 𝑥̃ = 40 s = 4.5
b. 𝑥̃ = 100 𝑥̅ = 120 s = 11.5
c. 𝑥̅ = 75 𝑥̃ = 85 s = 6.2
d. 𝑥̃ = 295 𝑥̅ = 250 s = 35

2. At Saint Mary’s Academy, the mean age of the students is 19.2 years, with a standard
deviation of 1.2 years. The median age is 18.6 years. Compute the coefficient of
skewness. Describe the skewness.
CORRELATION

In everyday discourse, almost all statements about the mutual relation between
variables are accepted without question. For example, age and physical capacity, income and
educational attainment, intelligence and academic performance, cigarette smoking and lung
disease, unemployment and the condition of the economy, and so on. In almost every field,
we find that one variable is somewhat related to another variable, or that relationship exists
between variables. It should be noted, however, that relationship does not mean causality.
That is, relationship does not necessarily imply that one variable is the cause of the other
variable.

The investigation of two or more variables requires not only procedures for defining
and measuring the variables under study, but also for describing the nature of relations
between them. A procedure that may be used to determine the relationship between variables
is the correlation.

Correlation is a statistical tool to measure the association of two or more quantitative
variables. This is a measure of the degree of relationship of two sets of variables, X and Y.
The statistic used to describe the degree or magnitude of relationship between variables is
called a correlation coefficient (r), which is composed of the direction and magnitude.

The types of correlation may be classified in terms of magnitude and direction. The
degree or magnitude may be described as perfect, high, moderate or low. The direction may
be classified as positive correlation, negative correlation or zero correlation. A positive
correlation means that there is a direct relationship between variables. It exists when high
values in one variable are associated with high values in the other variable, and low values
in one variable are associated with low values in the other variable. For instance, if a student
tops test X, he is likely to lead in test Y; and if he is low in test X, he is also likely to be low
in test Y. The negative correlation, on the other hand, exists when high values in one
variable are associated with low values in the second variable, and vice–versa. For instance, a
student who gets a high score in test X is low in test Y and one who is lowest in test X is
highest in test Y. When the values in one variable are associated with neither systematically
high nor systematically low values in the other variable, then there is zero correlation.

Here is the correlation scale and the corresponding interpretation of r.

Value of r         Interpretation
±1                 Perfect Correlation
±0.80 – ±0.99      High Correlation
±0.60 – ±0.79      Moderately High Correlation
±0.40 – ±0.59      Moderate Correlation
±0.20 – ±0.39      Low Correlation
±0.01 – ±0.19      Negligible Correlation
0                  No Correlation
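
The scale can be turned into a small lookup function, sketched below in Python. The function name `interpret_r` is illustrative only. One assumption is made that the table does not state: the lower bound of each band is used as a cutoff, so a value that falls between two printed bands (for example 0.795) is placed in the lower band.

```python
def interpret_r(r):
    """Describe the magnitude and direction of a correlation coefficient."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and 1")
    magnitude = abs(r)
    if magnitude == 0:
        return "no correlation"
    if magnitude == 1:
        label = "perfect"
    elif magnitude >= 0.80:
        label = "high"
    elif magnitude >= 0.60:
        label = "moderately high"
    elif magnitude >= 0.40:
        label = "moderate"
    elif magnitude >= 0.20:
        label = "low"
    else:
        label = "negligible"
    direction = "positive" if r > 0 else "negative"
    return f"{label} {direction} correlation"


print(interpret_r(0.64))   # a value in the 0.60 to 0.79 band
print(interpret_r(-0.35))  # a negative value in the 0.20 to 0.39 band
```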

Pearson Product Moment Correlation Coefficient

The most widely used measure of correlation is the Pearson Product Moment
Correlation Coefficient or Pearson r which was developed by Karl Pearson. This statistic is
used for interval and ratio type of data. If two variables, X and Y, are under investigation, the
correlation coefficient is determined by:

$r = \dfrac{n\sum XY-(\sum X)(\sum Y)}{\sqrt{[n\sum X^2-(\sum X)^2][n\sum Y^2-(\sum Y)^2]}}$

Example:

Determine the degree of relationship between the midterm and final grade of 10
students at a certain university.

Student Midterm Grade Final Grade


A 84 85
B 88 89
C 78 86
D 79 83
E 91 88
F 84 87
G 77 81
H 83 86
I 85 82
J 86 85
Solution:

Student   Midterm Grade (X)   Final Grade (Y)   XY       X²       Y²
A         84                  85                7 140    7 056    7 225
B         88                  89                7 832    7 744    7 921
C         78                  86                6 708    6 084    7 396
D         79                  83                6 557    6 241    6 889
E         91                  88                8 008    8 281    7 744
F         84                  87                7 308    7 056    7 569
G         77                  81                6 237    5 929    6 561
H         83                  86                7 138    6 889    7 396
I         85                  82                6 970    7 225    6 724
J         86                  85                7 310    7 396    7 225

∑X = 835; ∑Y = 852; ∑XY = 71 208; ∑X² = 69 901; ∑Y² = 72 650

$r = \dfrac{n\sum XY-(\sum X)(\sum Y)}{\sqrt{[n\sum X^2-(\sum X)^2][n\sum Y^2-(\sum Y)^2]}}$

$r = \dfrac{10(71\,208)-(835)(852)}{\sqrt{[10(69\,901)-(835)^2][10(72\,650)-(852)^2]}} = \dfrac{660}{\sqrt{(1\,785)(596)}} \approx 0.64$

Interpretation: There is a moderately high positive correlation between the midterm and
final grades of the 10 students.
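
The computation of Pearson r from the five sums can be sketched in Python as follows. The function name `pearson_r` is illustrative only; the data are the midterm and final grades from the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r computed from the sums of X, Y, XY, X^2 and Y^2."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when one variable is constant")
    return numerator / denominator


midterm = [84, 88, 78, 79, 91, 84, 77, 83, 85, 86]
final = [85, 89, 86, 83, 88, 87, 81, 86, 82, 85]
r = pearson_r(midterm, final)
print(round(r, 2))
```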

Spearman Rank–Order Correlation Coefficient

The Spearman Rank–Order Correlation Coefficient or Spearman rho (ρ) is another
statistic for determining the correlation coefficient. This statistic is used to find out if there is a
significant relationship between two variables of ordinal type. In some cases, values from an
interval type of data, such as test scores and grade point average, may be transformed into
ranks. To obtain the value of Spearman rho, consider this formula:

$\rho = 1 - \dfrac{6\sum D^2}{n(n^2-1)}$

Where: D is the difference between ranks
n is the number of paired observations

Example:

Compute for the value of Spearman rho and determine the degree of relationship
between the capital and profit of 10 dried fish businessmen.

Businessman   Capital (X)   Profit (Y)   RX    RY     D       D²
1             20 000        5 000        6     7      –1      1
2             50 000        15 000       3     3.5    –0.5    0.25
3             10 000        3 000        9     9.5    –0.5    0.25
4             100 000       30 000       2     2      0       0
5             18 000        4 000        7     8      –1      1
6             25 000        9 000        5     5      0       0
7             11 000        6 000        8     6      2       4
8             150 000       70 000       1     1      0       0
9             5 000         3 000        10    9.5    0.5     0.25
10            40 000        15 000       4     3.5    0.5     0.25
                                                              ∑D² = 7

Here the highest value receives rank 1, and tied values receive the average of the ranks they
occupy (for example, the two profits of 15 000 occupy ranks 3 and 4, so each is ranked 3.5).

$\rho = 1 - \dfrac{6\sum D^2}{n(n^2-1)} = 1 - \dfrac{6(7)}{10(10^2-1)} = 1 - \dfrac{42}{990} \approx 0.96$

Interpretation: There is a high positive correlation between the capital and profit of the 10
businessmen.
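
The ranking and the rho formula can be sketched in Python as shown below. The function names are illustrative only. The sketch ranks from highest (rank 1) to lowest and gives tied values the average of the ranks they occupy, as in the table; it then applies the same simplified formula used in the text, even though ties are present.

```python
def average_ranks(values):
    """Rank from highest (rank 1) to lowest; ties share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for value in set(values):
        first_position = ordered.index(value) + 1   # 1-based position
        count = ordered.count(value)                # number of tied values
        rank_of[value] = first_position + (count - 1) / 2
    return [rank_of[value] for value in values]


def spearman_rho(xs, ys):
    """Spearman rho = 1 - 6 * sum(D^2) / (n * (n^2 - 1))."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    ranks_x, ranks_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


capital = [20000, 50000, 10000, 100000, 18000, 25000, 11000, 150000, 5000, 40000]
profit = [5000, 15000, 3000, 30000, 4000, 9000, 6000, 70000, 3000, 15000]
rho = spearman_rho(capital, profit)
print(average_ranks(profit), round(rho, 2))
```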
Name: Score:
Section:_ Date:

Activity 5
Correlation

1. The heights and weights of 10 basketball players in the PBA are randomly selected from
different teams. Calculate the value of Pearson r and interpret the result.

Player Height (X) Weight (Y) XY X2 Y2


A 68 180
B 72 200
C 76 175
D 70 190
E 74 180
F 69 195
G 70 145
H 70 172
I 73 190
J 68 160

2. Compute for the value of Spearman rho and determine the degree of relationship between
weight and height of bottle–fed infants using the same brand of milk.

Infant Weight (X) Height (Y) RX RY D D2


1 27 0.70
2 25 0.64
3 28 0.77
4 23 0.62
5 21 0.60
6 20 0.62
7 29 0.77
8 24 0.64
LINEAR REGRESSION

A linear regression is used to make predictions about a single value. Simple linear
regression involves discovering the equation for a line that most nearly fits the given data.
The linear equation is used to predict values for the data. Simple linear regression aims to
find a linear relationship between a response variable and a possible predictor variable by the
least squares method.

A regression equation is a mathematical equation that is used to predict the values of
one dependent variable from known values of one or more independent variables. The
variable being predicted or explained is called the dependent variable, while the variable that is
used to predict or explain the dependent variable is called the independent variable.

The least squares regression equation can be formed from a set of sample data
using the formula:

y = a + bx

Where: y = predicted or dependent variable
x = predictor or independent variable
a = y–intercept (value of y at the point where x = 0)

$a = \dfrac{\sum Y(\sum X^2)-\sum X(\sum XY)}{n\sum X^2-(\sum X)^2}$ or $a = \bar{y} - b\bar{x}$

Where: $\bar{y}$ is the mean of y and $\bar{x}$ is the mean of x

b = slope of the line that represents the equation

$b = \dfrac{n(\sum XY)-(\sum X)(\sum Y)}{n(\sum X^2)-(\sum X)^2}$ or $b = \dfrac{\sum XY-n\bar{x}\bar{y}}{\sum X^2-n\bar{x}^2}$

Note: The constants a and b in the regression equation are called the regression
coefficients.

Example:

The number of hours 13 students spent studying for a test and their scores on that test are
shown below. What would be the estimated score of a student who studies for 6.5 hours?

Hours spent studying, X   0    1    2    4    4    5    5    5    6    6    7    7    8
Test Score, Y             40   41   51   48   64   69   73   75   68   93   84   90   95

Solution:

From the data above: n = 13; ∑X = 60; ∑Y = 891; ∑XY = 4 620 and ∑X² = 346.

First, solve for b.

$b = \dfrac{n(\sum XY)-(\sum X)(\sum Y)}{n(\sum X^2)-(\sum X)^2} = \dfrac{13(4\,620)-(60)(891)}{13(346)-(60)^2} = \dfrac{6\,600}{898} \approx 7.35$

Then, compute for a.

$a = \dfrac{\sum Y(\sum X^2)-\sum X(\sum XY)}{n\sum X^2-(\sum X)^2} = \dfrac{891(346)-(60)(4\,620)}{13(346)-(60)^2} = \dfrac{31\,086}{898} \approx 34.62$

Therefore, the regression equation is y = 34.62 + 7.35x. For x = 6.5:

y = 34.62 + (7.35)(6.5)
y = 34.62 + 47.78
y = 82.40
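
The solution can be checked with the Python sketch below, which applies the same formulas for a and b. The function name `least_squares_line` is illustrative only.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 pairs")
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("the line is undefined when all x values are equal")
    b = (n * sum_xy - sum_x * sum_y) / denominator       # slope
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator  # y-intercept
    return a, b


hours = [0, 1, 2, 4, 4, 5, 5, 5, 6, 6, 7, 7, 8]
scores = [40, 41, 51, 48, 64, 69, 73, 75, 68, 93, 84, 90, 95]

a, b = least_squares_line(hours, scores)
predicted = a + b * 6.5   # estimated score after 6.5 hours of study
print(round(a, 2), round(b, 2), round(predicted, 2))
```

Without intermediate rounding the prediction is about 82.39. The worked solution gives 82.40 because it rounds a and b to two decimal places before substituting x = 6.5.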
Name: Score:
Section: Date:

Activity 6
Linear Regression

1. The table below shows the monthly income (X) and the monthly expenses (Y) of 7
families in a certain barangay in Makati. Estimate the monthly expenditures of a family
whose income is ₱ 8 250.

Family No.   Monthly Income (X)   Monthly Expenses (Y)   XY   X²
1            6 600                4 980
2            5 875                4 680
3            7 250                5 650
4            4 925                3 700
5            5 678                5 668
6            5 975                4 260
7            6 950                6 380
