
MABALACAT CITY COLLEGE MATH 101 | MATHEMATICS IN THE MODERN WORLD

HANDOUT – TG 4 Week No. 7 - 8

4. Managing and Understanding Data


Lesson 4.1 Preliminary to Data Management
Lesson 4.2 Correlation and Regression Analysis
Lesson 4.3 Hypothesis Testing

Sustainable Development Goals:
SDG No. 4 - Quality Education
SDG No. 5 - Gender Equality

Learning Objectives
At the end of this lesson, the student must be able to:
1. Discuss the management and interpretation of data using inferential statistics;
2. Scientifically gather, logically present, and critically present data;
3. Critically write, share findings, and draw conclusions with the use of statistical tools like correlation and linear regression; and
4. Demonstrate the competence and skills needed to meet national, international, and global standards.

Learning Materials:
Lesson discussion using PowerPoint presentation (45 - 60 minutes)
Video Presentation on Correlation Coefficient
https://www.youtube.com/watch?v=atLZNGsTN6k
Scientific Calculators, Writing Materials, and Computers/Laptops

Lesson Preview

This lesson is designed to equip you with the knowledge and skills needed for gathering, organizing, and
presenting data in tables and graphs, and for interpreting data. You will be asked to choose a relevant issue
that you want to research/study as your final output.

Concept Notes/Teacher-Led Discussion

Lesson 4.1 Preliminary to Data Management

What is Statistics?

Statistics is a field of mathematics that deals with the Collection, Organization, Analysis, and
Interpretation of quantitative data.

Collection of data is the process of gathering relevant information from the population.

Organization of data is the systematic arrangement of data into tables, graphs, or charts so that
logical and statistical conclusions can easily be derived from the collected information.

Analysis of data refers to the process of deducing relevant information from the given data so that
the numerical description can be formulated.


Interpretation of data is all about deriving conclusions from the data that have been analyzed. It also
involves making predictions and forecasts about large groups based on data gathered from small
groups.

Two Fields of Statistics


Statistics may be subdivided into two fields: the Descriptive and the Inferential fields.
1. Descriptive Statistics consists of the collection, organization, summarization, and presentation of data.
• Here, the statistician tries to describe a given situation.

2. Inferential Statistics is another area of Statistics concerned with drawing conclusions about large
groups of data, called the population, based on selected elements of that population, known as the
sample.
• Here, the statistician tries to make inferences from samples to the population. This area also
makes use of the concept of probability.

Important Terms to Remember


The following are the terms you must remember:
1. Population (N) is the set of measurements corresponding to the entire collection of units about which
information is sought. It is the group of objects about which conclusions are to be drawn, and it represents
the target of an investigation.
Example: The number of enrollees in MCC is 3600; therefore, its population
is N = 3600.

2. Sample (n) is the set of measurements that is collected in the course of an investigation. It is the subset
of objects/subjects drawn from the population.
Example: Using the same example, if N = 3600, a sample drawn from this population might be
n = 2500.

3. Variable is the particular characteristic of the object or the individual. It varies from object to object.
A variable in any study may be quantitative or qualitative in nature.
A quantitative variable has a value or numerical measurement to which arithmetic operations can be applied.
Examples: age, height, weight
A qualitative variable describes an object or individual by placing it into a category or group.
Examples: gender, nationality, color.


In correlation analysis, a variable can be either an independent variable or a dependent variable.

a. Independent Variable is the variable used as the basis of prediction; it usually
goes on the x-axis.
b. Dependent Variable (sometimes known as the responding variable) is what is being
studied and measured. The dependent variable always goes on the
y-axis.

How to Gather Data


Before discussing the different methods of gathering data, let us talk about determining the sample
size.

Determining the Sample Size


An appropriate sample size renders the research more efficient: the data generated are reliable, and
resource investment is as limited as possible while conforming to ethical standards.

Therefore, samples should be neither too small nor excessive.

There are many approaches to determining the sample size. These include:
1. using a census for small populations,
2. using the sample size of similar studies,
3. using published tables by well-established authors such as the sample size table using

4. using sample size calculator (Raosoft Online Calculator) and


5. applying other formulas.

Terms to Consider in Conducting a Study


1. Margin of error (e) is the plus/minus number usually reported in the
newspaper or on television when reporting the result of an opinion poll. It tells us by how many
percentage points (±) the results may deviate from the real population value.

2. Confidence level (in %) tells the researcher how sure s/he can be that the response of the sample
represents that of the population.


For example: A 95% confidence level with a 3 percent margin of error (e = ±3%) means that our
statistic will be within ±3 percentage points of the real population value 95% of the time.
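One way to see how the margin of error and the confidence level determine a sample size is Cochran's formula with a finite population correction. This formula is not given in this handout, so the sketch below is an assumption offered for illustration only; the z-value of 1.96 (for a 95% confidence level) and the proportion of 0.5 (the most conservative guess) are also assumptions.

```python
import math

def sample_size(population, margin_of_error, z=1.96, proportion=0.5):
    """Sample size by Cochran's formula with a finite population correction.

    z=1.96 corresponds to a 95% confidence level and proportion=0.5 is the
    most conservative guess; both defaults are assumptions of this sketch.
    """
    # Sample size for a very large population
    n0 = (z ** 2) * proportion * (1 - proportion) / margin_of_error ** 2
    # Correct for the finite population size N
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

# MCC example: N = 3600 enrollees, margin of error e = 5%
print(sample_size(3600, 0.05))  # 348
```

A smaller margin of error or a higher confidence level (a larger z) would give a larger required sample.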

To illustrate: Let us use the results of the Jobstreet.com survey, which says that Filipinos are the
happiest employees in Southeast Asia.

Jon Carlos Rodriguea, ABS CBN News: Posted on August 31, 2016 12:12PM / Updated
as of Sep 01, 2016 09:18PM

MANILA (UPDATE) Filipino employees are the happiest in Southeast Asia and their
positive attitude is likely to boost the economy, results of a Jobstreet.com survey
released August 31, 2016 showed.

The Philippines topped the seven-


saying they were happy with their jobs. Indonesia came in second at 71%, while
Malaysia scored the lowest among the seven countries in Southeast Asia, at 41%.

Let us assume that the researcher used a 5% margin of error; the results are then interpreted in the
following manner.

Country          Job Happiness Index   Interpretation Using Samples, Assuming e = ±5%
1. Philippines   73%                   68% to 78% of Filipinos are happy with their jobs.
2. Indonesia     71%                   66% to 76% of Indonesians are happy with their jobs.
3. Thailand      61%                   ?
4. Vietnam       60%                   ?
5. Hongkong      57%                   ?

Table 1: Selected Results of Job Happiness Index and its Interpretation
https://asianjournal.com/news/filipinos-happiest-employees-in-se-asia-report-says/
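The interpretation column is simply the reported percentage minus and plus the margin of error. A minimal sketch of that arithmetic follows, assuming e = ±5 percentage points as in the table; the function name `interval` is only illustrative. The same function can be used to check your own answers for the remaining rows.

```python
def interval(percent, margin=5):
    """Return the (low, high) range implied by a reported percentage and
    a plus/minus margin of error, both in percentage points."""
    return (percent - margin, percent + margin)

# The two rows already worked out in Table 1
for country, index in [("Philippines", 73), ("Indonesia", 71)]:
    low, high = interval(index)
    print(f"{country}: {low}% to {high}%")
```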

Sampling Techniques
The methods of selecting samples from a given population.


Simple Random Sampling: It is the most basic sampling technique, where samples are selected from
a population entirely by chance, and each member of the population has an equal or known chance of
being included in the sample.
Example: Lottery sampling, or the use of random numbers
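The "use of random numbers" can be sketched with Python's standard `random` module. The list of student ID numbers below is hypothetical, and the seed is fixed only so that the draw can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members so that every member has the same chance."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: ID numbers 1 to 3600
student_ids = list(range(1, 3601))
sample = simple_random_sample(student_ids, 10, seed=1)
print(sample)
```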

Stratified Random Sampling: Stratified random sampling is a sampling method that subdivides the
population into subgroups (strata) based on shared attributes or characteristics. A sample from each
stratum, proportional to its size when compared to the population, is pooled to form a random sample.
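Proportional allocation can be sketched as follows. The strata (year levels) and their sizes are hypothetical, and rounding each stratum's share may leave the total one or two away from the target in less tidy cases.

```python
import random

def stratified_sample(strata, total_n, seed=None):
    """strata maps a stratum name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Each stratum contributes in proportion to its share of the population
        k = round(total_n * len(members) / population_size)
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical strata: 3600 students split by year level
strata = {
    "first_year": list(range(0, 1800)),
    "second_year": list(range(1800, 2700)),
    "third_year": list(range(2700, 3600)),
}
picked = stratified_sample(strata, 360, seed=1)
print({name: len(members) for name, members in picked.items()})
```

With these sizes, half of the sample comes from the first-year stratum because it holds half of the population.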

Data Gathering Techniques

Two Types of Research Methods

Qualitative research seeks to give an in-depth picture of why and how people behave, or why a
phenomenon occurred by collecting data in words coming from interview, observations, focus group
discussions, open-ended questions, etc. in order to draw conclusions and make inferences.

Quantitative research describes a phenomenon or behavior by collecting numerical data, or data in
words that are translated into numbers, in order to describe, generalize, and infer.

For easier reference:


Qualitative = Quality (Attributes/Characteristics/in words)
Quantitative = Quantity (Numbers/Ranks/words translated to numerical values)

Data Gathering Techniques


Direct interview | In-depth interview
Indirect or questionnaire (paper/pencil or web-based: closed-ended) | Indirect or questionnaire (paper/pencil or web-based: closed-ended)
Registration | Document review
Experimental/clinical traits | Focus group discussion
Observation | Observation

Lesson 4.2 Correlation and Regression Analysis

Lesson Overview
In this lesson you will recognize that correlation and regression analysis can be used
in making decisions.

Correlation Analysis

Scatter Plot
A scatter plot is drawn so we can analyze whether two variables are related. If a correlation is
found, it can be either positive or negative, depending on the numerical values measured.


A scatter plot is a graph of ordered pairs (x, y) consisting of data from two data sets.
Positive correlation exists if one variable increases simultaneously with the other, i.e. the high numerical
values of one variable relate to the high numerical values of the other.

Negative correlation exists if one variable decreases when the other increases, i.e. the high numerical values
of one variable relate to the low numerical values of the other.

Example 1. Draw a scatter plot for the scores shown. Is there a relationship between the sets of scores?


[Scatter plot: Geometry Scores (y) versus Algebra Scores (x)]

We can see from the graph that there appears to be a positive correlation between the Geometry and
Algebra scores, since the graph progresses from lower left to upper right, which means that as the scores in Algebra
increased (decreased), the scores in Geometry also increased (decreased).

Example 2. Suppose the scores of the students in those subjects are as follows. Plot the scatter diagram. Is
there a relationship between the two sets of scores?

[Scatter plot: Geometry Scores (y) versus Algebra Scores (x)]


We can see from the graph that there appears to be a negative correlation between the Geometry and
Algebra scores, since the graph progresses from upper left to lower right, which means that as the scores in Algebra
increased (decreased), the scores in Geometry decreased (increased).

Correlation Coefficient (r)

Deciding whether or not two data sets are related by simply looking at a scatter plot is a pretty
subjective process, so it would be nice to have a way to quantify how strongly connected the data sets are.
The correlation coefficient is a number that describes how strong the relationship between two data
sets is. Correlation coefficients range from -1 (perfect negative correlation) to 1 (perfect positive correlation).
A correlation coefficient close to zero indicates that the data sets are most likely not linearly correlated (see
Figure 1.0).

Figure 1.0

We use the letter r to represent the correlation coefficient.

Calculating the Value of Correlation Coefficient ( r )


In order to find the value of the correlation coefficient, we will use the following formula:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
$$

where:
n = the number of data pairs
∑x = the sum of the x values
∑y = the sum of the y values
∑xy = the sum of the products of the x and y values for each pair
∑x² = the sum of the squares of the x values
∑y² = the sum of the squares of the y values

Obviously, this is a pretty complicated formula, so arranging the information in an orderly table is a big help.
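The formula can also be sketched as a short Python function that builds the same five sums a hand-made table would hold. The five data pairs used here are hypothetical and serve only to exercise the function.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r computed from the five sums in the formula."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

# Hypothetical data pairs
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 4))  # 0.7746
```

Points that fall exactly on a rising straight line, such as (1, 2), (2, 4), (3, 6), give r = 1.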


To interpret its value, refer to Table 1.0 and find which of the listed values your correlation coefficient r is closest to.

Example 3. Is there a significant relationship between the two sets of test scores in
Algebra and Geometry of ten students? Find the correlation coefficient for
the data and discuss what you think it indicates.

Solution: n = 10
Use a table to organize your data.


Substitute all the sums of the data into the formula, then compute r:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
$$

$$
r = \frac{10(2{,}045) - (137)(146)}{\sqrt{[10(1{,}933) - (137)^2][10(2{,}186) - (146)^2]}}
$$

$$
r = \frac{20{,}450 - 20{,}002}{\sqrt{[19{,}330 - 18{,}769][21{,}860 - 21{,}316]}}
$$

$$
r = \frac{448}{\sqrt{[561][544]}} = \frac{448}{\sqrt{305{,}184}} = \frac{448}{552.4346}
$$

$$
r = 0.8110 \approx 0.81 \quad \text{(rounded to the nearest hundredth)}
$$

r = 0.81
Interpretation: From Table 1.0, a coefficient of 0.81 indicates a high positive
correlation: when scores in Algebra increase (decrease), scores in
Geometry also increase (decrease).
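The arithmetic of Example 3 can be checked from the sums alone. The sketch below uses only the totals quoted in the substitution (n = 10, ∑x = 137, ∑y = 146, ∑xy = 2,045, ∑x² = 1,933, ∑y² = 2,186); the function name is illustrative.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Correlation coefficient when the column totals are already known."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator

r = r_from_sums(10, 137, 146, 2045, 1933, 2186)
print(round(r, 2))  # 0.81
```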
Example 4. Suppose the scores of the students in those two subjects happen to be as
follows:
Student               1   2   3   4   5   6   7   8   9   10
Algebra Scores (x)    9   3   4   7   6   1   2   5   10  2
Geometry Scores (y)   3   6   7   4   2   9   8   4   2   10
Solution: n = 10

x     y     xy    x²    y²
9     3     27    81    9
3     6     18    9     36
4     7     28    16    49
7     4     28    49    16
6     2     12    36    4
1     9     9     1     81
2     8     16    4     64
5     4     20    25    16
10    2     20    100   4
2     10    20    4     100

∑x = 49   ∑y = 55   ∑xy = 198   ∑x² = 325   ∑y² = 379

Substitute all the sums of the data into the formula, then compute r:

$$
r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{[n(\sum x^2) - (\sum x)^2][n(\sum y^2) - (\sum y)^2]}}
$$

$$
r = \frac{10(198) - (49)(55)}{\sqrt{[10(325) - (49)^2][10(379) - (55)^2]}}
$$

$$
r = \frac{1{,}980 - 2{,}695}{\sqrt{[3{,}250 - 2{,}401][3{,}790 - 3{,}025]}}
$$

$$
r = \frac{-715}{\sqrt{[849][765]}} = \frac{-715}{\sqrt{649{,}485}} = \frac{-715}{805.9063}
$$

$$
r = -0.8872 \approx -0.89 \quad \text{(rounded to the nearest hundredth)}
$$

r = −0.89
Interpretation: From Table 1.0, a coefficient of −0.89 indicates a high negative
correlation and a very dependable relationship: when scores in Algebra
increase (decrease), scores in Geometry decrease (increase).
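As a cross-check, the work table for Example 4 can be rebuilt directly from the ten score pairs listed in the example. This is only a sketch; recomputing from the listed scores gives ∑x = 49, ∑y = 55, ∑xy = 198, ∑x² = 325, ∑y² = 379, a numerator of −715, and r of about −0.89.

```python
import math

algebra = [9, 3, 4, 7, 6, 1, 2, 5, 10, 2]    # x
geometry = [3, 6, 7, 4, 2, 9, 8, 4, 2, 10]   # y

# One row per student: x, y, xy, x^2, y^2
rows = [(x, y, x * y, x * x, y * y) for x, y in zip(algebra, geometry)]
print("x   y   xy  x^2 y^2")
for row in rows:
    print("".join(f"{value:<4}" for value in row))

# Column totals, in the same order as the row entries
sum_x, sum_y, sum_xy, sum_x2, sum_y2 = (sum(col) for col in zip(*rows))
n = len(rows)

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator
print(sum_x, sum_y, sum_xy, sum_x2, sum_y2)  # 49 55 198 325 379
print(numerator, round(r, 2))                # -715 -0.89
```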

Regression Analysis

Once we have concluded that there is a significant relationship between the two variables, the next step
is to find the equation of the regression line through the data points.

If you look back at the scatter plot of Example 1, you can see a general trend among the points from
lower left to upper right. You could probably put a straightedge down and draw what seems like the closest
line to the points. The regression line is the line for which the sum of the squared vertical
distances from the points to the line is a minimum. For this reason, the regression line is also called the line of
best fit.

Recall from algebra that the equation of a line in slope-intercept form is y = mx + b, where m is the
slope and b is the y-intercept. In statistics, the equation of the regression line is written as y = a + bx, where a
is the y-intercept and b is the slope. This is the equation that will be used here. In order to find the values of
a and b, we need two formulas.


Formulas for Finding the Values of a and b for the Equation of the Regression Line

Slope (b)

$$
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
$$

y-intercept (a)

$$
a = \frac{\sum y - b(\sum x)}{n}
$$
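The two formulas translate directly into code. A minimal sketch follows, tested on hypothetical points that lie exactly on the line y = 1 + 2x so the expected slope and intercept are known in advance.

```python
def regression_line(xs, ys):
    """Return (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)  # slope
    a = (sum_y - b * sum_x) / n                                   # y-intercept
    return a, b

# Hypothetical points on the line y = 1 + 2x
a, b = regression_line([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

Note that the slope must be found first, because the intercept formula uses b.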

Finding a Regression Line

Example 5. Find the equation of the regression line for the data in Example 3.

Solution
We already calculated the values needed for each formula when we found the correlation coefficient in
Example 3. Substitute into the first formula to find the value of the slope.

$$
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{20{,}450 - 20{,}002}{19{,}330 - 18{,}769}
  = \frac{448}{561} = 0.7986 \approx 0.80
$$

Substitute into the second formula to find the value of a (the y-intercept) when b = 0.80:

$$
a = \frac{\sum y - b(\sum x)}{n} = \frac{146 - 0.80(137)}{10} = \frac{146 - 109.60}{10} = \frac{36.40}{10} = 3.64
$$

The equation of the regression line is y = 3.64 + 0.80x.
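Once the equation is known, it can be used to predict a Geometry score from an Algebra score. The sketch below first recomputes b and a from the Example 3 sums without rounding b (which gives an intercept of about 3.66 rather than 3.64), then uses the rounded equation y = 3.64 + 0.80x for a prediction. The Algebra score of 15 is a hypothetical input.

```python
# Sums from Example 3 (n = 10)
n, sum_x, sum_y, sum_xy, sum_x2 = 10, 137, 146, 2045, 1933

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n
print(round(b, 2), round(a, 2))  # 0.8 3.66 (b kept unrounded when finding a)

def predict_geometry(algebra_score, intercept=3.64, slope=0.80):
    """Predicted Geometry score from the rounded equation y = 3.64 + 0.80x."""
    return intercept + slope * algebra_score

print(round(predict_geometry(15), 2))  # 15.64
```

The small difference between 3.64 and 3.66 comes only from rounding b to 0.80 before substituting it into the second formula.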

Example 6. Find the equation of the regression line for the data in Example 4.

$$
b = \frac{n(\sum xy) - (\sum x)(\sum y)}{n(\sum x^2) - (\sum x)^2}
  = \frac{10(198) - (49)(55)}{10(325) - (49)^2}
  = \frac{1{,}980 - 2{,}695}{3{,}250 - 2{,}401}
  = \frac{-715}{849} = -0.842 \approx -0.84
$$

Substitute into the second formula to find the value of a (the y-intercept) when b = −0.84:

$$
a = \frac{\sum y - b(\sum x)}{n} = \frac{55 - (-0.84)(49)}{10} = \frac{55 + 41.16}{10} = \frac{96.16}{10} = 9.616 \approx 9.62
$$

The equation of the regression line is y = 9.62 + (−0.84x), or y = 9.62 − 0.84x.
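Because the slope is negative here, the sign must be carried into the intercept formula. Recomputing from the ten Example 4 score pairs gives b ≈ −0.84 and, with that rounded slope, a ≈ 9.62. The sketch also checks a general property of the least-squares line: it passes through the point (mean of x, mean of y).

```python
algebra = [9, 3, 4, 7, 6, 1, 2, 5, 10, 2]    # x
geometry = [3, 6, 7, 4, 2, 9, 8, 4, 2, 10]   # y

n = len(algebra)
sum_x = sum(algebra)
sum_y = sum(geometry)
sum_xy = sum(x * y for x, y in zip(algebra, geometry))
sum_x2 = sum(x * x for x in algebra)

b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b_rounded = round(b, 2)                  # slope rounded to two decimals
a = (sum_y - b_rounded * sum_x) / n      # subtracting a negative adds
print(b_rounded, round(a, 2))            # -0.84 9.62

# Sanity check with the unrounded slope: the line passes through the means
mean_x, mean_y = sum_x / n, sum_y / n
a_exact = (sum_y - b * sum_x) / n
print(abs((a_exact + b * mean_x) - mean_y) < 1e-9)  # True
```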

Lesson 4.3 Hypothesis Testing


Objective: To be able to construct the null and alternative hypotheses.
To differentiate between the null hypothesis and the alternative hypothesis.
To perform the step by step procedure for hypothesis testing.

A hypothesis is a speculation or theory based on insufficient evidence that lends itself to further testing
and experimentation. With further testing, a hypothesis can usually be proven true or false.

Two (2) types of hypothesis


1. A null hypothesis (Ho) is a hypothesis that says there is no statistically significant relationship between two
variables. It is usually the hypothesis a researcher or experimenter will try to disprove or discredit.
Example: There is no significant relationship between the test scores in Algebra and
Geometry.

2. An alternative hypothesis (Ha) is one that states there is a statistically significant relationship
between two variables.
Example: There is a significant relationship between the test scores in Algebra and
Geometry

Why do we need to test a hypothesis?


Hypothesis testing is an essential procedure in statistics. A hypothesis test evaluates two mutually
exclusive statements about a population to determine which statement is best supported by the sample data.
This is what we mean when we say that a finding is statistically significant.

Procedure in Testing a Hypothesis

Step 1. State the hypotheses (null and alternative).


Ho: There is no significant relationship between the scores in Algebra and Geometry.
Ha: There is a significant relationship between the scores in Algebra and Geometry.

Step 2. Calculate the value of the correlation coefficient, r.


Step 3. Set the level of significance, α = 0.05.
Step 4. Calculate the value of t computed using the formula below:

$$
t_{computed} = r\sqrt{\frac{n - 2}{1 - r^2}}
$$

where: n = sample size
r = correlation coefficient (refer to Lesson 4.2)

Step 5. Statistical decision for hypothesis testing


Calculate the degrees of freedom to find the value of t critical on the t-table of values:


df = n − 2, where n = sample size

If |t computed| < t critical, do not reject Ho.
If |t computed| > t critical, reject Ho.

Step 6. Conclusion

Example 7. Let us test the hypothesis for Example 3 in Lesson 4.2.


Is there a significant relationship between the two sets of test scores in
Algebra and Geometry of ten students? Find the correlation coefficient for the
data and discuss what you think it indicates.

For this problem, we have already computed the correlation coefficient, r = 0.81;
you will use this coefficient in testing the hypothesis.

Solution:
Step 1. State the Null and alternative hypotheses.
Ho: There is no significant relationship between the scores in Algebra and
Geometry.
Ha: There is a significant relationship between the scores in Algebra and Geometry.

Step 2. Calculate the correlation coefficient (refer to Lesson 4.2, Example 3).

r = 0.81 (this will be used in finding t computed in Step 4)

Step 3. Level of significance, α = 0.05

Step 4. Calculate the value of t computed.

t computed = 3.906 (compare this value with the t critical value from
the t-table of values at the 0.05 level of significance)

Step 5. Decision
From the t-table of values, at the 0.05 level of significance with df = n − 2 = 10 − 2 = 8, t critical = 2.306.

Since t computed = 3.906 > t critical = 2.306, reject Ho and accept Ha.

Step 6. Conclusion
We can conclude that there is a highly significant correlation between
Algebra and Geometry scores. Hence, when the scores in Algebra
increase (or decrease), the scores in Geometry also tend to increase (or
decrease).
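Steps 4 and 5 of Example 7 can be sketched in a few lines. The formula used, t = r·√((n − 2)/(1 − r²)) with n − 2 degrees of freedom, is the standard t statistic for a correlation coefficient, and it reproduces the t computed value quoted in this example (about 3.91). The critical value 2.306 is read from a t-table (two-tailed, α = 0.05, df = 8) and is passed in rather than computed; the function names are illustrative.

```python
import math

def t_computed(r, n):
    """t statistic for a correlation coefficient r from n data pairs."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

def decide(r, n, t_critical):
    """Compare |t computed| with the critical value taken from a t-table."""
    t = t_computed(r, n)
    return "reject H0" if abs(t) > t_critical else "do not reject H0"

t = t_computed(0.81, 10)
print(round(t, 2))              # 3.91
print(decide(0.81, 10, 2.306))  # reject H0
```

The absolute value matters for negative correlations: a strongly negative r gives a large negative t, which should also lead to rejecting Ho.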

Activity

1. Accomplish Worksheet 4.1 Correlation and Regression Analysis and Worksheet 4.2 Hypothesis
Testing. Due Date: May 2, 2023
2. Long Quiz (Nature of Mathematics to Statistical Tools). Due Date: May 2, 2023
3. Group Output: Project Proposal for a Quantitative Study. (Refer to the General Instructions and Scoring
Rubric). Due Date: May 30, 2023


It is not the intention of the author/s nor the publisher of this guide to have monetary gain in using the textual
information, imageries, and other references used in its production. This guide is only for the exclusive use of a bona fide
student of Mabalacat City College.

In addition, no part of this guide may be reproduced, stored in a retrieval system, or transmitted,
in any form or by any means, electronic, mechanical, photocopying, and/or otherwise, without the prior permission of
Mabalacat City College.

Compiled by: April Ann L. Galang, Clerk, IAS
Prepared by: GRACIA T. CANLAS, LPT, MAED, MATH 101 Instructor
Recommending Approval: MARILYN S. ARCILLA, RN, LPT, MAN, Dean, IAS
Approved by: MICHELLE AGUILAR-ONG, DPA, VPAA
