Lesson 5 MMW Maeb

LEARNING MODULE IN GE-4 (Mathematics in the Modern World)
AGUSAN DEL SUR STATE COLLEGE OF AGRICULTURE AND TECHNOLOGY

MAIN CAMPUS, BUNAWAN AGUSAN DEL SUR
GE 4- MATHEMATICS IN THE MODERN WORLD

COURSE LEARNING PACKET (CLP)
Prepared by:
RACHEL M. PATANGAN
Instructor III
Faculty from the College of Arts and Sciences

COLLEGE OF ARTS AND SCIENCES

BACHELOR OF SCIENCE IN APPLIED MATHEMATICS
INSTRUCTIONAL MATERIALS DEVELOPMENT ©
2020
LESSON 5.1
DATA GATHERING AND ORGANIZING DATA; REPRESENTING DATA USING GRAPHS AND
CHARTS; INTERPRETING ORGANIZED DATA
Fact sheets
You may be familiar with probability and statistics through radio, television,
newspapers, and magazines. For example, you may have read statements like the following:
1. In the Philippines, 60% of adults aged 55 to 65 years were infected by the
coronavirus disease.
2. The back-to-school student plans to spend, on average, 15,050 on electronics and
computer-related items.
3. A person needs an average of 8 hours of sleep per day.
4. The average in-state college tuition and fees for a 4-year public college is 10,500.
Students study statistics for several reasons:
1. Like professional people, you must be able to read and understand the various
statistical studies performed in your fields. To have this understanding, you must be
knowledgeable about the vocabulary, symbols, concepts, and statistical procedures
used in these studies.
2. You may be called on to conduct research in your field, since statistical procedures
are basic to research. To accomplish this, you must be able to design experiments;
collect, organize, analyze, and summarize data; and possibly make reliable
predictions or forecasts for future use. You must also be able to communicate the
results of the study in your own words.
3. You can also use the knowledge gained from studying statistics to become better
consumers and citizens. For example, you can make intelligent decisions about what
products to purchase based on consumer studies, about government spending based
on utilization studies, and so on.
These reasons can be considered the goals for studying statistics. It is the purpose of this
chapter to introduce the goals for studying statistics by answering questions such as the
following:
1. What are the branches of statistics?
2. What are data?
3. How are samples selected?
Descriptive and Inferential Statistics
To gain knowledge about seemingly haphazard situations, statisticians collect
information for variables, which describe the situation.
Data are the values (measurements or observations) that the variables can assume.
Random variables are variables whose values are determined by chance.
A collection of data values forms a data set. Each value in the data set is called a data value
or a datum.
Data can be used in different ways. The body of knowledge called statistics is
sometimes divided into two main areas, depending on how data are used. The two areas are
1. Descriptive statistics
2. Inferential statistics
In descriptive statistics the statistician tries to describe a situation. Consider the
national census conducted by the U.S. government every 10 years. Results of this census
give you the average age, income, and other characteristics of the U.S. population. To obtain
this information, the Census Bureau must have some means to collect relevant data. Once
data are collected, the bureau must organize and summarize them. Finally, the bureau needs
a means of presenting the data in some meaningful form, such as charts, graphs, or tables.
The second area of statistics is called inferential statistics.
Here, the statistician tries to make inferences from samples to populations. Inferential
statistics uses probability, i.e., the chance of an event occurring. You may be familiar with the
concepts of probability through various forms of gambling. If you play cards, dice, bingo, and
lotteries, you win or lose according to the laws of probability. Probability theory is also used in
the insurance industry and other areas.
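It is important to distinguish between a sample and a population. A population consists of
all subjects that are being studied; a sample is a group of subjects selected from a population.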
Most of the time, due to the expense, time, size of population, medical concerns, etc., it is not
possible to use the entire population for a statistical study; therefore, researchers use samples.
If the subjects of a sample are properly selected, most of the time they should
possess the same or similar characteristics as the subjects in the population. The techniques
used to properly select a sample will be explained in the next lesson.

2.2 Variables and Types of Data
Variables can be classified as qualitative or quantitative.
Qualitative variables are variables that can be placed into distinct categories, according to
some characteristic or attribute. For example, if subjects are classified according to gender
(male or female), then the variable gender is qualitative. Other examples of qualitative
variables are religious preference and geographic locations.
Quantitative variables are numerical and can be ordered or ranked. For example, the variable
age is numerical, and people can be ranked in order according to the value of their ages.
Other examples of quantitative variables are heights, weights, and body temperatures.
Quantitative variables can be further classified into two groups: discrete and continuous.
Discrete variables can be assigned values such as 0, 1, 2, 3 and are said to be countable.
Examples of discrete variables are the number of children in a family, the number of students
in a classroom, and the number of calls received by a switchboard operator each day for a
month.
Continuous variables, by comparison, can assume an infinite number of values in an interval
between any two specific values. Temperature, for example, is a continuous variable, since
the variable can assume an infinite number of values between any two given temperatures.
The classification of variables can be summarized as follows:
In addition to being classified as qualitative or quantitative, variables can be classified
by how they are categorized, counted, or measured. For example, can the data be organized
into specific categories, such as area of residence (rural, suburban, or urban)? Can the data
values be ranked, such as first place, second place, etc.? Or are the values obtained from
measurement, such as heights, IQs, or temperature? This type of classification—i.e., how
variables are categorized, counted, or measured—uses measurement scales, and four
common types of scales are used: nominal, ordinal, interval, and ratio.
The first level of measurement is called the nominal level of measurement. A sample
of college instructors classified according to subject taught (e.g., English, history, psychology,
or mathematics) is an example of nominal-level measurement. Classifying survey subjects as
male or female is another example of nominal-level measurement. No ranking or order can be
placed on the data. Classifying residents according to zip codes is also an example of the
nominal level of measurement. Even though numbers are assigned as zip codes, there is no
meaningful order or ranking. Other examples of nominal-level data are political party
(Democratic, Republican, Independent, etc.), religion (Christianity, Judaism, Islam, etc.),
and marital status (single, married, divorced, widowed, separated).
The next level of measurement is called the ordinal level. Data measured at this level
can be placed into categories, and these categories can be ordered, or ranked. For example,
from student evaluations, guest speakers might be ranked as superior, average, or poor.
Floats in a homecoming parade might be ranked as first place, second place, etc. Note that
precise measurement of differences in the ordinal level of measurement does not exist.
For instance, when people are classified according to their build (small, medium, or large), a
large variation exists among the individuals in each class.
The third level of measurement is called the interval level. This level differs from the
ordinal level in that precise differences do exist between units. For example, many standardized
psychological tests yield values measured on an interval scale. IQ is an example of such a
variable. There is a meaningful difference of 1 point between an IQ of 109 and an IQ of 110.
Temperature is another example of interval measurement, since there is a meaningful difference
of 1°F between each unit, such as 72°F and 73°F. One property is lacking in the interval scale:
There is no true zero. For example, IQ tests do not measure people who have no intelligence.
For temperature, 0°F does not mean no heat at all.
The final level of measurement is called the ratio level. Examples of ratio scales are
those used to measure height, weight, area, and number of phone calls received. Ratio scales
have differences between units (1 inch, 1 pound, etc.) and a true zero. In addition, the ratio
scale contains a true ratio between values. For example, if one person can lift 200 pounds
and another can lift 100 pounds, then the ratio between them is 2 to 1. Put another way, the
first person can lift twice as much as the second person.
Examples of Measurement Scales

When conducting a statistical study, the researcher must gather data for the particular
variable under study. For example, if a researcher wishes to study the number of people who
were bitten by poisonous snakes in a specific geographic area over the past several years, he
or she has to gather the data from various doctors, hospitals, or health departments.
To describe situations, draw conclusions, or make inferences about events, the
researcher must organize the data in some meaningful way. The most convenient method of
organizing data is to construct a frequency distribution.
After organizing the data, the researcher must present them so they can be
understood by those who will benefit from reading the study. The most useful method of
presenting the data is by constructing statistical charts and graphs. There are many different
types of charts and graphs, and each one has a specific purpose.
This lesson explains how to organize data by constructing frequency distributions and
how to present the data by constructing charts and graphs. The charts and graphs illustrated
here are histograms, frequency polygons, ogives, and pie graphs.
5.1.1. Organizing Data

Suppose a researcher wished to do a study on the ages of the top 50 wealthiest
people in the world. The researcher first would have to get the data on the ages of the people.
In this case, these ages are listed in Forbes Magazine. When the data are in original form,
they are called raw data and are listed next.
Since little information can be obtained from looking at raw data, the researcher
organizes the data into what is called a frequency distribution. A frequency distribution
consists of classes and their corresponding frequencies. Each raw data value is placed into a
quantitative or qualitative category called a class. The frequency of a class then is the number
of data values contained in a specific class. A frequency distribution is shown for the
preceding data set.

Now some general observations can be made from looking at the frequency
distribution. For example, it can be stated that the majority of the wealthy people in the study
are over 55 years old.
The classes in this distribution are 35–41, 42–48, etc. These values are called class
limits. The data values 35, 36, 37, 38, 39, 40, 41 can be tallied in the first class; 42, 43, 44,
45, 46, 47, 48 in the second class; and so on.
Two types of frequency distributions that are most often used are the categorical
frequency distribution and the grouped frequency distribution. The procedures for constructing
these distributions are shown.
5.1.2 Categorical Frequency Distributions

The categorical frequency distribution is used for data that can be placed in specific
categories, such as nominal- or ordinal-level data. For example, data such as political
affiliation, religious affiliation, or major field of study would use categorical frequency
distributions.
Example 5.1.2.1
Twenty-five army inductees were given a blood test to determine their blood type. The
data set is
Construct a frequency distribution for the data.
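A categorical frequency distribution can be tallied mechanically. The following is a minimal sketch in Python; the 25 blood types in it are hypothetical values made up for illustration, not the data set of this example, and the function name `categorical_frequency` is likewise an assumption. Each class is paired with its frequency and its percentage of the total.

```python
from collections import Counter

# Hypothetical sample of 25 blood types (illustrative only, not the module's data).
blood_types = ["A"] * 5 + ["B"] * 7 + ["O"] * 9 + ["AB"] * 4


def categorical_frequency(values):
    """Return {class: (frequency, percent)} for categorical data."""
    counts = Counter(values)
    n = len(values)
    return {cls: (f, 100 * f / n) for cls, f in counts.items()}


table = categorical_frequency(blood_types)
for cls, (f, pct) in table.items():
    # One row per class: class label, frequency, percent of total.
    print(f"{cls:>2}  {f:2d}  {pct:.0f}%")
```

The frequencies should always add up to the number of data values, which is a quick check on the tally.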

5.1.3 Grouped Frequency Distribution

When the range of the data is large, the data must be grouped into classes that are
more than one unit in width, in what is called a grouped frequency distribution. For example, a
distribution of the number of hours that boat batteries lasted is the following.
The procedure for constructing the preceding frequency distribution is given in
Example 5.1.3.1; however, several things should be noted. In this distribution, the values 24
and 30 of the first class are called class limits. The lower class limit is 24; it represents the
smallest data value that can be included in the class. The upper class limit is 30; it represents
the largest data value that can be included in the class. The numbers in the second column
are called class boundaries. These numbers are used to separate the classes so that there
are no gaps in the frequency distribution. The gaps are due to the limits; for example, there is
a gap between 30 and 31.
Students sometimes have difficulty finding class boundaries when given the class limits.
The basic rule of thumb is that the class limits should have the same decimal place value as the
data, but the class boundaries should have one additional place value and end in a 5. For
example, if the values in the data set are whole numbers, such as 24, 32, and 18, the limits for a
class might be 31–37, and the boundaries are 30.5–37.5. Find the boundaries by subtracting 0.5
from 31 (the lower class limit) and adding 0.5 to 37 (the upper class limit).
Lower limit - 0.5 = 31 - 0.5 = 30.5 = lower boundary
Upper limit + 0.5 = 37 + 0.5 = 37.5 = upper boundary
If the data are in tenths, such as 6.2, 7.8, and 12.6, the limits for a class hypothetically
might be 7.8–8.8, and the boundaries for that class would be 7.75–8.85. Find these values by
subtracting 0.05 from 7.8 and adding 0.05 to 8.8.
Finally, the class width for a class in a frequency distribution is found by subtracting
the lower (or upper) class limit of one class from the lower (or upper) class limit of the next
class. For example, the class width in the preceding distribution on the duration of boat
batteries is 7, found from 31 - 24 = 7.
The class width can also be found by subtracting the lower boundary from the upper
boundary for any given class. In this case, 30.5 - 23.5 = 7.
Note: Do not subtract the limits of a single class. It will result in an incorrect answer.
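The boundary and width rules above can be checked with a short sketch. The helper names `class_boundaries` and `class_width` and the `unit` parameter (the smallest step in the data, 1 for whole numbers and 0.1 for tenths) are assumptions introduced only for this illustration.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Boundaries lie half a unit below the lower limit and above the upper limit."""
    half = unit / 2
    return lower_limit - half, upper_limit + half


def class_width(lower_limit, next_lower_limit):
    """Width = lower limit of the next class minus lower limit of this class."""
    return next_lower_limit - lower_limit


# Whole-number data: limits 31-37 give boundaries 30.5-37.5.
print(class_boundaries(31, 37))

# Boat-battery distribution: first class starts at 24, the next at 31.
print(class_width(24, 31))

# Data in tenths: limits 7.8-8.8 give boundaries 7.75-8.85 (rounded for display).
low, high = class_boundaries(7.8, 8.8, unit=0.1)
print(round(low, 2), round(high, 2))
```

Subtracting the limits of a single class (30 - 24 = 6) would understate the width by one unit, which is why the width is taken between consecutive lower limits or between the boundaries.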
The researcher must decide how many classes to use and the width of each class. To
construct a frequency distribution, follow these rules:

Example 5.1.3.1
These data represent the record high temperatures in degrees Fahrenheit (°F) for each of the
50 states. Construct a grouped frequency distribution for the data using 7 classes.

Cumulative frequencies are used to show how many data values are accumulated up
to and including a specific class. In Example 5.1.3.1, 28 of the total record high temperatures
are less than or equal to 114°F. Forty-eight of the total record high temperatures are less than
or equal to 124°F.
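Cumulative frequencies are running totals of the class frequencies. The sketch below assumes seven classes of width 5 and a set of class frequencies that are illustrative only; they were chosen to agree with the two cumulative counts quoted above (28 values up to 114 and 48 values up to 124) and are not taken from the module's table.

```python
from itertools import accumulate

# Assumed class limits and frequencies for 50 values in 7 classes (illustrative).
classes = [(100, 104), (105, 109), (110, 114), (115, 119),
           (120, 124), (125, 129), (130, 134)]
frequencies = [2, 8, 18, 13, 7, 1, 1]

# Running total: each entry counts the values up to that class's upper limit.
cumulative = list(accumulate(frequencies))

for (low, high), f, cf in zip(classes, frequencies, cumulative):
    print(f"{low}-{high}: frequency {f:2d}, values <= {high}: {cf}")
```

The last cumulative frequency always equals the total number of data values, here 50.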
After the raw data have been organized into a frequency distribution, the distribution is
analyzed by looking for peaks and extreme values. The peaks show which class or classes
have the most data values compared to the other classes. Extreme values, called outliers,
are data values that are very large or very small relative to the other data values.
When the range of the data values is relatively small, a frequency distribution can be
constructed using single data values for each class. This type of distribution is called an
ungrouped frequency distribution and is shown next.
Example 5.1.3.2
The data shown here represent the number of miles per gallon (mpg) that 30 selected four-
wheel-drive sports utility vehicles obtained in city driving. Construct a frequency distribution,
and analyze the distribution.

The steps for constructing a grouped frequency distribution are summarized in the
following Procedure Table.
5.1.4 Graphs and Charts
After you have organized the data into a frequency distribution, you can present them
in graphical form. The purpose of graphs in statistics is to convey the data to the viewers in
pictorial form. It is easier for most people to comprehend the meaning of data presented
graphically than data presented numerically in tables or frequency distributions. This is
especially true if the users have little or no statistical knowledge.
Statistical graphs can be used to describe the data set or to analyze it. Graphs are
also useful in getting the audience’s attention in a publication or a speaking presentation.
They can be used to discuss an issue, reinforce a critical point, or summarize a data set. They
can also be used to discover a trend or pattern in a situation over a period of time.
The three most commonly used graphs in research are
1. The histogram.
2. The frequency polygon.
3. The cumulative frequency graph, or ogive (pronounced o-jive).

Example 5.1.4.1
Construct a histogram to represent the data shown for the record high temperatures for
each of the 50 states (see Example 5.1.3.1).
Example 5.1.4.2
Using the frequency distribution given in Example 5.1.4.1, construct a frequency polygon.

The third type of graph that can be used represents the cumulative frequencies for
the classes. This type of graph is called the cumulative frequency graph, or ogive. The
cumulative frequency is the sum of the frequencies accumulated up to the upper boundary of
a class in the distribution.

Example 5.1.4.3
Construct an ogive for the frequency distribution described in Example 5.1.4.1.

The steps for drawing these three types of graphs are shown in the following Procedure Table.
LESSON 5.2
Measures of central tendency: mean, median, mode, weighted mean
Lesson 5.1 stated that statisticians use samples taken from populations; however, when
populations are small, it is not necessary to use samples since the entire population can be
used to gain information. For example, suppose an insurance manager wanted to know the
average weekly sales of all the company’s representatives. If the company employed a large
number of salespeople, say, nationwide, he would have to use a sample and make an
inference to the entire sales force. But if the company had only a few salespeople, say, only
87 agents, he would be able to use all representatives’ sales for a randomly chosen week and
thus use the entire population.
Measures found by using all the data values in the population are called parameters.
Measures obtained by using the data values from samples are called statistics; hence, the
average of the sales from a sample of representatives is a statistic, and the average of sales
obtained from the entire population is a parameter.
The Mean
Example 5.2.1
The data represent the number of days off per year for a sample of individuals selected from
nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
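The mean is the sum of the data values divided by the number of values. As a minimal sketch in Python (the variable names are illustrative), the computation for this data set is:

```python
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

total = sum(days_off)          # sum of the values
n = len(days_off)              # number of values
mean = total / n

print(total, n, round(mean, 1))
```

The sum is 276 and there are 9 values, so the mean rounds to 30.7.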

Hence, the mean of the number of days off is 30.7 days.
The Median
Example 5.2.2
The number of rooms in the seven hotels in downtown Pittsburgh is 713, 300, 618, 595, 311,
401, and 292. Find the median.
Hence, the median is 401 rooms.
Example 5.2.3
The number of tornadoes that have occurred in the United States over an 8-year period
follows. Find the median. 684, 764, 656, 702, 856, 1133, 1132, 1303
The median number of tornadoes is 810.
In Example 5.2.2 the data set had an odd number of values; hence, the median was
an actual data value. When there is an even number of values in the data set, the median
falls between two given values, as illustrated in Example 5.2.3.
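Both cases can be sketched in a few lines of Python. The function below is an illustration written for this lesson, not a prescribed procedure: it sorts the data, takes the middle value when the count is odd, and averages the two middle values when the count is even.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


rooms = [713, 300, 618, 595, 311, 401, 292]                  # Example 5.2.2
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]      # Example 5.2.3

print(median(rooms))       # odd count: an actual data value
print(median(tornadoes))   # even count: midway between 764 and 856
```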
The Mode
A data set that has only one value that occurs with the greatest frequency is said to be
unimodal. If a data set has two values that occur with the same greatest frequency, both
values are considered to be the mode and the data set is said to be bimodal. If a data set has
more than two values that occur with the same greatest frequency, each value is used as the
mode, and the data set is said to be multimodal. When no data value occurs more than once,
the data set is said to have no mode. A data set can have more than one mode or no mode at
all.

Example 5.2.4
Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in
millions of dollars are 18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
Example 5.2.5
Find the mode for the number of coal employees per county for 10 selected counties in
Southwestern Pennsylvania.
110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752
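The rules for unimodal, bimodal, multimodal, and no-mode data sets can be sketched as follows. The helper `modes` is an assumption made for illustration; it returns every value tied for the greatest frequency and an empty list when no value occurs more than once.

```python
from collections import Counter


def modes(values):
    """Return the list of modes; an empty list means the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # no value repeats, so there is no mode
    return [value for value, count in counts.items() if count == top]


bonuses = [18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]               # Example 5.2.4
coal = [110, 731, 1031, 84, 20, 118, 1162, 1977, 103, 752]         # Example 5.2.5

print(modes(bonuses))   # 10 occurs three times
print(modes(coal))      # every value occurs once
```

For Example 5.2.4 the mode is 10 million dollars; for Example 5.2.5 there is no mode.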
The Midrange
Example 5.2.6
In the last two winter seasons, the city of Brownsville, Minnesota, reported these
numbers of water-line breaks per month. Find the midrange.
2, 3, 6, 8, 4, 1
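The midrange is the sum of the lowest and highest values divided by 2. A minimal sketch for this data set (variable names are illustrative):

```python
breaks = [2, 3, 6, 8, 4, 1]

lowest = min(breaks)
highest = max(breaks)
midrange = (lowest + highest) / 2

print(lowest, highest, midrange)
```

With a lowest value of 1 and a highest value of 8, the midrange is 4.5.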
Summary of Measures of Central Tendency

LESSON 5.3
Measures of dispersion: range, standard deviation and variance
Measures of Dispersion

Example 5.3.1
Find the variance and standard deviation for 35, 45, 30, 35, 40, 25.
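The sketch below works through this example step by step. It assumes the six values are treated as a population, so the sum of squared deviations is divided by N rather than N - 1; that assumption is made here because the example comes before the discussion of sample variance. The range is included for comparison.

```python
from math import sqrt

data = [35, 45, 30, 35, 40, 25]

n = len(data)
data_range = max(data) - min(data)               # highest value minus lowest value
mu = sum(data) / n                               # population mean
squared_deviations = [(x - mu) ** 2 for x in data]
variance = sum(squared_deviations) / n           # population variance (divide by N)
std_dev = sqrt(variance)                         # population standard deviation

print(data_range, mu, sum(squared_deviations))
print(round(variance, 1), round(std_dev, 1))
```

The mean is 35, the squared deviations sum to 250, and so the variance is 250/6, about 41.7, with a standard deviation of about 6.5.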
Sample Variance and Standard Deviation

Example 5.3.2
Find the sample variance and standard deviation for the amount of European
auto sales for a sample of 6 years shown. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3
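For a sample, the sum of squared deviations is divided by n - 1 instead of n. A minimal sketch using Python's standard `statistics` module, whose `variance` and `stdev` functions use the n - 1 divisor:

```python
import statistics

sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]   # millions of dollars

sample_mean = statistics.mean(sales)
sample_variance = statistics.variance(sales)   # divides by n - 1
sample_std_dev = statistics.stdev(sales)       # square root of the sample variance

print(round(sample_mean, 1), round(sample_variance, 2), round(sample_std_dev, 2))
```

The sample mean is 12.6, the sample variance is about 1.28, and the sample standard deviation is about 1.13.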
LESSON 5.4
Measures of relative position: z-score, percentiles, quartiles and box and whiskers plots
Measures of Position
In addition to measures of central tendency and measures of variation, there are
measures of position or location. These measures include standard scores, percentiles,
deciles, and quartiles. They are used to locate the relative position of a data value in the data
set. For example, if a value is located at the 80th percentile, it means that 80% of the values
fall below it in the distribution and 20% of the values fall above it. The median is the value that
corresponds to the 50th percentile, since one-half of the values fall below it and one-half of
the values fall above it. This section discusses these measures of position.
Standard Scores
There is an old saying, “You can’t compare apples and oranges.” But with the use of
statistics, it can be done to some extent. Suppose that a student scored 90 on a music test
and 45 on an English exam. Direct comparison of raw scores is impossible, since the exams
might not be equivalent in terms of number of questions, value of each question, and so on.
However, a comparison of a relative standard similar to both can be made. This comparison
uses the mean and standard deviation and is called a standard score or z score. (We also use
z scores in later chapters.)
A standard score or z score tells how many standard deviations a data value is above or below the
mean for a specific distribution of values. If a standard score is zero, then the data value is the
same as the mean.
For the purpose of this section, it will be assumed that when we find z scores, the data were
obtained from samples.
Example 5.4.1
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10;
she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her
relative positions on the two tests.
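A z score is the data value minus the mean, divided by the standard deviation. The sketch below applies that definition to both tests; the function name `z_score` is an illustrative choice.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev


z_calculus = z_score(65, mean=50, std_dev=10)
z_history = z_score(30, mean=25, std_dev=5)

print(z_calculus, z_history)
```

The calculus score is 1.5 standard deviations above the mean and the history score is 1.0 standard deviation above the mean, so her relative position is higher in calculus.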

Percentiles
Percentiles are position measures used in educational and health-related fields to indicate
the position of an individual in a group.
Example 5.4.2
A teacher gives a 20-point test to 10 students. The scores are shown here. Find the
percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Example 5.4.3
Using the data in Example 5.4.2, find the percentile rank for a score of 6.
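Examples 5.4.2 and 5.4.3 can be sketched as follows. The formula used here is an assumption, since the module's own formula is not reproduced in this section: it is the common textbook definition, percentile = (number of values below X + 0.5) / (total number of values) × 100.

```python
def percentile_rank(values, x):
    """Percentile rank of x, assuming (number below + 0.5) / n * 100."""
    below = sum(1 for v in values if v < x)
    return 100 * (below + 0.5) / len(values)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

rank_12 = percentile_rank(scores, 12)   # six scores are below 12
rank_6 = percentile_rank(scores, 6)     # three scores are below 6

print(rank_12, rank_6)
```

Under that definition, a score of 12 is at the 65th percentile and a score of 6 is at the 35th percentile.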
Example 5.4.4
Using the scores in Example 5.4.2, find the value corresponding to the 25th percentile.

Example 5.4.5
Using the scores in Example 5.4.2, find the value corresponding to the 60th percentile.
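Examples 5.4.4 and 5.4.5 go in the opposite direction, from a percentile to a data value. The sketch below assumes a common textbook procedure, which is not spelled out in this section: sort the data, compute c = n·p/100, and if c is not a whole number round it up and take the value in that position; if c is a whole number, average the values in positions c and c + 1. It is written for percentiles strictly between 0 and 100.

```python
import math


def value_at_percentile(values, p):
    """Data value at the p-th percentile (0 < p < 100), using position c = n * p / 100."""
    ordered = sorted(values)
    n = len(ordered)
    c = n * p / 100
    if c == int(c):
        k = int(c)
        # Whole-number position: average the c-th and (c + 1)-th values.
        return (ordered[k - 1] + ordered[k]) / 2
    # Otherwise round up to the next whole position.
    return ordered[math.ceil(c) - 1]


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]

p25 = value_at_percentile(scores, 25)   # c = 2.5, so use the 3rd value
p60 = value_at_percentile(scores, 60)   # c = 6, so average the 6th and 7th values

print(p25, p60)
```

With the sorted scores 2, 3, 5, 6, 8, 10, 12, 15, 18, 20, this gives 5 for the 25th percentile and 11 for the 60th percentile.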

Quartiles and Deciles
Quartiles divide the distribution into four groups, separated by Q1, Q2, Q3.
Note that Q1 is the same as the 25th percentile; Q2 is the same as the 50th percentile, or the median; Q3
corresponds to the 75th percentile, as shown:
Example 5.4.6
Find Q1, Q2, and Q3 for the data set 15, 13, 6, 5, 12, 50, 22, 18.
In addition to dividing the data set into four groups, quartiles can be used as a rough measurement of
variability. The interquartile range (IQR) is defined as the difference between Q3 and Q1 (IQR = Q3 - Q1)
and is the range of the middle 50% of the data.
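Example 5.4.6 and the interquartile range can be sketched together. The approach assumed here is the median-of-halves method: Q2 is the median of the sorted data, Q1 is the median of the values below Q2, and Q3 is the median of the values above Q2. Other quartile conventions exist and can give slightly different answers.

```python
def median(values):
    """Middle value of the sorted data (average of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]               # values below the median position
    upper = ordered[half + n % 2:]       # for odd n, the middle value is skipped
    return median(lower), median(ordered), median(upper)


data = [15, 13, 6, 5, 12, 50, 22, 18]
q1, q2, q3 = quartiles(data)
iqr = q3 - q1

print(q1, q2, q3, iqr)
```

For this data set Q1 = 9, Q2 = 14, Q3 = 20, and the IQR is 11.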
Deciles divide the distribution into 10 groups, as shown. They are denoted by D1, D2, etc.

Summary of Position Measures
LESSON 5.5
Probabilities and normal distributions
What is Normal?
Medical researchers have determined so-called normal intervals for a person’s blood
pressure, cholesterol, triglycerides, and the like. For example, the normal range of systolic
blood pressure is 110 to 140. The normal interval for a person’s triglycerides is from 30 to 200
milligrams per deciliter (mg/dl). By measuring these variables, a physician can determine if a
patient’s vital statistics are within the normal interval or if some type of treatment is needed to
correct a condition and avoid future illnesses. The question then is, how does one determine
the so-called normal intervals?
In this chapter, you will learn how researchers determine normal intervals for specific
medical tests by using a normal distribution. You will see how the same methods are used to
determine the lifetimes of batteries, the strength of ropes, and many other traits.
The normal distribution curve can be used to study many variables that are not
perfectly normally distributed but are nevertheless approximately normal.
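The mathematical equation for a normal distribution is
$$y = \frac{e^{-(X-\mu)^2/(2\sigma^2)}}{\sigma\sqrt{2\pi}}$$
where $e \approx 2.718$, $\pi \approx 3.14$, $\mu$ is the population mean, and $\sigma$ is the
population standard deviation.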
This equation may look formidable, but in applied statistics, tables or technology is used for
specific problems instead of the equation. Another important consideration in applied statistics
is that the area under a normal distribution curve is used more often than the values on the y
axis. Therefore, when a normal distribution is pictured, the y axis is sometimes omitted.

The shape and position of a normal distribution curve depend on two parameters, the
mean and the standard deviation. Each normally distributed variable has its own normal
distribution curve, which depends on the values of the variable’s mean and standard
deviation. Figure (a) shows two normal distributions with the same mean values but different
standard deviations. The larger the standard deviation, the more dispersed, or spread out, the
distribution is. Figure (b) shows two normal distributions with the same standard deviation but
with different means. These curves have the same shapes but are located at different
positions on the x axis. Figure (c) shows two normal distributions with different means and
different standard deviations.

Areas Under a Normal Distribution Curve
The Standard Normal Distribution
Since each normally distributed variable has its own mean and standard deviation, as
stated earlier, the shape and location of these curves will vary. In practical applications, then,
you would have to have a table of areas under the curve for each variable. To simplify this
situation, statisticians use what is called the standard normal distribution, which is a normal
distribution with a mean of 0 and a standard deviation of 1.
Finding Areas Under the Standard Normal Distribution Curve

Finding the Area Value using Standard Normal Table
Table E for Standard Normal Distribution gives the area under the normal distribution curve to
the left of any z value given in two decimal places. For example, the area to the left of a z
value of 1.39 is found by looking up 1.3 in the left column and 0.09 in the top row. Where the
two lines meet gives an area of 0.9177. See the figure below.

Example 5.5.1
Find the area to the left of z = 1.99.

Example 5.5.2
Find the area to the right of z = -1.16.
Example 5.5.3
Find the area between z = 1.68 and z = -1.37.
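The three kinds of area in Examples 5.5.1 to 5.5.3 can be checked with technology instead of the table. The sketch below uses `NormalDist` from Python's standard `statistics` module (available in Python 3.8 and later); its `cdf` method returns the area to the left of a value, which plays the role of the table lookup. Results are rounded to four decimals to match the table's precision.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

# Area to the left of z: read directly from the cumulative distribution.
area_left = standard_normal.cdf(1.99)

# Area to the right of z: 1 minus the area to the left.
area_right = 1 - standard_normal.cdf(-1.16)

# Area between two z values: larger left-area minus smaller left-area.
area_between = standard_normal.cdf(1.68) - standard_normal.cdf(-1.37)

print(round(area_left, 4), round(area_right, 4), round(area_between, 4))
```

These come out to about 0.9767, 0.8770, and 0.8682 respectively. The same call with z = 1.39 gives about 0.9177, the table value quoted earlier.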
Application of the Normal Distribution
The standard normal distribution curve can be used to solve a wide variety of practical
problems. The only requirement is that the variable be normally or approximately normally
distributed. There are several mathematical tests to determine whether a variable is normally
distributed. For all the problems presented in this chapter, you can assume that the variable is
normally or approximately normally distributed.
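To solve problems by using the standard normal distribution, transform the original
variable to a standard normal distribution variable by using the formula
$$z = \frac{X - \mu}{\sigma}$$
where $X$ is the value of the variable, $\mu$ is the mean, and $\sigma$ is the standard deviation.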
Example 5.5.4
A survey by the National Retail Federation found that women spend on average $146.21
for the Christmas holidays. Assume the standard deviation is $29.44. Find the percentage
of women who spend less than $160.00. Assume the variable is normally distributed.
Example 5.5.5
Each month, an American household generates an average of 28 pounds of newspaper for
garbage or recycling. Assume the standard deviation is 2 pounds. If a household is selected
at random, find the probability of its generating
a. Between 27 and 31 pounds per month
b. More than 30.2 pounds per month
Assume the variable is approximately normally distributed.
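Examples 5.5.4 and 5.5.5 can be sketched with the same standard-library tool. Passing the mean and standard deviation to `NormalDist` performs the z transformation internally, so `cdf(x)` gives the area to the left of x directly. Because no rounding of z to two decimals takes place, the results may differ from a table lookup in the third decimal place.

```python
from statistics import NormalDist

# Example 5.5.4: holiday spending, mean $146.21, standard deviation $29.44.
spending = NormalDist(mu=146.21, sigma=29.44)
z_160 = (160.00 - 146.21) / 29.44          # the z value a table lookup would use
p_less_than_160 = spending.cdf(160.00)     # area to the left of $160.00

# Example 5.5.5: newspaper weight, mean 28 pounds, standard deviation 2 pounds.
newspaper = NormalDist(mu=28, sigma=2)
p_between_27_and_31 = newspaper.cdf(31) - newspaper.cdf(27)   # part a
p_more_than_30_2 = 1 - newspaper.cdf(30.2)                    # part b

print(round(z_160, 2), round(p_less_than_160, 2))
print(round(p_between_27_and_31, 4), round(p_more_than_30_2, 4))
```

About 68% of women spend less than $160.00. For the newspaper example, the probability is about 0.6247 for part a and about 0.1357 for part b.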

LESSON 5.6
Linear regression and correlation, least squares line, linear correlation coefficient
REGRESSION AND CORRELATION
Correlation is a statistical method used to determine whether a relationship between
variables exists.
Regression is a statistical method used to describe the nature of the relationship between
variables, that is, positive or negative, linear or nonlinear.
The purpose of this chapter is to answer these questions statistically:
1. Are two or more variables related?
2. If so, what is the strength of the relationship?
3. What type of relationship exists?
4. What kind of predictions can be made from the relationship?
To answer the first two questions, statisticians use a numerical measure to determine
whether two or more variables are related and to determine the strength of the relationship
between or among the variables. This measure is called a correlation coefficient. For
example, there are many variables that contribute to heart disease, among them lack of
exercise, smoking, heredity, age, stress, and diet. Of these variables, some are more
important than others; therefore, a physician who wants to help a patient must know which
factors are most important.
To answer the third question, you must ascertain what type of relationship exists.
There are two types of relationships: simple and multiple. In a simple relationship, there
are two variables—an independent variable, also called an explanatory variable or a
predictor variable, and a dependent variable, also called a response variable. A simple
relationship analysis is called simple regression, and there is one independent variable that
is used to predict the dependent variable. For example, a manager may wish to see whether
the number of years the salespeople have been working for the company has anything to
do with the amount of sales they make. This type of study involves a simple relationship,
since there are only two variables: years of experience and amount of sales.
Simple relationships can also be positive or negative. A positive relationship exists
when both variables increase or decrease at the same time. In a negative relationship, as
one variable increases, the other variable decreases, and vice versa.
Finally, the fourth question asks what type of predictions can be made. Predictions are
made daily in all areas. Examples include weather forecasting, stock market analyses,
sales predictions, crop predictions, gasoline price predictions, and sports predictions. Some
predictions are more accurate than others, due to the strength of the relationship. That is,
the stronger the relationship is between variables, the more accurate the prediction is.
The scatter plot is a visual way to describe the nature of the relationship between the
independent and dependent variables.

Example 5.6.1

Example 5.6.2:

REGRESSION
In studying relationships between two variables, collect the data and then construct a
scatter plot. The purpose of the scatter plot, as indicated previously, is to determine the nature
of the relationship. The possibilities include a positive linear relationship, a negative linear
relationship, a curvilinear relationship, or no discernible relationship. After the scatter plot is
drawn, the next steps are to compute the value of the correlation coefficient and to test the
significance of the relationship. If the value of the correlation coefficient is significant, the next
step is to determine the equation of the regression line, which is the data’s line of best fit.
Note: Determining the regression line when r is not significant and then making predictions
using the regression line is meaningless. The purpose of the regression line is to enable the
researcher to see the trend and make predictions on the basis of the data.
Line of Best Fit
Figure 5.6.1 shows a scatter plot for the data of two variables. It shows that several
lines can be drawn on the graph near the points.
Given a scatter plot, you must be able to draw the line of best fit. Best fit means that
the sum of the squares of the vertical distances from each point to the line is at a minimum.
The reason you need a line of best fit is that the values of y will be predicted from the values
of x; hence, the closer the points are to the line, the better the fit and the prediction will be.
See Figure 5.6.2. When r is positive, the line slopes upward from left to right. When r is
negative, the line slopes downward from left to right.
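The "best fit" criterion can be checked numerically. The following is a minimal sketch, assuming hypothetical data and a few arbitrarily chosen candidate lines: it computes the sum of the squared vertical distances for each line and compares them with the least-squares line. The helper names are illustrative only.

```python
def sum_squared_vertical_distances(xs, ys, a, b):
    """Sum of squared vertical distances from each point to the line y' = a + bx."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

def least_squares_line(xs, ys):
    """Intercept a and slope b of the least-squares line (mean-deviation form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

# Hypothetical data (not from the module).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares_line(xs, ys)
best = sum_squared_vertical_distances(xs, ys, a, b)

# Other lines drawn "near the points": none can have a smaller sum of squares.
candidates = [(0.0, 2.0), (0.5, 1.8), (-0.5, 2.2)]
for ca, cb in candidates:
    other = sum_squared_vertical_distances(xs, ys, ca, cb)
    print(f"y' = {ca} + {cb}x: {other:.3f} (least-squares line: {best:.3f})")
```

Many lines pass near the points, but only the least-squares line makes this sum as small as possible.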

Figure 5.6.1 Figure 5.6.2
Determination of the Regression Line Equation
In algebra, the equation of a line is usually given as y = mx + b, where m is the slope
of the line and b is the y intercept. In statistics, the equation of the regression line is written as
y' = a + bx, where a is the y' intercept and b is the slope of the line. See Figure 5.6.3. There
are several methods for finding the equation of the regression line. Two formulas are given
here. These formulas use the same values that are used in computing the value of the
correlation coefficient. The mathematical development of these formulas is beyond the scope
of this book.
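The two formulas themselves did not survive in this copy of the module. As a hedged sketch, assuming the standard computational formulas for the intercept a and the slope b (which use the same sums as the correlation coefficient), the regression line y' = a + bx could be found as follows. The function name and the data are hypothetical.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for y' = a + bx using the raw-sum formulas (assumed here):
       a = [(sum y)(sum x^2) - (sum x)(sum xy)] / [n(sum x^2) - (sum x)^2]
       b = [n(sum xy) - (sum x)(sum y)]         / [n(sum x^2) - (sum x)^2]
    """
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    denominator = n * sum_x2 - sum_x ** 2
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b

# Hypothetical data (not from the module): years of experience and sales.
# The points lie exactly on y' = 1 + 2x, so the result is easy to check by hand.
years = [1, 2, 3, 4, 5]
sales = [3, 5, 7, 9, 11]

a, b = regression_coefficients(years, sales)
predicted = a + b * 6  # prediction for x = 6
print(a, b, predicted)
```

Once a and b are known, a prediction is made by substituting a value of x into y' = a + bx.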
Figure 5.6.3

Example 5.6.3:
Find the equation of the regression line for the data in Car Rental Companies, as shown
below, and graph the line on the scatter plot of the data.
Then plot the two points (15, 1.986) and (40, 4.639) and draw a line connecting the two points.
See Figure 5.6.4.
Figure 5.6.4

Example 5.6.4:
START
ACTIVITY 5.1.1: Let’s Diagnose Your Knowledge
The following items on gathering and organizing data assess your understanding of
representing data using graphs and charts and of interpreting the data.
Read each item carefully and determine whether each statement is true or false.
1. In the construction of a frequency distribution, it is a good idea to have overlapping
class limits, such as 10–20, 20–30, 30–40.
2. Histograms can be drawn by using vertical or horizontal bars.
3. It is not important to keep the width of each class the same in a frequency distribution.
4. Frequency distributions can aid the researcher in drawing charts and graphs.
5. The type of graph used to represent data is determined by the type of data collected
and by the researcher’s purpose.
6. In the construction of a frequency polygon, the class limits are used for the x axis.
7. Data collected over a period of time can be graphed by using a histogram.
8. Another name for the ogive is the frequency polygon.
9. In a frequency distribution table, it is customary to list the possible scores from highest
to lowest.
10. The larger the number of observations in a numerical data set, the smaller the number of
class intervals needed for a grouped frequency distribution.
DISCOVER
ACTIVITY 5.1.2: organizing data INQUIRY

The following items on gathering and organizing data assess your understanding of
representing data using graphs and charts and of interpreting the data.
Read each item carefully and encircle the letter of your choice.
1. It deals largely with calculations and graphical displays to describe important features
of the data.
A. Inferential Statistics B. Statistics C. Descriptive Statistics D. Population
2. The number of absences per year that a worker has is an example of what type of data?
A. Nominal B. Qualitative C. Discrete D. Continuous
3. Agusan del Sur State College of Agriculture and Technology (ASSCAT) randomly
selected 250 teachers to find out which online platform is the most effective. 100
teachers chose Messenger, 85 selected Google Classroom, 15 chose Edmodo, and 50
chose Zoom. ASSCAT concluded that all teachers prefer Messenger. What is the
population?
A. 250 teachers B. 100 teachers C. 15 teachers D. All teachers
4. Fifty bottles of water were randomly selected from a large collection of bottles in a
company’s warehouse. These fifty bottles are referred to as the
A. Parameter B. Population C. Sample D. Statistic
5. Which of the following is not a category that is used to distinguish between different
types of variables?
A. Categorical B. Continuous C. Discrete D. Continuum
6. For the variable “Colour” of the respondents' eyes, which has the available answers
Brown, Blue, Green, and Other, how many levels of this variable would the researcher
identify for coding the data?
A. 3 B. 4 C. 6 D. 7
7. The marital status of faculty members in ASSCAT is classified as
A. Qualitative B. Both A and C C. Quantitative D. None of the above
8. The result of an IQ test is classified as
A. Nominal B. Ordinal C. Interval D. Ratio
9. “Amount of fat (in grams) in cookies”. Which level of measurement is most appropriate
for this data?
10. Data that can be classified according to color are measured on what scale?
EXAMINE
ACTIVITY 5.2.1: ONE MORE TRY
The following items on data management assess your understanding of measures of
central tendency. Read each item carefully and encircle the letter of
your choice.
1. Which of the following statistics is not a measure of central tendency?

A. Mean B. Median C. Mode D. Variance
2. Which of the following is the easiest to compute?
3. The observation which occurs most frequently in a sample is
4. What is the median of the sample 0, 1, 2, 3, 4?
A. 1 B. 2 C. 3 D. 4
5. What is the mode of the sample 0, 0, 1, 2, 3, 4?
A. 1 B. 2 C. 3 D. 0

ACTIVITY 5.2.2: We find!

The marks (out of 25) obtained by 30 students of a class in an examination are given below.
20, 6, 23, 19, 9, 14, 15, 3, 1, 12, 10, 20, 13, 3, 17,
10, 11, 6, 21, 9, 6, 10, 9, 4, 5, 1, 5, 11, 7, 24
Find the (a.) mean, (b.) median, (c.) mode, and (d.) midrange.
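As a reminder of how the four measures are computed, here is a minimal sketch on a small made-up sample (deliberately not the activity data). It assumes Python's standard statistics module; the midrange helper is written by hand because that module has no such function.

```python
import statistics

def midrange(values):
    """Midrange: the average of the lowest and highest values."""
    return (min(values) + max(values)) / 2

# Hypothetical sample, already sorted for readability.
sample = [2, 3, 3, 5, 7, 10]

mean_value = statistics.mean(sample)      # sum of values divided by their count
median_value = statistics.median(sample)  # middle value; average of the two middle values here
modes = statistics.multimode(sample)      # list of the most frequent value(s)
midrange_value = midrange(sample)

print(mean_value, median_value, modes, midrange_value)
```

multimode returns a list, so a data set with two or more modes is reported in full.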
EVALUATE
ACTIVITY 5.3.1: We solve!
1. The number of incidents in which police were needed for a sample of 10 schools in
Allegheny County is 7, 37, 3, 8, 48, 11, 6, 0, 10, 3.
a. Find the variance and standard deviation.

b. Are the data consistent or do they vary? Explain your answer.
2. In this data set: 10, 20, 30, 40, 50

a. Find the standard deviation.
b. Add 5 to each value, and then find the standard deviation.
c. Subtract 5 from each value and find the standard deviation.
d. Multiply each value by 5 and find the standard deviation.
e. Divide each value by 5 and find the standard deviation.
f. Generalize the results of parts b through e.
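For reference when working on the items above, the following is a minimal sketch of how a variance and standard deviation can be computed. It assumes the sample formulas, which divide by n - 1, and uses a made-up data set rather than the activity data; the function names are illustrative only.

```python
import math

def sample_variance(values):
    """Sample variance: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

def sample_standard_deviation(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

# Hypothetical data with mean 6.
# Deviations: -2, 2, 0, -4, 4; squared: 4, 4, 0, 16, 16; sum = 40; 40 / 4 = 10.
data = [4, 8, 6, 2, 10]

variance = sample_variance(data)
standard_deviation = sample_standard_deviation(data)
print(variance, round(standard_deviation, 3))
```

If the data represent an entire population rather than a sample, the divisor would be n instead of n - 1.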

The following items assess your understanding of measures of relative position. Read each
item carefully and determine whether each statement is true or false.
1. Quartiles divide the data into four equal parts.
2. In a set of numerical data, the value for the third quartile can never be smaller than
the value for the first quartile.
3. A percentile can be a decile, but a decile cannot be a quartile.
4. The median is equivalent to the second quartile.
5. Percentiles divide the distribution into 10 equal parts.
The following items assess your understanding of probabilities and normal distributions.
Read each item carefully and determine whether each statement is true or false.
1. The normal distribution is a probability distribution for discrete random variables.
2. Theoretically speaking, the total area under the normal curve is always equal to 1.
3. The normal distribution is a round-shaped curve.
4. The normal distribution is symmetric about the mean.
5. If most of the data values fall to the left, then the normal curve is skewed to the left.

ACTIVITY 5.5.2: probabilities and normal distribution INQUIRY

The following items assess your understanding of probabilities and normal distributions.
Read each item carefully and encircle the letter of your choice.
1. For a standard normal distribution, the value of the mean is
A. 0 B. 1 C. 2 D. 3
2. For a standard normal distribution, the value of the standard deviation is
A. 0 B. 1 C. 2 D. 3
3. In a normal curve, the ordinate is the highest at:
A. Mean B. Variance C. Standard deviation D. Quartile
4. The shape of the normal curve depends upon the value of:
A. Mean B. Variance C. Standard deviation D. Quartile
5. Which of the following is true for the normal curve:
A. Symmetrical B. Unimodal C. Bell-shaped D. All of the above
The following items assess your understanding of linear regression and correlation. Read
each item carefully and determine whether each statement is true or false.
1. The dependent variable is the variable that is being described, predicted, or controlled.
2. A simple linear regression model is an equation that describes the straight-line
relationship between a dependent variable and an independent variable.
3. The residual is the difference between the observed value of the dependent variable and
the predicted value of the dependent variable.
4. When using simple regression analysis, if there is a strong correlation between the
independent and dependent variable, then we can conclude that an increase in the value
of the independent variable causes an increase in the value of the dependent variable.
5. In a simple linear regression model, the correlation coefficient not only indicates the
strength of the relationship between independent and dependent variable, but also shows
whether the relationship is positive or negative.
6. If r = -1, then we can conclude that there is a perfect relationship between X and Y.
7. The slope of the simple linear regression equation represents the average change in the
value of the dependent variable per unit change in the independent variable (X).
8. The least squares simple linear regression line minimizes the sum of the vertical
deviations between the line and the data points.
9. The notation Ŷ refers to the average value of the dependent variable Y.
10. A significant positive correlation between X and Y implies that changes in X cause Y to
change.
11. The estimated simple linear regression equation minimizes the sum of the squared
deviations between each value of Y and the line.
Deadline of Submission: On or before November 25, 2021
