Statistics for Informed Decision-Making
City of Olongapo
GORDON COLLEGE
COLLEGE OF EDUCATION, ARTS AND SCIENCES
Olongapo City Sports Complex, Donor St., East Tapinac, Olongapo City
Program:
Course Code: GEC 04
Course Title: Mathematics in the Modern World
Course Description:
This course deals with the nature of mathematics, appreciation of its practical, intellectual, and aesthetic
dimensions, and application of mathematical tools in daily life.
This course begins with an introduction to the nature of mathematics as an exploration of patterns (in
nature and the environment) and as an application of inductive and deductive reasoning. By exploring
these topics, students are encouraged to go beyond the typical understanding of mathematics as merely a
set of formulas but as a source of aesthetics in patterns of nature, for example, and a rich language in
itself (and science) governed by logic and reasoning.
The course then proceeds to survey ways in which mathematics provides a tool for understanding and
dealing with various aspects of present-day living, such as managing personal finances, making social
choices, appreciating geometric designs, understanding codes used in data transmission and security, and
dividing limited resources fairly. These aspects will provide opportunities for actually doing mathematics in
a broad range of exercises that bring out the various dimensions of mathematics as a way of knowing, and
test the students’ understanding and capacity. (CMO No. 20, series of 2013)
• Statistical Concepts
• Sampling Method and Sources of Data
• Visualizing Data
• Organizing Data
• Descriptive measurements of Ungrouped Data and Grouped Data
• Correlation and Regression Analysis
STATISTICAL CONCEPTS
TWO FIELDS OF STATISTICS
Inferential Statistics - usually (but not always) the next step once you have compiled and
summarized your results. Inferential statistics are used to draw inferences about a larger set of data
(the population) based on a smaller group of data (the sample).
The flow of inferential statistics covers the sampling process, data processing, and decision-making for
the entire population.
Inferential statistics consists of generalizing from samples to populations, performing
estimations and hypothesis tests, determining relationships among variables, and making
predictions.
Examples:
1. According to the recent poll, 43% of Filipinos brush their teeth incorrectly.
2. Based on a study, smoking increases the risk of lung cancer.
3. A prediction has been made that the chance that a person will be robbed in a certain city is 15%.
4. The chances of you getting a new car are about the same as the chances of passing your math class.
5. Using this product will burn 74% more calories.
Population - the whole group about which you would like to draw conclusions.
Sample - a subset of the population from which you can gather data. The
sample size is always less than the population's overall size.
A population in research does not necessarily refer to people. It may refer to a
collection of elements from whatever you're studying, such as objects, activities,
organizations, nations, plants, organisms, and so on. (Population vs Sample |
Definitions, Differences & Examples, 2020)
Example 2: 25% of the adult population of the Philippines has an allergy. A total of 22.3% of 1,080
randomly selected adults reported having an allergy.
Population: All Filipino adults.
Sample: 1080 randomly selected Filipino adults.
Example: A nutritionist wants to estimate how much sodium children under the age of 10 eat. The
nutritionist obtains a sample mean of 2899 milligrams of sodium ingested from a random sample of 85
children under the age of ten.
Parameter: The mean amount of sodium consumed by all children under the age of 10.
Statistic: The mean of 2899 milligrams of sodium obtained from a random sample of 85 children under
the age of 10.
Example 2: A researcher wants to estimate the average farm size in the Philippines. From a simple random
sample of 60 farms, the researcher obtains a sample mean farm size of 845 acres.
Parameter: The average farm size in the Philippines.
Statistic: The mean farm size of 845 acres from the sample of 60 farms in the Philippines.
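The difference between a parameter and a statistic can be illustrated with a short simulation. This is only a sketch: the population of farm sizes below is randomly generated (hypothetical, not real Philippine data), and the names `population`, `parameter`, `sample`, and `statistic` are chosen for illustration.

```python
import random
from statistics import mean

random.seed(4)  # fixed seed so the sketch is repeatable

# Hypothetical population: farm sizes (in acres) for 5,000 farms.
population = [random.uniform(100, 1600) for _ in range(5000)]

# Parameter: a number describing the whole population
# (in real studies it is usually unknown).
parameter = mean(population)

# Statistic: the same measure computed from a simple random sample of 60 farms.
sample = random.sample(population, 60)
statistic = mean(sample)

print(f"Parameter (population mean): {parameter:.1f} acres")
print(f"Statistic (sample mean):     {statistic:.1f} acres")
```

The two printed values are usually close but not equal, which is why a statistic is treated as an estimate of the parameter.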
DATA SET
ELEMENT
OBSERVATIONS
All the values for an individual item in the sample. (Each individual measurement)
Example: Weights of students: x1 = 110lbs, x2 = 123lbs, x3 = 115lbs, etc.
VARIABLE
Characteristics:
1. A variable is an attribute that describes a person, place, thing, or idea.
2. The value of the variable can "vary" from one entity to another.
A variable is a concept, image, or opinion that is measured on a certain scale and whose value can
change. Any thing or concept that can be measured can be called a variable. (“Variable Definition
in Statistics,” 2020)
A concept is an image or perception that can differ from one person to another. Concepts are
subjective and often difficult to define or measure, whereas a variable can be measured, even on
different scales, and produces results that can be compared.
Examples of concepts are effectiveness, happiness, and wealth. How can these be measured more
effectively? By converting each concept into a variable: effectiveness becomes an index of
effectiveness, happiness becomes an index of happiness, and wealth becomes an income level.
Concept | Indicator | Variable | Classification Decision
Unemployment | Work-hours | Working hours less than 1 hour in one week | If working hours less than 1 hour in one week
Poverty | Income | Net-income per month | If net-income per month below the poverty line
Micro-business | Net Asset | Yearly Asset | If yearly asset lower than P250,000.00
Micro-business | Income | Yearly Income | If yearly income lower than P1,500,000.00
(“Variable Definition in Statistics,” 2020)
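The classification decisions in the table can be written as simple rules. The sketch below is illustrative only: the function names are hypothetical, the poverty line is passed in as a parameter because the module does not give its value, and combining the two micro-business indicators with "and" is an assumption.

```python
def is_unemployed(hours_worked_in_week):
    """Decision rule: working hours less than 1 hour in one week."""
    return hours_worked_in_week < 1


def is_poor(net_income_per_month, poverty_line):
    """Decision rule: net income per month below the poverty line.
    The poverty line is supplied by the caller (value not given in the module)."""
    return net_income_per_month < poverty_line


def is_micro_business(yearly_asset, yearly_income):
    """Decision rule: yearly asset below P250,000.00 and yearly income
    below P1,500,000.00 (requiring both at once is an assumption)."""
    return yearly_asset < 250_000 and yearly_income < 1_500_000


print(is_unemployed(0.5))                       # True
print(is_poor(9_000, poverty_line=12_000))      # True (hypothetical poverty line)
print(is_micro_business(200_000, 1_000_000))    # True
```

Each function turns a concept (unemployment, poverty, micro-business) into a measurable variable plus a clear rule for classifying an observation.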
TYPES OF VARIABLE
In general, there are three ways to classify variables:
1. Based on cause-and-effect relationships
2. Based on the study design
3. Based on the measurement scale
BASED ON CAUSE-AND-EFFECT
Suppose you want to examine poverty or criminality. You can use the following model in formulating the
variables that you use.
Figure 4. https://worldsustainable.org/variable-definition-in-statistics/
Based on the measurement scale, there are two types of variable classification:
1. Qualitative variable - these are variables that are non-numeric by nature. A qualitative variable takes
on values that are names or labels.
Example: The color of a ball (e.g., red, green, blue) or the breed of a dog (e.g., Labrador, shepherd,
terrier) would be examples of qualitative or categorical variables.
- gender, eye color, political preferences, religion, blood type, civil status, year level, course, profession,
socioeconomic status etc.
2. Quantitative variable - these are variables that can be expressed numerically. They represent a
measurable quantity.
Example: When we speak of the population of a city, we are talking about the number of people in the
city - a measurable attribute of the city. Therefore, population would be a quantitative variable.
- age in years, height in cm., grade in statistics, number of children, speed etc.
2. Continuous Variable - a quantitative variable that can take infinitely many values within a range, so it
can be measured only to a chosen degree of precision. We may think of a continuous variable as one
whose possible values have no breaks or gaps.
- temperature, distance, area, age, height, weight, and time
CLASSIFICATION OF VARIABLES
(Based on Nature of Data)
VARIABLE
(DATA)
NUMERICAL CATEGORICAL
Scales of Measurement
The level of measurement of the variable determines the algebraic operations that can be performed and
the statistical tools that can be applied to the set of variables.
1. Nominal Level – variables at this level are sometimes referred to as CATEGORICAL variables. If a
variable can be sorted into two or more categories that cannot be arranged in an ordering scheme,
then the variable is said to be NOMINAL.
Example: Gender, religion, political affiliation, civil status, race, name, eye color, type of profession,
classification of people (introvert or extrovert), score (pass or fail).
2. Ordinal Level – variables whose values can be arranged in an ordering scheme. However, differences
between values either cannot be determined or are meaningless, because the ordinal level implies only
DEGREE OF DIFFERENCE and does not imply a measurable AMOUNT OF DIFFERENCE. Addition, subtraction,
multiplication, and division of values are meaningless at this level.
Example: Product satisfaction, Professional rank, academic awards, pain level, military rank.
3. Interval Level – variables at this level have all the properties of ordinal variables, and the differences
between values can be quantified and compared. Interval variables have no inherent (natural) zero
starting point. Addition and subtraction are meaningful, while multiplication and division are not,
because ratios change when the arbitrary zero point is moved.
Example: Temperature (in degree Celsius or Fahrenheit), standardized exam, calendar time.
4. Ratio Level – variables at this level have all the properties of the interval level plus an identifiable
absolute zero point. At this level, addition, subtraction, multiplication, and division of values are all meaningful.
Example: Height, Age, Weight, Salary, Temperature in Kelvin
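The difference between the interval and ratio levels can be checked with a little arithmetic. The sketch below uses two made-up temperature readings (20 and 10 degrees Celsius) and a hypothetical helper `celsius_to_kelvin`; it shows that the difference survives a change of scale while the ratio does not.

```python
def celsius_to_kelvin(celsius):
    """Convert degrees Celsius (interval level) to Kelvin (ratio level)."""
    return celsius + 273.15


warm_c, cool_c = 20.0, 10.0          # illustrative readings
warm_k = celsius_to_kelvin(warm_c)
cool_k = celsius_to_kelvin(cool_c)

# Differences are meaningful at the interval level: the same gap on both scales.
diff_c = warm_c - cool_c
diff_k = warm_k - cool_k

# Ratios are only meaningful at the ratio level (absolute zero).
ratio_c = warm_c / cool_c            # 2.0, but 20 C is NOT "twice as hot" as 10 C
ratio_k = warm_k / cool_k            # about 1.035, the physically meaningful ratio

print(f"Difference: {diff_c:.2f} C = {diff_k:.2f} K")
print(f"Ratio in Celsius: {ratio_c:.3f}, ratio in Kelvin: {ratio_k:.3f}")
```

Because the Celsius zero is arbitrary, the Celsius ratio of 2.0 is an artifact of the scale, which is why multiplication and division are reserved for ratio-level variables.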
Learning Task
Knowledge Check
Census Example: The Filipino population census or socio-economic survey of the whole community by a
college planning forum.
Sample Example: National sample survey enquiries: Paid survey online, National Status of Children of
Indigenous People Survey, etc.
SAMPLING TECHNIQUES
SAMPLING - The process by which members of a population are selected for a sample.
Sampling Advantages
Types of Sampling
1. Probability Sampling - involves a random selection process that allows you to draw strong statistical
conclusions about the entire group.
- This can be done in one of two ways: the lottery or random number method.
o In the lottery method, you choose the sample at random either by “drawing from a hat”
or by using a computer program that simulates the same operation.
o In the random number method, you assign a number to each individual. You then
choose a subset of the population at random using a random number generator or random
number tables. To generate random numbers, you can use Microsoft Excel's random
number function (RAND).
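Both methods above can be simulated with Python's standard `random` module instead of Excel. This is a minimal sketch, assuming a hypothetical sampling frame of 500 individuals numbered 1 to 500 and a sample size of 10.

```python
import random

# Hypothetical sampling frame: 500 individuals numbered 1 to 500.
population_ids = list(range(1, 501))

rng = random.Random(2021)  # seeded generator so the draw can be reproduced

# Lottery method: shuffle the "slips of paper" and draw the first 10.
slips = population_ids.copy()
rng.shuffle(slips)
lottery_sample = slips[:10]

# Random number method: let the generator pick 10 distinct numbers.
random_number_sample = rng.sample(population_ids, 10)

print("Lottery method:      ", sorted(lottery_sample))
print("Random number method:", sorted(random_number_sample))
```

In both cases every individual has the same chance of being chosen and nobody is chosen twice.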
• Stratified Sampling - Researchers divide a population into homogeneous subpopulations called strata
(plural of stratum) based on specific characteristics (e.g., race, gender, location, age,
department, etc.). Every member of the population must be clearly classified into
exactly one subgroup, and every stratum must be represented among the survey
respondents.
- Proportional allocation is suggested: sample each stratum in proportion to its share of the population.
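Stratified sampling with proportions can be sketched as follows. The strata names and sizes are hypothetical, and `proportional_allocation` is an illustrative helper, not a standard library function.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Number of respondents per stratum, proportional to stratum size.
    Rounding may make the total differ slightly from sample_size in general."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


# Hypothetical strata: students grouped by department.
strata_sizes = {"Education": 400, "Arts": 250, "Sciences": 350}
allocation = proportional_allocation(strata_sizes, sample_size=100)

# Draw a simple random sample inside every stratum.
rng = random.Random(7)
stratified_sample = {}
for name, size in strata_sizes.items():
    members = [f"{name}-{i}" for i in range(1, size + 1)]
    stratified_sample[name] = rng.sample(members, allocation[name])

print(allocation)  # {'Education': 40, 'Arts': 25, 'Sciences': 35}
```

A stratum holding 40% of the population supplies 40% of the respondents, so no group is over- or under-represented.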
• Cluster Sampling - is a technique in which researchers divide a population into smaller groups
called clusters. They then generate a sample by selecting clusters at random. This method is
frequently used to study large populations, particularly those that are geographically dispersed
(scattered).
o Single Stage Cluster Sampling - Entire clusters are selected randomly, and every element in
the selected clusters is included in the sample.
o Two Stage Cluster Sampling - Clusters are selected randomly, and then elements are randomly
selected from within those clusters.
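The two variants of cluster sampling can be compared in a short sketch. The population below (8 barangays with 30 households each) is hypothetical, as are the choices of 3 clusters and 10 households per cluster.

```python
import random

rng = random.Random(11)  # seeded so the sketch is repeatable

# Hypothetical population: 8 barangays (clusters) with 30 households each.
clusters = {
    f"Barangay {c}": [f"B{c}-H{h}" for h in range(1, 31)]
    for c in range(1, 9)
}

# Single-stage: pick 3 clusters at random and survey every household in them.
chosen = rng.sample(sorted(clusters), 3)
single_stage = [household for c in chosen for household in clusters[c]]

# Two-stage: pick 3 clusters, then randomly pick 10 households within each.
chosen_again = rng.sample(sorted(clusters), 3)
two_stage = [household
             for c in chosen_again
             for household in rng.sample(clusters[c], 10)]

print(len(single_stage), "households in the single-stage sample")  # 90
print(len(two_stage), "households in the two-stage sample")        # 30
```

Two-stage sampling trades some precision for a smaller, cheaper sample, which helps when clusters are large or far apart.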
• Convenience Sampling - A convenience sample is made up of people who are most easily
available to the researcher.
- Example: You're doing research into student support systems at your college, so you ask your
classmates to complete a survey on the subject after each of your classes. This is a simple way to
collect data, but the study is not representative of all students at your college since you only
surveyed students who were taking the same classes as you at the same level.
• Purposive Sampling - This method of sampling, also known as judgment sampling, entails
the researcher using their knowledge to choose a sample that is most useful to the research's
goals. It's often used in qualitative research, particularly when the researcher wants to learn more
about a particular phenomenon rather than making statistical inferences, or when the population
is small and specific. In order to be accurate, a purposive sample must have clear inclusion
requirements and rationale.
- Example. You want to know more about disabled students' perspectives and experiences at your
university, so you purposefully pick a group of students with varying support needs in order to
collect a varied range of data on their experiences with student services.
“Serving the community with a culture of excellence”
NOT FOR SALE. EXCLUSIVE FOR GORDON COLLEGE ONLY
Dante P. Sardina info.ceas@gordoncollege.edu.ph www.gordoncollege.edu.ph
• Snowball Sampling - Snowball sampling can be used to recruit participants via other participants
if the population is difficult to access. The number of people you have access to “snowballs” as you
get in contact with more people.
- Example. You're doing research into homelessness in your area. Probability sampling is impossible
since there is no list of all homeless people in the area. You meet one person who decides to take
part in the study, and she connects you with other homeless people living in the area who she
knows.
(Sampling Methods | Types and Techniques Explained, 2019)
SOURCES OF DATA
1. Primary Data - First-hand information gathered by the surveyor, as the name implies. The information
gathered is original and is collected for a particular purpose. It has never been subjected
to any kind of statistical analysis. The information gathered may also be made public. The Census
is an example of primary data. (www.toppr.com)
- The data is mostly collected through observations, physical testing, mailed questionnaires, surveys,
personal interviews, telephonic interviews, case studies, and focus groups, etc. (Difference
between Primary Data and Secondary Data, n.d.)
2. Secondary Data - Secondary data are the opposite of primary data. They have already been collected
and published (by some organization, for instance). Surveyors can use them as a source of data
for their own analysis. Secondary data are impure in the sense that they have
undergone statistical treatment at least once. (www.toppr.com)
- It is accessible in the form of data collected from different sources such as government publications,
censuses, internal records of the organization, books, journal articles, websites and reports, etc.
(Difference between Primary Data and Secondary Data, n.d.)
The differences between the primary and secondary data are represented in a comparison format as
follows:
Definition
- Primary data: Primary data are those that are collected for the first time.
- Secondary data: Secondary data refer to those data that have already been collected by some other person.
Originality
- Primary data: These are original because these are collected by the investigator for the first time.
- Secondary data: These are not original because someone else has collected these for his own purpose.
Nature of Data
- Primary data: These are in the form of raw materials.
- Secondary data: These are in the finished form.
Reliability and Suitability
- Primary data: These are more reliable and suitable for the enquiry because these are collected for a
particular purpose.
- Secondary data: These are less reliable and less suitable as someone else has collected the data, which
may not perfectly match our purpose.
Time and Money
- Primary data: Collecting primary data is quite expensive in terms of both time and money.
- Secondary data: Secondary data requires less time and money; hence it is economical.
Precaution and Editing
- Primary data: No particular precaution or editing is required while using the primary data as these were
collected with a definite purpose.
- Secondary data: Both precaution and editing are essential as secondary data were collected by someone
else for his own purpose.
VISUALIZING DATA
The graphical representation of information and data is known as data visualization. Data visualization tools
are useful to see and understand trends, outliers, and patterns in data by using visual elements including
charts, graphs, and maps.
1. Bar Chart - illustrates a categorical variable as a series of bars, each bar indicating the amount for a
particular category. Bar charts display the frequency of categorical data. A categorical variable is one
that has two or more categories, such as gender or hair color. See Figures 6 and 10.
2. Pie Chart - uses parts of a circle to represent the tallies of each category. Figure 5.
3. Pareto Chart - On the same chart, the tallies for each group are plotted as vertical bars in descending
order according to their frequencies, and a cumulative percentage line is combined with them. Figure 8.
4. Side-by-side Chart - A side-by-side bar chart uses sets of bars to show the joint responses from two
categorical variables. Figures 7 and 9.
(Rajasekaran, 2018)
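The numbers behind a Pareto chart can be prepared in a few lines. This sketch uses the ice cream totals from the contingency table later in this module (Vanilla 33, Strawberry 27, Chocolate 39) and only computes the values; drawing the bars and the line is left to a charting tool.

```python
# Category tallies taken from the ice cream contingency table in this module.
tallies = {"Vanilla": 33, "Strawberry": 27, "Chocolate": 39}

# Step 1: sort the categories in descending order of frequency (the bars).
ordered = sorted(tallies.items(), key=lambda item: item[1], reverse=True)

# Step 2: accumulate the counts and express them as cumulative percentages
# (the line drawn on top of the bars).
total = sum(tallies.values())
running = 0
pareto_rows = []
for category, count in ordered:
    running += count
    pareto_rows.append((category, count, round(100 * running / total, 2)))

for category, count, cumulative_pct in pareto_rows:
    print(f"{category:<12}{count:>4}{cumulative_pct:>9.2f}%")
```

The last cumulative percentage is always 100%, a quick check that no category was left out.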
1. Stem-and-Leaf Plot – breaks each value of a quantitative data set into two pieces: a stem, typically
for the highest place value, and a leaf for the other place values. It provides a way to list all data values in
a compact form.
2. Histogram – This type of graph is used with quantitative data. Ranges of values, called classes, are
listed along the horizontal axis, and the classes with greater frequencies have taller bars. Histograms are
not suited to things that are not easily quantified, like feelings or opinions; a bar chart is used for those.
3. Ogive – The word ogive is a term used in architecture to describe curves or curved shapes. In statistics,
ogives are graphs used to estimate how many values lie below or above a particular value in the
data. An ogive is the graph of a cumulative frequency distribution: it shows data values on the
horizontal axis and the cumulative frequencies, cumulative relative frequencies, or cumulative
percent frequencies on the vertical axis.
(Ogive (Cumulative Frequency Curve) - Definition and Its Types, n.d.)
4. Dot plot - A dot plot is a hybrid between a histogram and a stem and leaf plot. Each quantitative data
value becomes a dot or point that is placed above the appropriate class values. Where histograms use
rectangles—or bars—these graphs use dots, which are then joined together with a simple line, says
statisticshowto.com.
5. Scatter Plot – A scatter plot (also called a scatterplot, scatter graph, scatter chart, scattergram, or
scatter diagram) is a type of plot or mathematical diagram using Cartesian coordinates (x-y coordinates)
to display values for typically two variables for a set of data according to Wikipedia.com.
6. Time series plot – According to whatis.techtarget.com, a time series graph illustrates data points at
successive intervals of time. Each point on the chart corresponds to both a time and a quantity that is being
measured.
7. Frequency polygons – are analogous to line graphs, and just as line graphs make continuous data
visually easy to interpret, so too do frequency polygons.
Figure 11. Dot Plot
Figure 12. Stem-and-Leaf Plot
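A stem-and-leaf plot can also be produced as plain text. The sketch below uses the ascending ordered array from the Organizing Data section of this module and assumes non-negative whole numbers, with the tens digit as the stem and the ones digit as the leaf; `stem_and_leaf` is an illustrative helper.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group each value into a stem (tens and above) and a leaf (ones digit).
    Assumes non-negative whole numbers."""
    groups = defaultdict(list)
    for value in sorted(values):
        groups[value // 10].append(value % 10)
    return dict(sorted(groups.items()))


data = [12, 13, 17, 17, 18, 23, 25, 25, 33, 33, 33, 45]
plot = stem_and_leaf(data)

for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
# 1 | 2 3 7 7 8
# 2 | 3 5 5
# 3 | 3 3 3
# 4 | 5
```

Every original value can be read back from the plot, which is what distinguishes it from a histogram.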
For your enrichment: Watch online videos on creating graphs or charts using Excel. Sample online videos:
• Excel 2010 statistics 07: Charts basics: pie, column, bar, line and x-y scatter. (n.d.). Retrieved April
10, 2021, from https://www.youtube.com/watch?v=-btUxQi76qI
• Excel charts & graphs: Learn the basics for a quick start. (n.d.). Retrieved April 10, 2021, from
https://www.youtube.com/watch?v=DAU0qqh_I-A
ORGANIZING DATA
ORGANIZING CATEGORICAL DATA
CATEGORICAL DATA
TALLYING DATA
1. Summary Table - A summary table tallies the frequencies or percentages of items in a set of categories
so that you can see differences between categories.
Example 1:
Example 2:
Example 3:
Doctorate 8 14.55%
55 100%
Example 1:
                         GENDER
CLASSIC ICE CREAM    MALE    FEMALE    Total
Vanilla              13      20        33
Strawberry           10      17        27
Chocolate            25      14        39
Total                48      51        99
Example 2:
Contingency Table Displaying Ice Cream and Gender based Percentage of Grand Total
• $\dfrac{\text{Cell Value}}{\text{Grand Total}} \times 100\%$
• $\dfrac{13}{99} \times 100\% = 13.13\%$
• $\dfrac{20}{99} \times 100\% = 20.20\%$
• $\dfrac{10}{99} \times 100\% = 10.10\%$
And so on.
Example 3:
Contingency Table Displaying Ice Cream and Gender based Percentage of Row Total
• $\dfrac{\text{Cell Value}}{\text{Row Total}} \times 100\%$
• $\dfrac{13}{33} \times 100\% = 39.39\%$
• $\dfrac{20}{33} \times 100\% = 60.61\%$
• $\dfrac{10}{27} \times 100\% = 37.04\%$
And so on.
Example 4:
Contingency Table Displaying Ice Cream and Gender based Percentage of Column Total
• $\dfrac{\text{Cell Value}}{\text{Column Total}} \times 100\%$
• $\dfrac{13}{48} \times 100\% = 27.08\%$
• $\dfrac{20}{51} \times 100\% = 39.22\%$
• $\dfrac{10}{48} \times 100\% = 20.83\%$
And so on.
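The three kinds of percentages (of grand total, of row total, and of column total) can all be computed from the same table of counts. The sketch below uses the ice cream and gender counts from Example 1; the variable names and the helper `pct` are illustrative.

```python
# Counts from the ice cream by gender contingency table (Example 1).
counts = {
    "Vanilla":    {"Male": 13, "Female": 20},
    "Strawberry": {"Male": 10, "Female": 17},
    "Chocolate":  {"Male": 25, "Female": 14},
}
genders = ("Male", "Female")


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100 * part / whole, 2)


row_totals = {flavor: sum(row.values()) for flavor, row in counts.items()}
col_totals = {g: sum(row[g] for row in counts.values()) for g in genders}
grand_total = sum(row_totals.values())

pct_of_grand = {f: {g: pct(counts[f][g], grand_total) for g in genders} for f in counts}
pct_of_row = {f: {g: pct(counts[f][g], row_totals[f]) for g in genders} for f in counts}
pct_of_col = {f: {g: pct(counts[f][g], col_totals[g]) for g in genders} for f in counts}

for flavor in counts:
    print(flavor, pct_of_grand[flavor], pct_of_row[flavor], pct_of_col[flavor])
```

Row percentages add up to 100% across each row, column percentages add up to 100% down each column, and grand-total percentages add up to 100% over the whole table.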
NUMERICAL DATA
ORDERED ARRAY    FREQUENCY DISTRIBUTION    CUMULATIVE DISTRIBUTION
1. Ordered Array - The elements of an ordered array are arranged in ascending (or descending) order. An
ordered array may contain duplicate elements.
12 13 17 17 18 23 25 25 33 33 33 45
90 90 90 87 85 83 83 79 75 75 75 70
• Discrete Frequency Distribution - If the data contains a large number of items, it is preferable
to create a frequency array and condense the data further. The frequency array is created by listing
all of the values that occur in the series once and then noting the number of times each value
appears. Discrete frequency distribution, also known as ungrouped frequency distribution,
is the term for this form of distribution.
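Building a discrete (ungrouped) frequency distribution amounts to counting how often each distinct value appears. As a small sketch, the code below counts the values of the ascending ordered array shown above using `collections.Counter` from the Python standard library.

```python
from collections import Counter

# The ascending ordered array from this section.
data = [12, 13, 17, 17, 18, 23, 25, 25, 33, 33, 33, 45]

# List each distinct value once and note how many times it appears.
frequency = dict(sorted(Counter(data).items()))

print("Value  Frequency")
for value, count in frequency.items():
    print(f"{value:>5}  {count:>9}")
print(f"Total  {sum(frequency.values()):>9}")
```

The frequencies must add up to the number of observations (12 here), which is a useful check on the tally.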
Example 1:
Example 2:
Example 3:
Cumulative Frequency Distribution Table of Number of Children per Family of 25 Selected Families
Example 4:
Percentage Frequency Distribution Table of Number of Children per Family of 25 Selected Families
Example 1:
Frequency Distribution Table of Midterm Exam Scores of Selected 65 Students in Statistics Class
Scores Frequency
25 - 34 17
35 - 44 23
45 - 54 20
55 - 64 5
Total 65
Example 2:
Cumulative Frequency Distribution Table of Midterm Exam Scores of Selected 65 Students in Statistics Class
Scores Frequency Cumulative Frequency
25 - 34 17 17
35 - 44 23 40
45 - 54 20 60
55 - 64 5 65
Total 65
Example 3:
Cumulative Frequency Distribution Table of Midterm Exam Scores of Selected 65 Students in Statistics Class
Scores    Less Than Cumulative Frequency    Greater Than Cumulative Frequency
25 - 34 17 65
35 - 44 40 48
45 - 54 60 25
55 - 64 65 5
Example 4:
Cumulative Frequency Distribution Table of Midterm Exam Scores of Selected 65 Students in Statistics Class
Scores Cumulative Frequency
25 - 34 17
35 - 44 40
45 - 54 60
55 - 64 65
Example 5:
Frequency and Percentage Distribution Table of Midterm Exam Scores of Selected 65 Students in Statistics
Class
Scores Frequency Percentage
25 - 34 17 26.15%
35 - 44 23 35.38%
45 - 54 20 30.77%
55 - 64 5 7.69%
Total 65 100%
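The cumulative and percentage columns in Examples 2 to 5 can be derived from the frequency column alone. The sketch below uses the midterm exam score frequencies from Example 1; the variable names are illustrative.

```python
from itertools import accumulate

# Grouped frequency distribution of midterm exam scores (Example 1).
classes = ["25 - 34", "35 - 44", "45 - 54", "55 - 64"]
frequencies = [17, 23, 20, 5]
total = sum(frequencies)

# Less-than cumulative frequency: running total from the lowest class upward.
less_than_cf = list(accumulate(frequencies))

# Greater-than cumulative frequency: running total from the highest class downward.
greater_than_cf = list(accumulate(reversed(frequencies)))[::-1]

# Percentage of observations falling in each class.
percentages = [round(100 * f / total, 2) for f in frequencies]

print("Scores    f   <CF  >CF  Percent")
for row in zip(classes, frequencies, less_than_cf, greater_than_cf, percentages):
    print("{:<8}{:>3}{:>6}{:>5}{:>8.2f}%".format(*row))
```

The last less-than value and the first greater-than value both equal the total number of students (65).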
Example 6:
2.0 – 2.4 5 5
2.5 – 2.9 10 15
3.0 – 3.4 15 30
3.5 – 3.9 20 50
Total 50
3.5 – 3.9 20 50
3.0 – 3.4 15 30
2.5 – 2.9 10 15
2.0 – 2.4 5 5
Total 50
Note: The <CF values are the same whether the classes are listed in ascending or descending order; they
simply appear in reverse (inverse) order.
Total 50 100%
3.0 – 3.4 15 30 35
2.5 – 2.9 10 15 45
2.0 – 2.4 5 5 50
Total 50
Learning modules
Communication apps (zoom, google meet, messenger, etc.)
PowerPoint Presentation
Electronic Pen-Pad
References:
Population vs sample | definitions, differences & examples. (2020, May 14). Scribbr.
https://www.scribbr.com/methodology/population-vs-sample/
Variable definition in statistics: Read this! (2020, April 12). World Sustainable.
https://worldsustainable.org/variable-definition-in-statistics/
Sampling methods | types and techniques explained. (2019, September 19). Scribbr.
https://www.scribbr.com/methodology/sampling-methods/
https://www.toppr.com/guides/economics/collection-of-data/source-and-collection/#:~:text=There%20are%20two%20sources%20of%20data%20in%20Statistics.,sector.%20Browse%20more%20Topics%20under%20Collection%20Of%20Data
Difference between primary data and secondary data. (n.d.). BYJUS. Retrieved April 9, 2021, from
https://byjus.com/commerce/difference-between-primary-data-and-secondary-data/
Pareto Chart with example | 3 real-life Pareto chart examples with explanation | Pareto chart application in
textile. (n.d.). Retrieved April 9, 2021, from https://textilemirror60.blogspot.com/2020/09/pareto-chart-
with-example-how-to-read.html
Ogive (Cumulative frequency curve)—Definition and its types. (n.d.). BYJUS. Retrieved April 10, 2021, from
https://byjus.com/maths/ogive/
OpenStaxCollege. (2013). Histograms, frequency polygons, and time series graphs. In Introductory
Statistics. http://pressbooks-dev.oer.hawaii.edu/introductorystatistics/chapter/histograms-frequency-
polygons-and-time-series-graphs/
Stephanie. (2013, July 31). Contingency Table: What is it used for? Statistics How To.
https://www.statisticshowto.com/what-is-a-contingency-table/