
PREFACE

This book plays an important role in acquainting
one with the fundamentals of data analysis. It
contains comprehensive discussions of different
concepts in Statistics, covering not only the
descriptive type but also its inferential
counterpart. By the end of this book, the reader is
expected to have gained knowledge of the elements
of data analysis; sampling techniques; types,
scales, and measurements of variables; data
collection and presentation; and measures of
central tendency, variability, shape, location, and
association. Furthermore, the reader is also
expected to advance toward understanding the
basics of inferential statistics, such as statistical
estimation using confidence intervals; the margin
of error; normality and the central limit theorem;
and hypothesis testing. Be sure to enjoy the journey
of delving into the wonderful world of data analysis
as we try to turn opinions into data-driven
arguments!

Jensen Niño P. Baking, LPT

Author

Fundamentals of Data Analytics

Copyright © 2024. All rights reserved.


BOOK OUTLINE

Chapter 1: Introduction to Data Analysis


1.1 Definition and History of Statistics and Data Analysis
1.2 Importance of Statistics and Data Analysis in Other Fields
1.3 Elements of Statistics and the Data Analysis Process
1.4 Divisions of Statistics
1.5 Types of Descriptive and Inferential Statistics
Chapter 2: Data Collection and Presentation
2.1 Methods of Collecting and Presenting Data
2.2 Types of Frequency Distribution Tables
2.3 Types of Data and Sampling Techniques
2.4 Primary vs. Secondary Sources of Data
Chapter 3: The Variable
3.1 Qualitative vs. Quantitative Variables
3.2 Variables according to Scale of Measurement
3.3 Variables according to Continuity of Values
3.4 Independent and Dependent Variable
Chapter 4: Measures of Central Tendency
4.1 Summation Notation
4.2 Mean, Median, and Mode for Ungrouped Data
4.3 Mean, Median, and Mode for Grouped Data
Chapter 5: Measures of Location
5.1 Quantiles
5.2 Quantiles for Ungrouped Data
5.3 Quantiles for Grouped Data
Chapter 6: Measures of Dispersion
6.1 Range and Mean Absolute Deviation
6.2 Quartile Deviation and Interquartile Range
6.3 Variance and Standard Deviation
Chapter 7: Measures of Shape
7.1 Normal Distribution and the Central Limit Theorem
7.2 Skewness
7.3 Kurtosis
7.4 Other Types of Distribution
Chapter 8: Measures of Association
8.1 Pearson Product Moment Correlation
8.2 Spearman Rank Correlation Coefficient
8.3 Coefficient of Determination
8.4 Simple Linear Regression
Chapter 9: Hypothesis Testing
9.1 Statistical Estimation
9.2 Null and Alternative Hypothesis
9.3 Confidence Interval and Margin of Error
9.4 Hypothesis Testing using Tabular Values
9.5 Hypothesis Testing using the P-Value Method
9.6 T-Test and Z-Test

Chapter 1: Introduction to Data Analysis

Learning Objectives

At the end of this chapter, the learners should be able to:


Define Data Analysis and Statistics
Explore the importance of Data Analysis and Statistics in various fields
Enumerate the different elements of Statistics and explain the Data Analysis
Process
Distinguish between the two divisions of Statistics
Enumerate types of Descriptive and Inferential Statistics

Lesson 1.1: Definition and History of Statistics and Data Analysis


“Statistics” originally meant the science of states, and in its early existence it was
called “political arithmetic”. In Mathematics, Statistics refers to the study of the
collection, analysis, interpretation, presentation, and organization of data. In the general
sense, applying Statistics involves treating data with statistical methods that make the
data useful for describing, relating, and associating, and for making certain inferences.
Statistics is the art and science of gathering, analyzing, and making inferences from
data. It has been very useful in recording facts about people, objects, and events and in
making predictions and decisions based on the available data.
The study of statistics is a very powerful tool in almost all fields of work. It can be
found in the field of research, education, business, politics, psychology, and even in a
simple event that needs analysis.
From the research point of view, Statistics is the science which deals with methods
in the collection, gathering, presentation, analysis, and interpretation of data. Data
gathering in that sense involves getting information through interviews, questionnaires,
objective observations, experimentations, tests, among others.
In this modern time, statistics plays a great role as a tool in gathering opinions
from a survey. These opinions usually influence different sectors in the society and help
them anticipate possible solutions or actions they have to face.
Data Analysis
Data Analysis is the practice of working with data to glean useful information,
which can then be used to make informed decisions. As Sherlock Holmes argued: "It is a
capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit
theories, instead of theories to suit facts." Data analysis is the process of collecting,
modeling, and analyzing data to extract insights that support decision-making. There are
several methods and techniques to perform analysis depending on the industry and the
aim of the investigation. All these various methods are largely based on two core areas:
quantitative and qualitative research.

Don’t get confused!
Statistics refers to the study or the science itself.
Example: My favorite subject is Statistics! (“S” is capitalized all the time
because Statistics is a proper noun.)
Statistic is a numerical value that describes the attributes of a sample (plural:
statistics).
Example: The statistics of the data set suggest a low variability among the
responses. (“S” is in lower case unless the word is the first in a sentence.)

Brief History of Statistics

The history of Statistics can be traced back at least to Biblical times in ancient Egypt,
Babylon, and Rome. Ancient people unknowingly used the discipline in accounting for
taxes, inventorying agricultural crops, and estimating numbers of soldiers, among other tasks.

In 3800 BC, the Babylonian government used Statistics to measure the number of men
under a king’s rule and the vast territory that he occupied. In 3500 BC, Statistics was
used in Egypt in the form of recording the number of cattle or sheep owned, the
amount of grain produced, and the number of people living in a particular city. In 700 BC,
the Roman Empire used Statistics by conducting registrations to record the population
for the purpose of collecting taxes.

In the modern times, statistical methods have been used to record and predict such
things as birth and death rates, employment and inflation rates, education, sports and
achievements, politics and other economic and social trends. They have been used to
assess opinions from polls and unlock secret codes from a game of chance.

In the 16th century, Statistics developed as an inferential science that helped gamblers
learn the techniques giving them the best chances of winning. Blaise Pascal, Gottfried
Leibniz, Pierre de Fermat, and James Bernoulli were among the first mathematicians
to study Statistics, developing the theory of probability.

Karl Friedrich Gauss, the brilliant German mathematician, used statistical methods in
making predictions about the positions of the planets in our solar system.

In the 1600s, John Graunt, an English tradesman, collected and published records
called “bills of mortality” that included information about the numbers and causes of
deaths in the city of London. Modern statistics is said to have begun with him.

In the 1700s, Abraham de Moivre discovered the equation for the normal distribution,
upon which many of the theories of Inferential Statistics were based. Pierre-Simon
Laplace gained popularity for his application of Statistics to astronomy.

Not long after, Adolph Quetelet, a Belgian astronomer, developed the idea of the
“average man” from his studies of the Belgian census. Also known as the “Father of
Modern Statistics”, he applied Statistics in the fields of psychology and education,
and was considered the first mathematician to demonstrate the use of Statistics in
the field of research.

Meanwhile, Francis Galton was considered the greatest contributor of Statistics
to the social sciences for his application of the discipline to the field of heredity and
eugenics, and for the discovery of percentiles. He was credited as one of the principal
founders of statistical theory. His contributions to the field included introducing the
concepts of standard deviation, correlation, and regression, and applying these
methods to the study of human variety.

Karl Pearson was also a notable contributor in the field of Statistics for his
development of the theory of correlation and regression, which paved the way for the
present theories of sampling and made important links between probability and
statistics.

At the beginning of the 20th century, William Gosset developed methods for decision-
making derived from smaller sets of data. These helped Ronald Fisher develop
Statistics for experimental designs, which has been very useful in testing
improvements in production from agricultural experiments and in improving the
precision of results from medical, biological, and industrial experimentation.

In this age of information technology, many computer programs such as Microstat,
Soritec Sampler, Data Analysis, Stata, SPSS, AMOS, and WarpPLS perform far more
than the manual calculations in statistics.

Lesson 1.2: Importance of Data Analysis and Statistics in Other Fields

Statistics and Data Analysis are said to have been very useful instruments in
understanding a set of data, thereby providing avenues for informed decision-making. The
use of these sciences brings a substantial positive impact to communities of professionals
who rely on data for a more data-driven decision-making process. Below are some of the
ways Statistics and Data Analysis are important to other fields:

In Research:

The use of Statistics is very crucial in the field of research. It aids researchers in
presenting, summarizing, communicating, and interpreting data sets. It also allows
researchers to test different statistical inferences or research questions based on
statistically treated data.

It is practically helpful in determining whether the results support or contradict the
hypotheses set, which allows a researcher to conclude correctly.
Without Statistics, research would be far less useful, because we would be left with
open-ended conclusions and data which we cannot describe or interpret.

In Education:

Statistics helps educators easily collect, classify, and tabulate numerical facts which
may pertain to the students and other academic-related observations, ranging
from test scores, grades, assessment and evaluation results, surveys, and progress
reports, among others.
It also helps in instances where educational research comes into play in order to
develop tools that will improve learning or to implement an experiment to make
necessary innovations in the educational system.
With Statistics, it is possible to make the learning and teaching process efficient.
It offers remarkable influence in evaluation and measurement, which helps in both
learning and teaching. In the process, it helps with interpretation for better decisions.

In Business:

Businesses in almost every field use descriptive Statistics to gain a better
understanding of how their consumers behave. The best examples of these are the sales
charts or progress reports upon which most of the decisions in a company are based.
Another common way that Statistics is used in business is through data visualizations
such as line charts, histograms, boxplots, and pie charts. These types of charts are
often used to help a business spot trends.
Another use of Statistics to businesses is the assessment of consumer groups through
cluster analysis. This allows businesses to segment their target groups for a particular
product.

In Healthcare:

Statistical research guides healthcare decision-makers with regard to the utility,
cost, and efficacy of medical goods and services.
Hospitals implement data-driven improvement programs to maximize efficiency,
whereas government health and human service agencies gauge the overall health and
well-being of populations with statistical information.
The healthcare industry extracts consumer market characteristics such as age, sex,
race, income, and disabilities. These "demographics" predict the types of services that
people need and the level of care that is affordable to them.
Health agencies use Statistics on service utilization to justify budget requests and
expenditures to their governing boards.

In Economics:

Statistics enables economists to monitor the current economic status of a country and
maintain its healthy state.
Statistical data pertaining to the status of the economy of a country may be used to
inform law-makers on the best methods to ensure that the economy grows healthy and
strong.
Another way that Statistics is used in economics is in the form of forecasting trends.
Using this forecast, the economist can predict (with a certain level of confidence) how
the economy is likely to perform in the coming months or years.

Lesson 1.3: Elements of Statistics and the Data Analysis Process

It is important that before venturing into any Statistics subject, one must always
understand the fundamental concepts involved with the subject. And by fundamental
concepts, we mean the various terminologies that serve as important elements of
Statistics. The following definition of terms are crucial in understanding Descriptive
Statistics:

1. Population – refers to any group or aggregate of people, things, etc. It can be finite
(can be counted) or infinite (fluctuating, such as the world population).

2. Sample – refers to a group of people or things that is part of a population. Usually
referred to as the respondents.

3. Sampling techniques – are methods which help identify which sample or
respondents to use.

4. Respondent – is a person who responds to questions for purposes related to surveys
or research. Commonly known as an informant (for interviews) or a participant (for
experiments).

5. Enumerator – is a person responsible for the conduct of the collection of data.

6. Parameter – is a numerical value which describes the attributes or properties of a
population.

7. Statistic – is a numerical value which describes the attributes or properties of a
sample.

8. Variable – is an object of study. It can be people, characteristics, test scores, gender,
etc.

9. Qualitative – refers to data that are not numerical but categorical, such as gender,
birthplace, hair color, or skin tone.

10. Quantitative – refers to data that hold numerical values, such as monthly income,
age, height or weight, sales, and even temperature.

The Data Analysis Process

As data becomes readily available to anyone, there is the need for an effective and
efficient process by which to harness the value of a set of data. The data analysis process
typically moves through several iterative phases. The process typically starts by
identifying the variables, and then collecting relevant data, cleaning the data, analyzing
it, and interpreting the results. Let’s take a closer look at each.

Identify → Collect → Clean → Analyze → Interpret

Steps in Data Analysis

1. Identifying the data needed

Before you get your hands dirty with data, you first need to identify why you
need it in the first place. Identification is the stage in which you establish the
questions you will need to answer. For example: What is the customer's perception of our
brand? What type of packaging is more engaging to our potential customers? Once the
questions are outlined, you are ready for the next step.

2. Collecting the needed data

As its name suggests, this is the stage where you start collecting the needed data.
Here, you define which sources of information you will use and how you will use them.
The collection of data can come in different forms such as internal or external sources,
surveys, interviews, questionnaires, focus groups, among others. An important note here
is that the way you collect the information will be different in a quantitative and
qualitative scenario.

3. Cleaning the collected data

Once you have the necessary data, it is time to clean it and leave it ready for analysis.
Not all the data you collect will be useful; when collecting large amounts of information in
different formats, it is very likely that you will find yourself with duplicate or badly
formatted data. To avoid this, before you start working with your data you need to make
sure to erase any white spaces, duplicate records, or formatting errors. This way you avoid
hurting your analysis with incorrect data.
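As a small illustration, the cleaning step can be sketched in Python. The raw responses below are hypothetical; real cleaning rules always depend on the data at hand:

```python
# Hypothetical raw survey responses: stray white space, a duplicate,
# a badly formatted entry, and an empty record.
raw = ["  120 ", "100", "100", "abc", "95  ", ""]

cleaned = []
seen = set()
for entry in raw:
    value = entry.strip()        # erase surrounding white spaces
    if not value.isdigit():      # discard badly formatted entries
        continue
    if value in seen:            # skip duplicate records
        continue
    seen.add(value)
    cleaned.append(int(value))

print(cleaned)  # [120, 100, 95]
```

Whether duplicates should really be removed depends on the study: in a survey, two respondents can legitimately give the same answer.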

4. Analyzing the data

With the help of various techniques such as statistical analysis, regressions, neural
networks, text analysis, and more, you can start analyzing and manipulating your data to
extract relevant conclusions. At this stage, you find trends, correlations, variations, and
patterns that can help you answer the questions you first thought of in the identify stage.
Various technologies in the market assist researchers and average business users with
the management of their data. Some of them include business intelligence and
visualization software, predictive analytics, and data mining, among others.

5. Interpreting the result

Last but not least, you have one of the most important steps: it is time to interpret
your results. This stage is where the researcher comes up with courses of action based on
the findings. For example, here you would understand if your clients prefer packaging
that is red or green, plastic or paper, etc. Additionally, at this stage, you can also find some
limitations and work on them.

Here is an example of a research activity that explicitly showcases the different
steps of data analysis. Pay attention to the importance of following the correct flow of the
process.

Scenario: In his TLE class, Jacob was tasked to survey 150 students of his
university in order to know their average meal budget during lunchtime so that
they may use the data in planning the cost of their lunch products for their
project.
Identify: Jacob identified the data needed for his task. He needs to survey
150 students to know their average meal budget. Therefore, Jacob needs the values
or amounts that correspond to the individual lunch budgets of the students.
Collect: Jacob started surveying among the students that pass by the hallway
near their homeroom. After a while, he was able to survey 150 students.
Clean: After taking a good look into the data, Jacob noticed that two of the
students answered that they do not have a budget for lunch because they bring
home-packed lunch. Also, there are three students that responded with a very
high amount, almost equivalent to the budget of about seven students. Jacob
thought that these entries could largely affect the result of their survey, so he
decides to discard them.

And since the data analysis process typically moves through several
iterative phases, Jacob decides to replace the discarded responses
with new responses, thus, Jacob surveys again.
Analyze: After all entries are deemed useful for the study, Jacob uses software
to calculate the mean of the entries to get the average amount the 150 students
spend on their lunch meal.
Interpret: Once Jacob successfully obtains the value he is looking for, he
concludes that the production cost and selling price of their product should
not be set far greater than the average they got, to ensure that students can
afford the product.

Types of Data Analysis

Descriptive Analysis. Descriptive analysis tells us what happened. This type of analysis
helps describe or summarize quantitative data by presenting statistics. For example,
descriptive statistical analysis could show the distribution of sales across a group of
employees and the average sales figure per employee. Descriptive Analysis answers the
question, “what happened?”

Diagnostic Analysis. If the descriptive analysis determines the “what,” diagnostic
analysis determines the “why.” Let’s say a descriptive analysis shows an unusual influx of
patients in a hospital. Drilling into the data further might reveal that many of these
patients shared symptoms of a particular virus. This diagnostic analysis can help you
determine that an infectious agent led to the influx of patients. Diagnostic Analysis
answers the question, “why did it happen?”

Predictive Analysis. So far, we’ve looked at types of analysis that examine and draw
conclusions about the past. Predictive analytics uses data to form projections about the
future. Using predictive analysis, you might notice that a given product has had its best
sales during the months of September and October each year, leading you to predict a
similar high point during the upcoming year. Predictive Analysis answers the question,
“what might happen in the future?”

Prescriptive Analysis. Prescriptive analysis takes all the insights gathered from the
first three types of analysis and uses them to form recommendations for how a company
should act. Using our previous example, this type of analysis might suggest a market plan
to build on the success of the high sales months and harness new growth opportunities in
the slower months. Prescriptive Analysis answers the question, “what should we do about
it?”

Lesson 1.4: Divisions of Statistics

Statistics is largely divided into two types: descriptive and inferential.

Descriptive Statistics

Descriptive Statistics is concerned with the collection and presentation of data. As
the name itself suggests, it aims to describe the data in summary. This type of statistics
involves the use of graphs and charts, and summary measures or averages, to organize,
represent, and explain a set of data.

Data are typically arranged and displayed in tables or in summarizing graphs such
as histograms, pie charts, bar charts, or scatter plots. Descriptive Statistics are just
descriptive and thus do not require generalization beyond the data collected.

For example, an entrepreneur wants to know the status of his sales for the
past three months so he can make a prediction about his sales for the next two
months. The entrepreneur may use charts or graphs to organize and summarize
his sales from the last three months. By doing so, he can descriptively observe
the trend in his sales and can predict his performance for the next two months.

Inferential Statistics

Inferential Statistics is concerned with the analysis and interpretation of the data set.
It aims to arrive at a conclusion based on evidence or reasoning. In simpler words, this
type of statistics tries to interpret the meaning of the descriptive statistics. After the data
have been collected, summarized, and analyzed, we use inferential statistics to explain the
meaning of the collected data.

Inferential Statistics use principles of probability to assess whether trends observed
in the research sample can be generalized to the larger population from which the
sample originally comes.
Inferential Statistics are intended to test hypotheses and investigate relationships
between variables and can be used to make population predictions.
Inferential Statistics are used to draw conclusions and inferences, i.e., to make valid
generalizations from samples.

For example, a teacher wants to determine whether her gamified activities
in Mathematics helped in improving the scores of her students in Geometry. The
teacher may gather information such as the scores of the students before and
after the gamified activities were introduced and use inferential statistics to test
whether the activities affected the students’ scores in Geometry.

Lesson 1.5: Types of Descriptive and Inferential Statistics

Types of Descriptive Statistics

There are different types of Descriptive Statistics which you can use in describing
and presenting a set of data. Some of the commonly used Descriptive Statistics are the
following:

Frequency Distribution
Measures of Central Tendency
Measures of Dispersion/Variability
Measures of Shape
Measures of Location/Position
Measures of Association/Correlation
Ratios, Rates, Proportions, and Percentages

Frequency Distributions

Frequency distribution in Statistics represents the number of times a value or an
outcome repeats itself over a period of time during an event. It is featured either as a graph
or a table. This representation helps data analysts assess the frequency of a value over an
interval during an instance and determine the possibility of such outcomes occurring
again under similar circumstances.

The figure above shows a classic frequency distribution that displays both the tally
marks obtained while the enumerator records the individual entries and the frequency
count opposite each tally mark. Notice that the observations are grouped in particular
intervals. This works by putting a tally mark (equivalent to 1 frequency count) whenever
an observation (or score) falls within a particular interval. This is particularly true for
grouped data. For ungrouped data, the same process applies except the observations are
not grouped, but rather listed one by one.
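The tallying described above can be sketched in Python using the standard library; the scores below are hypothetical, and intervals of width 10 are assumed:

```python
from collections import Counter

# Hypothetical test scores to be tallied into intervals of width 10
scores = [12, 15, 18, 21, 22, 25, 27, 31, 33, 35, 38, 41]

# Map each score to the lower bound of its interval and count occurrences
tally = Counter((score // 10) * 10 for score in scores)

for lower in sorted(tally):
    print(f"{lower}-{lower + 9}: {tally[lower]}")
# 10-19: 3
# 20-29: 4
# 30-39: 4
# 40-49: 1
```

For ungrouped data, the same counting works on the raw values themselves, i.e. `Counter(scores)`.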

Measures of Central Tendency

A measure of central tendency is a single value that attempts to describe a set of data
by identifying the central position within that set of data. As such, measures of central
tendency are sometimes called measures of central location. They are also classed as
summary Statistics. The mean (often called the average) is most likely the measure of
central tendency that you are most familiar with, but there are others, such as the median
and the mode.

Mean refers to the arithmetic mean of a dataset which is the sum of all values
divided by the total number of values. It is the most commonly used measure of
central tendency because all values are used in the calculation.

Median of a dataset is the value that is exactly in the middle when it is ordered
from low to high. For an odd-numbered data set, it is exactly the middle number when
the data set is arranged, but for an even-numbered data set, it is the average of the
two middle values.

Mode is the most frequently occurring value in the dataset. It is possible to have
no mode, one mode, or more than one mode. To find the mode, sort your dataset
numerically or categorically and select the response that occurs most frequently.

The figure above shows how the measures of central tendency are described in a
normal distribution (symmetric) and skewed (asymmetric and distorted) distributions. It
can be observed that for a normal or symmetric distribution, the values of the three
measures of central tendency are (or almost) equal.
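The three measures can be computed directly with Python’s standard `statistics` module; the data set below is hypothetical:

```python
import statistics

data = [4, 7, 7, 9, 12]            # hypothetical ungrouped data

mean = statistics.mean(data)       # (4 + 7 + 7 + 9 + 12) / 5 = 7.8
median = statistics.median(data)   # middle value of the sorted set = 7
mode = statistics.mode(data)       # most frequently occurring value = 7

print(mean, median, mode)  # 7.8 7 7
```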

Measures of Variability or Dispersion

Variability refers to how spread scores are in a distribution out; that is, it refers to
the amount of spread of the scores around the mean. It tells you how scattered or
dispersed your scores are in a data set. Variability is commonly measured using range,
mean absolute deviation, variance, and standard deviation.

The figure above illustrates the different circumstances that can be suggested
following a particular measure of variability or dispersion. When the value is too high, it
usually means that the individual scores are more spread out, and that they deviate too
far from the supposed mean of the set (just like the dot plot in C). When the measure is
lower, it means that the individual scores are more clustered, and nearer to the supposed
mean (just like the dot plot in A).
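The four measures of dispersion can be sketched with the standard `statistics` module. Population formulas are used here; sample formulas divide by n - 1 instead (`statistics.variance` and `statistics.stdev`). The data are hypothetical:

```python
import statistics

data = [4, 7, 7, 9, 12]                 # hypothetical data set
mean = statistics.mean(data)            # 7.8

data_range = max(data) - min(data)      # 12 - 4 = 8
mad = sum(abs(x - mean) for x in data) / len(data)   # mean absolute deviation
variance = statistics.pvariance(data)   # population variance
std_dev = statistics.pstdev(data)       # population standard deviation

print(data_range, mad, variance, std_dev)
```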

Measures of Position or Location

Measures of position give a range where a certain percentage of the data fall. The
measures we consider here are called quantiles, and they can be classified as quartiles
(which divide the set into 4 equal portions), deciles (which divide the set into 10 equal
portions), or percentiles (which divide the set into 100 equal portions).

The figure above shows how quartiles divide a data set into 4 equal parts and
determine values that correspond to each part. As shown above, the second quartile,
which is also the median, lies in the middle of 5 and 6, which is clearly 5.5. This means
that 50% of the distribution is located below 5.5 and the other 50% is located above it.
The same is true for the first or lower quartile, below which the lower 25% and above
which the upper 75% of the distribution are located, and for the third or upper quartile,
below which the lower 75% and above which the upper 25% are located.
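Assuming the data set in the example is the integers 1 through 10, the quartiles can be reproduced with `statistics.quantiles` (available in Python 3.8+):

```python
import statistics

data = list(range(1, 11))  # the integers 1 through 10

# method="inclusive" treats the data as a whole population
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
print(q1, q2, q3)  # 3.25 5.5 7.75
```

Note that other quantile conventions (e.g. `method="exclusive"` or spreadsheet formulas) can give slightly different cut points for the same data.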

Measures of Shape

Histograms can give you an illustration of the shape of the graph of the data, but
two numerical measures of shape give a more precise evaluation: skewness tells you the
amount and direction of skew (deviation from vertical symmetry; distortion), and
kurtosis tells you how tall and sharp the central peak is, relative to a standard bell curve.

The previous figure shows the two types of skewness. Skewness is defined as the
deviation of the distribution from the normal bell curve. When the data are normal, they
follow a bell curve and the skewness is zero. When the skewness is negative, the
distribution is negatively skewed (or left-tailed), indicating that the scores cluster on the
right side of the normal bell curve. When the skewness is positive, the distribution is
positively skewed (or right-tailed), suggesting that the majority of the scores cluster on
the left side of the normal bell curve. For better understanding: a distribution of age at
death is negatively skewed because the majority die old (at a greater age), and a
distribution of scores from a very difficult exam is positively skewed because many will
tend to score lower due to the difficulty of the exam.

The figure above shows the three types of kurtosis: leptokurtic, mesokurtic, and
platykurtic. When the distribution is leptokurtic, its curve is characterized by a tall peak
and thin, elongated tails which suggests that the probability of the distribution to contain
extreme outliers is high. A mesokurtic distribution is the normal distribution
characterized by a medium peak with a moderate probability of containing outliers in the
data set. A platykurtic distribution is one with a lower or less prominent peak which
suggests that there is a very low likelihood that extreme outliers are in the dataset.
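Moment-based skewness and kurtosis can be sketched in pure Python from their definitions. Under the convention used here, a normal (mesokurtic) curve has kurtosis of about 3; some libraries instead report "excess" kurtosis with 3 subtracted. The data set is hypothetical:

```python
import statistics

def skewness(data):
    """Third standardized moment (population convention)."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 3 for x in data) / (n * s ** 3)

def kurtosis(data):
    """Fourth standardized moment; about 3 for a normal curve."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    n = len(data)
    return sum((x - m) ** 4 for x in data) / (n * s ** 4)

symmetric = [1, 2, 3, 4, 5]          # hypothetical symmetric data
print(skewness(symmetric))           # 0.0 for a symmetric set
```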

Measures of Association

The most commonly used techniques for investigating the association (or more
commonly relationship) between variables are correlation and regression. Correlation
quantifies the strength of the linear relationship between a pair of variables, whereas
regression expresses the relationship in the form of an equation that can allow you to
predict a resulting dependent variable using an independent variable. There are several
tests of relationship depending on the nature of data. When you have a pair of quantitative
data, you can use Pearson’s correlation and if you have a pair of variables wherein one of
them is qualitative, or even if they are both qualitative, then you may use Spearman’s
Rank correlation.

The figure above shows how different types of correlations can be illustrated. When
the correlation coefficient is negative, as one variable goes up, the other variable goes
down (and vice versa). When it is positive, both variables move in the same direction
(they go up or down together). A correlation coefficient of –1 or 1 indicates a perfect
correlation, in which the points fall exactly on a straight line. When the coefficient is
zero, there is no correlation between the variables. A coefficient of –0.5 or 0.5 suggests
a moderate correlation.

Below is a table that can be used in interpreting the correlation coefficient for
both Pearson's and Spearman's rank correlation. Note that the cutoff values may differ
from source to source, but they are generally very close to one another.

Interpretation Table for Pearson’s and Spearman’s Rank Correlation Coefficient
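Since the exact cutoffs vary by reference, the mapping below is only one common convention, sketched as a small helper function; the band boundaries are illustrative, not authoritative.

```python
def interpret_correlation(r):
    """Map a correlation coefficient to a verbal label using one common
    convention; cutoff values vary slightly across textbooks."""
    a = abs(r)
    if a == 1.0:
        strength = "perfect"
    elif a >= 0.80:
        strength = "very strong"
    elif a >= 0.60:
        strength = "strong"
    elif a >= 0.40:
        strength = "moderate"
    elif a >= 0.20:
        strength = "weak"
    elif a > 0.0:
        strength = "very weak"
    else:
        return "no correlation"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction} correlation"

print(interpret_correlation(0.5))    # moderate positive correlation
print(interpret_correlation(-1.0))   # perfect negative correlation
```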

Ratios, Rates, Proportions, and Percentages


Ratios, rates, proportions, and percentages are useful tools for comparing numbers
or values with respect to other values. They provide an immediate description of a set of
data through simple calculations such as quotients and percentages.
Ratio: a relationship between two numbers, expressed as a quotient in which the
numerator and the denominator have the same unit. A ratio can be written in three
ways: "a to b", "a:b", or "a/b".
Rate: a ratio of two quantities with different units.
Proportion: an equation stating that two ratios (or rates) are equal.
Percentage: a fraction whose denominator is 100.
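The four concepts above can be illustrated with a short computation; the class sizes and distances below are made-up figures used only for the sketch.

```python
from math import gcd

# Ratio: a class has 12 girls and 18 boys; the ratio of girls to boys is
# 12:18, which simplifies to 2:3 (same unit, "students", in both terms).
girls, boys = 12, 18
g = gcd(girls, boys)
print(f"{girls // g}:{boys // g}")   # 2:3

# Rate: 150 km travelled in 3 hours (different units: km and hours).
speed = 150 / 3                      # 50 km per hour

# Proportion: 12/30 = x/100; cross-multiplying solves for x.
x = 12 * 100 / 30

# Percentage: girls as a fraction of the class, with denominator 100.
pct = girls / (girls + boys) * 100
print(f"{pct:.0f}%")                 # 40%
```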

Types of Inferential Statistics

The following types of inferential statistics may be used to draw inferences from
your data. There are more advanced types of inferential statistics, but the ones below
are the most commonly used in basic data analysis:

Test of Difference
Parametric Test (z-test, t-test, F-test)
Non-parametric Test (McNemar test, Median test, Kruskal-Wallis test, Chi-
square test)

Test of Relationship
Parametric Test (Pearson's r coefficient, regression)
Non-parametric Test (Phi coefficient, Chi-square test, Kendall's Tau coefficient)
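As a taste of how a parametric test of difference works, the pooled-variance independent-samples t statistic can be sketched in pure Python; the two groups of scores below are made up for illustration, and a full test would compare |t| against a critical value from a t table.

```python
def two_sample_t(a, b):
    """Pooled-variance independent-samples t statistic."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    # Sample variances (divide by n - 1).
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    # Pooled variance and standard error of the mean difference.
    sp2 = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    se = (sp2 * (1 / na + 1 / nb)) ** 0.5
    return (ma - mb) / se

group_a = [85, 88, 90, 92, 95]   # hypothetical scores, group A
group_b = [78, 80, 82, 84, 86]   # hypothetical scores, group B

t = two_sample_t(group_a, group_b)
# Compare |t| with the critical value for df = na + nb - 2 = 8
# at the chosen significance level to decide on the hypothesis.
print(round(t, 3))
```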

Exercises

Do as indicated. Provide answers to questions on the blanks provided and supply
examples whenever necessary. Write legibly and answer in a precise yet comprehensive
manner.
1. Reflect on your daily activities and find some connection between these activities and
Statistics/Data Analysis. How do you think these sciences help you in your daily living?
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________
____________________________________________________________

2. Below are the great men of Statistics who contributed greatly to the development of
the subject. Review the discussion regarding its history and try to summarize their
contributions using the given diagram.

Carl Friedrich Gauss

John Graunt

Abraham de Moivre

Pierre-Simon Laplace

Adolphe Quetelet

Francis Galton

Karl Pearson

William Gosset

Ronald Fisher

3. Refer to the discussion regarding the elements of Statistics. Identify what element is
being referred to in the following statements/questions.
______________a. Taylor is conducting a survey among the engineering students of
Ateneo de Manila. Using cluster sampling, she was able to obtain
three sections as her sample. What do you call the group of
engineering students of Ateneo de Manila?
______________b. In an attempt to estimate how much a street vendor earns
in a day, Adele decided to survey 100 street vendors in
Quiapo. What do you call the group of those 100 street vendors?
______________c. Mariah interviewed Beyonce for her research paper. What role did
Beyonce play in this scenario?
______________d. Miley was tasked with conducting her group's study. What role is
Miley playing in the process of administering the data collection?
______________e. In her study, Cher revealed that after surveying all singers in their
company, she found out that the average monthly income of a
singer in the company is 1000 dollars. The recorded average of
1000 dollars is a value that represents what?
______________f. When Cher decided to survey only the female singers, the average
rose to almost 1080 dollars. In this case, 1080 dollars is a value
that represents what?
______________g. What type of data should be given if one asks you for your favorite
color?
______________h. This can refer to a number or a name, or even your hometown.
______________i. These are methods in selecting your desired respondent,
participant, or informant.
______________j. If a person asked you about your body mass index, what type of
data does that person ask you?
4. Provide scenarios that depict the four types of Data Analysis.
Descriptive Analysis.
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________

Diagnostic Analysis.
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________

Predictive Analysis.
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________

Prescriptive Analysis.
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________
_______________________________________________________

5. Create your own scenario that showcases the Data Analysis Process.
