
CHAPTER II

ANALYSIS AND INTERPRETATION OF DATA
• A. Descriptive Statistics
• B. Non-Parametric Test
• C. Chi-Square
• D. T-test
• E. Correlational Analysis
• F. Analysis of Variance
• G. Factor Analysis
• H. Other Appropriate Statistical Method
• DATA

• ANALYSIS

• INTERPRETATION
• DATA: any information that has been collected, observed,
generated, or created to validate original research findings.

• ANALYSIS: the ordering, manipulating, and summarizing of data
to obtain answers to research questions.

• INTERPRETATION: gives the results of analysis, makes inferences
pertinent to the research relations studied, and draws conclusions
about these relations.
DATA ANALYSIS AND INTERPRETATION is the next stage after
collecting data through empirical methods. The
dividing line between the analysis of data and its
interpretation is difficult to draw, as the two
processes are symbiotic and merge
imperceptibly. Interpretation is inextricably
interwoven with analysis.
What is Data Analysis?
• Data analysis is described “as the process of
bringing order, structure, and meaning” to the
collected data. The data analysis aims to discover
patterns or regularities by observing, exploring,
organizing, transforming, and modeling the
collected data.
Five types of Data Analysis:

• Descriptive Analysis
• Diagnostic Analysis
• Predictive Analysis
• Prescriptive Analysis
• Cognitive Analysis
1. Descriptive Analysis: What has happened?
The foundational step simply looks at past data and tells what has
happened. This analysis helps in understanding what the data shows;
it does not make predictions or explain why something has happened.

2. Diagnostic Analysis: Why has it happened?
Diagnostic analysis digs further, often through detailed, informative,
dynamic, and interactive dashboards, to answer why something happened.
It isolates the root cause of the problem and identifies the source of
the patterns.
3. Predictive Analysis: What is likely to happen?
It estimates the likelihood of an event and supports tasks such as
forecasting a measurable amount, assessing risk, and segmenting
customers into groups. Building on the earlier descriptive summaries
and root-cause analysis, the models use statistics and machine
learning algorithms to predict future outcomes.
4. Prescriptive Analysis: How to make it happen?
It combines the learnings about what has happened and why with
what is likely to happen, to help decide which measures will maximize
the primary business metrics. Prescriptive analysis does not predict
one standalone event but a collection of future events,
using simulation and optimization.
5. Cognitive Analysis: Mimicking the human brain to carry out tasks
It combines technologies such as artificial intelligence, semantics,
machine learning, and deep learning algorithms. It learns and even
generates data using the already available data and retrieves
features and hidden patterns. Real-time cognitive analysis is
heavily employed in image classification and segmentation,
object detection, machine translation, virtual assistants, and
chatbots.
What is Data Interpretation?
Data interpretation is the process of assigning meaning to the processed
and analyzed data. It enables us to draw informed and meaningful
conclusions and implications, infer the significance of the relationships
between variables, and explain the patterns in the data.
• Numerical data points and categorical data points require different
methods of explanation; hence, the nature of the data determines the
interpretation technique.
• There are two primary techniques for understanding and
interpreting data: quantitative and qualitative.
Importance of Data Analysis and Interpretation

• 1. Informed decision-making
• 2. Identification of trends and forecasting needs
• 3. Cost efficiency
• 4. Clear Insights
A. Descriptive Statistics

• Descriptive statistics refers to a branch of statistics that
involves summarizing, organizing, and presenting data
meaningfully and concisely. It focuses on describing and
analyzing a dataset's main features and characteristics without
making any generalizations or inferences to a larger
population.

• The primary goal of descriptive statistics is to provide a clear
and concise summary of the data, enabling researchers or
analysts to gain insights and understand patterns, trends, and
distributions within the dataset. This summary typically
includes measures such as central tendency (e.g., mean,
median, mode), dispersion (e.g., range, variance, standard
deviation), and shape of the distribution (e.g., skewness,
kurtosis).

• Descriptive statistics also involves a graphical representation
of data through charts, graphs, and tables, which can further
aid in visualizing and interpreting the information. Common
graphical techniques include histograms, bar charts, pie charts,
scatter plots, and box plots.
• By employing descriptive statistics, researchers can effectively
summarize and communicate the key characteristics of a
dataset, facilitating a better understanding of the data and
providing a foundation for further statistical analysis or
decision-making processes.
TYPES OF DESCRIPTIVE STATISTICS

1) Distribution (Also Called Frequency Distribution)
• A dataset consists of a distribution of scores or values. Statisticians
summarize the frequency of every possible value of a variable, rendered in
percentages or numbers. For instance, if you held a poll to determine
people’s favorite Beatle, you’d set up one column with all possible values
(John, Paul, George, and Ringo), and another with the number of votes.
• Statisticians depict frequency distributions as either a graph or a table.
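
A frequency table like the Beatles poll can be sketched in Python with `collections.Counter`. The ballots below are invented purely for illustration; they are not data from this chapter.

```python
from collections import Counter

# Hypothetical poll responses (invented for illustration).
votes = ["John", "Paul", "Paul", "George", "Ringo", "John", "Paul", "George"]

# Count how often each value of the variable occurs.
frequency = Counter(votes)
total = sum(frequency.values())

# Print the table as counts and as percentages.
for name, count in frequency.most_common():
    print(f"{name:<8}{count:>3}{count / total:>8.1%}")
```

The first column holds the possible values and the other two hold the same frequencies expressed as numbers and as percentages.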
2) Measures of Central Tendency

• Measures of central tendency estimate a dataset's average or center, finding the result using three methods: mean,
mode, and median.

• Mean: The mean is also known as “M” and is the most common method for finding averages. You get the mean by
adding all the response values together and dividing the sum by the number of responses, or “N.” For instance, say
someone is trying to figure out how many hours a day they sleep in a week. The data set would be the hour entries
(e.g., 6, 8, 7, 10, 8, 4, 9), and the sum of those values is 52. There are seven responses, so N = 7. You divide the sum
of 52 by N, or 7, to find M, which in this instance is about 7.4.

• Mode: The mode is just the most frequent response value. Datasets may have any number of modes, including “zero.”
You can find the mode by arranging your dataset's order from the lowest to highest value and then looking for the
most common response. So, in using our sleep study from the last part: 4,6,7,8,8,9,10. As you can see, the mode is
eight.

• Median: Finally, we have the median, defined as the value in the precise center of the dataset. Arrange the values in
ascending order (like we did for the mode) and look for the number in the set’s middle. In this case, the median is
eight.
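
The three measures can be checked with Python's built-in `statistics` module. This is only a small sketch using the sleep entries from the example above.

```python
import statistics

hours = [6, 8, 7, 10, 8, 4, 9]  # sleep entries from the example

mean = statistics.mean(hours)      # sum of the values divided by N
mode = statistics.mode(hours)      # most frequent value
median = statistics.median(hours)  # middle value after sorting

print(sorted(hours))
print(round(mean, 2), mode, median)
```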

3) Variability (Also Called Dispersion)


• Measures of variability give the statistician an idea of how spread out the responses
are. The spread has three aspects: range, standard deviation, and variance.
• Range: Use range to determine how far apart the most extreme values are. Start by
subtracting the dataset’s lowest value from its highest value. Once again, we turn to our
sleep study: 4, 6, 7, 8, 8, 9, 10. We subtract four (the lowest) from ten (the highest) and get
six. There’s your range.
• Standard Deviation: This aspect takes a little more work. The standard deviation (s) is
your dataset’s average amount of variability, showing you how far each score lies from
the mean.
• Variance: The variance is the average of the squared differences from the mean; the
standard deviation is its square root.
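
As a rough sketch, the range and standard deviation of the sleep data can be computed directly from their definitions. The choice of the sample formula (dividing by N - 1) is an assumption here, since the text does not say which divisor it intends.

```python
import math

hours = [4, 6, 7, 8, 8, 9, 10]  # sleep entries from the example

data_range = max(hours) - min(hours)
mean = sum(hours) / len(hours)

# Sum of the squared distances of each score from the mean.
squared_deviations = sum((x - mean) ** 2 for x in hours)

# Dividing by N - 1 gives the sample variance; dividing by N would
# give the population variance instead.
sample_variance = squared_deviations / (len(hours) - 1)
sample_sd = math.sqrt(sample_variance)

print(data_range, round(sample_variance, 2), round(sample_sd, 2))
# prints: 6 3.95 1.99
```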
4) Univariate Descriptive Statistics
• Univariate descriptive statistics examine only one variable at a time and do not compare variables.
Rather, they allow the researcher to describe individual variables. As a result, this sort of analysis is
purely descriptive. The patterns identified in this sort of data may be described
using the following:
• Measures of central tendency (mean, mode, and median)
• Data dispersion (standard deviation, variance, range, minimum, maximum, and quartiles)
• Frequency distribution tables
• Pie graphs
• Frequency polygons and histograms
• Bar graphs

5) Bivariate Descriptive Statistics


• When using bivariate descriptive statistics, two variables are concurrently analyzed
(compared) to see whether they are correlated. Generally, by convention, the
independent variable is represented by the columns, and the rows represent the
dependent variable.
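
One simple bivariate summary is a cross-tabulation. The sketch below lays it out with the independent variable in the columns and the dependent variable in the rows; the observations and the category names ("tutored", "pass", and so on) are hypothetical.

```python
from collections import Counter

# Hypothetical observations as (independent, dependent) pairs,
# invented for illustration.
observations = [
    ("tutored", "pass"), ("tutored", "pass"), ("tutored", "fail"),
    ("not tutored", "pass"), ("not tutored", "fail"), ("not tutored", "fail"),
]

counts = Counter(observations)
columns = ["tutored", "not tutored"]  # independent variable
rows = ["pass", "fail"]               # dependent variable

# Print the table with the independent variable across the columns.
print(f"{'':<6}" + "".join(f"{c:>13}" for c in columns))
for r in rows:
    print(f"{r:<6}" + "".join(f"{counts[(c, r)]:>13}" for c in columns))
```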
Example: Exam Scores
Suppose you have the following scores of 20 students on an exam:
85, 90, 75, 92, 88, 79, 83, 95, 87, 91, 78, 86, 89, 94, 82, 80, 84, 93, 88, 81
• To calculate descriptive statistics:
• Mean: Add up all the scores and divide by the number of scores. Mean = (85 + 90 +
75 + 92 + 88 + 79 + 83 + 95 + 87 + 91 + 78 + 86 + 89 + 94 + 82 + 80 + 84 + 93 + 88
+ 81) / 20 = 1720 / 20 = 86
• Median: Arrange the scores in ascending order and find the middle value. With 20
scores, the median is the average of the 10th and 11th values. Median = (86 + 87) / 2 = 86.5
• Mode: Identify the score(s) that appear(s) most frequently. Mode = 88
• Range: Calculate the difference between the highest and lowest scores.
Range = 95 - 75 = 20
• Variance: Calculate the average of the squared differences from the
mean. Variance = [(85-86)^2 + (90-86)^2 + ... + (81-86)^2] / 20 = 614 / 20 =
30.7
• Standard Deviation: Take the square root of the variance. Standard
Deviation = √30.7 ≈ 5.54
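
The hand calculations for the 20 exam scores can be checked with a short script. This sketch uses Python's `statistics` module and assumes the population formulas (dividing by N), as in the variance step of the example.

```python
import statistics

scores = [85, 90, 75, 92, 88, 79, 83, 95, 87, 91,
          78, 86, 89, 94, 82, 80, 84, 93, 88, 81]

summary = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
    "range": max(scores) - min(scores),
    # pvariance and pstdev divide by N, matching the example's formula.
    "variance": statistics.pvariance(scores),
    "std_dev": statistics.pstdev(scores),
}

for name, value in summary.items():
    print(f"{name:>8}: {value:.2f}")
```

Using `statistics.variance` and `statistics.stdev` instead would divide by N - 1 and give slightly larger values.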

What is the Main Purpose of Descriptive Statistics?


Descriptive statistics can be useful for two things:
1.) providing basic information about variables in a dataset and
2.) highlighting potential relationships between variables.
The three most common descriptive statistics can also be displayed
graphically or pictorially, which helps to summarize the data.
Descriptive statistics only make statements about
the data set used to calculate them; they never go beyond your data.
B. NON-PARAMETRIC TESTS

• Non-parametric tests are mathematical methods used in statistical
hypothesis testing that do not make assumptions about the frequency
distribution of the variables being evaluated. They are used when the
data are skewed, and they comprise techniques that do not depend on
the data following any particular distribution.
Type of Tests

1.) Mann-Whitney U Test


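• This test determines the difference between two independent groups,
on the condition that the dependent variable is either continuous or
ordinal. If r1 and r2 are the sums of the ranks in group 1 and group 2
respectively, and n1 and n2 are the group sizes, then the test statistic ‘U’ is given as:

$$U_1 = n_1 n_2 + \frac{n_1(n_1+1)}{2} - r_1, \qquad U_2 = n_1 n_2 + \frac{n_2(n_2+1)}{2} - r_2, \qquad U = \min(U_1, U_2)$$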
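
The rank-sum calculation can be sketched as follows. The two groups are hypothetical, `rank_values` is a helper written for this illustration, and the statistic is taken as the smaller of U1 = n1·n2 + n1(n1+1)/2 - r1 and U2 = n1·n2 + n2(n2+1)/2 - r2, which is one common convention.

```python
def rank_values(values):
    """Return 1-based ranks, giving tied values their average rank."""
    ordered = sorted(values)
    ranks = {}
    for v in set(ordered):
        first = ordered.index(v) + 1         # first position of v
        last = first + ordered.count(v) - 1  # last position of v
        ranks[v] = (first + last) / 2
    return [ranks[v] for v in values]

# Hypothetical scores for two independent groups.
group1 = [3, 4, 2, 6, 2, 5]
group2 = [9, 7, 5, 10, 6, 8]

ranks = rank_values(group1 + group2)  # rank both groups together
n1, n2 = len(group1), len(group2)
r1 = sum(ranks[:n1])  # rank sum of group 1
r2 = sum(ranks[n1:])  # rank sum of group 2

u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)
print(r1, r2, u)
# prints: 23.0 55.0 2.0
```

A small U means the two groups' ranks barely overlap; the value would then be compared against a critical value or converted to a p-value.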
2.) Wilcoxon Signed Rank Test
• The Wilcoxon Signed Rank Test is a nonparametric counterpart of the paired
samples t-test. The test compares two dependent samples with ordinal data.
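
A minimal sketch of the signed-rank calculation is shown below. The paired values are hypothetical, and taking the smaller of the two rank sums as the statistic is one common convention rather than the only one.

```python
# Hypothetical paired measurements (same subjects, two conditions).
before = [125, 115, 130, 140, 140, 115, 140, 125, 140, 135]
after = [110, 122, 125, 120, 140, 124, 123, 137, 135, 145]

# Paired differences; pairs with zero difference are dropped.
diffs = [b - a for b, a in zip(before, after) if b != a]

# Rank the absolute differences, averaging ranks across ties.
abs_sorted = sorted(abs(d) for d in diffs)

def rank_of(value):
    first = abs_sorted.index(value) + 1
    last = first + abs_sorted.count(value) - 1
    return (first + last) / 2

w_plus = sum(rank_of(abs(d)) for d in diffs if d > 0)   # positive ranks
w_minus = sum(rank_of(abs(d)) for d in diffs if d < 0)  # negative ranks
w = min(w_plus, w_minus)  # test statistic
print(w_plus, w_minus, w)
# prints: 27.0 18.0 18.0
```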
3.) Kruskal-Wallis Test
• This test helps determine whether the medians of two or more groups
are distinct. The ranks of the data points are used in the calculations rather than
the data points themselves.
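
The rank-based calculation can be illustrated with the usual H statistic, H = 12 / (N(N+1)) · Σ(R_j² / n_j) - 3(N+1), where R_j is the rank sum of group j and n_j its size. The three groups below are hypothetical, and the sketch assumes no tied values, so no tie correction is applied.

```python
# Hypothetical scores for three independent groups (no tied values).
groups = {
    "A": [12, 15, 18],
    "B": [14, 20, 25],
    "C": [16, 22, 24],
}

# Rank all observations together, 1 = smallest.
pooled = sorted(x for values in groups.values() for x in values)
rank = {value: i + 1 for i, value in enumerate(pooled)}

n_total = len(pooled)
# Sum over groups of (rank sum squared / group size).
weighted = sum(
    sum(rank[x] for x in values) ** 2 / len(values)
    for values in groups.values()
)
h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)
print(round(h, 3))
# prints: 2.489
```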
• Advantages and Disadvantages of Non-Parametric Test
Advantages
• Easily understandable
• Short calculations
• Assumption of distribution is not required
• Applicable to all types of data
Disadvantages
• Less efficient than parametric tests
• The results may or may not provide an accurate answer, because the tests are
distribution free
Reasons to Use Nonparametric Tests
1. The underlying data do not meet the assumptions about the population sample
- Generally, the application of parametric tests requires various assumptions to be satisfied, for example
that the data follow a normal distribution and that the population variance is homogeneous. However, some
data samples may show skewed distributions.
2. The population sample size is too small
- The sample size is an important consideration in selecting the appropriate statistical method. If a sample
is reasonably large, the applicable parametric test can be used. If it is too small to check the
distribution of the data, a nonparametric test is the safer choice.
3. The analyzed data is ordinal or nominal
- For such types of variables, nonparametric tests are the only appropriate solution.
C. CHI-SQUARE

• The chi-square test is a statistical procedure for determining the
difference between observed and expected data. It can also be
used to determine whether the categorical variables in our data
are associated. It helps to find out whether a difference
between two categorical variables is due to chance or to a
relationship between them.
• Categorical variables belong to a subset of variables that can be
divided into discrete categories. Names or labels are the most
common categories. These variables are also known as qualitative
variables because they depict the variable's quality or
characteristics.
• Categorical variables can be divided into two categories:
1.) Nominal variable: a variable whose categories have no natural
ordering.
2.) Ordinal variable: a variable whose categories can be sorted.
Formula for the chi-square test:

$$\chi^2_c = \sum_i \frac{(O_i - E_i)^2}{E_i}$$

Where
• c = Degrees of freedom: the number of values in the calculation that
are free to vary.
• O = Observed values: the values you gather yourself.
• E = Expected values: the frequencies expected, based on
the null hypothesis.
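
As an illustration, the statistic sums (O - E)² / E over every cell of a contingency table. The 2x2 table below is hypothetical, and the expected counts assume a test of independence, where each expected value is (row total × column total) / grand total.

```python
# Hypothetical 2x2 contingency table of observed counts (O).
observed = [
    [30, 10],
    [20, 40],
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts (E) under the null hypothesis of independence.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over every cell of the table.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
# For a table of counts: (rows - 1) * (columns - 1).
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi_square, 2), degrees_of_freedom)
# prints: 16.67 1
```

The result would then be compared with the critical chi-square value for the given degrees of freedom.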
When to use a chi-square test?
A Pearson’s chi-square test may be an appropriate option for your data if all of
the following are true:
1.) You want to test a hypothesis about one or more categorical variables. If one
or more of your variables is quantitative, you should use a different statistical
test. Alternatively, you could convert the quantitative variable into a categorical
variable by separating the observations into intervals.
2.) The sample was randomly selected from the population.
3.) There are a minimum of five observations expected in each group or
combination of groups.
Important Characteristics Of A Chi Square Test
• This test (as a non-parametric test) is based on frequencies and not on the
parameters like mean and standard deviation.
• The test is used for testing the hypothesis and is not useful for estimation.
• This test can also be applied to a complex contingency table with several
classes and as such is a very useful test in research work.
• This test is an important non-parametric test because no rigid assumptions are
necessary regarding the type of population, no parameter values are needed,
and relatively little mathematical detail is involved.
D. T-TEST

• A t-test is a statistical test that is used to compare the means of two
groups. It is often used in hypothesis testing to determine
whether a process or treatment actually has an effect on the
population of interest, or whether two groups are different from
one another.

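• Formula of T-Test

• The formula for the two-sample t test (a.k.a. the Student’s t-test) is shown below.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

In this formula, t is the t value, $\bar{x}_1$ and $\bar{x}_2$ are the means of the two groups being compared, $s^2$ is the
pooled variance of the two groups (the full denominator is the pooled standard error), and $n_1$ and $n_2$ are the
number of observations in each of the groups.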
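
A minimal sketch of the two-sample calculation follows. The measurements are hypothetical, and the pooled variance is computed with the usual equal-variance formula, s² = ((n1 - 1)·s1² + (n2 - 1)·s2²) / (n1 + n2 - 2), which is an assumption not spelled out in the text.

```python
import math
import statistics

# Hypothetical measurements for two independent groups.
group1 = [20, 22, 19, 24, 25]
group2 = [28, 30, 27, 26, 29]

n1, n2 = len(group1), len(group2)
mean1, mean2 = statistics.mean(group1), statistics.mean(group2)

# Sample variances (divide by n - 1), combined into a pooled variance.
var1, var2 = statistics.variance(group1), statistics.variance(group2)
pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

# t = difference in means divided by the pooled standard error.
standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
t = (mean1 - mean2) / standard_error
degrees_of_freedom = n1 + n2 - 2
print(round(t, 2), degrees_of_freedom)
# prints: -4.47 8
```

The t value is then compared with the critical value of the t distribution for the given degrees of freedom.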

Types of T-Test
When choosing a t test, you will need to consider two things:
• whether the groups being compared come from a single
population or two different populations, and
• whether you want to test the difference in a specific direction.

• One-tailed t-test is a statistical test in which the critical area of the
distribution is one-sided, so that the alternative hypothesis is accepted if the
population parameter is either greater than or less than a certain value, but
not both.
• Two-tailed t-test is a test in which the critical area of the
distribution is two-sided, and the test is performed to determine whether the
population parameter is greater than or less than a specific
range of values.
• Independent t-test is a test used to compare the means of two independent
groups to determine whether there is statistical evidence that the population
means are significantly different.

• Application of T-Test
• The t-test compares the means of two samples, dependent or independent.
• It can also be used to determine if the sample mean is different from the
assumed mean.
• The t-test has an application in determining the confidence interval for a
sample mean.