Statistical Analysis

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by scientists, governments, businesses, and other organizations.
STEPS IN STATISTICAL ANALYSIS
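Step 1: Write your hypotheses and plan your research design.
Writing statistical hypotheses

• To collect valid data for statistical analysis, you first need to specify your hypotheses and plan out your research design.
• The goal of research is often to investigate a relationship between variables within a population. You start with a prediction and use statistical analysis to test that prediction. The prediction is usually stated as a pair of statistical hypotheses: a null hypothesis (there is no effect or relationship in the population) and an alternative hypothesis (there is an effect or relationship).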
Planning your research design
• A research design is your overall strategy for answering your research question using empirical data. It covers both data collection and analysis, and it determines the statistical tests you can use to test your hypothesis later on.
Types of Research Design
• Experimental design: assesses a cause-and-effect relationship (e.g., the effect of meditation on test scores) using statistical tests of comparison or regression.
• Correlational design: explores relationships between variables (e.g., parental income and GPA) without any assumption of causality, using correlation coefficients and significance tests.
• Descriptive design: studies the characteristics of a population or phenomenon (e.g., the prevalence of anxiety in U.S. college students) using statistical tests to draw inferences from sample data.
Measuring variables
For statistical analysis, it’s important to consider the level of measurement of your variables, which tells you what kind of data they contain:
Categorical data represents groupings. These may be nominal (e.g., gender) or ordinal (e.g., level of language ability).
Quantitative data represents amounts. These may be on an interval scale (e.g., test score) or a ratio scale (e.g., age).
Many variables can be measured at different levels of precision. For example, age data can be quantitative (8 years old) or categorical (young). If a variable is coded numerically (e.g., level of agreement from 1–5), that doesn’t automatically mean it’s quantitative rather than categorical.
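To illustrate why numeric codes do not make a variable quantitative, here is a minimal Python sketch using a hypothetical 1–5 agreement scale and made-up responses. The codes only record order, so the sketch summarizes them with the median and mode; taking the mean would treat the gaps between codes as equal, which an ordinal scale does not guarantee.

```python
from statistics import median_low, mode

# Hypothetical 5-point agreement scale: the numbers record order only.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neutral": 3,
    "agree": 4,
    "strongly agree": 5,
}
CODE_TO_LABEL = {code: label for label, code in LIKERT.items()}


def summarize_ordinal(responses):
    """Summarize ordinal responses with order-based statistics only."""
    codes = [LIKERT[r] for r in responses]
    return {
        # median_low always returns an observed code, so the result is a real category
        "median": CODE_TO_LABEL[median_low(codes)],
        # the mode needs only categories, so it also works for nominal data
        "mode": CODE_TO_LABEL[mode(codes)],
    }


responses = ["agree", "neutral", "agree", "strongly agree", "disagree"]
print(summarize_ordinal(responses))
```

Using `median_low` rather than `median` is a design choice: with an even number of responses, `median` could return a value halfway between two categories, which has no label on an ordinal scale.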
Step 2: Collect data from a sample
Statistical analysis allows you to apply your findings beyond your own sample, as long as you use appropriate sampling procedures. You should aim for a sample that is representative of the population.

COLLECTING DATA
A population is the entire group that you want to draw conclusions about.
A sample is the specific group that you will collect data from. It is a subset of the population, so in practice it is almost always smaller than the full population.
SAMPLING FOR STATISTICAL ANALYSIS
There are two main approaches to selecting a sample.
• Probability sampling: every member of the population has a known, non-zero chance of being selected for the study, through random selection.
• Non-probability sampling: some members of the population are more likely than others to be selected for the study because of criteria such as convenience or voluntary self-selection.
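Simple random sampling is the most basic form of probability sampling. The sketch below assumes you have a complete list of the population (a sampling frame); the student IDs are hypothetical. Every member has the same chance, k/N, of being selected, whereas simply taking the first 20 names on the list would be a non-probability sample.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that each has the same chance (k / N) of selection."""
    if not 0 <= k <= len(population):
        raise ValueError("k must be between 0 and the population size")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, k)


# Hypothetical sampling frame of 500 student IDs
population = [f"student-{i:03d}" for i in range(500)]
sample = simple_random_sample(population, 20, seed=42)
print(len(sample), sample[:3])
```

Recording the seed lets someone else reproduce exactly which members were drawn.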
CREATE AN APPROPRIATE SAMPLING PROCEDURE
Based on the resources available for your research, decide how you’ll recruit participants.
• Will you have resources to advertise your study widely, including outside of your university setting?
• Will you have the means to recruit a diverse sample that represents a broad population?
• Do you have time to contact and follow up with members of hard-to-reach groups?
CALCULATE SUFFICIENT SAMPLE SIZE
Before recruiting participants, decide on your sample size, either by looking at other studies in your field or by using statistics. A sample that’s too small may be unrepresentative of the population, while a sample that’s too large will be more costly than necessary.
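One common statistical approach, shown here as a sketch rather than the only method, is to choose n so that a confidence interval for a mean has a target margin of error E: n = (z·σ/E)², rounded up. This assumes the population standard deviation σ is known or estimated from earlier studies, and it uses a normal approximation; the numbers below are hypothetical. Python’s standard-library `NormalDist` supplies the critical value z.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n whose normal-approximation CI for a mean has half-width <= margin."""
    if sigma <= 0 or margin <= 0:
        raise ValueError("sigma and margin must be positive")
    # Two-sided critical value, about 1.96 for 95% confidence
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


# Hypothetical: test scores with an SD of about 15 points, estimated to within 3 points
print(sample_size_for_mean(sigma=15, margin=3))  # prints 97
```

Because n grows with 1/E², halving the margin of error roughly quadruples the required sample, which is why precision targets drive study cost.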
Step 3: Summarize your data with descriptive statistics.
Once you’ve collected all of your data, you can inspect them and calculate descriptive statistics that summarize them.
Inspect your data
There are various ways to inspect your data, including the following:
• Organizing data from each variable in frequency distribution tables.
• Displaying data from a key variable in a bar chart to view the distribution of responses.
• Visualizing the relationship between two variables using a scatter plot.
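A frequency distribution table simply counts how often each value occurs. Here is a minimal sketch with hypothetical survey answers, using `collections.Counter`; the same counts are what a bar chart of the responses would display.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(value, count, 100 * count / total) for value, count in counts.most_common()]


# Hypothetical responses to a single survey question
answers = ["yes", "no", "yes", "unsure", "yes", "no"]
for value, count, percent in frequency_table(answers):
    print(f"{value:<8}{count:>3}{percent:>7.1f}%")
```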
Calculate measures of central tendency
Measures of central tendency describe where most of the values in a data set lie. Three main measures of central tendency are often reported:
• Mode: the most frequent response or value in the data set.
• Median: the value in the exact middle of the data set when ordered from low to high (with an even number of values, the mean of the two middle values).
• Mean: the sum of all values divided by the number of values.
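Python’s standard `statistics` module covers all three measures. In this sketch the scores are hypothetical; `multimode` is used because a data set can have more than one mode.

```python
from statistics import mean, median, multimode


def central_tendency(values):
    """Return the three common measures of central tendency."""
    return {
        "mode": multimode(values),  # list of the most frequent value(s)
        "median": median(values),   # middle value, or mean of the two middle values
        "mean": mean(values),       # sum of values divided by their count
    }


scores = [4, 7, 7, 8, 9, 10, 11]  # hypothetical test scores
print(central_tendency(scores))
```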
Distributions and central tendency
A data set is a distribution of n scores or values.
Normal distribution
In a normal distribution, data is symmetrically distributed with no skew. Most values cluster around a central region, with values tapering off as they go further away from the center. The mean, mode, and median are exactly the same in a normal distribution.
Skewed distributions
In skewed distributions, more values fall on one side of the center than the other, and the mean, median, and mode typically differ from each other. One side has a longer, more spread-out tail with fewer scores than the other side. The direction of this tail tells you the direction of the skew: a longer tail toward high values means a positive (right) skew, and a longer tail toward low values means a negative (left) skew.
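A quick, informal way to see skew is to compare the mean with the median: a long right tail usually pulls the mean above the median, and a long left tail pulls it below. The sketch below applies that rule of thumb to hypothetical data; it is not a formal skewness statistic, and the rule has known exceptions.

```python
from statistics import mean, median, mode


def skew_direction(values):
    """Rule-of-thumb skew check based on the mean relative to the median."""
    m, md = mean(values), median(values)
    if m > md:
        return "right (positive)"
    if m < md:
        return "left (negative)"
    return "no clear skew"


incomes = [20, 22, 25, 25, 27, 30, 95]  # hypothetical, one very high value
print(mean(incomes), median(incomes), mode(incomes), skew_direction(incomes))
```

Here the single large value drags the mean well above the median and mode, which both stay at 25.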
Calculate measures of variability
Measures of variability tell you how spread out the values in a data set are. Four main measures of variability are often reported:
Range: the highest value minus the lowest value of the data set.
Interquartile range: the range of the middle half of the data set (the third quartile minus the first quartile).
Standard deviation: roughly, the typical distance between each value in your data set and the mean (the square root of the variance).
Variance: the average of the squared distances from the mean, which equals the square of the standard deviation.
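All four measures can be computed with Python’s `statistics` module. Two assumptions are worth flagging in this sketch: it reports the sample standard deviation and variance (dividing by n − 1; `pstdev` and `pvariance` divide by n instead), and quartiles come from `quantiles` with its default "exclusive" method, so other software may give a slightly different IQR. The data are hypothetical.

```python
from statistics import quantiles, stdev, variance


def variability(values):
    """Range, IQR, sample standard deviation and sample variance of a data set."""
    q1, _, q3 = quantiles(values, n=4)  # default 'exclusive' quartile method
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "std_dev": stdev(values),      # sample SD (divides by n - 1)
        "variance": variance(values),  # equals std_dev squared
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical
print(variability(data))
```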
Levels of Measurement
Levels of measurement, also called scales of measurement, tell you how precisely variables are recorded. In scientific research, a variable is anything that can take on different values across your data set (e.g., height or test scores).
There are 4 levels of measurement:
Nominal: the data can only be categorized.
Ordinal: the data can be categorized and ranked.
Interval: the data can be categorized, ranked, and evenly spaced.
Ratio: the data can be categorized, ranked, evenly spaced, and have a natural zero.
Nominal level
You can categorize your data by labelling them in mutually exclusive groups, but there is no order between the categories.
Examples of nominal scales
• City of birth
• Gender
• Ethnicity
• Car brands
• Marital status
Ordinal level
You can categorize and rank your data in an order, but you cannot say anything about the intervals between the rankings.
Although you can rank the top 5 Olympic medallists, this scale does not tell you how close or far apart they are in number of wins.
Examples of ordinal scales
• Top 5 Olympic medallists
• Language ability (e.g., beginner, intermediate, fluent)
• Likert-type questions (e.g., very dissatisfied to very satisfied)
Interval level
You can categorize, rank, and infer equal intervals between neighboring data points, but there is no true zero point.
The difference between any two adjacent temperatures is the same: one degree. But zero degrees is defined differently depending on the scale; it doesn’t mean an absolute absence of temperature.
Examples of interval scales
• Test scores (e.g., IQ or exams)
• Personality inventories
• Temperature in Fahrenheit or Celsius
Ratio level
You can categorize, rank, and infer equal intervals between neighboring data points, and there is a true zero point.
A true zero means there is an absence of the variable of interest. In ratio scales, zero does mean an absolute lack of the variable.
Examples of ratio scales
• Height
• Age
• Weight
• Temperature in Kelvin
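The temperature examples show the practical difference between interval and ratio scales. In this sketch with two hypothetical temperatures, differences agree on the Celsius and Kelvin scales, but a ratio only makes sense on the Kelvin scale, where zero is a true zero: 20 °C is not "twice as hot" as 10 °C.

```python
def celsius_to_kelvin(celsius):
    """Shift the Celsius scale so that 0 marks absolute zero."""
    return celsius + 273.15


warm, cool = 20.0, 10.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on both scales (interval property)
print(warm - cool, celsius_to_kelvin(warm) - celsius_to_kelvin(cool))

# Ratios are only meaningful with a true zero (ratio property)
print(warm / cool, celsius_to_kelvin(warm) / celsius_to_kelvin(cool))
```

The Celsius ratio is 2.0, but the Kelvin ratio is only about 1.035, which is the physically meaningful comparison.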
Variability
Variability describes how far apart data points lie from each other and from the center of a distribution. Along with measures of central tendency, measures of variability give you descriptive statistics that summarize your data.
Variability is also referred to as spread, scatter, or dispersion. It is most commonly measured with the following:
Range: the difference between the highest and lowest values
Interquartile range: the range of the middle half of a distribution
Standard deviation: the typical distance from the mean (the square root of the variance)
Variance: average of squared distances from the mean
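Correlation Coefficient
A correlation coefficient is a number between -1 and 1 that tells you the strength and direction of a relationship between variables.
In other words, it reflects how closely the measurements of two variables move together across a data set: values near 1 or -1 indicate a strong positive or negative relationship, and values near 0 indicate a weak relationship or none.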
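Pearson’s r is the most widely used correlation coefficient; it measures the strength of a linear relationship (other coefficients, such as Spearman’s rank correlation, exist for other situations). The sketch below computes it from its definition, covariance divided by the product of the spreads, on hypothetical study-hours and test-score data. Python 3.10 and later also provide `statistics.correlation` for the same purpose.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    dx = [xi - mean_x for xi in x]  # deviations from each mean
    dy = [yi - mean_y for yi in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / spread


hours = [1, 2, 3, 4, 5]        # hypothetical study hours
scores = [52, 55, 61, 64, 70]  # hypothetical test scores
print(round(pearson_r(hours, scores), 3))
```

A result close to 1 here indicates a strong positive linear relationship; on its own it says nothing about causation, which matches the correlational design described earlier.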
Inferential Statistics
While descriptive statistics summarize the characteristics of a data set, inferential statistics help you come to conclusions and make predictions based on your data.
When you have collected data from a sample, you can use inferential statistics to understand the larger population from which the sample is taken.
Inferential statistics have two main uses:
• making estimates about populations (for example, the mean SAT score of all 11th graders in the US).
• testing hypotheses to draw conclusions about populations (for example, the relationship between SAT scores and family income).
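To make the idea of hypothesis testing concrete, here is one simple technique, a permutation test for a difference in group means, chosen for this sketch because it needs only the standard library; it is not the only way to test such a hypothesis. The null hypothesis is that the group labels do not matter; the p-value estimates how often randomly reshuffled labels produce a difference at least as large as the observed one. The scores for the two teaching methods are hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Null hypothesis: group labels are exchangeable, i.e. the groups come
    from populations with no difference in this measure.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero
    return observed, (extreme + 1) / (n_permutations + 1)


# Hypothetical test scores for two teaching methods
method_a = [78, 82, 85, 88, 90, 91]
method_b = [70, 72, 75, 77, 80, 81]
diff, p = permutation_test(method_a, method_b)
print(f"difference in means = {diff:.2f}, p ≈ {p:.4f}")
```

A small p-value suggests the observed difference would be unusual if the null hypothesis were true; it does not by itself show how large or important the difference is.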
T-Distribution
The t-distribution, also known as Student’s t-distribution, is a way of describing data that follow a bell curve when plotted on a graph, with the greatest number of observations close to the mean and fewer observations in the tails. Compared with the normal distribution it has heavier tails, and its exact shape depends on the degrees of freedom (n − 1 for a single sample of n values).
In statistics, the t-distribution is most often used to:
• Find the critical values for a confidence interval when the data is approximately normally distributed.
• Find the corresponding p-value from a statistical test that uses the t-distribution (t-tests, regression analysis).
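Both uses can be sketched with the standard library alone. In practice you would usually call a library function such as SciPy’s `scipy.stats.t.ppf` for critical values; because Python’s `statistics` module has no t-distribution, this sketch instead integrates the t density numerically (Simpson’s rule) and inverts it by bisection. It assumes approximately normal data and df = n − 1 for a single sample; the reaction times are hypothetical.

```python
import math
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def t_area_from_zero(t, df, steps=2000):
    """Area under the t density between 0 and t, by Simpson's rule (steps is even)."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3


def t_critical(confidence, df):
    """Two-sided critical value t* with P(-t* < T < t*) = confidence."""
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection: the central area grows as t grows
        mid = (lo + hi) / 2
        if 2 * t_area_from_zero(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def two_sided_p_value(t_stat, df):
    """P(|T| >= |t_stat|) for a t-distribution with df degrees of freedom."""
    return 1 - 2 * t_area_from_zero(abs(t_stat), df)


def one_sample_t_test(sample, mu0):
    """t statistic and two-sided p-value for H0: population mean == mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # standard error of the mean
    t_stat = (mean(sample) - mu0) / se
    return t_stat, two_sided_p_value(t_stat, n - 1)


def mean_confidence_interval(sample, confidence=0.95):
    """t-based confidence interval for a population mean."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)
    margin = t_critical(confidence, n - 1) * se
    m = mean(sample)
    return m - margin, m + margin


# Hypothetical reaction times (ms) from 10 participants
times = [512, 498, 530, 505, 521, 489, 540, 515, 502, 526]
print(t_critical(0.95, 9))  # close to the table value of about 2.262
print(mean_confidence_interval(times))
print(one_sample_t_test(times, 500))
```

With small samples the t critical value is noticeably larger than the normal value of about 1.96, which widens the interval to reflect the extra uncertainty from estimating the standard deviation.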
THANK YOU
GIRLIE T. ALAAN
Reporter
