Statistical Analysis

Statistical analysis
means investigating trends,
patterns, and relationships using
quantitative data. It is an important
research tool used by scientists,
governments, businesses, and other
organizations.
STEPS IN STATISTICAL ANALYSIS
Step 1: Write your hypotheses and plan your research
design.

Writing statistical hypotheses

• To collect valid data for statistical analysis, you first
need to specify your hypotheses and plan out your
research design.
• The goal of research is often to investigate a
relationship between variables within a population. You
start with a prediction, and use statistical analysis to
test that prediction.
Planning your research design

• A research design is your overall strategy for
data collection and analysis. It determines the
statistical tests you can use to test your
hypothesis later on.
• A research design is a strategy for answering
your research question using empirical data.
Types of Research Design

• Experimental design: you can assess a cause-and-effect
relationship (e.g., the effect of meditation on test scores)
using statistical tests of comparison or regression.

• Correlational design: you can explore relationships between
variables (e.g., parental income and GPA) without any
assumption of causality using correlation coefficients and
significance tests.

• Descriptive design: you can study the characteristics of a
population or phenomenon (e.g., the prevalence of anxiety in
U.S. college students) using statistical tests to draw
inferences from sample data.
Measuring variables

For statistical analysis, it’s important to consider the
level of measurement of your variables, which tells
you what kind of data they contain:

Categorical data represents groupings. These may be
nominal (e.g., gender) or ordinal (e.g., level of language
ability).

Quantitative data represents amounts. These may be
on an interval scale (e.g., test score) or a ratio scale
(e.g., age).
Many variables can be measured at
different levels of precision. For example,
age data can be quantitative (8 years old)
or categorical (young). If a variable is
coded numerically (e.g., level of agreement
from 1–5), it doesn’t automatically mean
that it’s quantitative instead of categorical.
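As a rough illustration of the same variable at two levels of precision, the sketch below recodes a quantitative age (in years) into an ordered category. The function name and the cut-off ages are hypothetical choices made only for this example.

```python
def age_group(age_years: int) -> str:
    """Recode a quantitative age into a hypothetical ordinal category."""
    if age_years < 13:      # cut-offs are illustrative assumptions
        return "child"
    if age_years < 20:
        return "teen"
    return "adult"


ages = [8, 15, 34, 12, 19]                 # quantitative (ratio) data
groups = [age_group(a) for a in ages]      # categorical (ordinal) data
print(groups)
```

The recoded values can still be ranked, but the exact distances between ages are lost, so statistics such as the mean no longer apply to them.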
Step 2: Collect data from a sample
Statistical analysis allows you to apply your findings beyond
your own sample as long as you use appropriate sampling
procedures. You should aim for a sample that is
representative of the population.

COLLECTING DATA

A population is the entire group that you
want to draw conclusions about.

A sample is the specific group that you will
collect data from. The size of the sample is
always less than the total size of the
population.
SAMPLING FOR STATISTICAL ANALYSIS

There are two main approaches to selecting a
sample.

• Probability sampling: every member of the
population has a chance of being selected for
the study through random selection.

• Non-probability sampling: some members of the
population are more likely than others to be
selected for the study because of criteria such
as convenience or voluntary self-selection.
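The two approaches above can be contrasted in a short sketch. The population list, the seed, and the sample size of 20 are assumptions made for illustration; a simple random sample is only one of several probability sampling methods.

```python
import random

# Hypothetical sampling frame of 200 students.
population = [f"student_{i:03d}" for i in range(1, 201)]

# Probability sampling: a simple random sample without replacement.
rng = random.Random(42)          # fixed seed so the draw can be reproduced
probability_sample = rng.sample(population, k=20)

# Non-probability (convenience) sampling: whoever is easiest to reach,
# represented here by the first 20 names on the list.
convenience_sample = population[:20]

print(probability_sample[:5])
print(convenience_sample[:5])
```

In the random draw every student has the same chance of selection, while the convenience sample can never include anyone beyond the first 20 names.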
CREATE AN APPROPRIATE SAMPLING
PROCEDURE

Based on the resources available for your
research, decide how you’ll recruit participants.

Will you have resources to advertise your study
widely, including outside of your university setting?
Will you have the means to recruit a diverse
sample that represents a broad population?
Do you have time to contact and follow up with
members of hard-to-reach groups?
CALCULATE SUFFICIENT SAMPLE SIZE

Before recruiting participants, decide on
your sample size either by looking at other
studies in your field or using statistics. A
sample that’s too small may be
unrepresentative of the population, while a
sample that’s too large will be more costly
than necessary.
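One common way of "using statistics" here is the textbook formula for estimating a mean, n = (z × σ / E)², rounded up. The sketch below assumes a simple random sample, a known (or guessed) population standard deviation, and a chosen margin of error; the function name and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """Smallest n so that the confidence interval half-width is at most `margin`.

    Assumes a known population standard deviation `sigma` (an assumption;
    in practice it is usually taken from earlier studies).
    """
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin) ** 2)


n = sample_size_for_mean(sigma=15, margin=3)
print(n)
```

Halving the margin of error roughly quadruples the required sample size, which is why precision and cost have to be balanced.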
Step 3: Summarize your data with
descriptive statistics.

Once you’ve collected all of your data,
you can inspect them and calculate
descriptive statistics that summarize
them.
Inspect your data

There are various ways to inspect your data,
including the following:

• Organizing data from each variable in frequency
distribution tables.
• Displaying data from a key variable in a bar
chart to view the distribution of responses.
• Visualizing the relationship between two
variables using a scatter plot.
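A frequency distribution table, the first option listed above, can be sketched with the standard library. The survey responses are made-up values used only for illustration.

```python
from collections import Counter

# Hypothetical responses to a single survey question.
responses = ["agree", "neutral", "agree", "disagree", "agree", "neutral"]

freq = Counter(responses)        # counts how often each value occurs
total = len(responses)

# Print value, count, and relative frequency, most common first.
for value, count in freq.most_common():
    print(f"{value:<10}{count:>3}{count / total:>8.1%}")
```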
Calculate measures of central tendency

Measures of central tendency describe where most
of the values in a data set lie. Three main
measures of central tendency are often reported:

• Mode: the most popular response or value in the
data set.
• Median: the value in the exact middle of the
data set when ordered from low to high.
• Mean: the sum of all values divided by the
number of values.
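The three measures can be computed with Python's statistics module, as in this minimal sketch. The scores are hypothetical.

```python
import statistics

scores = [2, 3, 3, 5, 7, 8, 14]          # hypothetical data, already sorted

mode = statistics.mode(scores)            # most frequent value
median = statistics.median(scores)        # middle value of the ordered data
mean = statistics.mean(scores)            # sum of values / number of values

print(mode, median, mean)
```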
Distributions and central
tendency

A data set is a distribution of n
scores or values.
Normal distribution

In a normal distribution, data is
symmetrically distributed with no skew.
Most values cluster around a central region,
with values tapering off as they go further
away from the center. The mean, mode and
median are exactly the same in a normal
distribution.
Skewed distributions

In skewed distributions, more values fall on one
side of the center than the other, and the mean,
median and mode all differ from each other. One
side has a longer, more spread-out tail with
fewer scores than the other. The
direction of this tail tells you the side of the skew.
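A quick, informal check for skew is to compare the mean with the median: a long tail tends to pull the mean toward it. The sketch below uses made-up income figures with one large value; the comparison is a rule of thumb, not a formal test.

```python
import statistics

# Hypothetical incomes (in thousands); the value 120 forms a long right tail.
incomes = [20, 22, 25, 27, 30, 32, 120]

mean = statistics.mean(incomes)
median = statistics.median(incomes)

# A mean above the median suggests a tail on the right (high) side.
tail_side = "right" if mean > median else "left" if mean < median else "none"
print(round(mean, 2), median, tail_side)
```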
Calculate measures of variability

Measures of variability tell you how spread out the
values in a data set are. Four main measures of
variability are often reported:

Range: the highest value minus the lowest value of
the data set.
Interquartile range: the range of the middle half of
the data set.
Standard deviation: the average distance between
each value in your data set and the mean.
Variance: the square of the standard deviation.
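The four measures above can be sketched with the statistics module. The values are hypothetical; note that statistics.stdev and statistics.variance treat the data as a sample, and that quartiles can be computed by several conventions, so the interquartile range may differ slightly between tools.

```python
import statistics

values = [4, 8, 15, 16, 23, 42]          # hypothetical data

value_range = max(values) - min(values)   # highest minus lowest value

# Quartile cut points (default "exclusive" method of statistics.quantiles).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                             # spread of the middle half

sd = statistics.stdev(values)             # sample standard deviation
var = statistics.variance(values)         # sample variance (sd squared)

print(value_range, iqr, round(sd, 2), round(var, 2))
```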
Levels of Measurement
Levels of measurement, also called scales of
measurement, tell you how precisely variables are
recorded. In scientific research, a variable is
anything that can take on different values across
your data set (e.g., height or test scores).
There are 4 levels of measurement:

Nominal: the data can only be
categorized
Ordinal: the data can be categorized
and ranked
Interval: the data can be categorized,
ranked, and evenly spaced
Ratio: the data can be categorized,
ranked, evenly spaced, and has a
natural zero.
Nominal level
You can categorize your data by labelling them in
mutually exclusive groups, but there is no order
between the categories.

Examples of nominal scales

• City of birth
• Gender
• Ethnicity
• Car brands
• Marital status
Ordinal level
You can categorize and rank your data in an order,
but you cannot say anything about the intervals
between the rankings.
Although you can rank the top 5 Olympic medallists,
this scale does not tell you how close or far apart
they are in number of wins.

Examples of ordinal scales

• Top 5 Olympic medallists
• Language ability (e.g., beginner, intermediate,
fluent)
• Likert-type questions (e.g., very dissatisfied to
very satisfied)
Interval level
You can categorize, rank, and infer equal intervals
between neighboring data points, but there is no
true zero point.
The difference between any two adjacent
temperatures is the same: one degree. But zero
degrees is defined differently depending on the
scale – it doesn’t mean an absolute absence of
temperature.

Examples of interval scales

• Test scores (e.g., IQ or exams)
• Personality inventories
• Temperature in Fahrenheit or Celsius
Ratio level
You can categorize, rank, and infer equal intervals
between neighboring data points, and there is a
true zero point.
A true zero means there is an absence of the
variable of interest. In ratio scales, zero does
mean an absolute lack of the variable.

Examples of ratio scales

• Height
• Age
• Weight
• Temperature in Kelvin
Variability
Variability describes how far apart data
points lie from each other and from the
center of a distribution. Along with
measures of central tendency, measures
of variability give you descriptive
statistics that summarize your data.
Variability is also referred to as spread,
scatter or dispersion. It is most commonly
measured with the following:

Range: the difference between the highest
and lowest values
Interquartile range: the range of the middle
half of a distribution
Standard deviation: average distance from
the mean
Variance: average of squared distances from
the mean
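To make "average of squared distances from the mean" concrete, the sketch below works through the calculation step by step on hypothetical data. It assumes the data are the whole population (divide by n); for a sample, the usual convention is to divide by n - 1 instead.

```python
import math

data = [2, 4, 4, 4, 5, 5, 7, 9]          # hypothetical data
n = len(data)

mean = sum(data) / n
squared_distances = [(x - mean) ** 2 for x in data]

pop_variance = sum(squared_distances) / n          # average squared distance
pop_sd = math.sqrt(pop_variance)                   # back in the original units

sample_variance = sum(squared_distances) / (n - 1) # sample convention (n - 1)

print(mean, pop_variance, pop_sd, round(sample_variance, 3))
```

The standard deviation is usually easier to interpret than the variance because it is expressed in the same units as the data.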
Correlation Coefficient
A correlation coefficient is a number
between -1 and 1 that tells you the
strength and direction of a relationship
between variables.

In other words, it reflects how similar
the measurements of two or more
variables are across a dataset.
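The slides do not name a specific coefficient; the sketch below assumes Pearson's r, the most widely used one for two quantitative variables, and computes it from its definition. The function name and the study-hours data are hypothetical.

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from each mean.
    co_deviation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return co_deviation / (spread_x * spread_y)


hours_studied = [1, 2, 3, 4, 5]            # hypothetical data
test_scores = [52, 55, 61, 64, 70]

r = pearson_r(hours_studied, test_scores)
print(round(r, 3))
```

A value near 1 indicates a strong positive relationship, a value near -1 a strong negative one, and a value near 0 little or no linear relationship.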
Inferential Statistics
While descriptive statistics summarize the
characteristics of a data set, inferential
statistics help you come to conclusions and
make predictions based on your data.

When you have collected data from a sample,
you can use inferential statistics to
understand the larger population from which
the sample is taken.
Inferential statistics have two main uses:

• making estimates about populations (for
example, the mean SAT score of all 11th
graders in the US).

• testing hypotheses to draw conclusions about
populations (for example, the relationship
between SAT scores and family income).
T-Distribution
The t-distribution, also known as Student’s t-
distribution, is a way of describing data that follow a
bell curve when plotted on a graph, with the greatest
number of observations close to the mean and fewer
observations in the tails.

In statistics, the t-distribution is most often used to:

• Find the critical values for a confidence interval
when the data is approximately normally distributed.
• Find the corresponding p-value from a statistical
test that uses the t-distribution (t-tests, regression
analysis).
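The first use can be sketched as a 95% confidence interval for a mean. The sample values are hypothetical, and the critical value 2.262 (two-sided 95%, 9 degrees of freedom) is copied from a standard t table rather than computed, since Python's standard library has no t-distribution function.

```python
import math
import statistics

# Two-sided 95% critical value of the t-distribution for df = 9,
# taken from a standard t table (an assumption of this sketch).
T_CRIT_95_DF9 = 2.262

sample = [72, 75, 68, 80, 77, 74, 69, 83, 71, 76]   # hypothetical scores
n = len(sample)                                      # df = n - 1 = 9

mean = statistics.mean(sample)
standard_error = statistics.stdev(sample) / math.sqrt(n)

margin = T_CRIT_95_DF9 * standard_error
lower, upper = mean - margin, mean + margin

print(round(lower, 2), round(upper, 2))
```

With a different sample size the degrees of freedom change, so the critical value has to be looked up again.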
THANK YOU
GIRLIE T. ALAAN
Reporter
