
CHOOSING APPROPRIATE STATISTICAL METHODS

MA. SOFIA JOY C. CASUNCAD, DISCUSSANT
AILENE B. BABOR, DISCUSSANT
ELLEN C. PRAXIDES, Ed.D., PROFESSOR
• To select the appropriate statistical method, one needs to know the assumptions and conditions of the statistical methods, so that the proper method can be selected for data analysis.
• Two main statistical methods are used in data analysis: descriptive statistics, which summarizes data using indexes such as the mean and median, and inferential statistics, which draws conclusions from data using statistical tests such as Student's t-test and the ANOVA test.
• Selection of the appropriate statistical method depends on the following three things:
• Aim and objective of the study,
• Type and distribution of the data used, and
• Nature of the observations (paired/unpaired).
• All statistical methods used to compare means are called parametric, while statistical methods used to compare something other than means (e.g., medians, mean ranks, proportions) are called nonparametric methods.
• Besides knowledge of the statistical methods, another very important aspect is the nature and type of the data collected and the objective of the study, because the statistical methods suitable for the given data are selected according to that objective.
• Incorrect statistical methods can be seen in many situations, such as the use of an unpaired t-test on paired data, or the use of a parametric test on data that do not follow the normal distribution.
FACTORS INFLUENCING SELECTION OF STATISTICAL METHODS
1. Aim and objective of the study
• Selection of a statistical test depends upon the aim and objective of the study. Suppose the objective is to find the predictors of an outcome variable; then regression analysis is used, while to compare the means between two independent samples, the unpaired samples t-test is used.
2. Type and distribution of the data used
• For the same objective, the choice of statistical test varies with the data type. For nominal, ordinal, and discrete data, we use nonparametric methods, while for continuous data, both parametric and nonparametric methods can be used.
• For example, in regression analysis, when the outcome variable is categorical, logistic regression is used, while for a continuous outcome variable, a linear regression model is used. The choice of the most appropriate representative measure for a continuous variable depends on how the values are distributed. If a continuous variable follows the normal distribution, the mean is the representative measure, while for non-normal data, the median is considered the most appropriate representative measure of the data set. Similarly, for categorical data the proportion (percentage) is the representative measure, while for ranking/ordinal data it is the mean rank.
3. Observations are paired or unpaired
• Another important point in selecting a statistical test is to assess whether the data are paired (the same subjects are measured at different time points or using different methods) or unpaired (each group has different subjects). For example, to compare the means between two groups, the paired samples t-test is used when the data are paired, while the independent samples t-test is used for unpaired (independent) data.
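• As an illustrative sketch of this distinction (a minimal Python example with invented measurements), scipy.stats provides both variants:

# Illustrative only: choosing between paired and unpaired t-tests.
from scipy import stats

# Paired data: the SAME subjects measured at two time points.
before = [120, 135, 128, 142, 131]
after  = [115, 130, 126, 138, 129]
t_paired, p_paired = stats.ttest_rel(before, after)   # paired samples t-test

# Unpaired data: two DIFFERENT groups of subjects.
group_a = [120, 135, 128, 142, 131]
group_b = [118, 125, 130, 121, 127]
t_unpaired, p_unpaired = stats.ttest_ind(group_a, group_b)  # independent samples t-test

print(f"paired:   t = {t_paired:.3f}, p = {p_paired:.3f}")
print(f"unpaired: t = {t_unpaired:.3f}, p = {p_unpaired:.3f}")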
• Statistical tests are used in hypothesis testing.
They can be used to:
• determine whether a predictor variable has
a statistically significant relationship with an
outcome variable.
• estimate the difference between two or
more groups.
• Statistical tests assume a null hypothesis of no relationship or no difference between groups. Then they determine whether the observed data fall outside of the range of values predicted by the null hypothesis. If you already know what types of variables you're dealing with, you can use the flowchart to choose the right statistical test for your data.
WHAT DOES A STATISTICAL TEST DO?
• Statistical tests work by calculating a test statistic – a
number that describes how much the relationship
between variables in your test differs from the null
hypothesis of no relationship.
• It then calculates a p-value (probability value). The p-
value estimates how likely it is that you would see
the difference described by the test statistic if the
null hypothesis of no relationship were true.
• If the value of the test statistic is more extreme than
the statistic calculated from the null hypothesis, then
you can infer a statistically significant
relationship between the predictor and outcome
variables.
• If the value of the test statistic is less extreme than the
one calculated from the null hypothesis, then you can
infer no statistically significant relationship between
the predictor and outcome variables.
WHEN TO PERFORM A STATISTICAL TEST
• You can perform statistical tests on data that have
been collected in a statistically valid manner – either
through an experiment, or through observations
made using probability sampling methods.
• For a statistical test to be valid, your sample size
needs to be large enough to approximate the true
distribution of the population being studied.
• To determine which statistical test to use, you need
to know:
• whether your data meets certain assumptions.
• the types of variables that you’re dealing with.
STATISTICAL ASSUMPTIONS
• Statistical tests make some common assumptions about the data they are testing:
1. Independence of observations (a.k.a. no autocorrelation): The
observations/variables you include in your test are not related (for
example, multiple measurements of a single test subject are not
independent, while measurements of multiple different test
subjects are independent).
2. Homogeneity of variance: the variance within each group being
compared is similar among all groups. If one group has much more
variation than others, it will limit the test’s effectiveness.
3. Normality of data: the data follows a normal distribution (a.k.a. a
bell curve). This assumption applies only to quantitative data.
• If your data do not meet the assumptions of
normality or homogeneity of variance, you may be
able to perform a nonparametric statistical test,
which allows you to make comparisons without any
assumptions about the data distribution.
• If your data do not meet the assumption of
independence of observations, you may be able to
use a test that accounts for structure in your data
(repeated-measures tests or tests that include
blocking variables).
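• As a hedged sketch, these assumption checks can be run in Python with scipy.stats (the data below are made up): the Shapiro-Wilk test checks normality and Levene's test checks homogeneity of variance:

from scipy import stats

group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 4.8]
group_b = [6.0, 5.8, 6.2, 5.9, 6.1, 5.7]

# Normality of data: Shapiro-Wilk test (null hypothesis: data are normal).
print(stats.shapiro(group_a))
print(stats.shapiro(group_b))

# Homogeneity of variance: Levene's test (null hypothesis: equal variances).
print(stats.levene(group_a, group_b))

# If either assumption fails, a nonparametric alternative such as the
# Mann-Whitney U test can replace the independent samples t-test.
print(stats.mannwhitneyu(group_a, group_b))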
Types of variables
• The types of variables you have usually determine what type of statistical test you can use:
1. Quantitative variables represent amounts of things (e.g.
the number of trees in a forest). Types of quantitative
variables include:
• Continuous (a.k.a. ratio variables): represent measures and can usually be divided into units smaller than one (e.g. 0.75 grams).
• Discrete (a.k.a. integer variables): represent counts and usually can't be divided into units smaller than one (e.g. 1 tree).
2. Categorical variables represent groupings of things (e.g.
the different tree species in a forest). Types of categorical
variables include:
• Ordinal: represent data with an order (e.g. rankings).
• Nominal: represent group names (e.g. brands or species
names).
• Binary: represent data with a yes/no or 1/0 outcome
(e.g. win or lose).
• Choose the test that fits the types of predictor and outcome variables you have collected (if you are doing an experiment, these are the independent and dependent variables).
Parametric Tests
• Parametric tests can only be conducted with data that adheres to the common assumptions: homogeneity of variance and a normal distribution.
• Common types of parametric tests:
• Regression tests
• Comparison tests
• Correlation tests
Regression test
• Regression tests are used to test cause-and-effect relationships. They
look for the effect of one or more continuous variables on another
variable.
• Regression models describe the relationship
between variables by fitting a line to the
observed data. Linear regression models use a
straight line.
• Example: You are a social researcher interested in the
relationship between income and happiness. You survey
500 people whose incomes range from $15k to $75k and
ask them to rank their happiness on a scale from 1 to 10.
• Your independent variable (income) and dependent
variable (happiness) are both quantitative, so you can do a
regression analysis to see if there is a linear relationship
between them.
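• A minimal sketch of this analysis in Python, using simulated (not real) survey data and scipy.stats.linregress:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Simulated survey: 500 incomes ($15k-$75k) and happiness scores (1-10).
income = rng.uniform(15, 75, size=500)                 # thousands of dollars
happiness = np.clip(2 + 0.08 * income + rng.normal(0, 1, 500), 1, 10)

# Fit a straight line: happiness = intercept + slope * income.
result = stats.linregress(income, happiness)
print(f"slope = {result.slope:.3f}, r = {result.rvalue:.3f}, p = {result.pvalue:.3g}")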
• Multiple linear regression is used to estimate the
relationship between two or more independent
variables and one dependent variable.
Example:
1. How strong the relationship is between two or more
independent variables and one dependent variable (e.g. how
rainfall, temperature, and amount of fertilizer added affect
crop growth).
2. The value of the dependent variable at a certain value of
the independent variables (e.g. the expected yield of a crop at
certain levels of rainfall, temperature, and fertilizer addition).
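• A short sketch of both uses in Python, assuming the statsmodels package and simulated crop data (all coefficients below are invented):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100

# Simulated predictors of crop growth (example 1 above).
rainfall = rng.uniform(50, 150, n)
temperature = rng.uniform(10, 30, n)
fertilizer = rng.uniform(0, 5, n)
growth = (0.02 * rainfall + 0.1 * temperature
          + 0.5 * fertilizer + rng.normal(0, 0.5, n))

# Ordinary least squares with an intercept term.
X = sm.add_constant(np.column_stack([rainfall, temperature, fertilizer]))
model = sm.OLS(growth, X).fit()
print(model.params)                         # strength of each relationship
print(model.predict([[1, 100, 20, 2.5]]))   # expected growth at given levels (example 2)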
COMPARISON TEST
Comparison tests look for differences among group
means. They can be used to test the effect of a
categorical variable on the mean value of some other
characteristic.
T-tests are used when comparing the means of
precisely two groups (e.g. the average heights of men
and women). ANOVA and MANOVA tests are used
when comparing the means of more than two groups
(e.g. the average heights of children, teenagers, and
adults).
• A t-test is a statistical test that is used to compare the means of two groups. It is often used in hypothesis testing to determine whether a process or treatment actually has an effect on the population of interest, or whether two groups are different from one another.
• Use ANOVA when you have collected data about one categorical independent variable and one quantitative dependent variable. The independent variable should have at least three levels (i.e. at least three different groups or categories). A short code sketch follows the examples below.
• Examples:
1.Your independent variable is social media use, and you
assign groups to low, medium, and high levels of social
media use to find out if there is a difference in hours of
sleep per night.
2. Your independent variable is brand of soda, and you
collect data on Coke, Pepsi, Sprite, and Fanta to find out
if there is a difference in the price per 100ml.
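• A minimal sketch of example 2 in Python with scipy's one-way ANOVA (the prices per 100ml below are invented):

from scipy import stats

coke   = [0.52, 0.55, 0.50, 0.53]
pepsi  = [0.49, 0.51, 0.48, 0.50]
sprite = [0.54, 0.56, 0.53, 0.55]
fanta  = [0.50, 0.52, 0.49, 0.51]

# One-way ANOVA: do mean prices differ across the four brands?
f_stat, p_value = stats.f_oneway(coke, pepsi, sprite, fanta)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")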
Correlation test
Correlation tests check whether two variables are
related without assuming cause-and-effect relationships.
• These can be used to test whether two variables you want
to use in (for example) a multiple regression test are
autocorrelated.
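• A brief sketch in Python: a Pearson correlation from scipy can flag two candidate predictors that are strongly related before they enter the same regression (the values below are invented):

from scipy import stats

# Two candidate predictors for a multiple regression.
rainfall    = [60, 75, 80, 95, 110, 120]
temperature = [14, 16, 17, 20, 23, 25]

r, p = stats.pearsonr(rainfall, temperature)
print(f"r = {r:.3f}, p = {p:.4f}")
# A very high |r| warns that the two predictors carry largely
# redundant information for the regression model.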
Non-parametric Tests
• Non-parametric tests don't make as many assumptions about the data, and are useful when one or more of the common statistical assumptions are violated. However, the inferences they make aren't as strong as with parametric tests.
• Non-parametric tests can be used if there are outlying data that violate the bell curve of the normal distribution.
Spearman’s R
A Spearman correlation coefficient is also referred to as Spearman rank
correlation or Spearman’s rho.  It is typically denoted either with the Greek
letter rho (ρ), or rs.  Like all correlation coefficients, Spearman’s rho measures
the strength of association between two variables.  As such, the Spearman
correlation coefficient is similar to the Pearson correlation coefficient.
• Compared to the Pearson correlation coefficient, the Spearman correlation does not require continuous-level data (interval or ratio), because it uses ranks instead of assumptions about the distributions of the two variables. This allows us to analyze the association between variables of ordinal measurement levels. Moreover, the Spearman correlation does not assume that the variables are normally distributed. A Spearman correlation analysis can therefore be used in many cases in which the assumptions of the Pearson correlation (continuous-level variables, linearity, homoscedasticity, and normality) are not met.
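• A minimal sketch of a Spearman correlation in Python with scipy, using invented ordinal data (two judges ranking the same eight contestants):

from scipy import stats

judge_1 = [1, 2, 3, 4, 5, 6, 7, 8]
judge_2 = [2, 1, 4, 3, 6, 5, 8, 7]

# Spearman's rho works on ranks, so no normality assumption is needed.
rho, p = stats.spearmanr(judge_1, judge_2)
print(f"rho = {rho:.3f}, p = {p:.4f}")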
Chi-Square Test of Independence
The Chi-Square Test of Independence can only compare categorical variables. It cannot make comparisons between continuous variables or between categorical and continuous variables. Additionally, the Chi-Square Test of Independence only assesses associations between categorical variables, and cannot provide any inferences about causation.
Example: we have a list of movie genres; this is our first variable. Our
second variable is whether or not the patrons of those genres bought
snacks at the theater. Our idea (or, in statistical terms, our null
hypothesis) is that the type of movie and whether or not people bought
snacks are unrelated. The owner of the movie theater wants to estimate
how many snacks to buy. If movie type and snack purchases are
unrelated, estimating will be simpler than if the movie types impact
snack sales.
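• A minimal sketch of this example in Python with scipy, using an invented contingency table of counts (rows = genre, columns = bought snacks yes/no):

from scipy import stats

# Invented counts: action, comedy, drama vs. snacks (yes, no).
observed = [[50, 30],
            [45, 55],
            [20, 60]]

chi2, p, dof, expected = stats.chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}, dof = {dof}")
# A small p-value suggests genre and snack purchases are associated,
# but (as noted above) it says nothing about causation.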
The Kruskal-Wallis H Test
• The Kruskal-Wallis H test (sometimes also called
the "one-way ANOVA on ranks") is a rank-based
nonparametric test that can be used to determine if
there are statistically significant differences
between two or more groups of an independent
variable on a continuous or ordinal dependent
variable. It is considered the nonparametric
alternative to the one-way ANOVA, and an
extension of the Mann-Whitney U test to allow the
comparison of more than two independent groups.
• For example, you could use a Kruskal-Wallis H test to
understand whether exam performance, measured on a
continuous scale from 0-100, differed based on test anxiety
levels (i.e., your dependent variable would be "exam
performance" and your independent variable would be "test
anxiety level", which has three independent groups:
students with "low", "medium" and "high" test anxiety
levels). Alternately, you could use the Kruskal-Wallis H test
to understand whether attitudes towards pay discrimination,
where attitudes are measured on an ordinal scale, differed
based on job position (i.e., your dependent variable would
be "attitudes towards pay discrimination", measured on a 5-
point scale from "strongly agree" to "strongly disagree", and
your independent variable would be "job description", which has three independent groups: "shop floor", "middle management", and "boardroom").
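• A minimal sketch of the exam-performance example in Python with scipy (the scores below are invented):

from scipy import stats

# Invented exam scores (0-100) by test-anxiety level.
low    = [85, 90, 78, 88, 92]
medium = [75, 80, 72, 78, 83]
high   = [60, 65, 70, 58, 68]

# Kruskal-Wallis H: a rank-based alternative to one-way ANOVA.
h_stat, p_value = stats.kruskal(low, medium, high)
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")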

ANOSIM
• ANOSIM is a distribution-free method of multivariate data analysis widely used by community ecologists. It is primarily employed to compare the variation in species abundance and composition among sampling units (= beta diversity) in terms of some grouping factor or experimental treatment levels. Its null hypothesis is that there is no difference between the means of two or more groups of (ranked) dissimilarities. The Analysis of Similarities (ANOSIM) test has some similarity to an ANOVA-like hypothesis test; however, it is used to evaluate a dissimilarity matrix rather than raw data (Clarke, 1993).
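• A hedged sketch of an ANOSIM in Python, assuming the scikit-bio package is available (the dissimilarity matrix and grouping below are invented):

# Illustrative only; assumes scikit-bio (skbio) is installed.
from skbio import DistanceMatrix
from skbio.stats.distance import anosim

# Invented dissimilarities among four sampling units.
dm = DistanceMatrix([[0.0, 0.2, 0.7, 0.8],
                     [0.2, 0.0, 0.6, 0.9],
                     [0.7, 0.6, 0.0, 0.3],
                     [0.8, 0.9, 0.3, 0.0]],
                    ids=['s1', 's2', 's3', 's4'])
grouping = ['treatment', 'treatment', 'control', 'control']

# ANOSIM compares ranked dissimilarities within vs. between groups.
result = anosim(dm, grouping, permutations=999)
print(result['test statistic'], result['p-value'])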
Wilcoxon Rank Sum Test
• A popular nonparametric test to compare outcomes between two independent groups is the Mann-Whitney U test. The Mann-Whitney U test, sometimes called the Mann-Whitney-Wilcoxon test or the Wilcoxon rank sum test, is used to test whether two samples are likely to derive from the same population (i.e., that the two populations have the same shape). Some investigators interpret this test as comparing the medians between the two populations. Recall that the parametric test compares the means (H0: μ1 = μ2) between independent groups.
• In contrast, the null and two-sided research hypotheses for the nonparametric test are stated as follows:
• H0: The two populations are equal versus
• H1: The two populations are not equal.
• This test is often performed as a two-sided test and, thus, the research hypothesis indicates that the populations are not equal as opposed to specifying directionality. A one-sided research hypothesis is used if interest lies in detecting a positive or negative shift in one population as compared to the other. The procedure for the test involves pooling the observations from the two samples into one combined sample, keeping track of which sample each observation comes from, and then ranking the pooled observations from lowest to highest, from 1 to n1+n2.
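• A minimal sketch of a Mann-Whitney U test in Python with scipy (the outcome scores below are invented):

from scipy import stats

group_1 = [12, 15, 14, 10, 18, 13]
group_2 = [22, 25, 17, 24, 16, 21]

# Two-sided test: H0 is that the two populations are equal.
u_stat, p_value = stats.mannwhitneyu(group_1, group_2, alternative='two-sided')
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")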
The Wilcoxon signed-rank test
• The Wilcoxon signed-rank test is the nonparametric test equivalent to the 
dependent t-test. As the Wilcoxon signed-rank test does not assume normality in the
data, it can be used when this assumption has been violated and the use of the
dependent t-test is inappropriate. It is used to compare two sets of scores that come
from the same participants. This can occur when we wish to investigate any change in
scores from one time point to another, or when individuals are subjected to more than
one condition.
• For example, you could use a Wilcoxon signed-rank test to understand whether there
was a difference in smokers' daily cigarette consumption before and after a 6 week
hypnotherapy programme (i.e., your dependent variable would be "daily cigarette
consumption", and your two related groups would be the cigarette consumption
values "before" and "after" the hypnotherapy programme). You could also use a
Wilcoxon signed-rank test to understand whether there was a difference in reaction
times under two different lighting conditions (i.e., your dependent variable would be
"reaction time", measured in milliseconds, and your two related groups would be
reaction times in a room using "blue light" versus "red light").
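• A minimal sketch of the smoking example in Python with scipy (the cigarette counts below are invented):

from scipy import stats

# Invented daily cigarette counts for the same 8 smokers,
# before and after a 6-week hypnotherapy programme.
before = [20, 25, 18, 30, 22, 15, 28, 24]
after  = [15, 20, 17, 22, 20, 10, 25, 20]

# Wilcoxon signed-rank: nonparametric analogue of the paired t-test.
w_stat, p_value = stats.wilcoxon(before, after)
print(f"W = {w_stat:.1f}, p = {p_value:.4f}")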
• Selection of the appropriate statistical methods is very important for quality research. It is important that a researcher knows the basic concepts of the statistical methods used to conduct a research study, so as to produce valid and reliable results.
• There are various statistical methods that can be used in different situations. Each test makes particular assumptions about the data. These assumptions should be taken into consideration when deciding which test is the most appropriate.
• Wrong or inappropriate use of statistical methods may lead to defective conclusions and, ultimately, harm evidence-based practice.
• However, it is extremely difficult for an academician to learn all statistical methods. Therefore, at least a basic knowledge is very important, so that the appropriate statistical method can be selected and correct or incorrect practices can be recognized in published research.
