
Quantitative Methods


Module 004 – Parametric and Non-Parametric

At the end of this module you are expected to:


1. Explain what a statistical test is;
2. Differentiate parametric and non-parametric tests;
3. Discuss descriptive statistics and inferential statistics.

Statistical Test

A statistical test provides a mechanism for making quantitative decisions about a process or processes. The intent is to determine whether there is enough evidence to "reject" a conjecture or hypothesis about the process. The conjecture is called the null hypothesis. Not rejecting may be a good result if we want to continue to act as if we "believe" the null hypothesis is true. Or it may be a disappointing result, possibly indicating that we may not yet have enough data to "prove" something by rejecting the null hypothesis.

Choosing a Test

In terms of selecting a statistical test, the most important question is "what is the main study hypothesis?" In some cases there is no hypothesis; the investigator just wants to "see what is there". For example, in a prevalence study there is no hypothesis to test, and the size of the study is determined by how accurately the investigator wants to determine the prevalence. If there is no hypothesis, then there is no statistical test.

It is important to decide a priori which hypotheses are confirmatory (that is, testing some presupposed relationship) and which are exploratory (suggested by the data). No single study can support a whole series of hypotheses. A sensible plan is to severely limit the number of confirmatory hypotheses. Although it is valid to use statistical tests on hypotheses suggested by the data, the P values should be used only as guidelines, and the results treated as tentative until confirmed by subsequent studies. A useful guide is the Bonferroni correction, which states simply that if one is testing n independent hypotheses, one should use a significance level of 0.05/n. Thus if there were two independent hypotheses, a result would be declared significant only if P < 0.025. Note that, since tests are rarely independent, this is a very conservative procedure, i.e. one that is unlikely to reject the null hypothesis.

The investigator should then ask "are the data independent?" This can be difficult to decide, but as a rule of thumb, results on the same individual, or from matched individuals, are not independent. Thus results from a crossover trial, or from a case-control study in which the controls were matched to the cases by age, sex and social class, are not independent.
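
The arithmetic of the Bonferroni correction can be shown directly. Below is a minimal sketch in Python; the p-values are hypothetical, chosen only to illustrate the 0.05/n threshold:

    # Minimal sketch of a Bonferroni correction; the p-values are hypothetical.
    p_values = [0.012, 0.030]          # results of two independent hypothesis tests
    alpha = 0.05
    threshold = alpha / len(p_values)  # corrected significance level: 0.05 / 2 = 0.025

    for p in p_values:
        verdict = "significant" if p < threshold else "not significant"
        print(f"p = {p:.3f} -> {verdict} at the corrected level {threshold:.3f}")

Note that the second result would have counted as significant at the usual 0.05 level but fails the corrected 0.025 threshold.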

• Analysis should reflect the design, and so a matched design should be followed by a matched analysis.
• Results measured over time require special care. One of the most common mistakes in statistical analysis is to treat correlated variables as if they were independent. For example, suppose we were looking at treatment of leg ulcers, in which some people had an ulcer on each leg. We might have 20 subjects with 30 ulcers, but the number of independent pieces of information is 20, because the state of the ulcers on each leg of one person may be influenced by that person's state of health, and an analysis that treated the ulcers as independent observations would be incorrect. For a correct analysis of mixed paired and unpaired data, consult a statistician.

The next question is "what types of data are being measured?" The test used should
be determined by the data. The choice of test for matched or paired data is described
in Table 1.

Table 1: Choice of statistical test for paired or matched observations

Parametric tests are those that make assumptions about the parameters of the
population distribution from which the sample is drawn. This is often the assumption
that the population data are normally distributed. Non-parametric tests are
“distribution-free” and, as such, can be used for non-Normal variables. Table 2 shows
the non-parametric equivalent of a number of parametric tests.

Table 2: Parametric and Non-parametric tests for comparing two or more groups

Non-parametric tests are valid for both non-Normally distributed data and Normally
distributed data, so why not use them all the time?

It would seem prudent to use non-parametric tests in all cases, which would save one
the bother of testing for Normality. Parametric tests are preferred, however, for the
following reasons:

1. We are rarely interested in a significance test alone; we would like to say something about the population from which the samples came, and this is best done with estimates of parameters and confidence intervals.

2. It is difficult to do flexible modelling with non-parametric tests, for example allowing for confounding factors using multiple regression.

3. Parametric tests usually have more statistical power than their non-parametric equivalents. In other words, one is more likely to detect significant differences when they truly exist.

A potential source of confusion in working out which statistics to use in analysing data is whether your data allows for parametric or non-parametric statistics. The importance of this issue cannot be overstated!

If you get it wrong you risk using an incorrect statistical procedure or you may use a
less powerful procedure.

Non-parametric statistical procedures are less powerful because they use less
information in their calculation. For example, a parametric correlation uses
information about the mean and deviation from the mean while a non-parametric
correlation will use only the ordinal position of pairs of scores.
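
As a rough illustration of that difference, here is a minimal Python sketch (using numpy and scipy, with simulated data) comparing a parametric Pearson correlation with a non-parametric Spearman correlation:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(size=30)          # hypothetical scores
    y = 2 * x + rng.normal(size=30)  # related scores plus noise

    r, p_pearson = stats.pearsonr(x, y)      # uses means and deviations from the mean
    rho, p_spearman = stats.spearmanr(x, y)  # uses only the ranks (ordinal positions)
    print(f"Pearson r = {r:.3f} (p = {p_pearson:.4f})")
    print(f"Spearman rho = {rho:.3f} (p = {p_spearman:.4f})")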

The basic distinction for parametric versus non-parametric is:

• If your measurement scale is nominal or ordinal, then you use non-parametric statistics.
• If you are using interval or ratio scales, you use parametric statistics.

There are other considerations which have to be taken into account:

You have to look at the distribution of your data. If your data is supposed to support parametric statistics, you should check that the distributions are approximately normal. The best way to do this is to check the skew and kurtosis measures in the frequency output from SPSS. A normal distribution has a skewness of 0 and an excess kurtosis of 0; as a rule of thumb, a distribution is relatively normal if both measures stay roughly within:

Skew ≈ ±1.0
Kurtosis ≈ ±1.0

If a distribution deviates markedly from normality, you run the risk that the statistic will be inaccurate. The safest thing to do is to use an equivalent non-parametric statistic.
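
Outside SPSS, the same check takes only a few lines. Here is a sketch in Python with scipy, on data simulated purely for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    data = rng.normal(loc=50, scale=10, size=200)  # hypothetical, roughly normal sample

    skewness = stats.skew(data)
    excess_kurtosis = stats.kurtosis(data)  # Fisher's definition: 0 for a normal distribution
    print(f"skew = {skewness:.2f}, excess kurtosis = {excess_kurtosis:.2f}")

    # Rule of thumb from the text: values well outside roughly -1..+1
    # suggest marked non-normality, so prefer a non-parametric test.
    if abs(skewness) > 1.0 or abs(excess_kurtosis) > 1.0:
        print("Distribution deviates markedly from normality.")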

Parametric Test

The parametric test is a hypothesis test which provides generalizations for making statements about the mean of the parent population. A t-test based on Student's t-statistic is often used in this regard.

The t-statistic rests on the underlying assumption that the variable is normally distributed and the mean is known or assumed to be known. The population variance is estimated from the sample. It is assumed that the variables of interest in the population are measured on an interval scale.

A parameter in statistics refers to an aspect of a population, as opposed to a statistic, which refers to an aspect of a sample. For example, the population mean is a parameter, while the sample mean is a statistic. A parametric statistical test makes assumptions about the population parameters and the distributions that the data came from. These types of tests include Student's t-tests and ANOVA tests, which assume data is from a normal distribution.

The opposite is a nonparametric test, which doesn’t assume anything about the
population parameters. Nonparametric tests include chi-square, Fisher’s exact
test and the Mann-Whitney test.

Every parametric test has a nonparametric equivalent. For example, if you have parametric data from two independent groups, you can run a 2-sample t-test to compare means. If you have nonparametric data, you can run a Wilcoxon rank-sum (Mann-Whitney) test to compare the two groups.
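
A minimal sketch of that pairing in Python with scipy, using two hypothetical independent groups:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    group_a = rng.normal(10, 2, size=25)  # hypothetical measurements, group A
    group_b = rng.normal(12, 2, size=25)  # hypothetical measurements, group B

    t_stat, p_t = stats.ttest_ind(group_a, group_b)  # parametric: 2-sample t-test
    u_stat, p_u = stats.mannwhitneyu(group_a, group_b,
                                     alternative="two-sided")  # nonparametric equivalent
    print(f"2-sample t-test: p = {p_t:.4f}")
    print(f"Mann-Whitney:    p = {p_u:.4f}")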

Advantage of Parametric Test

The advantage of using a parametric test instead of a nonparametric equivalent is that the former will have more statistical power than the latter. In other words, a parametric test is more likely to lead to a rejection of H0. Most of the time, the p-value associated with a parametric test will be lower than the p-value associated with a nonparametric equivalent run on the same data.

Advantage 1: Parametric tests can provide trustworthy results with distributions that are skewed and nonnormal

Many people aren’t aware of this fact, but parametric analyses can produce reliable
results even when your continuous data are nonnormally distributed. You just have
to be sure that your sample size meets the requirements for each analysis in the table
below. Simulation studies have identified these requirements.

PARAMETRIC ANALYSIS    SAMPLE SIZE REQUIREMENT FOR NONNORMAL DATA
1-sample t-test        Greater than 20 observations
2-sample t-test        Each group should have more than 15 observations
One-way ANOVA          For 2-9 groups, each group should have more than 15
                       observations; for 10-12 groups, each group should have
                       more than 20 observations

Table 3: Sample size requirements for parametric analyses with nonnormal data

Advantage 2: Parametric tests can provide trustworthy results when the groups
have different amounts of variability

It’s true that nonparametric tests don’t require data that are normally distributed.
However, nonparametric tests have the disadvantage of an additional requirement
that can be very hard to satisfy. The groups in a nonparametric analysis typically must
all have the same variability (dispersion). Nonparametric analyses might not provide
accurate results when variability differs between groups.

Conversely, parametric analyses, like the 2-sample t-test or one-way ANOVA, allow
you to analyze groups that have unequal variances. In most statistical software, it’s as
easy as checking the correct box! You don’t have to worry about groups having
different amounts of variability when you use a parametric analysis.
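
In scipy, for instance, that "checkbox" is a single argument: passing equal_var=False to the 2-sample t-test requests Welch's t-test, which does not assume equal variances. A minimal sketch with hypothetical data:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    narrow = rng.normal(20, 1, size=30)  # hypothetical group with a small spread
    wide = rng.normal(22, 6, size=30)    # hypothetical group with a much larger spread

    # equal_var=False switches from the pooled-variance t-test to Welch's t-test.
    t_stat, p_value = stats.ttest_ind(narrow, wide, equal_var=False)
    print(f"Welch's t-test: t = {t_stat:.2f}, p = {p_value:.4f}")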

Advantage 3: Parametric tests have greater statistical power

In most cases, parametric tests have more power. If an effect actually exists, a
parametric analysis is more likely to detect it.

Non Parametric Test

The nonparametric test is defined as a hypothesis test which is not based on underlying assumptions, i.e. it does not require the population's distribution to be described by specific parameters.

The test is mainly based on differences in medians. Hence, it is alternately known as the distribution-free test. The test assumes that the variables are measured on a nominal or ordinal level. It is used when the independent variables are non-metric.

A non-parametric test (sometimes called a distribution-free test) does not assume anything about the underlying distribution (for example, that the data comes from a normal distribution). That is in contrast to a parametric test, which makes assumptions about a population's parameters (for example, the mean or standard deviation). When the word "non-parametric" is used in stats, it doesn't quite mean that you know nothing about the population. It usually means that you know the population data does not have a normal distribution.

For example, one assumption for the one-way ANOVA is that the data comes from a normal distribution. If your data isn't normally distributed, you can't run an ANOVA, but you can run the nonparametric alternative, the Kruskal-Wallis test.
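
A minimal sketch of the Kruskal-Wallis test in Python with scipy, using three small hypothetical groups whose outliers would trouble an ANOVA:

    from scipy import stats

    # Hypothetical skewed scores from three independent groups.
    group1 = [12, 15, 14, 13, 16, 110]
    group2 = [22, 25, 21, 24, 23, 230]
    group3 = [31, 34, 33, 35, 32, 310]

    # Rank-based alternative to the one-way ANOVA.
    h_stat, p_value = stats.kruskal(group1, group2, group3)
    print(f"Kruskal-Wallis: H = {h_stat:.2f}, p = {p_value:.4f}")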

If at all possible, you should use parametric tests, as they tend to be more accurate. Parametric tests have greater statistical power, which means they are more likely to find a truly significant effect. Use nonparametric tests only if you have to (i.e. you know that assumptions like normality are being violated). Parametric tests can perform well even with non-normal continuous data if you have a sufficiently large sample size (generally 15-20 items in each group; see Table 3).

When to use it?

Non parametric tests are used when your data isn’t normal. Therefore the key is to
figure out if you have normally distributed data. For example, you could look at the
distribution of your data. If your data is approximately normal, then you can
use parametric statistical tests.

A normal distribution has no skew; basically, it is centered and symmetrical in shape. Kurtosis refers to how much of the data is in the tails versus the center. A normal distribution has a skewness of 0 and an excess kurtosis of 0.

If your distribution is not normal (in other words, the skewness and kurtosis deviate a lot from 0, say by more than about 1.0), you should use a non-parametric test such as the chi-square test. Otherwise you run the risk that your results will be meaningless.

Data Types

Does your data allow for a parametric test, or do you have to use a non-parametric test like chi-square? The rule of thumb is:

• For nominal scales or ordinal scales, use non-parametric statistics.
• For interval scales or ratio scales, use parametric statistics.

A skewed distribution is one reason to run a nonparametric test. Other reasons to run nonparametric tests:

• One or more assumptions of a parametric test have been violated.
• Your sample size is too small to run a parametric test.
• Your data has outliers that cannot be removed.
• You want to test for the median rather than the mean (you might want to do this if you have a very skewed distribution).

Types of Nonparametric Tests


When the word "parametric" is used in stats, it usually means tests like ANOVA or a t-test. Those tests both assume that the population data has a normal distribution. Non-parametric tests do not assume that the data is normally distributed. The only non-parametric test you are likely to come across in elementary stats is the chi-square test. However, there are several others. For example, the Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA, and the Mann-Whitney test is the non-parametric alternative to the two-sample t-test.

The main nonparametric tests are:

• 1-sample sign test - Use this test to estimate the median of a population and compare it to a reference value or target value.
• 1-sample Wilcoxon signed rank test - With this test, you also estimate the population median and compare it to a reference/target value. However, the test assumes your data comes from a symmetric distribution (like the Cauchy distribution or uniform distribution). A code sketch of this test, in its paired form, follows the list below.
• Friedman test - This test is used to test for differences between groups with ordinal dependent variables. It can also be used for continuous data if the one-way ANOVA with repeated measures is inappropriate (i.e. some assumption has been violated).
• Goodman and Kruskal's gamma - A test of association for ranked variables.
• Kruskal-Wallis test - Use this test instead of a one-way ANOVA to find out if two or more medians are different. Ranks of the data points are used for the calculations, rather than the data points themselves.
• Mann-Kendall trend test - Looks for trends in time-series data.
• Mann-Whitney test - Use this test to compare differences between two independent groups when dependent variables are either ordinal or continuous.
• Mood's median test - Use this test instead of the sign test when you have two independent samples.
• Spearman rank correlation - Use when you want to find a correlation between two sets of data.
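
As promised in the list above, here is a minimal Python sketch of the Wilcoxon signed-rank test with scipy, applied in its paired form to hypothetical before/after measurements (it tests whether the median of the paired differences is zero):

    import numpy as np
    from scipy import stats

    # Hypothetical paired measurements on the same eight subjects.
    before = np.array([140, 132, 128, 150, 145, 138, 142, 135])
    after = np.array([135, 130, 129, 141, 139, 136, 138, 133])

    # The signed-rank test works on the paired differences.
    w_stat, p_value = stats.wilcoxon(before, after)
    print(f"Wilcoxon signed-rank: W = {w_stat:.1f}, p = {p_value:.4f}")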

NONPARAMETRIC TEST                    PARAMETRIC ALTERNATIVE
1-sample sign test                    One-sample Z-test, one-sample t-test
1-sample Wilcoxon signed rank test    One-sample Z-test, one-sample t-test
Friedman test                         Two-way ANOVA
Kruskal-Wallis test                   One-way ANOVA
Mann-Whitney test                     Independent samples t-test
Mood's median test                    One-way ANOVA
Spearman rank correlation             Correlation coefficient

Table 4: Nonparametric tests and their parametric alternatives

Advantages and Disadvantages

Compared to parametric tests, nonparametric tests have several advantages, including:

• More statistical power when assumptions for the parametric tests have been violated; when assumptions haven't been violated, they can be almost as powerful.
• Fewer assumptions (i.e. the assumption of normality doesn't apply).
• Small sample sizes are acceptable.
• They can be used for all data types, including nominal variables, interval variables, or data that has outliers or that has been measured imprecisely.

However, they do have their disadvantages. The most notable ones are:

• Less powerful than parametric tests if assumptions haven't been violated.
• More labor-intensive to calculate by hand (for computer calculations, this isn't an issue).
• Critical value tables for many tests aren't included in many computer software packages, whereas tables for parametric tests (like the z-table or t-table) usually are included.

Descriptive and Inferential Statistics

When analysing data, such as the marks achieved by 100 students for a piece of
coursework, it is possible to use both descriptive and inferential statistics in your
analysis of their marks. Typically, in most research conducted on groups of people,
you will use both descriptive and inferential statistics to analyse your results and
draw conclusions. So what are descriptive and inferential statistics? And what are
their differences?

Descriptive Statistics

Descriptive statistics is the term given to the analysis of data that helps describe, show
or summarize data in a meaningful way such that, for example, patterns might emerge
from the data. Descriptive statistics do not, however, allow us to make conclusions
beyond the data we have analyzed or reach conclusions regarding any hypotheses we
might have made. They are simply a way to describe our data.

Descriptive statistics are very important because if we simply presented our raw data
it would be hard to visualize what the data was showing, especially if there was a lot
of it. Descriptive statistics therefore enables us to present the data in a more
meaningful way, which allows simpler interpretation of the data. For example, if we
had the results of 100 pieces of students' coursework, we may be interested in the
overall performance of those students. We would also be interested in the
distribution or spread of the marks. Descriptive statistics allow us to do this. How to properly describe data through statistics and graphs is an important topic and is discussed in other Laerd Statistics guides. Typically, there are two general types of statistic that are used to describe data:

• Measures of central tendency: these are ways of describing the central position of a frequency distribution for a group of data. In this case, the frequency distribution is simply the distribution and pattern of marks scored by the 100 students from the lowest to the highest. We can describe this central position using a number of statistics, including the mode, median, and mean.
• Measures of spread: these are ways of summarizing a group of data by describing how spread out the scores are. For example, the mean score of our 100 students may be 65 out of 100. However, not all students will have scored 65 marks. Rather, their scores will be spread out. Some will be lower and others higher. Measures of spread help us to summarize how spread out these scores are. To describe this spread, a number of statistics are available to us, including the range, quartiles, absolute deviation, variance and standard deviation. A short code sketch after this list illustrates both types of measure.
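
A minimal Python sketch of both kinds of measure, using the standard library's statistics module on ten hypothetical coursework marks:

    import statistics

    marks = [55, 62, 65, 65, 70, 48, 81, 59, 73, 66]  # hypothetical marks out of 100

    # Measures of central tendency
    print("mean:  ", statistics.mean(marks))
    print("median:", statistics.median(marks))
    print("mode:  ", statistics.mode(marks))

    # Measures of spread
    print("range: ", max(marks) - min(marks))
    print("stdev: ", round(statistics.stdev(marks), 2))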

When we use descriptive statistics it is useful to summarize our group of data using a
combination of tabulated description (i.e., tables), graphical description (i.e., graphs
and charts) and statistical commentary (i.e., a discussion of the results).

Inferential Statistics

We have seen that descriptive statistics provide information about our immediate
group of data. For example, we could calculate the mean and standard deviation of
the exam marks for the 100 students and this could provide valuable information
about this group of 100 students. Any group of data like this, which includes all the
data you are interested in, is called a population. A population can be small or large,
as long as it includes all the data you are interested in. For example, if you were only
interested in the exam marks of 100 students, the 100 students would represent your
population. Descriptive statistics are applied to populations, and the properties of
populations, like the mean or standard deviation, are called parameters as they
represent the whole population (i.e., everybody you are interested in).

Often, however, you do not have access to the whole population you are interested in
investigating, but only a limited amount of data instead. For example, you might be
interested in the exam marks of all students in the UK. It is not feasible to measure all
exam marks of all students in the whole of the UK so you have to measure a
smaller sample of students (e.g., 100 students), which are used to represent the larger
population of all UK students. Properties of samples, such as the mean or standard
deviation, are not called parameters, but statistics. Inferential statistics are
techniques that allow us to use these samples to make generalizations about the
populations from which the samples were drawn. It is, therefore, important that the
sample accurately represents the population. The process of achieving this is called
sampling. Inferential statistics arise out of the fact that sampling naturally incurs
sampling error and thus a sample is not expected to perfectly represent the
population. The methods of inferential statistics are (1) the estimation of
parameter(s) and (2) testing of statistical hypotheses.
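
Both methods can be sketched briefly in Python with scipy: (1) estimating a population mean from a sample with a confidence interval, and (2) testing a hypothesis about that mean. The sample below is simulated purely for illustration:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    sample = rng.normal(loc=65, scale=10, size=100)  # hypothetical marks of 100 sampled students

    # (1) Estimation: sample mean with a 95% confidence interval for the population mean.
    mean = sample.mean()
    sem = stats.sem(sample)  # standard error of the mean
    ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1, loc=mean, scale=sem)
    print(f"estimated mean = {mean:.1f}, 95% CI = ({ci_low:.1f}, {ci_high:.1f})")

    # (2) Hypothesis testing: is the population mean equal to 60?
    t_stat, p_value = stats.ttest_1samp(sample, popmean=60)
    print(f"1-sample t-test against 60: t = {t_stat:.2f}, p = {p_value:.4f}")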

References and Supplementary Materials

Books and Journals
1. Vimala Veeraraghavan and Suhas Shetgovekar; 2016; Textbook of Non Parametric and Parametric Statistics; London; SAGE Publications Inc.
2. Robert Bruhl; 2017; Understanding Statistical Analysis and Methods; London; SAGE Publications Inc.

Online Supplementary Reading Materials
1. Parametric and Non Parametric Tests for Comparing Two or More Variables; https://www.healthknowledge.org.uk/public-health-textbook/research-methods/1b-statistical-methods/parametric-nonparametric-tests; September 08, 2018
2. Parametric Versus Non Parametric; http://users.monash.edu/~smarkham/resources/param.htm; September 08, 2018

Online Instructional Videos
1. This video explains the differences between parametric and nonparametric statistical tests, discussing the assumptions for each and covering the Mann-Whitney test, Kruskal-Wallis test, Wilcoxon signed-rank test, and Friedman's ANOVA; https://www.youtube.com/watch?v=pWEWHKnwg_0; September 08, 2018
2. This video discusses how to decide between choosing a parametric and a nonparametric test for a given testing scenario; https://www.youtube.com/watch?v=frGwZJdOa74; September 08, 2018
3. This video explains what inferential statistics means: a branch of statistics studying statistical inference, that is, drawing conclusions about a population from a random sample drawn from it; https://www.youtube.com/watch?v=yJ1cNv7sSRk; September 08, 2018
