
Statistics

1. Normality (parametric and non-parametric tests)


2. Testing the difference between means
3. Independent sample t-test and paired-sample t-test and their non-parametric equivalents

Measures of Central Tendency

In statistics, a central tendency is a central or typical value for a probability distribution.


It may also be called a center or location of the distribution. Measures of central
tendency help you find the middle, or the average, of a data set. The 3 most common
measures of central tendency are the mean, median and mode. The mean is the
arithmetic average: the sum of all values divided by the number of values. The median
is the middle number in an ordered data set. The mode is the most frequent value.
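
As a minimal sketch (assuming Python is available; the score values below are made up purely for illustration), the three measures can be computed with the standard library's statistics module:

# Sketch: computing the three measures of central tendency (made-up data).
import statistics

scores = [4, 7, 7, 8, 9, 10, 12]

print("mean:", statistics.mean(scores))      # arithmetic average
print("median:", statistics.median(scores))  # middle value of the ordered data
print("mode:", statistics.mode(scores))      # most frequent value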

What is Kurtosis?

Kurtosis is a statistical measure used to describe the degree to which scores cluster in
the tails or the peak of a frequency distribution. The peak is the tallest part of the
distribution, and the tails are the ends of the distribution.

What are the 3 Types of Kurtosis?

There are three types of kurtosis: mesokurtic, leptokurtic, and platykurtic.

Mesokurtic: Distributions that are moderate in breadth, with a medium-peaked curve.

Leptokurtic: More values in the distribution tails and more values close to the mean (i.e.
sharply peaked with heavy tails)

Platykurtic: Fewer values in the tails and fewer values close to the mean (i.e. the curve
has a flat peak and has more dispersed scores with lighter tails).

Negative and Positive Value Kurtosis

Negative values of kurtosis indicate that a distribution is flat and has thin tails.
Platykurtic distributions have negative kurtosis values.

Positive values of kurtosis indicate that a distribution is peaked and has thick tails.
Leptokurtic distributions have positive kurtosis values.

When the kurtosis is equal to 0, the distribution is mesokurtic: its kurtosis is the same
as that of the normal distribution (medium peak).
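
A minimal sketch of how the sign maps onto the three categories (assuming SciPy and NumPy; the samples are generated purely for illustration). Note that scipy.stats.kurtosis returns excess kurtosis by default (fisher=True), so a normal distribution scores approximately 0:

# Sketch: excess kurtosis for three illustrative samples (data are simulated).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(0)

normal_like = rng.normal(size=10_000)             # mesokurtic: excess kurtosis near 0
heavy_tails = rng.standard_t(df=5, size=10_000)   # leptokurtic: positive excess kurtosis
flat_topped = rng.uniform(-1, 1, size=10_000)     # platykurtic: negative excess kurtosis

for name, sample in [("normal", normal_like),
                     ("t(5)", heavy_tails),
                     ("uniform", flat_topped)]:
    print(name, round(kurtosis(sample), 2))  # fisher=True by default (excess kurtosis)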

Null Hypothesis and Alternative Hypothesis


The null hypothesis states that there is no relationship between the two variables being
studied (one variable does not affect the other). It states the results are due to chance
and are not significant in terms of supporting the idea being investigated. Thus, the null
hypothesis assumes that whatever you are trying to prove did not happen.

The alternative hypothesis is the one you would believe if the null hypothesis is
concluded to be untrue. The alternative hypothesis states that the independent variable
did affect the dependent variable, and the results are significant in terms of supporting
the theory being investigated (i.e. not due to chance).

How Do You Know If A "p-value" Is Statistically Significant?

When you perform a statistical test, a p-value helps you determine the significance of
your results in relation to the null hypothesis. The level of statistical significance is
often expressed as a p-value between 0 and 1. The smaller the p-value, the stronger the
evidence that you should reject the null hypothesis.

A p-value less than or equal to 0.05 is typically considered statistically significant. It
indicates strong evidence against the null hypothesis, because results this extreme
would occur less than 5% of the time if the null hypothesis were true. Therefore, we
reject the null hypothesis and accept the alternative hypothesis. However, this does not
mean that there is a 95% probability that the research hypothesis is true: the p-value is
conditional on the null hypothesis being true and says nothing directly about the truth
or falsity of the research hypothesis.

A p-value higher than 0.05 (> 0.05) is not statistically significant and means the data do
not provide strong evidence against the null hypothesis. In that case we retain the null
hypothesis and reject the alternative hypothesis. Note that we never accept the null
hypothesis; we can only reject it or fail to reject it.

A statistically significant result cannot prove that a research hypothesis is correct (as
this implies 100% certainty). Instead, we may state our results “provide support for” or
“give evidence for” our research hypothesis (as there is still a slight probability that the
results occurred by chance and the null hypothesis was correct – e.g. less than 5%).

What is Skewness?

In probability theory and statistics, skewness is a measure of the asymmetry of the
probability distribution of a real-valued random variable about its mean. The skewness
value can be positive, zero, negative, or undefined.

It is the degree of distortion from the symmetrical bell curve or the normal distribution. It
measures the lack of symmetry in data distribution. It differentiates extreme values in
one versus the other tail. A symmetrical distribution will have a skewness of 0.

Types of Skewness

There are two types of Skewness: Positive and Negative.

Positive skewness means the tail on the right side of the distribution is longer or fatter.
The mean and median will be greater than the mode.
Negative skewness means the tail on the left side of the distribution is longer or fatter
than the tail on the right side. The mean and median will be less than the mode.


Skew Normality and Extremes

If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.

If the skewness is between -1 and -0.5 (negatively skewed) or between 0.5 and 1
(positively skewed), the data are moderately skewed.

If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed),
the data are highly skewed.
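
A minimal sketch that computes the sample skewness and applies the rule-of-thumb thresholds above (assuming SciPy; the data values and the describe_skew helper are made up for illustration):

# Sketch: compute skewness and classify it with the thresholds above.
from scipy.stats import skew

def describe_skew(values):
    s = skew(values)
    if s < -1 or s > 1:
        label = "highly skewed"
    elif abs(s) >= 0.5:
        label = "moderately skewed"
    else:
        label = "fairly symmetrical"
    return s, label

data = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]   # made-up values with a long right tail
print(describe_skew(data))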

Parametric and Non-parametric Tests

Parametric and non-parametric are two broad classifications of statistical procedures.

Parametric tests: are those that make assumptions about the parameters of the
population distribution from which the sample is drawn. This is often the assumption that
the population data are normally distributed.

Non-parametric tests: are those used when the parameters of the population are not
known and the data are not assumed to be normally distributed. They do not rely on any
particular distribution.

If the data deviate strongly from the assumptions of a parametric procedure, using the
parametric procedure could lead to incorrect conclusions. In such cases, non-parametric
tests are often a good option.

Differences Between Parametric and Non-parametric Tests

1. Parametric Tests:

a - Specific assumptions are made about the population.

b - The test statistic is based on a known distribution.

c - Variables of interest are measured on an interval or ratio scale.

d - The measure of central tendency in a parametric test is the mean.

e - Complete information about the population is required.

2: Non-parametric Tests:

a - No assumptions are made about the population.

b - The test statistic is arbitrary.

c - Variables of interest are measured on a nominal or ordinal scale.

d - The measure of central tendency in a non-parametric test is the median.

e - No information about the population is required.

The parametric measure of central tendency is the mean (average); its non-parametric
equivalent is the median.

The parametric measure of variability is the standard deviation, while its non-parametric
equivalent is the quartiles/range.

Test of Normality

An assessment of the normality of data is a prerequisite for many statistical tests
because normal data is an underlying assumption in parametric testing.

The main tests for the assessment of normality are the Kolmogorov-Smirnov test and the
Shapiro-Wilk test.

Kolmogorov-Smirnov test: This test is used as a test of goodness of fit and is ideal when
the size of the sample is larger (above 2000).

Shapiro-Wilk test: used for smaller sample sizes (less than 2000).

If p < 0.05, we do not believe that our variable follows a normal distribution in our
population (at the 95 percent confidence level).
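
Outside SPSS, the same checks can be run in Python. A minimal sketch (assuming SciPy and NumPy; the sample is simulated for illustration) using shapiro for the Shapiro-Wilk test and kstest for the Kolmogorov-Smirnov test, here against a normal distribution after standardising:

# Sketch: Shapiro-Wilk and Kolmogorov-Smirnov normality checks (simulated data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(loc=50, scale=10, size=40)   # illustrative sample

# Shapiro-Wilk: preferred for smaller samples.
w_stat, p_sw = stats.shapiro(x)
print(f"Shapiro-Wilk: W = {w_stat:.3f}, p = {p_sw:.3f}")

# Kolmogorov-Smirnov against a standard normal after standardising the data.
# (Estimating mean and SD from the sample makes this approximate; it is only
# meant to mirror the idea described above.)
z = (x - x.mean()) / x.std(ddof=1)
ks_stat, p_ks = stats.kstest(z, "norm")
print(f"Kolmogorov-Smirnov: D = {ks_stat:.3f}, p = {p_ks:.3f}")

alpha = 0.05
print("Reject normality" if p_sw < alpha else "No evidence against normality")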

Difference Between Two Independent Means

The comparison of two independent population means is very common and provides a
way to test the hypothesis that the two groups differ from each other.

The Independent Samples t-Test is a statistical test used to determine if 2 groups are
significantly different from each other on your variable of interest.

THE INDEPENDENT SAMPLES t TEST

When we run the analysis, we get a t-statistic and a p-value. The t-statistic is a measure
of how different the two groups are on our variable of interest. A p-value less than or
equal to 0.05 means that our result is statistically significant, and the observed
difference is unlikely to be due to chance alone.

Assumptions

- Continuous

- Normally distributed

- Random sample

- Enough data (minimum 5)

- Similar spread between groups

Example

Objective: To test the effectiveness of a medical treatment in treating arthritis

Null hypothesis: There is no difference in the number of days to recover between the two
groups
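
A minimal sketch of this comparison (assuming SciPy; the recovery-day values for the two groups are made up purely to illustrate the call):

# Sketch: independent samples t-test on hypothetical recovery times (days).
from scipy import stats

treatment = [12, 14, 11, 13, 15, 12, 10, 13]  # made-up values
control   = [16, 15, 17, 14, 18, 16, 15, 17]  # made-up values

# Default ttest_ind assumes similar spread (equal variances) between groups.
t_stat, p_value = stats.ttest_ind(treatment, control)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value <= 0.05:
    print("Reject the null: the groups differ in days to recover.")
else:
    print("Fail to reject the null hypothesis.")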

Mann-Whitney U Test

The Mann-Whitney U test is used to compare whether there is a difference in the
dependent variable for two independent groups. (non-parametric)
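
The non-parametric comparison can be run the same way; a minimal sketch (assuming SciPy, reusing made-up group data):

# Sketch: Mann-Whitney U test on two hypothetical independent groups.
from scipy.stats import mannwhitneyu

group_a = [12, 14, 11, 13, 15, 12, 10, 13]  # made-up values
group_b = [16, 15, 17, 14, 18, 16, 15, 17]  # made-up values

u_stat, p_value = mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")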

THE PAIRED SAMPLES t TEST

The dependent t-test (called the paired-samples t-test in SPSS Statistics) compares the
means between two related groups on the same continuous, dependent variable.

Assumptions

- Measured on a continuous scale (i.e., it is measured at the interval or ratio level).

- Your independent variable should consist of two categorical, "related groups" or
"matched pairs".

- There should be no significant outliers in the differences between the two related
groups.

- The distribution of the differences between the two related groups should be
approximately normal.

Example

Objective: To use a dependent t-test to understand whether there was a difference in
smokers' daily cigarette consumption before and after a 6 week hypnotherapy
programme (i.e., your dependent variable would be "daily cigarette consumption", and
your two related groups would be the cigarette consumption values "before" and "after"
the hypnotherapy programme).

Null hypothesis: There is no difference in the smokers’ daily cigarette consumption after
the intervention
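
A minimal sketch of this design (assuming SciPy; the before/after cigarette counts for the same smokers are made up for illustration):

# Sketch: paired-samples (dependent) t-test on hypothetical before/after data.
from scipy import stats

before = [20, 25, 18, 30, 22, 28, 24, 26]  # made-up daily cigarette counts
after  = [15, 20, 17, 24, 20, 23, 22, 21]  # same smokers after the programme

t_stat, p_value = stats.ttest_rel(before, after)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")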

Wilcoxon Signed-rank Test

The Wilcoxon signed-rank test is the non-parametric test equivalent to the dependent
t-test.

As the Wilcoxon signed-rank test does not assume normality in the data, it can be used
when this assumption has been violated and the use of the dependent t-test is
inappropriate.
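
As a minimal sketch, the same made-up paired data could be analysed non-parametrically (assuming SciPy):

# Sketch: Wilcoxon signed-rank test on hypothetical paired observations.
from scipy.stats import wilcoxon

before = [20, 25, 18, 30, 22, 28, 24, 26]  # made-up values
after  = [15, 20, 17, 24, 20, 23, 22, 21]

w_stat, p_value = wilcoxon(before, after)
print(f"W = {w_stat}, p = {p_value:.4f}")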

Hypothesis
Below are some of the important types of hypothesis

1. Simple Hypothesis
a. A simple hypothesis is one in which a relationship exists between two variables:
one is the independent variable (the cause) and the other is the dependent
variable (the effect). For example:
1. Smoking leads to cancer.
2. A higher rate of unemployment leads to more crime.

2. Complex Hypothesis
a. A complex hypothesis is one in which a relationship exists among more than two
variables; there are more than two dependent and/or independent variables. For
example:
i. Smoking and other drugs lead to cancer, tension, chest infections, etc.
ii. Higher rates of unemployment, poverty and illiteracy lead to crimes such as
dacoity, robbery, rape, prostitution and killing.
3. Empirical Hypothesis
a. A working hypothesis is one that is applied to a field. During formulation it is only
an assumption, but when it is put to a test it becomes an empirical or working
hypothesis.

4. Null Hypothesis
a. The null hypothesis is contrary to the positive statement of a working hypothesis.
According to the null hypothesis, there is no relationship between the dependent
and independent variables. It is denoted by H0.

5. Alternative Hypothesis
a. First, many hypotheses are considered; from among them, the most workable and
efficient one is selected. An alternative hypothesis is introduced later, when the
originally formulated hypothesis is changed. It is denoted by H1.

6. Logical Hypothesis
a. It is the type in which the hypothesis is verified logically. J.S. Mill gave four
canons for these hypotheses, e.g. agreement, difference, residues and concomitant
variation.

7. Statistical Hypothesis
a. A hypothesis that can be verified statistically is called a statistical hypothesis. The
statement may be logical or illogical, but if statistics can verify it, it is a statistical
hypothesis.

Errors
1. Type I error

The first kind of error is the rejection of a true null hypothesis as the result of a test
procedure. This kind of error is called a type I error and is sometimes called an error of
the first kind.

In terms of the courtroom example, a type I error corresponds to convicting an innocent
defendant.

2. Type II error
The second kind of error is the failure to reject a false null hypothesis as the result of a
test procedure. This sort of error is called a type II error and is also referred to as an error
of the second kind.

In terms of the courtroom example, a type II error corresponds to acquitting a criminal.

● False positive and false negative

In terms of false positives and false negatives, a positive result corresponds to rejecting
the null hypothesis, while a negative result corresponds to failing to reject the null
hypothesis; "false" means the conclusion drawn is incorrect. Thus, a type I error is
equivalent to a false positive, and a type II error is equivalent to a false negative.

Normality
Normality tests are used to determine if a data set is well-modeled by a normal distribution
and to compute how likely it is for a random variable underlying the data set to be normally
distributed.

Shapiro-Wilk Test
● The null-hypothesis of this test is that the population is normally distributed.
○ Thus, if the p value is less than the chosen alpha level, then the null hypothesis is
rejected and there is evidence that the data tested are not normally distributed.
● On the other hand, if the p value is greater than the chosen alpha level, then the null
hypothesis that the data came from a normally distributed population can not be rejected
(e.g., for an alpha level of .05, a data set with a p value of less than .05 rejects the null
hypothesis that the data are from a normally distributed population).

● SPSS Steps

1. Click Analyze > Descriptive Statistics > Explore... on the top menu.
2. In the dialogue box that appears, transfer the variable that needs to be tested for
normality into the Dependent List: box by either drag-and-dropping or using the
arrow button.
3. Click on the Statistics button. You will be presented with the Explore: Statistics
dialogue box.
4. Select Descriptives and click Continue.
5. Now click Plots and select Stem-and-leaf (under Descriptives), Factor levels
together (under Boxplots) and then Normality plots with tests.
6. Click Continue and OK.

● SPSS Output
The output includes a "Tests of Normality" table (not reproduced here).

We can see from this table that for the "Beginner", "Intermediate" and "Advanced"
Course Group the dependent variable, "Time", was normally distributed. How do we
know this? If the Sig. value of the Shapiro-Wilk Test is greater than 0.05, the data are
normal. If it is below 0.05, the data deviate significantly from a normal distribution.

Note:

● The above table presents the results from two well-known tests of normality,
namely the Kolmogorov-Smirnov Test and the Shapiro-Wilk Test. The Shapiro-
Wilk Test is more appropriate for small sample sizes (< 50 samples), but can also
handle sample sizes as large as 2000. For this reason, we will use the Shapiro-
Wilk test as our numerical means of assessing normality.
● To check for normality, both types of data should be scale.

T Test


● The independent t-test, also called the two sample t-test, independent-samples t-test or
student's t-test, is an inferential statistical test that determines whether there is a
statistically significant difference between the means in two unrelated groups.
● Null and alternative hypotheses for the independent t-test
○ The null hypothesis for the independent t-test is that the population means from
the two unrelated groups are equal:

H0: μ1 = μ2
○ In most cases, we are looking to see if we can show that we can reject the null
hypothesis and accept the alternative hypothesis, which is that the population
means are not equal:

HA: μ1 ≠ μ2

○ To do this, we need to set a significance level (also called alpha) that allows us to
either reject or fail to reject the null hypothesis. Most commonly, this value is set
at 0.05.

● What do you need to run an independent t-test?

In order to run an independent t-test, you need the following:

○ One independent, categorical variable that has two levels/groups.


○ One continuous dependent variable.



● Miscellaneous
○ Parametric: when the data are normally distributed, use the Pearson correlation.
○ Non-parametric: when the data are not normally distributed, use the Spearman correlation.
○ Both variables should be scale variables.
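
A minimal sketch of the two correlation options (assuming SciPy; the paired measurements below are made up for illustration):

# Sketch: Pearson (parametric) vs Spearman (non-parametric) correlation.
from scipy.stats import pearsonr, spearmanr

x = [1.0, 2.1, 2.9, 4.2, 5.1, 6.0]    # made-up scale variable
y = [2.3, 4.0, 6.2, 7.9, 10.1, 12.2]  # made-up scale variable

r, p_pearson = pearsonr(x, y)
rho, p_spearman = spearmanr(x, y)
print(f"Pearson r = {r:.2f} (p = {p_pearson:.4f})")
print(f"Spearman rho = {rho:.2f} (p = {p_spearman:.4f})")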
