What is Kurtosis?
Kurtosis is a statistical measure used to describe the degree to which scores cluster in
the tails or the peak of a frequency distribution. The peak is the tallest part of the
distribution, and the tails are the ends of the distribution.
Mesokurtic: Distributions that are moderate in breadth, with a medium-height peak, similar to the normal distribution.
Leptokurtic: More values in the distribution tails and more values close to the mean (i.e. sharply peaked with heavy tails).
Platykurtic: Fewer values in the tails and fewer values close to the mean (i.e. the curve has a flat peak and more dispersed scores with lighter tails).
Negative values of kurtosis indicate that a distribution is flat and has thin tails; platykurtic distributions have negative kurtosis values.
Positive values of kurtosis indicate that a distribution is peaked and has thick tails; leptokurtic distributions have positive kurtosis values.
When kurtosis is equal to 0, the distribution is mesokurtic (medium peak): its kurtosis is the same as that of the normal distribution.
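As a sketch of these three categories (not part of the original notes), scipy's `kurtosis` function reports excess kurtosis by default, so a normal distribution scores near 0, a heavy-tailed one positive, and a flat one negative. The samples below are simulated purely for illustration.

```python
# Illustrating mesokurtic, leptokurtic, and platykurtic distributions with
# scipy.stats.kurtosis, which returns excess kurtosis (normal -> about 0).
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(42)

normal_data = rng.normal(size=100_000)    # mesokurtic: excess kurtosis near 0
laplace_data = rng.laplace(size=100_000)  # leptokurtic: sharp peak, heavy tails
uniform_data = rng.uniform(size=100_000)  # platykurtic: flat peak, light tails

print(kurtosis(normal_data))   # close to 0
print(kurtosis(laplace_data))  # positive (theoretical value 3)
print(kurtosis(uniform_data))  # negative (theoretical value -1.2)
```

Note that "excess" kurtosis subtracts 3 from the raw fourth-moment measure so that the normal distribution sits at 0, matching the convention used in these notes.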
The alternative hypothesis is the one you would believe if the null hypothesis is
concluded to be untrue. The alternative hypothesis states that the independent variable
did affect the dependent variable, and the results are significant in terms of supporting
the theory being investigated (i.e. not due to chance).
When you perform a statistical test, a p-value helps you determine the significance of your results
in relation to the null hypothesis. The level of statistical significance is often expressed as a p-
value between 0 and 1. The smaller the p-value, the stronger the evidence that you should reject
the null hypothesis. A p-value less than or equal to 0.05 is conventionally considered statistically
significant. It indicates strong evidence against the null hypothesis: if the null hypothesis were
true, there would be less than a 5% probability of obtaining results at least this extreme by
chance. Therefore, we reject the null hypothesis in favor of the alternative hypothesis. However,
this does not mean that there is a 95% probability that the research hypothesis is true. The
p-value is conditional on the null hypothesis being true and is unrelated to the truth or falsity of
the research hypothesis. A p-value greater than 0.05 is not statistically significant; it means the
data do not provide sufficient evidence against the null hypothesis, so we fail to reject it. You
should note that you cannot accept the null hypothesis; you can only reject it or fail to reject it.
A statistically significant result cannot prove that a research hypothesis is correct (as
this implies 100% certainty). Instead, we may state our results “provide support for” or
“give evidence for” our research hypothesis (as there is still a slight probability that the
results occurred by chance and the null hypothesis was correct – e.g. less than 5%).
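The decision rule described above can be sketched in Python with scipy's one-sample t-test; the sample here is simulated, and the cut-off of 0.05 is the conventional alpha level the notes mention.

```python
# A minimal sketch of the p-value decision rule using a one-sample t-test.
# The sample is simulated; under H0 the population mean is 50.
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(0)
sample = rng.normal(loc=54, scale=5, size=40)  # true mean differs from 50

t_stat, p_value = ttest_1samp(sample, popmean=50)

alpha = 0.05
if p_value <= alpha:
    print(f"p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"p = {p_value:.4f}: fail to reject the null hypothesis")
```

Note that even when we reject, the p-value only tells us how surprising the data would be if the null were true; it is not the probability that either hypothesis is correct.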
What is Skewness?
It is the degree of distortion from the symmetrical bell curve, i.e. the normal distribution.
Skewness measures the lack of symmetry in a data distribution and differentiates extreme
values in one tail versus the other. A symmetrical distribution will have a skewness of 0.
Types of Skewness
Positive skewness means the tail on the right side of the distribution is longer or fatter
than the tail on the left side. The mean and median will be greater than the mode.
Negative skewness means the tail on the left side of the distribution is longer or fatter
than the tail on the right side. The mean and median will be less than the mode.
If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
If the skewness is between -1 and -0.5 or between 0.5 and 1, the data are moderately skewed.
If the skewness is less than -1 (negatively skewed) or greater than 1 (positively skewed),
the data are highly skewed.
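As an illustrative sketch (with simulated data, not anything from the notes), scipy's `skew` function returns values that can be read against the rules of thumb above.

```python
# Measuring skewness with scipy.stats.skew on simulated samples.
import numpy as np
from scipy.stats import skew

rng = np.random.default_rng(1)

symmetric = rng.normal(size=100_000)          # skewness near 0
right_skewed = rng.exponential(size=100_000)  # positive skew: long right tail
left_skewed = -rng.exponential(size=100_000)  # negative skew: long left tail

print(skew(symmetric))     # close to 0: fairly symmetrical
print(skew(right_skewed))  # well above 1: highly positively skewed
print(skew(left_skewed))   # well below -1: highly negatively skewed
```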
Parametric tests: are those that make assumptions about the parameters of the
population distribution from which the sample is drawn. This is often the assumption that
the population data are normally distributed.
Non-parametric tests: are those that make no assumptions about the parameters of the
population distribution; they do not require the data to follow any particular distribution,
and are used when the population parameters are unknown or the data are not normally distributed.
If the data deviate strongly from the assumptions of a parametric procedure, using the
parametric procedure could lead to incorrect conclusions. In such cases, non-
parametric tests are often a good option.
1. Parametric tests: e.g. the independent samples t-test and the paired samples t-test.
2. Non-parametric tests: e.g. the Mann-Whitney U test and the Wilcoxon signed-rank test.
The parametric measure of central tendency is the mean (average); its non-parametric
equivalent is the median.
The parametric measure of variability is the standard deviation, while its non-parametric
equivalent is the quartiles/range.
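The parametric/non-parametric pairing above can be sketched with numpy on a small made-up sample; the single outlier shows why the median and quartiles are the more robust choices.

```python
# Parametric measures (mean, standard deviation) versus their nonparametric
# counterparts (median, interquartile range) on a made-up sample with an outlier.
import numpy as np

data = np.array([2, 3, 3, 4, 5, 5, 6, 40])  # 40 is an outlier

mean = data.mean()                  # parametric centre: pulled toward the outlier
median = np.median(data)            # nonparametric centre: robust to the outlier
std = data.std(ddof=1)              # parametric spread: inflated by the outlier
q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1                       # nonparametric spread (interquartile range)

print(mean, median)  # 8.5 4.5
print(std, iqr)      # large std, small IQR
```

Here the mean (8.5) and standard deviation are dragged far from the bulk of the data by the single extreme value, while the median (4.5) and IQR are barely affected; this is the robustness argument behind the nonparametric equivalents.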
Test of Normality
The main tests for the assessment of normality are the Kolmogorov-Smirnov and
Shapiro-Wilk tests.
Kolmogorov-Smirnov test: This test is used as a goodness-of-fit test and is best suited
when the sample size is large (above 2000).
If p < 0.05, we conclude (at the 95% confidence level) that the variable does not follow
a normal distribution in the population.
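As a sketch, the Kolmogorov-Smirnov test is available in Python as `scipy.stats.kstest`. One caveat worth knowing: it compares the data against a fully specified distribution, so the simulated sample below is drawn from a standard normal to match the default parameters of "norm".

```python
# Kolmogorov-Smirnov goodness-of-fit test against the standard normal
# distribution, on large simulated samples (the setting the notes suggest).
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(2)
normal_sample = rng.normal(size=3000)       # drawn from a standard normal
skewed_sample = rng.exponential(size=3000)  # clearly non-normal

stat_n, p_n = kstest(normal_sample, "norm")
stat_s, p_s = kstest(skewed_sample, "norm")

print(p_n)  # typically large: no evidence against normality
print(p_s)  # far below 0.05: reject normality
```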
The comparison of two independent population means is very common and provides a
way to test the hypothesis that the two groups differ from each other.
The Independent Samples t-Test is a statistical test used to determine if 2 groups are
significantly different from each other on your variable of interest.
When we run the analysis, we get a t-statistic and a p-value. The t-statistic is a measure
of how different the two groups are on the variable of interest. A p-value less than or
equal to 0.05 means that the result is statistically significant and the observed difference
is unlikely to be due to chance alone.
Assumptions
- Continuous
- Normally distributed
- Random sample
Example
Null hypothesis: There is no difference in the number of days to recover between the two
groups.
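A sketch of this recovery example in Python, with made-up group data; scipy's `ttest_ind` stands in for the SPSS procedure the notes assume.

```python
# Independent samples t-test on hypothetical recovery times (days)
# for a treatment group and a control group.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)
treatment = rng.normal(loc=8, scale=2, size=30)  # hypothetical treatment group
control = rng.normal(loc=11, scale=2, size=30)   # hypothetical control group

t_stat, p_value = ttest_ind(treatment, control)

if p_value <= 0.05:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: reject the null hypothesis")
else:
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}: fail to reject the null hypothesis")
```

The sign of the t-statistic indicates direction: here it is negative because the (hypothetical) treatment group recovers in fewer days than the control group.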
Mann-Whitney U Test
The Mann-Whitney U test is the nonparametric equivalent of the independent samples
t-test; it can be used when the data do not meet the t-test's normality assumption.
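As a sketch, the Mann-Whitney U test can be run in Python with `scipy.stats.mannwhitneyu`; the exponential samples below are made up to illustrate data that violate the t-test's normality assumption.

```python
# Mann-Whitney U test on made-up skewed data, where the normality
# assumption of the independent samples t-test does not hold.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(4)
# Exponential data are strongly right-skewed, violating normality.
group_a = rng.exponential(scale=2, size=50)
group_b = rng.exponential(scale=10, size=50)

u_stat, p_value = mannwhitneyu(group_a, group_b)
print(u_stat, p_value)  # small p: the two groups differ
```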
The dependent t-test (called the paired-samples t-test in SPSS Statistics) compares the
means between two related groups on the same continuous, dependent variable.
Assumptions
- There should be no significant outliers in the differences between the two related
groups.
- The differences between the two related groups should be approximately normally
distributed.
Example
Null hypothesis: There is no difference in the smokers’ daily cigarette consumption
before and after the intervention.
The Wilcoxon signed-rank test is the nonparametric test equivalent to the dependent t-
test.
As the Wilcoxon signed-rank test does not assume normality in the data, it can be used
when this assumption has been violated and the use of the dependent t-test is
inappropriate.
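Both the dependent t-test and the Wilcoxon signed-rank test can be sketched in Python on made-up before/after cigarette counts; scipy's `ttest_rel` and `wilcoxon` stand in for the SPSS procedures named above.

```python
# Paired-samples t-test and its nonparametric counterpart, the Wilcoxon
# signed-rank test, on hypothetical before/after daily cigarette counts.
import numpy as np
from scipy.stats import ttest_rel, wilcoxon

rng = np.random.default_rng(5)
before = rng.normal(loc=20, scale=4, size=25)
after = before - rng.normal(loc=5, scale=2, size=25)  # intervention lowers counts

t_stat, p_t = ttest_rel(before, after)  # assumes the differences are normal
w_stat, p_w = wilcoxon(before, after)   # no normality assumption

print(p_t, p_w)  # both small here: reject the null of no difference
```

Both tests operate on the per-subject differences; the Wilcoxon test replaces their magnitudes with ranks, which is why it tolerates non-normal differences.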
Hypothesis
Below are some of the important types of hypothesis
1. Simple Hypothesis
a. A simple hypothesis is one in which there exists a relationship between two
variables: one is the independent variable (cause) and the other is the dependent
variable (effect). For example:
i. Smoking leads to cancer.
ii. A higher rate of unemployment leads to crime.
2. Complex Hypothesis
a. A complex hypothesis is one in which a relationship exists among more than two
variables; that is, there is more than one independent or dependent variable. For
example:
i. Smoking and other drug use lead to cancer, tension, chest infections, etc.
ii. Higher rates of unemployment, poverty, and illiteracy lead to crimes such as
dacoity, robbery, rape, prostitution, and killing.
3. Empirical Hypothesis
a. A working hypothesis is one which is applied to a field. During its formulation
it is only an assumption, but when it is put to a test it becomes an empirical or
working hypothesis.
4. Null Hypothesis
a. The null hypothesis is contrary to the positive statement of a working hypothesis.
According to the null hypothesis, there is no relationship between the dependent
and independent variables. It is denoted by H0.
5. Alternative Hypothesis
a. First, many hypotheses are selected; then, among them, the one that is most
workable and efficient is chosen. That hypothesis is introduced later, owing to
changes in the originally formulated hypothesis. It is denoted by H1.
6. Logical Hypothesis
a. This is the type in which the hypothesis is verified logically. J.S. Mill gave four
canons for such hypotheses, e.g. agreement, disagreement, difference, and residue.
7. Statistical Hypothesis
a. A hypothesis which can be verified statistically is called a statistical hypothesis.
The statement may be logical or illogical, but if statistics can verify it, it is a
statistical hypothesis.
Errors
1. Type I error
The first kind of error is the rejection of a true null hypothesis as the result of a test
procedure. This kind of error is called a type I error and is sometimes called an error of
the first kind.
2. Type II error
The second kind of error is the failure to reject a false null hypothesis as the result of a
test procedure. This sort of error is called a type II error and is also referred to as an error
of the second kind.
In terms of false positives and false negatives, a positive result corresponds to rejecting
the null hypothesis, while a negative result corresponds to failing to reject the null
hypothesis; "false" means the conclusion drawn is incorrect. Thus, a type I error is
equivalent to a false positive, and a type II error is equivalent to a false negative.
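The link between the alpha level and the Type I error rate can be illustrated with a small simulation (a sketch, not from the notes): when the null hypothesis is true, a test at alpha = 0.05 should produce a false positive about 5% of the time.

```python
# Simulating the Type I error rate: both groups are drawn from the same
# population, so H0 is true and every rejection is a false positive.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(6)
alpha = 0.05
n_trials = 2000

false_positives = 0
for _ in range(n_trials):
    a = rng.normal(size=20)
    b = rng.normal(size=20)  # same population as a, so H0 holds
    if ttest_ind(a, b).pvalue <= alpha:
        false_positives += 1

print(false_positives / n_trials)  # close to alpha = 0.05
```

A Type II error rate would require simulating under a true difference instead and counting how often the test fails to reject.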
Normality
Normality tests are used to determine if a data set is well-modeled by a normal distribution
and to compute how likely it is for a random variable underlying the data set to be normally
distributed.
Shapiro-Wilk Test
● The null-hypothesis of this test is that the population is normally distributed.
○ Thus, if the p value is less than the chosen alpha level, then the null hypothesis is
rejected and there is evidence that the data tested are not normally distributed.
● On the other hand, if the p value is greater than the chosen alpha level, then the null
hypothesis that the data came from a normally distributed population can not be rejected
(e.g., for an alpha level of .05, a data set with a p value of less than .05 rejects the null
hypothesis that the data are from a normally distributed population).
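As a sketch, the Shapiro-Wilk test is available in Python as `scipy.stats.shapiro`; the small simulated samples below illustrate both outcomes of the decision rule above.

```python
# Shapiro-Wilk normality test on small simulated samples.
import numpy as np
from scipy.stats import shapiro

rng = np.random.default_rng(7)
normal_sample = rng.normal(size=50)       # drawn from a normal population
skewed_sample = rng.exponential(size=50)  # drawn from a non-normal population

stat_n, p_n = shapiro(normal_sample)
stat_s, p_s = shapiro(skewed_sample)

print(p_n)  # typically above .05: cannot reject normality
print(p_s)  # typically below .05: reject normality
```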
1. Click Analyze > Descriptive Statistics > Explore... on the top menu.
2. In the dialogue box that appears, transfer the variable that needs to be tested for
normality into the Dependent List: box by either drag-and-dropping or using the
arrow button.
3. Click on the Statistics button. You will be presented with the Explore: Statistics
dialogue box.
4. Select Descriptives and click Continue.
5. Now click Plots and select Stem-and-leaf (under Descriptives), Factor levels
together (under Boxplots) and then Normality plots with tests.
6. Click Continue and OK.
● SPSS Output
The output includes a Tests of Normality table (not reproduced here).
We can see from this table that for the "Beginner", "Intermediate" and "Advanced"
Course Group the dependent variable, "Time", was normally distributed. How do we
know this? If the Sig. value of the Shapiro-Wilk Test is greater than 0.05, the data are
normal. If it is below 0.05, the data significantly deviate from a normal distribution.
Note:
● The above table presents the results from two well-known tests of normality,
namely the Kolmogorov-Smirnov Test and the Shapiro-Wilk Test. The Shapiro-
Wilk Test is more appropriate for small sample sizes (< 50 samples), but can also
handle sample sizes as large as 2000. For this reason, we will use the Shapiro-
Wilk test as our numerical means of assessing normality.
● To check for normality, both types of data should be scale.
○ The null hypothesis is that the two population means are equal:
H0: μ1 = μ2
○ In most cases, we are looking to see whether we can reject the null hypothesis in
favor of the alternative hypothesis, which is that the population means are not
equal:
HA: μ1 ≠ μ2
○ To do this, we need to set a significance level (also called alpha) that allows us to
either reject or fail to reject the null hypothesis. Most commonly, this value is set
at 0.05.