1. Frequency distribution:
When data are grouped according to magnitude, the resulting series is called a frequency
distribution. For example, in the following list of numbers, the frequency of the number 9 is 5
(because it occurs five times):
1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9
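As a sketch (not part of the original notes), the frequencies in the list above can be tallied with Python's `collections.Counter`:

```python
# Tally a frequency distribution with collections.Counter.
# The data are the example values from the text above.
from collections import Counter

data = [1, 2, 3, 4, 6, 9, 9, 8, 5, 1, 1, 9, 9, 0, 6, 9]
freq = Counter(data)

print(freq[9])  # frequency of 9 -> 5
print(freq[1])  # frequency of 1 -> 3
```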
2. Descriptive Statistics:
It aims to describe the main features of the data involved in a study. Its main purpose is to
provide a description of the samples and the measurements made in a particular study through
numerical calculations, graphs, or tables.
3. Variable View:
It contains descriptions of the attributes of each variable in the data file. In Variable View, each
row represents a variable and each column represents a variable attribute.
4. Data View:
It displays the actual data values. In Data View, each row represents a case (observation) and
each column represents a variable.
5. Standard Deviation:
It is a number used to tell how far the measurements for a group are spread out from the average
(mean), or expected value.
• Mean: It is the average of the numbers, a calculated "central" value of a set of numbers.
• Median: It is the middle value in the list of numbers. Numbers have to be listed in
numerical order from smallest to largest.
• Mode: It is the value that occurs most often. If no number is repeated in the list, there is
no mode for the list.
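The measures above can be illustrated with Python's standard-library `statistics` module (a sketch, not from the original notes; the example data are my own):

```python
# Central tendency and spread with the standard library.
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))    # 5.0 (sum 40 divided by 8 values)
print(statistics.median(data))  # 4.5 (average of the middle pair 4 and 5)
print(statistics.mode(data))    # 4   (occurs three times)
print(statistics.pstdev(data))  # 2.0 (population standard deviation)
```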
7. Correlation:
Correlation is a statistical measure that indicates the extent to which two or more variables
fluctuate together. A positive correlation indicates the extent to which those variables increase or
decrease in parallel; a negative correlation indicates the extent to which one variable increases as
the other decreases.
8. Pearson Correlation:
A Pearson correlation, also known as the “product moment correlation coefficient” (PMCC) or
simply “correlation”, is a number between -1 and 1 that indicates the extent to which two
variables are linearly related. It is suitable only for metric variables (which include
dichotomous variables).
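As an illustrative sketch (the helper name is my own), the Pearson coefficient can be computed directly from its definitional formula, covariance divided by the product of the standard deviations:

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation from its definitional formula."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # ~ 1.0 (perfect positive linear relation)
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # ~ -1.0 (perfect negative linear relation)
```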
9. Graph:
10. Regression:
A measure of the relation between the mean value of one variable and corresponding values of
other variables.
In simple linear regression, a single independent variable is used to predict the value of a
dependent variable. In multiple linear regression, two or more independent variables are used to
predict the value of a dependent variable. The difference between the two is the number of
independent variables.
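For simple linear regression, the least-squares slope and intercept have a closed form; a minimal sketch (function name and data are illustrative, not from the text):

```python
def linear_regression(x, y):
    """Least-squares slope and intercept for simple linear regression."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # slope = covariance of x and y divided by variance of x
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = linear_regression([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# fitted line: y = 0.6 * x + 2.2
```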
12. T-test
A t-test’s statistical significance indicates whether the difference between two groups’ averages
most likely reflects a “real” difference in the population from which the groups were sampled.
• Paired Sample T-test: It compares means from the same group at different times.
• One Sample T-test: It tests the mean of a single group against a known mean.
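The one-sample t statistic can be sketched from its formula, the difference between the sample mean and the known mean divided by the standard error (helper name and data are my own):

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t statistic for testing a sample mean against a known mean mu0."""
    n = len(sample)
    # standard error = sample standard deviation / sqrt(n)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se

# Test whether these measurements differ from a known mean of 5.0.
t = one_sample_t([5.1, 4.9, 5.3, 5.2, 4.8, 5.0], 5.0)
```

A small |t| (here about 0.65) suggests the sample mean is consistent with the known mean.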
14. Z-test
A Z-test is a type of hypothesis test, used to determine whether a sample mean differs
significantly from a known population mean when the population standard deviation is known
and the sample size is large. Hypothesis testing is a way to determine whether a result is
statistically meaningful or likely to have occurred by chance.
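A minimal sketch of the z statistic (names and numbers are illustrative): the difference between the sample mean and the population mean, divided by the standard error computed from the known population standard deviation:

```python
import math

def z_statistic(sample_mean, pop_mean, pop_sd, n):
    """z statistic when the population standard deviation is known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

# Sample of 36 with mean 105, against a population with mean 100 and SD 15.
z = z_statistic(105, 100, 15, 36)
print(z)  # 2.0
```

A z of 2.0 exceeds the conventional 1.96 cutoff, so the difference would be significant at the 5% level.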
16. ANOVA
ANOVA is an analysis of the variation present in an experiment. It is used for examining the
differences in the mean values of the dependent variable associated with the effect of
independent variables. Essentially, ANOVA is used as a test of means for two or more
populations. The tests in an ANOVA are based on the F-ratio: the variation due to an
experimental treatment or effect divided by the variation due to experimental error.
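The F-ratio described above can be sketched directly (helper name and data are my own): between-group mean square divided by within-group mean square:

```python
def one_way_anova_f(groups):
    """F ratio: between-group variance over within-group variance."""
    all_data = [x for g in groups for x in g]
    grand_mean = sum(all_data) / len(all_data)
    k = len(groups)          # number of groups
    n = len(all_data)        # total observations
    # variation of group means around the grand mean
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # variation of observations around their own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

f = one_way_anova_f([[1, 2, 3], [2, 3, 4], [4, 5, 6]])
# F = 7.0 for these three groups
```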
17. One-Way ANOVA
The One-Way ANOVA ("analysis of variance") compares the means of two or more independent
groups in order to determine whether there is statistical evidence that the associated population
means are significantly different. One-Way ANOVA is a parametric test. This test is also known
as:
• One-Factor ANOVA
• One-Way Analysis of Variance
18. Two-Way ANOVA
It compares the mean difference between groups that have been split on two independent
variables (called factors), and its main purpose is to understand whether there is an interaction
between the two independent variables on the dependent variable.
19. Kruskal-Wallis Test
Kruskal-Wallis compares the medians of two or more samples to determine if the samples have
come from different populations. It is an extension of the Mann–Whitney U test to 3 or more
groups. The distributions do not have to be normal and the variances do not have to be equal.
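The H statistic behind the test can be sketched from the ranks of the pooled data (helper name and data are my own; for simplicity this version assumes no tied values, so no tie correction is applied):

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = sorted(x for g in groups for x in g)
    rank = {v: i + 1 for i, v in enumerate(pooled)}  # assumes no tied values
    n = len(pooled)
    # sum over groups of (rank sum)^2 / group size
    s = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * s - 3 * (n + 1)

h = kruskal_wallis_h([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
# H = 0.8: rank sums 12, 15, 18 are close, so little evidence of a difference
```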
20. Welch’s F-test
Welch’s F-test (Field 2009) is designed to test the equality of group means when there are more
than two groups to compare, especially when the homogeneity of variance assumption is not met
and sample sizes are small.
21. Friedman Test
The Friedman test is used to detect differences in scores across multiple occasions or conditions.
The scores for each subject are ranked and then the sums of the ranks for each condition are used
to calculate a test statistic. The Friedman test can also be used when subjects have ranked a list
e.g. rank these pictures in order of preference.
22. Chi-Square Test
Test to determine whether the proportions of two or more categories differ from the expected
proportions.
23. Mann-Whitney U Test
The Mann-Whitney U test is a non-parametric test that can be used in place of an unpaired t-test.
It is used to test the null hypothesis that two samples come from the same population (i.e. have
the same median) or, alternatively, whether observations in one sample tend to be larger than
observations in the other. Although it is a non-parametric test, it does assume that the two
distributions are similar in shape.
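The U statistic can be sketched as a count of pairwise "wins" of one sample over the other, with ties counting one half (helper name and data are my own):

```python
def mann_whitney_u(a, b):
    """U statistic: pairwise wins of sample a over sample b (ties count 0.5)."""
    return sum((x > y) + 0.5 * (x == y) for x in a for y in b)

u1 = mann_whitney_u([1, 2, 3], [4, 5, 6])  # every a-value loses -> U = 0.0
u2 = mann_whitney_u([4, 5, 6], [1, 2, 3])  # every a-value wins  -> U = 9.0
```

Note that the two U values always sum to n1 * n2 (here 3 * 3 = 9).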
24. Phi Coefficient
A Phi coefficient is a non-parametric test of relationships that operates on two dichotomous (or
dichotomized) variables. The phi (rhymes with fee) correlation gives an estimate of the degree of
relationship between two dichotomous variables. The value of the phi φ correlation coefficient is
interpreted just like the Pearson r, that is, it can vary from -1.00 to +1.00.
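For a 2x2 contingency table with cell counts a, b, c, d, phi has a simple closed form; a sketch (helper name and counts are illustrative):

```python
import math

def phi_coefficient(a, b, c, d):
    """Phi coefficient from a 2x2 contingency table [[a, b], [c, d]]."""
    num = a * d - b * c
    den = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return num / den

print(phi_coefficient(10, 0, 0, 10))  # 1.0  (all counts on the main diagonal)
print(phi_coefficient(0, 10, 10, 0))  # -1.0 (all counts on the off-diagonal)
```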
26. Skewness
It is a measure of the asymmetry of a distribution about its mean; a distribution is skewed when
one tail is longer than the other.
27. Kurtosis
Kurtosis is all about the tails of the distribution, not the peakedness or flatness. It is used to
describe the extreme values in one tail versus the other. It is, in effect, a measure of the
outliers present in the distribution.
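Excess kurtosis can be sketched as the fourth standardized moment minus 3, so a normal distribution scores 0; positive values indicate heavier tails, negative values lighter tails (helper name and data are my own):

```python
def excess_kurtosis(data):
    """Population excess kurtosis: fourth standardized moment minus 3."""
    n = len(data)
    m = sum(data) / n
    var = sum((x - m) ** 2 for x in data) / n
    m4 = sum((x - m) ** 4 for x in data) / n
    return m4 / var ** 2 - 3

print(excess_kurtosis([1, 1, 1, 1, 10]))  # positive: the outlier 10 makes a heavy tail
print(excess_kurtosis([1, 2, 3, 4, 5]))   # negative: flat data, lighter tails than normal
```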
28. Inferential Statistics:
Inferential statistics use a random sample to draw conclusions about the population. Typically, it
is not practical to obtain data from every member of a population. Instead, we collect a random
sample from a small proportion of the population.
29. Outliers:
An outlier is an observation that lies an abnormal distance from the other values in a sample.
30. Spearman Correlation:
Also called Spearman’s rho, the Spearman correlation evaluates the monotonic relationship
between two continuous or ordinal variables. In a monotonic relationship, the variables tend to
change together, but not necessarily at a constant rate. The Spearman correlation coefficient is
based on the ranked values for each variable rather than the raw data.
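With untied ranks, Spearman's rho can be sketched via the shortcut formula 1 - 6*sum(d^2)/(n*(n^2-1)), where d is the difference between the two ranks of each observation (helper names and data are my own):

```python
def spearman_rho(x, y):
    """Spearman's rho from rank differences (assumes no tied values)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0] * len(v)
        for rank_pos, i in enumerate(order, start=1):
            r[i] = rank_pos
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rho([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
print(spearman_rho([1, 2, 3, 4], [1, 8, 27, 64]))    # 1.0 (monotonic but not linear)
```

The second example shows why Spearman differs from Pearson: the relationship is monotonic but not at a constant rate, yet rho is still a perfect 1.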
31. Range (Statistics):
In statistics, range is defined simply as the difference between the maximum and minimum
observations.
32. Median test:
The median test is a non-parametric test that assesses whether two or more groups have the same
median.
33. Normal Distribution:
Normal Distribution is uniquely defined by its mean and standard deviation. It is symmetrical
about the mean and may be represented graphically as a bell shaped curve, known as the Normal
curve. The area under the curve = 1. Most of the area under the curve is within ± one SD of the
mean, the large majority (95%) is within ± 1.96 SD (often written as 2 SD for short) of the mean,
almost all is within ± 3 SD of the mean.
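The areas quoted above can be checked with the standard normal CDF, which the standard library exposes through `math.erf` (helper name is my own):

```python
import math

def normal_area_within(z):
    """Area under the standard normal curve within +/- z standard deviations."""
    return math.erf(z / math.sqrt(2))

print(round(normal_area_within(1.0), 3))   # 0.683 -> about 68% within 1 SD
print(round(normal_area_within(1.96), 3))  # 0.95  -> 95% within 1.96 SD
print(round(normal_area_within(3.0), 4))   # 0.9973 -> almost all within 3 SD
```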
35. Non-Parametric Test:
A non-parametric test is one that makes no assumptions about the parameters of the population
distribution from which the data are drawn. In this strict sense, "non-parametric" is essentially a
null category, since virtually all statistical tests assume one thing or another about the properties
of the source population.