AMR Concept Note-1 (Freq Dist, Cross Tab, T-Test and ANOVA)
Concept Note-I
Dr. VIKAS GOYAL
The test statistic measures how close the sample has come to the null hypothesis.
The test statistic often follows a well-known distribution, such as the normal, t, or chi-square distribution.
Type I error occurs when the sample results lead to the rejection of the null hypothesis when it is in fact
true. The probability of type I error (α) is also called the level of significance.
Type II error occurs when, based on the sample results, the null hypothesis is not rejected when it is in fact
false. The probability of type II error is denoted by β.
Cross-tabulation is a statistical technique that describes two or more variables simultaneously and results
in tables that reflect the joint distribution of two or more variables that have a limited number of
categories or distinct values.
A cross-tabulation is the merging of the frequency distribution of two or more variables in a single table.
Cross-tabulation with two variables is also known as Bivariate Cross-tabulation.
Cross-tabulation tables are also called contingency tables.
The chi-square statistic (χ²) is used to test the statistical significance of the observed association in a cross-tabulation.
The chi-square distribution is a skewed distribution whose shape depends solely on the number of
degrees of freedom. As the number of degrees of freedom increases, the chi-square distribution becomes
more symmetrical.
Chi-square requires raw frequencies (counts), not percentages or ratios.
As a rule of thumb, chi-square should not be calculated if the expected frequency in any cell is less than 5.
The phi coefficient (φ) is used as a measure of the strength of association between the two variables, in
the special case of a table with two rows and two columns (a 2 x 2 table).
Cramér's V can be used to measure the strength of association in a cross-tabulation of any size.
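The chi-square, phi, and Cramér's V points above can be illustrated with a minimal sketch. The function name chi_square_summary and the 2 x 2 table of counts are made up for illustration; the code uses the usual formulas (expected count = row total x column total / n, φ = sqrt(χ²/n) for a 2 x 2 table, V = sqrt(χ² / (n x (k - 1))) where k is the smaller of the number of rows and columns).

```python
import math

def chi_square_summary(table):
    """Chi-square, degrees of freedom, phi (2 x 2 only) and Cramér's V for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    min_expected = float("inf")
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the row and column variables
            expected = row_totals[i] * col_totals[j] / n
            min_expected = min(min_expected, expected)
            chi2 += (observed - expected) ** 2 / expected

    n_rows, n_cols = len(table), len(table[0])
    df = (n_rows - 1) * (n_cols - 1)
    k = min(n_rows, n_cols)
    cramers_v = math.sqrt(chi2 / (n * (k - 1)))
    # Phi is defined here only for the 2 x 2 case, where it equals Cramér's V
    phi = math.sqrt(chi2 / n) if (n_rows, n_cols) == (2, 2) else None
    return {"chi2": chi2, "df": df, "phi": phi,
            "cramers_v": cramers_v, "min_expected": min_expected}

# Hypothetical counts (raw frequencies, not percentages)
observed = [[30, 20],
            [10, 40]]
result = chi_square_summary(observed)
print(result)
```

The min_expected entry is returned so that the rule of thumb about small expected frequencies can be checked before the statistic is interpreted.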
t tests are conducted to examine hypotheses about means.
A t test can be conducted on the mean of one or two samples of observations.
t tests are used to make inferences about the means of parent populations.
Parametric tests are hypothesis-testing procedures that assume that the variables of interest are
measured on at least an interval scale.
Nonparametric tests are hypothesis-testing procedures that assume that the variables are measured on a
nominal or ordinal scale.
A single sample t-test is performed when we wish to test a hypothesis about the mean of a variable
against a fixed value (say, that mean height is greater than 4).
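As a rough sketch of the single sample case, the t statistic can be computed as (sample mean - hypothesised mean) / (s / sqrt(n)) with n - 1 degrees of freedom. The function name one_sample_t and the height values are made up for illustration; the hypothesised value 4 is the one used in the example above.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for H0: population mean = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

heights = [4.2, 4.5, 3.9, 4.8, 4.6]        # hypothetical observations
t_stat, df = one_sample_t(heights, mu0=4)
print(f"t = {t_stat:.3f}, df = {df}")
```

The resulting t value would then be compared with the critical value of the t distribution for the chosen level of significance.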
For class circulation only
Two-sample t-tests are used to test the difference in means for independent samples, paired
samples, or overlapping samples.
Two independent sample t-tests allow researchers to evaluate the mean difference between two
populations using the data from these two separate samples.
The two independent samples t-test is used when two separate sets of independent and identically
distributed samples are obtained, one from each of the two populations being compared.
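A minimal sketch of the two independent samples t statistic follows. It assumes equal population variances (the pooled-variance form), which is an assumption made here and not stated in the note; the function name and the two groups of scores are hypothetical.

```python
import math
import statistics

def independent_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)   # sample variances (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    df = n_a + n_b - 2
    # Pooled variance: weighted average of the two sample variances
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / standard_error
    return t, df

group_a = [12, 15, 11, 14]   # hypothetical scores from population A
group_b = [9, 10, 8, 13]     # hypothetical scores from population B
t_ind, df_ind = independent_t(group_a, group_b)
print(f"t = {t_ind:.3f}, df = {df_ind}")
```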
The paired samples t-test is used when the hypothesis compares the means of two different variables
for a single population, without identifying separate groups. Thus, it is called the paired or related samples t-test. It characteristically comprises a sample of matched pairs, or one group of units that has been tested
twice.
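The paired case reduces to a single sample t-test on the within-pair differences, tested against zero. The sketch below assumes one group of units measured twice; the function name and the before/after scores are made up for illustration.

```python
import math
import statistics

def paired_t(first, second):
    """t statistic and degrees of freedom for the mean of the paired differences."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)       # n - 1 denominator
    return mean_diff / (sd_diff / math.sqrt(n)), n - 1

before = [10, 12, 9, 11]    # hypothetical first measurement
after = [12, 14, 10, 14]    # hypothetical second measurement on the same units
t_paired, df_paired = paired_t(before, after)
print(f"t = {t_paired:.3f}, df = {df_paired}")
```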
The Kolmogorov-Smirnov (K-S) one-sample test is a nonparametric goodness-of-fit test that
compares the cumulative distribution function for a variable with a specified distribution.
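To make the comparison of cumulative distribution functions concrete, the K-S statistic D can be sketched as the largest vertical gap between the empirical CDF of the sample and the specified CDF. The Uniform(0, 1) distribution and the three observations below are assumptions chosen only to keep the arithmetic easy to follow.

```python
def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of `sample` and the specified `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x, so check both sides
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def uniform_cdf(x):
    """CDF of Uniform(0, 1), used here as the hypothesised distribution."""
    return min(max(x, 0.0), 1.0)

sample = [0.1, 0.4, 0.7]    # hypothetical observations
d_stat = ks_statistic(sample, uniform_cdf)
print(f"D = {d_stat:.3f}")
```

A large D suggests the sample does not follow the specified distribution; the decision would be made against a critical value for the sample size.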
The Mann-Whitney U test is a statistical test for a variable measured on an ordinal scale, comparing the
difference in the location of two populations based on observations from two independent samples.
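One simple way to obtain the U statistic, sketched here under the assumption of small samples, is to count for every pair of observations (one from each sample) how often the first sample's value is larger, with ties counted as one half. The function name and data are hypothetical.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b), the Mann-Whitney U statistics for the two samples."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0      # a outranks b
            elif a == b:
                u_a += 0.5      # ties are split evenly
    # The two U values always sum to the number of pairs
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b

ratings_a = [12, 15, 11, 14]   # hypothetical ordinal scores, sample A
ratings_b = [9, 10, 8, 13]     # hypothetical ordinal scores, sample B
u_a, u_b = mann_whitney_u(ratings_a, ratings_b)
print(f"U_a = {u_a}, U_b = {u_b}, U = {min(u_a, u_b)}")
```

The smaller of the two values is the one usually compared with tabulated critical values.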
The Wilcoxon signed-rank test can be used as the nonparametric equivalent of the paired samples t-test.
Analysis of variance (ANOVA) and analysis of covariance (ANCOVA) are used to examine differences
in the mean values of the dependent variable associated with the effect of independent variables,
typically ones with more than two categories, levels, or treatments. For ANOVA, the dependent variable
must be metric and the independent variables must be categorical.
Analysis of variance (ANOVA) is a statistical technique for examining the differences among means for two
or more populations.
Treatment in ANOVA is a particular combination of factor levels or categories.
One-way ANOVA is a technique in which there is only one factor or independent variable.
SSy is the total variation in Y, i.e. the total sum of squares. It is the sum of SSbetween and
SSwithin.
SSbetween, also denoted SSx, is the variation in Y related to the variation in the mean level of the different
categories of X. It represents the variation between the categories of X, or the portion of the sum of
squares in Y related to X.
SSwithin, also referred to as SSerror, is the variation in Y due to the variation within each of the categories of
X.
The strength of the effect of an individual X (independent variable or factor) on Y (dependent variable) is
measured by eta-squared (η²). The value of η² varies between 0 and 1.
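The decomposition of the total sum of squares and the eta-squared measure can be sketched for the one-way case as follows. The function name and the three groups of observations are made up for illustration; eta-squared is computed as SSbetween / SSy, and the F ratio as the between-groups mean square divided by the within-groups mean square.

```python
def one_way_anova(groups):
    """Sum-of-squares decomposition, F ratio and eta-squared for one factor."""
    all_values = [y for group in groups for y in group]
    n = len(all_values)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # SSy: total variation of Y around the grand mean
    ss_total = sum((y - grand_mean) ** 2 for y in all_values)
    # SSbetween (SSx): variation of the category means around the grand mean
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # SSwithin (SSerror): variation of Y around its own category mean
    ss_within = sum((y - mean) ** 2
                    for group, mean in zip(groups, group_means) for y in group)

    df_between = len(groups) - 1
    df_within = n - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return {"ss_total": ss_total, "ss_between": ss_between, "ss_within": ss_within,
            "f_ratio": f_ratio, "eta_squared": ss_between / ss_total}

# Hypothetical Y values for three categories of X
anova = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(anova)
```

With these values SSbetween and SSwithin add up to SSy, and eta-squared is the share of SSy attributable to X.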
N-way ANOVA is a model where two or more factors are involved.
Strength of relationship between individual factors and the dependent variable can be measured by using
omega-squared.
Analysis of covariance (ANCOVA) is an advanced analysis of variance procedure in which the effects of one
or more metric-scaled independent variables are removed from the dependent variable before conducting
the ANOVA. In ANCOVA, metric independent variables are treated as covariates.
Covariates are generally used as control variables in ANCOVA.
Multivariate analysis of variance (MANOVA) is similar to analysis of variance (ANOVA), except that
instead of one metric dependent variable, we have two or more.
A Classification of Hypothesis Testing Procedures for Examining Differences: