What is this test (one-way ANOVA) for?
An analysis of variance (ANOVA) is used to compare
the means of two or more independent samples
and to test whether the differences between the
means are statistically significant.

The one-way analysis of variance (one-way ANOVA)
can be thought of as an extension of a t-test
for independent samples. It is used when there
are two or more independent groups.
Poll
ANOVA tests use which of the following distributions?
a) F
b) T
c) Z
d) None
Poll
Analysis of variance is a statistical method of comparing the ____ of
several populations.
a) Means
b) Standard Deviations
c) None Of The Above
one-way ANOVA
Note:
• The independent variable is the categorical
variable that defines the groups that are
compared, e.g., instructional methods, grade
level, or marital status.
• The dependent variable is the measured variable
whose means are being compared, e.g., level
of job satisfaction, or test anxiety.
POLLS
Which of the following statistical concepts is used to
test differences in the means for more than two
independent populations?
a) Analysis of variance
b) Regression analysis
c) None
What does this test do?
• The one-way ANOVA compares the means between
the groups you are interested in and determines
whether any of those means are significantly
different from each other. Specifically, it tests the null
hypothesis:

H0: µ1 = µ2 = µ3 = ... = µk

• where µ = group mean and k = number of groups. If,
however, the one-way ANOVA returns a significant
result, we accept the alternative hypothesis (HA),
which is that there are at least 2 group means that
are significantly different from each other.
Poll
Express your opinion about the statements given
below:-
A- one-way ANOVA compares the means between the
groups you are interested in
B- and determines whether any of those means are
significantly different from each other.
Ans:
a) Statement A is correct
b) Statement B is correct
c) Both statements A & B are correct
One-way ANOVA uses
Example:
A one-way ANOVA is used to understand whether exam
performance differed based on test anxiety levels amongst
students, dividing students into three independent groups
(e.g., low, medium and high-stressed students).

The one-way ANOVA cannot tell you which specific groups
differed; it only tells you that at least two groups were different.

Since you may have three, four, five or more groups in your
study design, determining which of these groups differ
from each other is important. You can do this using a
post-hoc test.
One-way ANOVA Assumptions
1: Your dependent variable should be measured at
the interval or ratio level (i.e., it is
continuous).

Examples of variables that meet this criterion
include revision time (measured in hours),
intelligence (measured using IQ score),
exam performance (measured from 0 to 100),
weight (measured in kg).
One-way ANOVA Assumptions
2: Your independent variable should consist of two or more categorical,
independent groups. Typically, a one-way ANOVA is used when you
have three or more categorical, independent groups. It can be
used for just two groups, although an independent-samples t-test is
more commonly used in that case.
Example independent variables that meet this criterion
include ethnicity (e.g., 3 groups: Caucasian, African American
and Hispanic), physical activity level (e.g., 4 groups:
sedentary, low, moderate and high), profession (e.g., 5
groups: surgeon, doctor, nurse, dentist, therapist), and so
forth.
One-way ANOVA Assumptions
3: You should have independence of observations,
which means that there is no relationship between
the observations in each group or between the
groups themselves.

This is an important assumption of the one-way
ANOVA. If your study fails this assumption, you
will need to use another statistical test instead of the
one-way ANOVA (e.g., one suited to a repeated measures design).
One-way ANOVA Assumptions
4: There should be no significant outliers. Outliers are
simply single data points within your data that do not
follow the usual pattern.
Example: in a study of 100 students' IQ scores, where
the mean score was 108 with only a small variation
between students, one student had a score of 156,
which is very unusual.
The problem with outliers is that they can have a
negative effect on the one-way ANOVA, reducing the
validity of your results.
One-way ANOVA Assumptions
5: Your dependent variable should be
approximately normally distributed for each
category of the independent variable.
One-way ANOVA only requires approximately
normal data because it is quite "robust" to
violations of normality, meaning that the assumption
can be violated a little and the test will still provide
valid results. You can test for normality using the
Shapiro-Wilk test of normality.
One-way ANOVA Assumptions
6: There needs to be homogeneity of variances.
You can test this assumption in SPSS Statistics
using Levene's test for homogeneity of variances.

If your data fails this assumption, you will need
not only to carry out a Welch ANOVA instead of a
one-way ANOVA, which you can do using SPSS
Statistics, but also to use a different post-hoc test.
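The idea behind Levene's test can be illustrated outside SPSS. The sketch below is a minimal pure-Python version, assuming the mean-centred form of the statistic: each observation is replaced by its absolute deviation from its group mean, and a one-way ANOVA F ratio is computed on those deviations. The group data and the function name levene_statistic are hypothetical, chosen only for illustration. The value 3.89 is the tabulated 5% F limit for (2, 12) degrees of freedom.

```python
def levene_statistic(groups):
    """Levene's W (mean-centred form): the one-way ANOVA F ratio
    computed on absolute deviations from each group's mean."""
    # Z_ij = |Y_ij - mean of group i|
    z = []
    for g in groups:
        m = sum(g) / len(g)
        z.append([abs(y - m) for y in g])

    k = len(z)                                  # number of groups
    n_total = sum(len(g) for g in z)            # total observations
    z_means = [sum(g) / len(g) for g in z]      # mean deviation per group
    z_grand = sum(sum(g) for g in z) / n_total  # overall mean deviation

    between = sum(len(g) * (m - z_grand) ** 2 for g, m in zip(z, z_means))
    within = sum((v - m) ** 2 for g, m in zip(z, z_means) for v in g)
    return (between / (k - 1)) / (within / (n_total - k))


# Hypothetical data: three groups of five with visibly different spreads.
groups = [
    [48, 50, 52, 49, 51],
    [40, 50, 60, 45, 55],
    [44, 50, 56, 47, 53],
]

w = levene_statistic(groups)
F_CRIT_5PCT = 3.89  # tabulated F limit for (2, 12) df at the 5% level
variances_look_equal = w < F_CRIT_5PCT
print(f"W = {w:.4f}, homogeneity not rejected at 5%: {variances_look_equal}")
```

If W exceeds the critical value, homogeneity of variances is rejected and the Welch ANOVA route described above applies.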
POLLS
Q-Which of the following is an assumption of one-way ANOVA
comparing samples from three or more experimental
treatments?
a) The samples associated with each population are randomly
selected and are independent from all other samples
b) The response variable within each of the k populations
has equal variance
c) The response variable within each of the k populations follows a
normal distribution
d) All of the above
POLLS
Which of the following assumptions must be met to use an
ANOVA?
a) The dependent variable must be interval or ratio
b) There is homogeneity of variance
c) The data must be normally distributed
d) Random sampling of cases must have taken place
e) All of these
Example:
In a company there are four shop floors. Productivity
rate for three methods of incentives and gain sharing
in each shop floor is presented in the following table.
Analyse whether various methods of incentives and
gain sharing differ significantly at 5% and 1% F-limits.
Analysis of Variance (ANOVA): Another Example Problem
15 Students undergoing training are randomly assigned
to three different types of instruction modules. At the
end of training period, their test scores are as follows:-
• Analysis of Variance
• Solution
• Generating hypothesis
• Mean of Each Sample
• Mean of Sample Means
With these calculations, we are ready to set up the
ANOVA table.
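The steps listed in the solution (sample means, mean of sample means, sums of squares, F ratio) can be sketched as follows. The scores are hypothetical, since the original data table is not reproduced here. Only the layout, 15 students in three groups of five, follows the example. The values 3.89 and 6.93 are the tabulated 5% and 1% F limits for (2, 12) degrees of freedom.

```python
# Hypothetical test scores for the three instruction modules
# (illustrative values only, not the data from the original table).
scores = {
    "module_1": [80, 75, 85, 70, 90],
    "module_2": [70, 65, 75, 60, 80],
    "module_3": [60, 55, 65, 50, 70],
}

k = len(scores)                                   # number of groups
n_total = sum(len(v) for v in scores.values())    # total observations

# Mean of each sample and the grand mean (with equal group sizes this
# equals the mean of the sample means).
group_means = {name: sum(v) / len(v) for name, v in scores.items()}
grand_mean = sum(sum(v) for v in scores.values()) / n_total

# Sum of squares between groups and within groups.
ss_between = sum(
    len(v) * (group_means[name] - grand_mean) ** 2
    for name, v in scores.items()
)
ss_within = sum(
    (x - group_means[name]) ** 2
    for name, v in scores.items()
    for x in v
)

df_between, df_within = k - 1, n_total - k
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within

# ANOVA table: source, sum of squares, degrees of freedom, mean square, F
print(f"{'Source':<10}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
print(f"{'Between':<10}{ss_between:>10.2f}{df_between:>5}{ms_between:>10.2f}{f_ratio:>8.2f}")
print(f"{'Within':<10}{ss_within:>10.2f}{df_within:>5}{ms_within:>10.2f}")
print(f"{'Total':<10}{ss_between + ss_within:>10.2f}{n_total - 1:>5}")

F_CRIT_5PCT, F_CRIT_1PCT = 3.89, 6.93  # tabulated limits for (2, 12) df
print("Reject H0 at 5%:", f_ratio > F_CRIT_5PCT)
print("Reject H0 at 1%:", f_ratio > F_CRIT_1PCT)
```

The null hypothesis of equal group means is rejected when the computed F ratio exceeds the tabulated limit at the chosen significance level.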
SIMPLE REGRESSION
• Regression analysis is the process of developing a
statistical model, which is used to predict the value of a
dependent variable from at least one independent
variable.
• In a simple regression analysis, there are two types of
variables:
– Dependent variable: The variable whose value is
influenced or is to be predicted. It is also called the regressed or
explained variable.
– Independent variable: The variable which influences the
value or is used for prediction. It is also called the regressor,
predictor or explanatory variable.
Simple linear regression analysis is focused on developing a
regression model by which the value of the dependent
variable can be predicted with the help of the independent
variable.
Exercise
POLLS
The process of constructing a mathematical
model or function that can be used to predict or
determine one variable by another variable is
called
a) regression
b) correlation
c) None

POLLS
B0+ B1 is the linear component
a) True
b) False
BASICS
POLLS
Q-Intercept in regression means the
expected mean value of Y when all X=0
a) True
b) False

Q-The slope of a regression line (b) represents
the rate of change in y as x changes.
a) True
b) False
• From the following data
REGRESSION
• A random sample of eight drivers insured with a company
and having similar auto insurance policies was selected. The
following table lists their driving experience (in years) and
monthly auto insurance premiums.

Predict the monthly auto insurance premium for a driver with 10
years of driving experience.
Solution
• b = -1.5476
• a = 76.6605
• Regression equation:
• Y = 76.6605 - 1.5476X
• When X = 10, then
• Y = 76.6605 - 1.5476 × 10 = 61.18
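The least-squares calculation behind a and b can be sketched in a few lines. The eight (experience, premium) pairs below are an assumption: the original table is not reproduced here, and these values were chosen because they give coefficients matching the stated a and b to about three decimal places. Evaluating the fitted line at X = 10 gives a premium of about 61.18.

```python
# ASSUMED data: illustrative pairs consistent with the stated coefficients,
# not taken from the original table.
experience = [5, 2, 12, 9, 15, 6, 25, 16]   # years of driving experience (X)
premium = [64, 87, 50, 71, 44, 56, 42, 60]  # monthly premium (Y)

n = len(experience)
mean_x = sum(experience) / n
mean_y = sum(premium) / n

# Sums of cross-products and squares of deviations from the means.
ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(experience, premium))
ss_xx = sum((x - mean_x) ** 2 for x in experience)

b = ss_xy / ss_xx          # slope: change in Y per unit change in X
a = mean_y - b * mean_x    # intercept: expected Y when X = 0


def predict(x):
    """Predicted Y on the fitted line Y = a + bX."""
    return a + b * x


print(f"b = {b:.4f}, a = {a:.4f}")
print(f"Predicted premium at X = 10: {predict(10):.2f}")
```

The negative slope means the predicted premium falls as driving experience increases.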
Difference between Correlation and Regression
• After having an understanding about the concept and
application of simple correlation and simple regression,
we can draw the difference between them. They are:
• 1) The correlation coefficient r between two variables (X
and Y) is a measure of the direction and degree of the
linear relationship between them, which is mutual. It is
symmetric (i.e., rxy = ryx) and it is immaterial
which of X and Y is the dependent variable and which is
the independent variable. Regression analysis, in contrast,
aims at establishing the functional relationship
between the two variables under study, and then using
this relationship to predict the value of the dependent
variable for any given value of the independent
variable. It also reflects upon the nature of the variables
(i.e., which is the dependent variable and which is the
independent variable). Regression coefficients, therefore,
are not symmetric in X and Y (i.e., byx ≠ bxy).
• 2) Correlation need not imply a cause and effect
relationship between the variables under study.
Regression analysis assumes a direction of dependence:
the variable treated as the cause is taken as the
independent variable and the variable treated as the
effect is taken as the dependent variable. A fitted
regression does not by itself prove causation.
• 3) The correlation coefficient r is a relative measure of the
linear relationship between the X and Y variables and is
independent of the units of measurement.
• It is a number lying between -1 and +1, whereas the
regression coefficient byx (or bxy) is an absolute
measure representing the change in the value of the
variable Y (or X) for a unit change in the value of the
variable X (or Y).
• Once the functional form of the regression curve is
known, by substituting the value of the independent
variable we can obtain the value of the dependent
variable, which will be in the unit of measurement of
the dependent variable.
• 4) There may be spurious (nonsense) correlation
between two variables which is due to pure chance
and has no practical relevance. For example, the
correlation between the size of shoe and the
income of a group of individuals. In this elementary
treatment no counterpart "spurious regression" is
considered, because regression is fitted only where one
variable plausibly depends on the other.
• 5) Correlation analysis is confined to the study of the
linear relationship between the variables and,
therefore, has limited applications, whereas
regression analysis has much wider applications as
it studies linear as well as non-linear relationships
between the variables.
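The symmetry and unit-dependence points above can be checked numerically. The following is a minimal sketch on hypothetical data. The helper names sum_cross_dev, pearson_r and slope are illustrative and do not come from any library.

```python
import math

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def sum_cross_dev(u, v):
    """Sum of products of deviations from the means of u and v."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def pearson_r(u, v):
    """Correlation coefficient r between u and v."""
    return sum_cross_dev(u, v) / math.sqrt(sum_cross_dev(u, u) * sum_cross_dev(v, v))


def slope(dep, indep):
    """Regression coefficient of dep on indep."""
    return sum_cross_dev(dep, indep) / sum_cross_dev(indep, indep)


r_xy, r_yx = pearson_r(x, y), pearson_r(y, x)   # symmetric
b_yx, b_xy = slope(y, x), slope(x, y)           # not symmetric

# Changing the unit of Y (multiplying by 100) leaves r unchanged
# but rescales the regression coefficient of Y on X.
y_scaled = [100 * v for v in y]
r_scaled = pearson_r(x, y_scaled)
b_scaled = slope(y_scaled, x)

print(f"rxy = {r_xy:.4f}, ryx = {r_yx:.4f}")
print(f"byx = {b_yx:.4f}, bxy = {b_xy:.4f}, byx * bxy = {b_yx * b_xy:.4f}, r^2 = {r_xy ** 2:.4f}")
print(f"after rescaling Y: r = {r_scaled:.4f}, byx = {b_scaled:.4f}")
```

The product of the two regression coefficients equals r squared, which is why both cannot exceed 1 in magnitude at the same time.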
Difference - Correlation vs Regression
Correlation:
• In correlation analysis the degree and direction of
relationship between the variables are studied.
• If the value of one variable is known, the value of the
other variable cannot be estimated.
• Correlation coefficient lies between -1 and +1.
• Correlation does not always assume a cause and
effect relationship.
Regression:
• In regression analysis, the nature of relationship is
studied.
• If the value of one variable is known, the value of the
other variable can be estimated using the functional
relationship.
• At most one of the two regression coefficients (byx, bxy)
can be greater than 1 in magnitude.
• Regression treats one variable as the cause and the
other as the effect.
• Parametric test
• If the information about the population is
completely known by means of its parameters, then the
statistical test is called a parametric test.
• E.g.: t-test, F-test, z-test, ANOVA

• Nonparametric test
• If there is no knowledge about the population or its
parameters, but it is still required to test a
hypothesis about the population, then the test is called
a nonparametric test.
• E.g.: Chi-square, Mann-Whitney, rank sum test,
Kruskal-Wallis test
Poll
Regarding parametric tests, express your opinion
about the statements given below:
A- information about the population is completely known
B- information about the population is partially known
Ans:
a) Statement A is correct
b) Statement B is correct
c) Both statements A & B are correct
Poll
Regarding nonparametric tests, express your
opinion about the statements given below:
A- no knowledge about the population or parameters
B- a little knowledge about the population or parameters
Ans:
a) Statement A is correct
b) Statement B is correct
c) Both statements A & B are correct
Difference between parametric and Non parametric
POLLS
The statement about nonparametric tests, "No assumption is
made regarding the parent population", is

a) True
b) False
POLLS
State whether the statement "Test statistic is arbitrary" about nonparametric tests is

a) True
b) False
Polls
Q-A continuous probability distribution can be represented by
a graph as well.
a) True
b) False

Q-A variable (random variable) assuming an infinite
number of values is called
a) Discrete Random Variable
b) Continuous Random Variable
c) None of these
Polls
Q-Area under the normal curve is equal to one.
a) True
b) False
Standard Normal Distribution
The standard normal distribution is a normal distribution
with a mean of 0 and a standard deviation of 1.
https://www.youtube.com/watch?v=p_KApjpyBHE
Polls
Q- Around 68 percent of the area is covered within ± one
standard deviation of the mean.
a) True
b) False
Exercise

• Find the area under the standard normal
distribution curve between z = 1.62 and z =
−1.35.
Solution
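One way to work this exercise is to take the difference of the cumulative areas to the left of each z value. The sketch below assumes the standard identity that expresses the standard normal cumulative distribution through the error function. The helper name phi is illustrative.

```python
import math


def phi(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Area between z = -1.35 and z = 1.62 is the difference of the two
# cumulative areas.
area = phi(1.62) - phi(-1.35)
print(f"Area to the left of z = 1.62:  {phi(1.62):.4f}")
print(f"Area to the left of z = -1.35: {phi(-1.35):.4f}")
print(f"Area between them:             {area:.4f}")
```

The same result follows from a standard normal table: look up the area to the left of each z value and subtract the smaller from the larger, which gives about 0.8589.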
