
Find the Discriminant Function

The discriminant function is written as:

D = b0 + b1X1 + b2X2 + … + bkXk

Here, ‘D’ is the discriminant score, and the ‘b’s are the coefficients or weights for the predictor variables ‘X’.

You already know ‘X’. You need to estimate the values of ‘b’.

There are two ways to do this – direct and stepwise. In the direct method, you include all the variables at once and estimate the coefficients for all of them. In the stepwise method, the variables are entered one by one, based on their ability to discriminate between the groups.

The number of discriminant functions required depends on the number of groups and the number of independent predictor variables. If there are Ng groups and k predictors, the number of discriminant functions is the minimum of Ng − 1 and k.
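
For instance, here is a minimal sketch of this rule using scikit-learn's LinearDiscriminantAnalysis; the group count, predictor count, and sample values below are invented for illustration:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(42)
Ng, k = 3, 5                         # 3 groups, 5 predictors (illustrative)
X = rng.normal(size=(150, k))        # predictor matrix
y = np.repeat(np.arange(Ng), 50)     # group labels, 50 cases per group

lda = LinearDiscriminantAnalysis().fit(X, y)
scores = lda.transform(X)            # discriminant scores for each case

# Number of discriminant functions = min(Ng - 1, k) = min(2, 5) = 2
print(scores.shape[1])               # -> 2
```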

Determine the Significance of the Discriminant Function
The function derived above should be statistically significant. One way to check significance is to examine the eigenvalue of the function: a larger eigenvalue implies better discrimination.
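
As a rough illustration of where these eigenvalues come from, here is a minimal sketch that computes them directly from the within-group and between-group scatter matrices; the two-group data is synthetic and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic two-group data with two predictors (illustrative values only)
g1 = rng.normal(loc=[0.0, 0.0], size=(50, 2))
g2 = rng.normal(loc=[2.0, 1.0], size=(50, 2))
grand_mean = np.vstack([g1, g2]).mean(axis=0)

# Within-group scatter: deviations of cases from their own group mean
Sw = sum((g - g.mean(axis=0)).T @ (g - g.mean(axis=0)) for g in (g1, g2))

# Between-group scatter: deviations of group means from the grand mean
Sb = sum(len(g) * np.outer(g.mean(axis=0) - grand_mean,
                           g.mean(axis=0) - grand_mean) for g in (g1, g2))

# The eigenvalues of inv(Sw) @ Sb measure discrimination:
# the larger the leading eigenvalue, the better the separation
eigvals = np.linalg.eigvals(np.linalg.inv(Sw) @ Sb)
print(np.sort(eigvals.real)[::-1])
```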

What is Wilks’ Lambda?

Wilks’ lambda (Λ) is a test statistic that’s reported in results from MANOVA, discriminant analysis, and other multivariate procedures. Other similar test statistics include Pillai’s trace criterion and Roy’s greatest characteristic root (gcr) criterion.

 In MANOVA, Λ tests if there are differences between group means for a particular combination of dependent variables. It is similar to the F-test statistic in ANOVA. Lambda is a measure of the percentage of variance in the dependent variables not explained by differences in levels of the independent variable. A value of zero means that there is no variance left unexplained by the independent variable (which is ideal). In other words, the closer the statistic is to zero, the more the variable in question contributes to the model. You would reject the null hypothesis when Wilks’ lambda is close to zero, although this should be done in combination with a small p-value.
 In discriminant analysis, Wilks’ lambda tests how well each level of the independent variable contributes to the model. The scale ranges from 0 to 1, where 0 means total discrimination and 1 means no discrimination. Each independent variable is tested by putting it into the model and then taking it out — generating a Λ statistic. The significance of the change in Λ is measured with an F-test; if the F-value is greater than the critical value, the variable is kept in the model. This stepwise procedure is usually performed using software like Minitab, R, or SPSS. The following SPSS output shows which variables (from a list of a dozen or more) were kept in using this procedure.

SPSS Wilks’ lambda output. Image: Bournemouth University.


The quantity 1 − Λ is the proportion of variance in the dependent variables that is explained by the model’s effect. Caution should be used in interpreting results, as this statistic tends to be biased, especially for small samples.
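
If you want to see Wilks’ lambda (along with Pillai’s trace and Roy’s statistic) in output, here is a minimal sketch using statsmodels’ MANOVA; the variable names and data below are invented for illustration:

```python
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "group": np.repeat(["A", "B", "C"], 30),                  # independent variable
    "dv1": rng.normal(size=90) + np.repeat([0, 1, 2], 30),    # shifted group means
    "dv2": rng.normal(size=90),
})

# The multivariate test table for 'group' reports Wilks' lambda,
# Pillai's trace, Hotelling-Lawley trace, and Roy's greatest root
maov = MANOVA.from_formula("dv1 + dv2 ~ group", data=df)
print(maov.mv_test())
```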

Output Components

Wilks’ lambda output has several components, including:

 “Sig” or significance (p-value). If this is small (i.e., under .05), reject the null hypothesis.
 “Value” column in the output: the value of Wilks’ lambda.
 “Statistic” is the F-statistic associated with the listed degrees of freedom. It would be reported in APA format as F(df1, df2) = value. For example, if you had an F-value of 36.612 with 1 and 2 degrees of freedom, you would report that as F(1, 2) = 36.612.
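
If you only have a reported F-value and its degrees of freedom, you can recover the associated p-value with the F-distribution’s survival function; this sketch uses the numbers from the example above:

```python
from scipy.stats import f

# F(1, 2) = 36.612 from the example above
p_value = f.sf(36.612, dfn=1, dfd=2)
print(round(p_value, 4))  # about 0.0263, under .05, so reject the null hypothesis
```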

Box’s Test for Covariance Matrices


Box’s M Test Basic Concepts
Overview

Box’s test is used to determine whether two or more covariance matrices are
equal. Bartlett’s test for homogeneity of variance presented in Homogeneity of
Variances is derived from Box’s test. One caution: Box’s test is sensitive to
departures from normality. If the samples come from non-normal distributions,
then Box’s test may simply be testing for non-normality.

Suppose that we have m independent populations and we want to test the null
hypothesis that the population covariance matrices are all equal, i.e.

H0: Σ1 = Σ2 = ⋯ = Σm

Now suppose that S1, …, Sm are sample covariance matrices from the m populations, where each Sj is based on nj independent observations, each consisting of a k × 1 column vector (or alternatively a 1 × k row vector).

Now define S as the pooled covariance matrix

S = [(n1 − 1)S1 + ⋯ + (nm − 1)Sm] / (n − m)

where n = n1 + ⋯ + nm is the total number of observations.
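
Here is a minimal sketch of Box’s M and its usual chi-square approximation, assuming each group’s data is an nj × k NumPy array; the sample data is invented, and the scaling factor follows the standard Box approximation:

```python
import numpy as np
from scipy.stats import chi2

def box_m(groups):
    """Box's M test for equality of covariance matrices.

    groups: list of (n_j x k) arrays, one per population.
    Returns (M, chi2_stat, df, p_value) via the chi-square approximation.
    """
    m = len(groups)
    k = groups[0].shape[1]
    ns = np.array([g.shape[0] for g in groups])
    n = ns.sum()

    covs = [np.cov(g, rowvar=False) for g in groups]              # S_j
    S = sum((nj - 1) * Sj for nj, Sj in zip(ns, covs)) / (n - m)  # pooled S

    # M = (n - m) ln|S| - sum_j (n_j - 1) ln|S_j|
    M = (n - m) * np.log(np.linalg.det(S)) - sum(
        (nj - 1) * np.log(np.linalg.det(Sj)) for nj, Sj in zip(ns, covs))

    # Box's scaling factor; chi-square approx. has df = k(k+1)(m-1)/2
    c = ((2 * k**2 + 3 * k - 1) / (6 * (k + 1) * (m - 1))) * (
        np.sum(1 / (ns - 1)) - 1 / (n - m))
    df = k * (k + 1) * (m - 1) / 2
    stat = M * (1 - c)
    return M, stat, df, chi2.sf(stat, df)

rng = np.random.default_rng(7)
g1 = rng.normal(size=(40, 3))
g2 = rng.normal(size=(50, 3)) * 1.5   # inflated variance in group 2
print(box_m([g1, g2]))
```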

What is a Chi Square Test?



There are two types of chi-square tests. Both use the chi-square
statistic and distribution for different purposes:
 A chi-square goodness of fit test determines if sample data
matches a population. For more details on this type, see: Goodness of
Fit Test.
 A chi-square test for independence compares two variables in a contingency table to see if they are related. In a more general sense, it tests whether distributions of categorical variables differ from one another.

What is a Chi-Square Statistic?

The formula for the chi-square statistic used in the chi-square test is:

χ²c = Σ (Oi − Ei)² / Ei


The subscript “c” is the degrees of freedom. “O” is your observed
value and E is your expected value. It’s very rare that you’ll want to
actually use this formula to find a critical chi-square value by hand.
The summation symbolmeans that you’ll have to perform a calculation
for every single data item in your data set. As you can probably
imagine, the calculations can get very, very, lengthy and tedious.
Instead, you’ll probably want to use technology:
 Chi Square Test in SPSS.
 Chi Square P-Value in Excel.
A chi-square statistic is one way to show a relationship between
two categorical variables. In statistics, there are two types of
variables: numerical (countable) variables and non-numerical
(categorical) variables. The chi-squared statistic is a single number
that tells you how much difference exists between your observed
counts and the counts you would expect if there were no relationship at
all in the population.
There are a few variations on the chi-square statistic. Which one you
use depends upon how you collected the data and which hypothesis is
being tested. However, all of the variations use the same idea, which is
that you are comparing your expected values with the values you
actually collect. One of the most common forms can be used
for contingency tables:

χ² = Σ (Oi − Ei)² / Ei

where O is the observed value, E is the expected value, and “i” is the “ith” position in the contingency table.
A low value for chi-square means there is close agreement between your two sets of data. In theory, if your observed and expected values were equal (“no difference”) then chi-square would be zero — an event that is unlikely to happen in real life. Deciding whether a chi-square test statistic is large enough to indicate a statistically significant difference isn’t as easy as it seems. It would be nice if we could say a chi-square test statistic > 10 means a difference, but unfortunately that isn’t the case.
You could take your calculated chi-square value and compare it to
a critical value from a chi-square table. If the chi-square value is more
than the critical value, then there is a significant difference.
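
Here is a minimal sketch of that comparison with scipy; the statistic, degrees of freedom, and alpha level are invented for illustration:

```python
from scipy.stats import chi2

chi_sq_stat = 11.07             # hypothetical calculated chi-square value
df = 4                          # hypothetical degrees of freedom
critical = chi2.ppf(0.95, df)   # critical value at alpha = 0.05

# Reject the null hypothesis if the statistic exceeds the critical value
print(critical, chi_sq_stat > critical)
```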

What Is a Chi-Square Test?

The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. It can also be used to determine whether observed differences correlate with the categorical variables in our data. It helps to find out whether a difference between two categorical variables is due to chance or to a relationship between them.

Chi-Square Test Definition

A chi-square test is a statistical test that is used to compare observed and expected
results. The goal of this test is to identify whether a disparity between actual and
predicted data is due to chance or to a link between the variables under consideration.
As a result, the chi-square test is an ideal choice for aiding in our understanding and
interpretation of the connection between our two categorical variables.

A chi-square test or a comparable nonparametric test is required to test a hypothesis regarding the distribution of a categorical variable. Categorical variables, which indicate categories such as animals or countries, can be nominal or ordinal. They cannot have a normal distribution since they can only have a few particular values.

For example, a meal delivery firm in India wants to investigate the link between
gender, geography, and people's food preferences.

It is used to determine whether the difference between two categorical variables is:

 a result of chance, or
 due to a relationship between them.

Formula For Chi-Square Test

χ²c = Σ (O − E)² / E

where

c = degrees of freedom

O = observed value

E = expected value

The degrees of freedom in a statistical calculation represent the number of values that are free to vary in the calculation. The degrees of freedom can be calculated to ensure that chi-square tests are statistically valid; for a contingency table with r rows and c columns, for example, the degrees of freedom are (r − 1)(c − 1). These tests are frequently used to compare observed data with data that would be expected to be obtained if a particular hypothesis were true.
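
As a direct translation of the formula above, here is a minimal sketch with invented observed and expected counts:

```python
import numpy as np

observed = np.array([18, 22, 20, 40])   # O: invented counts
expected = np.array([25, 25, 25, 25])   # E: counts under the null hypothesis

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq = ((observed - expected) ** 2 / expected).sum()
print(chi_sq)  # 12.32
```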
There are two main types of Chi-Square tests, namely:

1. Independence
2. Goodness-of-Fit

Independence

The Chi-Square Test of Independence is an inferential statistical test which examines whether two sets of variables are likely to be related to each other or not. This test is used when we have counts of values for two nominal or categorical variables, and it is considered a non-parametric test. A relatively large sample size and independence of observations are the required criteria for conducting this test.

For Example:

In a movie theatre, suppose we made a list of movie genres. Let us consider this as the first variable. The second variable is whether or not the people who came to watch those genres of movies bought snacks at the theatre. Here the null hypothesis is that the genre of the film and whether people bought snacks or not are unrelated. If this is true, the movie genres don’t impact snack sales. A sketch of this test appears below.
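
Here is a minimal sketch of this test with scipy’s chi2_contingency; the genre-by-snacks counts are invented for illustration:

```python
from scipy.stats import chi2_contingency

# Rows: genres; columns: bought snacks vs. did not (invented counts)
table = [[50, 30],    # action
         [20, 40],    # drama
         [35, 25]]    # comedy

stat, p, dof, expected = chi2_contingency(table)
print(stat, p, dof)   # a small p suggests genre and snack buying are related
```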

Goodness-Of-Fit

In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a variable is likely to come from a given distribution or not. We must have a set of data values and an idea of the distribution of this data. We can use this test when we have value counts for categorical variables. This test demonstrates a way of deciding if the data values have a “good enough” fit to our idea, or if they are a representative sample of the entire population.

For Example:

Suppose we have bags of balls with five different colours in each bag. The given condition is that each bag should contain an equal number of balls of each colour. The idea we would like to test here is that the proportions of the five colours of balls in each bag are equal. A sketch of this test appears below.
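
Here is a minimal sketch of this goodness-of-fit test with scipy’s chisquare; the colour counts are invented, and with no expected frequencies given, chisquare assumes they are all equal:

```python
from scipy.stats import chisquare

observed = [22, 18, 25, 15, 20]   # invented counts for the five colours

# With no f_exp argument, equal expected frequencies are assumed
stat, p = chisquare(observed)
print(stat, p)   # a large p gives no evidence the proportions are unequal
```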
