
Virgen Milagrosa University Foundation

SAN CARLOS CITY, PANGASINAN PHILIPPINES

BIOSTATISTICS
Presented by:

Reynald Dizon, RN
Benjelyn Joy Campos, RPh
TOPIC:

Testing of Relationship of
Nominal data and chi
square.
• INTRODUCTION
• TEST OF
INDEPENDENCE
TOPIC:

• List the properties of the chi-square distribution.
• Perform hypothesis testing using chi-square.
• Identify the limitations of chi-square application.
What is Nominal Data?

In statistics, nominal data (also known as nominal scale)
is a type of data that is used to label variables without
providing any quantitative value. It is the simplest form
of a scale of measurement. Unlike ordinal data, nominal data
cannot be ordered and cannot be measured.

Unlike interval or ratio data, nominal data cannot
be manipulated using mathematical operators. Thus,
the only measure of central tendency for this type
of data is the mode.
Characteristics of Nominal Data

Nominal data can be both qualitative and quantitative.
However, the quantitative labels lack a numerical value
or relationship (e.g., identification number). On the other
hand, various types of qualitative data can be
represented in nominal form. They may include words,
letters, and symbols. Also, names of people, gender, and
nationality are just a few of the most common examples
of nominal data.
How to Analyze Nominal Data?
Nominal data can be analyzed using the grouping method. The
variables can be grouped together into categories, and for each
category, the frequency or percentage can be calculated. The data
can also be presented visually such as by using a pie chart.

[Figure: chart of exam results (PASS, FAILED) for students who attended and students who skipped class, illustrating whether attending class influences how students perform on an exam.]
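The grouping method described above can be sketched in a few lines of Python. This is only an illustration: the `blood_types` list is hypothetical data invented for the example, not data from this presentation. It counts each category, converts the counts to percentages, and reports the mode.

```python
from collections import Counter

# Hypothetical nominal observations: labels only, with no numeric meaning.
blood_types = ["A", "O", "B", "O", "AB", "O", "A", "O", "B", "A"]

counts = Counter(blood_types)                      # frequency per category
total = sum(counts.values())
percentages = {label: 100 * n / total for label, n in counts.items()}
mode_label, mode_count = counts.most_common(1)[0]  # most frequent category

for label, n in counts.most_common():
    print(f"{label}: {n} ({percentages[label]:.0f}%)")
print("Mode:", mode_label)
```

The percentages are what a pie chart of the same data would display.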
How to Analyze Nominal Data?
Although nominal data cannot be treated using mathematical
operators, they still can be analyzed using advanced statistical
methods.

For example, one way to analyze the data is through hypothesis
testing.
For nominal data, hypothesis testing can be carried out using
nonparametric tests such as the chi-squared test.

The chi-squared test aims to determine whether there is a
significant difference between the expected frequency and the
observed frequency of the given values.
 
What is the Chi-square test for? 

The Chi-square test is intended to test how likely it is that
an observed distribution is due to chance. It is also called
a "goodness of fit" statistic, because it measures how
well the observed distribution of data fits with the
distribution that is expected if the variables are
independent.
GOODNESS OF FIT

• The chance
• Applied when you have one categorical variable from a
single population
• Used to determine whether the sample is consistent
with a hypothesized distribution
• The expected number of sample observations in each
level of the variable is at least 5
Chi-Square as a Statistical Test

• Chi-square test: an inferential statistics
technique designed to test for significant
relationships between two variables organized in
a bivariate table.

• Chi-square requires no assumptions about the
shape of the population distribution from which a
sample is drawn.
Example

            Pass   Failed
Attended    25     6
Skipped     8      15
Another way to describe the Chi-square test is that it
tests the null hypothesis that the variables are
independent. The test compares the observed data
to a model that distributes the data according to the
expectation that the variables are independent.
Wherever the observed data doesn't fit the model,
the evidence that the variables are dependent
becomes stronger, which counts against the null
hypothesis.
Example

          With Master’s   Without Master’s
          Degree          Degree
Male      20              30
Female    30              20

The table above represents a possible input
to the Chi-square test, using 2 variables to divide the
data: gender and whether a person holds a master's
degree.
          With Master’s   Without Master’s   Total
          Degree          Degree
Male      20              30                 50
Female    30              20                 50
Total     50              50                 100

This shows the same basic grid, now complete in the
sense that it includes the "marginal" information:
the total counts for each column and row, as well as
for the whole data set.
          With Master’s   Without Master’s   Total
          Degree          Degree
Male      25              25                 50
Female    25              25                 50
Total     50              50                 100

This is the information we would need to calculate
the likelihood that gender and academic degree are
independent.
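As a rough sketch of how the grid of 25s follows from the marginal totals, the snippet below computes the count expected in each cell if gender and degree were independent (row total times column total, divided by the grand total). The function name `expected_counts` is made up for this illustration.

```python
def expected_counts(observed):
    """Expected cell counts under independence: row total * column total / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

# Rows: Male, Female. Columns: with / without a master's degree.
observed = [[20, 30],
            [30, 20]]
expected = expected_counts(observed)
print(expected)   # [[25.0, 25.0], [25.0, 25.0]]
```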
Limitations of the Chi-Square Test

• The chi-square test does not give us much
information about the strength of the
relationship or its substantive significance in the
population.

• The chi-square test is sensitive to sample size.
The size of the calculated chi-square is directly
proportional to the size of the sample,
independent of the strength of the relationship
between the variables.

• The chi-square test is also sensitive to small
expected frequencies in one or more of the cells
in the table.
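The sensitivity to sample size can be illustrated with a small sketch. It reuses the gender and master's degree counts from the earlier example and then multiplies every cell by 10, so the proportions (and therefore the strength of the relationship) stay the same. The helper `chi_square` is written for this illustration only.

```python
def chi_square(observed):
    """Pearson chi-square for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, f_o in enumerate(row):
            f_e = row_totals[i] * col_totals[j] / n   # expected frequency
            total += (f_o - f_e) ** 2 / f_e
    return total

small = [[20, 30], [30, 20]]                            # N = 100
large = [[10 * cell for cell in row] for row in small]  # same proportions, N = 1000

chi_small = chi_square(small)
chi_large = chi_square(large)
print(chi_small, chi_large)   # 4.0 40.0
```

The statistic grows tenfold although the pattern in the table is unchanged, which is why chi-square alone says little about the strength of a relationship.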
What is the Chi-square test NOT for? 

First of all, the Chi-square test is only meant to
test how plausible it is that the variables in a
data set are independent. It will NOT tell you any details
about the relationship between them. If you want to
calculate how much more likely it is that a woman
has a master's degree than a man, the Chi-square
test is not going to be very helpful. However, once
you have determined that the two variables are
probably related (using the Chi-square test), you
can use other methods to explore their interaction
in more detail.
Some further considerations are necessary when
selecting or organizing your data to run a Chi-square
test. The variables you consider must be mutually
exclusive; participation in one category should not
entail or allow participation in another. In other
words, the data from all of your cells should add up
to the total count, and no item should be counted
twice. 
Hypothesis Testing with Chi-Square

Chi-square follows five steps

1. Making assumptions (random sampling)

2. Stating the research and null hypotheses

3. Selecting the sampling distribution and specifying
the test statistic

4. Computing the test statistic

5. Making a decision and interpreting the results


The Assumptions

• The chi-square test requires no assumptions about
the shape of the population distribution from
which the sample was drawn.

• However, like all inferential techniques it assumes
random sampling.
Stating Research and Null Hypotheses

• The research hypothesis (H1) proposes that the two
variables are related in the population.

• The null hypothesis (H0) states that no association
exists between the two cross-tabulated variables in
the population, and therefore the variables are
statistically independent.
H1: The two variables are related in the population.
Gender and fear of walking alone at night are
statistically dependent.

AFRAID   MEN      WOMEN
No       83.3%    57.2%
Yes      16.7%    42.8%
Total    100%     100%


H0: There is no association between the two variables.
Gender and fear of walking alone at night are statistically
independent.

AFRAID   MEN      WOMEN
No       71.1%    71.1%
Yes      28.9%    28.9%
Total    100%     100%


The Concept of Expected Frequencies

Expected frequencies (fe):
the cell frequencies that would be expected in a bivariate
table if the two variables were statistically independent.

Observed frequencies (fo):
the cell frequencies actually observed in a bivariate table.
The Concept of Expected Frequencies

$$ f_e = \frac{(\text{column marginal})(\text{row marginal})}{N} $$

Chi-Square (obtained)
The test statistic that summarizes the differences between
the observed (fo) and the expected (fe) frequencies in a
bivariate table.
Calculating the Obtained Chi-Square

$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} $$
where f_e = expected frequencies
and f_o = observed frequencies
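To make the formula concrete, here is a minimal sketch that applies it to the attendance table from the earlier example (Attended: 25 pass, 6 failed; Skipped: 8 pass, 15 failed). The function name and the printed layout are choices made for this illustration.

```python
def chi_square_with_expected(observed):
    """Return (chi-square, expected table) for a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (f_o - f_e) ** 2 / f_e
        for obs_row, exp_row in zip(observed, expected)
        for f_o, f_e in zip(obs_row, exp_row)
    )
    return chi2, expected

# Rows: Attended, Skipped. Columns: Pass, Failed.
attendance = [[25, 6],
              [8, 15]]
chi2, expected = chi_square_with_expected(attendance)

for obs_row, exp_row in zip(attendance, expected):
    for f_o, f_e in zip(obs_row, exp_row):
        print(f"fo={f_o:>2}  fe={f_e:6.3f}  contribution={(f_o - f_e) ** 2 / f_e:.3f}")
print(f"chi-square = {chi2:.3f}")
```

Each printed line is one cell's term in the sum; adding the four contributions gives the obtained chi-square, about 11.686 for this table.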
Chi Square distribution

A standard normal deviate is a random draw from
the standard normal distribution. The Chi Square
distribution is the distribution of the sum of
squared standard normal deviates.

The degrees of freedom of the distribution is equal to
the number of standard normal deviates being
summed. Therefore, Chi Square with one degree of
freedom, written as χ²(1), is simply the distribution
of a single normal deviate squared. The area of a Chi
Square distribution with one degree of freedom below 4
is the same as the area of a standard normal
distribution between −2 and 2, since 4 is 2².
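The link between the two distributions can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the standard identity P(|Z| ≤ z) = erf(z / √2) for the standard normal, and a seeded simulation of squared normal draws; the function names are invented for the example.

```python
import math
import random

def chi2_1_cdf(x):
    """P(chi-square with 1 df <= x) = P(-sqrt(x) <= Z <= sqrt(x))."""
    return math.erf(math.sqrt(x / 2))

def simulate_chi2(df, n_draws, seed=0):
    """Draw chi-square values as sums of df squared standard normal deviates."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0, 1) ** 2 for _ in range(df)) for _ in range(n_draws)]

exact = chi2_1_cdf(4)                        # area below 4, about 0.9545
draws = simulate_chi2(df=1, n_draws=50_000)
estimate = sum(d <= 4 for d in draws) / len(draws)

print(f"exact area below 4:     {exact:.4f}")
print(f"simulated area below 4: {estimate:.4f}")
```

The simulated proportion should land close to the exact area, which is the familiar "about 95% within two standard deviations" figure for the normal distribution.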
Chi Square distribution

The Chi Square distribution is very important
because many test statistics are approximately
distributed as Chi Square. Two of the more common
tests using the Chi Square distribution are tests of
differences between theoretically expected and
observed frequencies (one-way tables)
and the relationship between categorical variables
(contingency tables). Numerous other tests beyond
the scope of this work are based on the Chi Square
distribution.
The Sampling Distribution of Chi-Square
The sampling distribution of chi-square tells the
probability of getting values of chi-square,
assuming no relationship exists in the
population.

The chi-square sampling distributions depend on
the degrees of freedom.

The χ² sampling distribution is not one distribution,
but is a family of distributions.
The Sampling Distribution of Chi-Square
The distributions are positively skewed. The
research hypothesis for the chi-square is always
a one-tailed test.
Chi-square values are never negative. The
minimum possible value is zero, with no upper
limit to its maximum value.
As the number of degrees of freedom increases,
the χ² distribution becomes more symmetrical.
Figure 1. Chi Square distributions with 4 degrees of freedom.
Determining the Degrees of Freedom

df = (r – 1)(c – 1)

where
r = the number of rows
c = the number of columns
Calculating Degrees of Freedom

How many degrees of freedom would a
table with 3 rows and 2 columns have?

(3 – 1)(2 – 1) = 2

2 degrees of freedom
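The same rule as a one-line helper, shown only as a sketch (the function name is made up for this illustration):

```python
def degrees_of_freedom(rows, columns):
    """df = (r - 1)(c - 1) for a bivariate table with r rows and c columns."""
    return (rows - 1) * (columns - 1)

print(degrees_of_freedom(3, 2))   # 2
print(degrees_of_freedom(2, 2))   # 1
```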
The Chi-square Formula 

You can find many programs that will calculate a
Chi-square value for you. For now, however, let's start by
trying to understand the formula itself.

$$ \chi^2 = \sum \frac{(O - E)^2}{E} $$

where
χ² = chi-squared value
Σ = sum
O = observed (data collected)
E = expected (data predicted)
EXAMPLES
Example of GOODNESS OF FIT TEST
E.g., I was about to buy a restaurant, and I asked a friend who is also a restaurant owner about
the level of customers each day. He gave the following data: the number of customers on each
day, together with the share of customers we expect on that day.

Day         Observed customers   Expected share
Monday      30                   10%
Tuesday     14                   10%
Wednesday   34                   15%
Thursday    45                   20%
Friday      57                   30%
Saturday    20                   15%

H0 (null hypothesis): the owner's distribution is correct
H1 (alternative hypothesis): the owner's distribution is incorrect

To get each expected count, multiply the expected percentage by the total observed
and divide by 100, since the expectation is given as a percentage. The total observed is 200,
so the expected count for Monday is 10 × 200 / 100 = 20.
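CHI SQUARE STATISTICS
LET'S COMPUTE ON THE BOARD
$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} $$
Monday: (30 − 20)² / 20 = 5, and so on for the remaining days.
CHI SQUARE = 11.435, or 11.44 without intermediate rounding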
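The board computation can be reproduced with a short sketch. The observed counts and expected percentages are the ones given in the restaurant example; the variable names are choices made for this illustration.

```python
observed = {"Monday": 30, "Tuesday": 14, "Wednesday": 34,
            "Thursday": 45, "Friday": 57, "Saturday": 20}
expected_percent = {"Monday": 10, "Tuesday": 10, "Wednesday": 15,
                    "Thursday": 20, "Friday": 30, "Saturday": 15}

total = sum(observed.values())   # total observed customers
# Expected count = expected percentage * total observed / 100
expected = {day: pct * total / 100 for day, pct in expected_percent.items()}

chi_square = sum((observed[d] - expected[d]) ** 2 / expected[d] for d in observed)
df = len(observed) - 1           # number of days minus 1

for day in observed:
    print(f"{day:<10} fo={observed[day]:>3}  fe={expected[day]:5.1f}")
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

The expected counts come out as 20, 20, 30, 40, 60 and 30, and the statistic as about 11.44 with 5 degrees of freedom.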
LOOK FOR THE
DF = DEGREES OF FREEDOM

6 DAYS − 1
DF = 5

LEVEL OF SIGNIFICANCE
α = 0.05
Critical χ² (DF = 5) = 11.071

REJECTION RULE
Reject H0 if chi-square > critical value
Critical value = 11.071
Chi square = 11.44

11.44 > 11.07
Thus we reject our null hypothesis
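The decision step can be written as a small lookup, sketched below. The critical values are the usual table entries for the 0.05 level, rounded to two decimals; the dictionary and function names are assumptions made for this illustration, and only degrees of freedom 1 to 5 are covered.

```python
# Critical chi-square values at the 0.05 level, rounded to two decimals.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49, 5: 11.07}

def decide(chi_square, df, table=CRITICAL_05):
    """Return the decision about the null hypothesis at the 0.05 level."""
    critical = table[df]
    return "reject H0" if chi_square > critical else "fail to reject H0"

print(decide(11.44, df=5))   # restaurant example: reject H0
```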
Another way to describe the Chi-square test is that it tests
the null hypothesis that the variables are independent.
The test compares the observed data to a model that
distributes the data according to the expectation that the
variables are independent. Wherever the observed data
doesn't fit the model, the evidence that the variables are
dependent becomes stronger, which counts against the null
hypothesis.
• evaluates the relationship between two
variables
• nonparametric test that is performed on
categorical (nominal or ordinal) data

Look to see whether being a member of one
category is independent of the other
500 elementary school boys and girls are asked which is their
favorite color: blue, green, or pink? Results are shown below.

         Blue   Green   Pink   Total
Boys     100    150     20     300
Girls    20     30      180    200
Total    120    180     200    N = 500

Using alpha = 0.05, would you conclude that there is a
relationship between gender and favorite color?
1. Define Null and Alternative Hypothesis

H0: For the population of elementary school students, gender and
favorite color are not related.
H1: For the population of elementary school students, gender and
favorite color are related.
Calculate Degrees of Freedom

df = (rows – 1)(columns – 1)
df = (2 – 1)(3 – 1)
df = (1)(2) = 2
State Decision Rule

If χ² is greater than 5.99, reject H0


5. Calculate Test Statistics
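$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} \qquad f_e = \frac{f_c \, f_r}{n} $$
where f_c is the column total, f_r is the row total, and n is the total sample size.

Expected   Blue   Green   Pink   Total
Boys       72     108     120    300
Girls      48     72      80     200
Total      120    180     200    N = 500

(Boys, Blue) = (120 * 300)/500 = 72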
5. Calculate Test Statistics
$$ \chi^2 = \sum \frac{(f_o - f_e)^2}{f_e} $$
$$ \chi^2 = \frac{(100-72)^2}{72} + \frac{(20-48)^2}{48} + \frac{(150-108)^2}{108} + \frac{(30-72)^2}{72} + \frac{(20-120)^2}{120} + \frac{(180-80)^2}{80} $$
$$ \chi^2 = 276.389 $$
State Conclusion

Because χ² = 276.389 is greater than the critical value
of 5.99, we reject H0. In the population, there is a
relationship between gender and favorite color.
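The whole worked example can be traced in a short sketch. One assumption is stated loudly here and in the code: the row and column totals are taken exactly as printed in the table (300, 200, 120, 180, 200, N = 500) rather than recomputed from the cells, which keeps the expected frequencies in line with the calculation shown on the slides. Variable names are choices made for this illustration.

```python
# Marginal totals taken as printed in the table (an assumption of this sketch).
row_totals = {"Boys": 300, "Girls": 200}
col_totals = {"Blue": 120, "Green": 180, "Pink": 200}
n = 500

observed = {
    ("Boys", "Blue"): 100, ("Boys", "Green"): 150, ("Boys", "Pink"): 20,
    ("Girls", "Blue"): 20, ("Girls", "Green"): 30, ("Girls", "Pink"): 180,
}

# fe = (column total)(row total) / n for every cell
expected = {(r, c): row_totals[r] * col_totals[c] / n for (r, c) in observed}

chi_square = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
critical = 5.99   # decision rule for df = 2 at alpha = 0.05

print(f"chi-square = {chi_square:.3f}, df = {df}")
print("reject H0" if chi_square > critical else "fail to reject H0")
```

Run as written, this gives a chi-square of 276.389 with 2 degrees of freedom, far above 5.99, which matches the conclusion above.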
THANK YOU
FOR LISTENING
“To learn something new, you need to
try new things and not be afraid to be
wrong.” 
