
Presented by:
Marriel Macaraig
Angeline Domalayas
I. Test for Goodness of Fit

The characteristics of the chi-square distribution:

1. The chi-square distribution is a family of curves based on the degrees of freedom.
2. The chi-square distribution is positively skewed.
3. All chi-square values are greater than or equal to zero.
4. The total area under each chi-square distribution is equal to 1.
What is the chi-square test of goodness of fit?

• This is a test of the difference between the observed frequencies and the expected frequencies.

• It is used to test the claim that an observed frequency distribution fits some given expected frequency distribution.
How do we use the chi-square test of goodness of fit?
Use the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
χ² = the chi-square test value
O = observed frequencies
E = expected frequencies
Example:
Suppose you wanted to see whether there was a difference in the number of arrests in a certain city for four types of crimes. A random sample of 160 arrests showed the following distribution.

Larceny thefts   Property crimes   Drug use   Driving under the influence
38               50                28         44

The actual frequencies are called observed frequencies. The frequencies obtained by calculation are called expected frequencies.

When every category is assumed to be equally likely, the expected frequencies can be found with the formula:

E = n/k

Where:
n = total number of observations
k = number of categories

Here, E = 160/4 = 40.

            Larceny thefts   Property crimes   Drug use   Driving under the influence
Observed    38               50                28         44
Expected    40               40                40         40
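The expected row of the table can be reproduced in a few lines of Python. This is only a minimal sketch: the function name `expected_equal` is illustrative and does not come from the slides, and it assumes all categories are equally likely.

```python
def expected_equal(observed):
    """Expected frequency per category when all categories are equally likely (E = n / k)."""
    n = sum(observed)   # total number of observations
    k = len(observed)   # number of categories
    return [n / k] * k


# Observed arrests: larceny thefts, property crimes, drug use, driving under the influence
observed = [38, 50, 28, 44]
expected = expected_equal(observed)
print(expected)  # [40.0, 40.0, 40.0, 40.0]
```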

Before computing the test value, you must state the hypotheses. The null hypothesis should be a statement indicating that there is no difference or no change.

Ho: There is no difference in the number of arrests for each type of crime.
H1: There is a difference in the number of arrests for each type of crime.
SOLUTION:
STEP 1. State the hypotheses and identify the claim.
Ho: There is no difference in the number of arrests for each type of crime. (claim)
H1: There is a difference in the number of arrests for each type of crime.

STEP 2. Find the critical value. The degrees of freedom are 4 - 1 = 3, and at α = 0.05 the critical value from Table G in Appendix A is 7.815.

STEP 3. Compute the test value. Note that the expected values are found by E = n/k (160/4 = 40).

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(38-40)^2}{40} + \frac{(50-40)^2}{40} + \frac{(28-40)^2}{40} + \frac{(44-40)^2}{40}$$

$$= 0.1 + 2.5 + 3.6 + 0.4 = 6.6$$
STEP 4. Make the decision. The decision is to not reject the null hypothesis, since 6.6 < 7.815.

STEP 5. Summarize the results. There is not enough evidence to reject the claim that there is no difference in the number of arrests for each type of crime.
CONCLUSION:

The computed chi-square value, 6.6, is less than the tabular chi-square value, 7.815, at the 0.05 level of significance with 3 degrees of freedom, so we do not reject the null hypothesis.
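The five steps above can be sketched in Python as follows. This is an illustrative sketch rather than a definitive implementation: the helper name `chi_square` is an assumption, and the critical value is copied from the slides instead of being computed.

```python
def chi_square(observed, expected):
    """Chi-square test value: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [38, 50, 28, 44]   # arrests per type of crime
expected = [40, 40, 40, 40]   # E = n / k = 160 / 4
critical_value = 7.815        # from the slides: alpha = 0.05, d.f. = 3

test_value = chi_square(observed, expected)
reject_null = test_value > critical_value

print(round(test_value, 3))   # 6.6
print(reject_null)            # False: do not reject the null hypothesis
```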
II. Test for Independence
• The chi-square independence test is used to test whether two variables are independent.

• The chi-square test for independence, also called Pearson’s chi-square test or the chi-square test of association, is used to test the relationship between two categorical variables.
Formula for the Chi-Square Independence Test

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

with degrees of freedom equal to (number of rows minus 1)(number of columns minus 1), and where

O = observed frequencies
E = expected frequencies
The data for two variables are placed in a contingency table. One variable is
called the row variable, and the other variable is called the column variable. The
table is called an R x C table where R is the number of rows and C is the number
of columns.
Each value in the table is called a cell value.
The formula for computing the expected value for each cell is:

$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$

To get the degrees of freedom:

d.f. = (R - 1)(C - 1)
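The two formulas above can be sketched in Python for any R x C table. The function names and the small 2 x 2 table below are hypothetical, made up only to show the arithmetic; they are not data from the slides.

```python
def expected_counts(table):
    """Expected value for each cell: (row sum)(column sum) / grand total."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    grand_total = sum(row_sums)
    return [[r * c / grand_total for c in col_sums] for r in row_sums]


def degrees_of_freedom(table):
    """d.f. = (R - 1)(C - 1) for a table with R rows and C columns."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical 2 x 2 contingency table (illustration only, not from the slides)
example = [[10, 20],
           [30, 40]]

print(expected_counts(example))     # [[12.0, 18.0], [28.0, 42.0]]
print(degrees_of_freedom(example))  # 1
```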
As an example, suppose a new postoperative procedure is administered to a number of patients in a large hospital. The researcher can ask the question: Do the doctors feel differently about this procedure from the nurses, or do they feel basically the same way? Note that the question is not whether they prefer the procedure but whether there is a difference of opinion between the two groups.

Group     Prefer new procedure   Prefer old procedure   No preference
Nurses    100                    80                     20
Doctors   50                     120                    30

Since the main question is whether there is a difference in opinion, the null hypothesis is stated as follows:
Ho: The opinion about the procedure is independent of the profession.
The alternative hypothesis is stated as follows:
H1: The opinion about the procedure is dependent on the profession.
To get the degrees of freedom:
d.f. = (R - 1)(C - 1)
d.f. = (2 - 1)(3 - 1) = 2
1. Find the sum of each row and each column, and find the grand total.

Group     Prefer new procedure   Prefer old procedure   No preference   Total
Nurses    100                    80                     20              200
Doctors   50                     120                    30              200
Total     150                    200                    50              400

2. For each cell, multiply the corresponding row sum by the column sum and divide by the grand total to get the expected value.

$$E = \frac{(\text{row sum})(\text{column sum})}{\text{grand total}}$$

For example, for the first cell (nurses who prefer the new procedure):

$$E = \frac{(200)(150)}{400} = 75$$
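The expected values for all six cells are computed as follows:

$$E_{1,1} = \frac{(200)(150)}{400} = 75 \qquad E_{1,2} = \frac{(200)(200)}{400} = 100 \qquad E_{1,3} = \frac{(200)(50)}{400} = 25$$

$$E_{2,1} = \frac{(200)(150)}{400} = 75 \qquad E_{2,2} = \frac{(200)(200)}{400} = 100 \qquad E_{2,3} = \frac{(200)(50)}{400} = 25$$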

EXPECTED VALUES

Group     Prefer new procedure   Prefer old procedure   No preference   Total
Nurses    75                     100                    25              200
Doctors   75                     100                    25              200
Total     150                    200                    50              400

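3. The test value can now be computed by using the formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E} = \frac{(100-75)^2}{75} + \frac{(80-100)^2}{100} + \frac{(20-25)^2}{25} + \frac{(50-75)^2}{75} + \frac{(120-100)^2}{100} + \frac{(30-25)^2}{25}$$

$$= 8.333 + 4 + 1 + 8.333 + 4 + 1 \approx 26.667$$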
4. Make the decision. Since the test value 26.667 is larger than the critical value 5.991 (α = 0.05, d.f. = 2), the decision is to reject the null hypothesis.

Conclusion: There is enough evidence to support the claim that opinion is related to (dependent on) profession, that is, that the doctors and nurses differ in their opinions about the procedure.
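The whole independence test for the nurses and doctors table can be sketched in Python. The variable names are illustrative, and the critical value 5.991 is taken from the slides (it corresponds to α = 0.05 with 2 degrees of freedom) rather than computed.

```python
# Observed counts: prefer new procedure, prefer old procedure, no preference
observed = [
    [100, 80, 20],   # nurses
    [50, 120, 30],   # doctors
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand_total = sum(row_sums)

# Expected value for each cell: (row sum)(column sum) / grand total
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

# Test value: sum of (O - E)^2 / E over every cell
test_value = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 5.991   # from the slides; alpha = 0.05, d.f. = 2
reject_null = test_value > critical_value

print(expected)              # [[75.0, 100.0, 25.0], [75.0, 100.0, 25.0]]
print(df)                    # 2
print(round(test_value, 3))  # 26.667
print(reject_null)           # True: reject the null hypothesis
```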
III. Test for Homogeneity of Proportions
• The test of homogeneity of proportions is used to test the claim that different populations have the same proportion of subjects who have a certain attitude or characteristic.

• This test is concerned with two or more samples, with only one criterion variable. It is used to determine whether two or more populations are homogeneous, that is, whether their data distributions are similar with respect to a particular criterion variable.
Example:
A psychologist randomly selected 100 people from each of four income groups and asked them if they were “very happy.” For people who made less than $30,000, 24% responded yes. For people who made $30,000 to $74,999, 33% responded yes. For people who made $75,000 to $99,999, 38% responded yes, and for people who made $100,000 or more, 49% responded yes. At α = 0.05, test the claim that there is no difference in the proportion of people in each economic group who were very happy.
Solution:
It is necessary to make a table showing the number of people in each group who responded yes and the number of people in each group who responded no.
For group 1, 24% of the people responded yes, so 0.24(100) = 24 responded yes and 100 - 24 = 76 responded no.
For group 2, 33% of the people responded yes, so 0.33(100) = 33 responded yes and 100 - 33 = 67 responded no.
For group 3, 38% of the people responded yes, so 0.38(100) = 38 responded yes and 100 - 38 = 62 responded no.
For group 4, 49% of the people responded yes, so 0.49(100) = 49 responded yes and 100 - 49 = 51 responded no.
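The conversion from percentages to counts can be checked with a short Python sketch. The variable names are illustrative; integer arithmetic is used so the counts stay exact.

```python
sample_size = 100                 # people sampled in each income group
percent_yes = [24, 33, 38, 49]    # percent answering "very happy", groups 1 to 4

# Number of yes responses: percent of the sample size
yes_counts = [p * sample_size // 100 for p in percent_yes]
# Everyone else in the group responded no
no_counts = [sample_size - y for y in yes_counts]

print(yes_counts)  # [24, 33, 38, 49]
print(no_counts)   # [76, 67, 62, 51]
```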
Tabulate the data in a table, and find the sums of the rows and columns as shown.

Household income   Less than $30,000   $30,000-$74,999   $75,000-$99,999   $100,000 or more   Total
Yes                24                  33                38                49                 144
No                 76                  67                62                51                 256
Total              100                 100               100               100                400

Step 1. State the hypotheses and identify the claim.

Ho: p1 = p2 = p3 = p4 (claim)
H1: At least one proportion differs from the others.
Step 2. Find the critical value. The formula for the degrees of freedom is the same as before: (2 - 1)(4 - 1) = 3. At α = 0.05, the critical value is 7.815.
Step 3. Compute the test value. Since we want to test the claim that the proportions are equal, we use the expected values computed as shown previously.

$$E_{1,1} = E_{1,2} = E_{1,3} = E_{1,4} = \frac{(144)(100)}{400} = 36$$

$$E_{2,1} = E_{2,2} = E_{2,3} = E_{2,4} = \frac{(256)(100)}{400} = 64$$

The completed table is shown, with the expected values in parentheses.

Household income   Less than $30,000   $30,000-$74,999   $75,000-$99,999   $100,000 or more   Total
Yes                24 (36)             33 (36)           38 (36)           49 (36)            144
No                 76 (64)             67 (64)           62 (64)           51 (64)            256
Total              100                 100               100               100                400
Calculate:

$$\chi^2 = \frac{(24-36)^2}{36} + \frac{(33-36)^2}{36} + \frac{(38-36)^2}{36} + \frac{(49-36)^2}{36} + \frac{(76-64)^2}{64} + \frac{(67-64)^2}{64} + \frac{(62-64)^2}{64} + \frac{(51-64)^2}{64}$$

$$= 4.000 + 0.250 + 0.111 + 4.694 + 2.250 + 0.141 + 0.063 + 2.641 = 14.150$$

Step 4. Make the decision. Reject the null hypothesis since 14.150 is greater than 7.815.

Step 5. Summarize the results. There is enough evidence to reject the claim that there is
no difference in proportions. Hence, the incomes seem to make a difference in the
proportions.
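The homogeneity test above can be sketched in Python. The names are illustrative, and the critical value is copied from the slides instead of being computed. Because the code does not round each term before adding, its total is about 14.149 rather than 14.150; the decision is the same.

```python
# Observed counts for the four income groups, lowest to highest
observed = [
    [24, 33, 38, 49],   # responded yes
    [76, 67, 62, 51],   # responded no
]

row_sums = [sum(row) for row in observed]
col_sums = [sum(col) for col in zip(*observed)]
grand_total = sum(row_sums)

# Expected value for each cell: (row sum)(column sum) / grand total
expected = [[r * c / grand_total for c in col_sums] for r in row_sums]

# Test value: sum of (O - E)^2 / E over every cell
test_value = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)
critical_value = 7.815   # from the slides: alpha = 0.05, d.f. = 3
reject_null = test_value > critical_value

print(expected[0], expected[1])  # every "yes" cell is 36.0, every "no" cell is 64.0
print(df, round(test_value, 2), reject_null)
```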
IV. Analysis of Variance (ANOVA)
• When the F test is used to test a hypothesis concerning the means of three or more populations, the technique is called analysis of variance (ANOVA).

• The one-way analysis of variance test is used to test the equality of three or more means using sample variances.

• The procedure used in this section is called the one-way analysis of variance because there is only one independent variable that distinguishes between the different populations in the study. The independent variable is called a factor.
Characteristics of the F distribution:

1. The values of F cannot be negative, because variances are always positive or zero.
2. The distribution is positively skewed.
3. The mean value of F is approximately equal to 1.
4. The F distribution is a family of curves based on the degrees of freedom of the variance of the numerator and the degrees of freedom of the variance of the denominator.

Even though you are comparing three or more means in this use of the
F test, variances are used in the test instead of means.
• With the F test, two different estimates of the population variance are made. The first estimate, the between-group variance, is found from the variance of the sample means.
• The second estimate, the within-group variance, is made by computing the variance using all the data and is not affected by differences in the means.
• If there is no difference in the means, the between-group variance estimate will be approximately equal to the within-group variance estimate, and the F test value will be approximately equal to 1.

The formula for the F test is:

$$F = \frac{\text{variance between groups}}{\text{variance within groups}}$$
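The variance between groups measures the differences in the means that result from the different treatments given to each group. To calculate this value, it is necessary to find the grand mean, which is

$$\bar{X}_{GM} = \frac{\sum X}{N}$$

This value is used to find the between-group variance among the means, using the sample sizes as weights:

$$s_B^2 = \frac{\sum n_i(\bar{X}_i - \bar{X}_{GM})^2}{k - 1}$$

Where:
k = number of groups
n_i = sample size of group i
X̄_i = sample mean of group i

This formula can be written out as:

$$s_B^2 = \frac{n_1(\bar{X}_1 - \bar{X}_{GM})^2 + n_2(\bar{X}_2 - \bar{X}_{GM})^2 + \cdots + n_k(\bar{X}_k - \bar{X}_{GM})^2}{k - 1}$$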
Example: Miles per Gallon

A researcher wishes to see if there is a difference in the fuel economy for city driving for three different types of automobiles: small automobiles, sedans, and luxury automobiles. He randomly samples four small automobiles, five sedans, and three luxury automobiles. The miles per gallon for each is shown. At α = 0.05, test the claim that there is no difference among the means. The data are shown.

Small   Sedans   Luxury
36      43       29
44      35       25
34      30       24
35      29
        40
Step 1. State the hypotheses and identify the claim.
Ho: μ1=μ2=μ3 (claim)
H1: At least one mean is different from the others.

Step 2. Find the critical value.

N = 12, k = 3
d.f.N. = k - 1 = 3 - 1 = 2
d.f.D. = N - k = 12 - 3 = 9

The critical value from Table H in Appendix A with α = 0.05 is 4.26.

Step 3. Compute the test value.

a. Find the mean and variance for each sample.
For the small cars: X̄ = 37.25, s² = 20.917
For the sedans: X̄ = 35.4, s² = 37.3
For the luxury cars: X̄ = 26, s² = 7
b. Find the grand mean.

$$\bar{X}_{GM} = \frac{\sum X}{N} = \frac{36 + 44 + 34 + 35 + \cdots + 24}{12} = \frac{404}{12} = 33.667$$

c. Find the between-group variance.

$$s_B^2 = \frac{\sum n_i(\bar{X}_i - \bar{X}_{GM})^2}{k - 1} = \frac{4(37.25 - 33.667)^2 + 5(35.4 - 33.667)^2 + 3(26 - 33.667)^2}{3 - 1} = \frac{242.717}{2} = 121.359$$
d. Find the within-group variance.

$$s_W^2 = \frac{\sum (n_i - 1)s_i^2}{\sum (n_i - 1)} = \frac{(4-1)(20.917) + (5-1)(37.3) + (3-1)(7)}{(4-1) + (5-1) + (3-1)} = \frac{225.951}{9} = 25.106$$
e. Find the F test value.

$$F = \frac{s_B^2}{s_W^2} = \frac{121.359}{25.106} = 4.83$$

Step 4. Make the decision. The test value 4.83 is greater than 4.26, so the decision is to reject the null hypothesis.

Step 5. Summarize the results. There is enough evidence to conclude that at least one mean is different from the others.
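Steps 2 to 4 of the miles-per-gallon example can be sketched in Python with the standard `statistics` module. The variable names are illustrative, and the critical value 4.26 is copied from the slides rather than computed.

```python
from statistics import mean, variance

# Miles per gallon for each type of automobile
groups = {
    "small": [36, 44, 34, 35],
    "sedans": [43, 35, 30, 29, 40],
    "luxury": [29, 25, 24],
}

all_values = [x for g in groups.values() for x in g]
N = len(all_values)   # total number of observations
k = len(groups)       # number of groups
grand_mean = mean(all_values)

# Between-group variance: sample sizes weight the squared distance
# of each group mean from the grand mean, divided by k - 1
s2_between = sum(
    len(g) * (mean(g) - grand_mean) ** 2 for g in groups.values()
) / (k - 1)

# Within-group variance: sample variances weighted by their degrees of freedom
s2_within = sum(
    (len(g) - 1) * variance(g) for g in groups.values()
) / sum(len(g) - 1 for g in groups.values())

f_value = s2_between / s2_within
critical_value = 4.26   # from the slides: alpha = 0.05, d.f.N. = 2, d.f.D. = 9
reject_null = f_value > critical_value

print(round(f_value, 2))  # 4.83
print(reject_null)        # True: reject the null hypothesis
```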
Analysis of Variance Summary Table

Source           Sum of squares   d.f.   Mean square   F
Between          242.717          2      121.359       4.83
Within (error)   225.951          9      25.106
Total            468.668          11
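The entries of the summary table can be rebuilt from the raw data with a short Python sketch. The variable names are illustrative. The total sum of squares is computed directly from the grand mean, which gives a check that it equals the between and within sums of squares added together.

```python
# Miles per gallon for each type of automobile
groups = {
    "small": [36, 44, 34, 35],
    "sedans": [43, 35, 30, 29, 40],
    "luxury": [29, 25, 24],
}

all_values = [x for g in groups.values() for x in g]
n_total = len(all_values)
k = len(groups)
grand_mean = sum(all_values) / n_total

# Sums of squares: between groups, within groups, and total
ss_between = sum(
    len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
)
ss_within = sum(
    (x - sum(g) / len(g)) ** 2 for g in groups.values() for x in g
)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

df_between = k - 1
df_within = n_total - k
ms_between = ss_between / df_between   # mean square = sum of squares / d.f.
ms_within = ss_within / df_within
f_value = ms_between / ms_within

print(f"{'Source':<16}{'SS':>10}{'d.f.':>6}{'MS':>10}{'F':>7}")
print(f"{'Between':<16}{ss_between:>10.3f}{df_between:>6}{ms_between:>10.3f}{f_value:>7.2f}")
print(f"{'Within (error)':<16}{ss_within:>10.3f}{df_within:>6}{ms_within:>10.3f}")
print(f"{'Total':<16}{ss_total:>10.3f}{df_between + df_within:>6}")
```

Because the within-group sum of squares is computed here from the raw data rather than from rounded variances, its last digits may differ slightly from the table.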
THANK YOU
SO MUCH!
