
Lab 9 - Chi-Squared Testing for Independence

Mansi Kumari (7908159)

2023-04-02

Learning Objectives

By the end of this lab, you should have a grasp on the following concepts:

• How to conduct a χ² test for independence in R.
• How to conduct a χ² test for homogeneity in R.

Instructions

To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.

Exercises
In today’s lab, we will cover how to conduct a full χ² test for independence in R.
Begin by importing the Census2000 dataset, which contains the following survey information on a sample of 2000
Grade 12 students across the United States:

• Gender: The gender of the student (only two options were given on the survey: Male or Female)
• Handed: The handedness of the student (either left- or right-handed).
• Travel_to_School: How the student travels to school (either “Car”, “Bus”, or “Walk”).
• Vegetarian: Whether or not the student is vegetarian.
• Preferred_Status: Whether the student would prefer to be famous, happy, healthy, or rich.

Census2000 <- read.csv("~/Downloads/Census2000.csv")


str(Census2000)

## 'data.frame': 2000 obs. of 5 variables:
## $ Gender : chr "Male" "Male" "Male" "Female" ...
## $ Handed : chr "Right-Handed" "Left-Handed" "Right-Handed" "Left-Handed" ...
## $ Travel_to_School: chr "Car" "Car" "Bus" "Car" ...
## $ Vegetarian : chr "No" "No" "No" "No" ...
## $ Preferred_Status: chr "Happy" "Happy" "Happy" "Happy" ...

Visualize each variable with a bar graph.

barplot(table(Census2000$Gender))
[Figure: bar graph of counts by Gender (Female, Male)]

barplot(table(Census2000$Handed))

[Figure: bar graph of counts by Handed (Left-Handed, Right-Handed)]

barplot(table(Census2000$Travel_to_School))

[Figure: bar graph of counts by Travel_to_School (Bus, Car, Walk)]

barplot(table(Census2000$Vegetarian))

[Figure: bar graph of counts by Vegetarian (No, Yes)]

barplot(table(Census2000$Preferred_Status))

[Figure: bar graph of counts by Preferred_Status (Famous, Happy, Healthy, Rich)]

Suppose we wish to determine if a student’s gender bears any relationship to his or her handedness with a
χ² test (at the 5% level of significance).
Give the hypotheses of this test.

H0 : Gender and Handedness are independent vs Ha : Gender and Handedness are dependent

Conduct this test, using the chisq.test function.

chisq.GvH <- chisq.test(Census2000$Gender, Census2000$Handed, correct = FALSE)


chisq.GvH

##
## Pearson’s Chi-squared test
##
## data: Census2000$Gender and Census2000$Handed
## X-squared = 1.156, df = 1, p-value = 0.2823
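The `correct = FALSE` argument matters here because the table is 2 × 2: by default, `chisq.test` applies Yates' continuity correction to 2 × 2 tables. The following is a minimal sketch comparing the two settings. It assumes the table can be rebuilt from the observed Gender by Handed counts reported in this lab instead of reading the CSV file, and the names `gvh`, `uncorrected`, and `corrected` are illustrative only.

```r
# Observed Gender x Handed counts, copied from the observed table in this lab.
gvh <- matrix(c(116, 961,
                 86, 837),
              nrow = 2, byrow = TRUE,
              dimnames = list(Gender = c("Female", "Male"),
                              Handed = c("Left-Handed", "Right-Handed")))

uncorrected <- chisq.test(gvh, correct = FALSE)  # plain Pearson statistic
corrected   <- chisq.test(gvh)                   # default: Yates' correction on 2 x 2 tables

# The continuity correction shrinks the statistic, so the p-value grows.
c(uncorrected = unname(uncorrected$statistic),
  corrected   = unname(corrected$statistic))
```

With a sample this large, the correction should not change the decision at the 5% level, but stating the choice explicitly keeps the output comparable with a hand calculation.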

What is your conclusion to this test?


Since the p-value (0.2823) is above 0.05, we fail to reject H0. That is, we have insufficient evidence to conclude that
there is a relationship between a student’s gender and handedness.
Use chisq.test to produce the observed and expected tables of this test.

chisq.GvH$observed

## Census2000$Handed
## Census2000$Gender Left-Handed Right-Handed
## Female 116 961
## Male 86 837

chisq.GvH$expected

## Census2000$Handed
## Census2000$Gender Left-Handed Right-Handed
## Female 108.777 968.223
## Male 93.223 829.777
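The expected counts can also be reproduced by hand: under independence, each expected count is (row total × column total) / grand total. The sketch below assumes the observed counts shown in the table above; `observed`, `n`, and `expected` are illustrative names, not objects from the lab.

```r
# Observed counts, copied from the observed table in this lab.
observed <- matrix(c(116, 961,
                      86, 837),
                   nrow = 2, byrow = TRUE,
                   dimnames = list(Gender = c("Female", "Male"),
                                   Handed = c("Left-Handed", "Right-Handed")))

n <- sum(observed)  # grand total (2000 students)

# outer() forms every row total x column total product; dividing by n
# gives the expected count for each cell under independence.
expected <- outer(rowSums(observed), colSums(observed)) / n
round(expected, 3)
```

The result should agree with `chisq.GvH$expected`, for example 1077 × 202 / 2000 = 108.777 for left-handed females.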

Use the sum function to calculate the test statistic for this test.

sum(((chisq.GvH$observed - chisq.GvH$expected)^2) / chisq.GvH$expected)

## [1] 1.156024

Use pchisq to determine the p-value of this test.

1 - pchisq(1.156, 1)

## [1] 0.2822967
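The second argument to `pchisq` is the degrees of freedom, which for an r × c table is (r − 1)(c − 1). The sketch below makes that step explicit and uses `lower.tail = FALSE`, an equivalent way to ask for the upper-tail area. The variable names are illustrative.

```r
# Degrees of freedom for a table with 2 rows (Gender) and 2 columns (Handed).
n_rows <- 2
n_cols <- 2
df <- (n_rows - 1) * (n_cols - 1)

# Upper-tail probability of the test statistic reported above.
test_stat <- 1.156
p_value <- pchisq(test_stat, df = df, lower.tail = FALSE)
p_value

# Equivalent decision rule: compare the statistic with the 5% critical value.
critical_value <- qchisq(0.95, df = df)
test_stat > critical_value  # FALSE means we fail to reject H0
```

Both routes lead to the same decision: a p-value above 0.05 corresponds to a statistic below the critical value.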

Run a χ² test (at the 5% level) to see if there is a relationship between how a student travels to school (X),
and whether or not the student is a vegetarian (Y).
Give the hypotheses of this test.

H0 : X and Y are independent vs Ha : X and Y are dependent

Use chisq.test to determine the p-value of this test.

chisq.TvV <- chisq.test(Census2000$Travel_to_School, Census2000$Vegetarian, correct = FALSE)


chisq.TvV

##
## Pearson’s Chi-squared test
##
## data: Census2000$Travel_to_School and Census2000$Vegetarian
## X-squared = 9.7644, df = 2, p-value = 0.00758

What is your conclusion to this test?

Use chisq.test to produce the observed and expected tables of this test.

chisq.TvV$observed

## Census2000$Vegetarian
## Census2000$Travel_to_School No Yes
## Bus 232 9
## Car 1571 62
## Walk 114 12

chisq.TvV$expected

## Census2000$Vegetarian
## Census2000$Travel_to_School No Yes
## Bus 230.9985 10.0015
## Car 1565.2305 67.7695
## Walk 120.7710 5.2290
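Before interpreting the p-value, it is worth checking that every expected count is at least 5, the usual rule of thumb for the χ² approximation. The sketch below rebuilds the table from the observed counts shown above (an assumption made so that the code runs without the CSV file) and then applies the 5% decision rule. The names `tvv`, `res`, and `alpha` are illustrative.

```r
# Observed Travel_to_School x Vegetarian counts, copied from the table above.
tvv <- matrix(c( 232,  9,
                1571, 62,
                 114, 12),
              nrow = 3, byrow = TRUE,
              dimnames = list(Travel = c("Bus", "Car", "Walk"),
                              Vegetarian = c("No", "Yes")))

res <- chisq.test(tvv, correct = FALSE)

# Rule-of-thumb check: all expected counts should be at least 5.
all(res$expected >= 5)

# Decision at the 5% level of significance.
alpha <- 0.05
res$p.value < alpha  # TRUE means we reject H0
```

Here the smallest expected count (Walk, Yes) is 5.229, so the approximation is borderline but acceptable, and the p-value of 0.00758 falls below 0.05.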

Exercise: Use chisq.test to conduct a χ² test (at the 5% level of significance) to determine if
there is a relationship between a student’s preferred status in life and a student’s method of
travel to school. First, produce the p-value of the test.

chisq.test(Census2000$Travel_to_School, Census2000$Preferred_Status, correct = FALSE)

## Warning in chisq.test(Census2000$Travel_to_School,
## Census2000$Preferred_Status, : Chi-squared approximation may be incorrect

##
## Pearson’s Chi-squared test
##
## data: Census2000$Travel_to_School and Census2000$Preferred_Status
## X-squared = 9.6368, df = 6, p-value = 0.1408
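The warning "Chi-squared approximation may be incorrect" is raised when at least one expected count is below 5. The sketch below shows one way to locate such cells and one possible alternative, a Monte Carlo p-value via `simulate.p.value = TRUE`. The counts in `toy` are hypothetical, made up purely for illustration; they are not taken from Census2000, whose Travel_to_School by Preferred_Status counts are not shown in this lab.

```r
# Hypothetical 2 x 3 table with some small counts (NOT from Census2000).
toy <- matrix(c(12, 20, 2,
                 3,  4, 9),
              nrow = 2, byrow = TRUE,
              dimnames = list(Group = c("A", "B"),
                              Choice = c("X", "Y", "Z")))

# Pull out the expected counts, silencing the warning we already know about.
expected <- suppressWarnings(chisq.test(toy, correct = FALSE))$expected

# How many cells break the "expected count of at least 5" rule of thumb?
sum(expected < 5)

# One alternative: a p-value simulated from tables with the same margins.
set.seed(1)  # for reproducibility of the simulation
sim <- chisq.test(toy, simulate.p.value = TRUE, B = 2000)
sim$p.value
```

On the real data, the same check could be applied to the expected table from the Travel_to_School by Preferred_Status test to see which cells trigger the warning.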

Exercise: Give a fully worded conclusion to this test.

Load in the Company500 dataset.

Company500 <- read.csv("~/Downloads/Company500.csv")

Our goal is to see whether the same proportion of people are salaried in each earnings bracket. Use the table
function to get the counts comparing earnings bracket to employment status.

my.table <- table(Company500$Earnings.Bracket, Company500$Status)


my.table

##
## Hourly Salaried
## 30k-65k 43 23
## 65k-90k 115 34
## 90k+ 193 92

Use the function prop.table to get the proportions from this table.

prop.table(my.table, margin = 1)

##
## Hourly Salaried
## 30k-65k 0.6515152 0.3484848
## 65k-90k 0.7718121 0.2281879
## 90k+ 0.6771930 0.3228070

Use chisq.test to produce the p-value for this test.

chisq.test(Company500$Earnings.Bracket, Company500$Status, correct = FALSE)

##
## Pearson’s Chi-squared test
##
## data: Company500$Earnings.Bracket and Company500$Status
## X-squared = 5.1138, df = 2, p-value = 0.07754
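A test of homogeneity uses the same computation as a test of independence; only the hypotheses differ (H0: the proportion of salaried employees is the same in every earnings bracket). The sketch below assumes the counts shown in `my.table` above and passes them to `chisq.test` as a matrix, which avoids needing the CSV file. The name `earn` is illustrative.

```r
# Earnings bracket x Status counts, copied from my.table above.
earn <- matrix(c( 43, 23,
                 115, 34,
                 193, 92),
               nrow = 3, byrow = TRUE,
               dimnames = list(Earnings = c("30k-65k", "65k-90k", "90k+"),
                               Status = c("Hourly", "Salaried")))

# Row proportions: the share of hourly and salaried employees in each bracket.
round(prop.table(earn, margin = 1), 3)

# chisq.test accepts a table of counts directly.
res <- chisq.test(earn, correct = FALSE)
res$statistic  # chi-squared statistic
res$parameter  # degrees of freedom: (3 - 1) * (2 - 1)
res$p.value
```

The statistic and p-value should match the output above, since a two-way table of counts carries all the information the test uses.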

What is your conclusion to this test?

Exercise: Perform a χ² test of homogeneity at the 1% level to determine whether the proportion
of salaried employees differs according to their department. Begin by using prop.table to
see the proportion of salaried employees per department. Is there one department in particular
that stands out from the rest?

my_table <- table(Company500$Department, Company500$Status)
my_table

##
## Hourly Salaried
## Admin 56 48
## CS 36 33
## Mfg 25 23
## RnD 82 32
## Xport 152 13

prop.table(my_table, margin = 1)

##
## Hourly Salaried
## Admin 0.53846154 0.46153846
## CS 0.52173913 0.47826087
## Mfg 0.52083333 0.47916667
## RnD 0.71929825 0.28070175
## Xport 0.92121212 0.07878788

From the table, we can see that the transportation department (Xport) stands out from the others: only about 8% of its employees are salaried, compared with roughly 28% to 48% in the other departments.
Exercise: Use chisq.test to determine the p-value of the appropriate test.

chisq.test(Company500$Department,Company500$Status,correct = FALSE)

##
## Pearson’s Chi-squared test
##
## data: Company500$Department and Company500$Status
## X-squared = 69.609, df = 4, p-value = 2.745e-14
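To see which department contributes most to the test statistic, one option is to inspect the standardized residuals that `chisq.test` returns in its `stdres` component. The sketch below assumes the Department by Status counts shown in `my_table` above; the names `dept`, `res`, and `alpha` are illustrative.

```r
# Department x Status counts, copied from my_table above.
dept <- matrix(c( 56, 48,
                  36, 33,
                  25, 23,
                  82, 32,
                 152, 13),
               nrow = 5, byrow = TRUE,
               dimnames = list(Department = c("Admin", "CS", "Mfg", "RnD", "Xport"),
                               Status = c("Hourly", "Salaried")))

res <- chisq.test(dept, correct = FALSE)

# Decision at the 1% level of significance.
alpha <- 0.01
res$p.value < alpha  # TRUE means we reject H0

# Standardized residuals: large absolute values mark the cells that
# depart most from what equal proportions would predict.
round(res$stdres, 2)

# Department whose salaried count is furthest from its expected value.
names(which.max(abs(res$stdres[, "Salaried"])))
```

This is a descriptive follow-up and not a formal post hoc test, but it backs up the impression from the proportion table about which department differs.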

Exercise: Give a fully-worded conclusion to this test.
