Mansi Kumari (7908159)
2023-04-02
Learning Objectives
By the end of this lab, you should have a grasp on the following concepts:
Instructions
To complete this worksheet, add code as needed into the R code chunks given below. Do not delete the
question text. All text should be in complete English sentences. Be sure to change the author of this file to
reflect your name and student number.
To properly see the questions, knit this .Rmd file to .pdf and view the output. You will have a link in your
email that takes you to the Crowdmark submission page. Once you have completed the worksheet, knit it
to .pdf and upload your output to Crowdmark.
Exercises
In today’s lab, we will cover how to conduct a full χ² test for independence in R.
We begin by importing the Census2000 dataset, which contains the following survey information on a sample of 2000
Grade 12 students across the United States:
• Gender: The gender of the student (only two options were given on the survey: Male or Female)
• Handed: The handedness of the student (either left- or right-handed).
• Travel_to_School: How the student travels to school (either “Car”, “Bus”, or “Walk”).
• Vegetarian: Whether or not the student is vegetarian.
• Preferred_Status: Whether the student would prefer to be famous, happy, healthy, or rich.
barplot(table(Census2000$Gender))
[Bar plot: number of students by Gender (Female, Male)]
barplot(table(Census2000$Handed))
[Bar plot: number of students by Handed (Left-Handed, Right-Handed)]
barplot(table(Census2000$Travel_to_School))
[Bar plot: number of students by Travel_to_School]
barplot(table(Census2000$Vegetarian))
[Bar plot: number of students by Vegetarian (No, Yes)]
barplot(table(Census2000$Preferred_Status))
[Bar plot: number of students by Preferred_Status]
Suppose we wish to use a χ² test (at the 5% level of significance) to determine whether a student’s gender
is related to his or her handedness.
Give the hypotheses of this test.
H0 : Gender and Handedness are independent vs Ha : Gender and Handedness are dependent
##
## Pearson’s Chi-squared test
##
## data: Census2000$Gender and Census2000$Handed
## X-squared = 1.156, df = 1, p-value = 0.2823
chisq.GvH$observed
## Census2000$Handed
## Census2000$Gender Left-Handed Right-Handed
## Female 116 961
## Male 86 837
chisq.GvH$expected
## Census2000$Handed
## Census2000$Gender Left-Handed Right-Handed
## Female 108.777 968.223
## Male 93.223 829.777
Use the sum function to calculate the test statistic for this test.
sum(((chisq.GvH$observed - chisq.GvH$expected)^2)/chisq.GvH$expected)
## [1] 1.156024
1 - pchisq(1.156, 1)
## [1] 0.2822967
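The statistic and p-value above can be reproduced without the Census2000 file by rebuilding the observed table from the counts printed in this worksheet. The following is a minimal sketch; the names `observed`, `expected`, `x_squared`, and `p_value` are illustrative and do not appear in the original worksheet.

```r
# Observed counts copied from chisq.GvH$observed above
observed <- matrix(
  c(116, 961,
     86, 837),
  nrow = 2, byrow = TRUE,
  dimnames = list(Gender = c("Female", "Male"),
                  Handed = c("Left-Handed", "Right-Handed"))
)

# Expected count under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-squared statistic: sum over all cells of (O - E)^2 / E
x_squared <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# p-value: upper-tail probability of the chi-squared distribution
p_value <- pchisq(x_squared, df = df, lower.tail = FALSE)

print(round(expected, 3))
print(c(x_squared = x_squared, df = df, p_value = p_value))
```

Because the p-value (about 0.28) is larger than 0.05, we fail to reject H0 at the 5% level: the data do not provide evidence that gender and handedness are dependent.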
Run a χ² test (at the 5% level) to see if there is a relationship between how a student travels to school (X)
and whether or not the student is a vegetarian (Y).
Give the hypotheses of this test.
H0 : Travel_to_School and Vegetarian are independent vs Ha : Travel_to_School and Vegetarian are dependent
##
## Pearson’s Chi-squared test
##
## data: Census2000$Travel_to_School and Census2000$Vegetarian
## X-squared = 9.7644, df = 2, p-value = 0.00758
Use chisq.test to produce the observed and expected tables of this test.
chisq.TvV$observed
## Census2000$Vegetarian
## Census2000$Travel_to_School No Yes
## Bus 232 9
## Car 1571 62
## Walk 114 12
chisq.TvV$expected
## Census2000$Vegetarian
## Census2000$Travel_to_School No Yes
## Bus 230.9985 10.0015
## Car 1565.2305 67.7695
## Walk 120.7710 5.2290
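To connect these tables to a decision, the test can be rerun from the printed counts alone. This is a sketch under the assumption that the observed table above is complete; `travel_veg`, `alpha`, and `reject_h0` are illustrative names, not objects from the worksheet.

```r
# Observed counts copied from chisq.TvV$observed above
travel_veg <- matrix(
  c( 232,  9,
    1571, 62,
     114, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(Travel_to_School = c("Bus", "Car", "Walk"),
                  Vegetarian = c("No", "Yes"))
)

# chisq.test accepts a matrix of counts directly.
# correct = FALSE mirrors the other chisq.test calls in this worksheet
# (the continuity correction only applies to 2 x 2 tables in any case).
result <- chisq.test(travel_veg, correct = FALSE)

alpha <- 0.05  # significance level stated in the question

print(result$statistic)
print(result$p.value)

# Decision rule: reject H0 (independence) when the p-value is below alpha
reject_h0 <- result$p.value < alpha
print(reject_h0)
```

With a p-value of about 0.0076, which is below 0.05, this sketch rejects H0 and suggests that travel method and vegetarian status are dependent.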
Exercise: Use chisq.test to conduct a χ² test (at the 5% level of significance) to determine if
there is a relationship between a student’s preferred status in life and a student’s method of
travel to school. First, produce the p-value of the test.
chisq.test(Census2000$Travel_to_School,Census2000$Preferred_Status,correct = FALSE)
## Warning in chisq.test(Census2000$Travel_to_School,
## Census2000$Preferred_Status, : Chi-squared approximation may be incorrect
##
## Pearson’s Chi-squared test
##
## data: Census2000$Travel_to_School and Census2000$Preferred_Status
## X-squared = 9.6368, df = 6, p-value = 0.1408
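R issues the warning above when at least one expected cell count is below 5, because the χ² approximation can then be unreliable. The Travel_to_School by Preferred_Status counts are not printed in this worksheet, so the sketch below demonstrates the check on the Travel_to_School by Vegetarian counts reported earlier instead. The helper names `expected_counts` and `low_expected` are hypothetical, introduced only for illustration.

```r
# Hypothetical helper: expected counts under independence for a table of counts
expected_counts <- function(tab) {
  outer(rowSums(tab), colSums(tab)) / sum(tab)
}

# Hypothetical helper: row/column positions of cells whose expected count
# is below a threshold (default 5, the cutoff chisq.test uses for its warning)
low_expected <- function(tab, threshold = 5) {
  which(expected_counts(tab) < threshold, arr.ind = TRUE)
}

# Counts copied from the Travel_to_School by Vegetarian table earlier in the worksheet
travel_veg <- matrix(
  c( 232,  9,
    1571, 62,
     114, 12),
  nrow = 3, byrow = TRUE,
  dimnames = list(Travel_to_School = c("Bus", "Car", "Walk"),
                  Vegetarian = c("No", "Yes"))
)

flagged <- low_expected(travel_veg)
print(nrow(flagged))                      # number of cells with expected count below 5
print(min(expected_counts(travel_veg)))   # smallest expected count in the table
```

For the table in this exercise, the same check could be applied to table(Census2000$Travel_to_School, Census2000$Preferred_Status); any cells it flags would account for the warning.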
Our goal is to see whether the same proportion of people are salaried in each income bracket. Use the table
function to get the counts comparing employment status to earnings bracket.
##
## Hourly Salaried
## 30k-65k 43 23
## 65k-90k 115 34
## 90k+ 193 92
Use the function prop.table to get the proportions from this table.
prop.table(my.table, margin = 1)
##
## Hourly Salaried
## 30k-65k 0.6515152 0.3484848
## 65k-90k 0.7718121 0.2281879
## 90k+ 0.6771930 0.3228070
##
## Pearson’s Chi-squared test
##
## data: Company500$Earnings.Bracket and Company500$Status
## X-squared = 5.1138, df = 2, p-value = 0.07754
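The chunks that built the table and ran this test are not shown in the worksheet output. As a hedged reconstruction, the sketch below rebuilds the table from the printed counts and repeats both steps; `bracket_status` and `row_props` are illustrative names, and `correct = FALSE` is assumed to match the other calls in this worksheet.

```r
# Counts copied from the Earnings.Bracket by Status table above
bracket_status <- matrix(
  c( 43, 23,
    115, 34,
    193, 92),
  nrow = 3, byrow = TRUE,
  dimnames = list(Earnings.Bracket = c("30k-65k", "65k-90k", "90k+"),
                  Status = c("Hourly", "Salaried"))
)

# Row proportions: each row sums to 1, so the Salaried column gives the
# proportion of salaried employees within each earnings bracket
row_props <- prop.table(bracket_status, margin = 1)
print(round(row_props, 4))

# Test of homogeneity: H0 says the salaried proportion is the same in every bracket
result <- chisq.test(bracket_status, correct = FALSE)
print(result)
```

The worksheet does not state a significance level for this test, so the sketch stops at the p-value rather than drawing a conclusion.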
Exercise: Perform a χ² test of homogeneity at the 1% level to determine whether the proportion
of salaried employees differs according to their department. Begin by using prop.table to
see the proportion of salaried employees per department. Is there one department in particular
that stands out from the rest?
my_table <- table(Company500$Department, Company500$Status)
my_table
##
## Hourly Salaried
## Admin 56 48
## CS 36 33
## Mfg 25 23
## RnD 82 32
## Xport 152 13
prop.table(my_table,margin=1)
##
## Hourly Salaried
## Admin 0.53846154 0.46153846
## CS 0.52173913 0.47826087
## Mfg 0.52083333 0.47916667
## RnD 0.71929825 0.28070175
## Xport 0.92121212 0.07878788
From the table, we can see that the transportation (Xport) department stands out from the other departments: only about 8% of its employees are salaried, compared with 28% to 48% elsewhere.
Exercise: Use chisq.test to determine the p-value of the appropriate test.
chisq.test(Company500$Department,Company500$Status,correct = FALSE)
##
## Pearson’s Chi-squared test
##
## data: Company500$Department and Company500$Status
## X-squared = 69.609, df = 4, p-value = 2.745e-14
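To finish the exercise, the p-value can be compared with the 1% level. The sketch below reruns the test from the printed Department by Status counts and also inspects the standardized residuals that chisq.test returns, which is one way to see which department drives the result. The names `dept_status`, `alpha`, and `reject_h0` are illustrative, not from the worksheet.

```r
# Counts copied from the Department by Status table above
dept_status <- matrix(
  c( 56, 48,
     36, 33,
     25, 23,
     82, 32,
    152, 13),
  nrow = 5, byrow = TRUE,
  dimnames = list(Department = c("Admin", "CS", "Mfg", "RnD", "Xport"),
                  Status = c("Hourly", "Salaried"))
)

result <- chisq.test(dept_status, correct = FALSE)

alpha <- 0.01  # significance level stated in the exercise

# Decision rule: reject H0 (equal salaried proportions) when p-value < alpha
reject_h0 <- result$p.value < alpha
print(result$p.value)
print(reject_h0)

# Standardized residuals: large absolute values mark the cells that
# depart most from what homogeneity would predict
print(round(result$stdres, 2))
```

Since the p-value (about 2.7e-14) is far below 0.01, this sketch rejects H0 at the 1% level: the proportion of salaried employees is not the same across departments. The residuals show the largest departure in the Xport row, consistent with the proportions table.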