Seminar2 1

Assignments
WEEK2 - Seminar 2.1

Bente Cicilia i6331883
Statistical concepts to be discussed: RR vs. OR, cross table, expected values, chi-square statistic
Question 1
What measure of association is suitable to express the strength of the relation between one binary variable and one continuous variable (see e.g. https://www.youtube.com/watch?v=6pIG4W8wPzE)?
a. Pearson correlation -> the variables can be binary or continuous (with one binary variable this is the point-biserial correlation)
b. Relative risk
c. Odds ratio -> here both variables must be binary
d. Chi-square
Question 2
What measure of association is suitable to express the strength of the relation between two
continuous variables?
a. Pearson correlation
b. Relative risk
c. Odds ratio
d. Chi-square
Question 3
What measure of association is most suitable to express the strength of the relation
between two binary variables?
a. Pearson correlation
b. R-square of a regression analysis
c. Chi-square
d. Odds ratio -> used to express the strength of an association
Question 4
For which design is the relative risk an inappropriate measure of association between two
binary variables?
a. A cohort study
b. A case-control study -> incidence and prevalence cannot be calculated, because the numbers of cases and controls are fixed by the design
c. An experiment with two conditions and a binary outcome
Question 5
For which design(s) is the odds ratio an appropriate measure of association between two
binary variables?
a. A cohort study -> can also be used, but the RR is easier to interpret here
b. A case-control study -> the relative risk cannot be calculated directly, because the study population is not followed over time
c. An experiment with two conditions and a binary outcome -> also usable here, but (as in a) the RR is easier to interpret
Question 6
What frequencies are compared with the observed frequencies when the chi-square test
statistic is obtained?
a. The frequencies that are expected if there is no relation between 2 variables -> the chi-square test tests for independence between two variables (X/Y) that are both categorical (factors).
b. The frequencies that are expected if there is a relation between 2 variables
c. The frequencies that are expected if there is a linear relation between 2 variables
d. The frequencies that are expected if there is a non-linear relation between 2
variables
Question 7
The chi-square test statistic for a contingency table can be calculated to examine the
relation between
a. two continuous variables
b. one continuous and one categorical variable
c. two categorical variables
d. one continuous and one binary variable
Question 8
In a study rats were randomized into one of two conditions, one condition in which the rats
were put on a restricted diet, and one in which there were no restrictions in this respect
(“ad libitum”). For each rat it was registered whether the lifespan was less than 2 years
(yes or no). The data are as follows:
                    Lifespan less than 2 years?
                    Yes    No
Restricted diet     12     89
Ad libitum          54     37
a. Calculate the odds of a lifespan shorter than 2 years for rats on a restricted diet
Reference example of the odds calculation (from a different data set):
Odds 1 = odds(support social distancing | EPH1026) = 13/15 = 0.87
Odds 2 = odds(support social distancing | GZW1026) = 10/39 = 0.26
Odds ratio (OR) = Odds 1/Odds 2 = 0.87/0.26 = 3.3
Interpretation: the odds of supporting social distancing for EPH1026 students are 3.3 times the odds of supporting social distancing for GZW1026 students.
12/89 = 0.135 -> for every rat that lives longer than 2 years, there are 0.135 rats that live less than 2 years
b. Calculate the odds of a lifespan shorter than 2 years for rats on a “free-eating” (“ad
libitum”) diet
54/37 = 1.459 -> for every rat that lives longer than 2 years, there are 1.459 rats that live less than 2 years
c. Calculate the odds ratio for a restricted diet versus a “free-eating” diet. Give an
interpretation of this odds ratio.
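0.135/1.459 = 0.092 -> OR smaller than 1 = negative association (slides block 3): the odds of a lifespan shorter than 2 years for rats on a restricted diet are 0.092 times the odds for rats on an ad libitum diet (about 91% lower).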
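As a quick check on a to c, the odds and the odds ratio can be recomputed from the raw counts. This is a minimal Python sketch; the variable names and the helper `odds` are my own, not part of the assignment.

```python
# Counts from the Question 8 table (lifespan less than 2 years: yes / no)
short_restricted, long_restricted = 12, 89
short_adlib, long_adlib = 54, 37


def odds(with_outcome: int, without_outcome: int) -> float:
    """Odds = number with the outcome / number without the outcome."""
    return with_outcome / without_outcome


odds_restricted = odds(short_restricted, long_restricted)  # 12/89
odds_adlib = odds(short_adlib, long_adlib)                 # 54/37
odds_ratio = odds_restricted / odds_adlib                  # restricted vs ad libitum

print(f"odds restricted diet: {odds_restricted:.3f}")
print(f"odds ad libitum:      {odds_adlib:.3f}")
print(f"odds ratio:           {odds_ratio:.3f}")
```

Working with unrounded odds gives the same OR to three decimals (0.092), which equals (12 * 37)/(89 * 54).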
d. Calculate the risk of a lifespan shorter than 2 years for rats on a restricted diet
12/(12+89) = 12/101 = 0.119
e. Calculate the risk of a lifespan shorter than 2 years for rats on a “free-eating” (“ad
libitum”) diet
54/(54+37) = 54/91 = 0.593
f. Calculate the relative risk for a restricted diet versus a “free-eating diet”. Give an
interpretation of this relative risk.
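0.119/0.593 = 0.201 -> the risk of a lifespan shorter than 2 years for rats on a restricted diet is about 0.20 times (one fifth of) the risk for rats on an ad libitum diet. The further a ratio lies from 1, the stronger the association; here the OR (0.092) lies further from 1 than the RR (0.201), so the OR suggests a stronger association.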
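The risks and relative risk in d to f can be sketched the same way. The sketch below (the helper name `risk` is my own) also puts the OR next to the RR, which illustrates the point made in f: when the outcome is common, the OR lies further from 1 than the RR.

```python
def risk(with_outcome: int, group_total: int) -> float:
    """Risk = number with the outcome / total number in the group."""
    return with_outcome / group_total


risk_restricted = risk(12, 12 + 89)  # 12/101
risk_adlib = risk(54, 54 + 37)       # 54/91
relative_risk = risk_restricted / risk_adlib
odds_ratio = (12 / 89) / (54 / 37)   # from parts a to c, for comparison

print(f"risk restricted diet: {risk_restricted:.3f}")
print(f"risk ad libitum:      {risk_adlib:.3f}")
print(f"relative risk:        {relative_risk:.3f}")
print(f"odds ratio:           {odds_ratio:.3f}")
```

With unrounded risks the RR comes out at 0.200 rather than 0.201; the difference is only rounding.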
g. Assume that there is no relation between the rat’s life span and the type of diet.
Calculate the expected number of rats for each of the cells in the table as given
above.
Expected count = row total * column total / N, with N = 192:
Restricted diet, lifespan 2 years or more (No): (89+12) * (89+37)/192 = 101 * 126/192 = 66.281
Restricted diet, lifespan less than 2 years (Yes): (89+12) * (12+54)/192 = 101 * 66/192 = 34.719
Ad libitum, lifespan less than 2 years (Yes): (37+54) * (12+54)/192 = 91 * 66/192 = 31.281
Ad libitum, lifespan 2 years or more (No): (37+54) * (89+37)/192 = 91 * 126/192 = 59.719
The original formula is also correct; only the labels a, b, c and d were assigned to the wrong groups.
h. Next calculate the chi-square statistic for the data in the table.
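chi-square = sum over all cells of (O - E)^2/E
= (12 - 34.719)^2/34.719 + (89 - 66.281)^2/66.281 + (54 - 31.281)^2/31.281 + (37 - 59.719)^2/59.719
= 14.866 + 7.787 + 16.500 + 8.643 = 47.796
A large chi-square means that the observed frequencies deviate strongly from the frequencies expected under independence, i.e. evidence of an association between the variables; a small chi-square means little or no evidence of an association. (The chi-square also grows with the sample size, so it reflects evidence rather than strength.)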
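Parts g and h can be cross-checked with a short function that builds the expected counts from the margins (row total * column total / N) and sums (O - E)^2/E over the cells. This is a plain-Python sketch (the function name is my own); for a 2x2 table it should agree with the shortcut formula N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)].

```python
from typing import List


def expected_and_chi_square(observed: List[List[int]]):
    """Return the expected counts under independence and the Pearson chi-square."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return expected, chi_square


observed = [[12, 89],  # restricted diet: yes, no
            [54, 37]]  # ad libitum: yes, no
expected, chi_square = expected_and_chi_square(observed)

# Shortcut for a 2x2 table [[a, b], [c, d]] (no continuity correction)
(a, b), (c, d) = observed
shortcut = (a + b + c + d) * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

for row in expected:
    print([f"{e:.3f}" for e in row])
print(f"chi-square: {chi_square:.3f} (shortcut: {shortcut:.3f})")
```

The result, about 47.80, matches the hand calculation up to rounding.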
Question 9
In a case-control study 200 patients with a liver disease are involved. Among these 200
patients, 48 persons are heavy drinkers. In the control group of 195 persons, who are free
of liver diseases, there are 32 heavy drinkers.
a. Put the data from the case–control study into a contingency table.
                     Cases   Controls   Total
Heavy drinkers        48       32         80
Non-heavy drinkers   152      163        315
Total                200      195        395
b. Which measure(s) of association is (are) meaningful here: odds ratio, relative risk or
both?
Case-control study -> OR (the RR cannot be calculated, because the proportion of cases is fixed by the design)
c. Calculate the appropriate measure(s) of association.

OR = [(48/80) / (32/80)] / [(152/315) / (163/315)] = 1.5/0.932 = 1.609
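Because the odds ratio is symmetric, it does not matter whether it is computed as the odds of disease (the formula above) or as the odds of exposure in cases versus controls, which is the more natural reading for a case-control design. A small sketch with exact fractions, using Python's standard `fractions` module (variable names are my own):

```python
from fractions import Fraction

cases_heavy, cases_not_heavy = 48, 152
controls_heavy, controls_not_heavy = 32, 163

# Odds of exposure (heavy drinking) among cases versus among controls
or_exposure = Fraction(cases_heavy, cases_not_heavy) / Fraction(controls_heavy, controls_not_heavy)

# Odds of being a case among heavy versus non-heavy drinkers (formula used above)
or_disease = Fraction(cases_heavy, controls_heavy) / Fraction(cases_not_heavy, controls_not_heavy)

print(or_exposure, or_disease, round(float(or_exposure), 3))  # 489/304 489/304 1.609
```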
d. Give an interpretation of the calculated measure(s) of association.

The odds of having a liver disease for a heavy drinker are 1.609 times the odds for someone who is not a heavy drinker (this is a ratio of odds, not of chances or risks).
Question 10
A cohort of 10,000 persons is followed for 15 years. Among 4000 smokers there were 124
persons who developed lung cancer. Of the remaining 6000 non-smokers, 65 persons
developed lung cancer.
a. Put the data from this study into a contingency table.
              Lung cancer yes   Lung cancer no   Total
Smoker              124              3876         4000
Non-smoker           65              5935         6000
Total               189              9811        10,000
b. Which measure(s) of association can be calculated here: odds ratio, relative risk or both?
Both can be calculated, but the RR is preferable because it is easier to interpret.
c. Calculate the appropriate measure(s) of association.
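Risk smokers: 124/4000 = 0.031
Risk non-smokers: 65/6000 = 0.0108
RR = (124/4000)/(65/6000) = 2.862 (rounding the second risk to 0.011 first gives 0.031/0.011 = 2.818)
d. Give an interpretation of the calculated measure(s) of association.
The risk of developing lung cancer during the 15-year follow-up is about 2.86 times as high for smokers as for non-smokers.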
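Since both measures can be calculated in a cohort study, a quick sketch can compute both from the table (variable names are my own). With a rare outcome (about 3% and 1% here) the OR should lie close to the RR, which is why either can be reported, although the RR is easier to interpret.

```python
smokers_cases, smokers_total = 124, 4000
nonsmokers_cases, nonsmokers_total = 65, 6000

# Relative risk: ratio of cumulative incidences over the 15-year follow-up
risk_smokers = smokers_cases / smokers_total
risk_nonsmokers = nonsmokers_cases / nonsmokers_total
relative_risk = risk_smokers / risk_nonsmokers

# Odds ratio: ratio of the odds of lung cancer (cases / non-cases)
odds_smokers = smokers_cases / (smokers_total - smokers_cases)
odds_nonsmokers = nonsmokers_cases / (nonsmokers_total - nonsmokers_cases)
odds_ratio = odds_smokers / odds_nonsmokers

print(f"RR = {relative_risk:.3f}, OR = {odds_ratio:.3f}")
```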
Question 11
In a cross-sectional study, researchers were interested to investigate the association
between smoking status (no vs yes) and coffee consumption (drinker vs non-drinker) in
healthy adults. The data are as follows:
                      Smoking status
                      No       Yes
Coffee drinkers       5134     9189
Non-coffee drinkers   1052     821
a. Calculate the probability of smoking for both coffee and non-coffee drinkers.
Coffee drinkers: 9189/14,323 = 0.642 -> 64.2% of the coffee drinkers smoke
Non-coffee drinkers: 821/1873 = 0.438 -> 43.8% of the non-coffee drinkers smoke
b. Calculate the relative risk of smoking for coffee drinkers versus non-coffee drinkers.
RR = risk 1/risk 2 = 0.642/0.438 = 1.466 (1.464 when the unrounded risks are used)
c. Give an interpretation of the calculated relative risk.
Coffee drinkers are about 1.46 times as likely to smoke as non-coffee drinkers.
Also: smoking and coffee drinking may both be influenced by a third factor -> this shows an association, not necessarily a causal relation.
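A sketch of a and b with unrounded proportions, showing how much of the RR depends on rounding the risks first (names are my own):

```python
smokers_coffee, total_coffee = 9189, 5134 + 9189
smokers_no_coffee, total_no_coffee = 821, 1052 + 821

risk_coffee = smokers_coffee / total_coffee           # P(smoking | coffee drinker)
risk_no_coffee = smokers_no_coffee / total_no_coffee  # P(smoking | non-coffee drinker)

rr_unrounded = risk_coffee / risk_no_coffee
rr_from_rounded = round(risk_coffee, 3) / round(risk_no_coffee, 3)

print(f"risks: {risk_coffee:.3f} vs {risk_no_coffee:.3f}")
print(f"RR: {rr_unrounded:.3f} (from rounded risks: {rr_from_rounded:.3f})")
```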
d. Assuming no relation between drinking coffee and smoking status, calculate the
expected number of persons in each of the cells of the above contingency table.
Expected count = row total * column total / grand total
Grand total N = 16,196
Coffee drinkers (no + yes) = 14,323
Non-coffee drinkers (no + yes) = 1873
Drinks coffee, does not smoke: (a+b) * (a+c)/N = (5134 + 9189) * (5134 + 1052)/16,196 = 5470.615
Drinks coffee and smokes: (a+b) * (b+d)/N = (5134 + 9189) * (9189 + 821)/16,196 = 8852.385
Does not drink coffee, does not smoke: (c+d) * (a+c)/N = (1052 + 821) * (5134 + 1052)/16,196 = 715.385
Does not drink coffee, smokes: (c+d) * (b+d)/N = (1052 + 821) * (9189 + 821)/16,196 = 1157.615
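The expected counts in d can be generated in one step from the row and column totals. The sketch below (plain Python, names my own) also checks a useful property: the expected table has exactly the same margins as the observed table, so only one cell of a 2x2 table is free.

```python
observed = [[5134, 9189],  # coffee drinkers: non-smoker, smoker
            [1052, 821]]   # non-coffee drinkers: non-smoker, smoker

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count = row total * column total / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(["coffee", "no coffee"], expected):
    print(label, [f"{e:.3f}" for e in row])

# Expected counts reproduce the observed row and column totals
assert all(abs(sum(e_row) - r) < 1e-6 for e_row, r in zip(expected, row_totals))
assert all(abs(sum(e_col) - c) < 1e-6 for e_col, c in zip(zip(*expected), col_totals))
```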
e. Obtain the chi-square statistic for the above data.

Drinks coffee, does not smoke: (5134 - 5470.615)^2/5470.615 = 20.712
Drinks coffee and smokes: (9189 - 8852.385)^2/8852.385 = 12.800
Does not drink coffee, does not smoke: (1052 - 715.385)^2/715.385 = 158.390
Does not drink coffee, smokes: (821 - 1157.615)^2/1157.615 = 97.882
chi-square = 20.712 + 12.800 + 158.390 + 97.882 = 289.784
Question 12 see SPSS instructions below!

Consider the table in problem 11. Create the corresponding dataset in SPSS and analyze the
data to address the following questions:
a. Ask for a table in which the expected frequencies for each cell are also
displayed. Check whether they are the same as the ones that you calculated in
problem 11d.
b. Also examine the chi-square statistic and check whether it is the same as the one
calculated in problem 11e.
c. In the SPSS output, can you also find the relative risk, which you calculated in
problem 11b?
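The numbers asked for in a to c can also be cross-checked outside SPSS. This sketch (plain Python, names my own) computes the Pearson chi-square cell by cell, which should match the row labelled Pearson Chi-Square in the SPSS output (not the Continuity Correction row), and the relative risk of smoking for coffee versus non-coffee drinkers.

```python
observed = [[5134, 9189],  # coffee drinkers: non-smoker, smoker
            [1052, 821]]   # non-coffee drinkers: non-smoker, smoker
labels = [["coffee, non-smoker", "coffee, smoker"],
          ["no coffee, non-smoker", "no coffee, smoker"]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, obs_row in enumerate(observed):
    for j, o in enumerate(obs_row):
        e = row_totals[i] * col_totals[j] / n  # expected count under independence
        contribution = (o - e) ** 2 / e
        chi_square += contribution
        print(f"{labels[i][j]:<22} O={o:<5} E={e:9.3f} (O-E)^2/E={contribution:8.3f}")

# Relative risk of smoking, coffee drinkers vs non-coffee drinkers (problem 11b)
relative_risk = (observed[0][1] / row_totals[0]) / (observed[1][1] / row_totals[1])

print(f"Pearson chi-square = {chi_square:.3f}, df = 1")  # df = (2-1)*(2-1)
print(f"RR(smoking) = {relative_risk:.3f}")
```

If SciPy is available, `scipy.stats.chi2_contingency(observed, correction=False)` should give the same Pearson chi-square; with the default `correction=True` it applies the continuity correction to 2x2 tables instead.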
SPSS instructions Problem 12
See also Andy Field ch.19.5 about chi-square in SPSS.
First create a .sav file in SPSS (e.g., smoke.sav)
Create 3 columns in your dataset:
• A column named coffee, which indicates whether an individual is a coffee drinker
(code 1) or not (code 2)
• A column named smoke indicating whether an individual is smoker (code 1) or not
(code 2)
• A column named freq containing how many subjects belong to each of the cells of
the cross table.
Next choose Data → Weight cases, select Weight cases by, then select the variable
freq as the Frequency Variable and press the OK button.
Save your data set before computing any statistics.
Perform the required analyses: Analyze → Descriptive Statistics → Crosstabs
• For the Row(s) box select the variable coffee
• For the Column(s) box select the variable smoke
• Press the Exact button and select Exact, then press continue
• Press the Statistics button and select Chi-square and Risk, then press continue
• Press the Cells button and select Expected (next to Observed), then press continue
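The Weight cases step matters because each row of the .sav file is one cell of the cross table, not one person. A rough Python analogue (a hypothetical record layout mirroring the three columns coffee, smoke and freq described above) shows the difference between weighting by freq and not weighting:

```python
from collections import Counter

# One record per cell, coded as in the instructions: 1 = yes, 2 = no
records = [
    {"coffee": 1, "smoke": 1, "freq": 9189},
    {"coffee": 1, "smoke": 2, "freq": 5134},
    {"coffee": 2, "smoke": 1, "freq": 821},
    {"coffee": 2, "smoke": 2, "freq": 1052},
]

# Weighted: each record counts freq times (like Data -> Weight cases by freq)
weighted = Counter()
for rec in records:
    weighted[(rec["coffee"], rec["smoke"])] += rec["freq"]

# Unweighted: each record counts once, which is what you get if you forget to weight
unweighted = Counter((rec["coffee"], rec["smoke"]) for rec in records)

for coffee in (1, 2):
    print(f"coffee={coffee}:",
          "weighted", [weighted[(coffee, s)] for s in (1, 2)],
          "unweighted", [unweighted[(coffee, s)] for s in (1, 2)])
```

With coffee in the rows and smoke = 1 (smoker) as the first column, the Risk Estimate table in SPSS should, as far as I know, report the relative risk of 11b in the row labelled "For cohort smoke = 1".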

Question 13 see SPSS instructions below!

Analyze the data on the effects of the dosage of a vaccine on the occurrence of chickenpox
as obtained in an experiment.
Dataset: Vaccine.sav
a. What percentage of persons gets chickenpox for each of the dosages of the vaccine?
b. Is there a trend in these percentages across the different dosages?
The more diluted the vaccine, the less successful the vaccination (undiluted = 95%, diluted 1:10 = 70% and diluted 1:100 = 15%)
c. Would you expect a large or a small chi-square?
A large one -> because there is a clear association

SPSS instructions Problem 13
Read the SPSS data file: vaccine.sav
There are three variables (columns) in this data set:
• Dossage_vaccine (0 = undiluted, 1 = diluted 1 : 10, 2 = diluted 1 : 100)
• Succes (1 = no chickenpox, 0 = chickenpox)
• Freq (number of persons, weight in the analysis)
Data → Weight cases, select Weight cases by, then select the variable freq as the
Frequency Variable and press the OK button. Save your dataset before computing any
statistics.
Perform the required analyses: Analyze → Descriptive Statistics → Crosstabs
• For the Row(s) box select the variable dosage_vaccine
• For the Column(s) box select the variable succes

• Press the Statistics button and select Chi-square, then press continue
• Press the Cells button, select Expected under Counts (next to Observed) and Row under
Percentages, then press continue
• To run the analysis click the OK button

Question 14 see SPSS instructions below!

Coffee is one of the most consumed beverages in the world. However, epidemiological
studies have reported inconsistent results on coffee consumption and bladder cancer risk.
Using a large, international database, the research team aimed to increase the
understanding of the association between coffee consumption and bladder cancer risk. The
following table shows a cross-table of coffee consumption (per day) and bladder cancer
(yes vs no) among female participants.
Dataset: Bladder.sav
                                Bladder cancer
Coffee consumption (per day)    No      Yes
Never                           871     130
<=1                             853     171
1-2                             1071    212
2-3                             1006    156
3-4                             835     155
>4                              2553    448
a. What percentage of females developed bladder cancer within each category of
coffee consumption?
b. Is there a pattern in these percentages across coffee consumption?
No, no association can be seen
c. Would you expect a large or a small chi-square?
Small, because there is no association
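The percentages asked for in a, and a rough check of the expectation in c, can be computed directly from the table. A plain-Python sketch (names my own) giving the row percentages, the Pearson chi-square and its degrees of freedom:

```python
# Bladder cancer (no, yes) per coffee consumption category, from the table above
table = {
    "Never": (871, 130),
    "<=1": (853, 171),
    "1-2": (1071, 212),
    "2-3": (1006, 156),
    "3-4": (835, 155),
    ">4": (2553, 448),
}

# a. Row percentages: share of women with bladder cancer in each category
for category, (no, yes) in table.items():
    print(f"{category:>5}: {100 * yes / (no + yes):.1f}% bladder cancer")

# c. Pearson chi-square for the 6 x 2 table
rows = list(table.values())
row_totals = [sum(r) for r in rows]
col_totals = [sum(c) for c in zip(*rows)]
n = sum(row_totals)
chi_square = sum(
    (o - rt * ct / n) ** 2 / (rt * ct / n)
    for r, rt in zip(rows, row_totals)
    for o, ct in zip(r, col_totals)
)
df = (len(rows) - 1) * (len(col_totals) - 1)
print(f"chi-square = {chi_square:.2f}, df = {df}")
```

By this sketch the percentages range from about 13% to about 17% without a clear trend over the categories, and the chi-square of roughly 10.4 on 5 degrees of freedom stays below the 5% critical value of 11.07, which fits the expectation of a small chi-square.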

SPSS instructions Problem 14
Read the SPSS data file: Bladder.sav. Note that, as opposed to the vaccine dataset
(vaccine.sav), we now have individual data and no aggregated data, so there is no weight
(frequency) variable in the data set.
There are two variables (columns):
• Bladder cancer (0 = No, 1 = Yes)
• Coffee consumption (from Never to more than 4 cups per day)
Perform the required analyses: Analyze → Descriptive Statistics → Crosstabs
• For the Row(s) box select the variable Coffee consumption
• For the Column(s) box select the variable Bladder cancer
• Press the Statistics button and select Chi-square, then press continue
• Press the Cells button, select Expected under Counts (next to Observed) and Row under
Percentages, then press continue
• To run the analysis click the OK button
