Biostatistics Session 2022–2023 (Semester 2)

Tutorial 2
1. To study the effect of smoking on lung cancer, all 101 lung cancer cases from the patient
records of a specialized cancer hospital who had their first visit in 2010 were selected.
Among the lung cancer cases, 65% were smokers, while among their spouses only 40% smoked.
(a) Give the study design. Comment on the selection of cases and controls. Do they
represent the population?
(b) Give two probabilities which can be estimated from these data and give estimates of
these probabilities and their 95% confidence intervals.
(c) Compute an estimate of the odds ratio comparing the odds of the disease occurring
for a smoker with the odds of the disease occurring for a non-smoker.
(d) Formulate a test statistic for the null hypothesis that there is no association between
exposure and disease. Carry out the test.
(e) Comment on the validity of the study design for estimation of this odds ratio.
(f) Comment on computing the relative risk using these data.

2. Consider the following investigation of the effect of maternal obesity on the event of an
emergency caesarean section. 1000 pregnant women were selected and followed until
delivery. The investigators over-recruited obese women (n=100). Among the obese women,
15 had an emergency caesarean. In the non-obese group, 50 had an emergency caesarean.
(a) Give the study design.
(b) Give two probabilities which can be estimated from these data and give estimates of
these probabilities and their 95% confidence intervals.
(c) Compute an estimate of the odds ratio comparing the odds of an emergency caesarean
for an obese mother with the odds of an emergency caesarean for a non-obese mother.
(d) Formulate a test statistic for the null hypothesis that there is no association between
exposure and disease. Carry out the test.
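
A minimal R sketch of one way to approach parts (b) to (d) of question 2. It assumes that the 100 obese women are part of the 1000 women recruited, so that the non-obese group has 900 women. The object names (tab, risk, or_hat) are illustrative, and the Wald intervals and the uncorrected Pearson chi-square test are one reasonable choice among several, not necessarily the method used in the lecture.

```r
# 2x2 table; ASSUMPTION: 900 non-obese women (1000 recruited minus 100 obese)
tab <- matrix(c(15, 85,
                50, 850),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("obese", "non_obese"),
                              outcome = c("caesarean", "no_caesarean")))

n <- rowSums(tab)
risk <- tab[, "caesarean"] / n          # estimated P(caesarean | group)

# Wald 95% confidence intervals for the two risks
z <- qnorm(0.975)
se <- sqrt(risk * (1 - risk) / n)
ci <- cbind(estimate = risk, lower = risk - z * se, upper = risk + z * se)

# Odds ratio as the cross-product ratio of the table
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])

# Pearson chi-square test of no association (1 degree of freedom)
test <- chisq.test(tab, correct = FALSE)

print(ci)
print(or_hat)
print(test)
```

Because this is a cohort design, the row-wise proportions estimate P(D | E) directly, which is not the case for the case-control data in question 1.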

3. Investigators used a case-control study to assess the effect of alcohol consumption on
myocardial infarction (MI). Their hospital-based case-control study comprised 374
participants who had an MI and 187 controls. The habit of binge drinking during the previous
12 months was significantly associated with myocardial infarction, with an OR of 2.2 (95% CI =
1.2-4.2). Give an estimate for the standard error of the logarithm of this OR.
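
A short R sketch of how a standard error can be backed out of a reported interval. It assumes the published interval is a Wald interval built on the log scale, that is exp(log(OR) ± z × SE) with z = qnorm(0.975); the source does not state how the interval was computed, and the variable names are illustrative.

```r
# ASSUMPTION: the reported interval is a Wald interval on the log scale,
# i.e. exp(log(OR) +/- z * SE) with z = qnorm(0.975)
ci_lower <- 1.2
ci_upper <- 4.2
z <- qnorm(0.975)

# The log-scale interval has total width 2 * z * SE
se_log_or <- (log(ci_upper) - log(ci_lower)) / (2 * z)

# Sanity check: the midpoint on the log scale should be close to the reported OR
or_midpoint <- exp((log(ci_lower) + log(ci_upper)) / 2)

print(se_log_or)
print(or_midpoint)
```

Under this assumption the standard error is roughly 0.32, and the back-transformed midpoint is close to the reported OR of 2.2, which is consistent with a log-scale interval.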

4. Small simulation study in R. Consider a hypothetical disease and exposure. In a population,
P(E = 1) = 0.25, P(D = 1|E = 1) = 0.25 and P(D = 1|E = 0) = 0.5.
(a) Give the odds ratio of D for the exposed (E = 1) versus the unexposed (E = 0) in this population.
(b) Compute the probability of the disease in this population.
(c) Compute the following probabilities p1 =P(E = 1|D = 1) and p2 =P(E = 1|D = 0).

(d) You can use R to obtain observations from distributions. Try out the functions
rbinom() and rnorm() for the binomial and the normal distribution respectively.
For example, generate a series of ones and zeros of size 1000 with a probability of a
one of 0.3 and check whether indeed about 30% of your sample is one. Do a
similar exercise for the normal distribution.
(e) Now generate data for a case-control study. Assume you have 100 cases and 100
controls. Code to generate the exposure variables for cases and controls is as follows
(you need to fill in numbers for p1 and p2):
exposure <- c(rbinom(100, 1, p1), rbinom(100, 1, p2))
outcome <- c(rep(1, 100), rep(0, 100))
data <- cbind(outcome, exposure)
colnames(data) <- c("outcome", "exposure")
(f) Check whether the probability of E = 1 in the cases and in the controls agrees with
your simulation settings.
(g) Use your sample to estimate the odds ratio of interest.
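
The steps of question 4 can be sketched in R as follows. This is one possible approach under stated assumptions: the seed is arbitrary, the helper names (p_e, p_d1, p_d0, or_hat, or_pop) are illustrative, and p1 and p2 are derived with Bayes' rule from the population settings given in the question.

```r
set.seed(2023)  # arbitrary seed, only for reproducibility

# (d) warm-up: Bernoulli draws with success probability 0.3, and normal draws
x <- rbinom(1000, size = 1, prob = 0.3)
print(mean(x))                 # should be close to 0.3
y <- rnorm(1000, mean = 0, sd = 1)
print(c(mean(y), sd(y)))       # should be close to 0 and 1

# Population settings from the question
p_e  <- 0.25                   # P(E = 1)
p_d1 <- 0.25                   # P(D = 1 | E = 1)
p_d0 <- 0.50                   # P(D = 1 | E = 0)

# Bayes' rule gives the exposure probabilities in cases and controls
p_d <- p_e * p_d1 + (1 - p_e) * p_d0
p1 <- p_e * p_d1 / p_d                 # P(E = 1 | D = 1)
p2 <- p_e * (1 - p_d1) / (1 - p_d)     # P(E = 1 | D = 0)

# (e) case-control sample: 100 cases followed by 100 controls
exposure <- c(rbinom(100, 1, p1), rbinom(100, 1, p2))
outcome  <- c(rep(1, 100), rep(0, 100))

# (f) observed exposure proportions in controls (0) and cases (1)
print(tapply(exposure, outcome, mean))

# (g) sample odds ratio from the 2x2 table, next to the population value
tab <- table(factor(exposure, levels = c(1, 0)),
             factor(outcome, levels = c(1, 0)))
or_hat <- (tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])
or_pop <- (p_d1 / (1 - p_d1)) / (p_d0 / (1 - p_d0))
print(c(sample = or_hat, population = or_pop))
```

With only 100 cases and 100 controls the sample odds ratio varies noticeably between runs, so rerunning with different seeds gives a feel for the sampling variability.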

5. Consider a sample of genotypes consisting of 150 GG, 40 GA and 10 AA genotypes.
(a) Give the frequency of allele A.
(b) Give the expected genotype counts under Hardy-Weinberg equilibrium (HWE).
(c) Give a test statistic to test the null hypothesis of HWE. Calculate this test statistic.
Formulate your conclusion.
(d) Test the null hypothesis of HWE using R (see lecture).
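
A base-R sketch of a Pearson chi-square test for HWE on these counts. The lecture may use a dedicated function or package instead; this version is written out by hand as an assumption about what is wanted, and the names (obs, q_a, expected) are illustrative.

```r
# Observed genotype counts from the question
obs <- c(GG = 150, GA = 40, AA = 10)
n <- sum(obs)

# Allele frequency of A: each AA carries two A alleles, each GA carries one
q_a <- (2 * obs[["AA"]] + obs[["GA"]]) / (2 * n)
p_g <- 1 - q_a

# Expected counts under Hardy-Weinberg equilibrium
expected <- n * c(GG = p_g^2, GA = 2 * p_g * q_a, AA = q_a^2)

# Pearson chi-square statistic; 3 classes - 1 - 1 estimated parameter = 1 df
chisq <- sum((obs - expected)^2 / expected)
p_value <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(rbind(observed = obs, expected = expected))
print(c(freq_A = q_a, chisq = chisq, p_value = p_value))
```

The degrees of freedom are set by hand because the allele frequency is estimated from the same data; calling chisq.test() with the expected proportions would use 2 degrees of freedom. The expected AA count is below 5, so the chi-square approximation is rough and an exact test may be worth considering.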

6. Consider the genotypes AA, AB, BB. They have probabilities p, q and (1 − p − q) in the
population.
(a) Assume random mating. Give the six possible genotype combinations for two parents
and their probabilities.
(b) For each pair of parents, give the probabilities of having an offspring with
genotype AA, AB or BB. Use Mendel’s first law.
(c) Show that Hardy-Weinberg equilibrium holds after one generation under random
mating, using Mendel’s first law.
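
A numerical check in R can accompany the algebra of part (c). The sketch below uses hypothetical values for p and q (any valid genotype frequencies should work) and sums over all nine ordered parent pairs; the six unordered combinations of part (a) arise by merging each pair (m, f) with (f, m). It illustrates the result for one set of values and is not a proof.

```r
# HYPOTHETICAL genotype frequencies p, q and 1 - p - q (any valid values work)
freq <- c(AA = 0.5, AB = 0.2, BB = 0.3)

# Mendel's first law: probability that a parent transmits allele A
transmit_a <- c(AA = 1, AB = 0.5, BB = 0)

# Sum over all ordered parent pairs (mother, father) under random mating
offspring <- c(AA = 0, AB = 0, BB = 0)
for (m in names(freq)) {
  for (f in names(freq)) {
    pair_prob <- freq[[m]] * freq[[f]]
    a_m <- transmit_a[[m]]
    a_f <- transmit_a[[f]]
    offspring[["AA"]] <- offspring[["AA"]] + pair_prob * a_m * a_f
    offspring[["AB"]] <- offspring[["AB"]] +
      pair_prob * (a_m * (1 - a_f) + (1 - a_m) * a_f)
    offspring[["BB"]] <- offspring[["BB"]] + pair_prob * (1 - a_m) * (1 - a_f)
  }
}

# Hardy-Weinberg proportions implied by the allele frequency of A
a <- freq[["AA"]] + freq[["AB"]] / 2
hwe <- c(AA = a^2, AB = 2 * a * (1 - a), BB = (1 - a)^2)

print(rbind(offspring = offspring, hwe = hwe))
```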

7. A subject’s genotype consists of two alleles and can be AA, Aa or aa. As you know, a
genotype can be viewed as a pair of independent Bernoulli variables. In population I the A
allele has population frequency 0.1. In population II the frequency of the A allele is 0.2.
With probability 1/2 we select a subject from population I and otherwise we select a subject
from population II. Let the random variable X be the number of A alleles of this person.
Also, define the random variable Z as follows: Z = 1 if the subject we selected is from
population I, and Z = 0 otherwise.
(a) Find the genotype frequencies in population I.
(b) Find the genotype frequencies in population II.
(c) Find the expectation of X given Z = 1.

(d) Find the variance of X given Z = 1.
(e) Find the covariance between X and Z.
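
The quantities in question 7 can be checked numerically. The R sketch below assumes, as the question states, that X given the population is Binomial(2, allele frequency); the object names are illustrative. It is meant as a check on a hand derivation rather than a replacement for it.

```r
# Allele frequency of A in population I (Z = 1) and population II (Z = 0)
freq_a <- c(pop1 = 0.1, pop2 = 0.2)
p_z1 <- 0.5                               # P(Z = 1)

# Genotype frequencies: X | population ~ Binomial(2, allele frequency)
geno <- sapply(freq_a, function(a) dbinom(0:2, size = 2, prob = a))
rownames(geno) <- c("aa", "Aa", "AA")     # X = 0, 1, 2 copies of A

x <- 0:2
e_x_z1 <- sum(x * geno[, "pop1"])                 # E[X | Z = 1]
v_x_z1 <- sum(x^2 * geno[, "pop1"]) - e_x_z1^2    # Var(X | Z = 1)
e_x_z0 <- sum(x * geno[, "pop2"])                 # E[X | Z = 0]

# Cov(X, Z) = E[XZ] - E[X] E[Z], with E[XZ] = P(Z = 1) * E[X | Z = 1]
e_x <- p_z1 * e_x_z1 + (1 - p_z1) * e_x_z0
cov_xz <- p_z1 * e_x_z1 - e_x * p_z1

print(geno)
print(c(mean = e_x_z1, variance = v_x_z1, covariance = cov_xz))
```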

8. (a) An investigator wants to study whether there is a relation between alcoholism and
pneumonia. They select 100 cases and 100 controls from a hospital. Now, alcoholics
with pneumonia are more likely to be sent to the hospital. Comment on this study.
Which kind of bias occurs here?
(b) An investigator wants to study whether a flu jab protects against Covid. They take one
hospital and compare the frequency of Covid among nurses who took a flu jab
with that among nurses who did not take a flu jab. Comment on this study.
(c) We talk about information bias when the information obtained in cases differs from
the information obtained in controls. For example an interviewer may ask questions
differently to cases than controls. Comment on this.
(d) Find another example of selection bias and of information bias.
(e) Lung cancer is more frequent in men than in women. However, smoking is the risk
factor for lung cancer, and men smoke more than women. An investigator performs a
case-control study to study the relationship between sex and lung cancer. Comment
on this study.
