
Cohort studies

Following groups of subjects over time

Classifications of research design

Laboratory, clinical, community
Basic vs. applied
Translational research (from bench to bedside)
Descriptive vs. analytic
Observational vs. interventional
Prospective vs. retrospective
Quantitative vs. qualitative
Not mutually exclusive:
  Basic research may be descriptive or analytic
  Clinical research may be observational or experimental, etc.

Basic study designs

Observational
  Case reports
  Case series
  Cross-sectional studies
  Case-control studies
  Cohort studies
  Meta-analysis
Interventional
  Lab experiments
  Clinical trials
  Field trials

Observational Study
A nonexperimental analytic study in which the investigator monitors, but does not influence, the exposure status of individual subjects and their subsequent disease status

Experimental Study
A study in which the investigator influences the exposure status of individual subjects (independent variable) and then monitors the subjects' outcome (dependent variable)

Cohort: an ancient Roman term for a group of soldiers that marched together into battle
In clinical studies: a group of persons followed up together over time

Cohort studies
Descriptive: to describe the incidence of a certain outcome over time (absolute risk)
Analytic: to analyze the association between risk factors and outcomes (relative risk)

Prospective cohort
Select a sample from the population
Measure the predictor variable (present or absent)
Follow up the cohort
Measure the outcome (present or absent)

Advantages of cohort studies

1. Can assess several outcomes
2. Time-order generally clear
3. Prospective control over exposure and outcome measurement possible (in prospective studies)
4. Somewhat less potential for bias than case-control studies, but equal potential for confounding

Disadvantages of cohort studies

1. Generally require large samples
2. Not useful for rare outcomes
3. As an observational study, can never be assumed to be free of confounding and bias
4. Must usually control for potential confounding in the analysis, though can control in the design
5. Prone to loss to follow-up / drop-outs

Retrospective cohort
Basically the same as a prospective cohort
Baseline measurements, follow-up, and outcomes all occurred in the past
Only possible if all data are complete and were collected for another purpose

Retrospective cohort
Assemble the cohort in the past
Measure risk factors
Follow up
Measure outcomes
Analyze

Retrospective cohort: strength

Much less costly and more efficient
Less time consuming
All subjects (assumed) from the same population

Retrospective cohort: weakness

Secondary data
May not include necessary data

Steps in planning cohort studies: things to consider

When to use cohort design
Choosing among cohort designs
Selecting subjects
Measuring predictor and confounding variables
Following subjects and measuring outcomes
Analyzing cohort: incidence and relative risk

When to use cohort design

Accurately describing incidence
May be the only way to establish the temporal sequence of risk and outcomes
  e.g., malnutrition in chronic diarrhea may be the result rather than a cause
Avoiding survivor bias
  Only way to study certain rapidly fatal diseases
Allows the investigator to study multiple outcomes, or an ever-increasing number of outcomes

Choosing among cohort designs

Retrospective
  Quick
  Economical
Prospective
  Rapidly occurring outcome, e.g., discharge from nursery after bone fracture
  When less expensive designs fail to answer the research question properly
  When case-control studies give conflicting results
  When key measurements must be performed before outcomes occur

Selecting subjects
Define the group of subjects at the beginning of the study (inception cohort), e.g., cervical cancer, stage 1-2
Select samples with rapidly occurring outcomes, e.g., hip fracture in elderly women
Adequate sample size
Controls in a double-cohort study should be selected by random sampling

Measuring predictor and confounding variables

The quality of the results depends largely on the accuracy of measuring predictor and outcome variables
The validity of the results (cause-effect relationship) also depends on the measurement and control of confounding variables

Following subjects
Important: minimize loss to follow-up!
Strategies:
During design:
Restriction: exclude those likely to be lost to follow-up (moving, unwilling to return)

Planning for future tracking

Address, telephone, mobile, fax etc

During follow-up
Periodic contact
Phone, visits, etc

Other relevant measures

Analyzing results
Descriptive: incidence rate, mean, proportions, SD, etc.
Analytic: relative risk
Other analyses:
  Survival analysis
  Multivariate analysis as appropriate
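
The descriptive measures can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function names and the 450 person-years of follow-up are assumptions chosen for the example.

```python
def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of initially disease-free subjects who develop the outcome."""
    if at_risk_at_start <= 0:
        raise ValueError("at_risk_at_start must be positive")
    return new_cases / at_risk_at_start


def incidence_rate(new_cases: int, person_years: float) -> float:
    """New cases per unit of person-time actually observed."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return new_cases / person_years


# Hypothetical cohort: 100 subjects, 25 new cases, 450 person-years observed.
risk = cumulative_incidence(25, 100)
rate = incidence_rate(25, 450.0)
print(f"Cumulative incidence: {risk:.2f}")
print(f"Incidence rate: {rate * 1000:.1f} per 1,000 person-years")
```

Cumulative incidence ignores how long each subject was followed; the incidence rate uses person-time and so is less distorted when subjects are lost to follow-up at different times.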

Cohort study
Analysis: incidence and relative risk
Relative risk (RR) = the ratio of the incidence of an effect in the exposed group to that in the non-exposed group

Cohort study: example

50 Exp. +: 20 Disease +, 30 Disease -
50 Exp. -: 5 Disease +, 45 Disease -

Cohort study: analysis

           Disease +   Disease -   Total
Exp +          20          30        50
Exp -           5          45        50

Relative risk = incidence in exposed / incidence in non-exposed


RR = 20/50 : 5/50 = 4

RR = 4
The probability of developing the disease in the exposed group is 4 times that in the non-exposed group
Exposed individuals are 4 times as likely to develop the disease as non-exposed individuals
Reporting a confidence interval (CI) is recommended; if the CI includes 1, the result is not statistically significant (chance cannot be excluded as an explanation for the association)
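
A minimal Python sketch of this calculation, using the counts from the worked example (20 of 50 exposed and 5 of 50 non-exposed develop the disease). The slides do not say how the CI is obtained; the log-based (Katz) standard error and the 1.96 multiplier for a 95% interval used here are assumptions, as is the function name.

```python
import math


def relative_risk(exp_cases, exp_total, unexp_cases, unexp_total, z=1.96):
    """Return (RR, lower, upper) using a log-scale approximate CI.

    Assumption: Katz log method, SE(ln RR) = sqrt(1/a - 1/n1 + 1/c - 1/n0).
    """
    if exp_cases == 0 or unexp_cases == 0:
        raise ValueError("log-based CI needs at least one case in each group")
    risk_exposed = exp_cases / exp_total
    risk_unexposed = unexp_cases / unexp_total
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / exp_cases - 1 / exp_total + 1 / unexp_cases - 1 / unexp_total
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lower, upper = relative_risk(20, 50, 5, 50)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI includes 1:", lower <= 1.0 <= upper)
```

With these counts the interval lies entirely above 1, so the example association would be read as statistically significant.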

Famous cohort studies

Population-based
1. Cardiovascular
2. Child health
3. Special exposures

1. Cardiovascular disease
Framingham, MA
Tecumseh, MI
Evans County, GA (biracial)
Muscatine, IA
Bogalusa, LA (children)

2. Child health
National Birthday Trust studies
One week of births in England and Wales in 1946, 1958 and 1970

Project on premature infants

All births < 1,500 g or < 32 weeks in Holland in 1983

The National Children's Study
http://www.nichd.nih.gov/about/despr/despr.htm
A study in Jakarta of 100,000 pregnancies with offspring followed to age 21?

3. Special exposures
Atomic Bomb Casualty Commission (ABCC): Hiroshima and Nagasaki survivors (effects of radiation)
Dutch famine survivors (effects of starvation)
Seveso (effects of dioxin exposure)

Case-cohort design: purpose

The case-cohort design is used to reduce the costs of exposure assessment

Case-cohort design: approach

1. A population at risk is identified and screened for disease, and prevalent cases are omitted.
2. A case-identification procedure is developed to detect new cases of disease in the cohort.
(So far all is the same as any cohort study)

Case-cohort design: approach

3. The whole cohort is subject to case identification, but only a random sample (called the sub-cohort) receives detailed exposure assessment.
4. The cases are those emerging in the population (both in and out of the sub-cohort); the controls are subjects in the sub-cohort who are not cases.

5. Analysis is similar to that of a cohort study. Since the sampling fraction is known, and the entire population is monitored for new cases, true incidences and relative risks can be calculated.
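
Step 5 can be illustrated with a small Python sketch. It is a simplified, unweighted estimate under stated assumptions: the sub-cohort counts include everyone sampled at baseline (whether or not they later became cases), and all counts, the sampling fraction, and the function name are hypothetical.

```python
def case_cohort_estimates(cases_exp, cases_unexp, sub_exp, sub_unexp,
                          sampling_fraction):
    """Estimate incidence in exposed/unexposed and their ratio.

    cases_*: all new cases found in the whole cohort, by exposure.
    sub_*:   sub-cohort members at baseline, by exposure.
    """
    if not 0 < sampling_fraction <= 1:
        raise ValueError("sampling_fraction must be in (0, 1]")
    # Scale the sub-cohort up to estimate exposure counts in the full cohort.
    est_cohort_exp = sub_exp / sampling_fraction
    est_cohort_unexp = sub_unexp / sampling_fraction
    incidence_exp = cases_exp / est_cohort_exp
    incidence_unexp = cases_unexp / est_cohort_unexp
    return incidence_exp, incidence_unexp, incidence_exp / incidence_unexp


# Hypothetical: cohort of 10,000; sub-cohort of 1,000 (fraction 0.1)
# with 300 exposed and 700 unexposed; 90 exposed and 70 unexposed cases.
inc_e, inc_u, rr = case_cohort_estimates(90, 70, 300, 700, 0.1)
print(f"Incidence exposed: {inc_e:.3f}, unexposed: {inc_u:.3f}, RR: {rr:.2f}")
```

Detailed exposure assessment is needed only for the 1,000 sub-cohort members and the cases, rather than for all 10,000 subjects, which is where the cost saving comes from.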

Nested case-control study

1. A case-control study that is nested in a cohort study
2. Purpose: to reduce the cost of exposure assessment
3. In a cohort study (for another exposure), specimens are kept until the cohort study finishes
4. Subjects who developed the outcome are chosen as CASES; the CONTROLS are selected randomly from the subjects who did not develop the outcome
5. Assess risk factors for cases and controls
6. Analysis is similar to that of a case-control study
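
Step 4 can be sketched in Python as follows. This is an illustration only: the record layout (`id`, `developed_outcome`), the function name, and the choice of two controls per case are assumptions, and controls are drawn by simple random sampling from subjects free of the outcome at the end of follow-up, as the slide describes.

```python
import random


def sample_nested_case_control(cohort, controls_per_case=1, seed=0):
    """Split a finished cohort into cases and randomly chosen controls.

    cohort: list of dicts with 'id' and 'developed_outcome' keys (hypothetical).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    cases = [s for s in cohort if s["developed_outcome"]]
    non_cases = [s for s in cohort if not s["developed_outcome"]]
    n_controls = min(len(non_cases), controls_per_case * len(cases))
    controls = rng.sample(non_cases, n_controls)
    return cases, controls


# Hypothetical cohort of 100 subjects, 10 of whom developed the outcome.
cohort = [{"id": i, "developed_outcome": i < 10} for i in range(100)]
cases, controls = sample_nested_case_control(cohort, controls_per_case=2)
print(len(cases), "cases;", len(controls), "controls")
# Stored specimens are then analyzed only for these subjects.
```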

Case-Control Studies
(Adapted from slides by Schenker M)

After this session, you will be familiar with:
The basic design features of a case-control study
Rationale for applying case-control designs
Limitations of case-control studies
Example applications of case-control designs

Case-control studies (1)

For rare diseases
Subjects with the disease are selected first
Find subjects without the disease who have similar characteristics
Determine the exposure in cases and in controls
Compare / analyze: odds ratio

Case-control studies (2)

Odds = probability of event / probability of non-event
Odds = p/(1-p)
The odds ratio (OR) indicates how strongly a risk factor is associated with the occurrence of a disease
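
The relation odds = p/(1-p) and its inverse can be checked with a short Python sketch; the function names and the value p = 0.2 are illustrative assumptions.

```python
def odds_from_probability(p: float) -> float:
    """Convert a probability (0 <= p < 1) to odds."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)


def probability_from_odds(odds: float) -> float:
    """Convert odds (>= 0) back to a probability."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


p = 0.2
odds = odds_from_probability(p)
print(f"p = {p} gives odds = {odds:.2f}")
print(f"odds = {odds:.2f} gives p = {probability_from_odds(odds):.2f}")
```

For small p the odds and the probability are nearly equal, which is why the odds ratio approximates the relative risk when the disease is rare.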

Case-control study: example

12 Cases (Disease Yes): 10 Exp. Yes, 2 Exp. No
88 Controls (Disease No): 40 Exp. Yes, 48 Exp. No

Case-control study: example

           Disease +   Disease -   Total
Exp +          10          40        50
Exp -           2          48        50
Total          12          88       100


Odds ratio = odds of exposure in cases / odds of exposure in controls = (10/12 : 2/12) / (40/88 : 48/88) = 5 / 0.833 = 6

OR = 6
Originally: the odds of exposure in cases are 6 times the odds of exposure in controls
Mathematically equivalent to: the odds of developing the disease in the exposed group are 6 times the odds in the non-exposed group
When the disease is rare, the OR approximates the relative risk, so exposed individuals are roughly 6 times as likely to develop the disease as non-exposed individuals

If the CI includes 1, the result is not statistically significant
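
A minimal Python sketch of the odds ratio for the example counts (cases: 10 exposed, 2 non-exposed; controls: 40 exposed, 48 non-exposed). The slides do not state how the CI is computed; the Woolf (log-based) standard error and the 1.96 multiplier for a 95% interval are assumptions, as is the function name.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Return (OR, lower, upper) for a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    Assumption: Woolf method, SE(ln OR) = sqrt(1/a + 1/b + 1/c + 1/d).
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    or_value = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_value) - z * se_log_or)
    upper = math.exp(math.log(or_value) + z * se_log_or)
    return or_value, lower, upper


or_value, lower, upper = odds_ratio(10, 40, 2, 48)
print(f"OR = {or_value:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
print("CI includes 1:", lower <= 1.0 <= upper)
```

The interval is wide because one cell holds only 2 subjects; small cells dominate the standard error.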

Design of Case-Control Studies

The investigator selects cases with the disease, and appropriate controls without the disease and obtains data regarding past exposure to possible etiologic factors in both groups. The investigator then compares the frequency of exposure of the two groups.

Odds Ratio = a/c : b/d = ad / bc

        Case   Control
E +      a        b
E -      c        d

When to use a case-control approach

1. Rare disease: Case-control approaches are the most efficient for rare diseases, e.g., idiopathic pulmonary fibrosis, most cancers. Cohort approaches would require large populations and prohibitive expense and follow-up time. Case-control designs may also be appropriate for more common diseases, such as COPD.

2. Case ascertainment system in place: The conduct of a case-control study may be facilitated by the availability of a case-ascertainment system.
a) Population-based cancer registry
b) Hospital-based surveillance systems
c) Mandated disease reporting systems
3. When funding and time constraints are not compatible with a cohort study.

Issues in Case-Control Studies

A. Issues in Ascertainment of Cases
1. Diagnostic criteria for case studies
a) Specificity, e.g., lung cancer vs. wheezing
b) Diagnostic bias
c) Validation
2. Sources (hospital, general population)
3. Incident or prevalent cases

Issues in Selection of Controls

1. General questions
a) Conceptual
(i) Should the controls be comparable to the cases in all respects other than having the disease?
(ii) Should the controls be representative of all non-diseased people in the population from which the cases are selected?

[Figure: Total Population and Reference Population]



b) Practical Questions
(i) Is the approach selected for control selection feasible?
(ii) Can this approach be used given the funds available?

Sources of controls
a) Population of defined area
b) Hospital patients
c) Probability sample of total population
d) Neighbors
   i. walk (door to door)
   ii. phone (random digit dialing)
   iii. letter carrier routes
e) Friends or associates of cases
f) Siblings, spouses or other relatives
g) Other

Methodologic Issues
1. Handling potential confounding factors
a) In the process of selecting controls: Matching
The process of selecting controls so that they are similar to the cases in regard to certain characteristics such as age, sex and race.
(i) Group matching (frequency matching, stratification)
(ii) Individual matching (matched pairs)

C. Methodologic Issues
Handling potential confounding factors in matching:
(iii) Problems with matching:
- Matching on many variables may make it difficult or impossible to find an appropriate control.
- Cannot explore possible association of disease with any variable on which cases and controls have been matched.
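
Individual matching, and the first problem listed above, can be sketched in Python. This is a simple greedy illustration, not a recommended algorithm: the record fields (`id`, `age`, `sex`), the 5-year age tolerance, and the function name are all assumptions.

```python
def match_controls(cases, candidates, age_tolerance=5):
    """Pair each case with one unused control of the same sex and similar age.

    Returns (pairs, unmatched_cases). Greedy: takes the first eligible control.
    """
    used_ids = set()
    pairs = []
    unmatched = []
    for case in cases:
        match = next(
            (
                c for c in candidates
                if c["id"] not in used_ids
                and c["sex"] == case["sex"]
                and abs(c["age"] - case["age"]) <= age_tolerance
            ),
            None,
        )
        if match is None:
            unmatched.append(case)  # no appropriate control could be found
        else:
            used_ids.add(match["id"])
            pairs.append((case, match))
    return pairs, unmatched


# Hypothetical subjects.
cases = [
    {"id": "case1", "age": 60, "sex": "F"},
    {"id": "case2", "age": 45, "sex": "M"},
    {"id": "case3", "age": 82, "sex": "F"},
]
candidates = [
    {"id": "ctl1", "age": 62, "sex": "F"},
    {"id": "ctl2", "age": 44, "sex": "M"},
    {"id": "ctl3", "age": 30, "sex": "F"},
]
pairs, unmatched = match_controls(cases, candidates)
print([(case["id"], ctl["id"]) for case, ctl in pairs])
print("Unmatched:", [case["id"] for case in unmatched])
```

Each extra matching variable adds another condition to the search, so the pool of eligible controls shrinks and more cases end up unmatched.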

Methodologic Issues
Handling potential confounding factors in matching:
b) In the process of selecting controls: Restriction
c) In the data analysis:
(i) Stratification
(ii) Adjustment
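
Stratification followed by adjustment can be sketched in Python. The slides do not name a method; the Mantel-Haenszel pooled odds ratio used here is one common choice and is an assumption, as are the function name and the two hypothetical strata.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata of a confounder.

    strata: iterable of (a, b, c, d) per stratum, where
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    OR_MH = sum(a*d/n) / sum(b*c/n), with n the stratum total.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these strata")
    return numerator / denominator


# Hypothetical data stratified by age group (younger, older).
strata = [(10, 20, 5, 40), (8, 10, 6, 30)]
adjusted = mantel_haenszel_or(strata)

# Crude odds ratio from the collapsed table, for comparison.
a, b, c, d = (sum(cells) for cells in zip(*strata))
crude = (a * d) / (b * c)
print(f"Crude OR = {crude:.2f}, adjusted OR = {adjusted:.2f}")
```

A difference between the crude and the adjusted odds ratio suggests that the stratifying variable confounds the association.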

Methodologic Issues
2. Evaluating Information on Exposure
a) Problems of recall in case-control studies
(i) Limitations in human ability to recall
(ii) Recall bias (cases may remember their exposure with a higher or lower accuracy than controls do)

b) Avoiding other biases
(i) Selection bias
(ii) Information bias
(iii) Non-response bias
(iv) Analysis bias
c) Validity testing (reliability, sensitivity and specificity)
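
For item c), sensitivity and specificity of an exposure measure can be sketched in Python. The scenario (self-reported exposure checked against records treated as the reference standard), the counts, and the function names are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of truly exposed subjects that the measure classifies as exposed."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of truly unexposed subjects that the measure classifies as unexposed."""
    return true_neg / (true_neg + false_pos)


# Hypothetical validation sample: questionnaire answers vs. reference records.
# 50 truly exposed: 45 reported exposure, 5 did not.
# 100 truly unexposed: 90 reported no exposure, 10 reported exposure.
print(f"Sensitivity: {sensitivity(45, 5):.2f}")
print(f"Specificity: {specificity(90, 10):.2f}")
```

Computing these separately for cases and for controls is one way to look for recall bias: if the values differ between the two groups, the exposure was misclassified differentially.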

Using Multiple Controls in Case-Control Studies
a) Multiple controls of a similar type (e.g. 2 controls per case)
b) Different types of controls (e.g. hospital and neighborhood controls)

Advantages of Nested Case-Control Studies

1. Possibility of recall bias is eliminated, since data on exposure are obtained before disease develops.
2. Exposure data are more likely to represent the pre-illness state since they are obtained years before clinical illness is diagnosed.
3. Costs are reduced compared to those of a prospective study, since laboratory tests need to be done only on specimens from subjects who are later chosen as cases or as controls.