Cohort Studies:
Advantages/Disadvantages
Study populations/Selection Biases
Measuring exposures and outcomes
Expressing outcomes - incidence
Survival analysis
By: Dr. Dick Menzies

Cohort Studies – General
• The general idea of a cohort study is that a group of persons who do not have a disease is identified, and its members are classified on the basis of different exposures.
– Can measure multiple exposures

• These are then followed and the occurrence of disease is measured in the population over a period of time.
– Can measure multiple diseases

Experimental vs cohort studies
• Experimental studies are a form of cohort study
– Persons are free of disease at outset
– Some are exposed, others not
– Measure occurrence of disease/cure/etc over time

• But the term cohort study is usually reserved for observational studies – ie exposures are not assigned, but occur naturally, or are chosen purposely by subjects, or by their MDs, etc

Advantages of cohort studies over experimental
• Ideal to study natural history, course of disease, prognostic factors.
• Etiologic research, as many exposures cannot be controlled experimentally for ethical reasons
– Smoking, asbestos, air pollution

• Interventions not feasible for randomization
– Diagnostic tests, personalized management

• Some outcomes not well measured in trials:
– Compliance by patients and MDs

Advantages of cohort studies over experimental
• Total population can be studied. Include children, elderly, mentally incompetent, intensive care, and people with early, minimal, or advanced disease (all usually excluded in RCT – esp Pharma trials)
– Findings more likely to be applicable in real world
– Adverse effects of interventions will be much more accurately measured
– Population based estimates of exposure effects can be made
• This implies that in cohort studies you MUST include as wide a spectrum of patients as possible

Some disadvantages
• Selection bias
– Persons who get exposed not same as unexposed
– Surgery – who is 'operable' vs 'inoperable'
– Smoking – not the only difference
– Healthy worker effect
• Exposures that seem same, may not be
– Also potential bias in measuring
• Drop-outs – reduce power, may bias (a lot)
• Outcome assessment can be biased

Cohort Studies – Temporal relationships
• Prospective: Subjects without disease are enrolled and then followed over time to determine occurrence (incidence) of diseases (outcomes)
– Exposures are usually measured directly at baseline, and may be measured concurrently with outcomes
• Retrospective: Exposure is defined based upon a past single event (eg. Hiroshima survivors) or period of exposure (eg. Worked in gas mask factory 1940-45)
– Outcomes may be ascertained directly, or also have already occurred
– Key – exposure well defined, AND occurred well before disease (useful for diseases like cancer)

Advantages of cohort over case-control or other retrospective designs
• KEY – exposure measurement is made before disease occurs
– Accuracy of exposure usually better than retrospective, esp if made repeatedly
– Eliminates bias in measurement of exposures:
  • Recall - patients KNOW they have disease already
  • Observer bias - exposure assessment skewed by knowledge of disease status.

"Longitudinal prevalence studies"
• In some cross-sectional studies inferences can be made about incidence, as if a cohort design was used
• When population has spectrum of years of exposure/age
– Tuberculin or HIV sero-prevalence survey
– Years of work as health professional
• However, this design still has same problems of retrospective exposure assessment
• Useful for growth curves (age accurately measured)

Cohort Populations
• General populations – no special exposures
– Framingham study – a true general population
  • All persons in the community invited
– Other examples are nurses, physicians, military personnel
  • Even though more restricted the study population is not defined on the basis of specific exposures
• Exposure defined – occupational or workforce exposures
– Could be defined on basis of treatment received

Cohorts of patients
• Commonly clinical investigators assemble clinical cohorts – groups of patients with a given condition
– ? Case series. These can be true cohort studies
  • If different types/levels/factors
  • Prognostic indicator studies
• Significant potential problems in these cohorts:
– Referral – only sickest, rarest.
– Lead-time problems – better facilities = earlier Dx
– Multi-serial cohorts – start with all diabetics in 2004
– Comparison?? Historical or concurrent elsewhere

Open versus Closed Cohorts
• An open cohort – or dynamic cohort - is one where people can enter or leave
– Examples: A workforce study that is ongoing
– A city or other geographic location
• A closed cohort is where all persons in the cohort are defined at entry. No one enters, members can only exit.
– Eg. McGill medical school class of 2004

Selection Bias
• Definition – selection bias occurs when there is a distortion in the estimate of effect (association) because the study or sample population is not truly representative of the underlying or source population in terms of the distribution of exposures and/or outcomes.
• Case control – detection or diagnosis or referral bias, and Berkson's bias
• Cohort studies – selection bias, healthy worker effect, drop-out bias

Avoiding Selection Bias – a representative sample
• In an un-biased sample we hope to have a representative sample as follows:
Truth – distribution of exposure and disease in source population
                Exposed    Not Exposed
Diseased           A            B
Not Diseased       C            D
• Odds Ratio = (A/B) / (C/D) = (A x D) / (B x C)

Example – Un-biased Sample
                Exposed    Not Exposed
Diseased          P1A          P2B
Not Diseased      P3C          P4D
• Odds Ratio = [(P1 x P4) / (P2 x P3)] x [(A x D) / (B x C)]
• IF (P1 x P4) / (P2 x P3) = 1 THEN OR = (A x D) / (B x C) = Truth!

To achieve Un-biased Sampling
• To achieve un-biased sampling the easiest is:
• P1 = P2 = P3 = P4
• This means the proportion sampled from each group is the same, e.g. 10% are sampled from each of the groups
• However if P1 is higher than P2 this can be okay as long as P3 is higher than P4 by the same factor, so that (P1 x P4) / (P2 x P3) still equals 1
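The sampling-fraction argument can be checked numerically. The following is a minimal Python sketch: the cell counts A to D and the sampling fractions are hypothetical values chosen only for illustration, and the function names are not from the lecture.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table: a, b = diseased (exposed, not exposed);
    c, d = not diseased (exposed, not exposed)."""
    return (a * d) / (b * c)


def sampled_odds_ratio(a, b, c, d, p1, p2, p3, p4):
    """Odds ratio seen in a sample that draws fractions p1..p4 of cells A..D."""
    return odds_ratio(p1 * a, p2 * b, p3 * c, p4 * d)


# Hypothetical source population (illustrative numbers, not from the lecture).
A, B, C, D = 200, 100, 800, 900
truth = odds_ratio(A, B, C, D)

# Same fraction from every cell: unbiased.
equal = sampled_odds_ratio(A, B, C, D, 0.1, 0.1, 0.1, 0.1)
# Unequal fractions, but (P1 x P4) / (P2 x P3) = 1: still unbiased.
balanced = sampled_odds_ratio(A, B, C, D, 0.4, 0.2, 0.2, 0.1)
# (P1 x P4) / (P2 x P3) = 2: the sample odds ratio is doubled.
biased = sampled_odds_ratio(A, B, C, D, 0.4, 0.2, 0.1, 0.1)

print(round(truth, 2), round(equal, 2), round(balanced, 2), round(biased, 2))
```

In this sketch the sample odds ratio is always the true odds ratio multiplied by (P1 x P4) / (P2 x P3), so only that factor needs to equal 1.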

Referral or Diagnostic Bias in Case Control Studies
• We are planning a case control study of spicy foods and peptic ulcer disease
– Cases = endoscopy proven peptic ulcer disease
– Controls = elective inguinal hernia repair at the same hospital
• The truth: no relationship i.e. the odds ratio = 1
• The problem – physicians at this hospital strongly believe spicy foods are an important risk factor for peptic ulcer disease.
– Therefore they tend to refer patients for endoscopy more often if they had a diet of spicy foods

Referral or Diagnostic Bias: Example
• So, 100% of patients with peptic ulcer disease and history of spicy foods have endoscopy
• But 50% of patients with peptic ulcer disease and NO spicy foods have endoscopy
• And of Controls (for hernia repair) 25% eat spicy foods
• Then the truth should be that 25% of cases eat spicy foods.

Referral or Diagnostic Bias Example (cont'd)
                   Spicy Foods   No Spicy Foods   Odds
• TRUTH:
Cases                  25             75          25/75
Controls               25             75          25/75
Total                  50            150          Odds ratio = 1.0
• Diagnostic Bias:
Cases                  25             37.5        25/37.5
Controls               25             75          25/75
Total                  50            112.5        Odds ratio = 2.0
• Note that among the cases only half of those without a history of spicy food are in fact diagnosed (or are diagnosed at this centre)
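The arithmetic of the spicy food example can be reproduced in a few lines. This is a minimal Python sketch using the counts and endoscopy percentages from the example; the variable and function names are illustrative choices.

```python
def case_control_odds_ratio(cases, controls):
    """Odds of exposure in cases divided by odds of exposure in controls."""
    case_odds = cases["spicy"] / cases["no_spicy"]
    control_odds = controls["spicy"] / controls["no_spicy"]
    return case_odds / control_odds


# True distribution: 25% of cases and 25% of controls eat spicy foods.
true_cases = {"spicy": 25.0, "no_spicy": 75.0}
controls = {"spicy": 25.0, "no_spicy": 75.0}

# Fraction of true cases who are referred for endoscopy, by diet.
endoscopy_fraction = {"spicy": 1.0, "no_spicy": 0.5}

# Only endoscopy-proven cases enter the study.
diagnosed_cases = {
    diet: true_cases[diet] * endoscopy_fraction[diet] for diet in true_cases
}

true_or = case_control_odds_ratio(true_cases, controls)
biased_or = case_control_odds_ratio(diagnosed_cases, controls)
print(diagnosed_cases, round(true_or, 2), round(biased_or, 2))
```

Halving the diagnosed unexposed cases doubles the odds of exposure among cases, which is where the odds ratio of 2.0 comes from.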

Selection Bias – Berkson's
• This is described in case control studies in hospitalized patients
• First described on mathematical basis.
– Probability Hospitalization if Factor Z = 0.1
– Probability Hospitalization if Factor Y = 0.05
– Probability Hospitalization if both = higher
– These two independent conditions will appear to be associated – but may not be.
• In practice it is common that patients with 2 or more conditions ARE more likely hospitalized (eg CHF and pneumonia) so in hospital based Case-control study they appear to be strongly associated.
• Fundamental problem is the same. P1 does not equal P2 does not equal P3 does not equal P4
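A small calculation shows how unequal hospitalization probabilities distort the odds ratio between two conditions that are independent in the general population. This is only a sketch. The probabilities 0.05 (Factor Y) and 0.10 (Factor Z) come from the slide. Everything else is assumed for illustration: the 10% prevalences, the 0.01 hospitalization probability for persons with neither factor, and the candidate values for persons with both.

```python
def expected_hospital_table(prev_y, prev_z, h_both, h_y_only, h_z_only,
                            h_neither, n=100_000):
    """Expected numbers hospitalized in each Y/Z combination, assuming Y and Z
    are independent in the source population of size n."""
    both = n * prev_y * prev_z * h_both
    y_only = n * prev_y * (1 - prev_z) * h_y_only
    z_only = n * (1 - prev_y) * prev_z * h_z_only
    neither = n * (1 - prev_y) * (1 - prev_z) * h_neither
    return both, y_only, z_only, neither


def odds_ratio(both, y_only, z_only, neither):
    """Odds ratio for the Y-Z association in a 2x2 table."""
    return (both * neither) / (y_only * z_only)


H_Y_ONLY, H_Z_ONLY = 0.05, 0.10   # from the slide
H_NEITHER = 0.01                  # assumption: admitted for other reasons
PREV_Y = PREV_Z = 0.10            # assumption: prevalence of each factor

# Assumed hospitalization probabilities when both factors are present.
# 0.145 = 1 - (1 - 0.05) * (1 - 0.10), i.e. two independent admission routes.
for h_both in (0.145, 0.5, 0.9):
    table = expected_hospital_table(PREV_Y, PREV_Z, h_both,
                                    H_Y_ONLY, H_Z_ONLY, H_NEITHER)
    print(h_both, round(odds_ratio(*table), 2))
```

Under these assumptions the odds ratio among hospitalized patients equals the population odds ratio (1, by construction) multiplied by (h_both x h_neither) / (h_y_only x h_z_only). That is the same (P1 x P4) / (P2 x P3) factor, and it can fall on either side of 1 depending on the assumed values.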

Figure 15-2. Diagram showing successive transfers from the intended population to the group admitted to a study of therapy
GROUPS                 LOSSES            REASONS FOR LOSSES
INTENDED POPULATION    NOT AVAILABLE     Treated at other hospitals or by other doctors
AVAILABLE GROUP        NOT CANDIDATES    Not identified or accessible
CANDIDATE GROUP        NOT ELIGIBLE      Did not fulfill diagnostic criteria
ELIGIBLE GROUP         EXCLUDED          Superimposed condition of severity, co-morbidity, comedication, or non-compliance
QUALIFIED GROUP        NONRECEPTIVE      Refused participation or acceptance of assigned maneuver
ADMITTED GROUP

Volunteer Bias
• Another term for selection bias, when:
• Participants in a study are different from refuseniks
• Potential subjects who have the exposure and the outcome are more (or less) likely to participate
• Examples:
– Fetal malformations and exposures.
– Disease and occupational exposures, particularly if self-reported exposures.
– (Both of these can also be affected by recall bias, because more likely to report possible exposures)
• What was the mortality of non-participants in the Framingham study, and why?

Susceptibility bias
• Just another term for selection bias
• Persons allocated to one form of treatment, or who self-select to certain exposures, are more or less susceptible to develop health effects/outcomes of interest.
– Eg Cancer patients who have surgery vs medical or radiotherapy only. Surgical patients often appear to do better.

Healthy worker effect
• An important bias – found in work-force studies
– Reflects medical screening (military, mining)
– Or, physical requirements of job
• Results in better health status initially than general population, or certain control pop'n
– Strongly affects results in cross-sectional studies
– Reduces risk or delays occurrence of health outcomes of interest.
• Also occurs in smokers "healthy smoker effect"
– Lung function in adolescent smokers > non-smokers

Selection Bias in Cohort Studies – Dropouts
• Losses to follow up occur in all cohort studies
• Generally will reduce power, and dilute results
• Particularly problematic if losses to follow up are greater in one of the exposure groups.
• REALLY important if due to development of disease

Drop-outs from a work-force - impact
• If a particular occupational exposure results in health effects quickly in a susceptible sub-group, and they then leave the work-force (quit) then this effect can be easily missed
– In cross-sectional designs – none left
– Even in cohorts – event rate appears low overall, because all outcomes of interest occur in small number of new workers (power problem)
• Example: Allergy to lab animals in researchers
– Asthma in Grain workers
– Latex allergy in health professionals

– Truth: IRR = 3.Selection Bias in Cohort Studies – Dropout‟s • Example: – study of incidence of diabetes in obese persons.0 – Losses – 33% in diabetes/obesity group (death/other) • 5% losses in all other groups – (P1 x P4) does not = 1 (P2 x P3) .

Selection Bias from Dropouts in a Cohort Example
                                   Obese    Not Obese
At onset                            227        773
Dropped out – No DM                  10         35
Dropped out – Diabetes                9          3
Detected at end with diabetes        18         30
• Incidence (biased):
– In obese – 18/208 = 8.7%
– In non-obese – 30/735 = 4.1%
• Biased incidence rate ratio – 8.7%/4.1% = 2.1
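The biased figures can be recomputed from the counts in the example. The Python sketch below does so, and also shows, purely as an illustration, the ratio that results if the dropouts who had developed diabetes are counted back in. The dictionary layout and function names are assumptions made for this sketch.

```python
cohort = {
    "obese": {"at_onset": 227, "dropped_no_dm": 10,
              "dropped_dm": 9, "dm_detected": 18},
    "not_obese": {"at_onset": 773, "dropped_no_dm": 35,
                  "dropped_dm": 3, "dm_detected": 30},
}


def remaining(group):
    """Persons still under follow-up at the end of the study."""
    return group["at_onset"] - group["dropped_no_dm"] - group["dropped_dm"]


def observed_incidence(group):
    """Incidence seen by the investigator: dropouts are simply missing."""
    return group["dm_detected"] / remaining(group)


def full_incidence(group):
    """Incidence if the diabetes status of every dropout had been recorded."""
    return (group["dm_detected"] + group["dropped_dm"]) / group["at_onset"]


obese, not_obese = cohort["obese"], cohort["not_obese"]
biased_ratio = observed_incidence(obese) / observed_incidence(not_obese)
full_ratio = full_incidence(obese) / full_incidence(not_obese)

print(remaining(obese), remaining(not_obese))
print(round(biased_ratio, 1), round(full_ratio, 1))
```

With these counts the full-cohort ratio comes out somewhat below the stated true IRR of 3.0, but clearly above the biased value of about 2.1.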

Controlling Selection Bias
• Most important strategy is prevention
– Design strategies, particularly in case control
– Recruitment – high % in all groups
– Same recruitment in exposed/not exposed or cases/controls
– In cohort studies close follow up to prevent dropouts
• Can assess impact in analysis
– Comparing characteristics of dropouts with those who remained
– Comparing those who participated with those who refused
– Sensitivity analysis – best case/worst case to assess impact of selection biases
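One common way to run a best case/worst case sensitivity analysis is to assume, in turn, that every dropout in one group developed the outcome while none in the other group did. The Python sketch below applies that convention to the obese/not obese dropout counts used in this lecture, treating the outcome of every dropout as unknown. The bounding convention and the function name are assumptions about how such an analysis might be set up, not a method stated in the lecture.

```python
def extreme_case_ratios(exposed, unexposed):
    """Return (lowest, highest) risk ratio obtainable by assigning all dropouts
    in a group either to 'had the event' or to 'did not have the event'."""
    def risk(group, dropouts_had_event):
        extra = group["dropouts"] if dropouts_had_event else 0
        return (group["events_observed"] + extra) / group["n_start"]

    # Lowest ratio: no exposed dropout had the event, every unexposed one did.
    lowest = risk(exposed, False) / risk(unexposed, True)
    # Highest ratio: every exposed dropout had the event, no unexposed one did.
    highest = risk(exposed, True) / risk(unexposed, False)
    return lowest, highest


# Counts from the dropout example (19 obese and 38 non-obese dropouts).
obese = {"n_start": 227, "dropouts": 19, "events_observed": 18}
not_obese = {"n_start": 773, "dropouts": 38, "events_observed": 30}

low, high = extreme_case_ratios(obese, not_obese)
print(round(low, 2), round(high, 2))
```

If the conclusion is the same at both extremes, the dropouts cannot have changed it. If the range is wide, as it is here, the result is sensitive to what happened to those lost to follow up.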

Cohort Studies – Exposure Assessments
• Prospective - Measure exposures at outset
– These can be one or many
– Specific: cholesterol, blood pressure, smoking, obesity.
– Proxies: occupation, housing
– These can be measured repeatedly eg. every year or every six months, to account for changes in exposure over time (obesity, smoking, BP).
• Retrospective – Exposure is based upon past events
– Usually exposures cannot be directly quantified but proxies are taken (job description, distance from blast)
• Sometimes records exist (transfusions, dust levels)

Pitfalls in exposure assessments
• Observer bias – if disease ascertained at same time
– Blind observers to study hypothesis
– Standardized protocols
• Are all exposures the same?
– EG Thoracenteses (pleural taps)?
– Complications of pleural tap at MGH/RVH >> MCI
• Why – patients, their diseases, or the 'tappers'?

Cohort Studies – Outcome Assessments
• Baseline measures – ensure that the cohort members are free of disease at the start.
– Easy if prospective, harder if retrospective
• Outcomes then measured periodically
– Through questionnaire, exam, labs (direct)
– Through health service utilization (databases)
– Through vital statistics (databases)
• Case definition is very important for outcome assessments
– Due to enhanced case finding of milder disease among members of the cohort

Pitfalls in outcome assessments
• Ascertainment bias – if patients with Factor X are more likely to have testing to detect outcome.
– Solution = standardized protocols, or blinding to exposures
• Observer bias – if patients with Factor X more likely to be diagnosed with outcome of interest
– Common with more subjective tests – eg CXR
– Solution – independent reviewers, blinded to exposure status (Factor X)


Cohort Studies – Measures of Incidence
Incidence rate
• Incidence rate = (number developing disease) / (total number who entered cohort), per unit of time
• Incidence rate ratio = IRR = (number of exposed with disease / number exposed) / (number of unexposed with disease / number unexposed)
* Note: the IRR has no unit of time, but it assumes that the amount of follow-up time was similar for those with and without disease and for those exposed and unexposed

Measuring Incidence in Cohort Studies
How to handle drop outs etc.?
• In a cohort study members drop out either because they are lost to follow up or die of other causes (or refuse to continue)
• How to count – keep them in or exclude them from analysis?
• It is better to use a method that allows variable length of follow up
• Otherwise in large long term cohort studies maybe only 50% of persons are still in the cohort at the end
• Also in a dynamic cohort have to be able to account for people who enter after the first year

Incidence Density Method - Example
Patient   Exposed   Follow up Years   Disease
1         YES             2           NO
2         YES            10           YES
3         NO              8           NO
4         NO             10           YES
• Incidence rate ratio = (1/2) / (1/2) = 1
• Density method = [(0 + 1 cases) / (2 + 10 years)] / [(0 + 1 cases) / (8 + 10 years)]
• Incidence density ratio = (1/12) / (1/18) = 1.5
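The two ways of counting in this four-patient example can be written out as a short calculation. This is a minimal Python sketch using the four patients from the example; the field and function names are illustrative.

```python
patients = [
    {"id": 1, "exposed": True, "years": 2, "disease": False},
    {"id": 2, "exposed": True, "years": 10, "disease": True},
    {"id": 3, "exposed": False, "years": 8, "disease": False},
    {"id": 4, "exposed": False, "years": 10, "disease": True},
]


def cumulative_incidence(group):
    """Cases divided by persons who entered, ignoring length of follow up."""
    return sum(p["disease"] for p in group) / len(group)


def incidence_density(group):
    """Cases divided by total person-years of follow up."""
    return sum(p["disease"] for p in group) / sum(p["years"] for p in group)


exposed = [p for p in patients if p["exposed"]]
unexposed = [p for p in patients if not p["exposed"]]

rate_ratio = cumulative_incidence(exposed) / cumulative_incidence(unexposed)
density_ratio = incidence_density(exposed) / incidence_density(unexposed)
print(round(rate_ratio, 2), round(density_ratio, 2))
```

The person-years denominator is what lets patient 1, who was followed for only 2 years, contribute less than a patient followed for 10 years.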

Incidence Rate Difference
• A patient asks “What is my risk because I smoke?” (or “how much will it go down if I quit smoking”)
– Can answer using incidence density ratio = (incidence density if smoking) / (incidence density non smoking) = 1.5
– In this example it would be one and a half times higher (or 50% more)

Incidence Rate Difference (cont'd)
• A public health official asks you: what is the impact of air pollution on cancer in Montreal?
• Incidence rate = (number developing disease) / (total number who entered cohort), per unit of time
• Incidence rate ratio = (number of exposed with disease / number exposed) / (number of unexposed with disease / number unexposed)

Cohort Studies – Survival Analysis
• Survival analysis is a method of analysis used if you have time to event for all events. Accounts for variable length of follow up.
• Survival analysis is advantageous when time to event is affected by the exposure.
• For example: A given cancer treatment increases survival at two years but five year mortality is unchanged. This would be an important advantage to patients.
• Survival analysis takes this into account by analysing time to disease.


Cohort Studies – Survival Analysis Types
• Simplest – Direct
• Next simplest – actuarial or life-table
• Kaplan-Meier – still pretty simple. Calculates cumulative proportion free of outcome (survived) at each point in time when that outcome occurs. At each point numerator is all who have developed disease, while denominator is all without outcome in the interval just before. People who drop out or die of other causes are 'censored'.
• Cox regression analysis – multivariate analysis with same basic principles

Table 17-1. DIRECT ARRANGEMENT OF SURVIVAL DATA FOR 50 PATIENTS
Interval                                                     0-1 yr.   1-2 yr.   2-3 yr.
Cumulatively Followed from Onset Throughout This Interval      50        49        48
Censored During Interval                                        0         1         1
Died During Interval                                            1         2         3
Cumulative Deaths                                               1         3         6
Cumulative Mortality Rate                                     0.020     0.061     0.125
Cumulative Survival Rate                                      0.980     0.939     0.875
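The rates in Table 17-1 are consistent with dividing cumulative deaths by the number of patients followed throughout each interval. The Python sketch below reproduces them under that reading of the direct method; the tuple layout and function name are illustrative.

```python
# (interval, followed from onset throughout this interval, cumulative deaths)
direct_rows = [
    ("0-1 yr", 50, 1),
    ("1-2 yr", 49, 3),
    ("2-3 yr", 48, 6),
]


def direct_method(rows):
    """Cumulative mortality and survival for each interval, direct method."""
    results = []
    for interval, followed, cumulative_deaths in rows:
        mortality = cumulative_deaths / followed
        results.append((interval, mortality, 1 - mortality))
    return results


direct_results = direct_method(direct_rows)
for interval, mortality, survival in direct_results:
    print(f"{interval}: mortality {mortality:.3f}, survival {survival:.3f}")
```

Censored patients leave the denominator entirely in this arrangement, which is why it shrinks from 50 to 48.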

Table 17-3. VARIABLE-INTERVAL (KAPLAN-MEIER) ARRANGEMENT OF SURVIVAL DATA FOR 50 PATIENTS
Number Of Interval                              1       2       3       4       5       6       7
Time of Death(s) That End(s) Interval          0.?     1.?     1.8     2.1     2.3     2.9     ---
Number Alive Before Death(s)                    50      49      48      46      45      43     (42)
Number Of Deaths                                 1       1       1       1       1       1     ---
Number Of Survivors                             49      48      47      45      44      42     ---
Interval Survival Rate                        0.980   0.980   0.979   0.978   0.978   0.977    ---
Cumulative Survival Rate Before Death(s)      1.000   0.980   0.960   0.940   0.920   0.900   0.879
Censored Before Next Death                       0       0       1       0       1       0     ---
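The Kaplan-Meier arrangement multiplies the survival rates of successive intervals, each ending at a death. The Python sketch below applies that product to the "Number Alive Before Death(s)" and "Number Of Deaths" rows of Table 17-3. It is a bare-bones illustration with hypothetical names, not a replacement for a statistics package.

```python
# Number alive just before each death, and deaths at that time (Table 17-3).
at_risk = [50, 49, 48, 46, 45, 43]
deaths = [1, 1, 1, 1, 1, 1]


def kaplan_meier(at_risk, deaths):
    """Return a list of (interval survival, cumulative survival after the death)."""
    cumulative = 1.0
    curve = []
    for n, d in zip(at_risk, deaths):
        interval_survival = (n - d) / n   # survivors / alive before the death
        cumulative *= interval_survival   # product over all deaths so far
        curve.append((interval_survival, cumulative))
    return curve


km_curve = kaplan_meier(at_risk, deaths)
for number, (interval_survival, cumulative) in enumerate(km_curve, start=1):
    print(number, f"{interval_survival:.3f}", f"{cumulative:.3f}")
```

Censored patients never appear as deaths. They only lower the number at risk for the next interval (48 to 46, and 45 to 43). The later cumulative values differ from the printed table in the third decimal place, which appears to be because the table rounds at each step.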

General Hospital Ventilation and time to TST conversion – Kaplan-Meier curves