Professional Documents
Culture Documents
12
© The Author(s) 2022. Published by Oxford University Press on behalf of the Johns Hopkins Bloomberg School of https://doi.org/10.1093/aje/kwac115
Public Health. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Advance Access publication:
July 1, 2022
Practice of Epidemiology
Initially submitted February 20, 2022; accepted for publication June 16, 2022.
In this paper, we propose a framework for thinking through the design and conduct of descriptive epidemiologic
studies. A well-defined descriptive question aims to quantify and characterize some feature of the health of a
population and must clearly state: 1) the target population, characterized by person and place, and anchored in
time; 2) the outcome, event, or health state or characteristic; and 3) the measure of occurrence that will be used
to summarize the outcome (e.g., incidence, prevalence, average time to event, etc.). Additionally, 4) any auxiliary
variables will be prespecified and their roles as stratification factors (to characterize the outcome distribution) or
nuisance variables (to be standardized over) will be stated. We illustrate application of this framework to describe
the prevalence of viral suppression on December 31, 2019, among people living with human immunodeficiency
virus (HIV) who had been linked to HIV care in the United States. Application of this framework highlights biases
that may arise from missing data, especially 1) differences between the target population and the analytical
sample; 2) measurement error; 3) competing events, late entries, loss to follow-up, and inappropriate interpretation
of the chosen measure of outcome occurrence; and 4) inappropriate adjustment.
Abbreviations: HIV, human immunodeficiency virus; NA-ACCORD, North American AIDS Cohort Collaboration on Research and
Design.
Editor’s note: An invited commentary on this article the Reporting of Observational Studies in Epidemiology
appears on page 2071, and the authors’ response appears (STROBE) guidelines (3).
on page 2073. We define a descriptive epidemiologic question as one that
aims to quantify some feature of the health of a population
and, often, to characterize the distribution of that feature
Epidemiologic questions arguably exist on a continuum across the population. The estimand for causal analyses
from purely descriptive to purely causal. To be concise, we is a contrast of potential outcomes in a single population,
ignore prediction questions here. There are several frame- where the potential outcomes are those we would expect
works intended to help guide causal analyses (1, 2), but to observe under some hypothetical intervention (1, 4–7).
the literature on theoretical and practical guidance for con- The fundamental problem of causal inference is that we
ducting descriptive analyses is limited. Here we present a cannot observe all of these potential outcomes (8). The
framework for conducting descriptive epidemiologic stud- estimand for descriptive analyses is a function of the out-
ies. Many, if not all, of the considerations discussed in comes that occurred for everyone in the target population.
this framework apply to estimation of valid causal effects The estimation challenge for descriptive analyses is that we
in a population, although they may be frequently over- may not completely observe all of the actual outcomes. A
looked. Where there may be differences in analytical deci- descriptive analysis might be cross-sectional or longitudinal;
sions depending on the type of study question, we highlight it might concern a dichotomous, categorical, or continuous
them. We summarize guidance provided herein in Table 1 outcome; and it might attempt to summarize the outcome
in the form of a checklist modeled after the Strengthening in any number of ways (e.g., median time to some event,
Title and abstract 1 Explicitly state that this is a “descriptive study” in the title or the abstract.
2 Summarize the target population and provide an informative and balanced summary
of estimated disease occurrence in the abstract.
Introduction
Background/rationale 3 State the motivation for the study, including, where relevant, the action that might be
informed by the results.
Objectives 4 State the descriptive estimand, explicitly including:
(a) the target population (who would be affected by any decisions made as a
Table continues
Am J Epidemiol. 2022;191(12):2063–2070
Epidemiologic Inference for Description 2065
Table 1. Continued
Discussion
Key results 17 Summarize key results with reference to the study objectives.
Limitations 18 Summarize potential sources of selection bias and measurement error and any
attempts to mitigate these biases. Discuss both the direction and magnitude of
any potential bias. Integrating quantitative bias analysis into the study to guide
these discussions is encouraged.
Interpretation 19 (a) Avoid causal interpretations of descriptive results; avoid overinterpreting
stratum-specific differences in measures of occurrence.
mean value, etc.). While much discussion focuses on the living with HIV who had been linked to HIV care (i.e.,
most common scenarios (e.g., dichotomous outcomes), this saw a clinician who was aware of their HIV status and had
framework is intended to be applied to descriptive analy- the ability to prescribe antiretroviral therapy) in the United
ses for any combination of study designs, outcomes, and States? We will explore specific components of this question
estimands. to make it more well-defined (and tie those components to
analytical decisions) below.
A WELL-DEFINED QUESTION
SPECIFYING THE TARGET POPULATION (AND ITS
We start with the premise that good epidemiologic RELATIONSHIP TO THE STUDY SAMPLE)
questions are impactful and well-defined. An impactful
question, if answered, would lead to knowledge that could For a descriptive question, we define the target population
inform action in the population it concerns (7). A well- as the group in which we would like to characterize the
defined question should be stated with enough specificity distribution of the outcome. The choice of target population
and clarity that answering it is at least theoretically is directly linked to the purpose of asking the question. The
possible. target population might be, for example, the population for
A well-defined research question (causal or descriptive) which we will be providing public health services. The target
states: 1) the target population, characterized by person population is not necessarily enumerated (in contrast to a
and place, and anchored in time; 2) the outcome, event, cohort or a sample), but we do need to be able to define
or health state or characteristic; and 3) the measure of membership in terms of person, place, and time (here, time
occurrence that will be used to summarize the outcome is used to define membership in the target population and
(e.g., incidence, prevalence, average time to event, etc.). A does not relate directly to measurement of the outcome).
causal question requires specifying additional components, For our example question, the target population is everyone
such as exposures and covariates that are thought to be living in the United States (place) who was aged ≥18 years,
confounders, effect modifiers, or mediators. For descriptive was infected and diagnosed with HIV, and attended ≥1
questions, consideration of additional variables is optional, clinical visit for HIV care with a clinician who was aware of
but if auxiliary variables will be considered, a well- their infection and could prescribe antiretroviral medication
defined descriptive question will 4) prespecify any other (person) before December 31, 2019, and was alive through
variables of interest and how they will be considered December 31, 2019 (time).
(e.g., to characterize the population, as a stratification A well-defined question specifies the target population
factor to characterize the outcome distribution, or as a priori. When data are available on a full census of the
a “nuisance” variable that we would like to adjust for target population (e.g., through administrative records or
or standardize over). For a descriptive question, indis- public health surveillance), no sampling is needed. However,
criminate adjustment for these other variables can lead when data on the entire population cannot be obtained, we
to uninterpretable results that may mislead (9); as such, rely on data from a sample of the target population or a
researchers should be clear as to the purpose of adjustment population that we hope is sufficiently representative of the
in descriptive studies, understand the implications of such target population with respect to both measured and unmea-
adjustments, and be cautious in interpreting adjusted statis- sured characteristics. The study sample is the enumerated
tics (10). set of individuals whose information is captured in a data
Example: We illustrate application of this framework to set, among whom we attempt to measure occurrence of the
description of one portion of the human immunodeficiency outcome (after inclusion and exclusion criteria have been
virus (HIV) care continuum (11): What was the prevalence applied, if data were not collected using these criteria (e.g.,
of viral suppression on December 31, 2019, among adults administrative data)). Many descriptive and causal questions
Am J Epidemiol. 2022;191(12):2063–2070
2066 Lesko et al.
are answered using convenience samples without a clear move across state lines may be double-counted because of
sampling frame (e.g., people recruited using Web-based challenges with deduplication. Thus, the number of people
surveys, frequent clinic attendees, or people who sought with HIV infection may be inaccurate. Additionally, data
medical care in a particular hospital system) and implicitly rely on HIV viral load and CD4 cell-count laboratory tests as
assume that the study sample is a random sample (perhaps a proxy for clinical visits, and the proxy is imperfect (19, 20);
conditional on covariates with known sampling probabil- thus, we cannot accurately apply the second inclusion crite-
ities) of the target population. Achieving a representative rion for target population membership: linkage to clinical
sample may involve considerable work and may be very care. Alternatively, we might use data from the North Amer-
resource-intensive (12). However, use of convenience sam- ican AIDS Cohort Collaboration on Research and Design
ples often results in study samples that are different from (NA-ACCORD) (21) or another clinical cohort of people
the target population in unmeasurable ways, particularly with HIV who have been linked to care. However, clinical
when subjects must actively seek out or opt into participa- cohort studies are often nested within academic medical
Am J Epidemiol. 2022;191(12):2063–2070
Epidemiologic Inference for Description 2067
NA-ACCORD data (which do not include everyone in the the complete picture about the distribution of the outcome
target population, although they do not include anyone who in the target population. Incidence tells us something about
should be excluded from the target population). Therefore, how frequently an event occurs over time. There are multiple
we might use surveillance data for our primary analyses, measures of incidence; in the interest of space, we will
but we might conduct secondary analyses that leverage the restrict our discussion to risks and rates. If individuals are
relative strengths of the different study samples and, for not followed over time and the event can recur, it may be
example, reweight NA-ACCORD data that include visits to difficult to distinguish the number of affected individuals
resemble the target population implied by the surveillance from the number of events. Prevalent outcomes are often not
data. of interest in causal investigations, as temporality is more
challenging to determine and reverse causation is a potential
problem. In addition, survival bias might affect results when
DEFINING THE OUTCOME considering prevalent exposures (31, 32). Finally, preva-
Am J Epidemiol. 2022;191(12):2063–2070
2068 Lesko et al.
Risks can be calculated in the presence of late entries with no viral load measurement in 2019 are lost to follow-up.
(left-truncation) and loss to follow-up (right-censoring) Viral suppression is influenced by access to health care and
under strong assumptions about independence between is only possible if people are receiving antiretroviral therapy
entering/leaving the study and risk of the outcome (38, 39). (except, in rare cases, for elite controllers) (41). In this set-
Left-truncation and right-censoring impute outcomes for ting, people who are lost to follow-up may have transferred
people who did not survive to enroll in the study sample to another clinic and may still be receiving treatment (if we
and for people who are censored (38). We can adjust for are using NA-ACCORD data) or may have moved out of
possible associations between censoring and the outcome the jurisdiction (if we are using surveillance data), and we
(and resultant selection bias) using inverse probability of might assume that they have the same probability of viral
censoring weights (40). However, the resultant risks are suppression as people with a viral load measurement (cen-
interpretable as the risk that would have been observed if no soring is appropriate; equivalently, we can restrict analyses
one were lost to follow-up (a hypothetical intervention), and to people with a measured viral load) (24). Alternatively,
Am J Epidemiol. 2022;191(12):2063–2070
Epidemiologic Inference for Description 2069
results for descriptive studies and clear justification and (STROBE) Statement: guidelines for reporting observational
proper interpretation in cases where adjustments are made. studies. Int J Surg. 2014;12(12):1495–1499.
4. Robins JM. Data, design, and background knowledge in
etiologic inference. Epidemiology. 2001;12(3):313–320.
CONCLUSIONS 5. Rubin DB. The design versus the analysis of observational
studies for causal effects: parallels with the design of
Descriptive epidemiologic studies seek to characterize randomized trials. Stat Med. 2007;26(1):20–36.
what is happening in the world to inform public health 6. Petersen ML. Commentary: applying a causal road map in
priorities, target interventions, and occasionally contrast settings with time-dependent confounding. Epidemiology.
with counterfactual scenarios to estimate intervention effects 2014;25(6):898–901.
(46, 47). Descriptive studies have value in their own right 7. Fox MP, Edwards JK, Platt R, et al. The critical importance
of asking good questions: the role of epidemiology
and not merely as stepping stools toward causal inference. doctoral training programs. Am J Epidemiol. 2020;189(4):
Characterizing what is happening in the world requires that 261–264.
Am J Epidemiol. 2022;191(12):2063–2070
2070 Lesko et al.
low- and middle-income countries: a systematic review. Bull 36. Rothman KJ, Lash TL, VanderWeele TJ, et al. Measures of
World Health Organ. 2013;91(5):377–385E. occurrence. In: Modern Epidemiology. 4th ed. Philadelphia,
24. Lesko CR, Chander G, Moore RD, et al. Variation in PA: Wolters Kluwer N.V.; 2021:53–77.
estimated viral suppression associated with the definition of 37. Cole SR, Lau B, Eron JJ, et al. Estimation of the standardized
viral suppression used. AIDS. 2020;34(10):1519–1526. risk difference and ratio in a competing risks framework:
25. Hermans LE, Moorhouse M, Carmona S, et al. Effect of application to injection drug use and progression to AIDS
HIV-1 low-level viraemia during antiretroviral therapy on after initiation of antiretroviral therapy. Am J Epidemiol.
treatment outcomes in WHO-guided South African treatment 2015;181(4):238–245.
programmes: a multicentre cohort study. Lancet Infect Dis. 38. Cole SR, Edwards JK, Naimi AI, et al. Hidden imputations
2018;18(2):188–197. and the Kaplan-Meier estimator. Am J Epidemiol. 2020;
26. Elvstam O, Medstrand P, Yilmaz A, et al. Virological failure 189(11):1408–1411.
and all-cause mortality in HIV-positive adults with low-level 39. Lesko CR, Edwards JK, Cole SR, et al. When to censor? Am
viremia during antiretroviral treatment. PLoS One. 2017; J Epidemiol. 2018;187(3):623–632.
12(7):e0180761. 40. Howe CJ, Cole SR, Lau B, et al. Selection bias due to loss to
Am J Epidemiol. 2022;191(12):2063–2070