Chapter2 PrevalenceStudiesOct2006

PREVALENCE STUDIES
INTRODUCTION
  Definition
  Use in public health and research
MEASUREMENTS OF PREVALENCE
  Point prevalence
  Period prevalence
  Lifetime prevalence
EXAMPLES OF PREVALENCE STUDIES
  Seroprevalence studies
  Repeat prevalence studies
METHODOLOGY
  Sampling
  Sample size
  Primary and secondary source of data
  Types of bias
    Survivor bias
    Participant selection bias
    Observer bias
DATA ANALYSIS
  Measurement of prevalence
  Stratification
  Logistic regression
ADVANTAGES AND LIMITATIONS
CHECKLIST FOR THE DESIGN OF A PREVALENCE STUDY
ADDITIONAL READING
EXERCISES
DATA FILE DICTIONARY

Prevalence studies
INTRODUCTION
Definition
Prevalence, or cross-sectional, studies are the most common population-based epidemiological
studies. They are designed to estimate the frequency of a health event in the population at a
point in time or over a short period of time. Cross-sectional studies can also be used to
investigate associations between risk factors and disease, although this is not the most
efficient design to study causality.
A prevalence rate (P) is estimated by:
P = (number of persons with the event of interest: infection, disease, etc.) / (population at risk of presenting the event of interest during a given time)
The population at risk is usually the population living in the study area, or it is defined by
a geographical, administrative, demographic, occupational, or other parameter, such as
being a client of health services. The prevalence rate is reported on a population base, e.g.,
5 cases of a disease per 100 inhabitants (5%).
Prevalence is influenced by the incidence (I) and mean duration (D) of the disease. As a
proportion, its numerator is part of the denominator; it has no unit and its value ranges
from 0 to 1. When incidence and the population dynamics are constant, prevalence (P) may be
calculated by:
P = incidence x duration of disease
The duration of a disease can be obtained when the incidence and prevalence are known.
An area reporting, for example, an incidence of 3.3 new cases of tuberculosis per year per
100,000 inhabitants, and a prevalence rate of 19.8/100,000, will estimate the average duration
of the disease as:
D = P / I = 19.8 / 3.3 = 6 years
(the duration is expressed in the time unit of the incidence rate, here years).
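The relation between prevalence, incidence and duration can be checked with a few lines of Python. This is only an illustrative sketch: the function name `mean_duration` is an assumption of this example, and the relation holds only under the stable incidence and population stated in the text.

```python
def mean_duration(prevalence, incidence):
    """Mean disease duration D = P / I (steady-state assumption).

    Both rates must use the same population base (here per 100,000).
    The result is expressed in the time unit of the incidence rate.
    """
    if incidence <= 0:
        raise ValueError("incidence must be positive")
    return prevalence / incidence


# Tuberculosis example from the text:
# P = 19.8 per 100,000 and I = 3.3 per 100,000 per year
duration = mean_duration(19.8, 3.3)
print(f"Mean duration: {duration:.1f} (in the time unit of the incidence rate)")
```

Because the incidence in the example is given per year, the quotient of 6 is a duration in years.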
Prevalence can be evaluated by a single (cross-sectional) measurement. Conversely,
incidence requires at least two measurements of the same population: one at the beginning
of a follow-up to exclude those who are already sick or infected, and the other to detect the
emergence of new cases. For infectious diseases of rapid evolution, prevalence
measurement is of little value. For events (infections and diseases) of longer or chronic
duration, prevalence may indicate the risk of exposure for susceptible individuals.
Use in Public Health and Research - Prevalence studies are often used as a baseline
measurement for the monitoring of control programmes. They are also used in the
selection of participants for other studies such as case-control, cohort and clinical trials.
For example, an initial serological screening for Trypanosoma cruzi infection among a
large population of schoolchildren in a rural area of Brazil reported a prevalence of 7.9%
(95% confidence interval 6.8%-9.1%) (Andrade et al., 1992). A sample of those
seropositive children was then selected to participate in a clinical trial to evaluate the
efficacy of benznidazole as a specific treatment. In addition, seropositive children and
matched seronegative controls (case-control) were compared to evaluate environmental,
familial and nutritional risk factors associated with T. cruzi infection.
MEASUREMENTS OF PREVALENCE
Prevalence is considered a simple measure of burden of disease. In infectious disease,
however, its interpretation requires a more extensive knowledge of the mechanisms
by which the infection is transmitted, the pre-clinical and clinical evolution of the disease,
as well as the duration of the infection/disease and the case-fatality rate. The criteria for
defining infection and disease, and the probable duration of the event until recovery or
death, must be specified in the design stage of a prevalence study.
The most commonly used types of prevalence rate are point, period and lifetime
prevalence.
Point prevalence - is the total number of individuals with a disease or an attribute at a
specific point in time divided by the population at risk of having the attribute or disease at
this point in time. The term prevalence rate is often used as a synonym for the point
prevalence rate.
Period prevalence - is the total number of individuals with a disease or an attribute in a
given period of time divided by the population at risk of having the attribute or disease
during the same period of time.
Lifetime prevalence - is the total number of persons known to have had the disease or
attribute for at least part of their life.
Figure 1 illustrates the concepts of point and period prevalence in malaria. The point
prevalence in endemic areas of malaria can be obtained by the parasitological screening of
a population over a short period of time. Differences between the prevalence of infection
and the incidence of clinical cases depend on the levels of endemicity. According to the
example in figure 1, at the beginning of 1992 the point prevalence of symptomatic
malaria was 4 cases, and 5 new cases were diagnosed during the year (incidence), yielding
a period prevalence of 9. At the beginning of 1993 the point prevalence of infection
was 12 cases and the number of clinical cases 3, which illustrates the differences between
point prevalences of infection and disease, respectively.
Prevalence estimates in control activities are influenced by operational and diagnostic
criteria. Changes in case definition, treatment schemes and discharge criteria may change
prevalence figures. Mass interventions potentially interfere with the transmissibility of an
infectious disease, its incidence and duration, and the characteristics of the infection/disease
in existing cases. In the case of leprosy control, for example, target areas for elimination are
defined as those with prevalence rates below 1 case per 10,000.
Figure 2 illustrates the concept of point and period prevalence for leprosy. Assume 500
cases (N) at the beginning of the period (t0) and that all new cases (A = 250), regardless of
their clinical form, occurred at the same time, at mid-year (t1). The period prevalence
(t0 to t1) is 750 cases: 500 at the start of the period, plus 250 new cases. Assuming that at
time t1 there were 350 discharges (B = 350), the prevalence at point t1 is the net number of
cases (N - B = 150) plus the new cases (A = 250), which totals 400 cases. Thus, in a
situation of stable incidence, reduction of the point prevalence will depend on the number
of patients treated (cured or discharged) and the proportion that defaulted from treatment.
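The arithmetic of the leprosy example can be written out as a short Python sketch. The function names are assumptions made for illustration; the counts are those given in the text.

```python
def period_prevalence(cases_at_start, new_cases):
    """Cases existing at any time in the period: those present at the start plus new ones."""
    return cases_at_start + new_cases


def point_prevalence(cases_at_start, new_cases, discharges):
    """Cases still registered at the chosen point: start - discharges + new cases."""
    return cases_at_start - discharges + new_cases


# Values from the leprosy example: N = 500, A = 250, B = 350
N, A, B = 500, 250, 350
print("Period prevalence (t0 to t1):", period_prevalence(N, A))
print("Point prevalence at t1:", point_prevalence(N, A, B))
```

These are case counts; dividing them by the population at risk gives the corresponding prevalence rates.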
[Figure 2 - Leprosy: point and period prevalence. Time axis t0, t1, t2. N = 500 cases at t0;
new cases A = 250; discharges B = 350; N - B = 150 cases remaining (in treatment plus defaulters).]
EXAMPLES OF PREVALENCE STUDIES
Seroprevalence studies - are particularly useful for infectious diseases that induce an
antibody response or other biological markers. Seroprevalence studies are used to
determine the geographic distribution of many infections, such as hepatitis A, B and C and
HIV, and in surveys before and after vaccination to evaluate antibody seroconversion.
Prevalence is estimated with respect to age and sex in order to understand the dynamics of
transmission of infection in the community. This type of analysis allows the identification
of areas of high risk within the community, carriers, immune and susceptible individuals.
The analysis should indicate the current and past disease/infection/immunity situation,
providing useful information to predict future risk of transmission.
Repeat prevalence studies - are prevalence studies repeated at intervals, generally of
years or decades. They differ from cohort studies in that they do not follow the same group
of individuals but instead evaluate independent samples of a given population in
different periods of time. Thus, it is unlikely that in probability sampling the same
individuals will be selected to measure the events of interest. These repeated prevalence
studies are used to evaluate the health/disease/infection status and trends of large
populations; they are important for the planning of health services and for the detection of
changes in the health situation over time. Examples are nutritional and household surveys
in censuses.
METHODOLOGY
Sampling
Random sampling - prevalence studies generally require random sampling of a population.
A probability sample substantially increases the chance that the participants will be
representative of the target population and so supports the internal validity of the study.
This also allows for the extrapolation of study results to other communities (external
validity).
Systematic sampling - systematic sampling uses some type of pre-established
sequence to select participants, for example, from files of medical histories, houses on a
street, or patients presenting spontaneously in ambulatory clinics.
Stratified sampling - this involves dividing the population into distinct subgroups
according to some important characteristics and selecting a random sample of each
subgroup. If the proportion of the sample drawn from each stratum is the same as that
stratum's proportion of the total population, then all strata will be fairly represented with
regard to the number of persons in the sample. A two-stage sampling method was developed
by EPI-WHO to evaluate vaccination coverage and the quality of health services: 30 urban
settlements are selected, and 7 children in the given age group are selected in each settlement.
Sample size
While a probability sample gives a study internal validity, the precision of the prevalence
estimate obtained depends on the sample size. Thus, the width of the confidence interval
(the interval estimated to contain the prevalence in the population) reflects the degree of
precision conferred by the size of the sample chosen.
The size of a sample for simple random sampling is calculated as follows:
n = Z * Z [P (1-P)] / (D*D)
where:
Z = the value of the standard normal distribution for the desired confidence level (Z =
1.96 for the 95% confidence interval, 95% CI)
P = the expected prevalence
D = the highest acceptable error in the estimate (half-width of the CI, a measurement of
precision)
For example, to estimate the seropositivity for dengue virus antibody in a population of
about 1 million inhabitants with an expected prevalence of 15% (P = 0.15) and a 95% CI
with a total width of 12% (D = 0.06), the number of persons to be studied would be:
n = 1.96^2 [0.15 (1 - 0.15)] / (0.06^2)
n = 136 persons
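The calculation above can be reproduced with a short Python sketch of the simple random sampling formula. The function name and argument names are assumptions of this example; no finite population correction or design effect is applied.

```python
def sample_size_prevalence(expected_prevalence, half_width, z=1.96):
    """Sample size for estimating a prevalence under simple random sampling.

    n = Z^2 * P * (1 - P) / D^2, where D is the half-width of the CI.
    Returns the unrounded value so the caller can decide how to round.
    """
    if not 0 < expected_prevalence < 1:
        raise ValueError("expected_prevalence must be between 0 and 1")
    if half_width <= 0:
        raise ValueError("half_width must be positive")
    p, d = expected_prevalence, half_width
    return z ** 2 * p * (1 - p) / d ** 2


# Dengue example from the text: P = 0.15, D = 0.06, 95% confidence
n = sample_size_prevalence(0.15, 0.06)
print(f"n = {n:.2f}, i.e. about {round(n)} persons")

# Trying several expected prevalences shows how sensitive n is to P
for p in (0.05, 0.15, 0.30, 0.50):
    print(p, round(sample_size_prevalence(p, 0.06)))
```

Rounding up rather than to the nearest integer is the more conservative choice, and the result still needs to be inflated for expected refusals or losses.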
The estimated sample size should be increased to compensate for possible refusals or losses.
The sample size conveys an idea of the order of magnitude of the population needed for
the study, but must not be treated as rigid, as it is calculated on the basis of an estimated
parameter (expected prevalence). This estimate is usually obtained from a review of the
literature. Sample sizes should be calculated for several estimates of prevalence and
precision, in keeping with the purpose of the study. A balance between what is desirable and
what is practically possible should be achieved. Opinion surveys are generally conducted on
about 1,000 persons to obtain good precision (for example, 95% CI with a maximum width of
6%). It should be emphasized that prevalence studies are not suited for events of low
frequency of occurrence.
Primary and Secondary Source of Data
• Official information systems - Sources of secondary data can be useful in prevalence
studies and should be used whenever possible as a first approach to the question to be
answered. Some of the advantages of using existing databases for epidemiological
purposes are their low cost, the availability of information over a long period of time, which
allows secular trends to be evaluated, and possible access through computerized systems.
In some cases, data generated from the information systems of control programmes make it
possible to build up time series. Other sources are the medical histories of general or
referral hospitals and of sentinel hospitals for infectious diseases.
The interpretation of secondary data requires knowledge of the coverage and quality of
the information; of changes in the definition of cases over time; of administrative actions
such as changes from voluntary to mandatory reporting; and of changes in established
interventions and report forms.
The epidemiological interpretation should recognize the limitations and quality of the
existing database, and the potential biases associated with ascertainment of the disease and
the selection of cases for treatment.
• Collecting primary data - Self-administered questionnaires and interviews are
common ways of obtaining information on morbidity, frequency of symptoms and
variables of interest in prevalence surveys. In addition, laboratory tests for the detection of
biological markers can be used to measure the prevalence of infection/disease and risk
factors. The advantage of primary over secondary data lies in their quality: a planned
epidemiological study can collect data in a standardized and validated way, completeness
can be assured, and careful recording makes the database useful for analysis and
interpretation.
Types of bias
• Survivor bias - systematic error arising in cross-sectional studies from including only
prevalent cases. Cases with rapid evolution and early deaths are excluded, while cases with
longer survival tend to be over-represented. Since the probability of surviving a disease
affects its prevalence, studies based on prevalent cases generate associations that partly
reflect determinants of the survival of cases.
• Participant selection bias - convenience sampling, such as that prompted by the
accessibility of persons presenting at public health services, specialized clinics and referral
services, etc., can introduce participant selection biases. Self-selected or voluntary
participants tend to be healthier than the overall population. Sampling based on risk
behavior subgroups tends to overestimate the prevalence of some sexually transmitted
diseases. Some population subgroups, such as blood donors and pregnant women, are more
likely to show prevalence rates closer to those of the overall population.
The refusal of individuals to participate in prevalence studies, to be interviewed or to donate
biological material can also introduce participant selection bias, and must be held to a
minimum to ensure that the sample is representative of the base population. Working
alternatives for reducing the number of such refusals must be provided in the protocol in
advance. The differences between participants and non-participants in a study must be
evaluated in relation to demographic and risk variables.
Stored collections of clinical and laboratory specimens are sometimes used to estimate the
prevalence of some diseases. Serum and biological material banks that do not record a
description of the population from which the specimens have been taken, the sampling
method used, and the circumstances in which they were obtained are without value for
epidemiological purposes. In order for the results obtained from these tests to represent
the actual prevalence, all the requirements of a project design must be satisfied: (a) clear
purposes, (b) representation of the population of interest, (c) sample size, and (d)
knowledge of the tests to be used, their sensitivity and specificity, the limits of their
accuracy and their significance.
• Observer bias - a flaw in measuring exposure or outcome that results in differential
quality (accuracy) of information between comparison groups. Standard procedures for
interviews and clinical examination help to assure a uniform evaluation of associations and
effects in a survey.
DATA ANALYSIS
Measurement of prevalence
The confidence interval of a prevalence shows the degree of uncertainty of the estimate.
Assuming a random sample of the population of interest, the larger the sample studied, the
more precise the observed proportion will be. A 95% confidence interval may be
presented in figures or graphically. The width of the interval reflects its precision. The
upper and lower limits of confidence intervals will be close to each other when estimations
are made from large samples. When two or more proportions show overlapping confidence
intervals, it is often assumed that there is no statistically significant difference between
them. However, the chi-square test or Fisher's exact test is the appropriate procedure to
compare two proportions.
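As an illustration of how such an interval is obtained, the sketch below computes a 95% confidence interval for a proportion with the normal (Wald) approximation. The function name is an assumption of this example, and the chapter does not state which method its software uses; exact or score-based methods give slightly different limits, especially for small samples or prevalences near 0 or 1. The counts (39 positive answers among 496 children) are the totals of the STD example in this chapter.

```python
import math


def proportion_ci(cases, n, z=1.96):
    """Prevalence and its confidence interval (normal approximation).

    Returns (p, lower, upper); limits are truncated to the range 0 to 1.
    """
    if n <= 0 or not 0 <= cases <= n:
        raise ValueError("need 0 <= cases <= n and n > 0")
    p = cases / n
    se = math.sqrt(p * (1 - p) / n)  # standard error of a proportion
    return p, max(0.0, p - z * se), min(1.0, p + z * se)


p, low, high = proportion_ci(39, 496)
print(f"Prevalence {p:.1%} (95% CI {low:.1%} to {high:.1%})")
```

Running the same function with a larger denominator and the same proportion shows the interval narrowing as the sample grows.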
In prevalence studies the association between exposure and disease can be evaluated.
The odds ratio (OR) can be calculated; it approximates the relative risk when the
disease/outcome is rare. Alternatively, the ratio between the two prevalences (exposed and
not exposed), called the prevalence ratio (PR), can be used.
For example, to study the association between history of sexually transmitted disease
(STD) and homelessness in children, 496 children (101 homeless and 395 family children
working in the streets) were interviewed for histories of STD. STD history was reported by
24.8% of homeless children and 3.5% of family children working in the streets (Porto et al.,
1994). The results are presented in the following table:
               History of STD
Children       Yes           No             Total
Homeless       25 (24.8%)    76 (75.2%)     101 (100%)
In-street      14 (3.5%)     381 (96.5%)    395 (100%)
Total          39 (7.9%)     457 (92.1%)    496 (100%)
PR = (25/101) / (14/395) = 6.98 (95% CI 3.7 - 12.9)
OR = (25 x 381) / (14 x 76) = 8.95 (95% CI 4.2 - 19.1)
Therefore, the prevalence of a reported STD history was about 7 times higher among homeless
children than among family children working in the streets.
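The two measures can be recomputed from the counts in the table with the sketch below. The function names and the cell labels a, b, c, d are assumptions of this example, and the confidence limits use the log-based normal (Wald) approximation, which is not necessarily the method used by the original authors. Computed directly from the counts, the prevalence ratio is about 6.98 and the odds ratio about 8.95.

```python
import math


def prevalence_ratio(a, b, c, d, z=1.96):
    """PR with log-based CI. a, b: exposed with/without outcome; c, d: unexposed."""
    pr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    return pr, pr * math.exp(-z * se_log), pr * math.exp(z * se_log)


def odds_ratio(a, b, c, d, z=1.96):
    """OR with log-based (Woolf) CI, same cell layout as prevalence_ratio."""
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return or_, or_ * math.exp(-z * se_log), or_ * math.exp(z * se_log)


# Homeless: 25 with STD history, 76 without; in-street: 14 with, 381 without
pr, pr_low, pr_high = prevalence_ratio(25, 76, 14, 381)
or_, or_low, or_high = odds_ratio(25, 76, 14, 381)
print(f"PR = {pr:.2f} (95% CI {pr_low:.1f} to {pr_high:.1f})")
print(f"OR = {or_:.2f} (95% CI {or_low:.1f} to {or_high:.1f})")
```

The Wald limits for the OR differ somewhat from the interval quoted in the text, which presumably comes from another method (for example an exact or Cornfield interval). Because the outcome is not rare among homeless children, the OR is noticeably larger than the PR.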
Stratification - The main technique for evaluating confounding and for examining
interaction (effect modification) between risk factors is stratification.
Stratified analysis is usually done in the following stages:
• Divide the study population into strata of the potential confounding variable;
• Calculate estimates of the effect of exposure (prevalence ratio and confidence interval)
for each specific stratum in relation to the baseline exposure level;
• Determine whether the magnitudes of the differences between the prevalence ratios of
the different strata suggest interaction or confounding;
• Estimate a summary (pooled) measure based on the Mantel-Haenszel method in case of
confounding.
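The stages above can be sketched in Python for the prevalence ratio. The counts below are hypothetical and chosen only to show the mechanics; the function names are assumptions of this example. The pooled estimate uses the Mantel-Haenszel formula for a ratio of proportions.

```python
def stratum_pr(a, b, c, d):
    """Prevalence ratio in one stratum. a, b: exposed with/without outcome; c, d: unexposed."""
    return (a / (a + b)) / (c / (c + d))


def mantel_haenszel_pr(strata):
    """Mantel-Haenszel pooled prevalence ratio over strata of (a, b, c, d) counts."""
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * (c + d) / n
        denominator += c * (a + b) / n
    return numerator / denominator


# HYPOTHETICAL counts for two strata of a potential confounder (not from the text)
strata = [(10, 90, 5, 95), (20, 80, 10, 90)]

for i, cells in enumerate(strata, start=1):
    print(f"Stratum {i}: PR = {stratum_pr(*cells):.2f}")
print(f"Mantel-Haenszel PR = {mantel_haenszel_pr(strata):.2f}")
```

If the stratum-specific ratios are similar to each other but differ from the crude ratio, confounding is suggested and the pooled estimate is reported; if they differ markedly from each other, interaction is suggested and the stratum-specific estimates should be reported instead.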
Logistic Regression - Although stratification can be used to adjust the prevalence for
more than one confounding variable, a large number of strata tends to produce clusters of
small numbers of observations, with loss of precision in the calculations. This limitation
of stratification in the simultaneous adjustment of several confounding variables can be
overcome to some extent by the use of modeling techniques. Multivariate models help to
better understand the predictive value of a set of variables related to a particular
outcome. If the endpoint is binary, logistic regression models can be applied to
prevalence studies to assess the effect of one exposure in the presence of other
risk factors. When the endpoint is continuous, linear regression is the option. Logistic
regression is usually done in the following steps:
• Identify the variables to be included in the model;
• Recode the variables if necessary;
• Select the outcome variable and the other variables (predictors) to be considered in the
model;
• Create dummy variables when necessary;
• Estimate the adjusted odds ratios.
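To make the link between logistic regression and the odds ratio concrete, the sketch below fits a model with a single binary predictor by Newton-Raphson on grouped data, using the counts of the STD example (homeless: 25 of 101; in-street: 14 of 395). It is a teaching sketch written for this example, not the routine used by Epi Info; with several predictors a statistical package would normally be used. With one binary predictor, the exponentiated coefficient reproduces the crude odds ratio.

```python
import math


def fit_logistic_grouped(groups, iterations=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson.

    groups: list of (x, cases, total) with x coded 0 or 1.
    Returns (b0, b1).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, cases, total in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            residual = cases - total * p      # score contribution
            weight = total * p * (1.0 - p)    # information contribution
            g0 += residual
            g1 += residual * x
            h00 += weight
            h01 += weight * x
            h11 += weight * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system (information matrix) * step = score
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# x = 1 for homeless children, x = 0 for family children working in the streets
b0, b1 = fit_logistic_grouped([(1, 25, 101), (0, 14, 395)])
print(f"Odds ratio from the model: {math.exp(b1):.2f}")
```

Adding further predictors to such a model is what yields adjusted odds ratios, each interpreted with the other variables held constant.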
ADVANTAGES AND LIMITATIONS
Prevalence studies are considered less time-consuming than cohort or case-control studies
(when based on incident cases), less expensive, and operationally less complex than
other epidemiological study designs. Whenever possible, they should be based on samples
of the general population and not on selected populations such as health services users.
Prevalence studies are not suitable for rare or short-duration diseases, which will afflict
few persons at a given point in time. It is frequently difficult to separate cause and effect
(risk factor and disease) because the measurements of exposure and disease are made at the
same time; for this reason, prevalence studies cannot be used to test etiological hypotheses.
CHECKLIST FOR THE DESIGN OF A PREVALENCE STUDY

• Define the importance of the question to be answered
  - check the literature and other studies in the field
  - verify whether the research question generates scientific knowledge and possibly
    an impact on public health
• Frame clearly the questions to be answered
  - explain the event to be measured and the population to be studied
  - make sure that the questions can be answered in technical and operational
    terms
• Describe the technical procedures for evaluating the event of interest
  - laboratory methods, interviews, questionnaires, clinical examination
  - interpretation and categorization of the parameters to be evaluated
• Establish the sampling procedures
  - define the reference population and the population to be studied
  - establish the sampling method to be used
• Calculate the sample size
  - estimate the expected prevalence to calculate sample size
  - define the acceptable precision (error) in your estimate of prevalence
• Discuss ethical issues
  - risks vs. benefits
  - availability of medical care for participants in whom the event is detected
  - confidentiality of the results
  - use of preexisting biological samples and serum banks
• Describe the stages of analysis of the data
  - give the parameters (proportion and 95% CI, mean), and describe the
    statistical methods and comparison subgroups
  - conduct a stratified analysis or logistic regression analysis to adjust
    for potential confounders
ADDITIONAL READING
ANDRADE, A.L.S.S., ZICKER, F., LUQUETTI, A.O., OLIVEIRA, R.M., SILVA,
S.A., SOUZA, J.M.P. & MARTELLI, C.M.T. Surveillance of Trypanosoma cruzi
transmission by serological screening of schoolchildren. WHO Bulletin, 70(5):625-9,
1992.
BEAGLEHOLE, R., BONITA, R. & KJELLSTRÖM, T. Basic Epidemiology. World
Health Organization, Geneva, 1993.
GORDIS, L. Epidemiology. 3rd ed. Elsevier Science, 2004.
GIESECKE, J. Modern Infectious Disease Epidemiology. Boston: Little, Brown and
Company, 2003.
HENNEKENS, C.H. & BURING, J.E. Epidemiology in Medicine. 5th ed. Boston/Toronto:
Little, Brown and Company, 1987.
PAUL, J.R. & WHITE, C. Serological Epidemiology. Academic Press, New York and
London, 1973.
PORTO, S.O.B., CARDOSO, D.D.P., QUEIROZ, D.A.O., ROSA, H., ANDRADE,
A.L.S.S., ZICKER, F. & MARTELLI, C.M.T. Prevalence and risk factors for HBV
infection among street youth in Central Brazil. Journal of Adolescent
Health, 15:577-81, 1994.
EXERCISES
Files: 1. ViewScreen
2. ViewHepbprev
Exercise 1
Serologic screening for Trypanosoma cruzi. A serologic survey was conducted to
estimate the prevalence of T. cruzi infection in schoolchildren aged 7 to 12 years, residing
in endemic rural areas of central Brazil. Blood samples were collected on filter
paper from 1,990 children for indirect hemagglutination (IHA), indirect
immunofluorescence (IIF) and ELISA. Details of the study and methodology are
given in Andrade et al., 1992. The plan of analysis includes (a) comparison of
seroprevalence by each technique, and (b) prevalence ratio by sex and age group. Use
the screen data table, included in the EpiGuide.MDB project, to answer the following
questions.
**Before starting the exercise, route out the results to an HTML file named "Results
SCREEN".
ROUTEOUT 'Results SCREEN' [Figure 1]
[Figure 1 - Route Out results]
1 - Click on RouteOut
2 - Define the folder to save the HTM file
3 - Write the file name
4 - Mark this box if you want to replace an existing file
Question 1. Calculate the prevalence rate and 95% CI obtained with each laboratory
technique used. Is there any statistical difference between the results?
Note 1: READ 'C:\EPIGUIDE\EpiGuide.mdb':viewSCREEN [Figure 2]
FREQ IHA [Figure 3]
FREQ IIF
FREQ ELISA
[Figure 2 - Read data file]
1 - Click on Read from the Analysis Commands tree
2 - Change to the desired project: EPIGUIDE.MDB
3 - Identify the data file you will use in the exercise
4 - Click OK
[Figure 3 - Frequencies command]
1 - Click Frequencies
2 - Choose the variable(s)
3 - Click OK when finished
Question 2. Estimate the prevalence and 95% CI of T. cruzi infection, defining a positive
diagnosis as having at least 2 positive serological tests. Calculate the seroprevalence
and 95% CI for each municipality. Is there any difference between the areas? Calculate
the prevalence and prevalence ratio by sex. Is there any association between sex and
T. cruzi infection?
Note 2: [To continue the exercise you must create a new variable called "POS". Consider (+)
as positive to at least two serological tests and (-) as negative.]
DEFINE POS [standard] [Figure 4]
[Assign all records a negative result (-)]
ASSIGN POS = (-) [Figure 5]
[Identify the positive records (+) - positivity to at least 2 tests]
IF IHA = "P" AND IIF = "P" THEN
   ASSIGN POS = (+)
END [Figure 6]
IF IHA = "P" AND ELISA = "P" THEN
   ASSIGN POS = (+)
END
IF ELISA = "P" AND IIF = "P" THEN
   ASSIGN POS = (+)
END
FREQ POS
TABLES MUN POS [Figure 7]
For Epitable:
Run EPITABLE to calculate the 95% CI – select
DESCRIBE, then select PROPORTION, then select
SIMPLE RANDOM SAMPLING
[Use the results of the previous table]
For Open Epi: [Figure 8]

Access OPEN EPI from the EpiGuide CD or from
www.openepi.com
From Open Epi menu choose PROPORTIONS from the
COUNTS folder. Click on ENTER NEW DATA button.
Populate the table provided with the numbers for the
Numerator and Denominator. Use the results of the previous
command. Click CALCULATE. A result window will be
shown. Note the results and close the result window.
TABLES SEX POS
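The rule used in Note 2 to build POS (positive when at least two of the three tests are positive) can be expressed compactly outside Epi Info. The Python sketch below is illustrative only: the function name is an assumption, "P" is the positive code used in the commands above, and "N" is a hypothetical code standing for any non-positive result.

```python
def is_seropositive(iha, iif, elisa):
    """True when at least two of the three serological tests are coded "P"."""
    positives = sum(result == "P" for result in (iha, iif, elisa))
    return positives >= 2


# Hypothetical records: (IHA, IIF, ELISA)
records = [("P", "P", "N"), ("P", "N", "N"), ("N", "P", "P"), ("N", "N", "N")]
flags = [is_seropositive(*r) for r in records]
print(flags)
print("Seroprevalence in this toy sample:", sum(flags) / len(flags))
```

Counting positives gives the same classification as the three pairwise IF blocks, and it extends naturally if a fourth test is added.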
[Figure 4 - Define new variable]
1 - Click Define to create a new variable
2 - Type the new variable name
[Figure 5 - Assign values]
1 - Click Assign
2 - Choose the variable to receive the new values
3 - Choose from the Available Variables to construct the expression
4 - Revise the Assign Expression
5 - Click OK
[Figure 6 - IF command]
1 - Click IF to establish conditions for the new variable
2 - Choose the variable to build the condition
3 - Create the condition(s) to assign the values for the new variable
4 - Click THEN to access the THEN Block
5 - Click ASSIGN
6 - Choose the variable to receive the new values
7 - Choose from the Available variables to construct the expression
8 - Revise the assign expression
9 - Click ADD to return to the IF window
10 - Click OK when finished
[Figure 7 - Tables command]
1 - Click on Tables
3 - Choose the Outcome variable
4 - Click OK
[Figure 8 - Open Epi - Proportions]
1 - Click Proportion
2 - Click Enter New Data
3 - Type the values for Numerator and Denominator
4 - Click Calculate
Question 3. Compare the T.cruzi infection prevalence between the

different ages. Is there any trend between seropositivity and
age? Create a new variable AGEGR as 7-9 and 10-12 years
old. Calculate the prevalence and prevalence ratio for each
age stratum.
Note 3: TABLES AGE POS
[Note the results - absolute values for negative and positive for each age]
From the UTILITIES menu of the EPI WIN main page select
STATCALC and select CHI SQUARE FOR TREND
[Use the data produced by the preceding table]
Press F10 to leave STATCALC
Return to ANALYSIS
[Create a new variable "AGEGR" to group by age stratum]
DEFINE AGEGR [standard]
RECODE AGE TO AGEGR
   7-9=2
   10-12=1
END [Figure 9]
TABLES AGEGR POS
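For readers who want to see what a chi-square for trend computes, the sketch below implements the Cochran-Armitage form of the test in Python. This is an assumption about the method: StatCalc may use a closely related variant (for example the Mantel extension), so results can differ slightly. The counts are hypothetical and must be replaced by the values from the TABLES AGE POS output.

```python
import math


def chi_square_trend(cases, totals, scores):
    """Cochran-Armitage chi-square for linear trend in proportions (1 df).

    cases[i] and totals[i] are the positives and the number examined in group i;
    scores[i] is the exposure score of group i (e.g. age in years).
    Returns (chi_square, p_value).
    """
    n_total = sum(totals)
    p_overall = sum(cases) / n_total
    sum_nx = sum(n * x for n, x in zip(totals, scores))
    sum_nx2 = sum(n * x * x for n, x in zip(totals, scores))
    t = sum(a * x for a, x in zip(cases, scores)) - p_overall * sum_nx
    variance = p_overall * (1 - p_overall) * (sum_nx2 - sum_nx ** 2 / n_total)
    chi2 = t ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# HYPOTHETICAL counts for three ordered groups (not from the data file)
chi2, p_value = chi_square_trend(cases=[10, 20, 30], totals=[100, 100, 100], scores=[1, 2, 3])
print(f"Chi-square for trend = {chi2:.2f}, p = {p_value:.4f}")
```

A large statistic with a small p-value indicates that the proportion positive changes steadily across the ordered groups; equal proportions in all groups give a statistic of zero.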
[Figure 9 - Recode values into a new variable]
1 - Define the new variable
2 - Click Recode
3 - Choose source variable
4 - Choose destination variable (new)
5 - Type old values or range of values
6 - Type the new values. Press enter to go to the next line
7 - Click OK
Question 4. Based on the results of questions 2 and 3, is there any

association between exposure to T. cruzi in childhood and sex
and/or age?
[If you are doing the Advanced exercise leave Analysis OPEN
and proceed to Question 5]
[If you are not doing the Advanced analysis of this exercise you
can Exit Analysis]
EXIT [to leave ANALYSIS]
Advanced Exercise
Question 5. What are the adjusted odds ratios (OR) for age group and sex after applying
a logistic regression technique? What happened to the associations between exposure
to T. cruzi and sex or age?
[To apply logistic regression analysis you will have to recode
variables SEX and AGEGR into SEX_R and AGEGR_R]
DEFINE SEX_R [standard]
RECODE SEX TO SEX_R
   1=1
   2=0
END
DEFINE AGEGR_R [standard]
RECODE AGEGR TO AGEGR_R
   1=1
   2=0
END
LOGISTIC POS = SEX_R AGEGR_R [Figure 10]
EXIT [to leave ANALYSIS]
[Figure 10 - Logistic Regression]
1 - Click Logistic Regression
2 - Choose the Outcome
3 - Choose the Independent variables
4 - Click Make Dummy to create dummy variables
5 - Type a table name to save the residuals
6 - Click OK
Exercise 2
Prevalence of and risk factors for hepatitis B infection. A cross-sectional study was
designed to measure the prevalence of serologic markers for hepatitis B virus (HBV)
infection in first-time blood donors and convicts, and to evaluate risk factors associated
with seropositivity. The view viewHepbprev, part of the EPIGUIDE.MDB project, includes
results of HBsAg and anti-HBs (ELISA) for 1,033 blood donors and 201 convicts, and
14 potential risk factor variables. Details of the methodology and population are in
Martelli et al., 1990. Positivity to HBsAg or to anti-HBs was taken as HBV infection.
The plan of analysis was designed to (1) evaluate the prevalence of the HBsAg and
anti-HBs markers in the groups of donors and convicts; (2) compare sex and age
distribution and frequency of potential risk factors between groups; and (3) calculate
the prevalence ratio of HBV positivity between exposure groups.
Before starting the exercise, route out the results to an HTML file named “Results
HEPBPREV”:
ROUTEOUT 'Results HEPBPREV'
Question 1. Estimate the prevalence of HBV carriers (HBsAg), immune
individuals (anti-HBs) and susceptible individuals (absence of
markers) in the study population. Construct a prevalence table
for HBV markers for donors and convicts. Discuss the results.
Note 1: READ 'C:\EPIGUIDE\EpiGuide.mdb':viewHepbprev
[For the variables HBSAG and ANTIHBSAG exclude code -1 (no information)]
SELECT HBSAG <> -1 AND GROUP = 2 [Figure 11]
FREQ HBSAG [Figure 12]
SELECT [to disable selection] [Figure 13]
SELECT ANTIHBSAG <> -1 AND GROUP = 2
FREQ ANTIHBSAG
SELECT
[Follow the same commands for GROUP=1]
[Create variable EXP (exposure to hepatitis B virus). For
the variable “EXP” exclude the No information entries
HBSAG = -1 AND ANTIHBSAG = -1]
DEFINE EXP [standard]

ASSIGN EXP = 2 [negative results]
IF HBSAG = 1 OR ANTIHBSAG = 1 THEN
EXP = 1
END
IF HBSAG = -1 AND ANTIHBSAG = -1 THEN

EXP = (.)
END
TABLES GROUP EXP
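The rules used to build EXP can be checked outside Epi Info with a short Python sketch. The function name and the test records are hypothetical; the codes (1 positive, 2 negative, -1 no information) follow the data file dictionary, and Python's None stands in for the Epi Info missing value (.).

```python
def classify_exposure(hbsag, antihbsag):
    """Mirror the EXP logic from the commands above.

    Returns 1 if either marker is positive, None if both markers have
    no information (-1), and 2 (negative) in every other case.
    """
    if hbsag == 1 or antihbsag == 1:
        return 1
    if hbsag == -1 and antihbsag == -1:
        return None
    return 2

# Hypothetical records: (HBSAG, ANTIHBSAG)
records = [(1, 2), (2, 1), (2, 2), (-1, -1), (2, -1)]
for hbsag, antihbsag in records:
    print(hbsag, antihbsag, classify_exposure(hbsag, antihbsag))
```

Under these rules a record with one negative marker and one marker without information is still classified as negative (2), which is worth keeping in mind when interpreting the prevalence of exposure.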
[Figure 11 – Select command]
1- Click Select
2- Choose the variable(s) to build the selection criteria
3- Define the selection criteria
4- Click OK when finished
[Figure 12 – Frequencies command]
1- Click Frequencies
2- Choose the variable(s)
3- Click OK when finished
[Figure 13 – Cancel Select]
1- Click Cancel Select
2- Click OK to cancel current selection criteria
Question 2. Compare age and sex distribution of the study populations.
Stratify the two study populations by age groups (<=29
years; 30-39 years; >=40 years) and compare seroprevalences
in each age stratum. Is there any trend in seropositivity
across age groups among the blood donors? Give possible explanations.
Note 2: MEANS AGE GROUP TABLES = (-) [Figure 14]
SELECT SEX <> 9

TABLES SEX GROUP
SELECT
[Create variable “AGEGR” (age groups)]
DEFINE AGEGR [standard]
IF AGE >= 15 AND AGE < 30 THEN
ASSIGN AGEGR = 1
END
IF AGE >= 30 AND AGE < 40 THEN
ASSIGN AGEGR = 2
END
IF AGE >= 40 THEN
ASSIGN AGEGR = 3
END
TABLES GROUP EXP STRATAVAR = AGEGR [Figure 15]
[Note the results – absolute numbers]
From Epi Info main page select the UTILITIES menu then select
STATCALC and select CHI SQUARE FOR TREND
[Use data from preceding table (Group = 2) for each level of
“AGEGR”]
Return to ANALYSIS
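To see what a chi-square for trend does with the counts noted from the stratified table, the Python sketch below computes the Mantel extension chi-square for linear trend with 1 degree of freedom. This assumes that the StatCalc result is based on the same statistic, which should be checked against the StatCalc output. The function name and the counts are hypothetical, not taken from the Hepbprev data.

```python
import math

def chi_square_trend(positives, totals, scores=None):
    """Mantel extension chi-square for linear trend (1 degree of freedom).

    positives[i] and totals[i] are the number seropositive and the number
    tested at exposure level i; scores default to 1, 2, 3, ...
    """
    if scores is None:
        scores = list(range(1, len(totals) + 1))
    n = sum(totals)
    a = sum(positives)
    sum_x_pos = sum(x * p for x, p in zip(scores, positives))
    sum_x_tot = sum(x * t for x, t in zip(scores, totals))
    sum_x2_tot = sum(x * x * t for x, t in zip(scores, totals))
    # Expected value and variance of sum(score * positives) under no trend.
    expected = a * sum_x_tot / n
    variance = a * (n - a) * (n * sum_x2_tot - sum_x_tot ** 2) / (n ** 2 * (n - 1))
    return (sum_x_pos - expected) ** 2 / variance

# Hypothetical counts for three age groups (AGEGR = 1, 2, 3).
chi2 = chi_square_trend(positives=[10, 18, 30], totals=[200, 150, 120])
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square, 1 df
print(f"chi-square for trend = {chi2:.2f}, p = {p_value:.4f}")
```

The statistic is zero when prevalence is identical at every level and grows as prevalence rises or falls steadily across the ordered levels.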
[Figure 14 – Means command]
1- Click Means
2- Choose the variable to apply the means command
3- Choose the variable to use for comparison
4- Click Settings
5- Uncheck the Show Tables in Output box
[Figure 15 – Tables with stratification]
1- Click on Tables
2- Choose the Exposure Variable
3- Choose the Outcome variable
4- Choose the Variable to Stratify by
5- Click OK
Question 3. Compare the frequency of the potential risk factors for
exposure to HBV (TRANSF, INJMED, INJDRUG,
TATTOO, VDRL, STD) among convicts and blood donors.
Are the differences statistically significant?
Note 3: SELECT <name of variable> <> -1

TABLES <name of variable> GROUP
Ex: SELECT TRANSF <> -1
Ex: TABLES TRANSF GROUP
SELECT
Repeat the procedure for each variable
Question 4. Calculate the prevalence ratio (PR) and 95% CI for
incarceration (as a risk factor) for HBV, taking the blood
donors as reference group (PR=1). Are the convicts a
population at higher risk of HBV? Discuss. Within the
group of convicts, does the number of years of incarceration
(YEXP) increase the probability of HBV infection?
Comment on a possible confounding factor in this univariate
analysis.
Note 4: TABLES GROUP EXP

SELECT GROUP = 1
TABLES YEXP EXP
[Note the results – absolute numbers]
Select STATCALC from the UTILITIES menu in the Epi
Info main page, then select CHI SQUARE FOR TREND
[Use the data produced by the preceding table
(GROUP=1)]
Return to ANALYSIS
SELECT
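The prevalence ratio and its confidence interval can also be reproduced by hand from the absolute numbers in the GROUP by EXP table. The Python sketch below uses the log-based (Taylor series) interval; it is an assumption that this matches the limits Epi Info prints, so small differences are possible. The function name and the counts are hypothetical.

```python
import math

def prevalence_ratio(pos_exposed, n_exposed, pos_reference, n_reference, z=1.96):
    """Return (PR, lower, upper) with a log-based confidence interval.

    pos_* are the numbers positive and n_* the numbers examined in the
    exposed group and in the reference group (PR = 1).
    """
    pr = (pos_exposed / n_exposed) / (pos_reference / n_reference)
    se_log = math.sqrt(1 / pos_exposed - 1 / n_exposed
                       + 1 / pos_reference - 1 / n_reference)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper

# Hypothetical counts: 20/100 positive among exposed, 10/100 in the reference group.
pr, lower, upper = prevalence_ratio(20, 100, 10, 100)
print(f"PR = {pr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that includes 1 means the data are compatible with no difference in prevalence between the two groups.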
Question 5. Assuming age as a confounding variable, adjust the PR for
HBV associated with incarceration by each age group.
Discuss results.
Note 5: TABLES GROUP EXP STRATAVAR = AGEGR
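A stratified table yields one PR per age group and a summary PR adjusted for age. As a rough check on the adjusted value, the Python sketch below computes a Mantel-Haenszel summary ratio from the stratum counts. It is an assumption that the adjusted estimate shown by Epi Info is of this type. The function name and the counts are hypothetical.

```python
def mantel_haenszel_pr(strata):
    """Mantel-Haenszel summary prevalence ratio across strata.

    strata is a list of tuples
    (pos_exposed, n_exposed, pos_reference, n_reference), one per stratum.
    """
    numerator = 0.0
    denominator = 0.0
    for pos_exp, n_exp, pos_ref, n_ref in strata:
        total = n_exp + n_ref
        numerator += pos_exp * n_ref / total
        denominator += pos_ref * n_exp / total
    return numerator / denominator

# Hypothetical counts for two age strata.
strata = [
    (20, 100, 10, 100),  # stratum 1: 20/100 exposed vs 10/100 reference
    (40, 100, 10, 50),   # stratum 2: 40/100 exposed vs 10/50 reference
]
print(f"Adjusted PR = {mantel_haenszel_pr(strata):.2f}")
```

If the adjusted PR differs clearly from the crude PR, age is acting as a confounder of the association between incarceration and HBV.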
Question 6. Calculate the PR and 95% CI of the potential risk variables
mentioned in question 3. Which variables show an
association with HBV infection? Recalculate prevalence
ratios for convicts and donors separately, considering
incarceration as a possible confounding factor. Which risk
factors remained significantly associated with HBV
after adjustment?
Note 6: SELECT <name of variable> <> -1
TABLES <name of variable> EXP
Ex: SELECT TRANSF <> -1
Ex: TABLES TRANSF EXP
TABLES TRANSF EXP STRATAVAR = GROUP
[Enter the same commands for the other variables]
[Use STATCALC to calculate the PR for each
SELECT
EXIT [to close ANALYSIS]
Question 7. Given the dynamics of HBV transmission in areas of
low/medium endemicity, what are the principal
methodological considerations relative to the study
population? Consider selection and survival biases. Discuss the
HBV serologic screening of blood donors, vaccination of
children in areas of low endemicity, and vaccination of risk
groups.
REFERENCES
ANDRADE, A.L.S.S., ZICKER, F., LUQUETTI, A.O., OLIVEIRA, R.M., SILVA,
S.A., SOUZA, J.M.P. & MARTELLI, C.M.T. Surveillance of Trypanosoma cruzi
transmission by serological screening of schoolchildren. WHO Bulletin, 70(5):625-9,
1992.
MARTELLI, C.M.T., ANDRADE, A.L.S.S., CARDOSO, D.D.P., SOUSA, L.C.S.,
SILVA, S.A., SOUSA, M.A. & ZICKER, F. Soroprevalência e fatores de risco para a
infecção pelo vírus da Hepatite B pelos marcadores AgHBs e Anti-HBs em prisioneiros
e primodoadores de sangue. Revista de Saúde Pública, 24(4):270-6, 1990.
For Analysis:
DEAN AG, ARNER TG, SUNKI GG, FRIEDMAN R, LANTINGA M, SANGAM S,
ZUBIETA JC, SULLIVAN KM, BRENDEL KA, GAO Z, FONTAINE N, SHU M,
FULLER G. Epi Info™, a database and statistics program for public health
professionals. Centers for Disease Control and Prevention, Atlanta, Georgia, USA,
2002. http://www.cdc.gov/epiinfo/downloads.htm
DEAN A.G., DEAN J.A., COULOMBIER D. et al. Epi Info™, Version 6.04, a word
processing, database, and statistics program for public health on IBM-compatible
microcomputers. http://www.cdc.gov/epiinfo/Epi6/ei6.htm
DEAN, A., SULLIVAN, K, & SOE, M.M. OpenEpi - Open Source Epidemiologic
Statistics for Public Health. http://www.openepi.com
DATA FILE DICTIONARY
Project: EPIGUIDE.MDB
File: Screen
Variable   Description               Code        Description of code
ID         Identification number     1 to 1991
MUN        Municipality              1           Posse
                                     2           Simolândia
                                     3           Guarani
SEX        Sex                       1           Male
                                     2           Female
AGE        Age in years              7 to 12
IHA        Hemagglutination test     P           Positive
                                     N           Negative
IIF        Immunofluorescence test   P           Positive
                                     N           Negative
ELISA      ELISA                     P           Positive
                                     N           Negative
Project: EPIGUIDE.MDB
File: Hepbprev
Variable    Description                              Code       Description of code
ID          Identification number
AGE         Age in years                             15 to 71
SEX         Sex                                      1          Male
                                                     2          Female
                                                     9          No information
TRANSF      Blood transfusion                        -1         No information
                                                     1          Yes
                                                     2          No
INJMED      Use of injectable medicine               -1         No information
                                                     1          Yes
                                                     2          No
INJDRUG     Use of injectable drug                   -1         No information
                                                     1          Yes
                                                     2          No
TATTOO      Presence of tattoo                       -1         No information
                                                     1          Yes
                                                     2          No
ACP         Acupuncture                              -1         No information
                                                     1          Yes
                                                     2          No
HBSAG       Serology for HBsAg                       -1         No information
                                                     1          Positive
                                                     2          Negative
ANTIHBSAG   Serology for anti-HBsAg                  -1         No information
                                                     1          Positive
                                                     2          Negative
VDRL        Serology for VDRL                        -1         No information
                                                     1          Positive
                                                     2          Negative
YEXP        Years of incarceration                   -1         Blood donors
                                                     0          Less than 1 year
                                                     1          1 year
                                                     2          2 or more years
GROUP       Population under study                   1          Convicts
                                                     2          First-time donors
STD         Report of sexually transmitted disease   1          Yes
                                                     2          No