
DEPT. OF DEMOGRAPHY & SOCIAL STATISTICS
OBAFEMI AWOLOWO UNIVERSITY, ILE-IFE, NIGERIA

THE CONCEPT OF WEIGHTING
Many Slides courtesy of ICF Macro Intl.
Samson Olusina Bamiwuye (VISITING SCHOLAR, DPS, WITS UNIVERSITY)

November, 2013
ISSUES IN DATA USAGE (courtesy of Kofi Awusabo-Asare)
Lies
Damned lies
Statistics
 10, 40, 100, 500, 2000 Which measure of central
tendency will you use if
 Above is the salary of you were:
An employer ?
five workers in an
A Trade Union official?
establishment
Give reasons for your
choice of method

Calculate Application
(courtesy of Kofi Awusabo-Asare)
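To make the salary question concrete, the following minimal sketch computes the two most common measures of central tendency for the five salaries on the slide. It is written in Python only so that it runs without any data file; the variable names are illustrative and not part of the original exercise.

```python
from statistics import mean, median

# The five salaries quoted on the slide.
salaries = [10, 40, 100, 500, 2000]

# The mean is pulled upward by the single very high salary.
average = mean(salaries)

# The median is the salary of the middle worker once salaries are sorted.
middle = median(salaries)

print(f"mean   = {average}")
print(f"median = {middle}")
```

The large gap between the two figures is what lets an employer and a trade union official each defend a different "typical" salary.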
• Provide an overview of DHS data
• Gain practical understanding of the correct use of DHS data, especially the use of weights and the svyset suite of commands
• Prepare a plan for the processing and analysis of data
• Explore other MEASURE DHS resources
• Most slides were obtained from the 2010 DHS Fellow Workshop in Calverton, USA and the 2012 DHS Fellow Workshop in Uganda.
• Three slides from the 2012 Workshop on Analysis of Survey Data using Stata (UCC, Ghana)
• MEASURE DHS website: www.measuredhs.com
• Special thanks to the Head of DPS, Wits University - Prof Cliff Odimegwu
• World Fertility Surveys: 1972-1984
• Contraceptive Prevalence Surveys: 1976-1984
• DHS round 1: 1984-1989
• DHS round 2: 1988-1993
• DHS round 3: 1992-1997
• DHS round 4: 1997-2002
• DHS round 5: 2003-2008
• DHS round 6: 2008-2013
• DHS ROUND 7: NEW CONTRACT WON!
ROUNDS OF SURVEYS SINCE INCEPTION OF DHS

ROUND 1 ONDO DHS 1987


ROUND 2 NDHS 1990
ROUND 3 NDHS 1999
ROUND 4 NDHS 2003
ROUND 5 NDHS 2008
ROUND 6 NDHS 2013
NGIR51FL
Characters 1-2: Country initials
Characters 3-4: Type of recode (HR, PR, IR, BR, KR, MR, CR)
Character 5: DHS phase (3, 4, 5)
Character 6: File version/survey within phase (0, 1, A, J)
Characters 7-8: Type (FL, SV, DT, SD)
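The positional naming scheme above can be sketched as a small Python helper. The function name and dictionary keys are hypothetical, chosen for illustration; only the character positions come from the slide.

```python
def decode_dhs_filename(name):
    """Split an 8-character DHS recode file name into its positional parts."""
    if len(name) != 8:
        raise ValueError("expected an 8-character name such as NGIR51FL")
    return {
        "country": name[0:2],      # characters 1-2: country initials
        "recode_type": name[2:4],  # characters 3-4: type of recode
        "phase": name[4],          # character 5: DHS phase
        "version": name[5],        # character 6: file version / survey within phase
        "file_type": name[6:8],    # characters 7-8: type
    }

parts = decode_dhs_filename("NGIR51FL")
print(parts)
```

Reading a name this way before opening a file is a quick check that you have the intended country, recode type and phase.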
• NGKR51FL
• NGIR6AFL
• GHIR70FL
• ETIR61FL
• NGMR6AFL
• ZMCR51FL
• ZWBR51FL
• NGHR6AFL
• Maternal mortality
• Women’s status
• Domestic violence
• Female genital cutting
• Malaria
• Health expenditures
• HIV/AIDS
• Pill-taking behavior
• Sterilization experience
• Children’s education
• Consanguinity
• Environmental health, etc.
• DHS surveys provide high-quality, representative national and sub-national data on a wide range of maternal, child health, and adult health indicators
• Increasingly more data on key biomarkers and associated risk factors
• These data are useful in identifying higher-risk and vulnerable populations, understanding risky behaviors, assessing availability and access to services, and planning for prevention, care and support, and treatment programs
• Data are comparable across countries
• Repeat surveys in a number of countries provide information for understanding trends in key indicators to monitor and evaluate programs
• Cross-sectional data: difficult to assess causality, especially with behavioral data
• Sexual behaviors in the recent past may not correlate well with HIV infection, which may have preceded recent behavior
• Possibility of reporting bias, especially on sensitive questions relating to sexual behavior or domestic violence
• Low participation rates can bias survey results (non-response)
• Surveys exclude non-household populations, such as those living on the street or in institutions
• Non-availability of some indicators in previous surveys, and changing definitions, limit analysis of trends
• Surveys are not suitable for low-prevalence indicators (e.g., in concentrated HIV epidemics); these need large samples
• Cannot provide small area estimates, unless the sample sizes are large
• Cannot detect small changes, unless the sample sizes are large
• Surveys are not conducted annually and are relatively more expensive than routine surveillance
• Provide answers to research questions being studied
- Distributional characteristics of data
- Variance in the data
- Differences within the data
- Relationships between/among variables
Such a plan helps the researcher ensure that at the end of the study:
• all the information (s)he needs has indeed been collected, and in a standardised way;
• (s)he has not collected unnecessary data which will never be analysed.
• The plan also provides you with better insight into the feasibility of the analysis to be performed as well as the resources that are required.
• The plan for data processing and analysis must be made after careful consideration of the objectives of the study as well as of the tools developed to meet the objectives.
• The procedures for the analysis of data collected through qualitative and quantitative techniques are quite different.
When making a plan for data processing and analysis, the following issues should be considered:
• Sorting data,
• Performing quality-control checks,
• Data processing, and
• Data analysis.
• When the plan for data analysis is being developed, the data are, of course, not yet available. However, in order to visualise how the data can be organised and summarised, it is useful at this stage to construct DUMMY TABLES.
• A DUMMY TABLE contains all elements of a real table, except that the cells are still empty.
Age      Frequency    Percentage
15-19
20-24
25+
Total                 100.0
RESIDENCE     Currently using any        Not currently using any    Total
              method of contraceptive    contraceptive
              N (%)                      N (%)
Urban
Semi-Urban
Rural
Total
Chi-square = ****; df = ****; p < ****
NEVER DO ANALYSIS when:
• You have not read the DHS documentation guides, especially the DHS Recode Manual
• You have not initialized the data for analysis using the svyset range of commands
• You have not read the basic information and tables in the DHS Final Reports
• You do not know the correct weights to use or the weighting procedures
• You don’t understand how to handle missing values
ENSURE:
• You have properly registered for data access on the MEASURE DHS website and obtained your own data files
• You always retain your original data file unmodified; save changes under a different file name
• You check the frequency tabulation of each variable before further analysis (mean, bivariate or multivariate analyses)
• You understand the basic assumptions, limitations and interpretation of the statistical method you want to use
• Confirm your results first before building your do-file, especially when you are recoding variables
• Ensure you are able to replicate some tables in the FR (Final Report) before proceeding to further analysis, e.g. current use of contraceptive among ever-married women: do your % and N tally with the % and N in the FR for that variable?
• Register on the website of MEASURE DHS
• Request datasets among those available
◦ A separate request is needed for biomarker and GPS data
• Wait 24-48 hours
• Full step-by-step instructions: http://www.measuredhs.com/data/Access-Instructions.cfm
• DHS recode manual = DHS analysis bible
Download at: http://www.measuredhs.com/publications/publication-dhsg4-dhsquestionnaires-and-manuals.cfm
• DHS Final Reports provide a wealth of descriptive statistics about the most commonly used indicators; they also provide the sample sizes (denominators) for calculating them
• You always need to make sure that your sample sizes (and indicators, if the same) match those in the final report!
USING DHS DATA
THE BIG QUESTION?
Some things to check
1. Correct data file?
2. Correct denominator/selection of cases?
3. Correct weights?
4. Correct recoding? Handling of special values
5. Correct variables? Check the recode manual
6. Correct tabulation? (e.g. row vs. column percent)
• Read the table heading!
• Check your denominators
◦ Is the age range limited to men 15-49 rather than all men interviewed?
◦ Is the denominator all children under 5? Youngest children born in the last 3 years?
• Missing values: check the questionnaire to see if
some people were skipped out of the question (if
the question is not applicable to them).
– HIV knowledge: if the respondent has never heard of
HIV, it is assumed that they do not know specific ways
HIV is transmitted
– If a woman is not using contraception, she is not asked
whether her husband knows she is using it
• Missing; don’t know: DHS standard = assumed to
be the negative category
– All “yes” answers = tabulated as positive responses; “no”
answers and DK/missings are treated as equivalent
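The recoding convention described above can be sketched in a few lines of Python. The numeric codes (1 = yes, 0 = no, 8 = don't know, 9 = missing) and the use of None for a respondent skipped out of the question are assumptions made for illustration; always confirm the actual codes for a variable in the recode manual.

```python
# Hypothetical response codes for one yes/no question (check the recode manual).
YES, NO, DONT_KNOW, MISSING = 1, 0, 8, 9

def to_indicator(code):
    """Return 1 for a 'yes' answer and 0 for everything else,
    including don't know, missing, and respondents skipped out (None)."""
    return 1 if code == YES else 0

# Six hypothetical respondents; None marks someone who was not asked the question.
responses = [YES, NO, DONT_KNOW, MISSING, YES, None]

indicator = [to_indicator(r) for r in responses]

# The denominator stays at all six respondents, not only those who answered.
percent_yes = 100 * sum(indicator) / len(indicator)
print(indicator, round(percent_yes, 1))
```

Dropping the don't know, missing or skipped cases instead would shrink the denominator and inflate the percentage relative to the published tables.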
• Make sure you are weighting, and using the correct weight
◦ DHS convention is to show weighted %s AND weighted Ns
• Definition: a weight is a number equal to the inverse of the overall probability of ending up in the data, taking into account both sample selection and non-response.
• In practice: each woman interviewed is assigned a weight, or a number that reflects how much influence her data should have on the national estimates.
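As a worked illustration of the definition, the sketch below assumes a two-stage design (a cluster is selected, then a household within it) plus a response rate. The function name and all three probabilities are hypothetical numbers chosen for the arithmetic, not values from any DHS survey.

```python
def design_weight(p_cluster, p_household, response_rate):
    """Inverse of the overall probability of being selected and responding."""
    overall_probability = p_cluster * p_household * response_rate
    return 1.0 / overall_probability

# Hypothetical values: 2% of clusters sampled, 10% of households within a
# sampled cluster, and 80% of selected women actually interviewed.
weight = design_weight(p_cluster=0.02, p_household=0.10, response_rate=0.80)
print(round(weight, 1))
```

Under these assumed values each interviewed woman stands in for several hundred women in the population; the weights released in DHS files are further rescaled, which is why they are divided by 1,000,000 before use.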
• Weighting is an essential aspect in household survey data analysis.
• Weighting is not required for a population census or for fully self-weighting surveys because samples are allocated proportionately to the respective population across all strata, clusters or secondary sampling units.
• In all other cases, appropriate weights must be applied to each and every primary sampling unit to derive meaningful estimates.
• In national surveys, weighting is necessary even if a self-weighting sampling method is applied because the ‘response rates’ vary among the different population groups or secondary sampling units (thus, the representations are different).
• As such, sample weights are necessary for analysing all common household survey data sets.
• DHS samples are designed to be representative of the total population of a country.
• Sometimes, regions or provinces are under- or over-sampled to make sure enough households/women from each province are included in the total sample.
• Also, response rates (especially for HIV testing) may be very different by province or urban/rural residence
• Weights are used to restore the representativeness of the sample, so the total sample “looks like” the country’s total population
– Weights “take into account” or “adjust for” disproportionate sampling and non-response.
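The effect of over-sampling can be seen in a small numerical sketch. The two provinces, their population sizes, sample sizes and outcome counts below are all hypothetical; province B holds 10% of the population but contributes half of the sample.

```python
# Hypothetical provinces: (population size, sample size, sampled women with the outcome).
provinces = {
    "A": (9000, 500, 100),  # 20% of sampled women have the outcome
    "B": (1000, 500, 300),  # 60% of sampled women have the outcome; over-sampled
}

# Unweighted estimate: treats every sampled woman as equally representative.
total_sampled = sum(n for _, n, _ in provinces.values())
unweighted = sum(yes for _, _, yes in provinces.values()) / total_sampled

# Weighted estimate: each woman counts for (population / sample) women.
weighted_yes = 0.0
weighted_n = 0.0
for population, n, yes in provinces.values():
    weight = population / n
    weighted_yes += weight * yes
    weighted_n += weight * n
weighted = weighted_yes / weighted_n

print(f"unweighted = {unweighted:.2f}, weighted = {weighted:.2f}")
```

With these assumed figures the unweighted estimate is pulled toward the over-sampled province, while the weighted estimate matches the population mix of 90% province A and 10% province B.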
• Use the weight applicable to the unit of analysis (household, women, men)
• When doing analysis involving households, use the household weight
• When doing analysis involving women, use the women’s weight, even when involving household variables, as this information is now ‘brought down’ to the women’s level
• When doing analysis involving men, use the men’s weight
Weighting
Unit of analysis       Variable
Households             hv005
Women or children      v005
Domestic Violence      d005
Men                    mv005
HIV test results       hiv05

Don’t forget to divide by 1,000,000!
In Stata:
gen wgt=v005/1000000
• Make sure you can match your results to the Final Report tables
– A huge advantage of DHS data is that you can almost always check your work against the FR tables
• Even if you are using a subset of data (e.g. currently married women using contraception), run your tabulation for the denominator used in the FR table to check your work (e.g. all ever-married women)
– We all make mistakes in our coding. Always check your coding against the final report.
– It is your responsibility as an analyst to make sure your coding is correct. DHS cannot publish results that are incorrect.
For frequencies or tables where you don’t need significance testing or confidence intervals:
– tab var1 var2 [iw=v005/1000000]
– generate hivwgt=hiv05/1000000
– tab hiv03 v025 [iw=hivwgt]
• iw is iweight, or “importance” weight
This kind of weight is faster to use, as Stata has to perform fewer calculations
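What a weighted tabulation does can be sketched outside Stata: each record contributes its scaled weight, rather than a count of 1, to its cell. The records below are hypothetical (education level, residence, raw v005-style weight) triples, and the function name is illustrative.

```python
from collections import defaultdict

# Hypothetical records: (education level, residence, raw weight before dividing by 1,000,000).
records = [
    ("none", "rural", 1_500_000),
    ("none", "urban", 500_000),
    ("primary", "rural", 1_000_000),
    ("primary", "urban", 750_000),
    ("secondary+", "urban", 1_250_000),
]

def weighted_crosstab(rows):
    """Sum the scaled weights falling in each (row, column) cell."""
    cells = defaultdict(float)
    for row_value, col_value, raw_weight in rows:
        cells[(row_value, col_value)] += raw_weight / 1_000_000
    return dict(cells)

table = weighted_crosstab(records)
weighted_n = sum(table.values())

# Print the weighted N and the weighted percentage for every cell.
for (education, residence), n in sorted(table.items()):
    print(f"{education:<11}{residence:<7}{n:6.2f}{100 * n / weighted_n:7.1f}%")
```

This reproduces weighted Ns and percentages only; it says nothing about standard errors, which is why the survey commands are needed for significance testing.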
Without using SVY, higher levels of education are significantly associated with contraceptive use.
Using SVY, the relationship is no longer significant at the 95% level.
• For analyses where you do need significance testing or confidence intervals
• We need to tell Stata we’re using survey data so that Stata takes the sample design into account when calculating standard errors
• General format:
– svyset [pw=weight], psu(cluster) strata(strata)
◦ pweight is a sampling weight
• To tabulate with a confidence interval:
– svy: tab var1 var2, ci
• Use help svy for lots of additional information and explanation!
• To look at the standard error of education
levels among women:
– svy: tab v106, se
• To look at the confidence interval of education
levels among women by urban/rural:
– svy: tab v106 v025, col ci
• To run a Pearson’s chi-squared test
(approximation) to see if levels of education
among women are statistically significantly
different by urban/rural:
– svy: tab v106 v025, col pearson
The ‘Rule of Thumb’: Use the weight from the
smaller sample.
• Stata provides two ways to analyze survey data such as the DHS data.
The survey commands
• The preferred way is to use the family of commands that begin with svy:. (See help survey in Stata for a list of commands that can be run after svy:)
• These commands were designed especially for analyzing data from sample surveys.
• Before using any of the survey estimation commands, first use the svyset command to specify the variables that describe the stratification, sampling weight, and primary sampling unit variables.
• You can try svyset by running the following commands:
– gen psu = v021
– gen strata = v022
– gen sampwt = v005/1000000
– svyset psu [pw = sampwt], strata(strata)
where v005 = sample weight; v021 = primary sampling unit; v022 = sample stratum number
• PLEASE CHECK DHS MANUALS TO CONFIRM
APPROPRIATE STRATA
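To see why declaring strata and primary sampling units changes standard errors, here is a minimal Python sketch of the usual first-order (Taylor-linearized) between-PSU variance estimator for a weighted proportion, without a finite population correction. It is an illustration under stated assumptions, not Stata's implementation; the function names and the tiny data set are hypothetical.

```python
from collections import defaultdict
from math import sqrt

def svy_proportion(records):
    """records: (stratum, psu, weight, y) tuples with y equal to 0 or 1.
    Returns the weighted proportion and its design-based standard error."""
    records = list(records)
    total_w = sum(w for _, _, w, _ in records)
    estimate = sum(w * y for _, _, w, y in records) / total_w

    # Linearized total for each PSU: sum of w * (y - estimate) / total_w.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in records:
        psu_totals[stratum][psu] += w * (y - estimate) / total_w

    # Variance is built from the spread of PSU totals within each stratum.
    variance = 0.0
    for totals in psu_totals.values():
        n_psu = len(totals)
        if n_psu < 2:
            raise ValueError("each stratum needs at least two PSUs")
        mean_z = sum(totals.values()) / n_psu
        variance += n_psu / (n_psu - 1) * sum((z - mean_z) ** 2 for z in totals.values())
    return estimate, sqrt(variance)

def naive_se(p, n):
    """Standard error that wrongly treats the sample as simple random."""
    return sqrt(p * (1 - p) / n)

# Hypothetical data: one stratum, four PSUs of five women each, equal weights.
# Answers are identical within a PSU, i.e. strongly clustered.
data = [(1, psu, 1.0, y) for psu, y in [(1, 1), (2, 1), (3, 0), (4, 0)] for _ in range(5)]

p_hat, design_se = svy_proportion(data)
print(p_hat, round(design_se, 4), round(naive_se(p_hat, len(data)), 4))
```

In this deliberately extreme example the design-based standard error is much larger than the naive one, which is the same mechanism by which an association can stop being significant once the survey commands are used.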
• Weights restore the representativeness of sampled data
• Weights correct for non-response
• All tables in DHS final reports are weighted unless otherwise noted
• Must use weights when using DHS data!
• Unweighted results will MOST LIKELY BE REJECTED when you send such papers for publication
• CHECK DHS MANUALS FOR MORE INFORMATION
• Preliminary Reports
• Final Reports
• Key Findings
• Fact Sheets
• Wall Charts
• Policy Briefs
• Analytical Reports
• Comparative Reports
• Further Analysis Reports
• Module 1: Introduction to Demographic and Health Surveys (DHS)
• Module 2: Basic Statistics and Demographic and Health Terms
• Module 3: Indicators and the DHS
• Module 4: Steps in Conducting a DHS Survey
• Module 5: Understanding DHS Tables and Figures
• The DHS STATcompiler: www.STATcompiler.com
• The DHS STATmapper: www.STATmapper.com
• HIV/AIDS Survey Indicators Database: www.measuredhs.com/hivdata
• HIVmapper: www.HIVmapper.com
 Facilitators Guide:
• Sign up today for email alerts to receive announcements about new publications and important events (www.measuredhs.com), and also
DHS USER FORUM:
DHS Social Media
• Facebook: /measuredhs
• Twitter: @measuredhs
• YouTube: /measuredhs
• LinkedIn: /company/measuredhs
• Pinterest: /measuredhs
• TUBERCULOSIS
• FEMALE GENITAL CUTTING
• MALARIA
• NUTRITION
• MILLENNIUM DEVELOPMENT GOALS INDICATORS
• MIGRATION
YES WE CAN!
WE CAN CREATE OUR OWN USER FORUM BEFORE WE LEAVE HERE!

THANK YOU
