Analysis of the National Survey on Drug Use and Health
The National Survey on Drug Use and Health is an annual survey of the civilian
non-institutionalized population of the United States of America aged twelve
or older. The data collected is used to create both national and state-level estimates of the
use of illegal drugs, alcohol and tobacco products. The survey also gathers data on the
mental health of the civilian non-institutionalized population. This includes residents of
households living in houses, condominiums, apartments, dorms, shelters, group homes and
civilians living on military bases. Active military personnel and homeless or transient persons
not in shelters are not included. The survey polls approximately 70,000 randomly selected individuals
and is funded by the Substance Abuse and Mental Health Services Administration. The
information collected from the National Survey on Drug Use and Health provides national
and state-level estimates on the use of tobacco products, alcohol, illicit drugs (including
non-medical use of prescription drugs) and mental health in the United States.
The survey is conducted through an in-person interview at the selected person’s
place of residence. The survey uses a combination of two computer-assisted systems.
Computer assisted personal interviewing (CAPI) is used to collect demographic
information such as race, ethnicity, age and income. The CAPI system is carried by the
interviewer and improves survey data quality by preventing the interviewer from
accidentally skipping questions, preventing data entry errors, and alerting the
interviewer to inconsistent responses (Gilbert). An audio computer-assisted
self-interviewing (ACASI) system is
used for the majority of questions in the survey. Audio computer-assisted self-interviewing is used so that the interviewee may answer sensitive questions more honestly than if they were asked by an interviewer, since they can answer privately without feeling judged (Gilbert).

The main topics covered by this survey include usage of illegal drugs, use of alcohol and tobacco, nonmedical uses of prescription psychotherapeutic drugs, dependence and abuse involving drugs and alcohol, mental health problems, and the treatment of substance abuse and mental health problems. Historically the focus of the survey was on illicit drug use among youth and young adults, but over the years the scope of the survey has become broader and has expanded into other areas.

One of the main goals of the National Survey on Drug Use and Health is to identify trends in the data from year to year, not just to capture drug use in a single year. Any single survey is subject to both measurement and non-measurement errors, and one of the strengths of the National Survey on Drug Use and Health is its ability to capture trends. The survey must maintain a constant methodology in order to accurately analyze data from year to year. However, the survey must also be updated yearly in order to keep up with the various new drugs and the new ways of administering them. One of the largest issues is the emergence of new prescription drugs, especially under various brand names. The survey maintains a series of core questions which remain constant from year to year, while adding additional questions in order to accurately capture any changes which may have affected people’s interpretation of the survey questions. (SAMHSA 113)

The 2013 National Survey on Drug Use and Health was an extension of previous years’ surveys from 2005 to 2009, which were extended through 2013. The sample design for the National Survey on Drug Use and Health first stratifies the country by states, which
are the primary sampling units. Eight of the primary sampling units, California, Florida, Illinois, Michigan, New York, Ohio, Pennsylvania, and Texas, were designated as large sample states with target sample sizes of 3,600. Actual sample sizes ranged from 3,503 to 2,729. For the rest of the states the target sample size was 900 with actual sample sizes ranging from 852 to 953. Each state is then stratified into substrata based on population. The primary sampling units were substratified into 900 secondary sampling units, each with approximately the same number of sample elements within, such that each large sample state had 48 secondary sampling units and each small state had 12 secondary sampling units. The substrata were determined based on the population density of the area in order to yield the same number of elements in each region. (SAMHSA 114)

Within every secondary sampling unit, 48 Census tracts were selected with probability proportional to their population size. The survey uses Census tracts, which are the geographic regions designed for the national Census, as the first stage in the three-stage cluster design with equal probability. Within each selected Census tract, adjacent Census blocks were merged to form the second stage of the cluster sample; these are the secondary sampling units. Then, one cluster of Census blocks was chosen within each Census tract with probability proportional to sample size. These sampled segments were distributed evenly into four different samples, which were 3 month periods spanning the year. A list of 227,075 addresses was created from the addresses in each of the area segments, of which 190,067 were deemed eligible. Addresses were deemed ineligible if they contained active military personnel or were considered institutionalized group quarters such as rehabilitation or mental health facilities. (SAMHSA 116) The households which were eligible are the third stage of the cluster sample and were
selected randomly through an automated screening procedure programmed into a computer carried by the interviewers. Finally, from zero to two people from each household were chosen to participate in the survey (SAMHSA 114).

The sampling begins with each of the selected households or dwelling units being mailed an introductory letter. A field interviewer then contacts them and collects demographic information on all people living in the household or dwelling unit. A preprogrammed algorithm selects anywhere from 0 to 2 people to take the survey. This algorithm is designed to select the appropriate number of people in each age group. The interviewer interviews the selected residents separately in a private area of the dwelling in order to maintain privacy. The first half of the interview is read by the interviewer and entered into a computer, while the second part of the interview consists of the interviewee reading, or listening to the questions through headphones, and inputting the data on a computer without the interviewer knowing their responses. This interview is also available in Spanish if the interviewee requests it. Interviewees are paid $30 upon the successful completion of the interview. (SAMHSA 115)

The National Survey on Drug Use and Health consists of core and noncore sections. The core questions compose the first part of the survey, and include initial demographic items and questions pertaining to use of tobacco, alcohol, marijuana, cocaine, crack cocaine, heroin, hallucinogens, inhalants, pain relievers, tranquillizers, stimulants and sedatives. The core set of questions remains constant from year to year in an effort to gather data for basic trend measurements of prevalence estimates. The supplemental or non-core questions include perceived risks of substance abuse, injectable drug habits, substance dependence or
abuse, treatment for substance abuse, incarceration, health insurance coverage and income. (SAMHSA 115)

The data collected by field interviewers is then processed into a raw data file. Only interviews in which data on lifetime cigarette use and on at least 9 out of 13 of the other substances in the core section existed were retained as usable. Written responses in which participants marked OTHER and specified an answer were assigned a numeric code that was run through an online database to generate the appropriate variable. The next step in processing the data was the logical editing of the data. Logical editing of the data involved using data from the respondent on previous questions in order to reduce the amount of item nonresponse, and also to make related data elements consistent with each other and to resolve any other inconsistencies which may have arisen in the responses. (SAMHSA 118) This includes items that were accidentally skipped. For example, if a respondent had reported his or her last time using a drug as more than 12 months ago, but also reported first using the drug at his or her current age, then it is impossible for both of those statements to be true. If this occurred, the inconsistent period of most recent use was replaced with an "indefinite" value, and the inconsistent age at first use was replaced with a missing data code. (SAMHSA 119)

For most variables, any missing or ambiguous responses in the National Survey on Drug Use and Health are imputed using a methodology known as predictive mean neighborhoods (PMN), which was developed specifically for the survey. Predictive mean neighborhoods allow covariates to be used to determine donors, which is an improvement over hot-deck imputation. PMN also allows the relative importance of covariates to be determined by standard modeling techniques, the correlations across response variables to be accounted for by making the imputation multivariate, and
sampling weights to be easily incorporated in the models. Variables imputed using PMN are the core demographic variables, core drug use variables such as most recent usage, frequency of use and age of first use, income, health insurance, work status, and immigrant status. (SAMHSA 120)

In standard hot-deck imputation, item non-response is replaced with a response from a similar respondent who has a value for that particular data point. For random nearest neighbor hot-deck imputation, the missing or ambiguous value is replaced by a responding value from a donor randomly selected from a set of potential donors. These potential donors are those which are considered close, according to a distance function, to the unit with the missing value. PMN is a combination of a model-assisted imputation and a random nearest neighbor hot-deck procedure. The hot-deck procedure within the PMN is used to ensure that missing values which are imputed are consistent with values for other variables.

In the hot-deck procedure for PMN, the donors consist of respondents with complete data who have a predicted mean close to that of the item. The donors consist of either the set of the closest 30 respondents or the set of respondents with a predicted mean within 5 percent of the predicted mean, whichever set is smaller. If no respondents are available who have a predicted mean within 5 percent of the item value, the respondent with the predicted mean closest to that of the item is selected as the donor.

In the modeling stage of the Predictive Mean Neighborhoods method, the model chosen depended on the individual variable whose nonresponse is being imputed. The predicted means are computed both for respondents with and without missing data. The models used in the 2013 survey include binomial logistic regression, multinomial logistic regression, Poisson
regression, time-to-event regression and ordinary linear regression. Each of these models incorporates the sampling design weights. (SAMHSA 120)

The analysis weights were developed by using design-based weights as the product of the inverse of the selection probabilities at each selection stage. This survey has a four stage sample selection scheme, where an extra stage of selection occurs at the Census tract before the selection of a segment. The design-based weights $d_k$ incorporate this extra level of selection. Weight adjustments were based on Deville and Särndal's logit model. This generalized exponential model incorporates specific bounds for the adjustment factor. The final weights $w_k = d_k a_k(\lambda)$ minimize a distance function, where $a_k(\lambda)$ represents an adjustment factor which is used to account for nonresponse and to poststratify to known population control totals. The purpose of using weights in this sample design is to ensure that the weighted sample matches the actual population. This approach was used at several stages of the weight adjustment process, which include adjustment of household weights for nonresponse at the screener level, poststratification of household weights to meet population controls for various household level demographics by state, adjustment of household weights for extremes, poststratification of selected person weights, adjustment of responding person weights for nonresponse at questionnaire level, poststratification of responding person weights and
adjustment of responding person weights for extremes.

An effort was made to include as many state specific covariates into each of the multivariate models as possible in order to accurately adjust the weights. To clarify, the state specific covariates are variables which may influence the population of a state to be more or less likely to use drugs. For example, a state which has stricter laws and a larger police force may have less drug use among its population due to there being fewer available drugs to consume. This would lead to lower values for drug use and cause omitted variable bias in a regression model. It was not possible to retain all state specific covariates because subdivision of state samples by demographic covariates often produced sample sizes which were too small for the level of accuracy wanted. Instead, a hierarchical structure was used in grouping states with covariates defined at the national level, the census division level within the Nation, the State group within the census division, and the State level. (SAMHSA 121)

The variance for the total number of drug users was estimated using a Taylor series linearization approach. Estimates for the proportions of drug use were found using a ratio estimate

\[ \hat{p}_d = \frac{\hat{Y}_d}{\hat{N}_d} \]

where $\hat{Y}_d$ is a linear statistic estimating the number of substance users in the domain $d$, $\hat{N}_d$ is a linear statistic estimating the total number of persons in domain $d$, and $\hat{p}_d$ is the estimated proportion of drug users. Standard error was estimated using a Taylor series approximation approach to find an estimate for the SE of $\hat{p}_d$. When the domain size $\hat{N}_d$ is free of sampling error, an estimate of the SE for the total number of substance users is:

\[ \mathrm{SE}(\hat{Y}_d) = \hat{N}_d \, \mathrm{SE}(\hat{p}_d) \]
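As a rough illustration of the ratio estimate and the standard error of a total described above, the following Python sketch computes both from a handful of made-up weighted records. The record layout, the function names and the toy data are assumptions for illustration only, and the variance formula is the common with-replacement, between-cluster linearization estimator rather than the exact procedure used for the survey.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy data: (stratum, cluster, analysis weight, 1 if substance user else 0)
records = [
    ("A", 1, 100.0, 1), ("A", 1, 100.0, 0),
    ("A", 2, 120.0, 0), ("A", 2, 120.0, 0),
    ("B", 1, 80.0, 1), ("B", 1, 80.0, 1),
    ("B", 2, 90.0, 0), ("B", 2, 90.0, 1),
]

def ratio_estimate(rows):
    """Return (Y_hat, N_hat, p_hat) for one domain."""
    y_hat = sum(w * y for _, _, w, y in rows)  # weighted number of users
    n_hat = sum(w for _, _, w, _ in rows)      # weighted number of persons
    return y_hat, n_hat, y_hat / n_hat

def linearized_se(rows):
    """Taylor series (linearization) SE of p_hat = Y_hat / N_hat."""
    _, n_hat, p_hat = ratio_estimate(rows)
    # Linearized value w * (y - p_hat) / N_hat, totalled within each cluster.
    totals = defaultdict(lambda: defaultdict(float))
    for stratum, cluster, w, y in rows:
        totals[stratum][cluster] += w * (y - p_hat) / n_hat
    variance = 0.0
    for clusters in totals.values():
        n_h = len(clusters)
        if n_h < 2:
            continue  # one cluster gives no between-cluster variance in this sketch
        mean = sum(clusters.values()) / n_h
        variance += n_h / (n_h - 1) * sum((z - mean) ** 2 for z in clusters.values())
    return sqrt(variance)

y_hat, n_hat, p_hat = ratio_estimate(records)
se_p = linearized_se(records)
# SE of the total when the domain size is treated as free of sampling error.
se_total = n_hat * se_p
```

In this sketch the standard error of the total is simply the standard error of the proportion scaled by the domain size, which is why it is only appropriate when the domain size itself carries little or no sampling error.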

This standard error is accurate when the domain size estimates $\hat{N}_d$ are those which have been forced to match the U.S. Census Bureau population estimates through the weight calibration process. When this occurs, $\hat{N}_d$ is not subject to any additional sampling error from the sample design. Even if $\hat{N}_d$ is not forced to match U.S. Census Bureau population estimates, the estimated standard error is likely accurate as long as the variance of $\hat{N}_d$ is small relative to the variance of $\hat{p}_d$ (SAMHSA 124).

One of the largest concerns with the National Survey on Drug Use and Health is the accuracy of the data collected. Since the survey collects self-reported data, participants in the survey may provide inaccurate information either intentionally or unintentionally (Harrison, Hughes 4). A participant providing incorrect data unintentionally may be simply failing to remember all instances of drug use and their frequency, or be unaware of what drugs they have actually consumed. It is also possible that participants choose to lie on the survey because they are either ashamed of their drug use, or suspicious of reporting data on activities which are illegal, for fear of being arrested. For example, a minor being asked about drug use may be more inclined to lie about their drug use habits if they believe their parents or guardians may find out. Many people are also very hesitant about providing information to the government that would be self-incriminating. It is for these reasons that the privacy of the responses collected, as well as ensuring that the interviewee believes that their information will be private, is crucial. These are some of the reasons that the interview is conducted alone, and on an ACASI system.

Socioeconomic as well as cultural factors may influence the accuracy of the data provided by certain groups, as well as their response rates, since some groups may be less trusting of the government (Harrison, Hughes 7). While item response on the survey is generally high, interviewees were allowed to give inconclusive or inconsistent responses on whether they have ever used a given drug, or when their time of last use was. Ambiguous responses were allowed because forcing interviewees to respond to questions they felt uncomfortable answering would lead to false responses.

The National Survey on Drug Use and Health tends to be consistent with other surveys on drug use, however there are some inconsistencies. One such example is the Youth Risk Behavior Survey, which finds higher usage of drugs among high school students than the National Survey on Drug Use and Health. This discrepancy may stem from differences in the sample design, or from the survey being conducted in school as opposed to in the household. Inconsistencies may be the result of different populations being sampled, sampling methods, questionnaires, method of data collection, and methods of estimation (SAMHSA 137).

Despite some inconsistencies with other surveys, the National Survey on Drug Use and Health is widely used and accepted in various journals, and is used extensively for other surveys and research projects. The survey is the primary source of information on the use of illicit drugs, alcohol, and tobacco in the civilian, noninstitutionalized population of the United States aged 12 years old or older.

Annotated Bibliography

Substance Abuse and Mental Health Services Administration. Results from the 2013 National Survey on Drug Use and Health: Summary of National Findings. NSDUH Series H-48, HHS Publication No. (SMA) 14-4863. Rockville, MD: Substance Abuse and Mental Health Services Administration, 2014.
This report was prepared by the Center for Behavioral Health Statistics and Quality (CBHSQ), Substance Abuse and Mental Health Services Administration (SAMHSA), U.S. Department of Health and Human Services (HHS). This report presents a detailed account of the National Survey on Drug Use and Health, including both the results from the survey as well as sample design and analysis.

Harrison, L., & Hughes, A. (Eds.). (1997). The validity of self-reported drug use: Improving the accuracy of survey estimates (NIH Publication No. 97-4147, NIDA Research Monograph 167). Rockville, MD: National Institute on Drug Abuse.
This source analyzes the conditions in which survey respondents will present valid information when asked about drug use in surveys. It also analyzes the conditions in which people will lie or misrepresent drug use in surveys.

Gilbert, Nigel. "Social Research Update 3: Computer Assisted Personal Interviewing." Social Research Update 3: Computer Assisted Personal Interviewing. N.p., Mar. 1993. Web. 04 Dec. 2014.
This source documents the uses and history of CAPI as well as ACASI systems. It describes the benefits as well as issues with using these systems and their impact on the data collected.