Ce Information1 PDF
Table of Contents
Section 1. CE program
Section 2. CE PUMD
Section 3. Interview Survey
Section 4. Diary Survey
Section 5. Sample code
Section 6. CE PUMD methodology
Section 1. CE program
The Consumer Expenditure Surveys (CE) program provides data on expenditures, income, and demographic
characteristics of consumers in the United States. The CE program provides these data in tables, LABSTAT
databases, news releases, publications, and public-use microdata files.
CE data are collected by the Census Bureau for the Bureau of Labor Statistics (BLS) in two surveys, the
Interview Survey for major and/or recurring items, and the Diary Survey for more minor or frequently purchased
items. CE data are primarily used to revise the relative importance of goods and services in the market basket of
the Consumer Price Index. The CE program conducts the only Federal household survey to provide information
on the complete range of consumers' expenditures and income. For more information, see the overview section
in the CE chapter in the BLS Handbook of Methods.
Section 2. CE PUMD
CE PUMD provide the individual responses to the two surveys from respondents. The data have been adjusted to
protect the confidentiality of respondents. The CE PUMD allow researchers to analyze expenditure, income, and
demographic data beyond what is provided in published tabulations.
https://data.bls.gov/cgi-bin/print.pl/cex/pumd-getting-started-guide.htm 1/14
4/8/2019 Consumer Expenditure Surveys
CE PUMD include data from both the Interview and Diary Surveys. Most files are analogous between the two
surveys; however, the Interview Survey files contain roughly 50 additional detailed data files, as well as paradata
files that provide detail about the collection process. Table 3, Interview Survey files and content, lists the major
files currently available and their content. For a more comprehensive list of files provided in the CE PUMD, see
the Dictionary for the Interview and Diary Surveys.
For both the Interview and Diary Surveys, the files use the following conventions:
How do data users link an interview or diary for a given Consumer Unit (CU) in different files?
NEWID links data for one CU across interviews and files. Users cannot link CUs across surveys because
the Diary and Interview surveys use different samples.
How is the variable NEWID structured?
NEWID is a unique sequential number concatenated with the number of the interview. The last digit of
NEWID indicates the interview number in a series of 4, or the week of diary collection in a series of 2. All
digits before the last digit identify a CU.
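The NEWID structure described above can be illustrated with a minimal Python sketch. The function name and the sample NEWID values are hypothetical; the sketch only assumes that NEWID is available as a string or integer of digits, as the description implies.

```python
def split_newid(newid):
    """Split a NEWID into (CU identifier, interview or diary-week number).

    Assumes NEWID is a run of digits whose last digit is the interview
    number (Interview Survey) or the collection week (Diary Survey).
    """
    digits = str(newid).strip()
    if not digits.isdigit() or len(digits) < 2:
        raise ValueError(f"unexpected NEWID: {newid!r}")
    # Everything before the last digit identifies the CU.
    return digits[:-1], int(digits[-1])


# Hypothetical NEWID values, for illustration only.
cu_id, visit = split_newid("01234562")
print(cu_id, visit)
```

Keeping the CU identifier as a string preserves any leading zeros, which matters if the identifier is later used to match records across files.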
The Interview Survey is a rotating panel survey in which approximately 10,000 addresses are contacted each
calendar quarter that yield approximately 6,000 useable interviews. One-fourth of the addresses that are
contacted each quarter are new to the survey. After a housing unit has been in the sample for four consecutive
quarters, it is dropped from the survey, and a new address is selected to replace it. For more information, see the
chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
Before 2015, the Interview Survey included a preliminary bounding interview, and each CU could be contacted
up to five times over five quarters. Although data from the bounding interview were not published, its purpose
was to minimize telescoping errors.ⅰ The CE program stopped fielding the bounding interview in 2015 due to
concerns about its effectiveness in reducing telescoping errors, cost, and impact on respondent burden. For more
information, see Ian Elkin's article Recommendation regarding the use of a CE bounding interview.ⅱ
For the Interview Survey, the files use the following conventions:
Each annual data release of the CE PUMD is processed using new data and new disclosure avoidance
guidelines. For quarters that appear in two different data releases, an "x" is added to the end of the file
name. This "x" is used as an indicator to inform users that the two files were processed under a different
set of rules and conditions and therefore the content may differ slightly. It is at the user's discretion as to
which file to use.
B    Invalid blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
H    Valid blank for an expenditure that is a "parent record" where the expenditure was allocated to other records and the original expenditure was overwritten with a blank
V    Valid value; imputed or adjusted in some other way then topcoded or suppressed
W    Valid value; allocated and imputed or adjusted in some other way then topcoded or suppressed
Table 3 summarizes the Interview Survey files currently available. If users encounter a file that is not listed
below, consult the Dictionary for the Interview and Diary Surveys for additional details.
The roughly 50 detailed data files include expenditure and non-expenditure information that is collected directly
from sections of the Interview Survey (see the Survey materials page for more information). The Dictionary for
the Interview and Diary Surveys contains additional information about the content and makeup of each of
these files. Each detailed data file consists of five quarters of data. Because these files correspond to specific
sections in the survey, they differ from one another in several ways. These are the main differences:
The number of records per CU differs. Some files have multiple records per CU, some have one record
per CU, and some have no records for a given CU interviewed in a quarter.
The method to identify unique records differs. Users can identify unique records with NEWID and,
depending on the file, these variables:
SEQNO is assigned sequentially during the interview as each expenditure record is recorded into
the database.
ALCNO is assigned sequentially for each record that has been allocated from one expenditure. For
example, a CU may report spending $50 on a pair of men's pants and a shirt. The CE program will
split that record into two separate records, one for men's pants and shorts ($30) and one for
men's shirts ($20).
Here is an example of the detailed data file VEQ (Vehicles, maintenance and repair) and some of the variables it
contains.
Paradata files provide data about the interview process. The CE program began releasing paradata for the
Interview Survey in 2009. It does not release paradata for the Diary Survey. Paradata are
available in two datasets:
The Diary Survey is a panel survey in which approximately 5,000 addresses are contacted each calendar quarter
that yield approximately 3,000 useable interviews.ⅵ After a housing unit has been in the sample for two
consecutive weeks, it is dropped from the survey, and a new address is selected to replace it. For more
information, see the chapter Consumer Expenditures and Income in the BLS Handbook of Methods.
For the Diary Survey, the files use the following conventions:
B    Blank due to invalid nonresponse; nonresponse that is not consistent with other data reported by the CU
For Diary Survey expenditures located on the EXPD files, the variable ALLOC can be used to
determine whether an expenditure has been adjusted, allocated, topcoded, or any combination of the three. Table
5 lists the allocation codes and their corresponding flag values.
Table 6 summarizes the Diary Survey files currently available. If users encounter a file that is not listed below,
consult the Dictionary for the Interview and Diary Surveys for additional details.
DTID    Income imputation iterations    Annual    4    2004    NEWID, UCC, IMPNUM
Annual weighted calendar year estimates of national totals, averages, standard errors, and CVs. The
code integrates data from both surveys on a UCC level. This code is available in SAS, R, and STATA on
the PUMD documentation page.
Code to approximate the published tables. CE PUMD estimates do not fully match the table estimates.
For more information, see FAQ 26 on the CE FAQ page. This code is available in SAS on the PUMD
documentation page.
When preparing estimates with UCCs, users need to understand their hierarchical groupings. The hierarchical
stub files list the UCCs' hierarchical groupings by major category. For more information on stub files, see the
PUMD documentation page. When embarking on a research project, users should keep in mind that the data and
estimates have limitations, such as small samples and respondent recollection, that may affect the estimates.
This section describes the estimation procedures for the Interview Survey and the estimation procedures for the
Diary Survey; the formulas to estimate weighted annual calendar year estimates; and sampling statements. The
CE program integrates information from both the Interview and Diary Surveys in its publications. Therefore, any
analysis limited to only one survey may produce results that do not match the published CE estimates. In
addition, users may find that estimates do not match the published estimates due to the non-disclosure criteria
that are applied to the CE PUMD. For more information on non-disclosure requirements, see the Protection of
Respondent Confidentiality page.
For the Interview Survey, users may want to consider the following general concepts:
When calculating data for 2016, interviews conducted in January 2017 cover expenditures made between
October 2016 and December 2016, and are used to estimate data for these three months in 2016. Similarly,
interviews conducted in March 2017 cover expenditures between December 2016 and February 2017 and
are used to estimate data for December 2016. Thus, users have to use the first file for 2017 to estimate data
for the last quarter of 2016. Chart 1 illustrates that concept. The green months are those that are in
scope for the estimates of 2016, and the yellow months are those in 2017 that are out of scope.
Chart 1: Months in scope for quarter 5 (FMLI171)
A similar differentiation of scope happens at the beginning of the year. The data collected in January of
2016 are not in scope for 2016 expenditures because the January interview collects data for the last 3
months of 2015. However, data collected in February and March 2016 are partially in scope. Data
collected in February includes data for January of 2016, and data collected in March 2016 includes data for
January and February 2016. See chart 2.
Chart 2: Months in scope for quarter 1 (FMLI161)
Finally, for interviews conducted in April through December, all reference months are in scope. For example,
quarter 2 interviews conducted in April, May, and June collect expenditure data for January 2016 through May 2016,
which are all in scope for 2016. See chart 3. The same holds true for quarters 3 and 4.
Chart 3: Months in scope for quarter 2 (FMLI162)
How much does a CU contribute to a calendar year estimate in each interview month?
A CU's contribution depends on the interview month and year. For information on how to identify a CU's
contribution to a calendar year estimate, see Section 6.3 Formulas.
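The scope rules described above (a three-month recall period ending the month before the interview) can be sketched in Python. This is a minimal illustration under that assumption, not the CE program's own code; the function name and arguments are hypothetical.

```python
def months_in_scope(interview_month, interview_year, estimate_year):
    """Number of reference months (0 to 3) that fall in estimate_year.

    Assumes each interview covers the three calendar months before the
    interview month, as in the January 2017 and March 2017 examples.
    """
    if not 1 <= interview_month <= 12:
        raise ValueError("interview_month must be between 1 and 12")
    if interview_year == estimate_year:
        # January interviews cover only the prior year; February covers
        # one month of this year, March two, April onward all three.
        return min(interview_month - 1, 3)
    if interview_year == estimate_year + 1 and interview_month <= 3:
        # First-quarter interviews of the following year reach back
        # into the estimate year.
        return 4 - interview_month
    return 0


for month in (1, 2, 3, 4):
    print(month, months_in_scope(month, 2016, 2016), months_in_scope(month, 2017, 2016))
```

Dividing the result by 3 gives the share of an interview's reference period that counts toward the calendar year.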
Is the periodicity of variable values consistent across files?
No, it is not. Different files and different variables within files may have different periodicities. For more
information, see Table 3, Interview Survey files and content.
This section provides users of the Diary Survey with procedures to estimate annual calendar means.
CUs self-report a detailed description of all expenses using a product-oriented diary for two consecutive 1-week
periods. Data entries can start on any day of the week. Data collected each week are treated as statistically
independent: each week's diary is separately weighted to be representative of the population. For more
information, see the collections and data sources section in the chapter Consumer Expenditures and Income in
the BLS Handbook of Methods.
For the Diary Survey, users may want to consider the following concepts:
For example, for 2017 estimates, users need the files for quarters 1 through 4 of 2017.
How much does a CU contribute to a calendar year estimate in each interview month?
In the Diary Survey, a CU contributes 100 percent of its expenditures to the calendar year. Unlike the
Interview Survey, the Diary Survey has no lag between the time an expenditure occurs and the time it is
reported, which means that the potential contribution of each CU to the mean is the same.
6.3 Formulas
The formulas described below can be used to calculate weighted estimates that use data from both surveys. The
formulas calculate annual calendar year aggregates, averages, and standard errors for expenditures and reported
income. Although these formulas can also be used to calculate annual averages of imputed income, they
cannot be used to calculate its standard errors. For more information on this topic, see the Description of Income
Imputation Beginning with 2004 Data.
When integrating data across surveys, keep in mind that estimates created from the Diary Survey yield
a weekly amount. Users therefore need to adjust their estimates so that each survey result
represents the same time period. Multiplying the Diary Survey UCC estimate by 13 results
in a quarterly amount, which can then be summed with an Interview Survey estimate.
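The adjustment can be sketched in a few lines of Python. The function name and the dollar amounts are hypothetical and serve only to show the arithmetic.

```python
WEEKS_PER_QUARTER = 13  # multiplier given in the text


def integrated_quarterly_estimate(diary_weekly, interview_quarterly):
    """Put a weekly Diary estimate on a quarterly basis, then add the
    quarterly Interview estimate."""
    diary_quarterly = diary_weekly * WEEKS_PER_QUARTER
    return diary_quarterly + interview_quarterly


# Hypothetical amounts: $12.50 per week from the Diary Survey and
# $300.00 per quarter from the Interview Survey.
total = integrated_quarterly_estimate(12.50, 300.00)
print(total)
```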
This section presents the methods to calculate the population, aggregate values, and average values for
expenditures or income for a calendar year.
Denominator: Population
For the first four quarters, MO_SCOPE is defined by the value of QINTRVMO:
X = Expenditures or income variables by NEWID. This formula can be used for quarterly,
annual, weekly, or monthly data.
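As a rough illustration of how these pieces fit together for the Interview Survey, the following Python sketch computes a weighted calendar year mean. It rests on several assumptions that are not spelled out here: the population is taken as the sum of each record's weight scaled by MO_SCOPE/3 and divided by four quarters, the aggregate is the weighted sum of X, and X already contains only in-scope expenditures. The function name, tuple layout, and sample values are hypothetical; in CE PUMD the weight would typically be the final weight variable (FINLWT21).

```python
def weighted_annual_mean(records):
    """Weighted calendar year mean from (weight, mo_scope, x) tuples.

    Assumed formulas (a sketch, not the official definition):
      population = sum(weight * mo_scope / 3) / 4
      aggregate  = sum(weight * x)
      mean       = aggregate / population
    """
    records = list(records)  # allow generators; we iterate twice
    population = sum(w * (mo / 3) for w, mo, _ in records) / 4
    if population == 0:
        raise ValueError("no in-scope population")
    aggregate = sum(w * x for w, _, x in records)
    return aggregate / population


# Hypothetical data: four quarterly interviews, each fully in scope,
# each representing 1,000 CUs and reporting $100 for the quarter.
sample = [(1000.0, 3, 100.0)] * 4
print(weighted_annual_mean(sample))
```

With four fully in-scope quarters of $100 each, the sketch returns an annual mean of $400, which is the sum of the quarterly amounts.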
Non-sampling errors can be attributed to many sources, such as definitional difficulties, differences in the
interpretation of questions, inability or unwillingness of the respondent to provide correct information, mistakes
in recording or coding the data obtained, and other errors of collection, response, processing, coverage, and
estimation of missing data. Estimates using a small number of observations are less reliable. Research articles
examining CE measurement error and nonresponse bias are included in the CE library. The CE program
regularly examines CE data in the annual data quality assessment and compares CE results with other sources of
federal statistics. For more information, see the Data Quality and Comparisons page.
Standard error
Note that this method does not work for imputed income data. For information on
calculating sampling errors from imputed income, see the User's Guide to Income
Imputation in the CE.
The CE survey sample is a nationwide household survey representing the entire U.S. civilian noninstitutional
population. It includes people living in houses, condominiums, apartments, and group quarters such as college
dormitories. It excludes military personnel living overseas or on base, nursing home residents, and people in
prisons. The civilian noninstitutional population represents more than 98 percent of the total U.S. population. For
more information, see sample design in the chapter Consumer Expenditures and Income of the BLS Handbook
of Methods.
6.4.2 Weighting
Each CU included in the CE sample represents a given number of CUs in the U.S. population, which is
considered to be the universe. Weighting is used to adjust the relative contribution of each CU to reflect the
inverse of its selection probability, as well as to account for nonresponse and to match certain characteristics to
known control totals. For more information, see sample design in the chapter Consumer Expenditures and
Income of the BLS Handbook of Methods.
ⅰ Telescoping errors refer to the temporal displacement of an event. Respondents of the CE surveys may perceive
recent events to be more remote than they are (backwards telescoping) and distant events to be more recent than
they are (forward telescoping).
ⅱ Ian Elkin, Recommendation regarding the use of a CE bounding interview, 2013, Bureau of Labor Statistics.
ⅴ Quarterly summary expenditures are presented as two variables - one containing expenditures made in the
previous calendar quarter and one containing expenditures made in the current calendar quarter.
ⅵ For more information on the number of contacted addresses and completed interviews, see the CE Data
Quality Profile.
ⅶ Primary keys identify each unique record in the database.
U.S. Bureau of Labor Statistics | Consumer Expenditure Surveys, PSB Suite 3985, 2 Massachusetts Avenue, NE
Washington, DC 20212-0001