
THE NATIONAL ACADEMIES PRESS
This PDF is available at http://nap.edu/25526

Using Models to Estimate Hog and Pig Inventories: Proceedings of a Workshop (2019)
DETAILS
110 pages | 6 x 9 | PAPERBACK

ISBN 978-0-309-49572-1 | DOI 10.17226/25526
CONTRIBUTORS
Nancy J. Kirkendall, Rapporteur; Committee on National Statistics; Division of
Behavioral and Social Sciences and Education; National Academies of Sciences,
Engineering, and Medicine
SUGGESTED CITATION
National Academies of Sciences, Engineering, and Medicine. 2019. Using Models to Estimate Hog and Pig Inventories: Proceedings of a Workshop. Washington, DC: The National Academies Press. https://doi.org/10.17226/25526.


Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press.
Unless otherwise indicated, all materials in this PDF are copyrighted by the National Academy of Sciences.
Copyright © National Academy of Sciences. All rights reserved.

Using Models to Estimate Hog and Pig Inventories: Proceedings of a Workshop
Nancy J. Kirkendall, Rapporteur
Committee on National Statistics
Division of Behavioral and Social Sciences and Education

THE NATIONAL ACADEMIES PRESS 500 Fifth Street, NW Washington, DC 20001
This activity was supported by contracts between the National Academy of Scienc-
es and the National Agricultural Statistics Service (Agreement 58-3AEU-4-005, as
amended). Any opinions, findings, conclusions, or recommendations expressed in this
publication do not necessarily reflect the views of any organization or agency that
provided support for the project.
International Standard Book Number-13: 978-0-309-49572-1

International Standard Book Number-10: 0-309-49572-5
Digital Object Identifier: https://doi.org/10.17226/25526
Additional copies of this publication are available from the National Academies Press,
500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or (202) 334-
3313; http://www.nap.edu.
Copyright 2019 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Academies of Sciences, Engineering, and Medicine. (2019). Using Models to Estimate Hog and Pig Inventories: Proceedings of a Workshop. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/25526.

The National Academy of Sciences was established in 1863 by an Act of Congress, signed by President Lincoln, as a private, nongovernmental institution
to advise the nation on issues related to science and technology. Members are
elected by their peers for outstanding contributions to research. Dr. Marcia
McNutt is president.
The National Academy of Engineering was established in 1964 under the charter
of the National Academy of Sciences to bring the practices of engineering to ad-
vising the nation. Members are elected by their peers for extraordinary contribu-
tions to engineering. Dr. John L. Anderson is president.
The National Academy of Medicine (formerly the Institute of Medicine) was established in 1970 under the charter of the National Academy of Sciences to
advise the nation on medical and health issues. Members are elected by their
peers for distinguished contributions to medicine and health. Dr. Victor J. Dzau
is president.
The three Academies work together as the National Academies of Sciences, En-
gineering, and Medicine to provide independent, objective analysis and advice
to the nation and conduct other activities to solve complex problems and inform
public policy decisions. The National Academies also encourage education and
research, recognize outstanding contributions to knowledge, and increase public
understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and Medi-
cine at www.nationalacademies.org.

Consensus Study Reports published by the National Academies of Sciences, Engineering, and Medicine document the evidence-based consensus on the study’s
statement of task by an authoring committee of experts. Reports typically in-
clude findings, conclusions, and recommendations based on information gath-
ered by the committee and the committee’s deliberations. Each report has been
subjected to a rigorous and independent peer-review process and it represents
the position of the National Academies on the statement of task.
Proceedings published by the National Academies of Sciences, Engineering, and Medicine chronicle the presentations and discussions at a workshop, symposium,
or other event convened by the National Academies. The statements and opin-
ions contained in proceedings are those of the participants and are not endorsed
by other participants, the planning committee, or the National Academies.
For information about other products and activities of the National Academies,
please visit www.nationalacademies.org/about/whatwedo.

PLANNING COMMITTEE FOR THE WORKSHOP ON USING MODELS TO ESTIMATE HOG PRODUCTION
ERIC V. SLUD (Chair), U.S. Census Bureau and University of Maryland, College Park
KATHERINE BENNETT ENSOR, Rice University
RONALD L. PLAIN, University of Missouri-Columbia (emeritus)
LEE L. SCHULZ, Iowa State University
CHRISTOPHER K. WIKLE, University of Missouri
NANCY J. KIRKENDALL, Project Director

ANTHONY MANN, Program Associate

COMMITTEE ON NATIONAL STATISTICS
ROBERT M. GROVES (Chair), Georgetown University

ANNE C. CASE, Princeton University
JANET M. CURRIE, Princeton University
DONALD A. DILLMAN, Washington State University
DIANA FARRELL, JP Morgan Chase Institute, Washington, DC
ROBERT GOERGE, University of Chicago
HILARY HOYNES, University of California, Berkeley
DANIEL KIFER, Pennsylvania State University
SHARON LOHR, Arizona State University (emerita)
THOMAS L. MESENBOURG, U.S. Census Bureau (retired)
SARAH M. NUSSER, Iowa State University
JEROME P. REITER, Duke University
JUDITH A. SELTZER, University of California, Los Angeles
C. MATTHEW SNIPP, Stanford University
JEANETTE WING, Columbia University
BRIAN HARRIS-KOJETIN, Director

CONNIE F. CITRO, Senior Scholar

Acknowledgments
This Proceedings of a Workshop has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise.
The purpose of this independent review is to provide candid and critical
comments that will assist the National Academies of Sciences, Engineer-
ing, and Medicine in making each published proceedings as sound as
possible and to ensure that it meets the institutional standards for quality,
objectivity, evidence, and responsiveness to the charge. The review com-
ments and draft manuscript remain confidential to protect the integrity of
the process.
We thank the following individuals for their review of this proceed-
ings: Cynthia Z. Clark, independent consultant, McLean, Virginia, and
Lee L. Schulz, Department of Economics, Iowa State University. We also
thank staff member Linda Casola for reading and providing helpful com-
ments on this manuscript.
Although the reviewers listed above provided many constructive
comments and suggestions, they were not asked to endorse the content
of the proceedings nor did they see the final draft before its release. The
review of this proceedings was overseen by Thomas A. Louis, Depart-
ment of Biostatistics, Johns Hopkins Bloomberg School of Public Health.
He was responsible for making certain that an independent examination
of this proceedings was carried out in accordance with standards of the
National Academies and that all review comments were carefully consid-
ered. Responsibility for the final content rests entirely with the rapporteur
and the National Academies.


Contents
1 Introduction
2 Motivation and Challenges
3 The Quarterly Hog Inventory Survey
4 Setting Official Estimates: The Hog Board
5 Modeling Efforts
6 Web-Scraping Efforts
7 Modeling Swine Population Dynamics
8 Discussion of Detection and Monitoring
9 Discussion of Modeling
10 Discussion of State-Level Estimation
11 Discussion of Visions for the Future
References
Appendixes
A Agenda and List of Participants
B Biographical Sketches of Steering Committee Members and Speakers


1
Introduction
In 2014, the National Agricultural Statistics Service (NASS) engaged the Committee on National Statistics (CNSTAT) to convene a planning committee to organize a public workshop for an expert open discussion of its then-current livestock models. The models had worked well for some time. Unfortunately, beginning in 2013, an epidemic that killed baby pigs broke out in the United States. The epidemic was not fully recognized until 2014, and it spread to many states. The result was a decline in hog inventories and pork production that was not predicted by the models.
NASS delayed the workshop until 2019 while it worked to develop
models that could help in times both of equilibrium and shock (disease or
disaster), as well as alternative approaches to help detect the onset of a
shock. The May 15, 2019, workshop was consistent with NASS’s 2014
intention, but with a focus on a model that can help predict hog invento-
ries over time, including during times of shock (see Box 1-1 for the com-
mittee Statement of Task).
PLANNING THE WORKSHOP

CNSTAT recruited and gained approval for the planning committee:
five individuals with expertise in model-based estimation and measures
of uncertainty for small geographic areas using multiple data sources;
knowledge of data sources relevant for hogs and pigs; and uses of hog
and pig estimates for decision making and analysis. The workshop was
organized into an introduction and 10 different sessions: (1) the requirements of the problem; (2) the surveys that collect hog inventory data from farmers and other auxiliary data sources; (3) the functioning of the Agricultural Statistics Board; (4) the models that have been developed and pursued; (5) current web-scraping efforts; (6) current modeling efforts; (7) discussions on detection and monitoring of disease; (8) discussions of NASS modeling efforts; (9) discussions of extending the models to provide state-level estimates; and (10) concluding thoughts about future directions. The planning committee conducted all of its preparatory work by email and teleconference.

BOX 1-1
Statement of Task
The National Research Council’s Committee on National Statistics (CNSTAT) will convene a steering committee to organize a public workshop for the National Agricultural Statistics Service (NASS) on model-based methods for producing estimates of hogs with measures of uncertainty. Hog models being developed at NASS combine information from surveys, administrative data, and expert opinion to produce the requisite estimates with the goal of replacing current methods that lack transparency and lack measures of uncertainty. The purpose of the workshop is to provide feedback on the appropriateness of these models and to suggest improvements or possible alternative approaches. Issues for the workshop to consider include the appropriateness of the modeling techniques developed by NASS, the extent to which model assumptions can be validated, the robustness of the estimates to failure of one or more assumptions, and other technical issues of model specification. In addition, the workshop will consider the suitability of the data that feed into the model and the properties desired in the estimates of uncertainty.
The Workshop on Using Models to Estimate Hog Production took
place at the National Academies of Sciences, Engineering, and Medicine
in Washington, DC, May 15, 2019. The audience included present and
past NASS leadership and technical staff.
THE WORKSHOP AND STRUCTURE OF THIS PROCEEDINGS
The workshop was opened by Eric Slud, chair of the planning committee, who explained that the intent of the workshop was to present the underlying data sources and the technical basis for models developed by NASS for estimation of national-level hog inventories and to discuss additional data sources, model improvements, and extensions such as responding better to shocks and developing state-level models. The topics, he noted, call for a variety of statistical and agricultural knowledge, well represented by the participants.
Brian Harris-Kojetin, CNSTAT director, highlighted Principles and
Practices for a Federal Statistical Agency, the flagship CNSTAT publi-
cation.1 CNSTAT was pleased to work with NASS on the organization of
the workshop, he said, reminding the audience that it was not a consensus
study, but a workshop in which individual participants present their ideas
to NASS.
Linda Young, NASS director of research and development, said the
agency thought it had a good model 5 years ago. However, the model per-
formed well during times of equilibrium, not during shocks such as dis-
ease or flooding, which is when the NASS Agricultural Statistics Board
could most use the help of a good model. She expressed hope that the
workshop would provide NASS with insights to help solve the problem
of identifying shocks, monitoring data, and using models to combine
data sources to publish high-quality estimates with standard errors, even
during times of shock. She said that NASS wants to identify the presence
of a shock in real time and determine its impact on the processes NASS
is monitoring without having to wait for final definitive data to become
available. She said that she was looking forward to the help the commit-
tee and invited participants could provide to solve this difficult problem.
After the welcoming remarks and introductions, six sessions provided
background to the problem, describing the data, estimation, and model-
ing that has been done by NASS, with each presentation followed by a
question-and-answer period. They were followed by four discussion
sessions (see Appendix A for the agenda and a list of participants). The
remainder of this proceedings reflects the six NASS presentation ses-
sions and the concluding four discussion sessions. Chapter 2 provides
background and challenges for the effort. Chapter 3 describes survey
processes and data sources. Chapter 4 describes the Hog Board and its
role in setting official estimates. Chapter 5 describes previous and current NASS modeling efforts. Chapter 6 describes NASS web-scraping efforts. Chapter 7 describes NASS modeling innovations that account for swine population dynamics. Chapter 8 turns to the first of the discussion sessions, and it summarizes the discussion of detecting and monitoring outbreaks. Chapter 9 provides a discussion of U.S.-level modeling efforts. Chapter 10 summarizes the discussion of state-level models and data sources. Chapter 11 summarizes the final discussion about visions of future work among steering committee members, invited discussants, and members of the audience.

1National Academies of Sciences, Engineering, and Medicine. (2017). Principles and Practices for a Federal Statistical Agency: Sixth Edition. Washington, DC: The National Academies Press. doi: https://doi.org/10.17226/24810.
This proceedings was prepared by a rapporteur as a factual summary
of what occurred at the workshop. The steering committee’s role was
limited to planning and convening the workshop and serving as expert
discussants during the workshop. The views contained in the proceed-
ings are those of individual workshop participants and do not necessarily
represent the views of nonparticipants, other workshop participants, the
steering committee, or the National Academies of Sciences, Engineering,
and Medicine.

2
Motivation and Challenges
In the first technical session, Nell Sedransk, director of the National Institute of Statistical Sciences, provided an overview of project goals and
challenges. She described the responsibility of the National Agricultural
Statistics Service (NASS) to produce quarterly national- and state-level
estimates of hog and pig inventories by weight groups, the NASS approach
to preparing estimates, the reasons for developing models, and challenges
raised by shocks to the system such as disease. Katherine Ensor (Rice
University) moderated the session. The presentation was followed by
questions and answers from the audience.
OVERVIEW OF HOG MODELS

Sedransk noted that her presentation and the subsequent presentations represent a NASS team effort. The findings and conclusions are those of the authors and should not be construed to represent any official U.S. Department of Agriculture (USDA) or U.S. government determination or policy.
She began by describing the clear requirements of the problem. NASS
produces and publishes quarterly estimates of U.S. hog inventories by cat-
egory, as well as estimates for selected states. Hog inventory is a Federal
Principal Economic Indicator and, as such, has clear requirements for
publication.1 Variables collected on the Quarterly Hog Inventory Survey (described in Chapter 3) and published quarterly span the process from breeding through marketing and include counts of sows farrowed (giving birth), pig crop, market hogs by four weight groups, and breeding stock. A key derived variable is litter rate (pig crop divided by sows farrowed).

1The Office of Management and Budget’s Statistical Policy Directive No. 3 states “Economic indicators must be released promptly…. (This) reduces the chance of unauthorized, premature disclosure.”
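The litter-rate definition above is simple arithmetic; the following minimal sketch shows it, with a guard against a zero denominator. The function name `litter_rate` and the input figures are hypothetical, chosen only for illustration.

```python
def litter_rate(pig_crop: float, sows_farrowed: float) -> float:
    """Return litter rate: pig crop divided by sows farrowed."""
    if sows_farrowed <= 0:
        raise ValueError("sows_farrowed must be positive")
    return pig_crop / sows_farrowed


# Hypothetical quarterly figures, for illustration only.
print(litter_rate(33_000_000, 3_000_000))  # 11.0
```

Because litter rate is a ratio of two published items, an implausible value can point to a problem in either input.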
Sedransk noted that before about 2008, NASS did not have a model.
At that time, the survey estimates and state-level recommendations
provided by the NASS state offices were reviewed and evaluated by an
expert group (now called the pre-board) within NASS, revised if neces-
sary, and provided to the Hog Board, one of the Agricultural Statistics
Boards (ASB). Using survey estimates, state-level recommendations,
revised estimates from the expert group, and auxiliary information, the
ASB made final determinations and set the official NASS estimates for
U.S. totals and then the state-level estimates for selected states (for more
information on the ASB, see Chapter 4). State estimates were constrained
to add to the determined U.S. totals.
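The presentation does not say how the constraint that state estimates add to the U.S. total is enforced; the Board's determinations are judgmental. One simple mechanical version, assumed here purely for illustration, is proportional rescaling, with an "Other" bucket standing in for unpublished states. The function name and inputs are hypothetical.

```python
def scale_to_total(state_estimates: dict[str, float], us_total: float) -> dict[str, float]:
    """Proportionally rescale state-level estimates so they sum to the U.S. total.

    An "Other" entry can hold the remainder for states that are not
    published individually, so the parts account for the national figure.
    """
    current_sum = sum(state_estimates.values())
    if current_sum <= 0:
        raise ValueError("state estimates must have a positive sum")
    factor = us_total / current_sum  # common adjustment factor for every state
    return {state: value * factor for state, value in state_estimates.items()}


# Hypothetical inputs: two published states plus an "Other" remainder.
adjusted = scale_to_total({"State A": 60.0, "State B": 30.0, "Other": 10.0}, 200.0)
print(adjusted)  # {'State A': 120.0, 'State B': 60.0, 'Other': 20.0}
```

Proportional scaling preserves each state's share. That is convenient, but it also spreads any national correction evenly, which may not match where a shock actually occurred.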
Since modeled results have become available, this process has
adapted but is largely unchanged. Present-quarter survey data are a key
input to the model. Model-based estimates are first provided to the pre-
board along with survey estimates and state recommendations. Based
on any revisions from the pre-board, the model may be rerun to obtain
model-based estimates for presentation to the Hog Board. The timing of
this sequence is such that run time for the model can be no more than 2
or 3 hours.
Sedransk described the value that could be added to the process by
good models. First, a model would provide statistical estimates with stan-
dard errors. Estimates and standard errors would be based on survey data,
as well as reflect the internal relationships in the data between and within
quarters. It could reflect long-term trends and seasonal patterns, localized
geographic events or differences, and composition of inventory based
on biology (breeding, growth, death, market). Models may be predic-
tive, and differences between the data and the prediction might help to
identify change points. A model might help to allocate national numbers
to state numbers.
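The idea that differences between data and prediction might flag change points can be illustrated with a one-sided CUSUM on standardized prediction errors. This is a generic sketch, not a method described by NASS. It assumes a model supplies predictions and a standard error `sigma`. The allowance `k` and threshold `h` are hypothetical tuning values.

```python
from typing import Optional, Sequence


def cusum_shortfall_alarm(
    observed: Sequence[float],
    predicted: Sequence[float],
    sigma: float,
    k: float = 0.5,
    h: float = 4.0,
) -> Optional[int]:
    """Return the first period where a downward CUSUM of standardized
    prediction errors exceeds h, or None if no alarm is raised."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = 0.0
    for t, (y, y_hat) in enumerate(zip(observed, predicted)):
        z = (y - y_hat) / sigma  # standardized prediction error
        s = max(0.0, s - z - k)  # accumulate evidence of a shortfall below prediction
        if s > h:
            return t
    return None


# Hypothetical series: inventories fall below prediction starting in period 3.
observed = [100, 100, 100, 97, 96, 95]
predicted = [100] * 6
print(cusum_shortfall_alarm(observed, predicted, sigma=1.0))  # 4
```

Accumulating small, persistent shortfalls lets the statistic react to a sustained decline sooner than a single-period threshold of the same false-alarm rate would.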
MODEL COMPONENTS AND AVAILABLE DATA

Sedransk identified components of a hog inventory model that will be important to include, noting that no one model will always be best. First, accounting for trends and seasonality is important. Hog inventory data have a strong seasonal component, peaking in autumn months and decreasing in spring months. The seasonality is more pronounced in the northern regions than in the south, with regional differences due to climate, weather, feed, and other factors. In addition, the data show a slow long-term upward trend that has been present for decades.
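A minimal way to represent a slow trend plus seasonality in a quarterly series is a least-squares fit of a linear trend and one level per quarter. The specification below is an assumption made for illustration; the models NASS uses are not described in this passage. Regional differences in seasonality could be handled by fitting each region separately.

```python
import numpy as np


def fit_trend_seasonal(y, period=4):
    """Least-squares fit of y_t = b*t + level[t % period].

    Returns (slope, levels, fitted), where levels[q] combines the intercept
    and the seasonal effect for quarter position q.
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    t = np.arange(n, dtype=float)
    dummies = np.zeros((n, period))
    dummies[np.arange(n), np.arange(n) % period] = 1.0  # one indicator per quarter
    X = np.column_stack([t, dummies])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ coef
    return coef[0], coef[1:], fitted
```

The residuals `y - fitted` are the kind of data-minus-prediction differences that could feed a change-detection rule.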
Constraints help to make the model estimates more consistent and rel-
evant. Constraints that may be relevant include local (temporary) caps on
slaughterhouse capacity or slaughterhouse access; biological factors (e.g.,
reflecting hog growth over time and the quarterly reporting of inventories
by weight category); and so-called balance-sheet constraints. A balance-
sheet constraint shows current inventory equal to previous inventory plus
new inventory (growth into category, purchases, and imports) minus loss
of inventory (death, slaughter, exports, or sale).
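The balance-sheet constraint can be written directly as an identity and used as a consistency check. This sketch follows the flows listed above. The class and function names are hypothetical, and a real application would need estimates for every flow.

```python
from dataclasses import dataclass


@dataclass
class QuarterFlows:
    """Flows into and out of an inventory between two quarters (head counts)."""
    growth_in: float = 0.0
    purchases: float = 0.0
    imports: float = 0.0
    deaths: float = 0.0
    slaughter: float = 0.0
    exports: float = 0.0
    sales: float = 0.0


def implied_inventory(previous: float, flows: QuarterFlows) -> float:
    """Current inventory = previous + new inventory - loss of inventory."""
    additions = flows.growth_in + flows.purchases + flows.imports
    losses = flows.deaths + flows.slaughter + flows.exports + flows.sales
    return previous + additions - losses


def balance_residual(previous: float, reported_current: float, flows: QuarterFlows) -> float:
    """Gap between a reported inventory and the balance-sheet identity; a large
    gap suggests that an input is wrong or that a flow is missing."""
    return reported_current - implied_inventory(previous, flows)


flows = QuarterFlows(growth_in=30, purchases=5, deaths=3, slaughter=25, sales=2)
print(implied_inventory(100, flows))  # 105.0
```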
Publicly available data include the historical official NASS estimates
for all inventory items at the national level and state level (for selected
states) that have been published quarterly in the Hogs and Pigs Report.2
These data include initial (first published), revised between first and final,
and the final estimate (published five quarters after the initial estimate).
The final estimate is regarded by NASS as its most accurate. The
historical survey-based hog estimates by category at the national and
state level are available within NASS. These data items have been consis-
tently reported using the same definitions since 2008. However, total hogs
have been reported annually by NASS since 1866 and total market
hogs have been reported since 1963. These long series can reveal long-
term trends and the impact of previous epidemics.
NASS also has access to information from USDA’s Food Safety and
Inspection Service—historical national-level hog slaughter numbers pub-
lished monthly. These numbers are important because almost all pork in
the market comes from hogs that go through inspected slaughterhouses.
The count of hogs slaughtered reflects the most accurate count of hogs
raised for market.
Sedransk noted that hog inventories are dominated by large operations whose supply of pork to market is a huge production process. The process thrives on uniformity. As a result, the pig and hog survival function in equilibrium is remarkably stable. In the absence of disease, hogs that weigh 100 lbs. are very likely to go to market at about 265 lbs. Hog growth, from birth to weaning and from pig crop to market weight, is well defined and stable, though there are some slight seasonal differences in the growth rates by region. The system is so stable that analysts rely heavily on past data to evaluate the reasonableness of survey results. This process works well in times of equilibrium, she said.

2For more information, see https://usda.library.cornell.edu/concern/publications/rj430453j?locale=en.
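One simple form that a reasonableness check against past data might take, assumed here for illustration, is a z-score of a current ratio against its historical values. An example ratio would be a weight class this quarter relative to the next-lighter class last quarter. The function name, the ratio, the threshold `z_max`, and the numbers are all hypothetical.

```python
from statistics import mean, stdev


def reasonableness_check(current_ratio, historical_ratios, z_max=3.0):
    """Return (flagged, z): flagged is True when the current ratio lies more
    than z_max standard deviations from its historical mean."""
    if len(historical_ratios) < 2:
        raise ValueError("need at least two historical values")
    m = mean(historical_ratios)
    s = stdev(historical_ratios)
    if s == 0:
        raise ValueError("historical ratios show no variation")
    z = (current_ratio - m) / s
    return abs(z) > z_max, z


# Hypothetical historical ratios and two candidate survey results.
history = [0.90, 0.92, 0.91, 0.93, 0.89]
print(reasonableness_check(0.915, history)[0])  # False
print(reasonableness_check(0.80, history)[0])   # True
```

Such a check works well in equilibrium. During a shock, though, a flagged value may be real, so automatically "correcting" it toward history would hide the shock.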
DISEQUILIBRIUM
Sedransk explained some of the challenges with modeling and
accounting for shocks are that the impact depends on the event (disease
may kill pigs of a specific age group, may result in smaller litters, or may
result in culling to prevent the spread of disease), as well as an opera-
tion’s response that can be localized and dynamic. Natural disasters tend
to have localized impacts, slaughterhouse capacity limitations may be
regional, and market forces can be national, regional, or localized.
Sedransk showed a number of charts illustrating the impact of the
Porcine Epidemic Diarrhea virus (PEDv) that began in 2013. The first
set of charts (not pictured here) showed time series for U.S. Total Hogs,
Iowa Total Hogs, and Colorado Total Hogs from March 2012 through
November 2013 to illustrate the start of the epidemic. The second set (see
Figure 2-1) showed the same time series but from March 2012 through
November 2015 to illustrate both the epidemic and recovery. Looking at
the data through November 2013, the U.S. and Iowa totals showed virtually no impact of the disease. Sedransk noted that Iowa is dominated by very large operators, which also dominate the U.S. total. In contrast, Colorado showed a clear drop in inventories. The Colorado plot might have been useful as an early indication of PEDv. However, the decline in Colorado was not large enough to show up in national totals because the U.S. total is not responsive to what happens in small states or small operations. Figure 2-1 illustrates both the onset of PEDv and the recovery. The U.S. series shows the decline and a relatively quick recovery. As with the U.S. total, the decline in Iowa was profound, but the recovery was quick because large operators are able to recover more rapidly. In contrast, the downturn in Colorado persisted longer. This illustrates the differences among states, between large and small operators, and the spatial component of the modeling challenge, she said.
She noted that emerging disequilibriums are challenging to detect.
They likely have a spatial component; they can be localized and may or
may not be reflected in national totals. They can be spatially dynamic.

FIGURE 2-1 Total hog inventory, United States, Iowa, and Colorado,
March 2012–November 2015.
SOURCE: Prepared by Nell Sedransk for presentation at the workshop.

They may impact large and small operators differently. The approaches
NASS has pursued to confirm or detect a shock include data diagnostics
(using existing data) and web scraping.
MODEL DECISIONS
Sedransk observed that a number of approaches to modeling can
work, and NASS needs to decide on one approach. One of the fundamen-
tal decisions is whether to pursue one comprehensive model or several
models that may be linked, switched, or compounded. If there is an equi-
librium model, diagnostics are needed to help identify departures from
equilibrium. Another question is whether the model should be top down,
starting at the national level and partitioning down to the state level, or
bottom up, starting at the operator level or state level and aggregating up
to the national level. Good data are key to a successful modeling effort,
she stressed. Data are needed at the appropriate level of detail that reflect
important aspects of the process. One question is how to incorporate new
types of data, such as spatial imaging. Sedransk noted that imputation for
nonresponse of large operators is important. If based solely on past data,
the imputation may damp out the impact of a shock, making the shock
more difficult to detect.
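The concern that history-based imputation can damp a shock can be made concrete with a toy example. This sketch assumes a hypothetical imputation rule: last quarter's report times a pre-shock historical ratio. It does not describe NASS's actual imputation procedure, and the figures are invented for illustration.

```python
def impute_carry_forward(previous_report: float, historical_ratio: float) -> float:
    """Hypothetical imputation: last quarter's report times a historical
    quarter-to-quarter ratio estimated from past (pre-shock) data."""
    return previous_report * historical_ratio


def estimated_total(respondent_reports, nonrespondent_previous, historical_ratio):
    """Sum of reported values plus imputed values for nonrespondents."""
    imputed = [impute_carry_forward(p, historical_ratio) for p in nonrespondent_previous]
    return sum(respondent_reports) + sum(imputed)


# Hypothetical: four large operations each held 100 head; a shock cuts every
# operation by 20 percent. Two respond (80 each); two do not.
true_total = 4 * 80
estimate = estimated_total([80, 80], [100, 100], historical_ratio=1.0)
print(true_total, estimate)  # 320 360.0
```

Here the estimated decline is 10 percent rather than the true 20 percent, because the imputed half of the total carries forward pre-shock behavior.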
She noted some technical issues with modeling to account for shocks,
such as how to make inferences for nonsampled operations. If an opera-
tion has an outbreak of disease, it is not necessarily true that all other
operations in the state are at risk, but those closer to the affected farm are
at greater risk. One question is how to account for this spatial component
of disease spread. Another question has to do with estimating uncertainty
for hybrid models with mixed components. Errors can be due to model fit,
model specification, or sampling variability. She reminded the audience
that computations must be done within 2 or 3 hours, a point expanded
upon during the discussion.
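The point that operations closer to an affected farm are at greater risk can be sketched with a distance-decay weight. The exponential kernel and the 25-km bandwidth below are hypothetical choices for illustration, not parameters from the workshop. The coordinates in the example are also invented.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def relative_risk(distance_km, bandwidth_km=25.0):
    """Hypothetical exponential distance-decay weight: 1 at the outbreak site."""
    return math.exp(-distance_km / bandwidth_km)


def rank_by_risk(outbreak, operations, bandwidth_km=25.0):
    """Rank (op_id, lat, lon) records by decreasing relative risk."""
    lat0, lon0 = outbreak
    scored = [
        (op_id, relative_risk(haversine_km(lat0, lon0, lat, lon), bandwidth_km))
        for op_id, lat, lon in operations
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Weights like these could, for example, raise the priority of nearby nonsampled operations when assessing an outbreak's likely reach. The appropriate kernel and bandwidth would depend on the disease.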
DISCUSSION
Ensor started the discussion by asking about the computing-time
constraint of 2 to 3 hours, given the amount of data to assimilate. Linda
Young (NASS) replied the constraint relates to the role of the NASS
Hogs and Pigs Report as a Federal Principal Economic Indicator. As
such, one of the directives is that estimates are released to the public according to a prespecified schedule. Less than 1 month elapses from the day sampling starts until publication of the official NASS estimates. There is
a time restriction on everything in this process. It takes more than 15 days
to collect and summarize the data. Sedransk added that the data collec-
tion is quite large, with about 7,500 operations across the United States
surveyed quarterly.
Gavin Corral (NASS) reported that he runs the current program to
provide model-based estimates to the ASB. The timeline of getting the
data happens very quickly. Usually the data come in on a Wednesday and
he has 2 or 3 hours before the numbers are due to the pre-board. There is
a lull while the pre-board meets, and then the model must be rerun for
the ASB with updated input from the pre-board.
Ron Plain (University of Missouri, emeritus) asked who is involved
at the state or regional level in developing state recommendations. Dan
Kerestes (NASS) replied that data are collected, edited, and analyzed by
NASS staff in the field offices. State recommendations and survey esti-
mates are prepared by field office staff who are experienced in working
with the hog data.
Ensor asked about automation of the data collection process. Kerestes
said that over the years NASS has improved the data collection process. For
example, individuals can now respond through the Internet rather than mailing
back questionnaires. Some large operators request personal interviews,
and enumerators go to their farms and conduct interviews to collect the
data. Some operators feel that it is more secure to give the information to
someone they have been seeing for the past 10 to 15 years.
Matthew Branan (Animal and Plant Health Inspection Service) asked
where the model sits in the overall process. Young replied that the model
estimates are used in the process of setting estimates. The model is run
to provide input to a pre-board analysis, where it is considered along with
other information. Based on the results of the pre-board analysis, the data
inputs to the model are updated and the model is run again to provide
input to the ASB. Corral added that the model results are considered to
represent one scenario to describe the current quarter’s data along with
other scenarios. Kerestes added that the model estimates come in as
another input to the ASB along with the recommendations from the field
offices and the analysis by the pre-board.

Lee Schulz (Iowa State University) said that, as a consumer of the data, he appreciated the opportunity to meet the people involved. He asked
whether data quality might contribute to the challenges related to detect-
ing disequilibrium and asked about improving the accuracy of the data.
Young replied she does not think data quality is the root of the problem.
Part of the reason the data collection extends so long is the need to feel
comfortable with estimates from key operators. Sometimes there are
delays from one of the operators, and numbers must be adjusted as a con-
sequence of that additional information.
Young pointed to Sedransk’s charts to indicate the problem, noting
there was no early signal for PEDv in Iowa, the nation’s top hog-producing
state. The signal for PEDv was in Colorado, which produces much less. It
is difficult to detect shocks that start with small operators in small states.
Kerestes noted that when hogs become ill, operators themselves may not
know immediately if the disease will have a big impact. Every disease is
different. It is not clear how quickly operators report incidents because
they do not always understand the situation.
Eric Slud (University of Maryland and U.S. Census Bureau) sug-
gested that it might be feasible to compute point estimates related to hog
totals within the required 2 to 3 hours, with technical documentation and
difficult-to-compute variance estimates released later. He asked whether
this might be acceptable to NASS, adding that it has been done by the
Census Bureau. Young replied that they would have to carefully consider
that option.

3
The Quarterly Hog Inventory Survey
Emilola Abayomi from the Research and Development Division of the National Agricultural Statistics Service (NASS) described the
quarterly Hog Inventory Survey, the source of key inputs to NASS offi-
cial estimates of hog inventories, with details about the sample design,
an overview of the survey process, and a description of the implica-
tions of the Porcine Epidemic Diarrhea virus (PEDv) as a major shock
to the system. The session was moderated by Ron Plain (University of
Missouri). The presentation was followed by questions and answers from
the audience.
THE HOG INVENTORY SURVEY

Abayomi reported that the Hog Inventory Survey is administered
quarterly. December is the base quarter, when the greatest number of
operations are asked to report. Follow-on surveys are administered in
March, June, and September. The target population for the survey is all
agricultural operations that own one or more hogs or pigs. The primary
estimates derived from the survey include total inventory, breeding herd,
market inventory by weight class (<50 lbs., 50–119 lbs., 120–179 lbs., and
180+ lbs.), sows farrowed, and pig crop (the latter two reported monthly
for the previous quarter).
The first step of any survey is developing a frame that will cover the
target population, she said. NASS constructs and maintains a list frame
of all known agricultural operations. Data such as contact information,
demographic information, type of agricultural entity, and variables useful
for developing agricultural surveys are maintained on the frame. Regular
maintenance and list building are needed to ensure that the frame remains
current and is representative of all U.S. agriculture. The list frame for the
Hog Survey consists of operations on the NASS list frame that are identified
as having hogs and pigs. This list frame has been estimated to cover
97 percent of all hog inventory.

TABLE 3-1 Stratum Design for Iowa

Stratum   Number of Hogs and Pigs   Sampling Weight
80        1–99                      24.0
82        100–999                   2.19
86        1,000–9,999               1.53
88        10,000–29,999             1.00
90        30,000–49,999             1.00
92        50,000–89,999             1.00
98        90,000+                   1.00
SOURCE: Prepared by Emilola Abayomi for presentation at the workshop.
She explained that states are divided into three tiers for sam-
pling. The first tier consists of the 16 top hog-producing states. A
sample of producers is selected to report for these states quarterly, and
state-level estimates are published quarterly. Most of these states have
a target coefficient of variation (CV) of 6 percent. However, seven criti-
cal states1 have a target CV of 3 percent. Fourteen states in the second
tier have a substantial amount of hog inventory. They are sampled every
quarter; however, their state estimates are only published in December.
They also have a target CV of 6 percent. The remaining states are only
sampled in December, and their state estimates are published in Decem-
ber. They have a combined target CV of 6 percent. A stratified random
sample is designed for each state. The strata are categorized by the total
number of hogs and pigs that are owned by an operation as recorded on
the list frame. Tables 3-1 and 3-2 show the stratum limits for Iowa, the
top-producing hog state, and Colorado, a top-producing state but not one
of the top seven. A random sample of operations is drawn from each
stratum in a state. Operations in stratum 98 are referred to as the extreme
operators. They always have a sampling weight of 1.00 (but other strata may
also have sampling weights of 1.00).

1These states are Illinois, Indiana, Iowa, Minnesota, Missouri, Nebraska, and North Carolina.

TABLE 3-2 Stratum Design for Colorado

Stratum   Number of Hogs and Pigs   Sampling Weight
80        1–99                      31.92
82        100–499                   1.00
98        500+                      1.00
SOURCE: Prepared by Emilola Abayomi for presentation at the workshop.
Abayomi explained that NASS uses the June Area Survey to measure
undercoverage of the NASS list frame. The area frame is completely
independent of the list and, by definition, complete. It is used to
measure, and adjust for, the estimated 3 percent undercoverage of the list.
In June, the area frame sample is selected,
farms are identified, and area frame records are linked to the list. Any of
the area frame farms that are not on the list are called NOL (not on list)
records. All of the NOL records that were identified as having hog inven-
tory are included in the Hog Inventory Survey during December2 as part
of the area frame sample. For March, June, and September, the data from
NOL records are modeled.
She went on to say that once the sample is selected, the data are col-
lected. As noted in Chapter 2, the survey timeline is fairly condensed.
There are 15 days of data collection, starting on the reference date
(December 1, March 1, and so on), followed by 4 to 5 days of editing,
imputation, and interpretation of results. There are 5 to 6 days of review
and preparation of official estimates. During the last week of the month,
the information is released to the public.
All modes of data collection are used for this survey, she said.
Mail and web are preferred because they keep data collection costs low.
However, telephone follow-ups help collect data, and some personal
interviews are conducted with larger operations that require special
accommodations.

2To maintain the independence of the list frame and the area frame, the NOL records are not added
to the list frame.

All data go through a process of review and adjustment to ensure consistency
and quality control, she said. Regardless of the data collection
mode, all the data go through an interactive system called Blaise (a computer
software editing system). Blaise uses logical edits to ensure that the
relationships between responses are consistent. Each record (response) that
fails these edits is marked as “dirty” and is subject to additional review.
“Dirty” records are reviewed by a statistician to determine whether
the information provided is accurate or whether adjustments are needed.
Revised records are evaluated through Blaise. Once all data are deemed
clean, they move to the data analysis phase. During data analysis, the stat-
istician reviews outliers and influential records in comparison to current
and past data. Once data analysis is completed, all data are deemed clean
and the process moves to a summary stage.
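Blaise has its own specification language, which is not reproduced here. The following Python sketch only illustrates what a logical edit does: it tests relationships between answers on one record and lists every rule the record fails, so that a nonempty list marks the record as "dirty." The record layout, the three rules, and the 14-pigs-per-litter ceiling are hypothetical, chosen for illustration rather than taken from NASS's edit specifications.

```python
from dataclasses import dataclass, field


@dataclass
class HogReport:
    """Hypothetical record layout; field names are illustrative, not NASS's."""
    total: int
    breeding: int
    sows_farrowed: int
    pig_crop: int
    market_by_weight: dict[str, int] = field(default_factory=dict)


def edit_failures(r: HogReport, max_pigs_per_litter: float = 14.0) -> list[str]:
    """Apply a few illustrative logical edits; an empty list means the record passes."""
    failures = []
    market = sum(r.market_by_weight.values())
    # Edit 1: breeding herd plus market hogs should equal the reported total.
    if r.breeding + market != r.total:
        failures.append(f"breeding ({r.breeding}) + market ({market}) != total ({r.total})")
    # Edit 2: a pig crop cannot be reported without any sows farrowed.
    if r.pig_crop > 0 and r.sows_farrowed == 0:
        failures.append("pig crop reported with no sows farrowed")
    # Edit 3: pigs per litter should stay below a plausibility ceiling (threshold is an assumption).
    if r.sows_farrowed > 0 and r.pig_crop / r.sows_farrowed > max_pigs_per_litter:
        failures.append("pigs per litter above plausible maximum")
    return failures


weights = {"<50": 300, "50-119": 250, "120-179": 200, "180+": 150}
clean = HogReport(total=1_000, breeding=100, sows_farrowed=40, pig_crop=440,
                  market_by_weight=weights)
dirty = HogReport(total=1_050, breeding=100, sows_farrowed=0, pig_crop=440,
                  market_by_weight=weights)
print(edit_failures(clean))       # []
print(len(edit_failures(dirty)))  # 2
```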
DISEASE SPREAD
Abayomi described shocks and then the modeling challenges posed by the
way disease spreads and the way its spread is reported in the data.
She defined a shock as any event that can cause a sudden change in
hog inventories, noting that a shock can take various forms. Examples
include natural disasters such as Hurricane Florence (North Carolina)
and Hurricane Michael (the Carolinas, Florida, Georgia) in 2018 and the
2019 flooding in Nebraska and Iowa.
A shock can also take the form of a disease, such as PEDv. As
described in Chapter 2, the disease, first detected in the United States
in 2013, kills young pigs. Disease spread is another challenge in assessing
the impact of shocks, and PEDv illustrates it. A network of
Animal and Plant Health Inspection Service (APHIS) laboratories began
to collect information about this virus in 2013 and shared the information
among themselves. Eventually, they voluntarily gave this information to
APHIS headquarters, which began to produce weekly reports on PEDv
accessions and the number of positive PEDv samples that were identi-
fied. Abayomi showed that in July 2013, PEDv was detected in 9 states.
As of November 2013, 17 states had positive detections of the virus. The
virus had gained momentum and began to spread to neighboring states.
In March 2014, 28 states had positive detections, and by June 2014, 32
states had positive detections.
In summary, she said, the virus spread within a short period. Although
it started in nine states, that number almost quadrupled in less than a
year. This example shows the relationship of geographic proximity to virus
transmission. Because of this relationship, NASS needs to be able to predict
these shocks fairly quickly and to account for their impact on its estimates.
DISCUSSION
Ron Plain asked whether response rates are greater among large
operations or small operations. Lori Harper (Methodology Division,
NASS) replied that they tend to get data from the largest operations
because NASS makes extra efforts to reach them and, if efforts fail,
NASS imputes for them. The middle-sized operations, with hog inven-
tories ranging from 5,000 to 20,000, have the highest nonresponse rates.
These operations are typically a challenge to reach via telephone if they do
not respond by mail. The smaller operators are contacted less frequently
than larger operations, tend to be at home in the evening, and answer
NASS calls.
Katherine Ensor asked whether NASS has analyzed the operator-
level individual time series data. She said it seems like a lot of data are
rolled up into summary information. Linda Young replied that the state
statisticians spend a lot of time looking at operator-level time series and
trends as part of their analysis. Headquarters statisticians also have the
operation-level data. Ensor noted that operator-level data might be par-
ticularly useful to try to understand the spread of disease.
Nell Sedransk observed that not every operator reports four times a
year; most report only two or three times. To try to identify and characterize
production patterns, NASS has restricted attention to those operators with
a long enough time series to see how the operations have changed over
time. She noted that very small operations, with fewer than 100 hogs,
tend to be highly idiosyncratic. NASS has had better luck focusing on the
middle-sized and the larger operations. She said in equilibrium, the major
producers have a very smooth throughput with few unpredicted shifts.
The middle-sized operators show a much greater impact of shocks.
Ensor commented that the individual operator time series provide a rich
dataset for understanding equilibrium as well as shocks, and asked whether
more could be done with these data. Dan Kerestes said quite a bit has
been done. As part of the processing and editing of the survey data,
analysts use a software tool that displays the company-level time series
(current and historical data) for all inventory categories. This use also
illustrates the importance of high response rates, especially among larger
companies, he noted. Chris Wikle (University of Missouri) suggested
that it might also be useful to examine the spatial aspects as well as the
time series aspects.
Eric Slud asked whether the unit-level time series are used to do
the imputation for unit-level nonresponse. Harper replied that this is done
for extreme operations. The statisticians in the field offices examine an
operation historically, looking at trends and growth to come up with
the best possible values to impute. For nonrespondents not considered
extreme operations,3 NASS uses an adjusted estimator, an extension
of a re-weighted estimator to include information about operator status
(e.g., whether still in the hog business). Slud asked how NASS adjusts
for item nonresponse or incomplete forms. Harper replied that for partial
responses, she believes NASS imputes manually, using the data
reported as well as historical data provided by that operator.
Lee Schulz (Iowa State) asked about the potential for bias in the impu-
tation process, particularly in detecting future shocks. He asked whether
PEDv resulted in any adjustments to the process or model. Gavin Corral
reported updates in the way NASS looks for shocks (described in Chapter
5) but no changes to the Kalman filter model. The current approach to
detecting shocks would identify PEDv, but with a one-quarter lag. That
model basically gives a red flag for an unusual quarter. The red flag would
be reported to the pre-board and Hog Board (described in Chapter 4). The
pre-board might use that information to develop one or two scenarios for
consideration by the board.
Nancy Kirkendall (National Academies of Sciences, Engineering, and
Medicine) provided an example of using time series models with an adjust-
ment to reflect market changes to impute for nonresponse. She reported
that the Energy Information Administration used time series models for
each respondent to a survey. The models were used to project the current
report. If a company did not respond, that projected value was adjusted by
the ratio of the sum of all reported values divided by the sum of the pro-
jected values for the companies that reported. The adjusted number was
used as an imputed value.
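The ratio adjustment Kirkendall described can be written in a few lines. This is a minimal sketch assuming the per-company projections have already been produced by some time series model (not shown); the function name, company names, and values are hypothetical.

```python
def ratio_adjusted_imputations(projected, reported):
    """Impute nonrespondents from projections scaled by respondents' reported-to-projected ratio.

    projected: dict company -> time series projection for the current report (every company).
    reported:  dict company -> value actually reported (respondents only).
    Returns a dict company -> imputed value for each nonrespondent.
    """
    respondents = [c for c in projected if c in reported]
    # Sum of reported values divided by sum of projected values, respondents only.
    ratio = sum(reported[c] for c in respondents) / sum(projected[c] for c in respondents)
    return {c: projected[c] * ratio for c in projected if c not in reported}


projected = {"A": 100.0, "B": 200.0, "C": 50.0}
reported = {"A": 110.0, "B": 220.0}
print(ratio_adjusted_imputations(projected, reported))  # C imputed at about 55
```

Here respondents as a group came in 10 percent above their projections, so the nonrespondent's projection is raised 10 percent as well. The imputation thus tracks market-wide movements that the individual company models did not anticipate.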
3Extreme operators are the largest operators in a state, that is, those assigned to stratum 98 as il-
lustrated for two states in Tables 3-1 and 3-2.

4
Setting Official Estimates: The Hog Board
Christopher Wikle (University of Missouri) moderated this session of
the workshop. He introduced Seth Riggins (National Agricultural Statistics
Service [NASS]) to summarize the hog data and the industry and to
provide detail about how NASS uses multiple data sources to set official
estimates. A discussion followed Riggins’s presentation.
AN INTRODUCTION TO THE DATA AND THE INDUSTRY

Riggins reiterated an earlier statement that one of the reasons the Hogs
and Pigs Report is important is that it is designated as a Federal Principal
Economic Indicator by the U.S. Office of Management and Budget (OMB).
For purposes of setting and evaluating estimates, the most important data
items are the comparison of current quarter inventories (by category) with
the previous quarter and with previous-year same-quarter inventories
expressed as a percentage, the number of sows farrowed (giving birth)
during the past quarter, the pig crop, and sow farrowing intentions for the
next two quarters (by month).
Riggins provided the definitions used in the data effort. Market hogs
are all hogs that are destined directly to slaughter after finishing (grow-
ing). Breeding hogs produce market hogs and future breeding gilts and
boars, but they are also eventually slaughtered. Sows farrowed are female
pigs that gave birth during the past 3 months. The pig crop is the number
of pigs born alive during the past 3 months that are still alive or were sold
or slaughtered before the survey reference date (December 1, March 1,
June 1, or September 1). Pigs per litter is the pig crop divided by sows
farrowed to give a rate of production. Farrowing intentions are reported
for the next two quarters separately. For example, in June, data are
collected for intended farrowings for June through August; in September,
for intended farrowings for September through November.

FIGURE 4-1 U.S. quarterly hogs and pigs 2010–2019.
NOTE: Graph is not zero-based, accentuating the drop.
SOURCE: Prepared by Seth Riggins for presentation at the workshop.
Riggins illustrated the quarterly number of hogs and pigs for the past
10 years (see Figure 4-1). The data show a fairly constant growth from
63.6 million hogs in March 2010 to 74.3 million hogs in March 2019. The
dip in 2013 and 2014 was due to the Porcine Epidemic Diarrhea virus
(PEDv).
Riggins also discussed changes in the industry since 1994 (see Table
4-1). In 1994, there were a large number of producers with smaller
inventories. By 2017 there were far fewer producers, and most of the inventory
(93 percent) was held by operations with 2,000 or more head.
THE ESTIMATION PROCESS

Riggins summarized the NASS estimation process, which involves
Riggins himself as the headquarters hog statistician, regional field office
statisticians, and the Agricultural Statistics Board (ASB). The process
starts with revisions to past data and progresses through compiling the
information and evaluating it, the pre-board, the Hog Board, and a brief-
ing to the secretary of agriculture just prior to release.

TABLE 4-1 Changes in the Hog Industry since 1994

                                         Hog Inventory    Percentage of Inventory on
Year              Number of Operations   (million head)   2,000+ Head Operations
1994 (December)   208,780                59.6             37
2002 (Census)     78,895                 60.4             75
2007 (Census)     75,442                 67.8             85
2012 (Census)     63,246                 66.0             90
2017 (Census)     66,439                 72.4             93
NOTE: The last column gives the percentage of total hog inventory held by
operations that have 2,000 or more hogs and pigs.
SOURCE: Prepared by Seth Riggins for presentation at the workshop.
Revisions
Riggins explained that to start the process, the headquarters hog
statistician, livestock branch chief, livestock section head, and a meth-
ods branch representative determine whether revisions are necessary for
the previous three quarters (for the March, June, or September publica-
tions) or for up to seven previous quarters for the December publication.
This review is primarily based on newly available commercial slaughter
numbers and a balance sheet approach that looks at percent changes by
inventory category. Revisions are determined at the national level, and the
state-level estimates are revised to meet the new national targets. Occa-
sionally, NASS receives data from large operations that can influence
a state-level estimate, in which case it considers a state-level revision.
He noted that there are also revisions due to the Census of Agricul-
ture, conducted every 5 years. Twenty quarters of national and state data
are open for revisions, if necessary, based on results of the most current
census. After that point, estimates are considered final and are never
revised again.
Preparing and Evaluating Current Data

The regional field office statisticians run survey summaries each
quarter once they collect, process, edit, and review the data, Riggins
explained. The data are then submitted to him as the headquarters
statistician. He reviews the state-level estimates and prepares for a pre-board
meeting (usually held the following day).
He said that the pre-board members compare preliminary national
estimates with the state recommendations. The primary data relationships
reviewed are the ratio of the current quarter to the same quarter of the
previous year, expressed as a percent; the inventories of sows farrowed and
pig crop in conjunction with the two small weight groups; and the previous
quarter's pig crop with the current 50–119 lbs., 120–179 lbs., and 180+ lbs.
weight groups. They examine national-level balance sheets for the current
data and the balance sheets for the two most recent past quarters and the
past year, which have been revised to include updated slaughter data, death
loss estimates, and imports and exports.
Riggins explained that the balance-sheet approach results in residuals
that are carefully examined. For example, at the 3-month level, the previ-
ous quarter’s inventory estimates plus the pig crop, plus imports minus
the death loss, minus exports, and minus slaughter are compared to the
current quarter’s estimate. The difference is the residual.
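The 3-month balance sheet is simple arithmetic, sketched below. The function name and all numbers are hypothetical, and the sign convention (current estimate minus the balance-sheet projection) is an assumption; the text says only that the two are compared.

```python
def balance_sheet_residual(prev_inventory, pig_crop, imports, death_loss, exports,
                           slaughter, current_estimate):
    """Residual of a 3-month hog balance sheet (all values in the same units).

    Balance-sheet projection = previous quarter's inventory + pig crop + imports
                               - death loss - exports - slaughter.
    Sign convention (an assumption): a positive residual means the current
    estimate is above what the balance sheet implies.
    """
    projected = prev_inventory + pig_crop + imports - death_loss - exports - slaughter
    return current_estimate - projected


# Hypothetical quarter, in thousand head.
r = balance_sheet_residual(prev_inventory=75_000, pig_crop=33_000, imports=1_300,
                           death_loss=3_000, exports=100, slaughter=32_000,
                           current_estimate=74_500)
print(r)  # 300
```

A residual that is persistently large in one direction suggests that either the survey estimate or one of the flow components (often slaughter or death loss) needs another look.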
One of the important things reviewed by the pre-board members is
the percentage change as the two small weight groups from the previous
quarter move into the two larger groups and part of the 50–119 lbs. group,
to make sure the current inventory accounts for the smaller pigs reported
in the previous quarter. They next look at a comparison of the
model-based numbers with the survey numbers for totals
and by various inventory categories. They use this information to develop
two or three pre-board scenarios to present to the ASB the following
morning. One scenario is based on the model-based estimates. Another
scenario might be to adjust the weight group inventories to cover the
expected growth of small pigs reported in the previous quarter to larger
weight categories.
The Agricultural Statistics Board

Riggins reported that the ASB is composed of the NASS statistics
division director, national hog statistician, livestock branch chief, live-
stock section head, methods branch representative, survey administra-
tion representative, and two or three regional field office representatives.
The purpose of the meeting is to set national targets for select inventory
categories. The survey administration representative provides an
overview of the survey results, including response rates both within sampling
strata and across states, as well as anything that might have affected the
survey process.
He said the methods branch presents an overview of the summary sta-
tistics. During this presentation, coefficients of variation (CVs), percentage
changes from the previous quarter/previous year published estimates, and
previous quarter/previous year ratios of the top 20 and top 100 producers’
survey results are discussed. The methods branch has developed a set of
ranges instead of a model or balance sheet. The methods branch
representative uses the proposed revisions together with current-to-previous
ratio summaries, as well as previous-quarter, previous-year, and 6-month
ratios, to develop ranges. These ranges are plotted on a high–low graph to see the
overlap with the point estimate and the standard error range. One method
compares the matched record ratio.1
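The chapter footnote defines the matched record ratio by the records it includes: only respondents present in the current quarter, previous quarter, and past year. Assuming the ratio is computed as a ratio of summed inventories over that matched set (an assumption; the exact NASS formula is not given here), a sketch looks like this. The function name and operation IDs are hypothetical.

```python
def matched_record_ratios(current, prev_quarter, prev_year):
    """Current-to-previous-quarter and current-to-previous-year ratios over matched records.

    Each argument maps operation ID -> reported inventory. Only operations present
    in all three periods enter the ratios, so respondents entering or leaving the
    sample do not distort the comparison.
    """
    matched = current.keys() & prev_quarter.keys() & prev_year.keys()
    if not matched:
        raise ValueError("no operation reported in all three periods")
    cur_total = sum(current[k] for k in matched)
    return (cur_total / sum(prev_quarter[k] for k in matched),
            cur_total / sum(prev_year[k] for k in matched))


# Hypothetical reports: only operations "a" and "b" appear in all three periods.
current = {"a": 100, "b": 200, "c": 50}
prev_quarter = {"a": 90, "b": 210, "d": 30}
prev_year = {"a": 80, "b": 200, "c": 40}
q_ratio, y_ratio = matched_record_ratios(current, prev_quarter, prev_year)
print(q_ratio, round(y_ratio, 3))  # 1.0 1.071
```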
There is discussion about the weather, industry news, slaughter plants,
regional field office comments, and comments about disease that may
have come in during the survey process. During the past year, for example,
African swine fever and its potential effect on the American hog producer
was a topic of discussion. The board members discuss any comments
received from extreme operators about national- or state-level changes,
imports and exports, and anything unusual from the previous quarter or
previous year. Imports and exports are fairly minor compared to overall
hog levels, but they can have some influence, he pointed out. The board
members also talk about hog prices and commercial slaughter. Typically,
over the past year, Riggins said, discussions have been about expansion
of hog production related to slaughter capacity on a daily and yearly basis
and labor supply to run shifts at the packing plants.
Finally, Riggins said, the hog statistician presents the pre-board sce-
narios and balance sheets. After a discussion, each board member enters
his or her target into a software package, and the hog statistician then
pulls up a summary of everyone’s recommended targets. A roundtable
discussion ensues about why board members chose their national targets,
and a consensus on the national targets for total inventory, sows farrowed,
pig crop, and pigs per litter is established.
1Only respondents who participated in the current quarter, previous quarter, and past year are in-
cluded in the ratio.

FIGURE 4-2 Trade expectations compared to National Agricultural Statistics
Service estimates (percentage of previous-year NASS estimate),
March 2019 example.
NOTE: Trade expectations are estimates published by non-U.S. Department
of Agriculture entities in advance of NASS release of the official
estimates.
SOURCE: Presented by Seth Riggins at the workshop, based on a slide
used in the March 2019 briefing to the secretary of agriculture. See
https://www.nass.usda.gov/Newsroom/Executive_Briefings/2019/03-28-2019.pdf.
After national targets are set, the headquarters statistician works
with the regional field office representatives over the next few business
days. They use a top-down approach to setting the national- and state-
level estimates. First, national numbers for weight groups are set, then
the state-level estimates are set to sum to the national targets. All states
must still balance within the weight groups, and the sows farrowed, pig
crop, and pigs per litter also have to balance within the state and at the
national level.
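Proportional scaling is one simple way to force state estimates to sum to a national target, and it is shown here only as a hedged sketch. It is not NASS's procedure: the process described above has analysts working state by state so that weight groups, sows farrowed, pig crop, and pigs per litter all still balance, which scaling a single category does not guarantee. The function name and state values are hypothetical.

```python
import math


def scale_to_target(state_estimates, national_target):
    """Proportionally scale state estimates so they sum exactly to an integer national target.

    Uses largest-remainder rounding: floor every scaled value, then give the leftover
    units to the states with the largest fractional parts.
    """
    total = sum(state_estimates.values())
    raw = {s: v * national_target / total for s, v in state_estimates.items()}
    result = {s: math.floor(x) for s, x in raw.items()}
    leftover = national_target - sum(result.values())
    for s in sorted(raw, key=lambda k: raw[k] - result[k], reverse=True)[:leftover]:
        result[s] += 1
    return result


# Hypothetical survey-based state estimates and national target (thousand head).
states = {"IA": 23_000, "MN": 9_000, "NC": 9_500}
print(scale_to_target(states, 42_000))  # {'IA': 23277, 'MN': 9108, 'NC': 9615}
```

Largest-remainder rounding is used so that the rounded state figures add exactly to the target rather than drifting by a few units.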
Publication and Briefing the Secretary of Agriculture

Riggins explained that publication is usually 1 week after the ASB
meets and always within 30 days of the reference date. The Hogs and
Pigs Report is a confidential publication, so before public release the
data cannot be discussed outside the group of staff entrusted with the
data. A briefing to the secretary of agriculture is given in a “lockup”
shortly before release. (The lockup procedures were detailed in one of
the appendixes provided to the committee and discussants participating in
the workshop.)
The briefing informs the secretary of how the survey process went,
anything unusual in the industry, import/export changes, revisions to
previous quarters, and trade expectations. Trade expectations are discussed
so the secretary is aware of how the report may influence the market.
However, NASS does not receive pre-report trade expectations until the day
before the executive briefing, and they have no influence on setting the
national targets. As an example, Riggins showed a slide used in the
March 2019 briefing to the secretary (see Figure 4-2). The red squares are
the NASS official estimates. The black dots are the average of the
industry’s pre-report expectations, and the gray bars show their range.
DISCUSSION
Ron Plain observed that the reference date for the survey data is
the first day of the month. The data are released about the 28th day of the
month, and approximately three-quarters of the 180+ lbs. weight group
will have been slaughtered by the release date. He asked whether NASS
looks at slaughter over the past 28 days and uses it to adjust the 180+ lbs.
weight group inventory.
Riggins replied that at the time a state has to send in its estimates,
only 2 weeks of preliminary daily slaughter data are available. At that
point, slaughter is a weak indication compared to the survey data.
While there are more data by the 28th of the month, there is no time to
adjust the 180+ lbs. group. Moreover, he added, the slaughter data are
still fairly preliminary. NASS takes the first 2 weeks of slaughter data
into consideration but does not let those data influence how it looks at
the survey estimates.
Lee Schulz referred to the trade expectations, noting that a balance sheet
approach is used. He asked about the possibility of NASS using the trade
expectations data. Riggins replied that the preliminary slaughter data are
informative, but much can happen in 2 weeks. The trade expectation data
are only available a day before release, Riggins said, although Schulz
commented that the data may be prepared earlier. Dan Kerestes, a NASS
member of the ASB, commented that trade expectations can go in any direction,
as seen in Figure 4-2. He noted that NASS must approach trade data with
caution. NASS tries to use as much slaughter information as possible, but
there is a point in time where information must be cut off because time
is limited. The Hog Board, in the course of a 2-hour meeting, discusses
slaughter data, weather conditions, and disease, he added.
Kerestes said that after the ASB meeting, Riggins finalizes weight
category inventories and state numbers, making sure all information is
accounted for, makes biological sense within each state, and adds up to
the national level. As much slaughter and trade data as possible are used,
within the time constraints, in an effort to be statistically accurate and
unbiased, he said.
Linda Young pointed to an example of potential bias in Figure 4-2. She
noted that March 1 is the reference date, and estimates reflect the
inventory at that point in time. She asked Riggins to explain why the NASS
estimates in the bottom two lines (March to May farrowings and June
to August farrowings) differed so much from trade expectations.
Riggins replied that on this particular March 1 (the survey reference
date), information became available about the effect of African swine
fever on the Chinese herd. There was some indication that the Chinese
might buy more pork as a result, and the futures market for pork in 6 to 9
months went up quite a bit. He said he suspects that the industry forecast-
ers assumed an increase in farrowing over what the operations actually
reported on March 1.
Katherine Ensor asked about the context in which model results are
used in this process. She questioned whether an improved model would
have more weight in the board discussion for its final estimates. She also
asked whether shocks would be overweighted if the model took them into
account more fully. Young said she understood that if a model tracks the
final estimates better than the current process does, the board will rely on
the model more heavily. Everybody wants the most accurate possible numbers,
she stressed, but right now, no modeling effort does that.
Ensor commented that the decision process Riggins described, which couples
expert judgment with statistical models, is a sound decision framework. She
agreed 2 hours is not very long for a discussion of this nature and asked
about the opportunity for a “pre-pre-board,” perhaps a week earlier, to get
expert opinion to integrate into the modeling discussions. Young replied
that the relevant information would not be available that far in advance.
The flow of information is the real problem with improving timeliness, she said.

5
Modeling Efforts
Lee Schulz (Iowa State University) moderated this workshop session,
introducing Gavin Corral from the Research Division of the National
Agricultural Statistics Service (NASS) to summarize the modeling
efforts that NASS has undertaken over the past few years and to
comment on their strengths and weaknesses.
Corral said that the goals of his presentation were to identify the fun-
damental elements of the Kalman filter model (KFM) for hog inventory,
provide NASS criteria for model evaluation, and discuss model perfor-
mance. He also discussed a second model, the sequential generalized
linear model (SGLM), its performance, and a comparison between the
two models. He concluded with a discussion of a third model that pro-
duces shock diagnostics.
PURPOSES OF A MODEL
As context, he said, the purpose of a hog inventory model is to pro-
duce estimates for the hog inventory categories described in earlier chap-
ters. For purposes of evaluation, NASS compares the model estimates to
the initial official estimate released by the Hog Board. It would like
to have the difference between the two estimates within about 470,000
hogs, or approximately 1 day of slaughter. It also compares the model
estimates to the final revised estimate, available 1 year later. During times
of equilibrium, the initial and the final official estimates tend to be very
close, and NASS usually uses the initial official estimate for comparison
because it is available sooner. During times of shock, however, the initial
and final estimates diverge, and NASS then uses the final official estimate
for comparison because it is more accurate.
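The evaluation rule above (compare with the initial official estimate in equilibrium, with the final estimate during a shock, and aim for a gap under about 470,000 head) can be sketched as a small check. The function name, its signature, and the example figures are hypothetical; only the tolerance and the choice of benchmark come from the presentation.

```python
ONE_DAY_OF_SLAUGHTER = 470_000  # head; the tolerance quoted in the presentation


def model_within_tolerance(model_estimate, initial_official, final_official=None,
                           shock=False, tolerance=ONE_DAY_OF_SLAUGHTER):
    """Compare a model estimate with the official estimate used as the benchmark.

    In equilibrium the initial official estimate is used; in a shock quarter the
    final (revised) official estimate is used, so it must be supplied.
    """
    if shock:
        if final_official is None:
            raise ValueError("a shock quarter needs the final official estimate")
        benchmark = final_official
    else:
        benchmark = initial_official
    return abs(model_estimate - benchmark) <= tolerance


# Hypothetical quarters (head).
print(model_within_tolerance(74_100_000, 74_300_000))                          # True
print(model_within_tolerance(66_000_000, 66_200_000, 65_400_000, shock=True))  # False
```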
NASS wants model-based estimates to be as efficient as possi-
ble, as measured by coefficients of variation (CVs). The models should
respect the interrelationships between categories of hog inventory over
time, referred to as satisfying biological constraints. NASS wants the
number of hogs in the system at one point in time to make sense with
the number of hogs slaughtered at another point in time. The survey
results alone may fail to do this; they tend to be biased downward in ways
that may not reflect the hog growth life cycle.
NASS seeks model-based estimates that provide accurate estimates
of inventory during times of shock. While disease (i.e., Porcine Epi-
demic Diarrhea virus [PEDv]) was a key issue in 2013 to 2015, NASS
would like to make accurate estimates during all shocks, whether disease,
natural disasters, tariffs, or other causes. Sometimes changes in the
industry before or after a shock also affect hog inventories, and these
changes are also important to track.
Corral discussed the KFM and SGLM against four criteria. First,
how well does each model capture inventories during times of equilibrium?
Second, how well does it detect and adapt to shocks, such as the
PEDv? Third, how well does it account for the biological considerations
of the hog lifecycle? And, finally, how well does it satisfy the balance
sheet constraints that incorporate slaughter data, imports, and exports?
KALMAN FILTER MODEL

Corral gave a quick overview of the KFM, detailed more fully in
Busselberg (2013). The model is a state-space approach, with a state equation that describes the state of the system and how it changes over time; in this case, the state equation includes a transition matrix that describes how national-level hog inventories change over time. The current state depends on the state in the past five quarters, to capture cycle dynamics and annual trends. The measurement (or observation) equation describes how the measurements (e.g., survey data) relate to the state. The Kalman filter is used to update the time series estimates of the state (hog inventories) given the survey data. The model has worked well during times of equilibrium.
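The predict and update cycle described above can be sketched with a generic linear Kalman filter. This is a minimal illustration under stated assumptions, not the NASS KFM: it tracks one hypothetical inventory series whose state is an AR(5) process in companion form (five quarterly lags, echoing the five-quarter dependence in the text), with made-up coefficients and noise variances, and it includes none of the biological or balance-sheet constraints.

```python
import numpy as np


def kalman_filter(y, F, H, Q, R, x0, P0):
    """Generic linear Kalman filter.

    y: (T, m) observations, with np.nan marking a missing observation.
    F: (n, n) state transition matrix; H: (m, n) observation matrix.
    Q, R: state and observation noise covariances.
    Returns the (T, n) filtered state means.
    """
    x, P = x0.astype(float).copy(), P0.astype(float).copy()
    filtered = []
    for obs in y:
        # Predict: push the state through the state (transition) equation
        x = F @ x
        P = F @ P @ F.T + Q
        if not np.any(np.isnan(obs)):
            # Update: correct the prediction with the survey observation
            S = H @ P @ H.T + R
            K = P @ H.T @ np.linalg.inv(S)
            x = x + K @ (obs - H @ x)
            P = (np.eye(len(x)) - K @ H) @ P
        filtered.append(x.copy())
    return np.array(filtered)


# Hypothetical AR(5) coefficients on quarterly deviations from a long-run level
phi = np.array([0.5, 0.1, 0.0, 0.3, -0.1])
n = len(phi)
F = np.zeros((n, n))
F[0, :] = phi                 # current quarter depends on the past five quarters
F[1:, :-1] = np.eye(n - 1)    # shift older quarters down the state vector
H = np.zeros((1, n))
H[0, 0] = 1.0                 # the survey observes the current quarter only
Q = np.diag([1.0] + [0.0] * (n - 1))
R = np.array([[4.0]])

rng = np.random.default_rng(0)
T = 24
true = np.zeros(T)
for t in range(n, T):
    true[t] = phi @ true[t - n:t][::-1] + rng.normal()
y = (true + rng.normal(scale=2.0, size=T)).reshape(-1, 1)

xf = kalman_filter(y, F, H, Q, R, np.zeros(n), np.eye(n) * 10.0)
print(np.round(xf[:, 0], 2))  # filtered estimate of the current-quarter level
```

A quarter with no survey observation (a `nan` row) simply skips the update step, so the estimate for that quarter is the model's prediction alone.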

MODELING EFFORTS 29
This model has a number of constraints built into the state equations
to reflect the biological considerations and balance sheet relationships. For
example, there is a limit on the ratio of death loss (of pigs weaned) to pig
crop, and the annual increase of pig crop must be greater than the annual
increase in market weight groups. There is a weight group transition, with
an assumption about the growth of pigs within weight classes. The annual
increase in slaughter is equal to the annual increase in births for the two
preceding quarters. The total number of market hogs in a quarter should
equal the combined total slaughter numbers for the next two quarters. This
is related to the 6-month time period for hogs going from weight group
one to slaughter. There is a constraint that relates market hogs more than
180 lbs. to slaughter during the estimation quarter but after the reference
quarter. Although the quarter is in progress, daily slaughter information
is still available. At the time of the board meeting, 2 full weeks of daily
slaughter information is available. Another constraint is that sows far-
rowed make up one-half of the previous quarter’s breeding herd. Finally,
he noted, the KFM includes a constraint for a constant survival rate
across all weight classes (not considered in the new model, as discussed
in Chapter 7).
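Two of the accounting relationships above (market hogs in a quarter versus slaughter over the next two quarters, and sows farrowed versus one-half of the previous quarter's breeding herd) can be expressed as residual checks. The sketch below is only an illustration: NASS builds such constraints into the state equations rather than checking them afterward, the `Quarter` fields and the 2 percent tolerance are hypothetical, and imports, exports, and death loss are ignored.

```python
from dataclasses import dataclass


@dataclass
class Quarter:
    market_hogs: float    # market hog inventory at the quarterly reference date
    slaughter: float      # total slaughter during the quarter
    breeding_herd: float  # breeding herd inventory at the reference date
    sows_farrowed: float  # sows farrowed during the quarter


def balance_residuals(quarters):
    """Relative residuals (observed / implied - 1) for two accounting checks."""
    market_vs_slaughter, farrowed_vs_herd = [], []
    for t, q in enumerate(quarters):
        if t + 2 < len(quarters):
            # Market hogs now should roughly equal slaughter over the next two quarters
            implied = quarters[t + 1].slaughter + quarters[t + 2].slaughter
            market_vs_slaughter.append(q.market_hogs / implied - 1.0)
        if t >= 1:
            # Sows farrowed should be about half of the previous quarter's breeding herd
            implied = 0.5 * quarters[t - 1].breeding_herd
            farrowed_vs_herd.append(q.sows_farrowed / implied - 1.0)
    return market_vs_slaughter, farrowed_vs_herd


def flag(residuals, tol=0.02):
    """Indices whose residual exceeds a hypothetical 2 percent tolerance."""
    return [i for i, r in enumerate(residuals) if abs(r) > tol]


# Made-up quarterly figures in millions of head
qs = [
    Quarter(66.0, 31.0, 6.0, 3.0),
    Quarter(67.0, 32.0, 6.1, 3.0),
    Quarter(68.0, 34.0, 6.2, 3.05),
    Quarter(69.0, 33.5, 6.2, 3.4),
]
mvs, fvh = balance_residuals(qs)
print(flag(mvs), flag(fvh))
```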
Corral illustrated KFM performance as measured against initial and
final official NASS estimates for total hogs. Figure 5-1 shows a time plot
from 2013 to 2017. Three estimates are shown: the initial estimate in black, the final estimate in red, and the KFM estimate in blue. In the epidemic years of 2013 to 2015, a gap between the final and the initial estimates is visible. The KFM also missed the final estimate, Corral pointed out. It tracked the initial estimate fairly closely, had some trouble coming out of the shock, and then began tracking reasonably well relatively quickly.
Figure 5-2 plots the differences between the final estimate for total
hogs and the KFM (blue) and initial estimates (black). This plot illustrates
the challenges the KFM had during the epidemic years. The differences
spike in March 2014 and December 2015. The KFM then underestimated
from late 2015 to March 2016. At the largest spike, the KFM was off by
almost 3 million hogs.
The difference between the final and initial estimates has a general
decreasing pattern, Corral noted, which illustrates that the KFM strug-
gles to get close to the final estimates after times of shock.

Referring to Corral’s four criteria listed above, the KFM does a good job of capturing the picture during equilibrium. It does not detect and adjust for shocks, although it did fairly well coming out of the shock. It accounts for the hog lifecycle and for the balance-sheet requirements.
Andrew Lawson (Medical University of South Carolina) asked whether the KFM is fitted to all 21 quarters. Corral replied that the model was run 21 times to obtain Figures 5-1 and 5-2. Lawson asked whether NASS had considered fitting the model to all the data. Corral said it had been discussed in the past. Linda Young reminded the audience that the data shown represent the model-based estimates that were sent to the pre-board for its consideration. These are being compared to the final revised estimate, which is the target, even though much was not yet known at the time the model was run. The question is whether the model will be effective in producing the board estimate.
SEQUENTIAL GENERAL LINEAR MODEL

Corral next described the SGLM, developed by Kedem and Pan in 2015. The SGLM was chosen because it gives more weight to current and recent data, to better capture changing dynamics. The model also enables dynamic selection from a wide range of potential variables. Kedem and Pan included additional candidate variables, among them economic variables such as pork price, and used survey results, board estimates, and differences in revisions.
The SGLM works by testing a large number of potential covariates
using spectral analysis and selecting among them for the final model. In
the documentation, they specified a 4-year window. Depending on which
item the model is estimating, there are usually four to eight different
covariates.
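The window-based selection just described can be sketched as follows. This is a hypothetical illustration, not Kedem and Pan's code: their screening uses spectral analysis, whereas this sketch plainly swaps in absolute correlation inside a trailing 16-quarter (4-year) window, keeps the top `k` candidates, and fits ordinary least squares. Because each inventory item would be predicted by its own call, nothing ties the items together, which is the source of the constraint problem noted next.

```python
import numpy as np


def window_select_and_predict(y, X, window=16, k=4):
    """Predict the next value of y from a trailing window of quarters.

    y: (T,) target series observed through quarter T-1.
    X: (T + 1, p) candidate covariates; the last row holds values for the
       quarter being predicted.
    window: trailing quarters used (16 quarters = 4 years).
    k: number of covariates kept.
    """
    T = len(y)
    yw, Xw = y[T - window:], X[T - window:T]
    # Screening step: rank candidates by |correlation| with the target inside
    # the window (a simple stand-in for the spectral screening in the text)
    scores = np.array([abs(np.corrcoef(Xw[:, j], yw)[0, 1]) for j in range(X.shape[1])])
    scores = np.nan_to_num(scores)  # constant columns give nan; treat as no signal
    chosen = np.argsort(scores)[::-1][:k]
    # Fit ordinary least squares with an intercept on the chosen covariates
    A = np.column_stack([np.ones(window), Xw[:, chosen]])
    beta, *_ = np.linalg.lstsq(A, yw, rcond=None)
    x_next = np.concatenate([[1.0], X[T, chosen]])
    return float(x_next @ beta), sorted(chosen.tolist())


# Hypothetical data: the target depends on candidate covariates 1 and 4 only
rng = np.random.default_rng(42)
T, p = 40, 6
X = rng.normal(size=(T + 1, p))
y = 2.0 * X[:T, 1] - 2.0 * X[:T, 4] + 0.1 * rng.normal(size=T)
pred, chosen = window_select_and_predict(y, X)
print(chosen, round(pred, 2), round(2.0 * X[T, 1] - 2.0 * X[T, 4], 2))
```

Rerunning the function each quarter with the window shifted forward reproduces the behavior the text describes: the selected covariates can change from one quarter to the next.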
Corral stressed that the SGLM makes an independent prediction of
each inventory item. This results in a major challenge because the esti-
mates do not follow biological constraints and do not satisfy balance-sheet
requirements. The covariates for the model change each quarter.
Figure 5-3 is a companion to Figure 5-1, which shows KFM results. Figure 5-3 compares the SGLM results with the initial and final NASS estimates: the initial estimate is shown in black, the final in red, and the SGLM in green. Notably, during the epidemic years, the SGLM did a fairly good job of getting close to the final estimate. The problem arises when coming out of the shock, he said.

FIGURE 5-1 Total hogs, comparing initial, final, and Kalman filter model estimates.
SOURCE: Prepared by Gavin Corral for presentation at the workshop.

FIGURE 5-2 Difference between initial and Kalman filter model estimates with final estimates.

FIGURE 5-3 Total hogs, comparing initial, final, and sequential generalized linear model estimates.

FIGURE 5-4 Difference between initial and sequential generalized linear model estimates with final estimates.
Figure 5-4 is a companion to Figure 5-2. It shows the difference
between the final board estimate and the SGLM estimate, and between the
final and the initial estimates. It confirms that the model did well going into the epidemic but poorly coming out of it.
In summary, referring to the four criteria, Corral noted that the
SGLM, like the KFM, captures equilibrium well. It is good at adapting
to shocks but not at adjusting during recovery. It does not account for
biological constraints and does not satisfy balance-sheet requirements.
This summary illustrates the strengths and weaknesses of the KFM and
SGLM. Corral said NASS hopes that the model described in Chapter 7
makes progress toward meeting the four criteria that he set forth earlier
in this chapter.
DETECTING DISRUPTIONS
Corral briefly described a third, new model, developed by Wang and colleagues (2019), to identify shocks. It is a Bayesian hidden Markov model that captures the dependence structure in the data. For the non-null hypothesis (a shock), the model uses a Dirichlet mixture with an unknown number of component distributions. The algorithm minimizes the false negative rate while controlling the false discovery rate. As input, it uses a variety of variables, including sows farrowed, pig crop ratios, and differences in revisions. Corral runs the model quarterly and provides its indication of a shock (if any) to the Hog Board. The only challenge NASS has with this model is that it is not good at detecting a shock that begins in the current quarter.
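The general idea of flagging shocks from a hidden Markov model while controlling the false discovery rate can be sketched as below. This is a deliberately simplified, hypothetical version, not the Wang et al. model: it uses a two-state HMM with known Gaussian emissions (no shock: N(0, 1); shock: N(`mu1`, 1)) instead of a Bayesian model with a Dirichlet mixture, computes each quarter's posterior probability of "no shock" by forward-backward recursion, and then applies a step-up rule that flags as many quarters as possible while the average of those posterior null probabilities stays at or below the chosen level.

```python
import numpy as np


def normal_pdf(x, mu, sd=1.0):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def posterior_null(z, A, pi0, mu1):
    """Forward-backward pass for a two-state HMM.

    State 0 = no shock (z ~ N(0, 1)); state 1 = shock (z ~ N(mu1, 1)).
    A: 2x2 transition matrix; pi0: initial state probabilities.
    Returns P(state_t = 0 | all z) for each quarter t.
    """
    T = len(z)
    e = np.column_stack([normal_pdf(z, 0.0), normal_pdf(z, mu1)])
    fwd = np.zeros((T, 2))
    bwd = np.ones((T, 2))
    fwd[0] = pi0 * e[0]
    fwd[0] /= fwd[0].sum()
    for t in range(1, T):
        fwd[t] = (fwd[t - 1] @ A) * e[t]
        fwd[t] /= fwd[t].sum()          # rescale each step to avoid underflow
    for t in range(T - 2, -1, -1):
        bwd[t] = A @ (e[t + 1] * bwd[t + 1])
        bwd[t] /= bwd[t].sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, 0]


def step_up_flags(p_null, level=0.10):
    """Flag the quarters with the smallest posterior null probabilities, taking
    as many as possible while their running mean stays <= level."""
    order = np.argsort(p_null)
    running_mean = np.cumsum(p_null[order]) / np.arange(1, len(p_null) + 1)
    passing = np.nonzero(running_mean <= level)[0]
    flags = np.zeros(len(p_null), dtype=bool)
    if passing.size:
        flags[order[: passing[-1] + 1]] = True
    return flags


# Hypothetical standardized quarterly indicators; quarters 3-5 look shocked
z = np.array([0.1, -0.3, 0.2, 3.5, 4.0, 3.8, 0.0, -0.2])
A = np.array([[0.9, 0.1], [0.3, 0.7]])
p_null = posterior_null(z, A, pi0=np.array([0.9, 0.1]), mu1=3.5)
flags = step_up_flags(p_null, level=0.10)
print(np.round(p_null, 3), np.flatnonzero(flags))
```

The backward pass is what lets later quarters inform earlier ones; a shock that begins in the newest quarter has no later data to confirm it, which is consistent with the detection lag noted in the text.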
Corral concluded by saying that the KFM is the most useful tool for NASS right now and is the one currently used. Its shortcomings arise during shock periods. The diagnostic tools of Wang et al. are useful and provide needed information, but they detect a shock only with a lag.
DISCUSSION
Schulz asked whether the KFM constraints are dynamic or static. He noted that, for hogs moving between weight groups or going to market, operators can potentially speed up or slow down that process, possibly depending on market hog prices, slaughter capacity, finishing capacity, or feed prices. All of those factors can potentially shift the biological process by as much as several weeks. He noted the relevance for disasters such as flooding, when hogs might be delayed in getting to market for a week or so. Corral replied that the constraints in the KFM are static. Since the beginning of the project, he added, team members have discussed the static constraints as they try to improve the models.
Andrew Lawson asked for the definition of a shock. He said that some
things are easier to detect than others, and some shocks are slow and start
building, while others are very sharp. Some shocks may be predictable
and some not. He noted shocks could vary in terms of their impact on
biology because different diseases might vary in their predictability and
their impact.
Corral responded that this kind of thought process led NASS to consider a web-scraping technique (discussed in Chapter 6) as a totally independent data source. If a shock exists and people are reporting it, web scraping should capture it.
Nell Sedransk defined a shock as any event that will have a substan-
tial effect on hog inventories and hog inventory estimates (see also Chap-
ter 3). With that definition, a shock can be many things. Diseases have
different patterns than disasters. But all types of shocks mean the equilibrium model will fail in some way. The goal is to counter that failure and to provide the board with valid and accurate hog inventory estimates. She agreed that some shocks can start small, pointing to how PEDv began. The granularity of NASS data and reporting is quarterly; web scraping, however, may yield indications of a shock at any time. She noted that the rigidity of the KFM derives from its heavy weighting of past data, which is why it lags in picking up changes. It also lags in some sense because hard constraints are placed on the data, not on the parameters.
Chris Wikle asked about the number of free parameters in the KFM
and whether they are estimated in every run. Corral replied that the parameters, perhaps six or seven of them, are estimated in every run. Wikle asked about any problems with convergence during estimation, and Corral said he has not had that kind of problem.
Wikle also asked about quarter-to-quarter variation in the estimated parameters. Sedransk noted that they have not examined the parameter estimates, but her impression is that during equilibrium they are relatively stable.
Wikle expressed interest in point estimates and their uncertainties.
Sedransk said that the problem is that the model is rigid and not very responsive to data inputs. Consequently, she noted, it is always biased toward the time series estimate, which in most cases is the equilibrium model. Wikle noted that the phenomenon might be due to the fact that the data are constrained, as opposed to the parameters. In biological applications with integrated population models, such constraints would probably have been placed on the process rather than on the data.
Eric Slud asked about possible changes to the KFM, which is partly
constrained by not accounting for covariance. For example, even without
any new data sources, accounting for the most current revisions to past
quarters’ data should result in an improvement, he suggested. With new
data streams, covariates from other sources could also be used.
Katherine Ensor commented that NASS basically has two state-space models with different formulations. What NASS calls a KFM is a state-space model that uses a Kalman filter as a tool, she said, whereas the SGLM was not constrained. NASS could potentially add constraints to the SGLM as a way to merge the two approaches, she suggested. Matthew Branan asked whether NASS has considered model-averaging approaches. In his view, the KFM is very rigid and the SGLM is too responsive to potential dips in inventory. Perhaps an average would work better, he suggested.
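One simple form of the model averaging Branan raised is to weight each model by the inverse of its past mean squared error against the final board estimates. The sketch below assumes that scheme purely for illustration; the error histories and current estimates are made up, and nothing here reflects a method NASS has adopted.

```python
import numpy as np


def inverse_mse_weights(past_errors):
    """past_errors: dict mapping model name to an array of past errors
    (model estimate minus final board estimate). Weights are proportional
    to 1 / MSE, so historically more accurate models count more."""
    inv = {name: 1.0 / np.mean(np.square(err)) for name, err in past_errors.items()}
    total = sum(inv.values())
    return {name: v / total for name, v in inv.items()}


def averaged_estimate(current, weights):
    """Weighted average of the current quarter's model estimates."""
    return sum(weights[name] * value for name, value in current.items())


# Made-up past errors and current estimates, in millions of head
errors = {"KFM": np.array([1.0, -1.0, 1.0, -1.0]), "SGLM": np.array([2.0, -2.0, 2.0, -2.0])}
w = inverse_mse_weights(errors)
print(w, averaged_estimate({"KFM": 70.0, "SGLM": 72.0}, w))
```

Other weighting rules (for example, weights that shift toward the more responsive model when a shock is flagged) would fit the same structure; inverse-MSE weighting is just the simplest choice to show.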


6
Web-Scraping Effects
In Session 5 of the workshop, Yijun (Frank) Wei (National Agricultural Statistics Service [NASS]) described the agency’s preliminary efforts at web scraping to provide early detection of disease incidence. The detection modeling effort described in the previous chapter can detect a shock, but only with a one-quarter lag, he noted. The hope for web scraping is to obtain signals of a shock that can be discussed during the preparation of the initial quarterly estimates. Katherine Ensor (Rice University) moderated the session.
DESCRIPTION OF NASS WEB SCRAPING

Wei introduced web scraping as part of the next step in NASS model-
ing efforts. The approach uses a combination of web scraping and natural
language processing (NLP) for hog disease outbreak detection. Disease is just one type of shock, and its early detection is challenging because the initial incidents may be small and local. The web-scraping and NLP approaches are intended to detect the very early signals of a disease outbreak. It is hoped that web scraping could detect an outbreak and geo-locate it to the state or county level. It has the potential to help predict the pattern and rate of spread of a disease.
There are two stages in this approach. The first stage is to detect a hog disease outbreak by scraping disease-report repository websites, such as the Swine Disease Global Surveillance Project (SDGSP)1 and the U.S. Department of Agriculture’s Animal and Plant Health Inspection Service (APHIS). The second stage looks for related news, mostly from national, state, and local news feeds; extension service websites; producer organizations’ websites; and blogs. NLP extracts information from the related news using four information-extraction steps: time normalization, word normalization, keyword identification, and named entity recognition.

1SDGSP is a project sponsored by the University of Minnesota Swine Center to monitor hog disease outbreaks on an international scale. It publishes reports every 2 weeks.
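The first stage, pulling report text from a repository page and flagging reports that name a watched disease, can be sketched with Python's standard `html.parser` module. This is a hypothetical illustration, not the NASS scraper: the page markup, the `DISEASE_TERMS` watch list, and the function names are assumptions, and real repository sites each need their own parsing rules.

```python
from html.parser import HTMLParser

# Hypothetical watch list; a production list would be much longer
DISEASE_TERMS = {"african swine fever", "porcine epidemic diarrhea", "classical swine fever"}


class ParagraphCollector(HTMLParser):
    """Collects the text inside each <p> element of a page."""

    def __init__(self):
        super().__init__()
        self.in_paragraph = False
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.in_paragraph = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self.in_paragraph = False

    def handle_data(self, data):
        if self.in_paragraph:
            self.paragraphs[-1] += data


def flag_reports(html):
    """Return (matched terms, paragraph text) for paragraphs naming a watched disease."""
    collector = ParagraphCollector()
    collector.feed(html)
    hits = []
    for text in collector.paragraphs:
        found = sorted(term for term in DISEASE_TERMS if term in text.lower())
        if found:
            hits.append((found, " ".join(text.split())))  # collapse whitespace
    return hits


page = """<html><body><h1>Biweekly report</h1>
<p>Vietnam: <b>African swine fever</b> outbreak affected 96 households/farms
in six provinces and cities.</p>
<p>Market update: prices steady.</p></body></html>"""
for terms, text in flag_reports(page):
    print(terms, "->", text)
```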
Wei provided an example of the first stage, identifying an outbreak
of African swine fever in Vietnam. The initial report was in SDGSP
on March 4, 2019. A summary was extracted from the text noting that
the outbreak affected 96 households/farms in six provinces and cities,
and the Ministry of Agriculture and Rural Development required culling
all those affected, quarantining the outbreak area, and testing all neigh-
boring farms.
Wei’s example of the second stage was the tracking of African swine
fever in China. The first two outbreaks were documented online on the
Pig Site2 in February 2019. The Ministry of Agriculture and Rural Affairs
said the first outbreak was on a farm with 5,600 hogs in the Xushui district
of Baoding City. It reported the farm had been quarantined and the herd
slaughtered. Reuters reported a second outbreak in the remote Greater
Khingan Mountains in Inner Mongolia, where 210 of the 222 wild boar
raised on a farm died and the rest were slaughtered.
In the next step, NLP was used to extract information from the news
reports. Because there is no temporal information included within the
text, time was not normalized. Word normalization changed “raised” to
“raise,” “slaughtered” to “slaughter,” and “quarantined” to “quarantine.”
The keyword was defined as “outbreak,” but other keywords could be used. The named entity was the Ministry of Agriculture and Rural Affairs, the location was the Xushui district of Baoding City, and the disease noted was swine fever. Summaries of the NLP text processing of these reports are shown in the following two bullets:
• Noun: ‘outbreak,’ Source: ‘The Ministry of Agriculture and Rural Affairs,’ Location: ‘a farm in the Xushui district of Baoding City,’ Stats: ‘has 5,600 hogs.’
• Noun: ‘outbreak,’ Source: ‘Reuters,’ Location: ‘remote Greater Khingan Mountains in Inner Mongolia,’ Stats: ‘210 of the 222 wild boar died.’

2The Pig Site is a knowledge-sharing platform with premium news, analysis, and resources for the global pig industry. For more information see https://thepigsite.com.
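The extraction steps applied in this example can be illustrated with a small rule-based sketch. It is an assumption-laden stand-in, not the tool Wei used: the lookup tables (`LEMMAS`, `SOURCES`, `PLACES`) replace the trained models a real NLP library would provide, the report sentences paraphrase the news items as summarized above rather than quoting them, and time normalization is skipped because, as noted, the text contained no temporal information.

```python
import re

# Hypothetical lookup tables standing in for trained NLP components
LEMMAS = {"raised": "raise", "slaughtered": "slaughter", "quarantined": "quarantine"}
KEYWORDS = {"outbreak"}
SOURCES = ["Ministry of Agriculture and Rural Affairs", "Reuters"]
PLACES = ["Xushui district of Baoding City", "Greater Khingan Mountains in Inner Mongolia"]
STATS = re.compile(r"\d[\d,]*(?: of the \d[\d,]*)? (?:hogs|wild boar)")


def normalize_words(text):
    """Lowercase, tokenize, and map inflected forms to a base form."""
    return [LEMMAS.get(tok, tok) for tok in re.findall(r"[a-z]+", text.lower())]


def extract(text):
    words = normalize_words(text)
    return {
        "keywords": sorted(KEYWORDS.intersection(words)),
        "source": next((s for s in SOURCES if s in text), None),
        "location": next((p for p in PLACES if p in text), None),
        "stats": STATS.findall(text),
        "actions": sorted(set(words) & set(LEMMAS.values())),
    }


reports = [
    "The Ministry of Agriculture and Rural Affairs said the first outbreak was on a farm "
    "with 5,600 hogs in the Xushui district of Baoding City. The farm had been "
    "quarantined and the herd slaughtered.",
    "Reuters reported a second outbreak in the remote Greater Khingan Mountains in Inner "
    "Mongolia, where 210 of the 222 wild boar raised on a farm died and the rest were "
    "slaughtered.",
]
for report in reports:
    print(extract(report))
```

Gazetteer lookups like `PLACES` only find locations that are already listed, which is one reason production systems rely on statistical named entity recognition instead.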
Wei summarized the potential of this project. It provides information at a fine geographic scale (state or county) that could be useful in spatial disease modeling and mapping; it provides information for understanding the time course of the spread; and it provides external documentation confirming the disease and the response to the outbreak. It could provide information to the pre-board, the Agricultural Statistics Board, or other experts. It could also provide information to incorporate into the modeling system. One advantage of web scraping is that it can be done without the time limitations of the production system.
DISCUSSION
Chris Wikle asked about the need for a text corpus as a training
sample for the algorithm to understand grammar. He noted that the results
of NLP can be sensitive to the training sample used. Wei replied that he used a Python tool trained on Wikipedia-like text data.
Andrew Lawson asked whether Wei had any U.S. examples of web
scraping for disease. Wei replied that he did not because there is no cur-
rent disease outbreak occurring within the United States. Porcine Epi-
demic Diarrhea virus (PEDv) occurred 6 years ago, and news about it
has disappeared. Nell Sedransk added that web scraping began at NASS
within the past 6 months and is in preliminary form. Most of the sources
that would have carried the news about PEDv have been archived.
Lawson said that his group has a project on ontology based on scrap-
ing abstracts from the National Library of Medicine. One element of
NLP is understanding what is meant. That can be difficult with super-
ficial scraping, he noted, and there can be interpretational issues in web
scraping. For example, there could be very fuzzy statements that say “this
might be an epidemic,” when it is not.
Kamina Johnson (APHIS) reported that APHIS had a similar effort 15 years ago, but using what would now be considered archaic systems. APHIS developed an algorithm to filter the incoming information, casting a wide net, with a human analyst to review and catalog the information. Web scraping is not a perfect science but a multistep process, she emphasized. She suggested that Wei might use the Seneca Valley virus to see whether his approach would pick up on that disease. She also suggested testing the system by searching for disease outbreaks that do not involve swine, such as virulent Newcastle disease, currently occurring in California, or low-path avian influenza in the fall. These two would test the detection of diseases with lower levels of reporting; high-path influenza gets a lot of attention when discovered.
She also recommended the inclusion of potentially new sources in
web scraping. APHIS uses the reporting from SDGSP and instant email
notifications from ProMED. Additionally, the World Organisation for Animal Health (OIE) sends out instant notifications about outbreaks.
The OIE and ProMED reports are released in a very distinct structured
format that would be easy to use in a web-scraping tool, she noted. The
OIE identifies diseases that it tracks, so its reports are disease specific,
while ProMED also includes non-OIE-reportable diseases.
Lee Schulz asked about the accuracy of news as a variable when it
is always changing, being updated, and occasionally redacted. He won-
dered whether it could be used to construct a variable accurate enough
for possible input to a model, referring to the discussion of the accuracy
issues related to trade expectations. Wei responded that the project is
still in a preliminary stage, and NASS is exploring what can be done with
the information.
Linda Young expressed doubt that any board number would be
changed based on web scraping, but it might give an early alert to some-
thing happening that would then need to be confirmed to be useful. Dan
Kerestes agreed with Young’s comment. The board is looking for more
information. If the information can be used, perhaps in conjunction with
comments sent in by the regional offices, it might add to the discussion.
Analysis of the project has not yet been carried out.
Schulz asked about the current process for experts to become
informed and whether web scraping might help fill a gap by speeding up
the process. Kerestes replied that its main value will be in confirming other information.
Travis Averill (NASS) observed that the estimation process is focused
on the survey and auxiliary data for a reference period, the first of March,
June, September, and December. This process also results in comments
and other input from respondents and regional offices that are difficult to
analyze and use. Web scraping has the potential to make NASS aware of
possible confirmatory information that might help to understand the situ-
ation in the field and the potential impact of events.
Wikle asked about the potential for others to manipulate this type
of information, especially if NASS scrapes blogs and sites where people
might report incorrect information once they know how it is being used.
He asked about a mechanism for detecting false placement of key indi-
cators. He also questioned using web-scraped data as input to a spatial
epidemic model. He cautioned there is a big step between taking the
information and using it as input for a model.


7
Modeling Swine Population Dynamics
Luca Sartore (National Institute of Statistical Sciences and National Agricultural Statistics Service [NASS]) explained NASS’s most recent
hog inventory model. It is an innovation that incorporates swine popula-
tion dynamics. Ron Plain (University of Missouri–Columbia) served as
moderator for the session. The presentation was followed by questions
and answers from the audience.
CURRENT MODELING INNOVATIONS

Sartore explained that NASS is interested in a model that pro-
duces estimates at a finer temporal and spatial resolution than currently
used models. In particular, although NASS publishes quarterly estimates, the new model, called SWARCS (Sartore, Wei, Abayomi, Riggins, Corral, Sedransk) after its developers, is designed to prepare monthly estimates for pig crop and sows farrowing to better account for swine population dynamics. Other variables use a quarterly model linked to the
monthly model. NASS would also like the model to prepare both state-
and national-level estimates, to be flexible enough to capture shocks, and
to combine several sources of information (survey data, historical NASS
published estimates, and state recommendations). The model must also
produce estimates within 2 hours. Sartore described the monthly models
at the national level, along with discussion of other ways under consid-
eration for making the model flexible enough to capture shocks, such as
using nonconstant survival rates.
When the survey data are available and aggregated, and auxiliary
data are prepared, the model is run. The information (data as well as
model estimates) is considered during the pre-board meeting. The pre-
board determines alternative inputs to the model, and the model is rerun
using those inputs. The result is provided to the Hog Board as one of the
scenarios for consideration. As described in Chapter 4, the Hog Board
determines the initial official NASS hog inventory estimates.
CALIBRATION
Calibration is used to correct for biases in survey data prior to input
into the model. Sartore described the estimation of ratios to adjust for
survey undercoverage. There is a time series of quarterly ratios defined
as the ratio of the official U.S.-level board estimate for variable k at time
t to the U.S.-level survey estimate for variable k at time t. One objective
is to model and predict these ratios. The actual procedure is more complex, because a different ratio can be computed for each revision, from the initial published estimate (revision 1) to the final revision (revision 5). NASS applies a neural network model with hidden layers to capture the optimal adjustment for a quarter. The monthly adjustment ratios are set equal to the relevant quarterly ratio.
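The ratio bookkeeping in this paragraph can be sketched as follows. In this hypothetical sketch the quarterly ratio is r_t = (board estimate at t) / (survey estimate at t) for a single variable and a single revision; the neural network that NASS uses to predict the ratio is replaced by a plainly labeled placeholder (the mean of the last four ratios), and all numbers are made up.

```python
import numpy as np


def undercoverage_ratios(board, survey):
    """Quarterly ratios r_t = board estimate / survey estimate for one variable."""
    return np.asarray(board, dtype=float) / np.asarray(survey, dtype=float)


def naive_next_ratio(ratios, n_recent=4):
    """Placeholder predictor: mean of the most recent ratios
    (standing in for the neural network described in the text)."""
    return float(np.mean(ratios[-n_recent:]))


def calibrate_monthly(monthly_survey, quarterly_ratios):
    """Scale each quarter's three monthly survey values by that quarter's ratio."""
    monthly = np.asarray(monthly_survey, dtype=float).reshape(-1, 3)
    return (monthly * np.asarray(quarterly_ratios, dtype=float)[:, None]).ravel()


# Made-up national figures (millions of head) for one variable
r = undercoverage_ratios(board=[100.0, 102.0, 104.0, 103.0], survey=[95.0, 97.0, 98.0, 98.5])
r_next = naive_next_ratio(r)
print(np.round(r, 4), round(r_next, 4))
print(calibrate_monthly([10.0, 11.0, 12.0, 20.0, 21.0, 22.0], [1.1, 1.2]))
```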
This process results in estimated national totals. The state-level
recommendations must also be adjusted to add to national totals. A
Lagrange multiplier approach is used to produce an estimated ratio to
apply to each state recommendation. The calibrated monthly and quar-
terly historical and current data are used as input to the model.
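The text does not give the objective behind the Lagrange multiplier step, so the sketch below assumes a common one: choose state ratios a_s as close to 1 as possible (weighted squared distance) subject to the adjusted state recommendations summing to the national total. Setting the Lagrangian's derivative to zero gives a closed form. The weights and numbers are hypothetical; with weights equal to the state values, the solution reduces to ordinary proportional scaling.

```python
import numpy as np


def lagrange_state_ratios(state_recs, national_total, weights=None):
    """Ratios a_s that make sum_s a_s * x_s equal the national total while
    staying as close to 1 as possible.

    Minimizes sum_s w_s * (a_s - 1)**2 subject to sum_s a_s * x_s = N.
    Setting the Lagrangian's derivative to zero gives a_s = 1 + c * x_s / w_s
    with c = (N - sum_s x_s) / sum_s (x_s**2 / w_s).
    """
    x = np.asarray(state_recs, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    c = (national_total - x.sum()) / np.sum(x ** 2 / w)
    return 1.0 + c * x / w


recs = np.array([10.0, 30.0, 60.0])                        # hypothetical state recommendations
a_equal = lagrange_state_ratios(recs, 110.0)               # w_s = 1: larger states absorb more
a_prop = lagrange_state_ratios(recs, 110.0, weights=recs)  # w_s = x_s: a common ratio
print(np.round(a_equal, 4), np.round(a_prop, 4))
```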
The new model has two distinct parts: hog production, tracking
breeding through birth and weaning (a monthly model); and hogs to
market, reflecting the growth of hogs through weight groups to slaugh-
ter (a quarterly model). According to Sartore, each litter results in about
10 weaned piglets. This number changes with time. The pig crop is made
up of the piglets born during the month (quarter) that are still alive on
the first day of the quarterly survey reference period. The pig crop in the
hog production cycle enters the hogs to market cycle in the two smallest
weight groups. Hogs grow through the weight groups until they are large
enough to go to slaughter.
There are monthly models for the pig crop and sows farrowed (the only variables with monthly data). The log of each of those two variables is assumed to follow a seasonal autoregressive integrated moving average model, SARIMA (2, 1, 2) × (2, 1, 2) with a seasonal period of 12 months. The model also includes a contemporaneous, time-varying linear relationship (no constant and no error term) between the monthly pig crop and monthly sows farrowed at time t, and a linear relationship (no constant) plus an error term between monthly sows farrowed and the quarterly breeding herd inventory 2 months earlier.
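Written out in notation introduced here for illustration (not taken from the presentation), the monthly model described above is as follows. B is the backshift operator (B Y_t = Y_{t-1}); Y_t stands for either the monthly pig crop P_t or monthly sows farrowed S_t; H_{t-2} is the breeding herd inventory reported for the reference date 2 months before month t; β_t is the time-varying pig crop to sows farrowed ratio; and γ plays the role of a farrowing rate (an interpretation, not stated explicitly). The plus-sign convention for the moving average polynomials is also an assumption.

```latex
\begin{aligned}
\phi(B)\,\Phi(B^{12})\,(1-B)(1-B^{12})\,\log Y_t &= \theta(B)\,\Theta(B^{12})\,\varepsilon_t,\\
\phi(B) = 1-\phi_1 B-\phi_2 B^{2}, &\quad \Phi(B^{12}) = 1-\Phi_1 B^{12}-\Phi_2 B^{24},\\
\theta(B) = 1+\theta_1 B+\theta_2 B^{2}, &\quad \Theta(B^{12}) = 1+\Theta_1 B^{12}+\Theta_2 B^{24},\\
P_t &= \beta_t\, S_t,\\
S_t &= \gamma\, H_{t-2} + u_t .
\end{aligned}
```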
Parameters are estimated by minimizing the sum of squared residu-
als plus a penalty function involving the product of a positive parameter
“delta” and the sum of the absolute values of the AR and MA parameters.
Estimation uses an iterative EM algorithm. Initial values of parameters
are set to zero, except for the farrowing rate, which is set to 1/6. The
selection of parameter delta involves a LASSO regression and cross-validation. The LASSO penalty shrinks the parameters of the time series models toward zero, allowing automatic model selection.
Sartore showed a plot of the SARIMA parameters as a function of delta, pointing out that when delta is 1, the estimated parameters are essentially zero; as delta becomes smaller, the parameter estimates approach the least squares values. He observed that the graph shows four parameters to be significant. These four are selected for inclusion in the model, and estimation is rerun.
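The penalty path and the select-then-refit step can be illustrated with a simplified LASSO. This sketch swaps in a pure autoregression on eight lagged values fitted by coordinate descent with soft-thresholding; it has no moving average or seasonal terms and no EM algorithm, the series is simulated, and delta is scaled so that `delta_max` (not 1) is the point where every coefficient becomes zero.

```python
import numpy as np


def soft_threshold(z, g):
    return np.sign(z) * max(abs(z) - g, 0.0)


def lasso_cd(X, y, delta, n_sweeps=500):
    """Coordinate descent for (1 / 2n) * ||y - X b||^2 + delta * sum_j |b_j|."""
    n, p = X.shape
    b = np.zeros(p)                      # start from zero, as in the text
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_sweeps):
        for j in range(p):
            partial = y - X @ b + X[:, j] * b[j]   # residual ignoring coefficient j
            b[j] = soft_threshold(X[:, j] @ partial / n, delta) / col_sq[j]
    return b


def lag_matrix(z, lags):
    """Columns are z lagged by 1..lags; rows aligned with the target z[lags:]."""
    T = len(z)
    X = np.column_stack([z[lags - k:T - k] for k in range(1, lags + 1)])
    return X, z[lags:]


# Simulated series with true lags 1 and 4 (hypothetical data)
rng = np.random.default_rng(1)
z = np.zeros(500)
for t in range(4, 500):
    z[t] = 0.5 * z[t - 1] + 0.3 * z[t - 4] + rng.normal()
z = z[100:]                               # drop burn-in
X, y = lag_matrix(z, lags=8)

delta_max = np.max(np.abs(X.T @ y)) / len(y)   # about the smallest delta giving all zeros
for frac in [1.0, 0.3, 0.05, 0.0]:
    print(frac, np.round(lasso_cd(X, y, frac * delta_max), 3))

# Keep the lags LASSO selects at a moderate delta, then refit by least squares
selected = np.flatnonzero(lasso_cd(X, y, 0.05 * delta_max))
refit, *_ = np.linalg.lstsq(X[:, selected], y, rcond=None)
print(selected + 1, np.round(refit, 3))       # lag numbers and refit coefficients
```

At delta = 0 the procedure reduces to ordinary least squares, matching the behavior Sartore described at the small-delta end of the path.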
Sartore provided the equations that relate the quarterly market hogs
by weight group as a function of the monthly pig crop data plus an error
term. For example, the smallest weight group is a linear combination
of the pig crop for the most recent 3 months (t-1, t-2, and t-3), each multi-
plied by a different survival rate. Only a fraction of the pig crop in month
t-3 enters the smallest weight category; the rest enter the next weight
category. The next weight category is a linear combination of the monthly
pig crop for months t-3, t-4, and t-5, again each multiplied by its own
survival rate. Only a fraction of the pig crop in t-5 enters this weight group; the rest enter the next (second-largest) weight category. The piglets
in the second-largest weight category are those born in months t-5
and t-6, each multiplied by a survival rate. A fraction of the pig crop in
month t-6 goes into the largest weight category. The largest weight cate-
gory is a linear combination of the pig crop at months t-6 and t-7. Only a
fraction of the pig crop at time t-7 enters the largest weight category. The parameters to be estimated are the survival rates and the transition parameters between weight groups.
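The deterministic part of the weight-group equations above can be written as a short function. This is a hypothetical sketch: the names `s` (survival rate for each lag) and `a` (share of a boundary month's survivors counted in the smaller group) are introduced here, the error terms are omitted, and the demonstration reuses the initial transition values that the text gives for estimation (0.25, 0.25, 0.75, 0.75), with their mapping to `a[1]` through `a[4]` assumed.

```python
def weight_groups(pig_crop, t, s, a):
    """Deterministic part of the four quarterly weight groups at month t.

    pig_crop: monthly pig crop indexed by month number.
    s: survival rate for each lag k = 1..7 (dict k -> rate).
    a: transition fractions a[1]..a[4]; a[g] is the share of the boundary
       month's survivors counted in group g rather than the next group
       (for group 4, rather than leaving for slaughter).
    """
    P = {k: s[k] * pig_crop[t - k] for k in range(1, 8)}   # survivors by lag
    g1 = P[1] + P[2] + a[1] * P[3]
    g2 = (1 - a[1]) * P[3] + P[4] + a[2] * P[5]
    g3 = (1 - a[2]) * P[5] + a[3] * P[6]
    g4 = (1 - a[3]) * P[6] + a[4] * P[7]
    return g1, g2, g3, g4


# Constant pig crop of 10 (hypothetical units) and survival fixed at 1
crop = [10.0] * 8
groups = weight_groups(crop, t=7, s={k: 1.0 for k in range(1, 8)},
                       a={1: 0.25, 2: 0.25, 3: 0.75, 4: 0.75})
print(groups)
```

Under this bookkeeping every surviving pig is counted exactly once, except the share (1 - a[4]) of the oldest month, which is assumed to have moved on to slaughter.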
Sartore showed a graph illustrating estimated survival rates over time. The estimated survival rates show some seasonality and a decrease during the years of the epidemic. They are fairly stable, tending to be around 0.95 before the epidemic and dipping to about 0.88 during it. Since the epidemic, survival rates have recovered to about 0.92. The quarterly survival rates and weight-group transition rates are estimated by minimizing the sum of squared errors subject to two constraints: the survival rates must be close to 1 and must form a smooth function over time. Estimation uses the Broyden-Fletcher-Goldfarb-Shanno (BFGS) iterative algorithm, with initial survival rates set to 1, transition rates for the first two weight categories set to 0.25, and transition rates for the last two weight categories set to 0.75.
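The two constraints can be approximated by quadratic penalties: one pulls survival rates toward 1, the other penalizes second differences over time, which is a stand-in for the smoothness requirement. The sketch below assumes that formulation, with hypothetical penalty weights and data. In the general nonlinear case, such an objective would be passed to a numerical optimizer, for example `scipy.optimize.minimize(fun, x0, method="BFGS")`. Here the predicted totals are linear in the survival rates, so the normal equations give the exact minimizer, which makes the sketch easy to check.

```python
import numpy as np


def penalized_sse(s, observed, base, lam_one=10.0, lam_smooth=100.0):
    """Objective for quarterly survival rates s.

    Predicted totals are s * base, where base is what the totals would be with
    100 percent survival. Quadratic penalties stand in for the two constraints:
    rates close to 1, and a smooth path over time (small second differences).
    """
    resid = observed - s * base
    return (np.sum(resid ** 2)
            + lam_one * np.sum((1.0 - s) ** 2)
            + lam_smooth * np.sum(np.diff(s, n=2) ** 2))


def solve_linear_case(observed, base, lam_one=10.0, lam_smooth=100.0):
    """Exact minimizer when predictions are linear in s (normal equations)."""
    n = len(base)
    D = np.diff(np.eye(n), n=2, axis=0)          # second-difference operator
    A = np.diag(base ** 2) + lam_one * np.eye(n) + lam_smooth * D.T @ D
    rhs = base * observed + lam_one * np.ones(n)
    return np.linalg.solve(A, rhs)


# Hypothetical quarters: survival near 0.95, a dip to 0.88, then about 0.92
s_true = np.array([0.95] * 4 + [0.88] * 4 + [0.92] * 4)
base = np.full(12, 20.0)
observed = s_true * base
s_hat = solve_linear_case(observed, base)
print(np.round(s_hat, 3))
print(penalized_sse(s_hat, observed, base) < penalized_sse(np.ones(12), observed, base))
```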
Sartore said that the model is estimated in stages. The first stage
involves combining the historic data, the latest survey indications, and
the state recommendations. The second stage involves updating the infor-
mation from stage 1 with estimates from the pre-board to account for
additional information from experts.
Sartore compared the new model and the Kalman filter model (KFM). In particular, for all nine published estimates, the root mean square errors (RMSEs) and mean percent errors (MPEs) were computed between the model estimates and both the initial and the final board estimates. The MPE provides a measure of relative bias, while the RMSE measures overall error, reflecting both variance and bias.
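The two comparison metrics can be computed as below. The formulas are the standard definitions; the sign convention (model minus board, so a negative MPE means the model runs below the board estimate) is an assumption, and the numbers are made up.

```python
import numpy as np


def rmse(model, board):
    """Root mean square error of model estimates against board estimates."""
    m, b = np.asarray(model, dtype=float), np.asarray(board, dtype=float)
    return float(np.sqrt(np.mean((m - b) ** 2)))


def mpe(model, board):
    """Mean percent error; negative values mean the model runs below the board."""
    m, b = np.asarray(model, dtype=float), np.asarray(board, dtype=float)
    return float(np.mean(100.0 * (m - b) / b))


# Made-up model and board estimates for one variable over three quarters
model_est = [98.0, 101.0, 99.0]
board_est = [100.0, 100.0, 100.0]
print(round(rmse(model_est, board_est), 3), round(mpe(model_est, board_est), 3))
```

Because positive and negative percent errors cancel in the MPE but not in the RMSE, a model can show a small MPE and still have a sizable RMSE.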
For both models, MPEs were similar (same direction, similar magnitude) relative to the initial and the final board estimates. The MPEs were fairly small, less than 1 percent in magnitude, except for breeding herd inventory, for which the MPE was slightly below negative 2 percent for the new model and about negative 1 percent for the KFM. Relative to the initial estimates, RMSEs tended to be somewhat smaller for the new model than for the KFM, except for pig crop, sows farrowed, and breeding herd. Relative to the final estimates, the new model's RMSEs were somewhat smaller than those of the KFM for all variables except breeding herd.
Sartore showed graphs illustrating the time series of the initial board estimates, the final board estimates, and the two model-based estimates for pig crop and for total hogs. Figure 7-1 shows the plot for total hog estimates over time. Light blue represents the initial board estimate, dark blue represents the final board estimate, light green represents the KFM, and dark green represents the new model.

FIGURE 7-1 Pig crop estimates.
NOTE: KFM = Kalman filter model; SWARCS = Sartore, Wei, Abayomi, Riggins, Corral, Sedransk.
SOURCE: Prepared by Luca Sartore for presentation at the workshop.
The figure illustrates that during the epidemic and recovery, the ini-
tial estimates, KFM estimates, and new estimates were fairly similar.
Only the final board estimates revealed the full impact of the epidemic in
December 2013 and fully illustrated the recovery during 2014 and 2015.
In summary, Sartore said the new model dynamically accounts for
the hog lifecycle, incorporates a flexible description of equilibrium dis-
ruption, uses an external accounting relationship, and can adjust to shocks
though perhaps not to the extent desired. He said that an issue remains
with estimates for the breeding herd. In the future, he said, NASS would
like to extend the model to provide state-level estimates; provide a time
series model for survival rates instead of using a spline-based approach;
account for data quality at the operation level; and incorporate web-
scraping information for disease outbreak detection.
DISCUSSION
Matthew Branan (U.S. Department of Agriculture) observed that both
models seem much closer to initial estimates than to final estimates. He
asked whether Sartore had discussed with board members how they incorporate new information into those final estimates. Sartore replied that there has been some discussion. New information comes primarily from new slaughter data that are available regularly at a national level but are not yet available when the initial estimates are made.
Katherine Ensor asked about any out-of-sample performance metrics
for the model, such as where the model is trained on one part of data and
evaluated on another. She noted that when using LASSO or similar methods,
it is usually important to evaluate out-of-sample performance to avoid
overfitting. Sartore has not done out-of-sample comparisons but explained that
he is using LASSO in a different way. With every model run, the parame-
ters selected may be different. If it is a stable period, the same parameters
are usually selected, but during times of shock, the parameters selected
may be different. Ensor suggested changing the parameters, not just pop-
ping variables in and out of the model during a shock. Linda Young asked
whether the model should have time-varying parameters. Ensor said that
time-varying parameters should be evaluated because dynamics in the
current model for pig crop and sows farrowed are captured by variables
that go in and out, with new parameter estimates each time.
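Ensor's point about out-of-sample evaluation can be made concrete with a rolling-origin (expanding-window) check. In this check, each forecast uses only data available before the period it predicts. In the sketch, the `fit_forecast` argument stands in for whatever model is being assessed, such as a LASSO-selected seasonal ARIMA. The seasonal naive forecaster and the simulated series are hypothetical placeholders.

```python
import numpy as np

def rolling_origin_errors(y, fit_forecast, min_train):
    """One-step-ahead out-of-sample errors from an expanding training window.

    fit_forecast(train) must return a forecast of the next value using only
    `train`, so every error is measured on a point the model never saw.
    """
    y = np.asarray(y, dtype=float)
    errors = [y[t] - fit_forecast(y[:t]) for t in range(min_train, len(y))]
    return np.array(errors)

def seasonal_naive(train, period=12):
    """Hypothetical placeholder model: forecast the value from one season earlier."""
    return train[-period]

# Simulated monthly series (hypothetical): a 12-month cycle plus noise.
rng = np.random.default_rng(0)
months = np.arange(72)
y = 100 + 10 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 1, size=72)
errs = rolling_origin_errors(y, seasonal_naive, min_train=24)
print(len(errs), np.sqrt(np.mean(errs ** 2)))  # count and out-of-sample RMSE
```

Because the model is refit at each origin, this check also shows how much the selected terms change from run to run, which was the stability concern raised in the discussion.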
In answer to a question from Ensor, Sartore said monthly data are
collected in the quarterly survey. For example, an operation that responds
to the survey as of March 1 reports sows farrowed and piglets born in
February, January, and December.
Christopher Wikle said an impressive aspect of state-space models
is the clear distinction between data and process. He observed that the
distinction is lost in this new work and questioned how uncertainty is
propagating. He asked whether Sartore has made estimates of the cover-
age probability1 of his forecast, or the state estimates relative to known
conditions. Wikle said that he did not see much difference between the
KFM results versus the new model in the plots, saying without uncer-
tainty bands, the difference may not be significant.
Sartore said he has not yet looked at the variance estimates from the
models. Wikle suggested Sartore may be accounting for some uncer-
tainty but not in a model-based way. Sartore said that he is accounting for
variation by the curvature of the objective functions that he is
optimizing. Wikle said that the problem with using optimization instead
of a model-based approach is that there is no way to quantify the effect
of uncertainty on estimates. Sartore said that recent research shows the
variance can be approximated, but it might be worthwhile to look at the
differences between an optimization approach and a model-based approach.
1The coverage probability is the proportion of the time that an interval contains the true value
of interest.
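The coverage probability defined in the footnote can be estimated by simulation when the true value is known. The following is a minimal sketch. It assumes a simple normal-mean setting rather than the hog models, and all parameter values are illustrative.

```python
import numpy as np

def empirical_coverage(true_value, lower, upper):
    """Fraction of intervals [lower, upper] that contain true_value."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    return float(np.mean((lower <= true_value) & (true_value <= upper)))

# Simulation check of a nominal 95% interval for a mean (toy setup).
rng = np.random.default_rng(42)
mu, sigma, n, reps = 50.0, 5.0, 30, 4000
samples = rng.normal(mu, sigma, size=(reps, n))
means = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)
cov = empirical_coverage(mu, means - 1.96 * se, means + 1.96 * se)
print(cov)  # should be near the nominal 0.95
```

The same check applied to a model's uncertainty bands would give the kind of evidence Wikle asked for when comparing the KFM with the new model.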
Young reminded the audience about Eric Slud’s earlier question about
whether variance estimates could be published sometime after the official
estimates are released. She and Dan Kerestes determined that if NASS
has good point estimates, there would not be a problem in delaying the
publication of variances if necessary. She said quantification of vari-
ances is one of the things NASS needs to do no matter what model is used.
NASS has two key issues: developing variance estimates and develop-
ing state-level estimates. The NASS KFM previously produced state-level
estimates, but they were not considered useful. Whether NASS can do
better with the current model remains to be seen, she noted.
Wikle said that he finds the exercise of comparing models without a
measure of uncertainty almost ill-posed. Young said that she appreciated
his perspective, but added that industry looks at the point estimates, and
they might look at an estimate for a day’s slaughter data for comparison.
Slud commented the argument about what users want is similar to
arguments the Census Bureau hears. Users want the point estimates, and
it is not always clear that they use the variance estimates. He asked for
clarification about NASS strategy for variance estimation. He said look-
ing at curvatures of objective functions is somewhat analogous to looking
at the curvatures of likelihoods, so it would be like modeled parameter
estimates. However, there are also survey variances that should be propa-
gated. In ordinary combinations of those approaches, survey variances
may be used for certain aggregated quantities and then a parametric boot-
strap used to understand the role of the model. Sartore replied that the
work on variances is not really done, but they have tried to combine
the variances from the survey with those from the likelihood. This would
be an approximation since there are so many parameters.
Slud said that the only way he knows to combine survey variances,
such as those obtained from a bootstrap or jackknife, with model vari-
ances is to do parametric model bootstrap loops within survey-bootstrap
loops. Sartore replied that it has to be done that way.
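Slud's nested scheme can be sketched in a toy setting. Here the survey estimate is a sample mean, and the model step adds normally distributed noise with a known standard deviation. The function name, `model_sd`, and the loop sizes are hypothetical. In practice the inner loop would refit or simulate from the actual model.

```python
import numpy as np

def nested_bootstrap(sample, model_sd, n_outer=400, n_inner=50, seed=0):
    """Survey bootstrap (outer loop) wrapped around a parametric model bootstrap (inner loop).

    Outer loop: resample respondents with replacement to mimic survey variability.
    Inner loop: draw model outputs from an assumed normal model centered on the
    resampled survey estimate. The spread of all draws reflects both sources.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    draws = np.empty((n_outer, n_inner))
    for b in range(n_outer):
        resample = rng.choice(sample, size=sample.size, replace=True)
        survey_est = resample.mean()
        draws[b] = rng.normal(survey_est, model_sd, size=n_inner)
    return draws.ravel()

rng = np.random.default_rng(1)
sample = rng.normal(200.0, 40.0, size=100)   # hypothetical operation-level reports
draws = nested_bootstrap(sample, model_sd=3.0)
survey_var = sample.var(ddof=1) / sample.size
print(draws.var(), survey_var + 3.0 ** 2)   # combined variance vs. sum of the two parts
```

In this additive toy case, the combined variance is close to the survey variance plus the model variance. The nested loops matter when the model step depends nonlinearly on the survey inputs.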
Ensor asked about the choice of LASSO to select ARIMA parameters. She
said that the challenge is that it does not lead to a stable process over
time. Sartore replied that he initially did an analysis on a spectrum
level, looking at autocorrelations and partial autocorrelations of the
monthly estimates for the pig crop and sows farrowed. Ensor noted
that those models include first and seasonal differences and asked about
the importance of the seasonal difference. She observed that the seasonal
difference is a strong operator, essentially accounting for a correlation of
1.0 between observations 12 months apart. Sartore replied that it was to
account for the obvious annual seasonality. He observed that the KFM
also uses first and seasonal differences, but for quarterly data.
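The first and seasonal differences under discussion correspond to applying the operator (1 - B)(1 - B^12) to a monthly series, where B is the backshift operator. The sketch below uses a made-up series to show that the combined operator removes both a linear trend and a fixed 12-month pattern. It also shows why the seasonal difference is strong: it treats each month as moving one-for-one with the same month a year earlier. For quarterly data, as in the KFM, the seasonal lag would be 4 instead of 12.

```python
import numpy as np

def seasonal_first_difference(y, season=12):
    """Apply (1 - B)(1 - B^season): a seasonal difference, then a first difference."""
    y = np.asarray(y, dtype=float)
    seasonal = y[season:] - y[:-season]   # y_t - y_{t-season}
    return seasonal[1:] - seasonal[:-1]   # first difference of the result

t = np.arange(60)
pattern = np.array([5, 3, 0, -2, -4, -3, 0, 2, 4, 6, 3, 1], dtype=float)
y = 1000 + 2.5 * t + pattern[t % 12]      # linear trend plus a fixed monthly pattern
print(seasonal_first_difference(y))       # all zeros
```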
Slud asked whether at the end of the model fitting, there is a
well-defined model from which to microsimulate a dataset to use for test-
ing model consistency. Sartore replied that it would be possible to do.
Michael Schweinberger (Rice University) asked whether NASS has
considered splitting the estimation problem into two parts to get around
the 2-hour time constraint. The first part would be the estimation stage
during which all the historical data would be processed and parameters
estimated. This could be done well before current data are available and
well before the time crunch. The second part would involve updating
equations to account for current data as part of a sequential approach.
Sartore said although he had considered this approach, he has not tried
to implement it because it only takes 15 minutes to run his model, with-
out preparing uncertainty estimates. As he adds complexity to the model,
however, he may need to consider a sequential approach. Schweinberger
added a Bayesian approach would support quantification of the uncer-
tainty about estimates by looking at the posterior distribution. He asked
whether NASS has considered a sequential sampling approach with an
initial sample of perhaps 100 operators. Once the data are in, a decision
could be made as to how much additional information is needed and, if
more were needed, another 100 operators could be surveyed. Young said that
NASS has not considered sequential sampling.
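Schweinberger's two-stage idea can be illustrated with a local-level Kalman filter. The variances are estimated ahead of time from historical data, and only the inexpensive update step runs when current data arrive. This is a minimal sketch under that assumed model, not the NASS KFM or the new model. The class name and numerical values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class LocalLevelFilter:
    """Kalman filter for y_t = x_t + e_t, x_t = x_{t-1} + w_t.

    obs_var and state_var are assumed to have been estimated offline from
    historical data; only update() runs when new data arrive.
    """
    obs_var: float
    state_var: float
    mean: float
    var: float

    def update(self, y: float) -> float:
        # Predict: the level carries forward and its uncertainty grows.
        pred_mean = self.mean
        pred_var = self.var + self.state_var
        # Update: blend the prediction and the new observation via the Kalman gain.
        gain = pred_var / (pred_var + self.obs_var)
        self.mean = pred_mean + gain * (y - pred_mean)
        self.var = (1.0 - gain) * pred_var
        return self.mean

# Stage 1 (offline, hypothetical values): variances fitted to history.
kf = LocalLevelFilter(obs_var=4.0, state_var=1.0, mean=100.0, var=10.0)
# Stage 2 (at release time): fold in only the newest observations.
for obs in [101.0, 103.0, 102.5]:
    print(round(kf.update(obs), 3))
```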
Ensor noted that NASS maintains a history of operators, including
non-respondents, as well as information about the quality of the data
they provide. She wondered whether that history could be built into the
modeling. Making estimates as the data come in, perhaps having a
reasonable estimate with early responders that provide good data, may be
helpful. Young replied that the agency would need to check on the feasibility
of this approach. The field offices collect the data, prepare estimates, and
send them to headquarters.
Plain asked a final question about the comparison of RMSEs between
the two models and initial estimates for all published variables. He
observed that for the pig crop, the old model seemed to perform better;
however, for market hogs under 50 lbs., the new model seemed to perform
better. This is interesting because there is a very high overlap between the
hogs in those two categories, he observed. He questioned why one model
would predict better than the other. Sartore responded he can improve the
new model for pig crop and sows farrowed.


8
Discussion of Detection and Monitoring
In this session of the workshop, participants discussed issues related to
detection and monitoring of swine disease. The session started with a joint presentation
by Matthew Branan and Kamina Johnson (Animal and Plant Health
Inspection Service [APHIS]) about the work they do on monitor-
ing, surveillance, and modeling for swine disease. Their presenta-
tion was followed by a panel of discussants, Lee Schulz (Iowa State),
Andrew Lawson (Medical University of South Carolina), and Michael
Schweinberger (Rice University), and an open discussion moderated by
Eric Slud (University of Maryland and U.S. Census Bureau).
SWINE DISEASE SURVEILLANCE, MONITORING, DETECTION, AND MODELING
Branan gave a general overview of the APHIS systems and programs
that are in place to surveil and monitor swine disease. Types of
surveillance include active or routine surveillance, observational or
passive surveillance, and surveillance to support trade. Active
surveillance involves purposeful sampling and testing for specific
diseases that APHIS has authority to monitor or that are part of an
eradication program. Passive surveillance involves opportunistic
sampling and testing for specific diseases, such as foot-and-mouth
disease. Surveillance to support trade might be part of a voluntary type of
system such as the trichinellosis certification program. There is no require-
ment to test for trichinellosis, but some producers test to demonstrate that
their hogs are free of disease in order to open or retain a market.
Branan reviewed APHIS information sources. Slaughter surveillance,
part of either active or passive surveillance, can come from Food Safety
and Inspection Service (FSIS) slaughter condemnations. Samples are
sent to a lab and tested for foot-and-mouth disease or pseudorabies, for
example. On-farm surveillance may be targeted to operations that are
deemed to be high risk for certain diseases such as classical swine fever.
These involve diagnostic laboratory submissions. Producers may send
in voluntary sick-pig submissions to be tested for various agents or dis-
eases. On-farm investigations are also related to foreign animal disease
investigations. For example, Seneca Valley virus has a lesion that looks
like foot-and-mouth disease. Presence of such a lesion may trigger an
on-farm investigation. If an operator wants to move animals between
states or out of the United States, a certificate of veterinary inspection is
needed. Animals coming into the United States are required to be
inspected at the border.
APHIS has eradication programs for diseases of concern, such as
swine brucellosis and pseudorabies. There are surveillance programs for
other diseases, such as African swine fever and classical swine fever, even
though the latter has been eradicated in the United States. The mandatory
reporting for swine coronavirus diseases such as Porcine Epidemic Diar-
rhea virus (PEDv) and Porcine Deltacoronavirus ended in 2018. The three
high-profile foreign animal diseases now are African swine fever, classical
swine fever, and foot-and-mouth disease. Branan explained that there are
other reportable domestic diseases identified from a list provided by the
World Organization for Animal Health (OIE). If a case of such a disease is
detected, APHIS informs OIE.
Branan next described three APHIS systems. First, the National
Animal Health Reporting System (NAHRS) is used for diseases that are
considered reportable, which includes the list from the OIE and the National
List of Reportable Animal Diseases. APHIS gets the data through con-
fidential monthly submissions from the states that indicate presence of
disease for livestock, poultry, aquaculture, and so on. This information
can feed into disease response activities at the state level, the national
level, and to support trade.
Second, the National Wildlife Disease Program within the Wild-
life Services Group monitors the effects of wildlife on agriculture.
For example, there is a program to identify and quantify the types of
damage caused by feral swine. Feral swine can damage crops with their
wallowing, tree rubbing, and root-finding activities, and they can also be
reservoirs for diseases that can spread into domestic swine or cattle popu-
lations. The program has mapping activities to track where feral swine
are and where they are spreading, and also where the clusters of pseu-
dorabies virus and swine brucellosis are among feral swine populations.
The program also educates producers on how to prevent the spread of
disease from feral animals to their farms and domestic populations.
Third, the National Animal Health Monitoring System (NAHMS)
conducts national surveys every 5 to 10 years to collect information
from operators about animal health and management factors. Surveys
are cooperative efforts between APHIS and NASS. There are typically
two-phase national-level studies for the swine population. NASS admin-
isters a questionnaire during the first phase; the second phase includes a
questionnaire, administered by APHIS animal health experts, and bio-
logical sampling of swine feces and blood to test for specific agents or
diseases. APHIS uses NASS data on inventories and cost, and asks about
marketing, production practices, movement of animals, and biosecurity.
A sample question may be, “How do you keep your pigs sequestered from
wildlife?” APHIS collects biological samples and in the past has tested
for a variety of diseases including Porcine Reproductive and Respiratory
Syndrome (PRRS), trichinellosis, salmonella, E. coli, and enterococcus.
These provide estimates of prevalence and presence of diseases at the
national level.
For each of these programs and systems, regulatory authorities give
APHIS the ability to collect or act on data. The Animal Health Protec-
tion Act provides general authority for APHIS to control, prevent, and
monitor various diseases. The Swine Health Protection Act is focused
on garbage feeder operations. Other authorities such as quarantine laws
allow an operation to be quarantined if certain diseases are found.
In the next part of the presentation, Johnson noted the following
diseases have specific active or passive surveillance in the United
States: classical swine fever, African swine fever, foot-and-mouth dis-
ease, pseudorabies, swine brucellosis, and influenza A in swine. Most of
the surveillance programs have objectives related to rapid detection
of disease occurrences in the United States and/or a demonstration of
freedom from disease to support trade. Seneca Valley virus (SVV) is a
disease with clinical signs similar to those of foot-and-mouth disease.
While the foot-and-mouth disease surveillance system is passive, it has been used to
examine SVV cases carefully to rule out foot-and-mouth disease. She said
that she watched how the reporting for swine enteric coronavirus disease
(SECD) came about, was released online, and became widely used. She
noted that SVV may be an example of an emerging disease in swine that
NASS could consider using in testing its web scraping (see Chapter 6).
One other system that APHIS tracks and monitors uses data from
samples of condemnations collected by FSIS. FSIS condemnation data
are analyzed on a weekly basis by APHIS. The staff use models to try
to figure out which spikes might represent a possible disease event that
needs to be investigated.
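The text does not say which models APHIS applies to the weekly condemnation data. As a hypothetical illustration only, a simple rule flags a week whose count exceeds the mean of the preceding weeks by several standard deviations. The `window`, `k`, and weekly counts below are assumptions.

```python
import numpy as np

def flag_spikes(counts, window=8, k=3.0):
    """Flag weeks whose count exceeds the trailing-window mean by k standard deviations."""
    counts = np.asarray(counts, dtype=float)
    flags = np.zeros(counts.size, dtype=bool)
    for t in range(window, counts.size):
        history = counts[t - window:t]
        sd = history.std(ddof=1)
        threshold = history.mean() + k * max(sd, 1.0)  # floor avoids a zero-width band
        flags[t] = counts[t] > threshold
    return flags

weekly = [20, 22, 19, 21, 20, 23, 21, 20, 22, 45, 21, 20]  # hypothetical counts
print(np.flatnonzero(flag_spikes(weekly)))  # prints [9]
```

A flagged week only marks a candidate for investigation. Seasonality and reporting changes would need to be handled before such a rule could be used on real condemnation data.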
Most APHIS surveillance systems are set up for active or passive
surveillance. Two sets of documents are available online for response
to emergency outbreak situations. First, the disease response plans or
Red Books provide guidance for responding to a disease outbreak. They
have surveillance plans for classical swine fever and foot-and-mouth
disease during an outbreak. The second set of documents are disease
response strategies, which are available for a few diseases, such as African
swine fever.
She reported that oral fluids testing has become widely used in indus-
try for private sale or contracts. Oral fluids testing is done by dropping
ropes into a pen of swine. The hogs are trained to chew on the rope. Fluids
are later extracted from the rope and sent in for testing. The result is an
aggregate sample for the pigs in the pen, which is faster and much cheaper
than testing pigs individually. The process is widely used for PRRS and
coronavirus diseases. It is being
explored for use with foreign animal diseases because of the potential
cost savings.
Johnson noted that the modeling group with whom she currently
works consists mostly of epidemiologists. She is currently the only econo-
mist. The group does epidemiologic and economic modeling of potential
disease outbreaks in the United States. They focus on transboundary
or foreign animal diseases, such as foot-and-mouth disease, African swine
fever, and classical swine fever. The models make extensive use of NASS
information and APHIS data to estimate locations of affected operations.
They use NAHMS data for contact points and prevalence estimates in
certain areas to try to determine how diseases spread. They also use some
information about indirect
contact rates from literature.
The APHIS economic impact model is a quarterly partial equilibrium
model that relies heavily on population data. Johnson showed a graph
related to foot-and-mouth disease vaccination scenarios. When they study
a disease, they look for different options that can be used to contain it.
One goal is to minimize outbreak spread. They also try to develop cost-
effective approaches to stopping the spread of disease, including identifi-
cation of potential producer decisions and incentives. Some producers are
not directly affected by an outbreak, but an outbreak typically results in
price changes that do impact them. It is a national model, but they are able
to develop some regional strategies.
APHIS uses a variety of information sources to develop three types
of shock trajectories to explore disease scenarios. First is a production
shock scenario: for example, the epidemiological model might determine
how many hogs were culled or depopulated because of a disease out-
break. The removal of the animals moves through the model across time.
If piglets are depopulated, for example, slaughter numbers will be smaller
at the time those piglets would have been big enough to go to market. The
second type of trajectory is a trade shock. If there is a disease outbreak,
trading partners may impose trade embargos. In this scenario, the ques-
tion is how much of a flow of live animals and products will be cut off and
for how long. The third type of shock is consumer response. For example,
a consumer might decide not to eat chicken in response to news about an
outbreak of bird flu. This scenario would remove some purchase and con-
sumption of chicken from the marketplace. She noted that APHIS has not
used consumer response with a swine disease, but the feature is available
in the model.
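The production-shock trajectory described above can be sketched as a lagged subtraction. Piglets removed in one month reduce slaughter in the month they would have reached market weight. The 6-month lag, the function name, and the numbers are assumptions for illustration. The APHIS model is more detailed than this.

```python
import numpy as np

def slaughter_after_depopulation(baseline_slaughter, piglets_removed, lag_months=6):
    """Subtract piglet removals from slaughter lag_months later.

    lag_months is an assumed time for a piglet to reach market weight. Removals
    that would reach market after the end of the horizon are ignored.
    """
    slaughter = np.asarray(baseline_slaughter, dtype=float).copy()
    removed = np.asarray(piglets_removed, dtype=float)
    for t, n in enumerate(removed):
        if t + lag_months < slaughter.size:
            slaughter[t + lag_months] -= n
    return slaughter

baseline = np.full(12, 10_000.0)
removals = np.zeros(12)
removals[1] = 1_500.0   # hypothetical depopulation in month 1
print(slaughter_after_depopulation(baseline, removals))
```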
The model uses the concept of a baseline—the status quo without
disease. They use the shocks for each scenario to disrupt market condi-
tions in the baseline of the model and to determine how long and what
path it takes to return to equilibrium. A loss or dip is usually seen, fol-
lowed by a return to baseline or a little above on its way to equilibrium.
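The dip-and-recovery pattern can be illustrated with a stylized second-order autoregression for the deviation from baseline. This is not the APHIS partial equilibrium model. The coefficients are hypothetical and were chosen only so that the path dips, overshoots slightly, and returns to baseline.

```python
import numpy as np

def deviation_path(shock, phi1=1.2, phi2=-0.5, periods=40):
    """Deviation from baseline following d_t = phi1 * d_{t-1} + phi2 * d_{t-2}.

    A one-time shock at t = 0 produces a dip. With these assumed coefficients, the
    characteristic roots are complex with modulus below 1, so the path overshoots
    the baseline slightly before settling back to it.
    """
    d = np.zeros(periods)
    d[0] = shock
    d[1] = phi1 * d[0]
    for t in range(2, periods):
        d[t] = phi1 * d[t - 1] + phi2 * d[t - 2]
    return d

baseline_price = 50.0   # hypothetical baseline level
path = baseline_price + deviation_path(shock=-10.0)
print(np.round(path[:10], 2))
```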
Johnson concluded with an example of a different approach that used
the same model to look at swine dysentery, a production-level disease.
APHIS used prevalence information about swine dysentery from NAHMS;
otherwise, there is no regular federal monitoring for this disease. In
her analysis she used the model to measure the impact of reduction of
the prevalence of swine dysentery on the marketplace. Although swine
dysentery affects a fair number of producers with a fair number of hogs,
she found that reducing its prevalence would have little impact on the
market prices for hogs. The market found its equilibrium relatively
quickly with low-level impacts on a national scale. She said that the
impact of eliminating the disease on producers that experienced the dis-
ease would have been about a 10 percent increase in net returns. Other
producers were not impacted.
PANEL DISCUSSION
Schulz asked whether APHIS and NASS could collaborate and
share information in real time, such as the results from web scraping.
Johnson replied APHIS would be interested in how to format FSIS infor-
mation to make it most useful to NASS. She added that APHIS did more
web scraping in the past. She agreed with the importance of collaboration
and noted APHIS already has a close working relationship with NASS.
Linda Young added that efforts are being made to share information across
U.S. Department of Agriculture agencies. She said she planned to explore
potential collaborations.
Slud noted that APHIS has streams of disease monitoring, each of
which might be in the form of occasional presence/absence, but many
do not have a regular format for reporting. For input into a model,
the information would need to be recoded into a general indicator. It
might not report all streams all the time, but might indicate when some-
thing crosses a threshold. It does not necessarily need to be directly pre-
dictive, he said, but to be correlated with important variables and to have
a spatial quality.
Johnson noted that APHIS has a risk identification group that looks
at international outbreak situations to try to assess threat or risk levels
for incursion into the United States. It also tracks and monitors situations
happening within the country. It sends reports to the secretary of
agriculture’s office. NASS may be another potential USDA customer for its
products and reports. Young noted that often NASS can more easily share
information with other USDA agencies than outside the department.
Katherine Ensor asked whether NASS and APHIS could develop a
regularly shared shock indicator suitable for input to models. For exam-
ple, if the agencies jointly identified the current or potential shocks of
interest with some type of modeling to provide the probabilities of occur-
rence, this effort might be a spin-off from what the agencies are already
doing. Lawson replied that there have been similar indicators developed
on the human side, a few of which he developed. For example, because of
the interest in biosecurity after 9/11, Lawson and colleagues developed
Bayesian models and posterior functionals that identify changes in the
system, such as shocks. They developed a surveillance conditional pre-
dictive ordinate, which allows for early detection of shocks. In the realm
of syndromic surveillance, Kullback-Leibler measures have been tested
in a number of papers. They can be incorporated in Kalman filter models
and are available in Bayesian models. They are relatively easy to com-
pute, he added, and are based on predictive distributions from previous
times to current times.
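Lawson's Kullback-Leibler measures compare predictive distributions across time points. The following is a minimal sketch. It assumes normal predictive distributions, so the divergence has a closed form. The means and standard deviations are hypothetical, and no specific alarm threshold is implied.

```python
import math

def kl_normal(mean_p, sd_p, mean_q, sd_q):
    """Kullback-Leibler divergence KL(P || Q) between two normal distributions."""
    return (math.log(sd_q / sd_p)
            + (sd_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * sd_q ** 2)
            - 0.5)

# Hypothetical predictive distributions (mean, sd) for the same period,
# one from the previous time point and two possible updates.
before = (100.0, 5.0)
after_quiet = (101.0, 5.0)
after_shock = (85.0, 8.0)
print(kl_normal(*after_quiet, *before))   # small: little change
print(kl_normal(*after_shock, *before))   # large: the predictive distribution has shifted
```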
Researchers have been modeling human population epidemics to
consider two components, Lawson said. One is an endemic component,
which is essentially what NASS is calling equilibrium; the other is an epi-
demic component, which is the shock. The two submodels run together,
and they are added together, each weighted by its posterior probability.
They provide information for detecting when the epidemic component
starts, carries on, and ends. These are state-space models.1
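The weighting of endemic and epidemic components by posterior probability can be sketched with Bayes' rule for a single count. This simplification assumes two Poisson components with fixed, hypothetical rates and a fixed prior probability. It is not the exact specification used in the references Lawson cited.

```python
import math

def poisson_pmf(y, rate):
    """Poisson probability of count y, computed on the log scale for stability."""
    return math.exp(y * math.log(rate) - rate - math.lgamma(y + 1))

def epidemic_probability(y, endemic_rate, epidemic_rate, prior_epidemic=0.05):
    """Posterior probability that count y came from the epidemic component."""
    p_epi = prior_epidemic * poisson_pmf(y, epidemic_rate)
    p_end = (1.0 - prior_epidemic) * poisson_pmf(y, endemic_rate)
    return p_epi / (p_epi + p_end)

def combined_mean(y, endemic_rate, epidemic_rate, prior_epidemic=0.05):
    """Component means averaged with weights equal to the posterior probabilities."""
    w = epidemic_probability(y, endemic_rate, epidemic_rate, prior_epidemic)
    return w * epidemic_rate + (1.0 - w) * endemic_rate

print(epidemic_probability(11, endemic_rate=10, epidemic_rate=30))  # near 0
print(epidemic_probability(28, endemic_rate=10, epidemic_rate=30))  # near 1
```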
Lawson referred to comments in earlier sessions about going down
to the unit level in spatial modeling. In his view, the most sensitive mod-
eling of disease spread might be at the unit level. However, state-level
modeling is still possible. It would require modeling the neighborhoods,
the states that are neighbors to a state, and the cross-boundary flows in a
fully developed spatial model.
Slud asked whether the weekly accession counts could be developed
into a leading indicator of an epidemic, perhaps by way of an interac-
tion term or something that indicates the accession counts are important
only if something else happens. There is a spatial distribution to accession
count reporting. Lawson responded that he was talking about conditional
predictive ordinates. These ordinates are used to predict what will
happen during the next time period, and that prediction is compared to the
observed value at that time period. Accessions might provide input to a
model, he said.
1Lawson provided these references: Corberán-Vallet (2012); Lawson, Onicescu, and Ellerbe
(2011); Corberán-Vallet and Lawson (2014); and Held, Hofmann, and Höhle (2006).
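A conditional predictive ordinate of this kind evaluates the predictive distribution for the next period at the value actually observed. The following is a simplified sketch. It assumes Poisson counts with a conjugate gamma prior, so the one-step-ahead predictive distribution is negative binomial. The prior values and weekly accession counts are hypothetical, and this is not the exact surveillance ordinate of Corberán-Vallet and Lawson.

```python
import math

def predictive_ordinate(history, y_new, a=1.0, b=0.1):
    """One-step-ahead predictive probability P(y_new | history), Gamma-Poisson model.

    The Poisson rate has a Gamma(shape=a, rate=b) prior (hypothetical values);
    given the history, the predictive distribution is negative binomial.
    Small values mean the new count is surprising given the past.
    """
    r = a + sum(history)
    p = (b + len(history)) / (b + len(history) + 1.0)
    log_prob = (math.lgamma(r + y_new) - math.lgamma(r) - math.lgamma(y_new + 1)
                + r * math.log(p) + y_new * math.log(1.0 - p))
    return math.exp(log_prob)

weekly_accessions = [12, 9, 11, 10, 13, 8, 11, 10]   # hypothetical counts
print(predictive_ordinate(weekly_accessions, 11))     # typical week
print(predictive_ordinate(weekly_accessions, 30))     # unusually high week
```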
Ensor asked whether APHIS has modeled the patterns of outbreaks
for swine. If there are typical patterns, she queried whether there is sta-
tistical information in the patterns themselves. Johnson replied that
epidemiologists have explained that any pattern depends on where the
outbreak starts, both in terms of geography and the type of operation, as
well as direct and indirect contact points, how hogs are moving, and how
feed trucks are moving on and off operations. An important consideration
is compartmentalization, for example, how feed truck networks move
through the system. An outbreak in a commercial operation raising pig-
lets in North Carolina will have a very different outbreak response from a
backyard operation in Colorado. Patterns depend roughly on location and
type of operation.
Lawson discussed foot-and-mouth disease in the United Kingdom, in
which a very large outbreak occurred in one area. A vehicle ban was put
in place to stop the spread. It completely knocked down the original
epidemic, but then the disease jumped to other places. Predicting the jumps
is extremely difficult because they are purely random, as occurs with other
diseases such as HIV. That said, some patterns are fairly predictable.
There is typically a shock with a long tail. He noted that Bayesian models
have been used to predict the ends of epidemics, as well as the starts. Pre-
dicting the end could be very important from a veterinary point of view.
Schweinberger provided a related idea. Instead of spatial structure,
he asked about network structure. He noted two questions to answer: (1)
Is there a shock? (Schweinberger said that Lawson had described some
Bayesian approaches for answering that question); and (2) If there is a
shock, how large will its impact be? For example, a shock in the form
of an epidemic could in principle wipe out the entire U.S. population of
hogs. However, this possibility is very unlikely because infectious dis-
eases are transmitted by contact and the network of contacts among hogs
constrains the spread of infectious diseases. For example, an outbreak of
an infectious disease on a farm in Colorado could spread directly to all
farms that purchased hogs from the affected farm, but it could not spread
directly to farms that did not purchase hogs from the affected farm. The
structure of the contact network places hard constraints on how infec-
tious diseases can spread, which is relevant for determining how large a
shock in the form of an epidemic can be and how much it can reduce the
U.S. population of hogs.
This structure provides a way of assessing the impact of an epidemic,
Schweinberger suggested. A survey with two questions could be used to
collect additional data: (1) Is there a hog-related disease on your prop-
erty? and (2) Did you sell hogs, and if so, to whom? Answers to these
questions could help to assess the extent of a problem, where disease
might have traveled, and which farms to monitor. Understanding the
network structure might help in assessing how large the impact of an epi-
demic might be.
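Answers to the second survey question define a directed sales network. A breadth-first search over that network lists the farms an outbreak could reach and how many sale links away each one is. The farm labels and sales below are hypothetical.

```python
from collections import deque

def farms_at_risk(sales, source):
    """Farms reachable from `source` along recorded hog sales (seller -> buyers).

    Returns a dict mapping each reachable farm to the number of sale links
    between it and the source (1 = bought directly from the source).
    """
    distance = {source: 0}
    queue = deque([source])
    while queue:
        farm = queue.popleft()
        for buyer in sales.get(farm, []):
            if buyer not in distance:
                distance[buyer] = distance[farm] + 1
                queue.append(buyer)
    del distance[source]
    return distance

# Hypothetical answers to "Did you sell hogs, and if so, to whom?"
sales = {"A": ["B", "C"], "B": ["D"], "C": [], "E": ["F"]}
print(farms_at_risk(sales, "A"))   # E and F are unreachable from A
```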
Johnson noted epidemiologists in her group look at different scenarios
with different start points. Simulation models help predict the proportion
of the population affected through space and time. She said APHIS also
manages the Emergency Management Response System (EMRS). The
EMRS maintains outbreak response information, including asking ques-
tions to farmers about where they source and sell hogs. They commonly
do tracebacks with cattle, which tend to be sold in smaller lots than hogs.
That information is used during an outbreak to predict spread. They also
compare different response strategies to curb disease spread. Strategies
include enforcement of boundary stop points for quarantine and deploy-
ment of vaccines.
In answer to a question from Schweinberger about the epidemiologi-
cal models used, Johnson replied that the main model for swine diseases
is InterSpread PLUS, developed in New Zealand. APHIS has adapted it
for the United States and uses it to prepare national and regional analyses.
APHIS started with classical swine fever and has recently adapted the
parameters for African swine fever, because those diseases are similar in many
ways. APHIS has also been developing an animal disease spread model,
but it is not yet operational.
Schweinberger asked about use of a susceptible-infectious-recovered
model. This model is applied to determine how infections spread in a
population such as a herd. These kinds of models make the implicit
assumption that the disease could travel from each unit to every other
unit in the population. More metric-based approaches are also available,
and they are more realistic in that they acknowledge a disease can only
spread from an infectious unit to other units that are in contact with that
infectious unit.
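For contrast with contact-restricted approaches, a homogeneous-mixing SIR model can be written in a few lines. The `s * i / n` term encodes the assumption that any infectious unit can infect any susceptible unit. A network-based version would instead pass infection only along contact links. The rates and population size are hypothetical.

```python
def sir_homogeneous(n, initial_infected, beta, gamma, steps):
    """Discrete-time SIR model with homogeneous mixing.

    beta (transmission) and gamma (recovery) are hypothetical per-step rates.
    Returns the (susceptible, infectious, recovered) path over time.
    """
    s, i, r = float(n - initial_infected), float(initial_infected), 0.0
    path = [(s, i, r)]
    for _ in range(steps):
        new_inf = beta * s * i / n   # every infectious unit contacts every susceptible unit
        new_rec = gamma * i
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        path.append((s, i, r))
    return path

path = sir_homogeneous(n=1000, initial_infected=1, beta=0.3, gamma=0.1, steps=200)
s, i, r = path[-1]
print(round(s), round(i), round(r))
```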
Johnson said they use herd-level models; if the herd tests positive,
not all animals on the site will be infected at that point in time. The aim
is to predict the spread from one herd to another, looking at spread across
multiple sites as opposed to within the herd or flock. The model has
a spatial component.
Ron Plain noted the focus on big shocks and new diseases, but he
asked about data to estimate death loss for chronic diseases, such as
PRRS or swine dysentery, to determine whether levels are high, low, or
typical. Even though they may not be big shocks, it would be interesting
to see whether the deviation is enough to impact pork production, he said.
Branan said FSIS condemnation rates capture part of that information.
Codes indicate why particular animals are condemned at slaughter and
could track deviation from baseline.
Johnson added that for many production diseases, the prevalence esti-
mates are done through NAHMS studies. The studies provide an update
every 5 years to see how the prevalence level is changing. She asked
about industry tracking, perhaps by the Swine Health Information Center.
Schulz said a private/public partnership at the University of Minnesota,
called the Swine Health Monitoring Project (SHMP), started with PEDv
and now monitors PEDv and PRRS. Johnson observed SHMP provides
more frequent reporting of disease than the NAHMS study that comes
out every 5 years.
Yijun Wei asked whether the simulation study mentioned by Johnson
was an epidemiological or another type of model. Johnson said that she
and Branan are not experts in the epidemiological model, but they would
be happy to work with NASS after the workshop and bring in experts
with more in-depth information.

9
Discussion of Modeling
This workshop session focused on strengths and weaknesses of cur-
rent modeling efforts for hog production: state-space models and the new
approach to modeling of biological processes. An example of a state-
space model is the Kalman filter model (KFM), while the new approach,
presented in Chapter 7, is the Sartore, Wei, Abayomi, Riggins, Corral,
Sedransk (SWARCS) model. Formal discussants were Katherine Ensor
(Rice University) and Christopher Wikle (University of Missouri). The
session included open discussion. The moderator was Eric Slud (Univer-
sity of Maryland and U.S. Census Bureau).
MODELING CHALLENGES
Ensor acknowledged the difficulty of fully understanding the challenges
of the project and how all the pieces fit together. She pointed to promising
results from the state-space modeling perspective. State-space models
allow bringing in prior information and are frequently computationally
manageable. She supported Andrew Lawson’s suggestion (see Chapter
8) about developing some type of switching between endemic and epi-
demic time periods. This approach allows the dynamics of the model to
change, for example by using hidden Markov models. She also observed
that collaboration with the Animal and Plant Health Inspection Service
(APHIS) might help the National Agricultural Statistics Service (NASS)
develop an epidemic model to use with a switching approach.
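Switching between endemic and epidemic regimes with a hidden Markov model can be sketched with the standard forward (filtering) recursion. This is a minimal illustration, not a NASS or APHIS model. It assumes a hypothetical two-state chain (0 = endemic, 1 = epidemic), Gaussian observations of a quarterly survival rate with regime-specific means and standard deviations, and made-up transition probabilities. For each quarter it returns the filtered probability of each regime given the data observed so far.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def forward_filter(obs, trans, init, means, sds):
    """Filtered regime probabilities P(state_t = k | obs_1..t) for a hidden Markov model.

    trans[j][k] is P(state_t = k | state_{t-1} = j); init is P(state_1 = k).
    """
    n_states = len(init)
    filtered = []
    prior = list(init)
    for y in obs:
        # Update step: weight each regime's predicted probability by its likelihood.
        unnorm = [prior[k] * normal_pdf(y, means[k], sds[k]) for k in range(n_states)]
        total = sum(unnorm)
        post = [u / total for u in unnorm]
        filtered.append(post)
        # Prediction step: propagate the probabilities through the transition matrix.
        prior = [sum(post[j] * trans[j][k] for j in range(n_states)) for k in range(n_states)]
    return filtered

# Hypothetical quarterly survival rates; the dip mimics an epidemic shock.
survival = [0.90, 0.91, 0.90, 0.89, 0.80, 0.78, 0.82, 0.89, 0.90]
trans = [[0.95, 0.05],   # endemic -> endemic / epidemic
         [0.30, 0.70]]   # epidemic -> endemic / epidemic
probs = forward_filter(survival, trans, init=[0.9, 0.1],
                       means=[0.90, 0.80], sds=[0.015, 0.03])
for t, p in enumerate(probs):
    print(f"quarter {t}: P(epidemic) = {p[1]:.2f}")
```

In a real application the transition probabilities and regime parameters would have to be estimated or informed by epidemic models. Supplying that information is the role the discussants suggested APHIS could play.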
She added that she strongly favors a more model-based approach because
of apparent data limitations. She asked how many time points are used
for fitting the models that have been described. She cautioned that the
more empirically based approaches using LASSO and seasonal time series
models will be very unstable unless a long time series is used in
estimation. Understanding and measuring uncertainty are also very important.
Wikle said he appreciated the complexity of the problem and praised
the modeling efforts described. He said the problem reminded him of
new work in ecological modeling called integrated population models
(IPMs), which take many different data sources in a population-based
model that accommodates biological dynamics (referred to as constraints
by NASS) in a spatial setting. IPMs seem to provide an ideal framework
for NASS, he suggested. They are often fully Bayesian, and there are
frequentist versions.
Wikle also said NASS would benefit by incorporating spatial con-
siderations into its modeling (see Chapter 10 for further elaboration). He
expressed interest in modeling in both space and time and remarked that
combining them requires more thought than back-engineering a time
series model with spatial items. He observed that dynamics occur on
scales that are not currently incorporated in NASS modeling. NASS is
modeling biological constraints in the data, which is important, but the
real biological dynamics occur on an individual and maybe even a herd
scale. He encouraged thinking about the dynamics and where they occur.
He added that while most users care about point estimates, a coherent
decision-making process requires that the model produce a point estimate
that has quantifiable uncertainty. In other words, the reliability of the
point estimate must itself be estimated, possibly through simulation or
microsimulation. This is especially important if parameters are
re-estimated at every time period, he stressed.
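Wikle's point about estimating reliability by simulation when parameters are re-estimated each period can be illustrated with a parametric bootstrap. This is a generic sketch under assumed conditions, not a NASS procedure. It fits a hypothetical AR(1) model to a short series by least squares and simulates replicate series from the fitted model. The parameters are re-estimated on each replicate, and the spread of the resulting one-step-ahead forecasts measures the part of the uncertainty that comes from re-estimating parameters. For a full prediction error, the innovation variance would still have to be added.

```python
import random
import statistics

def fit_ar1(series):
    """Least-squares fit of y_t = c + phi * y_{t-1} + e_t; returns (c, phi, sigma)."""
    x, y = series[:-1], series[1:]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    phi = sxy / sxx
    c = my - phi * mx
    resid = [b - (c + phi * a) for a, b in zip(x, y)]
    sigma = (sum(r * r for r in resid) / (len(resid) - 2)) ** 0.5
    return c, phi, sigma

def simulate_ar1(c, phi, sigma, start, n, rng):
    """Simulate n values of an AR(1) path beginning at `start`."""
    path = [start]
    for _ in range(n - 1):
        path.append(c + phi * path[-1] + rng.gauss(0.0, sigma))
    return path

def bootstrap_forecast_sd(series, n_boot=500, seed=1):
    """Parametric-bootstrap SD of the one-step forecast due to parameter re-estimation."""
    rng = random.Random(seed)
    c, phi, sigma = fit_ar1(series)
    forecasts = []
    for _ in range(n_boot):
        replicate = simulate_ar1(c, phi, sigma, series[0], len(series), rng)
        c_b, phi_b, _ = fit_ar1(replicate)           # parameters re-estimated on each replicate
        forecasts.append(c_b + phi_b * series[-1])   # forecast from the observed last value
    return statistics.stdev(forecasts)

# Hypothetical 40 quarters (10 years) of an inventory index.
demo = simulate_ar1(c=20.0, phi=0.8, sigma=2.0, start=100.0, n=40, rng=random.Random(0))
print(round(bootstrap_forecast_sd(demo), 3))
```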
Ensor agreed, noting a concern that parameter uncertainty estimates
have not been examined. Such estimates provide information about how
well the model is working. In answer to her question about how many data
points are used to fit models, Luca Sartore replied the SWARCS model
uses the entire time series of monthly and quarterly data from 2008 to
the current quarter to prepare model estimates for the current quarter.
His approach has so many parameters to estimate that the number of
data points is not sufficient to get stable estimates. He used the LASSO
approach with penalties to make the process stable. Ensor replied that
using the whole history eliminates the opportunity to capture dynamics.
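Using LASSO penalties to stabilize a model whose parameters are many relative to its data points can be illustrated with a small coordinate-descent LASSO. This is a generic textbook sketch, not the SWARCS implementation. The data are synthetic, and the penalty value `lam` is an arbitrary assumption. With more candidate predictors (30) than observations (20), ordinary least squares has no unique solution. The L1 penalty instead sets most coefficients exactly to zero.

```python
import random

def soft_threshold(z, gamma):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows with centered columns, and y is centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)   # residual y - X beta, with beta starting at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual, adding back column j's own fit.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta

rng = random.Random(42)
n, p = 20, 30
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
means = [sum(row[j] for row in X) / n for j in range(p)]
X = [[row[j] - means[j] for j in range(p)] for row in X]        # center columns
y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.5) for row in X]
ybar = sum(y) / n
y = [v - ybar for v in y]
beta = lasso_cd(X, y, lam=0.3)
print([round(b, 2) for b in beta[:3]], "nonzero:", sum(1 for b in beta if b != 0.0))
```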

Sartore elaborated that the sequential generalized linear model uses
a 4-year moving window and the KFM uses the entire time series, as his
model does. Ensor stressed the importance of recognizing the number of
data points in a model and how they are used. Some of these methods
yield highly variable estimates. Even 10 years of data, either 40 quarterly
or 120 monthly data points, is quite small from a time series perspective,
she said. However, if the model can capture the dynamics, she would favor
a switching model or a model with dynamic parameters.
Ensor and Wikle agreed that pursuing state-level models over time
and rolling them up to the national level (bottom-up) might give better
results than the current top-down approach. Ensor noted that it is impor-
tant to recognize that because of data limitations, the bottom is the state
and not the producers. Lee Schulz asked about the potential implications
of a bottom-up approach given that external slaughter data, not available
at the state level, are used to correct for biases in the survey data. Ensor
agreed that this poses a constraint.
AUXILIARY INFORMATION
Slud asked about auxiliary information available at the state level,
noting pork check-off data as one source. Ron Plain explained the manda-
tory check-off program. When animals go to slaughter, the packers make
an assessment of 0.4 percent of market value. That money goes to the
National Pork Board, part of which is allocated to the state where
the animal was raised. The Pork Board publishes those data monthly. The
data show how the dollars are allocated and how many hogs come from
each state. They look like very good data when compared to the national-
level slaughter data, he said. However, one issue is that many pigs change
states: For example, they are born in North Carolina and are moved to
another state to be fed and raised, but ownership does not change. If there
is no ownership change, then no check-off is assessed.
Ensor asked whether the problems with the check-off data are known
well enough so that they might be useful in modeling. Slud asked about
other data that might illuminate the shortcomings—for example, survey
data on state transfers of hogs for raising or state-to-state transfer infor-
mation. Plain responded that in the past, state veterinarians’ offices had
information on the shipment of hogs for nonslaughter purposes. The
data were based on health certificates, required for interstate hog transfers,
that were reported to the state departments of agriculture. Dan Kerestes noted
that NASS makes use of veterinarian data. He reported that the quality of those
data vary among states and have deteriorated over the years, but the data
are used to prepare production and disposition reports to account for pigs
in every state and come up with cash receipts.
Nancy Kirkendall summarized a discussion of check-off data that
the committee had held before the workshop, asking Plain to provide any
needed corrections. She said that the check-off data report sales in three
categories of hogs. The most relevant for current modeling purposes
might be sales of market hogs because those animals are most likely
intended for slaughter. That category of sale might be correlated with
slaughter data by state for that size hog, possibly with a relatively short
lag between the sale and the slaughter. As with any potential new data to
be used in modeling, there would need to be some evaluation of correla-
tions to see how the data might be useful.
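The correlation evaluation Kirkendall described, between state check-off sales of market hogs and slaughter after a short lag, could begin with a simple lagged correlation scan. The series below are hypothetical placeholders, not Pork Board or NASS data, and the helper `lagged_correlations` is illustrative. It correlates sales in month t with slaughter in month t + lag for a few candidate lags.

```python
import statistics

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def lagged_correlations(sales, slaughter, max_lag):
    """Correlation of sales[t] with slaughter[t + lag] for lag = 0..max_lag."""
    out = {}
    for lag in range(max_lag + 1):
        out[lag] = pearson(sales[: len(sales) - lag], slaughter[lag:])
    return out

# Hypothetical monthly series in which slaughter follows market-hog sales by one month.
sales = [100, 104, 98, 110, 107, 95, 102, 115, 108, 99, 105, 112]
slaughter = [101] + [0.97 * s for s in sales[:-1]]
corrs = lagged_correlations(sales, slaughter, max_lag=3)
best = max(corrs, key=corrs.get)
print({k: round(v, 2) for k, v in corrs.items()}, "best lag:", best)
```

With real data the scan would need to be run by state, and interstate movements without an ownership change would weaken the relationship, as Plain noted.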
Matthew Branan said APHIS works with the veterinarian data to
provide an indication of animal movement. It also collects some infor-
mation from National Animal Health Monitoring System studies. These
are smaller than the National Animal Health Reporting System stud-
ies, so aggregate information is limited to regional or national-level
groupings, such as a slaughterhouse-based level estimate rather than a
state-level estimate.
Ensor asked about opportunities for a spatial-temporal modeling
approach with simultaneous estimation of state and national information.
Branan recognized the value but noted that supporting data may not be
available. The goal of the animal disease traceability program is to get that
level of information, but for now modeling may be restricted by the data
that are available. Ensor noted that, with state-level state-space models, the
uncertainties of the state estimates could be used as variance estimates in
the measurement equation. At the national level, a dynamic hidden Markov
model switching between endemic and epidemic regimes, with the
epidemic side built up from APHIS information, would likely provide value.
Young said the KFM would require adaptation to give state-level
estimates. Ensor suggested NASS explore borrowing strength, looking at
space-time relationships and incorporating uncertainty into the modeling.
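Ensor's suggestion to use the uncertainty of state-level survey estimates as the measurement variance in a state-space model can be sketched with a local-level Kalman filter. This is a simplified illustration, not the NASS KFM. It assumes that the true state inventory follows a random walk with a hypothetical state-noise variance `q`, and that each quarter's survey estimate has a known sampling variance supplied by the survey. Quarters with larger sampling variance move the filtered estimate less.

```python
def local_level_filter(obs, obs_var, q, m0, p0):
    """Kalman filter for y_t = mu_t + e_t, e_t ~ N(0, obs_var[t]); mu_t = mu_{t-1} + w_t, w_t ~ N(0, q).

    Returns the filtered means and variances of mu_t.
    """
    m, p = m0, p0
    means, variances = [], []
    for y, r in zip(obs, obs_var):
        # Predict: the random-walk state keeps its mean and gains variance q.
        p_pred = p + q
        # Update: the Kalman gain weighs the survey estimate against its sampling variance r.
        gain = p_pred / (p_pred + r)
        m = m + gain * (y - m)
        p = (1.0 - gain) * p_pred
        means.append(m)
        variances.append(p)
    return means, variances

# Hypothetical quarterly survey estimates (thousand head) for one state.
survey = [1500.0, 1530.0, 1480.0, 1620.0, 1510.0]
samp_var = [900.0, 900.0, 900.0, 40000.0, 900.0]   # the fourth quarter is very noisy
means, variances = local_level_filter(survey, samp_var, q=400.0, m0=1500.0, p0=10000.0)
print([round(v, 1) for v in means])
```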

Slud noted that state-level estimates and borrowing strength would be
discussed further later in the workshop (see Chapter 10). He observed that
switching could be incorporated in several ways. First, if an indication came
in about a shock, it could be used to switch models. Second, it could be used
as a covariate in a generalized linear model framework with other covariates.
In either case, he said, past data from regions where that indicator variable
was present would be needed to estimate the parameters of the model.
That requirement sharply reduces the length of the time series available
for estimation. How to get enough data for estimation is a puzzle,
he commented.
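Slud's second option, a shock indicator used as a covariate in a generalized linear model, and his concern about data scarcity can be illustrated with a Poisson regression of death counts on a shock indicator, using log inventory as an offset. The counts and inventories are hypothetical, and the fit uses Newton-Raphson (equivalent to iteratively reweighted least squares for this model) for an intercept and one covariate. It is not a NASS model. Because only two quarters carry the shock flag, the standard error of the shock coefficient is determined by those two quarters and the baseline quarters alone.

```python
import math

def _score_and_info(counts, exposure, x, b0, b1):
    """Score vector and Fisher information for log mu = log(exposure) + b0 + b1 * x."""
    mu = [e * math.exp(b0 + b1 * xi) for e, xi in zip(exposure, x)]
    s0 = sum(c - m for c, m in zip(counts, mu))
    s1 = sum((c - m) * xi for c, m, xi in zip(counts, mu, x))
    i00 = sum(mu)
    i01 = sum(m * xi for m, xi in zip(mu, x))
    i11 = sum(m * xi * xi for m, xi in zip(mu, x))
    return s0, s1, i00, i01, i11

def poisson_glm(counts, exposure, x, n_iter=25):
    """Newton-Raphson fit of a Poisson log-linear model with an offset.

    Returns (b0, b1, se_b0, se_b1), with standard errors from the inverse Fisher information.
    """
    b0, b1 = math.log(sum(counts) / sum(exposure)), 0.0
    for _ in range(n_iter):
        s0, s1, i00, i01, i11 = _score_and_info(counts, exposure, x, b0, b1)
        det = i00 * i11 - i01 * i01
        b0 += (i11 * s0 - i01 * s1) / det   # Newton step: inverse information times score
        b1 += (i00 * s1 - i01 * s0) / det
    _, _, i00, i01, i11 = _score_and_info(counts, exposure, x, b0, b1)
    det = i00 * i11 - i01 * i01
    return b0, b1, math.sqrt(i11 / det), math.sqrt(i00 / det)

deaths = [900, 880, 910, 2600, 2400, 920, 890, 905, 915, 880, 900, 895]
inventory = [100.0] * 12                      # hypothetical, thousand head
shock = [0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0]  # only two shock quarters
b0, b1, se0, se1 = poisson_glm(deaths, inventory, shock)
print(f"death-rate ratio during shock {math.exp(b1):.2f}, SE of log ratio {se1:.3f}")
```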
Sartore agreed, adding that the Markov switching model might also
be useful to directly model the monthly estimates for pig crop and sows
farrowing and also for modeling survival rates. Survival rates depend on
the disease, when the outbreak starts, and how long it lasts. This situation
is more complex at the state level because it depends on which states are
near each other, where the disease starts, and how it spreads. All these
can be modeled by the hidden Markov model, but the underlying process
is more complex than is visible from the survey data.
Slud observed that Sartore’s model is already over-parameterized
and adapting to shocks would result in a very small training dataset.
Sartore agreed, saying that NASS would like to keep the model and
estimation frequentist, which makes it difficult to account for missing
information. He said perhaps a Bayesian approach could help because
prior information from experts, the literature, or past time periods could
be used. He said that the question is how to get stable, viable estimates
quickly. Slud said although his inclination was to start with a frequentist
model, this discussion made him consider Bayesian approaches.
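Sartore's remark that a Bayesian approach could draw on prior information from experts, the literature, or past periods can be illustrated with a conjugate beta-binomial update for a survival rate during a shock, when only a small sample is available. The prior settings and counts are hypothetical, and a production model would be far richer. The prior is expressed as a prior mean plus an effective prior sample size, a common way to encode expert judgment.

```python
import math

def beta_posterior(prior_mean, prior_n, survivors, at_risk):
    """Conjugate update of a Beta prior on a survival rate with binomial data.

    The prior Beta(a, b) has mean prior_mean and a + b = prior_n (its "prior sample size").
    """
    a = prior_mean * prior_n + survivors
    b = (1.0 - prior_mean) * prior_n + (at_risk - survivors)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return a, b, mean, math.sqrt(var)

# Expert prior: survival about 0.85 during an outbreak, worth about 50 pigs of evidence.
# Hypothetical small sample during the shock: 140 of 200 pigs survived.
a, b, mean, sd = beta_posterior(0.85, 50.0, survivors=140, at_risk=200)
print(f"posterior mean {mean:.3f}, sd {sd:.3f}")
```

The posterior mean falls between the sample rate (0.70) and the prior mean (0.85), with the data dominating because 200 observations outweigh a prior worth 50.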


10
Discussion of State-Level Estimation
This session was organized by the planning committee to discuss the
modeling of state-level hog inventory data. Gavin Corral (National
Agricultural Statistics Service [NASS]) presented on data available for state-
level modeling, followed by Luca Sartore (NASS) on extending the new
NASS model to make state-level estimates, Gauri Datta (U.S. Census
Bureau and University of Georgia) on Fay-Herriot models, and Eric Slud
(University of Maryland and U.S. Census Bureau) on a shares approach
to making state-level estimates. The session concluded with open discus-
sion. Chris Wikle was the moderator.
DATA AVAILABILITY
Corral summarized the data available to support modeling at the state
level. First, there are 30 states for which survey estimates for all inventory
items are available for all quarters beginning in 2008. For the remaining
20 states, survey estimates are available annually in December. In addi-
tion, state recommendations are developed with state-level survey data
adjusted by regional offices. For these data sources, variables available
include sows farrowed and pig crop (available monthly) and breeding
herd, inventory by weight group, first and second intentions, death loss,
and total hogs.
Corral referred to the earlier discussion about monthly check-off data
(see Chapter 9). U.S. pork producers and importers pay $0.40 per $100.00
of value when pigs are sold and when pigs or pork products are brought
into the United States. The National Pork Board uses the funds for
specific program areas such as promotion, research, and education.
NASS has annual national import and export data. Some data are
available annually on state-level in-shipments and out-shipments from
the state veterinarian offices. Slaughter data are available monthly at the
national level. Data on the number of hogs slaughtered in a state are
available monthly, but they are compiled by slaughterhouse location and do
not reflect where the hog was raised, so they are not comparable to NASS
hog inventory data. Another potential source of state-level information is
the web-scraping data described in Chapter 6.
EXTENDING THE NEW NASS MODEL TO STATE ESTIMATES

Sartore described his ideas for preparing state-level estimates. He
suggested a hidden Markov model, probably as a replacement for the
current national-level model. Estimates would be prepared at the state level
and aggregated to the national level. The model should respect the rela-
tionship between dynamics of state and national survival rates, he said, as
well as incorporate in-shipment and out-shipment of hogs across borders.
These movements need to be accounted for because inventory is defined
as the number of animals in the state at the survey reference period.
Clearly, he said, additional sources of data are needed. Finally, he pointed
to the 2-hour time requirement for computation.
Sartore showed a node graph with one node for each of the 16 states
for which survey estimates are prepared quarterly, plus a 17th node that
represents all other states. The chart also shows lines connecting the
nodes. There is a set of state-level data for each node. The lines indicate
interstate variables: spatial distance, disease spread, and transportation.
One question is how to extend this network to all 50 states surveyed in
December. The model needs to incorporate spatial-temporal relationships
to support a formal way to borrow information, he said. For example,
one might consider estimating the survival rate for a state at time t as a
weighted average of survival rates in neighboring states at time t. With
this formulation, it is not yet clear either how to define “neighboring
states” or how to determine the appropriate weighting. He gave a simi-
lar example of how to estimate a transition matrix for the percentage of
exported (or non-exported) hogs in a state.
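Sartore's two examples can be sketched together: a state's survival rate borrowed as a weighted average of its neighbors' rates, and a transition matrix for hogs that stay in or are shipped out of a state. Everything here is hypothetical: the three states, the shipment fractions, and the inverse-distance weights, which are only one possible answer to the open question of how to define neighbors and weights. Each row of the transition matrix sums to 1, so redistributing inventories with it preserves the total number of hogs.

```python
def neighbor_weighted_rate(rates, distances, state, max_dist):
    """Inverse-distance weighted average of neighbors' survival rates.

    Neighbors are the other states within max_dist (a hypothetical definition).
    """
    num, den = 0.0, 0.0
    for other, rate in rates.items():
        d = distances.get(frozenset((state, other)))
        if other == state or d is None or d > max_dist:
            continue
        w = 1.0 / d
        num += w * rate
        den += w
    return num / den

def apply_shipments(inventory, transition, states):
    """Redistribute inventory; transition[i][j] is the share of state i's hogs moved to state j."""
    moved = {s: 0.0 for s in states}
    for i, src in enumerate(states):
        assert abs(sum(transition[i]) - 1.0) < 1e-9, "rows must sum to 1"
        for j, dst in enumerate(states):
            moved[dst] += inventory[src] * transition[i][j]
    return moved

states = ["A", "B", "C"]
rates = {"A": 0.88, "B": 0.92, "C": 0.82}
distances = {frozenset(("A", "B")): 200.0, frozenset(("A", "C")): 400.0,
             frozenset(("B", "C")): 900.0}
print(round(neighbor_weighted_rate(rates, distances, "A", max_dist=500.0), 4))

transition = [[0.70, 0.30, 0.00],   # 30% of A's pigs are shipped to B for feeding
              [0.00, 1.00, 0.00],
              [0.00, 0.10, 0.90]]
inventory = {"A": 1000.0, "B": 500.0, "C": 200.0}
after = apply_shipments(inventory, transition, states)
print(after, sum(after.values()))
```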

Sartore concluded by summarizing the computation issues. He is
considering using an optimization based on gradient descent methods rather
than simulations to reduce memory requirements. Additionally, a unified
framework for simultaneously computing all estimates would be desir-
able, and he has to consider the 2-hour limitation on run time. Among the
questions that remain are how to simplify the model and what can be done
if transportation data are not available. Another key question is how to
compute variance estimates in this complex process with multiple types of
error: modeling error, sampling error, imputation, and judgment.
SMALL AREA ESTIMATION

Before moving on to the two presentations about small area estima-
tion, Slud asked about the sample sizes in small and large states to get a
better idea about the range of sampling errors. Although no one had good
examples, Matthew Branan referred to sampling weights for Colorado and
Iowa (see Chapter 3) that give some indication. Slud noted that the sam-
pling weights of 1 are for the very large producers that dominate. They
are self-representing and do not contribute to the variance estimate. Nell
Sedransk clarified the sample sizes in Colorado, saying the medium-size
group has dwindled since the epidemic and is in the double digits, maybe as
low as 30. The numbers in the large group have increased slightly, and the
numbers in the very small group have increased a lot. As noted in Chapter 3,
a huge percentage of hogs are held by very large operators, but a
huge percentage of operations are very small. Sampling proportions are
set to achieve target coefficients of variation; consequently, both size and
number of operators figure into those computations.
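Slud's observation that units with sampling weight 1 are self-representing and add nothing to the variance follows from the finite population correction in a stratified expansion estimator. The sketch assumes simple random sampling within each stratum, and its strata and reported inventories are hypothetical, not NASS sample data. In the certainty stratum every unit is sampled (n = N, so the weight N/n = 1), so its variance term is zero and only the sampled strata drive the coefficient of variation.

```python
import statistics

def stratified_total(strata):
    """Expansion estimate of a total and its variance under stratified simple random sampling.

    Each stratum is (N, sample_values). A certainty stratum has weight N/n = 1, and its
    finite population correction (1 - n/N) = 0 removes its variance contribution.
    """
    total, variance = 0.0, 0.0
    for N, values in strata:
        n = len(values)
        total += N / n * sum(values)
        if n < N:
            s2 = statistics.variance(values)
            variance += N * N * (1.0 - n / N) * s2 / n
    return total, variance

strata = [
    (3, [250000, 180000, 120000]),    # certainty stratum: very large operations, weight 1
    (40, [5200, 4100, 6300, 3900]),   # medium operations, weight 10
    (900, [40, 15, 0, 60, 25]),       # very small operations, weight 180
]
total, var = stratified_total(strata)
print(round(total), f"CV = {var ** 0.5 / total:.3%}")
```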
Slud observed that this population and sample structure is consistent
with that of the Census Bureau's economic surveys. He noted that small area
estimation has primarily been used with household surveys, which have
very different properties from economic surveys.
Fay-Herriot Model
Datta, in a joint presentation with Slud, started by introducing the
Fay-Herriot model, a popular model for producing small area estimates.
In the case discussed at this workshop, the goal is to estimate θi, the hog
production for state i, for i = 1 to m. A direct estimate of θi, denoted Yi, is
prepared from the state's survey sample. However, direct
estimates are often not reliable, especially for small states. To develop a
reliable estimate for states, Fay and Herriot (1979) proposed an approach
to borrow strength from other data sources. It uses two models. The sampling
model describes the sample estimate as an unbiased estimate of state
production, θi, plus an additive noise term. The noise term
has zero mean and is normally distributed, with variance equal to the
sampling variance. There is no correlation between estimates for different
states. The second model is called the linking model. It connects θi to
covariates Xi through a linear regression. This model also has an additive,
independent, identically distributed noise term.
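In symbols, the two models and the resulting shrinkage estimator are commonly written as follows. The notation is the usual textbook presentation and is assumed here rather than taken from the workshop slides: D_i is the known sampling variance of Y_i, β the regression coefficients, and A the variance of the linking-model error.

```latex
\begin{aligned}
\text{Sampling model:}\quad & Y_i = \theta_i + e_i, && e_i \sim N(0, D_i)\ \text{independent},\\
\text{Linking model:}\quad & \theta_i = X_i^{\top}\beta + v_i, && v_i \overset{\text{iid}}{\sim} N(0, A),\\
\text{Shrinkage estimator:}\quad & \hat\theta_i = \gamma_i Y_i + (1-\gamma_i)\, X_i^{\top}\hat\beta, && \gamma_i = \frac{A}{A + D_i}.
\end{aligned}
```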
The linking model will work best with good covariates, he said. Disease
indicators might serve as useful covariates for the topics discussed in this
workshop. Other covariates may include parts of the growth models, such
as the relationship between pigs born and sows farrowed. Once parameters
are estimated, the approach yields two estimates for θi, one
from the survey and one from the regression. Combining them yields a third
estimate for θi, called the shrinkage estimator, which is a weighted average
of the two. The extent of shrinkage
depends on the variance of the sampling error equation and the variance
of the linking equation. If the variance of the survey estimate is small, as
it might be for the largest hog-producing states, the shrinkage estimate
will be close to the direct estimate. If the variance of the survey estimate
is large, as it might be for smaller hog-producing states, the shrinkage
estimate will be very close to the prediction of the regression linking
model. Fay-Herriot models can be treated and estimated in either a
frequentist or a Bayesian mode.
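The shrinkage estimator can be sketched numerically. This minimal frequentist version makes several assumptions: one covariate plus an intercept, known sampling variances D_i, and a simple moment estimator of the linking variance A (a Prasad-Rao-type estimate, truncated at zero). It is not necessarily the estimation method used by NASS or presented by Datta, and the data are hypothetical.

```python
def line_fit(x, y, w=None):
    """(Weighted) least-squares intercept and slope for y ~ a + b * x."""
    w = w or [1.0] * len(x)
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b

def fay_herriot(y, x, d):
    """Shrinkage estimates with one covariate and known sampling variances d.

    Returns (A, shrinkage estimates, regression-synthetic estimates).
    """
    m, p = len(y), 2
    a0, b0 = line_fit(x, y)
    mx = sum(x) / m
    sxx = sum((xi - mx) ** 2 for xi in x)
    # Moment estimator of A: OLS residual sum of squares minus the expected sampling part.
    rss = sum((yi - a0 - b0 * xi) ** 2 for xi, yi in zip(x, y))
    hat = [1.0 / m + (xi - mx) ** 2 / sxx for xi in x]
    A = max(0.0, (rss - sum(di * (1.0 - hi) for di, hi in zip(d, hat))) / (m - p))
    # Refit by weighted least squares with weights 1 / (A + D_i).
    a, b = line_fit(x, y, [1.0 / (A + di) for di in d])
    synthetic = [a + b * xi for xi in x]
    estimates = []
    for yi, si, di in zip(y, synthetic, d):
        gamma = A / (A + di)   # near 1 when the survey estimate is precise
        estimates.append(gamma * yi + (1.0 - gamma) * si)
    return A, estimates, synthetic

y = [110.0, 70.0, 65.0, 30.0, 28.0, 5.0]    # hypothetical direct survey estimates
x = [100.0, 80.0, 60.0, 40.0, 20.0, 10.0]   # hypothetical covariate
d = [1.0, 1.0, 4.0, 100.0, 150.0, 200.0]    # sampling variances: small states are noisier
A, est, syn = fay_herriot(y, x, d)
print(round(A, 2), [round(e, 1) for e in est])
```

The first (large, precise) state keeps nearly its direct estimate, while the last (small, noisy) state is pulled most of the way toward the regression prediction, matching the behavior described above.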
Datta described benchmarking work done for small area models. He
noted a Bayesian solution to the benchmarking problem in some of his
own work (Datta et al., 2011). With this approach, the small area model is
used to get state-level estimates, which are summed and should agree with
the national-level estimate. If they do not, then a benchmarking approach
is used to change the estimate from the model slightly so that the new esti-
mate satisfies the benchmark.
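A simple way to make model-based state estimates add up to a national benchmark is to spread the discrepancy across states in proportion to each state's estimated variance, so that less reliable estimates absorb more of the adjustment. This is a generic minimum-weighted-distance (Lagrange-multiplier) adjustment under assumed inputs. It is not necessarily the Bayesian benchmarking estimator of Datta and colleagues (2011).

```python
def benchmark(estimates, variances, national_total):
    """Adjust state estimates so that they sum to national_total.

    Minimizes sum((new_i - est_i)**2 / var_i) subject to sum(new_i) == national_total,
    whose solution spreads the discrepancy in proportion to each state's variance.
    """
    gap = national_total - sum(estimates)
    total_var = sum(variances)
    return [e + v * gap / total_var for e, v in zip(estimates, variances)]

state_est = [5200.0, 2300.0, 800.0, 150.0]   # hypothetical model-based estimates
state_var = [2500.0, 900.0, 1600.0, 400.0]   # hypothetical variances of those estimates
adjusted = benchmark(state_est, state_var, national_total=8600.0)
print([round(a, 1) for a in adjusted], round(sum(adjusted), 6))
```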
He noted that another approach uses two-step modeling: a small area
model and then a time series or state-space model. The Fay-Herriot
approach provides a way to combine these estimates. He noted that
Pfeffermann and Tiller (2006) described benchmarking approaches
that help to account for changes due to shocks, such as disease. The
model helps by providing a corrected set of estimates instead of showing
a big divergence immediately after the shock.
Datta said that Ghosh and colleagues (1996) published a multivari-
ate cross-sectional time series small area Fay-Herriot model. This model,
for example, could estimate hogs and pigs for different weight groups
including their dynamics. For some particular time points, there could be
a multivariate response and multivariate covariates, he added. Some of the
covariates may be subject to sampling error, which must be included in
the model. If a frequentist approach is used, then a jackknife or bootstrap
is used to measure the uncertainty. A fully Bayesian approach provides
variance estimates.
He observed that some of these approaches also work with short time
series, such as the 10 years of quarterly data discussed. Many survey and
nonsurvey variables can be put together in a model to borrow strength
over both time and small areas. That is a well-established area of research
in small area estimation; an example is Ghosh and colleagues (1996). Datta
concluded that this work extends the Fay-Herriot model by adding a time
component and a fixed regression coefficient. Hierarchical Bayesian
modeling is generally used for these models. This version of the model would
also need benchmarking, and there are other papers on Bayesian bench-
marking by Ghosh.
Linda Young said that NASS uses the Fay-Herriot model for crop pre-
diction models and has been quite happy with it. She asked how to include
the biological growth processes in this model. What NASS likes about the
current approach, she said, is that it models the weight classes and moves
through time with those restrictions in place. She asked how that could be
accommodated in a Fay-Herriot model.
Wikle said that the random effect in Datta’s model is time varying. It
could also be spatial or multivariate. There is no reason why it cannot also
be a dynamical process, he said. He pointed to Datta’s model where the
coefficient on the covariates is assumed to follow a random walk. Many
other options are possible, he said.
Shares Approach
Slud used Figure 10-1 to introduce a shares model sometimes used in
small area contexts when data are meager. The approach concentrates on
developing national estimates for hog inventory classes, and it models the
proportion that would be attributed to each state as a share of that total. In
Figure 10-1, index i is the geographical state and index t is time. The
vector ρt,i gives the fraction of total inventory in each inventory class at
time t in state i.
FIGURE 10-1 Small area method with benchmarking beyond synthetic estimation: A shares approach.
SOURCE: Prepared by Eric Slud for presentation at the workshop.
The fraction of U.S. hog production in each state is estimated directly.
This requires alternative estimates for the covariates Xt,i. There are already
a few estimates for the ρ's. The first is the fraction of the total in state i
according to the survey data, represented by ρt,i on the left-hand side of the
equation in Figure 10-1. Other estimates for ρ, the Xt,i on the right-hand
side of the equation, are determined from covariate data, perhaps state
recommendations from the field offices or the pork check-off data. These
might be measured with error, in which case a measurement error model
should be included. This is another way to borrow strength that has
worked in other small area contexts using survey data.
This process also requires benchmarking. The state shares
must sum to 1, but the estimates from these types of models would not
generally satisfy that requirement. That is why a post-adjustment called
benchmarking is needed, he said, which could be done with either a frequentist
or Bayesian approach. Finally, he referred to Rao and Molina (2015)
as one of the best current references for small area estimation. The chal-
lenge is that all small area methods need good covariates to work well,
he concluded.
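The shares idea can be sketched as blending a survey-based share and a covariate-based share (here, hypothetical check-off shares) for each state, then rescaling so that the shares sum to 1 before multiplying by a national total. This is a hypothetical illustration. The variances, shares, and national total are assumptions, and a simple precision-weighted blend stands in for the measurement-error model that a real application would estimate. The final rescaling is ratio benchmarking.

```python
def composite_shares(survey_shares, survey_var, covariate_shares, covariate_var):
    """Precision-weighted blend of two share estimates per state, rescaled to sum to 1."""
    blended = []
    for s, vs, c, vc in zip(survey_shares, survey_var, covariate_shares, covariate_var):
        w = vc / (vs + vc)   # more weight on the survey share when it is more precise
        blended.append(w * s + (1.0 - w) * c)
    total = sum(blended)
    return [b / total for b in blended]   # ratio benchmarking: shares now sum to 1

# Four hypothetical states plus an "all other states" group.
survey = [0.31, 0.20, 0.12, 0.05, 0.32]
survey_var = [1e-5, 2e-5, 1e-4, 4e-4, 1e-4]
checkoff = [0.30, 0.22, 0.10, 0.08, 0.30]
checkoff_var = [1e-4] * 5
shares = composite_shares(survey, survey_var, checkoff, checkoff_var)
national_total = 74_000.0   # hypothetical national inventory, thousand head
print([round(s, 4) for s in shares], [round(s * national_total) for s in shares])
```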

PANEL AND OPEN DISCUSSION
Ron Plain asked about data available at the state level. He asked
whether large producers are reporting in the smaller 20 states (those states
for which the sampled operators are only required to report in December).
In other words, he asked, does a very large producer such as Smithfield
have hogs in 1 of the 20 states that are not surveyed quarterly? If so, that
information might be useful for modeling. Dan Kerestes said there are
not enough instances of this occurring to be useful, noting that the states
surveyed annually contribute less than 0.5 percent of the national pig crop.
Plain referred to Sartore’s discussion about estimating pig sur-
vival rates and death loss by state. He asked whether death loss is more
correlated with size of operation than with the location of an operation. Death
loss for a smaller operation might be more similar to other small opera-
tions elsewhere than to larger operations in the same state. Nell Sedransk
said pig crop per sow farrowed varies more by operation size than it does
by state. It also varies a bit seasonally. In the northern tier of states with
harsh winters, there is more seasonality than in the southern tier states.
There are also regional differences. She said that she has seen the
relationship of death loss to size of operation.
Plain asked whether NASS has found a way to account for the effect of
situations such as a severe winter in the Lake States on litter size. Kerestes said
that question was looked at about 5 years ago as part of an imputation
study. The study found that many hogs are raised indoors, so weather did not
have a big impact. The study indicated that records from operators from
across the United States could be used in imputation without much difference
in the quality of the imputation. Size of operation is a different matter: death
loss is different for large operators, which have more economies of scale.
Kerestes discussed shocks such as flooding. He observed, for example,
that most barns are built on higher ground, so during many floods, the
barns are fine. Initial predictions after a hurricane in North Carolina in
the early 2000s were of a huge loss of hogs. In fact, the loss was less than
50,000 head. It is easy to overstate the impact of shocks, he warned, which
is why NASS has to be careful with the data used. Sedransk noted that
even if loss due to flooding is minor at the national level, it might be very
important to a smaller state.

Kerestes commented that if the pork check-off data were accurate
and timely, NASS would not need its surveys. NASS makes extensive use
of administrative data because it saves money and reduces respondent
burden. For example, he said that it mostly uses Agricultural Marketing
Service price data and re-summarizes those data to meet its needs. He
observed that many valuable points were made that afternoon, which will
need to be considered after the workshop.
Slud commented on Sedransk’s idea of banding states. He said that
Pfeffermann and Tiller (2006) made their small area estimations work in
a time series context by grouping states. It is possible that the variabil-
ity of the regional groupings will be reasonable for combining data and
may result in survey data with greater accuracy, he suggested. He also
commented on the pork check-off data within a small area model. The
approach would not assume that the pork check-offs are exactly predic-
tive of state totals, just that the pork check-off fractions by state are rele-
vant. Again, because there are state differences in terms of out-shipments
and in-shipments for feeding and finishing, perhaps appropriate grouping
would be possible to make those shares more relevant.
Andrew Lawson pointed to the term at,i in Figure 10-1. As written,
he said, this term is temporal. However, the model might also include
spatial terms. That is a standard way to include space and time. When
a model includes both space and time, he said, it is also important to
include terms to explain space-time interactions, particularly if there are
shocks. He added that NASS might consider Conditional Autoregressive
(CAR) models for spatial-temporal approaches. CAR models can be fitted
almost instantly with modern computers. As a result, NASS could have a
model with space-time interactions that could be fitted very quickly.
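A proper conditional autoregressive (CAR) model of the kind Lawson suggested is specified through a sparse precision matrix built from a neighbor (adjacency) matrix. That sparsity is one reason such models can be fitted quickly. The sketch below assumes a hypothetical four-state adjacency structure and illustrative values for the spatial dependence parameter `rho` and the precision `tau`. It builds Q = tau (D - rho W), where D is diagonal with each state's neighbor count. It then checks that Q is a valid precision matrix and computes the conditional mean of one state's effect given the others, which equals rho times the average of its neighbors' effects.

```python
def car_precision(adjacency, rho, tau):
    """Precision matrix Q = tau * (D - rho * W) of a proper CAR model."""
    n = len(adjacency)
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        q[i][i] = tau * sum(adjacency[i])
        for j in range(n):
            if i != j and adjacency[i][j]:
                q[i][j] = -tau * rho * adjacency[i][j]
    return q

def is_positive_definite(q):
    """Attempt a Cholesky factorization; success means Q is a valid precision matrix."""
    n = len(q)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = q[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    return False
                L[i][i] = s ** 0.5
            else:
                L[i][j] = s / L[j][j]
    return True

def conditional_mean(q, x, i):
    """E[x_i | all other x_j] = -sum_j Q_ij x_j / Q_ii."""
    return -sum(q[i][j] * x[j] for j in range(len(x)) if j != i) / q[i][i]

# Hypothetical neighbor structure: 0-1, 0-2, 1-2, 2-3.
W = [[0, 1, 1, 0],
     [1, 0, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0]]
Q = car_precision(W, rho=0.9, tau=2.0)
effects = [0.5, -0.2, 0.1, 0.4]   # hypothetical spatial effects, e.g., on survival logits
print(is_positive_definite(Q), round(conditional_mean(Q, effects, 2), 4))
```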
Lee Schulz asked whether the process that transforms the survey
data into state recommendations is consistent across states, model-based,
and uses expert judgment. Kerestes replied that 12 regional offices set the
state-level recommendations. Within a regional office, one or two people
establish the state recommendations. They consider the current survey
indications, previous official estimates, and administrative data. They use
a computer tool that displays the current, past, and auxiliary data in charts.
The tool allows them, for example, to redo a chart with certain observa-
tions eliminated: for example, if there were a shock in the state during
some period of time. They examine and evaluate the various data
relationships. Some of the regional offices have what they call mini-boards that
assemble two or three other people from the regional office to discuss and
review recommendations for all states in that region. Information is then
sent to headquarters where Seth Riggins as head statistician reviews and
evaluates it before further work is completed.
Katherine Ensor observed there seems to be a lot of uncertainty in
that process. Kerestes said that he thinks that after 150 years, the process
itself is pretty certain. Statistical procedures are standardized across the
NASS regional field offices. The computer tool is standardized, although
there is judgment in the process.
Datta said that the state recommendations can be a covariate in the
model. Young replied that NASS uses state recommendations as input to
many of its models.
Plain commented that NASS works very hard with the national esti-
mates, truing them up to slaughter data. However, if he hears a state is
down by 10 percent, he said he is inclined not to believe it because it may
be that the previous year’s number was too high by 10 percent and the
number was corrected. He said from his perspective, without a way to true
a state number for revisions, he has limited confidence in the number. He
asked if NASS has considered how to get state-level data right.
Kerestes agreed an estimate is only as good as the data that go into it.
For example, if a large operator is a nonrespondent one quarter and the
operator’s data must be estimated, there is a possibility that the resulting
estimate may contain greater error. He said that they try to monitor rela-
tionships from year to year when they do revisions. When the Census of
Agriculture becomes available, 5 years of past data are evaluated and pos-
sibly revised.
Kerestes remarked that when a change to state-level data is due to a
change in a large producer’s practices, NASS cannot explain the changes
to the public because of the confidentiality pledge given to the operator.
He added that NASS listens when industry says that there is something
wrong with their numbers. The agency evaluates the data and may even
follow up with key producers to verify their data. In his view, the national
number is the most important because it is used to establish trade param-
eters. Individual state numbers are very important to the people in those
states. What NASS tries to do, however, is to capture shifts in the hog
industry at the national level.

Slud asked whether information in the Census of Agriculture could
be mined, for example, for information from operators about shipments
of their hogs to other states or other information to make a more com-
plete picture of the hog industry for modeling purposes. Kerestes said the
Hog Survey and the census are conducted differently. The Hog Survey
asks large operators to report for all the locations in which they oper-
ate because it is more efficient and they only need state-level detail. The
census asks all farms, whether they own hogs or contract to raise hogs, to
respond. The main reason is to get county-level data, the real strength of
the census.
Slud asked about adding a question to the census about shipments.
Young replied that the comment period for the 2022 census is still open,
but the challenge is that to add a question, another question must be
removed to keep the form to 24 pages in length. While not impossible, it
would not be easy to add a question about transportation, she said.
Ensor asked how often hog operations come in and go out of busi-
ness during a 5-year period and whether dramatic shifts in the overall
structure of hog production occur from one 5-year period to the next.
Young referred to previous discussion (see Chapter 4) that the industry
has become much more concentrated. Kerestes added that the census
illuminated what has happened to mid-sized producers, and NASS sees
those changes in its quarterly surveys. If a large producer changes (e.g., is
bought, sold, or goes bankrupt, or buys or sells another operation), NASS
tries to incorporate that change immediately in the quarterly survey oper-
ations.
Ensor noted that the models are fit using data under the assumption
that there have been no changes over time. It is important to account for
these changes over time either in the modeling or in the estimation of
parameters. Perhaps parameters should evolve over time, she suggested.

11
Discussion of Visions for the Future
The final session of the workshop summarized the views of committee
members and invited discussants about the modeling approaches they
think the National Agricultural Statistics Service (NASS) might consider.
Katherine Ensor said that she favors an evolving space-time state-space
system with (at least) a two-state process to capture various types
of shocks. The challenge is that there are few or no data for estimating
the state during times of shock, so this would largely be a modeling
exercise. She is not certain that evolving parameters are needed, but the
option should be considered.
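Ensor's proposal is a space-time model, but the two-state idea can be illustrated on its own. The sketch below is a minimal, non-spatial stand-in, assuming a single series of quarterly growth rates whose mean switches between a hypothetical "normal" regime and a "shock" regime according to a two-state Markov chain. The Hamilton filter returns, after each quarter, the probability of being in each regime. All parameter values are illustrative assumptions treated as known; in practice they would have to be estimated, which is exactly where the scarcity of data from shock periods becomes a problem.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Normal density, evaluated elementwise over regime-specific means and sds."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

def hamilton_filter(y, means, sds, P, init):
    """Filtered regime probabilities P(s_t = k | y_1..y_t) for a Markov-switching mean model.

    y     : observed series (here: hypothetical quarterly growth rates), shape (T,)
    means : regime-specific means, shape (K,)
    sds   : regime-specific standard deviations, shape (K,)
    P     : transition matrix, P[i, j] = P(s_t = j | s_{t-1} = i)
    init  : regime probabilities for the first quarter, shape (K,)
    """
    means = np.asarray(means, dtype=float)
    sds = np.asarray(sds, dtype=float)
    P = np.asarray(P, dtype=float)
    prob = np.asarray(init, dtype=float)
    filtered = np.zeros((len(y), len(means)))
    loglik = 0.0
    for t, obs in enumerate(y):
        predicted = prob if t == 0 else prob @ P    # prior regime probabilities for quarter t
        joint = predicted * normal_pdf(obs, means, sds)
        marginal = joint.sum()                      # density of y_t given y_1..y_{t-1}
        loglik += np.log(marginal)
        prob = joint / marginal                     # Bayes update
        filtered[t] = prob
    return filtered, loglik

# Hypothetical growth rates: steady growth, a three-quarter shock, then recovery.
y = np.array([0.010, 0.012, 0.008, 0.011, -0.045, -0.060, -0.050, 0.009, 0.012, 0.010])
means = [0.01, -0.05]          # regime 0 = normal, regime 1 = shock (assumed values)
sds = [0.01, 0.02]
P = [[0.95, 0.05],             # normal quarters usually stay normal
     [0.20, 0.80]]             # shocks persist for a few quarters
filtered, loglik = hamilton_filter(y, means, sds, P, init=[0.9, 0.1])
shock_prob = filtered[:, 1]
```

In this toy series the shock probability is near zero in the first four quarters, near one during the three negative quarters, and falls back once growth resumes. A space-time version would let each state have its own regime path, with neighboring states informing each other.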
Lee Schulz asked about the process of determining whether a
model is good enough and the evolution of that process. It is always pos-
sible to improve a model, he noted. He asked in what situations NASS
would switch to a new model and what would prove to them that the new
model works.
Linda Young reminded the audience that Gavin Corral presented four
model criteria (see Chapter 5). She said that in the end, the model has to
show over time (and this can be shown on historical data) that it does a
good job of predicting the final estimate after revisions. The model should
be closer to the final estimate than the initial estimate is now. To
demonstrate that it meets that goal, the model must be evaluated on data
since 2008, including challenging situations such as natural disasters and
disease. In particular, she said, NASS would like to predict the final
estimate sooner than the Hog Board does now. The board reaches the final
estimate through its revision process better than any current model does.
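Young's criterion (the model should be closer to the final, revised estimate than the initial estimate is) can be checked with a simple comparison. The sketch below uses mean absolute error and the share of quarters in which the model is closer; both metrics are assumptions chosen for illustration, and the five quarters of inventories are hypothetical, not NASS data.

```python
import numpy as np

def compare_to_final(initial, model, final):
    """Compare initial and model-based estimates against final (revised) estimates.

    Returns the mean absolute error of each set of estimates and the share of
    quarters in which the model estimate is closer to the final estimate.
    """
    initial, model, final = (np.asarray(a, dtype=float) for a in (initial, model, final))
    err_initial = np.abs(initial - final)
    err_model = np.abs(model - final)
    return float(err_initial.mean()), float(err_model.mean()), float(np.mean(err_model < err_initial))

# Hypothetical inventories (million head) for five quarters; not NASS data.
final   = [70.0, 71.5, 72.0, 69.0, 70.5]
initial = [69.0, 71.0, 73.0, 70.5, 70.0]
model   = [69.6, 71.3, 72.4, 69.8, 70.4]

mae_initial, mae_model, share_closer = compare_to_final(initial, model, final)
```

A real evaluation along the lines Young described would run over every quarter since 2008 and report the same comparison separately for periods with disasters or disease, since a model can look good on average while failing exactly when it matters.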
Schulz asked whether this process would be internal or a competi-
tive bid process where consultants or contractors could be tasked to build
models for a competition. Young replied that a competitive process might be
possible, but the data used by any such model are not publicly available.
A public competition would require that NASS simulate the data in some
way. This is difficult, Young said, because the industry is so concentrated
in some states that high-quality simulated data might reveal confidential
information. She welcomed suggestions for solving that problem. She
concluded that the increasing concentration is making large producers
more vulnerable to disclosure, which NASS cannot risk.
Returning to the question of how NASS would adopt a new model,
Young said that the Kalman filter model (KFM) is the best they have now
and is used to provide model-based results to the board. If NASS finds a
better model, it would also be used to provide results to the board. There
would not be an automatic switch-over, she said. Results would be tracked
until the board is comfortable that the new model provides an improve-
ment. It is an evolving process.
Luca Sartore asked about the Pfeffermann example described by
Gauri Datta that used groupings of states. If the model accounts for several
groups of states, he asked, is an individual state in only one group or
can it be in more than one group? Datta replied that the states were grouped
into nine geographic regions, with each state in one region. The example
used a state-space time series approach to borrow strength, and the
authors introduced benchmarking so that the regional predictions add up to
the total. Sartore wondered whether the model could allow each state to
appear in more than one group. Datta said he was not sure about the modeling
part, but the benchmarking process would not work. Pfeffermann
benchmarked the regional estimates, not state estimates. Eric Slud noted
that the Pfeffermann model also did not consider multivariate outcomes.
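Pfeffermann and Tiller (2006) build the benchmark constraint into the state-space model itself. As a much simpler illustration of what it means for regional predictions to "add up to the total," the sketch below adjusts regional predictions after the fact. This post hoc adjustment is a swapped-in technique, not the authors' method, and the numbers are hypothetical. It also shows why Datta expected trouble with overlapping groups: if a state belonged to two regions, its prediction would be adjusted twice.

```python
import numpy as np

def ratio_benchmark(regional, total):
    """Scale all regional predictions by the same factor so they sum to the total."""
    regional = np.asarray(regional, dtype=float)
    return regional * (total / regional.sum())

def weighted_benchmark(regional, variances, total):
    """Spread the discrepancy in proportion to each region's variance.

    This minimizes sum((adjusted - regional)**2 / variances) subject to
    sum(adjusted) == total, so less reliable regions absorb more of the change.
    """
    regional = np.asarray(regional, dtype=float)
    variances = np.asarray(variances, dtype=float)
    gap = total - regional.sum()
    return regional + variances / variances.sum() * gap

regional = [10.0, 20.0, 30.0]   # hypothetical regional predictions (sum 60)
total = 66.0                    # hypothetical national total
pro_rata = ratio_benchmark(regional, total)
weighted = weighted_benchmark(regional, [1.0, 1.0, 4.0], total)
```

The pro rata version gives 11, 22, and 33; the variance-weighted version gives 11, 21, and 34, putting most of the adjustment on the least precise region.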
Slud further observed that if NASS feels confident that certain
models are doing well during different periods, such as during equilibrium,
going into shocks, or coming out of shocks, it would be possible
to consider a composite estimate with weights that change based on
the current situation. He observed that NASS seems to have used the same
40 data points many times in reaching conclusions. Young said NASS uses
the information available and has not thought much about a composite
estimator. It has talked about switching and the possibility of developing
an indicator that it is time to switch models. She asked about possible
fresh approaches.
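Slud's composite idea can be sketched as a weighted average of an equilibrium-model estimate and a shock-model estimate, where the weight reflects how likely it is that the series is currently in a shock or recovery period. Where that weight comes from is an assumption here (a regime indicator of the kind Young mentioned, or expert judgment), and all numbers are hypothetical.

```python
import numpy as np

def composite_estimate(eq_est, shock_est, p_shock):
    """Blend two model-based estimates using a situation-dependent weight.

    p_shock is the (hypothetical) probability that the series is in a shock or
    recovery period, e.g. supplied by a regime indicator or by expert judgment.
    """
    eq_est = np.asarray(eq_est, dtype=float)
    shock_est = np.asarray(shock_est, dtype=float)
    w = np.clip(np.asarray(p_shock, dtype=float), 0.0, 1.0)   # keep weights in [0, 1]
    return (1.0 - w) * eq_est + w * shock_est

eq = [70.0, 70.2, 70.4, 70.6]       # hypothetical equilibrium-model estimates
shock = [70.0, 69.0, 67.5, 68.5]    # hypothetical shock-model estimates
p = [0.0, 0.5, 1.0, 0.2]            # hypothetical shock probabilities by quarter
est = composite_estimate(eq, shock, p)
```

When the weight is 0 or 1 the composite reduces to one of the two models, so a hard "switch" is the special case of a composite with weights restricted to 0 and 1.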
Nell Sedransk noted that using the 40 quarters of data at the state level
might help in defining these transitions because the dynamics and timing
were different in different states. For example, developing a model for
coming out of shocks will be best done using state-level data that exhibit
this transition. Also, based on different producer populations, individual
states each come out of the shock with a somewhat different trajectory.
Chris Wikle reiterated his suggestion that NASS consider integrated
population models, which are used in the ecological literature. They
combine multiple types of data, including survey or sampling-based data,
within state-space models. The models allow for the possibility that the
biological dynamics are changing. Ecological models frequently let the
dynamics change with habitat; in the NASS setting, they could change over
time. The question is whether NASS has enough data to inform the model.
One advantage of this modeling approach, he said, is that it provides
uncertainty bounds for estimates.
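The core idea of an integrated population model, several data sources sharing the parameters of one population process, can be shown with a deliberately small example. The sketch assumes a hypothetical inventory that grows deterministically by a factor s + r per quarter (s a retention rate, r a recruitment rate), observed through (1) a noisy survey with known error and (2) a separate productivity study that counts recruits. The rates, data, and structure are illustrative assumptions, not NASS series or hog biology. Real IPMs add process noise to the state equation, which makes them state-space models, and are typically run in JAGS or Stan, as Wikle's follow-up notes, rather than on a grid.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "true" rates per quarter (illustration only):
s_true, r_true = 0.70, 0.32   # s = share of head retained, r = recruits added per head
N0, T, sigma_y = 100.0, 12, 2.0
t = np.arange(1, T + 1)

# Data source 1: survey counts of the latent inventory, with known survey error sd.
N_true = N0 * (s_true + r_true) ** t
y = N_true + rng.normal(0.0, sigma_y, size=T)

# Data source 2: a productivity study counting recruits on m_j monitored head.
m = np.full(20, 50.0)
z = rng.poisson(r_true * m)

# Evaluate the joint log-likelihood on a grid of (s, r).
s_grid = np.linspace(0.5, 0.9, 401)
r_grid = np.linspace(0.1, 0.5, 401)
S, R = np.meshgrid(s_grid, r_grid, indexing="ij")

# Survey part: depends on (s, r) only through the growth factor s + r,
# so the survey alone cannot separate retention from recruitment.
N = N0 * (S + R)[..., None] ** t
ll_survey = -0.5 * np.sum(((y - N) / sigma_y) ** 2, axis=-1)

# Productivity part: Poisson log-likelihood in r (terms free of r are dropped).
ll_prod = z.sum() * np.log(R) - R * m.sum()

# The integrated likelihood: each source pins down what the other cannot.
ll_joint = ll_survey + ll_prod
i, j = np.unravel_index(np.argmax(ll_joint), ll_joint.shape)
s_hat, r_hat = s_grid[i], r_grid[j]
```

The survey fixes the growth factor s + r tightly, the productivity data fix r, and together they identify s, which neither source identifies alone.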
Young asked whether software has been developed for these models.
Wikle said that software exists, but it may not be appropriate for the
NASS application. He offered to follow up after the workshop.1
Dan Kerestes noted that the largest challenge in modeling shocks is
that every shock, and every disease, is different. If it is a new disease, it
is unclear how soon a cure will be identified and what impact the cure
will have. If the modelers use Porcine Epidemic Diarrhea virus (PEDv)
as the example to follow, the next disease may not have the same impact.
When PEDv came about and a vaccine was developed, its effect on the
sows being farrowed and the litter rates was dramatic. A new disease
may be different.

1Wikle, in collaboration with Mitch Weegman (MUSE School of Natural Resources), provided
the following discussion and references: (1) The best general introductory overview to integrated
population models (IPMs) is Zipkin and Saunders (2018). There is also introductory and applications
material at https://academic.oup.com/aosjournals/pages/integrated_population_models. (2)
Chandler and Clark (2014) provide examples of emerging spatially explicit IPMs. At present, these
are seriously limited in space because of the amount of information required for estimation. (3) Most
applications include time-dependent parameters. The link in (1) above has examples. (4) In terms of
software, there is no R package for these models yet. They are typically run in JAGS
(http://mcmc-jags.sourceforge.net/) or STAN (https://mc-stan.org/). There are cookie-cutter likelihoods to match
particular datasets and research questions (e.g., state-space Cormack-Jolly-Seber (CJS) likelihood for
individual capture histories, but multinomial likelihood for summarized [m-array] capture histories or
multistate models). There is a lot of code online bolting various likelihoods together.

Ensor wondered whether NASS is asking too much for the dynamics
of an epidemic to be picked up by a model. Perhaps the decision that there
is a shock might be better guided by expert opinion. The current view is
that people-derived decisions are too subjective, but maybe there is not
enough information to capture all possible dynamics, and expert opinion
may be useful. Lawson said, for example, training the model on PEDv
would likely make it too specific. It should be as general as possible so that
the dynamics for a new disease can be learned. While it is always difficult
to predict something new, there are things to try, such as using a general
descriptive model to capture dynamics and test on different datasets.
Kamina Johnson noted that PEDv was an emerging disease, and
APHIS has established models for foreign animal diseases. Those models
have many known parameters, perhaps not completely known but with much
less uncertainty. In the emerging disease area, little is known. It may be
even more important to capture uncertainty in these situations, both in
model-based estimates and in expert judgment. Experts in her office discuss
alternative scenarios about what the specific disease might be and
its potential impact. They provide a range of possibilities for situations
when there is no scientifically justifiable process to quantify a parameter.
It may be that NASS would also benefit from a mix of data-based estimates
and expert judgment, each accounting for uncertainty.
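One way to mix a data-based estimate with expert judgment while carrying the uncertainty of each is a precision-weighted (normal-normal Bayesian) combination. This is a sketch under strong assumptions that do not come from the workshop: both sources are summarized as normal distributions, the expert's range is read as a 90 percent interval, and the two errors are independent. The quantity (a percentage loss in the pig crop) and all numbers are hypothetical.

```python
import math

def expert_range_to_normal(low, high, coverage_z=1.645):
    """Treat an expert's range as a central interval of a normal distribution.

    coverage_z=1.645 assumes the range is meant as a 90% interval (an assumption).
    """
    mean = 0.5 * (low + high)
    sd = (high - low) / (2.0 * coverage_z)
    return mean, sd

def combine(model_mean, model_sd, expert_mean, expert_sd):
    """Precision-weighted combination; returns (mean, sd, weight on the model)."""
    w_model = 1.0 / model_sd ** 2
    w_expert = 1.0 / expert_sd ** 2
    mean = (w_model * model_mean + w_expert * expert_mean) / (w_model + w_expert)
    sd = math.sqrt(1.0 / (w_model + w_expert))
    return mean, sd, w_model / (w_model + w_expert)

# Hypothetical: experts expect a 5-15% loss; the model estimates 6% with sd 2.
expert_mean, expert_sd = expert_range_to_normal(5.0, 15.0)
mean, sd, w_model = combine(6.0, 2.0, expert_mean, expert_sd)
```

The combined estimate lies between the two inputs, closer to the more certain one, and its standard deviation is smaller than either input's. If the experts have seen the same survey data the model uses, the independence assumption fails and this combination would overstate precision.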
Andrew Lawson said that the foot-and-mouth disease outbreak in
the United Kingdom was highly publicized and modeled in real time
by researchers at Imperial College. He referred to a later paper by
Lawson and colleagues (2011) comparing the modeling efforts during
that outbreak. The predictive capability of the models was very low
despite all the available information and news coverage, which may be a
warning about the limits of modeling.
Lawson also suggested modeling at different levels and combining
the levels in joint models. Such an approach may not predict very
accurately at a fine scale but could predict quite well at a coarser level,
and modeling the levels together might help produce sensible predictions.
With that in mind, estimating national- and state-level models jointly
might be a good idea because it can borrow strength across the levels,
he said.
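"Borrowing strength" has a standard small area estimation form in the Fay and Herriot (1979) model cited in the references: each area's direct estimate is shrunk toward a model-based value, more strongly when its sampling variance is large. The sketch below is a minimal version with no covariates and a hypothetical, known between-state variance A; it illustrates borrowing strength across states only and is not the joint national-state model Lawson described.

```python
import numpy as np

def fay_herriot_shrink(y, D, A):
    """Shrink direct state estimates toward a common mean (Fay-Herriot form, no covariates).

    y : direct estimates (e.g. state-level survey estimates)
    D : their sampling variances (treated as known)
    A : between-state model variance (a hypothetical value here; usually estimated)
    """
    y = np.asarray(y, dtype=float)
    D = np.asarray(D, dtype=float)
    w = 1.0 / (A + D)
    mu_hat = np.sum(w * y) / np.sum(w)       # GLS estimate of the common mean
    gamma = A / (A + D)                       # weight on each state's own data
    return gamma * y + (1.0 - gamma) * mu_hat, gamma, mu_hat

# Hypothetical direct estimates; the second state is measured less precisely.
est, gamma, mu_hat = fay_herriot_shrink(y=[10.0, 12.0, 20.0], D=[1.0, 4.0, 1.0], A=4.0)
```

The second state, with four times the sampling variance, keeps only half the weight on its own data, while the other two keep 80 percent.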

Nancy Kirkendall said that one advantage of the state-space approach
is that there are two kinds of equations: (1) the state equation, which
describes the process and how it changes over time, and (2) the observation
or measurement equation, which describes the relationship of the data to
the state. The ratio of the variance of the state equation to the variance
of the measurement equation determines the model’s adaptability, that is,
how much weight is applied to the previous state versus how much weight
is applied to the new observation. One reason the KFM is so stable is
likely that the variance of the state equation is small relative to the
survey error, she commented. When there is a shock such as a disease, the
state changes because of the disease, and the variance of the state
equation will likely become larger. She suggested using this concept to
evaluate the KFM further, to see whether increasing the variance of the
state equation would make the model more adaptable during shocks.
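Kirkendall's point about the variance ratio can be made explicit with the simplest state-space model, a univariate local level model. This is a simplified stand-in chosen for illustration; the structure of NASS's KFM is not specified here. The Kalman gain K_t is the weight placed on the new observation, and its steady-state value depends only on the ratio q of the state variance to the measurement variance:

```latex
\begin{aligned}
y_t &= \mu_t + \varepsilon_t, \qquad \varepsilon_t \sim N(0, \sigma_\varepsilon^2) && \text{(measurement equation)}\\
\mu_t &= \mu_{t-1} + \eta_t, \qquad \eta_t \sim N(0, \sigma_\eta^2) && \text{(state equation)}\\
P_{t\mid t-1} &= P_{t-1\mid t-1} + \sigma_\eta^2, \qquad
K_t = \frac{P_{t\mid t-1}}{P_{t\mid t-1} + \sigma_\varepsilon^2}\\
\hat\mu_{t\mid t} &= (1 - K_t)\,\hat\mu_{t-1\mid t-1} + K_t\, y_t, \qquad P_{t\mid t} = (1 - K_t)\,P_{t\mid t-1}\\
K &= \lim_{t\to\infty} K_t = \frac{\sqrt{q^2 + 4q} - q}{2}, \qquad q = \frac{\sigma_\eta^2}{\sigma_\varepsilon^2}
\end{aligned}
```

For q = 0.01 the steady-state gain is about 0.095, so each new survey value moves the estimate less than 10 percent of the way; for q = 1 it is about 0.618. As q approaches 0 the gain approaches 0 (the estimate barely moves), and as q grows the gain approaches 1 (the estimate follows the data).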
Kirkendall agreed with others that identifying the start time of a
disease shock is likely to be difficult and that expert judgment may be
needed. Expert judgment will be best if the experts have the necessary
information. Young replied that she thinks NASS has explored the variance
of the state equation, and in equilibrium it is small. Kirkendall suggested
finding a value of the state variance for which the model performs well
when it needs to be adaptable. This would essentially provide two
model-based estimates: one from the equilibrium model and one from the
adaptable model.
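The two model-based estimates Kirkendall describes can be illustrated by running the same filter with a small and a large state variance. The sketch uses a local level Kalman filter as a stand-in for the KFM, a hypothetical series that drops from 100 to 90 after 10 quarters, noise-free observations so the contrast is easy to read, and assumed variances (q = 0.01 for the equilibrium version, q = 1 for the adaptable one).

```python
import numpy as np

def local_level_filter(y, sigma_eta2, sigma_eps2, m0, P0):
    """Kalman filter for the local level model; returns the filtered state means."""
    m, P = m0, P0
    filtered = np.empty(len(y))
    for t, obs in enumerate(y):
        P_pred = P + sigma_eta2               # prediction: state uncertainty grows
        K = P_pred / (P_pred + sigma_eps2)    # gain: weight on the new observation
        m = m + K * (obs - m)                 # update toward the observation
        P = (1.0 - K) * P_pred
        filtered[t] = m
    return filtered

# Hypothetical series: level 100 for 10 quarters, then a shock drops it to 90.
# Observations are noise-free so the difference in adaptation is easy to see.
y = np.r_[np.full(10, 100.0), np.full(10, 90.0)]
sigma_eps2 = 4.0                                                          # assumed survey error variance
equilibrium = local_level_filter(y, 0.04, sigma_eps2, m0=100.0, P0=1.0)  # q = 0.01
adaptable = local_level_filter(y, 4.0, sigma_eps2, m0=100.0, P0=1.0)     # q = 1
```

Three quarters after the drop, the equilibrium filter is still at roughly 97 while the adaptable filter is at roughly 90.6. With noisy data the adaptable filter would also chase sampling error in calm periods, which is why the equilibrium version remains attractive when nothing unusual is happening.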
Lawson added that when there is switching, epidemic models very often
switch from being descriptive, with a particular variance, to something
that depends on previous values, so the epidemic component is highly
autocorrelated. With switching there will be a variance change, but
other features of the model change as well.
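Lawson's description resembles the two-component (endemic-epidemic) count model of Held and colleagues (2006), listed in the references, in which the expected count is an endemic term plus an epidemic term proportional to the previous count. The sketch below simulates a simplified single-series version; the parameter values are hypothetical and chosen so that both series have the same long-run mean of 5. It shows the feature Lawson highlighted: once the epidemic component is active, the series becomes highly autocorrelated.

```python
import numpy as np

def simulate_endemic_epidemic(nu, lam, T, y0=0, seed=0):
    """Simulate counts from Y_t | Y_{t-1} ~ Poisson(nu + lam * Y_{t-1}).

    nu is the endemic (background) rate; lam scales the epidemic dependence on
    the previous count. For lam < 1 the long-run mean is nu / (1 - lam).
    """
    rng = np.random.default_rng(seed)
    y = np.empty(T, dtype=int)
    prev = y0
    for t in range(T):
        prev = rng.poisson(nu + lam * prev)
        y[t] = prev
    return y

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    return float(np.sum(x[1:] * x[:-1]) / np.sum(x * x))

endemic_only = simulate_endemic_epidemic(nu=5.0, lam=0.0, T=2000, seed=1)   # no epidemic term
with_epidemic = simulate_endemic_epidemic(nu=1.0, lam=0.8, T=2000, seed=1)  # strong epidemic term
```

Both series average about 5, but the endemic-only series has lag-1 autocorrelation near 0 while the epidemic series has autocorrelation near 0.8 and a much larger variance, so a switch between the two changes more than the variance, as Lawson noted.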
Slud observed that even though shocks may result in changes to the
optimal model, an acceptable solution might be obtained simply by
increasing the variance of the state equation in the KFM. To capture some
of Lawson’s thoughts, Slud suggested an alternative might be to look at how
the time series parameter estimates vary when the KFM is estimated
during the epidemic and recovery periods of the PEDv epidemic, when the
model should be most adaptable. It may be worth trying some of these
simple things to see whether they help the KFM adapt to a shock.
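Slud's suggestion to see how parameter estimates change across periods can be sketched by re-estimating the state variance of a local level model (again a stand-in for the KFM) on consecutive windows. The sketch assumes the survey error variance is known, estimates the state variance by maximizing the Gaussian likelihood over a grid, and uses a simulated series in which quarters 30 to 44 play the role of an epidemic period with much larger state noise. The window length, grid, and simulated data are all assumptions for illustration.

```python
import numpy as np

def local_level_loglik(y, sigma_eta2, sigma_eps2):
    """Gaussian log-likelihood of the local level model by prediction error decomposition.

    The state is initialized at the first observation (a simple, assumed initialization).
    """
    m, P = y[0], sigma_eps2
    ll = 0.0
    for obs in y[1:]:
        P_pred = P + sigma_eta2
        F = P_pred + sigma_eps2            # one-step-ahead prediction variance
        v = obs - m                        # one-step-ahead prediction error
        ll += -0.5 * (np.log(2.0 * np.pi * F) + v * v / F)
        K = P_pred / F
        m = m + K * v
        P = (1.0 - K) * P_pred
    return ll

def estimate_state_variance(y, sigma_eps2, grid):
    """Grid-search MLE of the state variance, with the measurement variance treated as known."""
    lls = [local_level_loglik(y, g, sigma_eps2) for g in grid]
    return float(grid[int(np.argmax(lls))])

rng = np.random.default_rng(7)
T, window = 60, 15
quarters = np.arange(T)
sd_eta = np.where((quarters >= 30) & (quarters < 45), 3.0, 0.2)  # simulated "epidemic" in quarters 30-44
state = 100.0 + np.cumsum(rng.normal(0.0, sd_eta))
y = state + rng.normal(0.0, 1.0, size=T)                          # survey error variance 1 (assumed known)

grid = np.logspace(-3, 2, 51)
estimates = {start: estimate_state_variance(y[start:start + window], 1.0, grid)
             for start in range(0, T - window + 1, window)}
```

The estimate for the window starting at quarter 30 is far larger than for the calm windows. With real data, a jump of this kind in the rolling estimates is one candidate for the "time to switch" indicator Young mentioned, although with only 40 quarters each window would be short and the estimates noisy.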
Sedransk observed that the PEDv epidemic started in nine states and
expanded to virtually all states; thus, Slud’s comment about a composite
estimator makes sense because the epidemic does not happen everywhere at
the same time. She suggested the need for a dynamic component instead of
jumping to an entirely new universal model that may apply only when all
states are affected, or may never apply if states are in different epidemic
phases or transitions at each point in time. Slud commented that a composite
estimator is not the model class he most favors, but he does like the idea
of models that apply at different points in time (e.g., equilibrium, start of
disease, recovery). A composite based on them might be useful, he added,
and it might be interesting to examine the residuals at the state level to see
whether they suggest model deficiencies.
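Slud's suggestion to examine state-level residuals can be sketched with standardized one-step-ahead prediction errors. The sketch assumes a local level filter with hypothetical, known variances as a stand-in for whatever model is fit to each state, and flags a state when at least two consecutive standardized errors exceed 2 in absolute value; the threshold, run length, and series are illustrative choices, not NASS rules. Because states enter and leave an epidemic at different times, as Sedransk noted, flags would be expected to appear in different states in different quarters.

```python
import numpy as np

def standardized_innovations(y, sigma_eta2, sigma_eps2):
    """One-step-ahead standardized prediction errors from a local level filter."""
    m, P = y[0], sigma_eps2            # state initialized at the first observation
    z = np.zeros(len(y))
    for t in range(1, len(y)):
        P_pred = P + sigma_eta2
        F = P_pred + sigma_eps2        # prediction variance
        v = y[t] - m                   # prediction error
        z[t] = v / np.sqrt(F)
        K = P_pred / F
        m, P = m + K * v, (1.0 - K) * P_pred
    return z

def flag_states(series_by_state, sigma_eta2, sigma_eps2, threshold=2.0, run=2):
    """Return states with at least `run` consecutive |z| > threshold (a crude misfit signal)."""
    flagged = []
    for state, y in series_by_state.items():
        big = np.abs(standardized_innovations(np.asarray(y, dtype=float), sigma_eta2, sigma_eps2)) > threshold
        longest = current = 0
        for b in big:
            current = current + 1 if b else 0
            longest = max(longest, current)
        if longest >= run:
            flagged.append(state)
    return flagged

series = {
    "State A": [100, 101, 99, 100, 101, 100, 99, 100],   # hypothetical stable state
    "State B": [100, 100, 101, 92, 85, 80, 78, 77],      # hypothetical state hit by a shock
}
flags = flag_states(series, sigma_eta2=0.25, sigma_eps2=1.0)
```

Runs of large residuals in one state but not its neighbors would point to the kind of state-specific deficiency Slud had in mind, or to a state entering a different epidemic phase.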
Ensor concluded by acknowledging how impressive the NASS
presentations and modeling work have been. She noted the job of the
committee and discussants was to come up with ideas for other approaches.

References
Busselberg, S. (2013). The use of signal filtering for hog inventory estima-
tion. Proceedings of the Federal Committee on Statistical Method-
ology (FCSM) Research Conference. Available: https://nces.ed.gov/
FCSM/pdf/G2_Busselberg_2013FCSM_AC.pdf.
Chandler, R.B., and Clark, J.D. (2014). Spatially explicit integrated popu-
lation models. Methods in Ecology and Evolution, 5, 1351–1360.
Corberan-Vallet, A. (2012). Prospective surveillance of multivariate spa-
tial disease data. Statistical Methods in Medical Research, 21(5),
457–477. doi: 10.1177/0962280212446319.
Corberan-Vallet, A., and Lawson, A.B. (2014). Prospective analysis
of infectious disease surveillance data using syndromic informa-
tion. Statistical Methods in Medical Research, 23(6), 572–594. doi:
10.1177/0962280214527385.
Datta, G.S., Ghosh, M., Steorts, R., and Maples, J. (2011). Bayesian
benchmarking with applications to small area estimation. TEST, 20,
574–588.
Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places:
An application of James-Stein procedures to census data. Journal of
the American Statistical Association, 74, 269–277.
Ghosh, M., Nangia, N., and Kim, D.H. (1996). Estimation of median
income of four-person families: A Bayesian time series approach.
Journal of the American Statistical Association, 91, 1423–1431.
Held, L., Hofmann, M., and Hohle, M. (2006). A two-component
model for counts of infectious diseases. Biostatistics, 7(3), 422–437.
doi:10.1093.
Kedem, B., and Pan, L. (2015). Time Series Prediction of Hog Inventory.
Unpublished internal document. U.S. Department of Agriculture
National Agricultural Statistics Service.
Lawson, A.B., Onicescu, G., and Ellerbe, C. (2011). Foot-and-mouth
disease revisited: Re-analysis using Bayesian spatial susceptible-
infectious-removed models. Spatial and Spatio-temporal Epidemiol-
ogy, 2, 185–194.
Pfeffermann, D., and Tiller, R. (2006). Small-area estimation with state-
space models subject to benchmark constraints. Journal of the Ameri-
can Statistical Association, 101, 1387–1397.
Rao, J.N.K., and Molina, I. (2015). Small Area Estimation (second ed.).
Hoboken, NJ: John Wiley & Sons.
Wang, X., Shojaie, A., and Zou, J. (2019). Bayesian hidden Markov models
for dependent large-scale multiple testing. Computational Statistics &
Data Analysis, 136, 123–136. doi: https://doi.org/10.1016/j.csda.
Zipkin, E.F., and Saunders, S.P. (2018). Synthesizing multiple data types
for biological conservation using integrated population models. Bio-
logical Conservation, 217, 240–250.

Appendix A
Agenda and List of Participants
AGENDA: USING MODELS TO ESTIMATE HOG PRODUCTION: A CNSTAT WORKSHOP
WEDNESDAY, MAY 15, 2019
National Academy of Sciences Building, Members’ Room
2101 Constitution Avenue NW, Washington, DC 20418
The USDA’s National Agricultural Statistics Service (NASS) has
developed hog models that combine information from surveys, administrative
data, and expert opinion to produce the requisite estimates
with the goal of replacing current methods that lack transparency and
measures of uncertainty. The purpose of the workshop is to discuss the
appropriateness of these models and possible improvements or alternative
approaches. Issues for discussion at the workshop include the appropriateness
of the modeling techniques developed by NASS, the extent to
which the model’s assumptions can be validated, the robustness of the
estimates to a failure of one or more assumptions, and other technical
issues of model specification. In addition, the workshop will consider the
suitability of the data that feed into the model and the properties of
desirable estimates of uncertainty.
8:30 - 9:00 WELCOME AND INTRODUCTIONS

8:30 Eric Slud, planning committee chair and
workshop moderator
8:40 Brian Harris-Kojetin, director of CNSTAT
8:50 Linda Young, chief mathematical statistician and direc-
tor of research and development, NASS
9:00 - 9:40 SESSION 1 Motivation and Overview

Moderator Kathy Ensor
9:00 Nell Sedransk, NISS, Overview and Introduction to Hog
Inventory Models
9:30 Discussion
9:40 - 10:05 SESSION 2 Survey Processes, Data Sources

Moderator Ron Plain
9:40 Emilola Abayomi, NASS, The Hog Inventory Survey
9:55 Discussion
10:05 - 10:30 SESSION 3 Setting Official Estimates: The Hog Board

Moderator Chris Wikle
10:05 Seth Riggins, NASS, Quarterly Hogs and Pigs Report
(via webcast)
10:20 Discussion
10:30 - 10:40 BREAK
10:40 - 11:20 SESSION 4 Modeling Efforts

Moderator Lee Schulz
10:40 Gavin Corral, NASS, Statistical Modeling Efforts
11:10 Discussion
11:20 - 11:45 SESSION 5 Web-Scraping Efforts

Moderator Kathy Ensor
11:20 Yijun Wei, NASS, Web-Scraping Efforts to
Identify Disease
11:35 Discussion

11:45 - 12:45 LUNCH
12:45 - 1:45 SESSION 6 Model Innovations

Moderator Ron Plain
12:45 Luca Sartore, NASS, Modeling Swine Population Dynamics
1:15 Discussion
2:00 - 2:45 SESSION 7 Discussion: Detection and Monitoring of Outbreaks
Moderator Eric Slud
2:00 Matthew Branan and Kamina Johnson, APHIS, Monitoring,
Surveillance, and Modeling for Swine Diseases
2:20 Panel Discussion (Lee Schulz, Andrew Lawson,
Michael Schweinberger)
2:35 Discussion
2:45 - 3:00 BREAK
3:00 - 3:45 SESSION 8 Discussion: Current Modeling Efforts: State-Space Models, Models of Biological Processes
Moderator Eric Slud
3:00 Panel Discussion (Kathy Ensor, Chris Wikle)
3:20 Discussion
3:45 - 4:30 SESSION 9 Discussion: State-level Models and Data Sources
Moderator Chris Wikle
3:45 Panel Discussion (Eric Slud, Gauri Datta, Ron Plain)
4:05 Discussion
4:30 - 5:00 SESSION 10 Discussion: Visions for the Future

Moderator Eric Slud
4:30 Open Discussion
5:00 Adjournment

PLANNING COMMITTEE FOR THE WORKSHOP ON USING MODELS TO ESTIMATE HOG PRODUCTION
Eric V. Slud (Chair), U.S. Census Bureau and University of Maryland,
College Park
Katherine Bennett Ensor, Rice University
Ronald L. Plain, University of Missouri–Columbia (emeritus)
Lee L. Schulz, Iowa State University
Christopher K. Wikle, University of Missouri
STAFF
Nancy J. Kirkendall, Study Director
Anthony Mann, Program Associate
Brian Harris-Kojetin, Director, Committee on National Statistics
INVITED DISCUSSANTS
Matthew Branan, Animal and Plant Health Inspection Service, USDA
Gauri Datta, U.S. Census Bureau
Kamina Johnson, Animal and Plant Health Inspection Service, USDA
Andrew Lawson, Medical University of South Carolina
Michael Schweinberger, Rice University
NASS PRESENTERS
Emilola Abayomi, Research and Development Division, NASS, USDA
Gavin Corral, Research and Development Division, NASS, USDA
Seth Riggins, Statistics Division, NASS, USDA
Luca Sartore, National Institute of Statistical Sciences and Research and
Development Division, NASS, USDA
Nell Sedransk, National Institute of Statistical Sciences and Research and
Development Division, NASS, USDA
Yijun (Frank) Wei, National Institute of Statistical Sciences and Research
and Development Division, NASS, USDA
Linda Young, Director of Research, NASS, USDA

ATTENDEES
Travis Averill, Livestock Branch, NASS, USDA
Jeff Bailey, Branch Chief, Methodology Division, NASS, USDA
Valbona Bejleri, Research and Development Division, NASS, USDA
Lu Chen, Research and Development Division, NASS, USDA
Cynthia Clark, COPAFS; former Administrator of NASS, USDA
Nathan Cruze, Research and Development Division, NASS, USDA
Lindsay Drunasky, Methodology Division, NASS, USDA
Lori Harper, Methodology Division, NASS, USDA
Dan Kerestes, Director of Statistics Division, NASS, USDA
Kay Turner, Research and Development Division, NASS, USDA


Appendix B
Biographical Sketches of Planning Committee
Members and Speakers
PLANNING COMMITTEE MEMBERS

ERIC V. SLUD (Chair) is professor in the statistics program of the
Department of Mathematics at the University of Maryland, College
Park, and area chief for mathematical statistics in the Center for Statisti-
cal Research and Methodology at the U.S. Census Bureau. His areas of
expertise include small area estimation via the Fay-Herriot model (espe-
cially the Census Bureau’s Small Area Income and Poverty Estimates
model); demographic modeling of nonresponse to national surveys with
particular application to weighting adjustment and small area estimation;
and large-scale data problems. He has an A.B. in mathematics from Har-
vard University and a Ph.D. in mathematics from the Massachusetts Insti-
tute of Technology.
KATHERINE BENNETT ENSOR is Noah G. Harding professor of statistics
in the George R. Brown School of Engineering at Rice University,
where she also serves as director of the Center for Computational Finance
and Economic Systems. Currently, she also oversees the development
of the Kinder Institute Urban Data Platform, a resource for the greater
Houston area. Her research interests include methods for dependent data
including time series/spatial and spatial-temporal; unique applications of
Bayesian hierarchical modeling and approximate Bayesian computation;
and stochastic process modeling and information integration. She is a
fellow of the American Statistical Association and the American Asso-
ciation for the Advancement of Science and has been recognized for her
leadership, scholarship, and mentoring. She serves as vice president of
the American Statistical Association and as a member of the National
Academies Committee on Applied and Theoretical Statistics. She has a
B.S.E. and an M.S. in mathematics from Arkansas State University and a
Ph.D. in statistics from Texas A&M University.
RONALD L. PLAIN is professor emeritus in the Department of Agricultural
and Applied Economics at the University of Missouri–Columbia.
Prior to retiring, he served as D. Howard Doane professor and extension
economist. His areas of expertise include livestock marketing, farm busi-
ness management, and swine production. He is a frequent contributor to
the National Hog Farmer and has received its Master of the Pork Industry
Award. He has made more than 2,100 presentations to farm audiences
and authored more than 500 published materials. He has served as presi-
dent of the Extension Section of the American Agricultural Economics
Association and has had agricultural experience in 16 foreign countries.
Awards include the Governor’s Award for Quality and Productivity, Outstanding
State Extension Specialist by the College of Agriculture, and
other honors. He served as first director of the Agricultural Leaders of
Tomorrow Program in Missouri and was selected as agricultural leader
of the year in 1999. He has a B.S. and an M.A. in agricultural education
from the University of Missouri and a Ph.D. in agricultural economics
from Oklahoma State University.
LEE L. SCHULZ is associate professor in the Department of Economics
at Iowa State University and serves as the statewide specialist on livestock
economics and markets. His integrated extension, research, and teaching
program provides leadership in the study of, and educational programming
for, critical problems facing the livestock and meat industry, including
marketing and risk management, agricultural and trade policies, animal
health and biosecurity, and production, management, and regulatory
issues. He has published in professional journals, extension publications,
and the popular press and spoken at numerous professional and agricultural
conferences. He has been recognized by the Agricultural & Applied
Economics Association and by Iowa State University for early achieve-
ment in extension and outreach programming, and he received premier
forecaster awards from the Agricultural & Applied Economics Associa-
tion Extension Section. He has a B.S. in agricultural business from the
University of Wisconsin–River Falls, an M.S. in agricultural economics
from Michigan State University, and a Ph.D. in economics from Kansas
State University.
CHRISTOPHER K. WIKLE is Curators’ distinguished professor and
department chair in the Statistics Department at the University of
Missouri. He also serves as professor in the Truman School of Public
Affairs at the University of Missouri. His research interests include
spatio-temporal models, dynamical models, Bayesian hierarchical methods,
and environmental and ecological statistics. He is a fellow of
the American Statistical Association and was elected fellow of the
International Statistical Institute in 2018. He also serves in an editorial capacity
for numerous journals including Science, Journal of Time Series Analysis,
and Statistica Sinica. He has authored and/or coauthored numerous
articles on topics including Bayesian models and spatio-temporal statisti-
cal models. He has a B.S. and an M.S. in atmospheric science from the
University of Kansas, an M.S. degree in statistics from Iowa State Univer-
sity, and a Ph.D. in statistics/meteorology from Iowa State University.
INVITED DISCUSSANTS
MATTHEW BRANAN is a mathematical statistician working for the
USDA-Animal and Plant Health Inspection Service-Veterinary Ser-
vices in the National Animal Health Monitoring System. He is currently
involved in all stages of implementing national studies related to animal
health in a variety of industries including aquaculture, swine, cattle,
sheep, goat, beef cow-calf, and dairy cattle, with particular focus on the
design of the studies and the analysis of study results.

GAURI S. DATTA is professor of statistics at the University of Georgia.
He is also a part-time mathematical statistician at the U.S. Census Bureau.
His research interests include Bayesian statistics, small area estimation,
survey sampling, and statistical syndromic surveillance. He is a fellow
of the American Statistical Association and the Institute of Mathemati-
cal Statistics. He received his Ph.D. in statistics from the University
of Florida.
KAMINA K. JOHNSON is an agricultural economist who has worked
for the USDA-Animal and Plant Health Inspection Service-Veterinary
Services for 15 years. She has used modeling methods to estimate the cost-
effectiveness of surveillance, response cost for diseases, economic impact
of disease introduction and prevalence reduction, export market trade
recovery, and measuring profitability differences in using technology.
ANDREW LAWSON is professor of biostatistics in the Division of
Biostatistics and Bioinformatics, Department of Public Health Sciences,
College of Medicine, Medical University of South Carolina (MUSC) and
MUSC distinguished professor and American Statistical Association
fellow. He was previously professor of biostatistics in the Department
of Epidemiology & Biostatistics, University of South Carolina. He has
published more than 175 journal papers on spatial epidemiology, spatial
statistics, and related areas. He is also the author of 10 book chapters and
books in areas related to spatial epidemiology and health surveillance. He
serves as associate editor on a variety of journals and is founding editor
of Spatial and Spatio-temporal Epidemiology. He has delivered many
short courses on Bayesian disease mapping with OpenBUGS and INLA,
spatial epidemiology and disease clustering and surveillance. He has a
Ph.D. in spatial statistics from the University of St. Andrews.
MICHAEL SCHWEINBERGER is assistant professor in the Department
of Statistics at Rice University. His research focuses on statistical
analysis of complex, dependent, and high-dimensional data, especially
network data. He has published papers concerning the Internet and
social networks and applications in public health (e.g., epidemics),
national security (e.g., insurgencies, terrorist networks), economics (e.g.,
financial markets), sociology (e.g., criminal networks), and engineering
(e.g., power networks). He has a Ph.D. in statistics from the University of
Groningen, in the Netherlands.
NASS PRESENTERS
EMILOLA J. ABAYOMI is a mathematical statistician at USDA’s
National Agricultural Statistics Service. Her current research at the agency
focuses on elements of respondent burden and impacts of data quality.
She has an M.S. and a Ph.D. in biostatistics from Florida State University.
GAVIN CORRAL is a mathematical statistician at USDA’s National
Agricultural Statistics Service. Since 2016, he has been in charge of running
the current hog inventory model and delivering those estimates to
the Livestock Branch. His research areas include propensity modeling
for nonresponse and out-of-business records and using machine learning
algorithms as a tool for survey cost reduction. He has an M.S. in statistics
and a Ph.D. in forest biometrics from Virginia Tech University.
DAN KERESTES is director of the Statistics Division at USDA’s
National Agricultural Statistics Service. He is responsible for the agricultural
statistics used by the Agricultural Statistics Board in establishing
estimates and forecasts of the nation’s agriculture; evaluating commodity
statistics; determining needs and implementing proper statistical plans in
support of the crop and livestock programs; and ensuring that appropriate
methods and procedures are used in all phases of the agency’s statisti-
cal program. He has a B.S. in resource economics from the University
of Minnesota and an M.S. in agricultural economics from North Dakota
State University.
SETH RIGGINS has worked for USDA’s National Agricultural Statistics
Service since 2002. He has administered national surveys for
the Agricultural Chemical Use survey and the Post-Harvest Chemi-
cal Use survey; run the sampling and summarization programs for the
Puerto Rico field office; and coordinated the livestock data analyses for
the 2007 Census of Agriculture and the 2012 Census of Agriculture.
He has been involved with the Hogs and Pigs Program since 2014. In
December 2016, he became the statistician leading the national Hogs and
Pigs Program. He holds an M.S. in agricultural economics from the Uni-
versity of Kentucky.
LUCA SARTORE is a research statistician at the National Institute of
Statistical Sciences (NISS) and USDA’s National Agricultural Statistics
Service. Prior to NISS, he was a research fellow for the European
Centre for Living Technology. His research primarily focuses on nonstan-
dard regression techniques and spatio-temporal models. He has a B.S. in
statistics and computer science for management, an M.S. in statistics for
business from the Caʼ Foscari University of Venice, and a Ph.D. in statis-
tical sciences from the University of Padua.
NELL SEDRANSK is director of the National Institute of Statistical Sciences. She previously held the post of chief, Statistical Engineering Division, National Institute of Standards and Technology. Earlier, she was a professor at Case Western Reserve University and the State University of New York. Her research in statistical theory and methodology has focused on design of complex experiments, Bayesian inference, spatial statistics, and topological foundations for statistical theory. Her collaborations are diverse, with joint research in engineering, education, medicine, and agriculture. She has served on multiple task forces for the American Statistical Association (ASA) and on the Scientific Advisory Committee to the Canadian Statistical Sciences Institute. She is an elected fellow of the American Association for the Advancement of Science, Institute of Mathematical Statistics, and ASA. She has a Ph.D. in statistics from Iowa State University.

YIJUN WEI is a research statistician at the National Institute of Statistical Sciences and USDA’s National Agricultural Statistics Service. His research interests are machine learning and deep learning, including natural language processing, computer vision, and deep reinforcement learning, especially for solving practical problems. He is a Ph.D. candidate at George Mason University with a focus on artificial intelligence, and he is expected to graduate in 2019.

LINDA J. YOUNG is chief mathematical statistician and director of research and development of USDA’s National Agricultural Statistics Service. She oversees efforts to improve the methodology underpinning the agency’s collection and dissemination of data on every facet of U.S. agriculture. She has served on the faculties of Oklahoma State University, the University of Nebraska, and the University of Florida. Her recent research has focused on the use of web scraping and capture-recapture methods in surveys and on linking disparate datasets and the subsequent analysis of these data using spatial statistical methods. She has been the editor of the Journal of Agricultural, Biological and Environmental Statistics. She has served in a broad range of offices within professional statistical societies, including as president of the Eastern North American Region of the International Biometric Society, vice president of the American Statistical Association, and chair of the Committee of Presidents of Statistical Societies, among others. She is a fellow of the American Statistical Association and the American Association for the Advancement of Science and an elected member of the International Statistical Institute. She has a Ph.D. in statistics from Oklahoma State University.

COMMITTEE ON NATIONAL STATISTICS

The Committee on National Statistics was established in 1972 at the National Academies of Sciences, Engineering, and Medicine to improve the statistical methods and information on which public policy decisions are based. The committee carries out studies, workshops, and other activities to foster better measures and fuller understanding of the economy, the environment, public health, crime, education, immigration, poverty, welfare, and other public policy issues. It also evaluates ongoing statistical programs and tracks the statistical policy and coordinating activities of the federal government, serving a unique role at the intersection of statistics and public policy. The committee’s work is supported by a consortium of federal agencies through a National Science Foundation grant, a National Agricultural Statistics Service cooperative agreement, and several individual contracts.
