Nancy J. Kirkendall, Rapporteur; Committee on National Statistics; Division of
Behavioral and Social Sciences and Education; National Academies of Sciences,
Engineering, and Medicine
Distribution, posting, or copying of this PDF is strictly prohibited without written permission of the National Academies Press.
Unless otherwise indicated, all materials in this PDF are copyrighted by the National Academy of Sciences.
This activity was supported by contracts between the National Academy of Sciences
and the National Agricultural Statistics Service (Agreement 58-3AEU-4-005, as
amended). Any opinions, findings, conclusions, or recommendations expressed in this
publication do not necessarily reflect the views of any organization or agency that
provided support for the project.
Additional copies of this publication are available from the National Academies Press,
500 Fifth Street, NW, Keck 360, Washington, DC 20001; (800) 624-6242 or
(202) 334-3313; http://www.nap.edu.
The National Academy of Engineering was established in 1964 under the charter
of the National Academy of Sciences to bring the practices of engineering to
advising the nation. Members are elected by their peers for extraordinary
contributions to engineering. Dr. John L. Anderson is president.
The three Academies work together as the National Academies of Sciences,
Engineering, and Medicine to provide independent, objective analysis and advice
to the nation and conduct other activities to solve complex problems and inform
public policy decisions. The National Academies also encourage education and
research, recognize outstanding contributions to knowledge, and increase public
understanding in matters of science, engineering, and medicine.
Learn more about the National Academies of Sciences, Engineering, and
Medicine at www.nationalacademies.org.
For information about other products and activities of the National Academies,
please visit www.nationalacademies.org/about/whatwedo.
Acknowledgments
Contents
1 Introduction
2 Motivation and Challenges
3 The Quarterly Hog Inventory Survey
4 Setting Official Estimates: The Hog Board
5 Modeling Efforts
6 Web-Scraping Efforts
7 Modeling Swine Population Dynamics
8 Discussion of Detection and Monitoring
9 Discussion of Modeling
10 Discussion of State-Level Estimation
11 Discussion of Visions for the Future
References
Appendixes
A Agenda and List of Participants
B Biographical Sketches of Steering Committee Members and Speakers
1
Introduction
BOX 1-1
Statement of Task
The National Research Council’s Committee on National Statistics
(CNSTAT) will convene a steering committee to organize a public
workshop for the National Agricultural Statistics Service (NASS) on
model-based methods for producing estimates of hogs with measures of
uncertainty. Hog models being developed at NASS combine information
from surveys, administrative data, and expert opinion to produce
the requisite estimates with the goal of replacing current methods that
lack transparency and lack measures of uncertainty. The purpose of the
workshop is to provide feedback on the appropriateness of these models
and to suggest improvements or possible alternative approaches. Issues
for the workshop to consider include the appropriateness of the modeling
techniques developed by NASS, the extent to which model assumptions
can be validated, the robustness of the estimates to failure of one or more
assumptions, and other technical issues of model specification. In addition,
the workshop will consider the suitability of the data that feed into
the model and the properties desired in the estimates of uncertainty.
farmers and other auxiliary data sources; (3) the functioning of the
Agricultural Statistics Board; (4) the models that have been developed and
pursued; (5) current web-scraping efforts; (6) current modeling efforts;
(7) discussions on detection and monitoring of disease; (8) discussions
of NASS modeling efforts; (9) discussions of extending the models to
provide state-level estimates; and (10) concluding thoughts about future
directions. The planning committee conducted all of its preparatory work
by email and teleconference.
The Workshop on Using Models to Estimate Hog Production took
place at the National Academies of Sciences, Engineering, and Medicine
in Washington, DC, May 15, 2019. The audience included present and
past NASS leadership and technical staff.
data sources and the technical basis for models developed by NASS for
estimation of national-level hog inventories and to discuss additional
data sources, model improvements, and extensions such as responding
better to shocks and developing state-level models. The topics, he noted,
call on a variety of statistical and agricultural knowledge well represented
by the participants.
Brian Harris-Kojetin, CNSTAT director, highlighted Principles and
Practices for a Federal Statistical Agency, the flagship CNSTAT
publication (see footnote 1). CNSTAT was pleased to work with NASS on the organization of
the workshop, he said, reminding the audience that it was not a consensus
study, but a workshop in which individual participants present their ideas
to NASS.
Linda Young, NASS director of research and development, said the
agency thought it had a good model 5 years ago. However, the model performed
well only during times of equilibrium, not during shocks such as disease
or flooding, which is when the NASS Agricultural Statistics Board
could most use the help of a good model. She expressed hope that the
workshop would provide NASS with insights to help solve the problem
of identifying shocks, monitoring data, and using models to combine
data sources to publish high-quality estimates with standard errors, even
during times of shock. She said that NASS wants to identify the presence
of a shock in real time and determine its impact on the processes NASS
is monitoring without having to wait for final definitive data to become
available. She said that she was looking forward to the help the committee
and invited participants could provide to solve this difficult problem.
After the welcoming remarks and introductions, six sessions provided
background to the problem, describing the data, estimation, and modeling
that has been done by NASS, with each presentation followed by a
question-and-answer period. They were followed by four discussion
sessions (see Appendix A for the agenda and a list of participants). The
remainder of this proceedings reflects the six NASS presentation sessions
and the concluding four discussion sessions. Chapter 2 provides
background and challenges for the effort. Chapter 3 describes survey
processes and data sources. Chapter 4 describes the Hog Board and its
role in setting official estimates. Chapter 5 describes previous and current
1 National Academies of Sciences, Engineering, and Medicine. (2017). Principles and Practices
for a Federal Statistical Agency: Sixth Edition. Washington, DC: The National Academies Press.
doi: https://doi.org/10.17226/24810.
2
Motivation and Challenges
birth), pig crop, market hogs by four weight groups, and breeding stock.
A key derived variable is litter rate (pig crop divided by sows farrowed).
Sedransk noted that before about 2008, NASS did not have a model.
At that time, the survey estimates and state-level recommendations
provided by the NASS state offices were reviewed and evaluated by an
expert group (now called the pre-board) within NASS, revised if necessary,
and provided to the Hog Board, one of the Agricultural Statistics
Boards (ASB). Using survey estimates, state-level recommendations,
revised estimates from the expert group, and auxiliary information, the
ASB made final determinations and set the official NASS estimates for
U.S. totals and then the state-level estimates for selected states (for more
information on the ASB, see Chapter 4). State estimates were constrained
to add to the determined U.S. totals.
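The add-up constraint in the preceding paragraph can be illustrated with a minimal sketch. It assumes a simple proportional adjustment, which the proceedings do not confirm as the method the ASB uses; the function name and all inventory figures are hypothetical.

```python
def scale_to_total(state_estimates, national_total):
    """Proportionally rescale state estimates so they sum to national_total.

    Assumption: a single common scaling factor. The source says only that
    state estimates were constrained to add to the U.S. total.
    """
    current_sum = sum(state_estimates.values())
    if current_sum <= 0:
        raise ValueError("state estimates must sum to a positive number")
    factor = national_total / current_sum
    return {state: value * factor for state, value in state_estimates.items()}


# Hypothetical inventories in thousands of head (not NASS data).
states = {"A": 23000.0, "B": 9000.0, "C": 700.0, "Other": 40000.0}
adjusted = scale_to_total(states, national_total=74000.0)
```

A proportional adjustment preserves each state's share of the total; other allocation rules would redistribute the discrepancy differently.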
Since modeled results have become available, this process has
adapted but is largely unchanged. Present-quarter survey data are a key
input to the model. Model-based estimates are first provided to the
pre-board along with survey estimates and state recommendations. Based
on any revisions from the pre-board, the model may be rerun to obtain
model-based estimates for presentation to the Hog Board. The timing of
this sequence is such that run time for the model can be no more than 2
or 3 hours.
Sedransk described the value that could be added to the process by
good models. First, a model would provide statistical estimates with standard
errors. Estimates and standard errors would be based on survey data,
as well as reflect the internal relationships in the data between and within
quarters. It could reflect long-term trends and seasonal patterns, localized
geographic events or differences, and composition of inventory based
on biology (breeding, growth, death, market). Models may be predictive,
and differences between the data and the prediction might help to
identify change points. A model might help to allocate national numbers
to state numbers.
defined and stable, though there are some slight seasonal differences in
the growth rates by region. The system is so stable it leads analysts to rely
heavily on past data to evaluate reasonableness of survey results. This
process works well in times of equilibrium, she said.
DISEQUILIBRIUM
Sedransk explained that one challenge in modeling and accounting for
shocks is that the impact depends on the event (disease may kill pigs of a
specific age group, may result in smaller litters, or may result in culling
to prevent the spread of disease), as well as on an operation’s response,
which can be localized and dynamic. Natural disasters tend
to have localized impacts, slaughterhouse capacity limitations may be
regional, and market forces can be national, regional, or localized.
Sedransk showed a number of charts illustrating the impact of the
Porcine Epidemic Diarrhea virus (PEDv) that began in 2013. The first
set of charts (not pictured here) showed time series for U.S. Total Hogs,
Iowa Total Hogs, and Colorado Total Hogs from March 2012 through
November 2013 to illustrate the start of the epidemic. The second set (see
Figure 2-1) showed the same time series but from March 2012 through
November 2015 to illustrate both the epidemic and recovery. Looking at
the data through November 2013, the U.S. Total and Iowa Total showed
virtually no impact of the disease. Sedransk noted that Iowa is dominated
by very large operators, which also dominate the U.S. total. In contrast,
Colorado showed a clear drop in inventories. The Colorado plot might
have been useful as an early indication of PEDv. However, the decline
in Colorado was not enough to show up in national totals because the U.S.
Total is not responsive to what happens to small states or small operations.
Figure 2-1 illustrates both the initiation of PEDv and the recovery.
The U.S. level illustrates the decline and a relatively quick recovery. As
with the U.S. Total, the decline in Iowa was profound, but the recovery was
quite quick because large operators are able to recover more quickly.
In contrast, the downturn in Colorado persisted longer. This illustrates
the differences among states, large and small operators, and the spatial
component of the modeling challenge, she said.
She noted that emerging disequilibriums are challenging to detect.
They likely have a spatial component; they can be localized and may or
may not be reflected in national totals. They can be spatially dynamic.
FIGURE 2-1 Total hog inventory, United States, Iowa, and Colorado,
March 2012–November 2015.
SOURCE: Prepared by Nell Sedransk for presentation at the workshop.
They may impact large and small operators differently. The approaches
NASS has pursued to confirm or detect a shock include data diagnostics
(using existing data) and web scraping.
MODEL DECISIONS
Sedransk observed that a number of approaches to modeling can
work, and NASS needs to decide on one approach. One of the fundamental
decisions is whether to pursue one comprehensive model or several
models that may be linked, switched, or compounded. If there is an equilibrium
model, diagnostics are needed to help identify departures from
equilibrium. Another question is whether the model should be top down,
starting at the national level and partitioning down to the state level, or
bottom up, starting at the operator level or state level and aggregating up
to the national level. Good data are key to a successful modeling effort,
she stressed. Data are needed at the appropriate level of detail that reflect
important aspects of the process. One question is how to incorporate new
types of data, such as spatial imaging. Sedransk noted that imputation for
nonresponse of large operators is important. If based solely on past data,
the imputation may damp out the impact of a shock, making the shock
more difficult to detect.
She noted some technical issues with modeling to account for shocks,
such as how to make inferences for nonsampled operations. If an operation
has an outbreak of disease, it is not necessarily true that all other
operations in the state are at risk, but those closer to the affected farm are
at greater risk. One question is how to account for this spatial component
of disease spread. Another question has to do with estimating uncertainty
for hybrid models with mixed components. Errors can be due to model fit,
model specification, or sampling variability. She reminded the audience
that computations must be done within 2 or 3 hours, a point expanded
upon during the discussion.
DISCUSSION
Ensor started the discussion by asking about the computing-time
constraint of 2 to 3 hours, given the amount of data to assimilate. Linda
Young (NASS) replied the constraint relates to the role of the NASS
Hogs and Pigs Report as a Federal Principal Economic Indicator. As
such, one of the directives is that estimates are released to the public
3
The Quarterly Hog Inventory Survey
puter software editing system). Blaise uses logical edits to ensure that the
relationships between responses are consistent. Each record (response) that
fails this edit is marked as “dirty” and is subject to additional review.
“Dirty” records are reviewed by a statistician to determine whether
the information provided was accurate or whether adjustments should be made.
Revised records are evaluated through Blaise. Once all data are deemed
clean, they move to the data analysis phase. During data analysis, the statistician
reviews outliers and influential records in comparison to current
and past data. Once data analysis is completed, all data are deemed clean
and the process moves to a summary stage.
DISEASE SPREAD
Abayomi described shocks, followed by challenges for modeling
given the way disease spreads and can be reported in the data.
She defined a shock as any event that can cause a sudden change in
hog inventories, noting that a shock can take various forms. Examples
include natural disasters such as Hurricane Florence (North Carolina)
and Hurricane Michael (the Carolinas, Florida, Georgia) in 2018; and the
flooding in 2019 in Nebraska and Iowa.
A shock can also take the form of a disease, such as PEDv. As
described in Chapter 2, the disease, first detected in the United States
in 2013, kills young pigs. Disease spread is another challenge in looking
at the impact of shocks. PEDv illustrates disease spread. A network of
Animal and Plant Health Inspection Service (APHIS) laboratories began
to collect information about this virus in 2013 and shared the information
among themselves. Eventually, they voluntarily gave this information to
APHIS headquarters, which began to produce weekly reports on PEDv
accessions and the number of positive PEDv samples that were identified.
Abayomi showed that in July 2013, PEDv was detected in 9 states.
As of November 2013, 17 states had positive detections of the virus. The
virus had gained momentum and began to spread to neighboring states.
In March 2014, 28 states had positive detections, and by June 2014, 32
states had positive detections.
In summary, she said there was a short time period for the spread
of this virus. Although it started with nine states, that number almost
quadrupled within a short time span. This example shows the relationship
DISCUSSION
Ron Plain asked whether response rates are greater among large
operations or small operations. Lori Harper (Methodology Division,
NASS) replied that they tend to get data from the largest operations
because NASS makes extra efforts to reach them and, if efforts fail,
NASS imputes for them. The middle-sized operations, with hog inventories
ranging from 5,000 to 20,000, have the highest nonresponse rates.
These operations are typically a challenge to reach via telephone if they do
not respond by mail. The smaller operators are contacted less frequently
than larger operations, tend to be at home in the evening, and answer
NASS calls.
Katherine Ensor asked whether NASS has analyzed the operator-level
individual time series data. She said it seems like a lot of data are
rolled up into summary information. Linda Young replied that the state
statisticians spend a lot of time looking at operator-level time series and
trends as part of their analysis. Headquarters statisticians also have the
operation-level data. Ensor noted that operator-level data might be particularly
useful to try to understand the spread of disease.
Nell Sedransk observed that not every operator reports four times a
year, usually only two or three. To try to identify and characterize production
patterns, NASS has restricted attention to those operators with
a long enough time series to see how the operations have changed over
time. She noted that very small operations, with fewer than 100 hogs,
tend to be very individual. NASS has had better luck focusing on the
middle-sized and the larger operations. She said in equilibrium, the major
producers have a very smooth throughput with few unpredicted shifts.
The middle-sized operators show a much greater impact of shocks.
Ensor commented the individual operator time series provide a rich
dataset to understand equilibrium as well as shocks, and asked whether
more could be done with these data. Dan Kerestes said quite a bit has
been done. As part of the processing and editing of the survey data, analysts
use a software tool that displays the company-level time series (current
and historical data) for all inventory categories. This use also illustrates
the importance of high response rates, especially among larger
companies, he noted. Chris Wikle (University of Missouri) suggested
that it might also be useful to examine the spatial aspects as well as the
time series aspects.
Eric Slud asked whether the unit-level time series are used to do
the imputation for unit-level nonresponse. Harper replied this is done
for extreme operations. The statisticians in the field offices examine an
operation historically, looking at trends and growth to come up with
the best possible values to impute. For nonrespondents not considered
extreme operations (see footnote 3), NASS uses an adjusted estimator, an extension
of a re-weighted estimator to include information about operator status
(e.g., whether still in the hog business). Slud asked how NASS adjusts
for item nonresponse or incomplete forms. Harper replied that for partial
responses, she thinks NASS uses manual imputation using the data
reported, as well as historical data provided by that operator.
Lee Schulz (Iowa State) asked about the potential for bias in the imputation
process, particularly in detecting future shocks. He asked whether
PEDv resulted in any adjustments to the process or model. Gavin Corral
reported updates in the way NASS looks for shocks (described in Chapter
5) but no changes to the Kalman filter model. The current approach to
detecting shocks would identify PEDv, but with a one-quarter lag. That
model basically gives a red flag for an unusual quarter. The red flag would
be reported to the pre-board and Hog Board (described in Chapter 4). The
pre-board might use that information to develop one or two scenarios for
consideration by the board.
Nancy Kirkendall (National Academies of Sciences, Engineering, and
Medicine) provided an example of using time series models with an adjustment
to reflect market changes to impute for nonresponse. She reported
that the Energy Information Administration used time series models for
each respondent to a survey. The models were used to project the current
report. If a company did not respond, that projected value was multiplied by
the ratio of the sum of all reported values to the sum of the projected
values for the companies that reported. The adjusted number was
used as an imputed value.
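The ratio adjustment Kirkendall described can be sketched as follows. This is an illustration only: the per-company projections are taken as given rather than produced by a fitted time series model, and the function name, company labels, and values are hypothetical.

```python
def impute_nonrespondents(projected, reported):
    """Fill in nonrespondents with ratio-adjusted projections.

    projected: dict of unit id -> projected current value (all units).
    reported:  dict of unit id -> reported current value (respondents only).
    Returns the completed values and the adjustment ratio.
    """
    responders = [unit for unit in projected if unit in reported]
    if not responders:
        raise ValueError("need at least one respondent")
    sum_reported = sum(reported[unit] for unit in responders)
    sum_projected = sum(projected[unit] for unit in responders)
    # Ratio of what respondents reported to what was projected for them.
    ratio = sum_reported / sum_projected
    completed = {}
    for unit, projection in projected.items():
        if unit in reported:
            completed[unit] = reported[unit]
        else:
            completed[unit] = projection * ratio
    return completed, ratio


# Hypothetical values: two respondents came in 10 percent below projection.
projected = {"co1": 100.0, "co2": 200.0, "co3": 50.0}
reported = {"co1": 90.0, "co2": 180.0}
completed, ratio = impute_nonrespondents(projected, reported)
```

Because the ratio is computed from the current period's respondents, a market-wide change carries through to the imputed values instead of being damped by purely historical projections.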
3 Extreme operators are the largest operators in a state, that is, those assigned to stratum 98 as
illustrated for two states in Tables 3-1 and 3-2.
4
Setting Official Estimates: The Hog Board
explained. The data are then submitted to him as the headquarters statis-
tician. He reviews the state-level estimates and prepares for a pre-board
meeting (usually held the following day).
He said that the pre-board members compare preliminary national
estimates with the state recommendations. The primary data relationships
reviewed are the ratio of the current quarter to previous year for the same
quarter expressed as a percent, the inventories of sows farrowed and pig
crop in conjunction with the two small weight groups, and the previous-quarters’
pig crop inventories with the current 50–119 lbs., 120–179 lbs.,
and 180 + lbs. weight groups. They examine national-level balance sheets
for the current data and the balance sheets for the two most recent past
quarters and the past year that have been updated to include updated
slaughter data, death loss estimates, and imports and exports.
Riggins explained that the balance-sheet approach results in residuals
that are carefully examined. For example, at the 3-month level, the previous
quarter’s inventory estimates plus the pig crop, plus imports minus
the death loss, minus exports, and minus slaughter are compared to the
current quarter’s estimate. The difference is the residual.
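The balance-sheet residual Riggins described can be written out as a short sketch. The sign convention (expected minus current) is an assumption, since the text says only that the difference is the residual. The function name and all figures are hypothetical.

```python
def balance_sheet_residual(prev_inventory, pig_crop, imports, death_loss,
                           exports, slaughter, current_inventory):
    """Return the expected inventory minus the current-quarter estimate.

    Expected inventory follows the text: previous inventory plus pig crop
    plus imports, minus death loss, exports, and slaughter.
    """
    expected = (prev_inventory + pig_crop + imports
                - death_loss - exports - slaughter)
    return expected - current_inventory


# Hypothetical quarter, in thousands of head (not NASS data).
residual = balance_sheet_residual(
    prev_inventory=72000, pig_crop=33000, imports=1400,
    death_loss=1500, exports=10, slaughter=31500,
    current_inventory=73000,
)
```

Under this convention, a positive residual means the balance sheet accounts for more hogs than the current-quarter estimate shows.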
One of the important things reviewed by the pre-board members is
the percentage changes of the two small weight groups from the previous
quarter moving to the larger two groups plus part of the 50–119 lbs.
weight group to make sure the current inventory number includes the
smaller pigs reported in the previous quarter. They next look at a comparison
of the model-based numbers with the survey numbers for totals
and by various inventory categories. They use this information to develop
two or three pre-board scenarios to present to the ASB the following
morning. One scenario is based on the model-based estimates. Another
scenario might be to adjust the weight group inventories to cover the
expected growth of small pigs reported in the previous quarter to larger
weight categories.
1 Only respondents who participated in the current quarter, previous quarter, and past year are
included in the ratio.
DISCUSSION
Ron Plain observed that the reference date for the survey data is
the first day of the month. The data are released about the 28th day of the
month, and approximately three-quarters of the 180 + lbs. weight group
will have been slaughtered by the release date. He asked whether NASS
looks at slaughter over the past 28 days and uses it to adjust the 180 + lbs.
weight group inventory.
Riggins replied that at the time a state has to send in its estimates,
only 2 weeks of preliminary daily slaughter data are available. At that
point, slaughter is a weak indication compared to the survey data.
While there are more data by the 28th of the month, there is no time to
adjust the 180 + lbs. group. Moreover, he added, the slaughter data are
still fairly preliminary. NASS takes the first 2 weeks of slaughter data
into consideration but does not let those data influence how they look at
survey estimates.
Lee Schulz referred to the trade expectations, noting a balance sheet
approach is used. He asked about the possibility of NASS using the trade
expectations data. Riggins replied that the preliminary slaughter data are
informative, but much can happen in 2 weeks. The trade expectation data
are only available a day before release, Riggins said, although Schulz
commented the data may be prepared earlier. Dan Kerestes, who is an
NASS member of the ASB, commented that trade can go any direction,
as seen in Figure 4-2. He noted NASS must approach trade data with
5
Modeling Efforts
PURPOSES OF A MODEL
As context, he said, the purpose of a hog inventory model is to produce
estimates for the hog inventory categories described in earlier chapters.
For purposes of evaluation, NASS compares the model estimates to
the initial official estimate released by the Hog Board. It would like
to have the difference between the two estimates within about 470,000
hogs, or approximately 1 day of slaughter. It also compares the model
estimates to the final revised estimate, available 1 year later. During times
of equilibrium, the initial and the final official estimates tend to be very
close, and NASS usually uses the initial official estimate for comparison
because it is available sooner. During times of shock, however, the initial
and final estimates diverge. In that case, NASS uses the final official estimate for
comparison because it is more accurate.
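The evaluation target described above, a difference of no more than about 470,000 hogs, can be expressed as a simple check. This is a minimal sketch: the function name and the quarterly figures are hypothetical, and which official estimate (initial or final) serves as the benchmark is left to the caller.

```python
TOLERANCE = 470_000  # head; "approximately 1 day of slaughter" per the text


def within_tolerance(model_estimate, official_estimate, tolerance=TOLERANCE):
    """True when the model is within the tolerance of the official estimate."""
    return abs(model_estimate - official_estimate) <= tolerance


# Hypothetical (model, official) pairs in head, not NASS data.
quarters = {
    "Q1": (71_200_000, 71_500_000),
    "Q2": (70_100_000, 72_900_000),
}
flags = {q: within_tolerance(m, o) for q, (m, o) in quarters.items()}
```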
NASS wants model-based estimates to be as efficient as possible,
as measured by coefficients of variation (CVs). The models should
respect the interrelationships between categories of hog inventory over
time, referred to as satisfying biological constraints. NASS wants the
number of hogs in the system at one point in time to make sense with
the number of hogs slaughtered at another point in time. The survey
results alone may fail to do this. There tends to be a downward bias to the
survey results that may not reflect the hog growth lifecycle.
NASS seeks model-based estimates that provide accurate estimates
of inventory during times of shock. While disease (i.e., Porcine Epidemic
Diarrhea virus [PEDv]) was a key issue in 2013 to 2015, NASS
would like to make accurate estimates during all shocks, whether disease,
natural disasters, tariffs, or other causes. Sometimes changes in the
industry before or after a shock also affect hog inventories, which are
important to track.
Corral discussed the Kalman filter model (KFM) and the sequential
generalized linear model (SGLM) against four criteria. First,
how well does each model capture inventories during times of equilibrium?
Second, how well does it detect and adapt to shocks, such as the
PEDv? Third, how well does it account for the biological considerations
of the hog lifecycle? And, finally, how well does it satisfy the balance
sheet constraints that incorporate slaughter data, imports, and exports?
This model has a number of constraints built into the state equations
to reflect the biological considerations and balance sheet relationships. For
example, there is a limit on the ratio of death loss (of pigs weaned) to pig
crop, and the annual increase of pig crop must be greater than the annual
increase in market weight groups. There is a weight group transition, with
an assumption about the growth of pigs within weight classes. The annual
increase in slaughter is equal to the annual increase in births for the two
preceding quarters. The total number of market hogs in a quarter should
equal the combined total slaughter numbers for the next two quarters. This
is related to the 6-month time period for hogs going from weight group
one to slaughter. There is a constraint that relates market hogs more than
180 lbs. to slaughter during the estimation quarter but after the reference
quarter. Although the quarter is in progress, daily slaughter information
is still available. At the time of the board meeting, 2 full weeks of daily
slaughter information is available. Another constraint is that sows farrowed
make up one-half of the previous quarter’s breeding herd. Finally,
he noted, the KFM includes a constraint for a constant survival rate
across all weight classes (not considered in the new model, as discussed
in Chapter 7).
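Two of the relationships listed above can be illustrated as after-the-fact checks on a set of quarterly series. This sketch is not the KFM itself, which builds the constraints into its state equations. It only measures how far hypothetical data depart from two of them. All names and numbers are assumptions.

```python
def market_vs_slaughter_gap(market_hogs, slaughter, t):
    """Market hogs in quarter t minus slaughter in quarters t+1 and t+2."""
    return market_hogs[t] - (slaughter[t + 1] + slaughter[t + 2])


def farrowing_gap(sows_farrowed, breeding_herd, t):
    """Sows farrowed in quarter t minus half the breeding herd in quarter t-1."""
    return sows_farrowed[t] - 0.5 * breeding_herd[t - 1]


# Hypothetical quarterly series in thousands of head (not NASS data).
market_hogs = [66000, 66500, 67000, 67200]
slaughter = [31000, 32000, 33500, 33200]
breeding_herd = [6000, 6100, 6050]
sows_farrowed = [2950, 3000, 3060]

gap_market = market_vs_slaughter_gap(market_hogs, slaughter, 0)
gap_farrow = farrowing_gap(sows_farrowed, breeding_herd, 1)
```

A gap of zero means the series satisfy the relationship exactly.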
Corral illustrated KFM performance as measured against initial and
final official NASS estimates for total hogs. Figure 5-1 shows a time plot
from 2013 to 2017. Three estimates are shown: the initial estimate in
black, the final estimate in red, and the KFM estimate in blue. In the
epidemic years of 2013 to 2015, a gap between the final and the initial
estimates is visible. The KFM also missed the final estimate, Corral
pointed out. It tracked the initial estimate fairly closely, had some trouble
coming out of the shock, then started tracking reasonably well and relatively
quickly.
Figure 5-2 plots the differences between the final estimate for total
hogs and the KFM (blue) and initial estimates (black). This plot illustrates
the challenges the KFM had during the epidemic years. The differences
spike in March 2014 and December 2015. The KFM then underestimated
from late 2015 to March 2016. At the largest spike, the KFM was off by
almost 3 million hogs.
The difference between the final and initial estimates has a general
decreasing pattern, Corral noted, which illustrates that the KFM struggles
to get close to the final estimates after times of shock.
FIGURE 5-1 Total hogs, comparing initial, final, and Kalman filter model
estimates.
SOURCE: Prepared by Gavin Corral for presentation at the workshop.
FIGURE 5-2 Difference between initial and Kalman filter model estimates
with final estimates.
SOURCE: Prepared by Gavin Corral for presentation at the workshop.
FIGURE 5-3 Total hogs, comparing initial, final, and sequential generalized
linear model estimates.
SOURCE: Prepared by Gavin Corral for presentation at the workshop.
final in red, and SGLM in green. Notable is that during epidemic years,
the SGLM did a fairly good job of getting close to the final estimate.
The problem arises when coming out of the shock, he said.
Figure 5-4 is a companion to Figure 5-2. It shows the difference
between the final board estimate and the SGLM estimate, and between the
final and the initial estimates. It confirms that the model did well
going into the epidemic but did not do well coming out of it.
In summary, referring to the four criteria, Corral noted that the
SGLM, like the KFM, captures equilibrium well. It is good at adapting
to shocks but not at adjusting during recovery. It does not account for
biological constraints and does not satisfy balance-sheet requirements.
This summary illustrates the strengths and weaknesses of the KFM and
SGLM. Corral said NASS hopes that the model described in Chapter 7
makes progress toward meeting the four criteria that he set forth earlier
in this chapter.
DETECTING DISRUPTIONS
Corral briefly described a third, new model to identify shocks devel-
oped by Wang and colleagues (2019). It is a Bayesian, hidden Markov
model that captures the dependence structure in the data. The model uses
a Dirichlet mixture model with an unknown number of distributions for
the non-null hypothesis. The algorithm allows for an optimal false negative
rate, while controlling the false discovery rate. As input, it uses a
variety of variables, including sows farrowed, pig crop ratios, and differences
in revisions. Corral runs the model quarterly and provides its
indication of a shock (if any) to the Hog Board. The only challenge NASS
has with this model is that it is not good at detecting a shock that begins
in the current quarter.
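To make the idea of flagging shocks in dependent data more concrete, the following is a minimal sketch of a two-state hidden Markov model with a forward-backward pass, followed by a rule that flags periods while keeping the average posterior probability of "no shock" among the flagged periods below a target false discovery rate. It is not the Wang et al. (2019) model: the Gaussian emissions, the fixed parameter values, the input series, and all function names are assumptions made for illustration, whereas the model described above uses a Dirichlet mixture for the non-null state.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution, used here as the emission model."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior_null_probs(obs, stay=0.9, null=(0.0, 1.0), shock=(-3.0, 1.0)):
    """Forward-backward pass for a two-state HMM (state 0 = no shock).

    All parameter values are hypothetical; a real application would estimate them.
    Returns P(no shock in period t | all observations) for each period t.
    """
    params = [null, shock]
    trans = [[stay, 1.0 - stay], [1.0 - stay, stay]]
    emit = [[normal_pdf(x, m, s) for (m, s) in params] for x in obs]
    n = len(obs)

    # Forward pass, rescaled at each step to avoid numerical underflow.
    alpha = []
    for t in range(n):
        if t == 0:
            cur = [0.5 * emit[0][k] for k in range(2)]
        else:
            cur = [emit[t][k] * sum(alpha[t - 1][j] * trans[j][k] for j in range(2))
                   for k in range(2)]
        total = sum(cur)
        alpha.append([c / total for c in cur])

    # Backward pass, also rescaled.
    beta = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        cur = [sum(trans[k][j] * emit[t + 1][j] * beta[t + 1][j] for j in range(2))
               for k in range(2)]
        total = sum(cur)
        beta[t] = [c / total for c in cur]

    probs = []
    for t in range(n):
        w0 = alpha[t][0] * beta[t][0]
        w1 = alpha[t][1] * beta[t][1]
        probs.append(w0 / (w0 + w1))
    return probs

def flag_shocks(null_probs, fdr=0.10):
    """Flag the largest set of periods whose mean posterior null probability <= fdr."""
    order = sorted(range(len(null_probs)), key=lambda i: null_probs[i])
    flagged, running = [], 0.0
    for rank, i in enumerate(order, start=1):
        running += null_probs[i]
        if running / rank > fdr:
            break
        flagged.append(i)
    return sorted(flagged)

# Hypothetical standardized indicator (for example, a revision difference).
signal = [0.1, -0.2, 0.3, -3.1, -2.8, -3.4, 0.2, 0.0]
flags = flag_shocks(posterior_null_probs(signal))
```

In this sketch the probability for the most recent period rests only on past data, because no later observations exist to confirm it. That may be one intuition for why a shock beginning in the current quarter is harder to detect.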
Corral concluded by saying that the KFM is the most useful
tool for NASS right now and the one currently used. Its shortcomings
arise during shock periods. The Wang et al. diagnostic tools are useful
and provide needed information, but have a lag in detecting a shock.
DISCUSSION
Schulz asked whether the KFM constraints are dynamic or static.
He noted in considering hogs moving between weight groups or going
to market, there is potential for operators to speed or slow that process,
of the parameter estimates, but that her impression is that during equilib-
rium they are relatively stable.
Wikle expressed interest in point estimates and their uncertainties.
Sedransk said that the problem is that the model is rigid and not as respon-
sive to data inputs. Consequently, she noted, it is always biased toward
the time series estimate, which in most cases is the equilibrium model.
Wikle noted the phenomenon might be due to the fact that the data are
constrained as opposed to the parameters being constrained. In this
biological world with integrated population models, those constraints would
probably not have been on the data but would have been on the process.
Eric Slud asked about possible changes to the KFM, which is partly
constrained by not accounting for covariance. For example, even without
any new data sources, accounting for the most current revisions to past
quarters’ data should result in an improvement, he suggested. With new
data streams, covariates from other sources could also be used.
Katherine Ensor commented that NASS basically has two state-space
models with different formulations. What NASS calls a KFM is a
state-space model with a Kalman filter as a tool, she said, and the SGLM
was not constrained. NASS could potentially add constraints to the
SGLM as a way to merge the two approaches, she suggested. Matthew
Branan asked whether NASS has considered model-averaging approaches.
In his view, KFM is very rigid and SGLM is too responsive to potential
dips in inventory. Perhaps an average would work better, he suggested.
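One simple form of the model averaging Branan raised is to weight each model by the inverse of its past mean squared error against the final estimates. The sketch below is illustrative only: the weighting rule, the function names, and the numbers are assumptions, not a method NASS has adopted.

```python
def inverse_mse_weights(errors_by_model):
    """Weights proportional to 1 / MSE of each model's past errors.

    errors_by_model maps a model name to a list of past errors
    (model estimate minus final estimate). Assumes no model has zero MSE.
    """
    mse = {name: sum(e * e for e in errs) / len(errs)
           for name, errs in errors_by_model.items()}
    inverse = {name: 1.0 / value for name, value in mse.items()}
    total = sum(inverse.values())
    return {name: value / total for name, value in inverse.items()}

def averaged_estimate(estimates, weights):
    """Weighted average of the current-quarter estimates from each model."""
    return sum(weights[name] * estimates[name] for name in estimates)

# Hypothetical past errors and current estimates (millions of head).
weights = inverse_mse_weights({"KFM": [2.0, -2.0], "SGLM": [1.0, -1.0]})
combined = averaged_estimate({"KFM": 70.0, "SGLM": 75.0}, weights)
```

Fixed weights of this kind would not by themselves address the different behavior of the two models during and after a shock; weights that vary by regime would be a further assumption.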
6
Web-Scraping Effects
DISCUSSION
Chris Wikle asked about the need for a text corpus as a training
sample for the algorithm to understand grammar. He noted that the results
of NLP can be sensitive to the training sample used. Wei replied that he
used a Python tool trained on Wikipedia-like text data.
Andrew Lawson asked whether Wei had any U.S. examples of web
scraping for disease. Wei replied that he did not because there is no cur-
rent disease outbreak occurring within the United States. Porcine Epi-
demic Diarrhea virus (PEDv) occurred 6 years ago, and news about it
has disappeared. Nell Sedransk added that web scraping began at NASS
within the past 6 months and is in preliminary form. Most of the sources
that would have carried the news about PEDv have been archived.
Lawson said that his group has a project on ontology based on scrap-
ing abstracts from the National Library of Medicine. One element of
NLP is understanding what is meant. That can be difficult with super-
ficial scraping, he noted, and there can be interpretational issues in web
scraping. For example, there could be very fuzzy statements that say “this
might be an epidemic,” when it is not.
Kamina Johnson (APHIS) reported that APHIS had a similar effort
15 years ago but using what would now be considered archaic or ancient
systems. APHIS developed an algorithm to filter the information that
came in, setting a wide net, with a human analyst to review and catalog
the information. Web scraping is not a perfect science, but a multistep
process, she emphasized. She said that Wei might use the Seneca Valley
virus to see if his approach would pick up on that disease. She also sug-
gested testing the system by searching for disease outbreaks that do not
involve swine, such as virulent Newcastle disease, currently occurring in
California, or low path avian influenza in the fall. These two would test
for detection of diseases with lower levels of reporting. High path influ-
enza gets a lot of attention when discovered.
She also recommended the inclusion of potentially new sources in
web scraping. APHIS uses the reporting from SDGSP and instant email
notifications from ProMED. Additionally, the World Organisation
for Animal Health (OIE) sends out instant notifications about outbreaks.
The OIE and ProMED reports are released in a very distinct structured
format that would be easy to use in a web-scraping tool, she noted. The
OIE identifies diseases that it tracks, so its reports are disease specific,
while ProMED also includes non-OIE-reportable diseases.
Lee Schulz asked about the accuracy of news as a variable when it
is always changing, being updated, and occasionally redacted. He won-
dered whether it could be used to construct a variable accurate enough
for possible input to a model, referring to the discussion of the accuracy
issues related to trade expectations. Wei responded that the project is
still in a preliminary stage, and NASS is exploring what can be done with
the information.
Linda Young expressed doubt that any board number would be
changed based on web scraping, but it might give an early alert to some-
thing happening that would then need to be confirmed to be useful. Dan
Kerestes agreed with Young’s comment. The board is looking for more
information. If the information can be used, perhaps in conjunction with
comments sent in by the regional offices, it might add to the discussion.
Analysis of the project has not yet been carried out.
Schulz asked about the current process for experts to become
informed and whether web scraping might help fill a gap by speeding up
the process. Kerestes replied its main attribute will be as a confirmation
of other information.
Travis Averill (NASS) observed that the estimation process is focused
on the survey and auxiliary data for a reference period, the first of March,
June, September, and December. This process also results in comments
and other input from respondents and regional offices that are difficult to
analyze and use. Web scraping has the potential to make NASS aware of
possible confirmatory information that might help to understand the situ-
ation in the field and the potential impact of events.
Wikle asked about the potential for others to manipulate this type
of information, especially if NASS scrapes blogs and sites where people
might report incorrect information once they know how it is being used.
He asked about a mechanism for detecting false placement of key indi-
cators. He also questioned using web-scraped data as input to a spatial
epidemic model. He cautioned there is a big step between taking the
information and using it as input for a model.
7
Modeling Swine Population Dynamics
When the survey data are available and aggregated, and auxiliary
data are prepared, the model is run. The information (data as well as
model estimates) is considered during the pre-board meeting. The pre-
board determines alternative inputs to the model, and the model is rerun
using those inputs. The result is provided to the Hog Board as one of the
scenarios for consideration. As described in Chapter 4, the Hog Board
determines the initial official NASS hog inventory estimates.
CALIBRATION
Calibration is used to correct for biases in survey data prior to input
into the model. Sartore described the estimation of ratios to adjust for
survey undercoverage. There is a time series of quarterly ratios defined
as the ratio of the official U.S.-level board estimate for variable k at time
t to the U.S.-level survey estimate for variable k at time t. One objective
is to model and predict these ratios. The actual procedure is more complex,
because a different ratio can be computed for each potential revision,
from the initial published estimate (1) to the final revision (5). NASS applies
a neural network model that uses hidden layers to capture the optimal
adjustment for a quarter. The monthly adjustment ratios are set equal to
the relevant quarterly ratio.
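The ratio definition and the monthly assignment described above can be sketched as follows. The neural network that predicts the ratios is not shown. The calendar-quarter mapping of months, the label formats, and the function names are assumptions for illustration; the hog survey quarters may be defined differently.

```python
def quarterly_ratios(board, survey):
    """Ratio of the board estimate to the survey estimate, by quarter label."""
    return {quarter: board[quarter] / survey[quarter] for quarter in board}

def month_to_quarter(month_label):
    """Map 'YYYY-MM' to 'YYYYQn' using calendar quarters (an assumption)."""
    year, month = month_label.split("-")
    return f"{year}Q{(int(month) - 1) // 3 + 1}"

def calibrate_monthly(monthly_survey, ratios):
    """Apply each month's quarterly ratio to the monthly survey value."""
    return {month: value * ratios[month_to_quarter(month)]
            for month, value in monthly_survey.items()}

# Hypothetical values for one variable k.
ratios = quarterly_ratios({"2019Q1": 105.0}, {"2019Q1": 100.0})
calibrated = calibrate_monthly({"2019-02": 10.0}, ratios)
```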
This process results in estimated national totals. The state-level
recommendations must also be adjusted to add to national totals. A
Lagrange multiplier approach is used to produce an estimated ratio to
apply to each state recommendation. The calibrated monthly and quar-
terly historical and current data are used as input to the model.
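As an illustration of how a Lagrange multiplier can yield a ratio for the state recommendations, suppose the adjusted values y_i are chosen to minimize the sum of (y_i - x_i)^2 / x_i subject to the y_i adding to the national total T, where x_i is the recommendation for state i. Setting the derivative of the Lagrangian to zero gives y_i = x_i (1 + lambda / 2), and the constraint then fixes 1 + lambda / 2 = T / sum(x_i). The distance function here is an assumption; the presentation did not specify the objective NASS uses.

```python
def lagrange_adjust(state_recs, national_total):
    """Minimize sum((y - x)**2 / x) subject to sum(y) == national_total.

    Returns the common adjustment ratio and the adjusted state values.
    The chi-square-type distance is an illustrative assumption.
    """
    total = sum(state_recs.values())
    lam = 2.0 * (national_total / total - 1.0)  # multiplier implied by the constraint
    ratio = 1.0 + lam / 2.0
    return ratio, {state: x * ratio for state, x in state_recs.items()}

# Hypothetical state recommendations that sum to 100 against a total of 110.
ratio, adjusted = lagrange_adjust({"IA": 60.0, "NC": 30.0, "MN": 10.0}, 110.0)
```

Under this particular distance every state receives the same ratio. A different distance, or state-specific weights, would produce ratios that vary by state.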
The new model has two distinct parts: hog production, tracking
breeding through birth and weaning (a monthly model); and hogs to
market, reflecting the growth of hogs through weight groups to slaugh-
ter (a quarterly model). According to Sartore, each litter results in about
10 weaned piglets. This number changes with time. The pig crop is made
up of the piglets born during the month (quarter) that are still alive on
the first day of the quarterly survey reference period. The pig crop in the
hog production cycle enters the hogs to market cycle in the two smallest
weight groups. Hogs grow through the weight groups until they are large
enough to go to slaughter.
There are monthly models for the pig crop and sows farrowed (the
only variables with monthly data). The log of each of those two variables
parameters to be estimated are the survival rates and the transition param-
eters between weight groups.
Sartore showed a graph illustrating estimated survival rates over time.
The estimated survival rates show some seasonality and a decrease during
the years of the epidemic. These survival rates are fairly stable, tending to
be around 0.95 before the epidemic and dipping during the epidemic to about
0.88. Since the epidemic, survival rates have returned to about 0.92. The
quarterly estimates of survival rate and weight group transitions are esti-
mated by minimizing the sum of squared errors subject to two constraints:
The survival rates are constrained to be close to 1 and to form a smooth
function over time. Estimation uses the Broyden-Fletcher-Goldfarb-
Shanno iterative algorithm with initial survival rates set to 1, transition
rates for the first two weight categories set to 0.25, and for the last two
weight categories set to 0.75.
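A simplified version of the estimation criterion described above is sketched below. It treats the two constraints as quadratic penalties, omits the weight-group transition parameters, and uses a one-step relationship (next inventory equals survival rate times current inventory). All three choices, along with the penalty weights and names, are assumptions for illustration rather than the specification Sartore presented.

```python
def penalized_sse(survival, observed_now, observed_next, near_one=1.0, smooth=1.0):
    """Sum of squared errors plus two penalties on the survival rates.

    survival[t] is the hypothetical survival rate applied in period t.
    The penalties pull the rates toward 1 and toward a smooth path over time.
    """
    sse = sum((nxt - s * now) ** 2
              for s, now, nxt in zip(survival, observed_now, observed_next))
    closeness = sum((1.0 - s) ** 2 for s in survival)
    roughness = sum((b - a) ** 2 for a, b in zip(survival, survival[1:]))
    return sse + near_one * closeness + smooth * roughness

# Starting values of 1 for every survival rate, as in the text.
start = [1.0, 1.0, 1.0]
value_at_start = penalized_sse(start, [10.0, 20.0, 30.0], [10.0, 20.0, 30.0])
```

An objective of this form could be handed to a quasi-Newton routine, for example SciPy's `scipy.optimize.minimize` with `method="BFGS"`, starting from the initial values given above.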
Sartore said that the model is estimated in stages. The first stage
involves combining the historic data, the latest survey indications, and
the state recommendations. The second stage involves updating the infor-
mation from stage 1 with estimates from the pre-board to account for
additional information from experts.
Sartore compared the new model and the Kalman filter model (KFM).
In particular, for all nine published estimates, the root mean square errors
(RMSEs) and mean percent errors (MPEs) were produced between model
estimates and both the initial and the final board estimates. The MPE
provides a measure of the relative bias while the RMSE provides a mea-
sure of variation due to both variance and bias.
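The two comparison measures can be computed as in the following sketch. The sign convention (model estimate minus board estimate, relative to the board estimate) and the function names are assumptions; the presentation did not state the exact formulas.

```python
import math

def rmse(model, board):
    """Root mean square error of model estimates against board estimates."""
    return math.sqrt(sum((m - b) ** 2 for m, b in zip(model, board)) / len(board))

def mean_percent_error(model, board):
    """Average of (model - board) / board, expressed in percent."""
    return 100.0 * sum((m - b) / b for m, b in zip(model, board)) / len(board)

# Hypothetical series: errors of +2 and -2 cancel in the MPE but not in the RMSE.
offsetting = ([102.0, 98.0], [100.0, 100.0])
one_sided = ([99.0, 99.0], [100.0, 100.0])
```

In the first hypothetical series the MPE is zero while the RMSE is 2, which shows how the RMSE picks up variation even when over- and underestimates offset each other. In the second series the two measures agree in magnitude because the error is pure bias.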
For both models, MPEs were similar (same direction, similar magni-
tude) for both initial and final board estimates. The MPEs were fairly
small, less than 1 percent except for breeding herd inventory, for which
the MPE was slightly greater in magnitude than negative 2 percent
for the new model, and about negative 1 percent for the KFM. RMSEs
from the initial estimates tended to be somewhat smaller for the new
model than for the KFM, except for pig crop, sows farrowed, and breed-
ing herd. RMSEs from final estimates were somewhat smaller than those
of the KFM for all variables except breeding herd.
Sartore showed graphs illustrating the time series of initial board
estimates, the final board estimates, and the two model-based estimates
for pig crop and for total hogs. Figure 7-1 shows the plot for total hog
DISCUSSION
Matthew Branan (U.S. Department of Agriculture) observed that both
models seem much closer to initial estimates than to final estimates. He
asked whether Sartore has discussed with board members how they are
1 The coverage probability is the proportion of the time that an interval contains the true value
of interest.
helpful. Young replied the agency would need to check on the feasibility
of this approach. The field offices collect the data, prepare estimates, and
send them to headquarters.
Plain asked a final question about the comparison of RMSEs between
the two models and initial estimates for all published variables. He
observed that for the pig crop, the old model seemed to perform better;
however, for market hogs under 50 lbs., the new model seemed to perform
better. This is interesting because there is a very high overlap between the
hogs in those two categories, he observed. He questioned why one model
would predict better than the other. Sartore responded he can improve the
new model for pig crop and sows farrowed.
8
Discussion of Detection and Monitoring
damage caused by feral swine. Feral swine can damage crops with their
wallowing, tree rubbing, and root-finding activities, and they can also be
reservoirs for diseases that can spread into domestic swine or cattle popu-
lations. The program has mapping activities to track where feral swine
are and where they are spreading, and also where the clusters of pseu-
dorabies virus and swine brucellosis are among feral swine populations.
The program also educates producers on how to prevent the spread of
disease from feral animals to their farms and domestic populations.
Third, the National Animal Health Monitoring System (NAHMS)
conducts national surveys every 5 to 10 years to collect information
from operators about animal health and management factors. Surveys
are cooperative efforts between APHIS and NASS. These are typically
two-phase national-level studies for the swine population. NASS admin-
isters a questionnaire during the first phase; the second phase includes a
questionnaire, administered by APHIS animal health experts, and bio-
logical sampling of swine feces and blood to test for specific agents or
diseases. APHIS uses NASS data on inventories and cost, and asks about
marketing, production practices, movement of animals, and biosecurity.
A sample question may be, “How do you keep your pigs sequestered from
wildlife?” APHIS collects biological samples and in the past has tested
for a variety of diseases including Porcine Reproductive and Respiratory
Syndrome (PRRS), trichinellosis, salmonella, E. coli, and enterococcus.
These provide estimates of prevalence and presence of diseases at the
national level.
For each of these programs and systems, regulatory authorities give
APHIS the ability to collect or act on data. The Animal Health Protec-
tion Act provides general authority for APHIS to control, prevent, and
monitor various diseases. The Swine Health Protection Act is focused
on garbage feeder operations. Other authorities such as quarantine laws
allow an operation to be quarantined if certain diseases are found.
In the next part of the presentation, Johnson noted the following
diseases have specific active or passive surveillance in the United
States: classical swine fever, African swine fever, foot-and-mouth dis-
ease, pseudorabies, swine brucellosis, and influenza A in swine. Most of
the surveillance programs have objectives related to rapid detection
of disease occurrences in the United States and/or a demonstration of
freedom from disease to support trade. Seneca Valley virus (SVV) is a
PANEL DISCUSSION
Schulz asked whether APHIS and NASS could collaborate and
share information in real time, such as the results from web scraping.
Johnson replied APHIS would be interested in how to format FSIS infor-
mation to make it most useful to NASS. She added that APHIS did more
web scraping in the past. She agreed with the importance of collaboration
and noted APHIS already has a close working relationship with NASS.
Linda Young added efforts are being made to share information across
U.S. Department of Agriculture agencies. She said she planned to explore
potential collaborations.
Slud noted that APHIS has streams of disease monitoring, each of
which might be in the form of occasional presence/absence, but many
do not have a regular format for reporting. For input into a model,
the information would need to be recoded into a general indicator. It
might not report all streams all the time, but might indicate when some-
thing crosses a threshold. It does not necessarily need to be directly pre-
dictive, he said, but to be correlated with important variables and to have
a spatial quality.
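One way to picture the recoding Slud described is a function that takes whatever monitoring streams happen to report in a period and returns a single indicator once enough of them signal presence. The stream names, the counting rule, and the threshold below are hypothetical and serve only to illustrate the idea.

```python
def shock_indicator(streams, threshold=2):
    """Collapse irregular presence/absence streams into one 0/1 indicator.

    streams maps a stream name to 1 (presence), 0 (absence), or None (no report).
    The indicator is 1 when at least `threshold` reporting streams show presence.
    """
    reported = [value for value in streams.values() if value is not None]
    return 1 if sum(reported) >= threshold else 0

# Hypothetical reports for one state and quarter.
flag = shock_indicator({"lab_accessions": 1, "slaughter_codes": None, "field_reports": 1})
```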
Johnson noted that APHIS has a risk identification group that looks
at international outbreak situations to try to assess threat or risk levels
for incursion into the United States. It also tracks and monitors situations
happening within the country. It sends reports to the secretary of agricul-
ture’s office. NASS may be another potential USDA customer for their
products and reports. Young noted that often NASS can more easily share
information with other USDA agencies than outside the department.
1 Lawson provided these references: Corberan-Vallet (2012); Lawson, Onicescu, and Ellerbe
(2011); Corberan-Vallet and Lawson (2014); and Held, Hofmann, and Hohle (2006).
happen during the next time period, and that prediction is compared to the
observed value at that time period. Accessions might provide input to a
model, he said.
Ensor asked whether APHIS has modeled the patterns of outbreaks
for swine. If there are typical patterns, she queried whether there is sta-
tistical information in the patterns themselves. Johnson replied that
epidemiologists have explained that any pattern depends on where the
outbreak starts, both in terms of geography and the type of operation, as
well as direct and indirect contact points, how hogs are moving, and how
feed trucks are moving on and off operations. An important consideration
is compartmentalization, for example, how feed truck networks move
through the system. An outbreak in a commercial operation raising pig-
lets in North Carolina will have a very different outbreak response from a
backyard operation in Colorado. Patterns depend roughly on location and
type of operation.
Lawson discussed foot-and-mouth disease in the United Kingdom, in
which a very large outbreak occurred in one area. A vehicle ban was put
in place to stop the spread. It completely knocked down the original epi-
demic but then the disease jumped to other places. Predicting the jumps
is extremely difficult because they are purely random, as occurs with other
diseases such as HIV. That said, some patterns are fairly predictable.
There is typically a shock with a long tail. He noted that Bayesian models
have been used to predict the ends of epidemics, as well as the starts. Pre-
dicting the end could be very important from a veterinary point of view.
Schweinberger provided a related idea. Instead of spatial structure,
he asked about network structure. He noted two questions to answer: (1)
Is there a shock? (Schweinberger said that Lawson had described some
Bayesian approaches for answering that question); and (2) If there is a
shock, how large will its impact be? For example, a shock in the form
of an epidemic could in principle wipe out the entire U.S. population of
hogs. However, this possibility is very unlikely because infectious dis-
eases are transmitted by contact and the network of contacts among hogs
constrains the spread of infectious diseases. For example, an outbreak of
an infectious disease on a farm in Colorado could spread directly to all
farms that purchased hogs from the affected farm, but it could not spread
directly to farms that did not purchase hogs from the affected farm. The
structure of the contact network places hard constraints on how infec-
tious diseases can spread, which is relevant for determining how large a
shock in the form of an epidemic can be and how much it can reduce the
U.S. population of hogs.
This structure provides a way of assessing the impact of an epidemic,
Schweinberger suggested. A survey with two questions could be used to
collect additional data: (1) Is there a hog-related disease on your prop-
erty? and (2) Did you sell hogs, and if so, to whom? Answers to these
questions could help to assess the extent of a problem, where disease
might have traveled, and which farms to monitor. Understanding the
network structure might help in assessing how large the impact of an epi-
demic might be.
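The way a contact network bounds the possible spread can be sketched with a breadth-first search over reported sales. The farm labels and the data structure are hypothetical, and treating each sale as a one-directional route of transmission is a simplifying assumption.

```python
from collections import deque

def farms_at_risk(sales, source):
    """Farms reachable from `source` by following seller-to-buyer links.

    sales maps each seller to the list of farms that bought hogs from it,
    as the second survey question might reveal.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        farm = queue.popleft()
        for buyer in sales.get(farm, []):
            if buyer not in seen:
                seen.add(buyer)
                queue.append(buyer)
    return seen - {source}

# Hypothetical network: E and F have no link to A, so they are not at risk from A.
sales = {"A": ["B", "C"], "B": ["D"], "E": ["F"]}
exposed = farms_at_risk(sales, "A")
```

Direct buyers (B and C here) are reached in one step, and further farms such as D are reached only through them. The size of the reachable set gives an upper bound on the farms that the shock could affect under these assumptions.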
Johnson noted epidemiologists in her group look at different scenarios
with different start points. Simulation models help predict the proportion
of the population affected through space and time. She said APHIS also
manages the Emergency Management Response System (EMRS). The
EMRS maintains outbreak response information, including farmers' answers to
questions about where they source and sell hogs. They commonly
do tracebacks with cattle, which tend to be sold in smaller lots than hogs.
That information is used during an outbreak to predict spread. They also
compare different response strategies to curb disease spread. Strategies
include enforcement of boundary stop points for quarantine and deploy-
ment of vaccines.
In answer to a question from Schweinberger about the epidemiologi-
cal models used, Johnson replied that the main model for swine diseases
is InterSpread PLUS, developed in New Zealand. APHIS has adapted it
for the United States and uses it to prepare national and regional analyses.
APHIS started with classical swine fever and has recently adapted the
parameters for African swine fever, because those diseases are similar in many
ways. APHIS has also been developing an animal disease spread model,
but it is not yet operational.
Schweinberger asked about use of a susceptible-infectious-recovered
model. This model is applied to determine how infections spread in a
population such as a herd. These kinds of models make the implicit
assumption that the disease could travel from each unit to every other
unit in the population. More metric-based approaches are also available,
and they are more realistic in that they acknowledge a disease can only
spread from an infectious unit to other units that are in contact with that
infectious unit.
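The implicit assumption mentioned above is visible in a basic discrete-time version of the susceptible-infectious-recovered model. In the sketch below, the number of new infections depends on the product of susceptible and infectious counts, so every susceptible unit is treated as equally exposed to every infectious unit. The parameter values and names are hypothetical, and this is not the model APHIS uses.

```python
def sir_step(s, i, r, beta, gamma):
    """Advance a homogeneous-mixing SIR model by one time step.

    beta is the transmission rate and gamma the recovery rate per step;
    both are assumed to lie between 0 and 1.
    """
    n = s + i + r
    new_infections = beta * s * i / n   # every unit can infect every other unit
    new_recoveries = gamma * i
    return s - new_infections, i + new_infections - new_recoveries, r + new_recoveries

def simulate_sir(s, i, r, beta, gamma, steps):
    """Return the list of (s, i, r) states from the start through `steps` steps."""
    path = [(s, i, r)]
    for _ in range(steps):
        s, i, r = sir_step(s, i, r, beta, gamma)
        path.append((s, i, r))
    return path

# Hypothetical herd of 1,000 units with 10 initially infectious.
path = simulate_sir(990.0, 10.0, 0.0, beta=0.5, gamma=0.1, steps=30)
```

A contact-based alternative would replace the product term with a sum over the units actually in contact with each infectious unit.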
Johnson said they use herd-level models; if the herd tests positive,
not all animals on the site will be infected at that point in time. Predicting
the spread from one herd to another herd is the aim, looking at spread
across multiple sites as opposed to within the herd or flock. The model has
a spatial component.
Ron Plain noted the focus on big shocks and new diseases, but he
asked about data to estimate death loss for chronic diseases, such as
PRRS or swine dysentery, to determine whether levels are high, low, or
typical. Even though they may not be big shocks, it would be interesting
to see whether the deviation is enough to impact pork production, he said.
Branan said FSIS condemnation rates capture part of that information.
Codes indicate why particular animals are condemned at slaughter and
could track deviation from baseline.
Johnson added that for many production diseases, the prevalence esti-
mates are done through NAHMS studies. The studies provide an update
every 5 years to see how the prevalence level is changing. She asked
about industry tracking, perhaps by the Swine Health Information Center.
Schulz said a private/public partnership at the University of Minnesota,
called the Swine Health Monitoring Project (SHMP), started with PEDv
and now monitors PEDv and PRRS. Johnson observed SHMP provides
more frequent reporting of disease than the NAHMS study that comes
out every 5 years.
Yijun Wei asked whether the simulation study mentioned by Johnson
was an epidemiological or another type of model. Johnson said that she
and Branan are not experts in the epidemiological model, but they would
be happy to work with NASS after the workshop and bring in experts
with more in-depth information.
9
Discussion of Modeling
MODELING CHALLENGES
Ensor acknowledged the difficulty of fully understanding the challenges
of the project and how all the pieces fit together. She pointed to promising
results from the state-space modeling perspective. State-space models
allow bringing in prior information and are frequently computationally
manageable. She supported Andrew Lawson’s suggestion (see Chapter
8) about developing some type of switching between endemic and epi-
demic time periods. This approach allows the dynamics of the model to
change, for example by using hidden Markov models. She also observed
that collaboration with the Animal and Plant Health Inspection Service
(APHIS) might help the National Agricultural Statistics Service (NASS)
develop an epidemic model to use with a switching approach.
She added she strongly favors a more model-based approach because
of apparent data limitations. She asked how many time points are used
for fitting the models that have been described. She reported that the
more empirical-based approaches using LASSO and seasonal time series
models will be very unstable unless a long time series is used in estima-
tion. Understanding and measuring uncertainty are also very important.
Wikle said he appreciated the complexity of the problem and praised
the modeling efforts described. He said the problem reminded him of
new work in ecological modeling called integrated population models
(IPMs), which take many different data sources in a population-based
model that accommodates biological dynamics (referred to as constraints
by NASS) in a spatial setting. IPMs seem to provide an ideal framework
for NASS, he suggested. They are often fully Bayesian, and there are
frequentist versions.
Wikle also said NASS would benefit by incorporating spatial con-
siderations into its modeling (see Chapter 10 for further elaboration). He
expressed interest in modeling in both space and time and remarked that
combining them requires more thought than back-engineering a time
series model with spatial items. He observed that dynamics occur on
scales that are not currently incorporated in NASS modeling. NASS is
modeling biological constraints in the data, which is important, but the
real biological dynamics occur on an individual and maybe even a herd
scale. He encouraged thinking about the dynamics and where they occur.
He added that while most users care about point estimates, a coherent
decision-making process requires that the model produce a point estimate
that has quantifiable uncertainty. That means that an estimate is needed
of the reliability of the point estimate. Reliability may need to be esti-
mated through simulation or microsimulation. This is especially impor-
tant if parameters are re-estimated at every time period, he stressed.
Ensor agreed, noting a concern that parameter uncertainty estimates
have not been examined. Such estimates provide information about how
well the model is working. In answer to her question about how many data
points are used to fit models, Luca Sartore replied the SWARCS model
uses the entire time series of monthly and quarterly data from 2008 to
the current quarter to prepare model estimates for the current quarter.
His approach has so many parameters to estimate that the number of
data points is not sufficient to get stable estimates. He used the LASSO
approach with penalties to make the process stable. Ensor replied that
using the whole history eliminates the opportunity to capture dynamics.
AUXILIARY INFORMATION
Slud asked about auxiliary information available at the state level,
noting pork check-off data as one source. Ron Plain explained the manda-
tory check-off program. When animals go to slaughter, the packers make
an assessment of 0.4 percent of market value. That money goes to the
National Pork Board, part of which is allocated to the state where
the animal was raised. The Pork Board publishes those data monthly. The
data show how the dollars are allocated and how many hogs come from
each state. They look like very good data when compared to the national-
level slaughter data, he said. However, one issue is that many pigs change
states: For example, they are born in North Carolina and are moved to
another state to be fed and raised, but ownership does not change. If there
is no ownership change, then no check-off is assessed.
Ensor asked whether the problems with the check-off data are known
well enough so that they might be useful in modeling. Slud asked about
other data that might illuminate the shortcomings—for example, survey
data on state transfers of hogs for raising or state-to-state transfer infor-
mation. Plain responded that in the past, state veterinarians’ offices had
information on the shipment of hogs for nonslaughter purposes. The
data were based on health certificates needed for interstate hog transfer
10
Discussion of State-Level Estimation
DATA AVAILABILITY
Corral summarized the data available to support modeling at the state
level. First, there are 30 states for which survey estimates for all inventory
items are available for all quarters beginning in 2008. For the remaining
20 states, survey estimates are available annually in December. In addi-
tion, state recommendations are developed with state-level survey data
adjusted by regional offices. For these data sources, variables available
include sows farrowed and pig crop (available monthly) and breeding
herd, inventory by weight group, first and second intentions, death loss,
and total hogs.
Corral referred to the earlier discussion about monthly check-off data
(see Chapter 9). U.S. pork producers and importers pay $0.40 per $100.00
of value when pigs are sold and when pigs or pork products are brought
into the United States. The National Pork Board uses the funds for spe-
cific program areas such as promotion, research, and education.
NASS has annual national import and export data. Some data are
available annually on state-level in-shipments and out-shipments from
the state veterinarian offices. Slaughter data are available monthly at the
national level. Data on number of hogs slaughtered in a state are available
monthly based on slaughterhouse location, but those data do not reflect
where the hog was raised so are not comparable to NASS hog inventory
data. They are also available by slaughterhouse location. Another poten-
tial source of state-level information is the web-scraping data described
in Chapter 6.
Fay-Herriot Model
Datta, in a joint presentation with Slud, started by introducing the
Fay-Herriot model, a popular model for producing small area estimates.
In the case discussed at this workshop, the goal is to estimate ϴi, the hog
production for state i, for i = 1 to m. A state-level estimate, Yi, is prepared
from the sample and serves as a direct estimate of ϴi. However, direct
estimates are often not reliable, especially for small states. To develop a
reliable estimate for states, Fay and Herriot (1979) proposed an approach
to borrow strength from other data sources. It uses two models. The sam-
pling model describes the sample estimate as providing an unbiased esti-
mate for state production, ϴi, with an additive noise term. The noise term
has zero mean and is normally distributed, with variance equal to the
sampling variance. There is no correlation between estimates for differ-
ent states. The second model is called the linking model. It connects ϴi to
covariates Xi via a linear regression relationship with the covariates. This
model also has an additive independent identically distributed noise term.
The linking model will be best with good covariates, he said. Disease
indicators might serve as useful covariates for the topics discussed in this
workshop. Other covariates may include parts of the growth models, such
as the relationship between pigs born and sows farrowed. Once parameters
are estimated, the approach results in two estimates for ϴi, with one
from the survey and one from the regression. The approach yields a third
estimate for ϴi called the shrinkage estimator. The extent of shrinkage
depends on the variance of the sampling error equation and the variance
of the linking equation. If the variance of the survey estimate is small, as
it might be for the largest hog-producing states, the shrinkage estimate
will be close to the direct estimate. If the variance of the survey esti-
mate is large, as it might be for smaller hog-producing states, the shrink-
age estimate will be very close to the prediction of the regression linkage
model.Fay-Herriot models can be treated and estimated either in a fre-
quentist or Bayesian mode.
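The shrinkage behavior described above can be illustrated with a minimal Python sketch. It is not the method used by NASS or presented at the workshop. It assumes the linking-model variance (called A here) is already known, whereas in practice it would be estimated from the data or given a prior. The function name, argument names, and all numbers are hypothetical.

```python
import numpy as np

def fay_herriot_shrinkage(y, X, D, A):
    """Shrinkage estimates for a Fay-Herriot model with known variances.

    y : direct survey estimates, one per state, shape (m,)
    X : covariate matrix (include a column of ones for an intercept), shape (m, p)
    D : sampling variances of the direct estimates, shape (m,)
    A : variance of the linking-model noise (assumed known in this sketch)
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    D = np.asarray(D, dtype=float)

    # Weighted least squares for the regression coefficients,
    # weighting each state by 1 / (A + D_i).
    w = 1.0 / (A + D)
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))

    # Regression (synthetic) prediction for each state.
    synthetic = X @ beta

    # Weight on the direct estimate: near 1 when D_i is small,
    # near 0 when D_i is large relative to A.
    gamma = A / (A + D)

    shrinkage = gamma * y + (1.0 - gamma) * synthetic
    return shrinkage, gamma, synthetic

# Hypothetical inputs: three states with increasing sampling variance.
y = [100.0, 52.0, 9.0]
X = [[1.0, 10.0], [1.0, 5.0], [1.0, 1.0]]
D = [1.0, 4.0, 25.0]
estimates, weights, predictions = fay_herriot_shrinkage(y, X, D, A=4.0)
print(weights)    # weight placed on each direct estimate
print(estimates)  # each value lies between the direct and regression estimates
```

In this sketch the state with the smallest sampling variance keeps most of its direct estimate, while the state with the largest sampling variance is pulled most strongly toward the regression prediction.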
Datta described benchmarking work done for small area models. He
noted a Bayesian solution to the benchmarking problem in some of his
own work (Datta et al., 2011). With this approach, the small area model is
used to get state-level estimates, which are summed and should agree with
the national-level estimate. If they do not, then a benchmarking approach
is used to change the estimate from the model slightly so that the new esti-
mate satisfies the benchmark.
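The idea of adjusting state-level model estimates so that they sum to the national estimate can be sketched with a simple ratio (pro rata) adjustment. This is a deliberately simpler technique than the Bayesian benchmarking of Datta et al. (2011) and is shown only to make the constraint concrete. The function name and the numbers are hypothetical.

```python
import numpy as np

def ratio_benchmark(state_estimates, national_total):
    """Scale state estimates by a common factor so they sum to the national total.

    This is a simple ratio adjustment, not the Bayesian benchmarking
    approach cited in the text.
    """
    est = np.asarray(state_estimates, dtype=float)
    total = est.sum()
    if total == 0:
        raise ValueError("state estimates sum to zero; ratio adjustment is undefined")
    return est * (national_total / total)

# Hypothetical model-based state estimates that sum to 95,
# benchmarked to a hypothetical national estimate of 100.
adjusted = ratio_benchmark([40.0, 35.0, 20.0], national_total=100.0)
print(adjusted, adjusted.sum())
```

A ratio adjustment changes every state by the same proportion. Other benchmarking methods, including the one cited, can let the size of the change depend on how uncertain each state estimate is.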
He noted that another approach uses two-step modeling: a small area
model and then a time series or state-space model. The Fay-Herriot
approach provides a way to combine these estimates. He noted that
Pfeffermann and Tiller (2006) described benchmarking approaches
that help to account for changes due to shocks, such as disease. The
Shares Approach
Slud used Figure 10-1 to introduce a shares model sometimes used in
small area contexts when data are meager. The approach concentrates on
developing national estimates for hog inventory classes, and it models the
Ron Plain asked about data available at the state level. He asked
whether large producers are reporting in the smaller 20 states (those states
for which the sampled operators are only required to report in Decem-
ber). In other words, he asked, does I.A. Smithfield, a very large producer,
have hogs in 1 of the 20 states that are not surveyed quarterly? If so, that
information might be useful for modeling. Dan Kerestes said there are
not enough instances of this occurring to be useful, noting that the states
surveyed annually contribute less than 0.5 percent of the national pig crop.
Plain referred to Sartore’s discussion about estimating pig sur-
vival rates and death loss by state. He asked whether death loss is more
correlated to size of operation than to the location of an operation. Death
loss for a smaller operation might be more similar to other small opera-
tions elsewhere than to larger operations in the same state. Nell Sedransk
said pig crop per sow farrowed varies more by operation size than it does
by state. It also varies a bit seasonally. In the northern tier of states with
harsh winters, there is more seasonality than in the southern tier states.
There are differences that are regional. She said that she has seen the rela-
tionship of death loss to size of operation.
Plain asked whether NASS has found a way to account for the effect of
situations such as a severe winter in the Lake States on litter size. Kerestes
said that question was looked at about 5 years ago as part of an imputation
study. The study found many hogs raised indoors, so weather did not
have a big impact. The study indicated that records from operators from
across the United States could be used in imputation without much difference
in the quality of the imputation. Size of operation, however, does matter:
death loss is different for large operators, which have more economies of scale.
Kerestes discussed shocks such as flooding. He observed, for example,
that most barns are built on higher ground, so during many floods, the
barns are fine. Initial predictions after a hurricane in North Carolina in
the early 2000s were of a huge loss of hogs. In fact, the loss was less than
50,000 head. It is easy to overstate the impact of shocks, he warned, which
is why NASS has to be careful with the data used. Sedransk noted that
even if loss due to flooding is minor at the national level, it might be very
important to a smaller state.
ships. Some of the regional offices have what they call mini-boards that
assemble two or three other people from the regional office to discuss and
review recommendations for all states in that region. Information is then
sent to headquarters where Seth Riggins as head statistician reviews and
evaluates it before further work is completed.
Katherine Ensor observed there seems to be a lot of uncertainty in
that process. Kerestes said that he thinks that after 150 years, the process
itself is pretty certain. Statistical procedures are standardized across the
NASS regional field offices. The computer tool is standardized, although
there is judgment in the process.
Datta said that the state recommendations can be a covariate in the
model. Young replied that NASS uses state recommendations as input to
many of its models.
Plain commented that NASS works very hard with the national esti-
mates, truing them up to slaughter data. However, if he hears a state is
down by 10 percent, he said he is inclined not to believe it because it may
be that the previous year’s number was too high by 10 percent and the
number was corrected. He said from his perspective, without a way to true
a state number for revisions, he has limited confidence in the number. He
asked if NASS has considered how to get state-level data right.
Kerestes agreed an estimate is only as good as the data that go into it.
For example, if a large operator is a nonrespondent one quarter and the
operator’s data must be estimated, there is a possibility that the resulting
estimate may contain greater error. He said that they try to monitor rela-
tionships from year to year when they do revisions. When the Census of
Agriculture becomes available, 5 years of past data are evaluated and pos-
sibly revised.
Kerestes remarked that when a change to state-level data is due to a
change in a large producer’s practices, NASS cannot explain the changes
to the public because of the confidentiality pledge given to the operator.
He added that NASS listens when industry says that there is something
wrong with their numbers. The agency evaluates the data and may even
follow up with key producers to verify their data. In his view, the national
number is the most important because it is used to establish trade param-
eters. Individual state numbers are very important to the people in those
states. What NASS tries to do, however, is to capture shifts in the hog
industry at the national level.
11
Discussion of Visions for the Future
Hog Board does now. The board achieves the final estimate through its
revision process better than any current model does.
Schulz asked whether this process would be internal or a competi-
tive bid process where consultants or contractors could be tasked to build
models for a competition. Young replied a competitive process might be
possible, but the data used by any such model are not publicly available.
A public competition would require that NASS simulate the data in some
way. This is difficult, Young said, because the industry is so concentrated
in some states that high-quality simulated data might reveal confidential
information. She welcomed suggestions for solving that problem. She
concluded that the increasing concentration is making large producers
more vulnerable to disclosure, which NASS cannot risk.
Returning to the question of how NASS would adopt a new model,
Young said that the Kalman filter model (KFM) is the best they have now
and is used to provide model-based results to the board. If NASS finds a
better model, it would also be used to provide results to the board. There
would not be an automatic switch-over, she said. Results would be tracked
until the board is comfortable that the new model provides an improve-
ment. It is an evolving process.
Luca Sartore asked about the Pfeffermann example described by
Gauri Datta that used groupings of states. If the model accounts for sev-
eral groups of states, he asked, is an individual state in only one group or
can it be in more than one group? Datta replied the states were grouped
into nine geographic regions, with each state in one region. The exam-
ple used a state-space time series approach to borrow strength, and the
authors introduced benchmarking to make regional predictions to add to
the total. Sartore wondered whether the model could allow each state to
appear in more than one group. Datta said he was not sure about the mod-
eling part, but the benchmarking process would not work. Pfeffermann
benchmarked the regional estimates, not state estimates. Eric Slud noted
that the Pfeffermann model also did not consider multivariate outcomes.
Slud further observed that if NASS feels confident that certain
models are doing well during different periods, such as during equilib-
rium, going into shocks, or coming out of shocks, it would be possible
to consider a composite estimate with weights that change based on
the current situation. He observed NASS seems to have used the same
40 data points many times in reaching conclusions. Young said NASS uses
the information available and has not thought much about a composite
estimator. It has talked about switching and the possibility of developing
an indicator that it is time to switch models. She asked about possible
fresh approaches.
Nell Sedransk noted that using the 40 quarters of data at the state level
might help in defining these transitions because the dynamics and timing
were different in different states. For example, developing a model for
coming out of shocks will be best done using state-level data that exhibit
this transition. Also, based on different producer populations, individual
states each come out of the shock with a somewhat different trajectory.
Chris Wikle reiterated his suggestion that NASS consider integrated
population models that are being used in the ecological literature. They
consider multiple types of data, including survey or sampling-based data
with state-space models. The models allow for the possibility that the
biological dynamics are changing. Ecological modeling frequently considers
changes in habitat, but the changes could instead be over time. The question is whether NASS
has enough data to inform the model. One advantage of this modeling
approach, he said, is that it results in uncertainty bounds for estimates.
Young asked whether software has been developed for these models.
Wikle said that software exists, but it may not be appropriate for the
NASS application. He offered to follow up after the workshop.1
1Wikle, in collaboration with Mitch Weegman (MUSE School of Natural Resources), provided
the following discussion and references: (1) The best general introductory overview to integrated
population models (IPMs) is Zipkin and Saunders (2018). There is also some introductory and applications
material, see https://academic.oup.com/aosjournals/pages/integrated_population_models. (2)
Chandler and Clark (2014) provides examples of emerging spatially explicit IPMs. At present, these
are seriously limited in space because of the amount of information required for estimation. (3) Most
applications include time-dependent parameters. The link in (1) above has examples. (4) In terms of
software, there is no R package for these models yet. They are typically run in JAGS
(http://mcmc-jags.sourceforge.net/) or STAN (https://mc-stan.org/). There are cookie-cutter likelihoods to match
particular datasets and research questions (e.g., state-space Cormack-Jolly-Seber (CJS) likelihood for
individual capture histories, but multinomial likelihood for summarized [m-array] capture histories or
multistate models). There is a lot of code online bolting various likelihoods together.
Dan Kerestes noted that the largest challenge in modeling shocks is
that every shock, and every disease, is different. If it is a new disease, it
is unclear how soon a cure will be identified and what impact the cure
will have. If the modelers use Porcine Epidemic Diarrhea virus (PEDv)
as the example to follow, the next disease may not have the same impact.
When PEDv came about and a vaccine was developed, its effect on the
sows being farrowed and the litter rates was dramatic. A new disease
may be different.
Ensor wondered whether NASS is asking too much for the dynamics
of an epidemic to be picked up by a model. Perhaps the decision that there
is a shock might be better guided by expert opinion. The current view is
that people-derived decisions are too subjective, but maybe there is not
enough information to capture all possible dynamics, and expert opinion
may be useful. Lawson said, for example, training the model on PEDv
would likely make it too specific. It should be as general as possible so that
the dynamics for a new disease can be learned. While it is always difficult
to predict something new, there are things to try, such as using a general
descriptive model to capture dynamics and test on different datasets.
Kamina Johnson noted that PEDv was an emerging disease, whereas
APHIS has established models for foreign animal diseases. Those models
have many parameters that are known, perhaps not completely but with much
less uncertainty. In the emerging disease area, little is known. It may be
even more important to capture uncertainty in these situations—both in
model-based estimates and expert judgment. Experts in her office dis-
cuss alternative scenarios about what the specific disease might be and
its potential impact. They provide a range of possibilities for situations
when there is no scientifically justifiable process to quantify a parameter.
It may be that NASS would also benefit from a mix of data-based esti-
mates and expert judgment, each accounting for uncertainty.
Andrew Lawson said that the foot-and-mouth disease outbreak in
the United Kingdom was highly publicized and modeled in real time
by people at Imperial College. He referred to a paper by Lawson and
colleagues (2011), which came out a few years later and compared the
modeling efforts during that outbreak. The predictive capability of the
models was incredibly low despite all the information and news, which may
be a warning about the capabilities of modeling.
Lawson also suggested thinking about modeling at different levels
and combining them in joint models. This approach may not be able to
predict very accurately at a fine scale but could predict quite well at a
coarser level. Doing the modeling together might help in making sensi-
ble predictions. With that in mind, estimating national- and state-level
models jointly might be a good idea because it can borrow strength from
the levels, he said.
makes sense because the epidemic does not happen everywhere at the
same time. She suggested the need for a dynamic piece instead of jump-
ing to an entirely new universal model that may only apply when all states
are affected, or may never apply if states are in different epidemic phases
or transitions at each point in time. Slud commented that a composite esti-
mator is not the model class that he most favors, but he does like the idea
of models that apply at different points in time (e.g., equilibrium, start of
disease, recovery). A composite based on them might be useful, he added,
and it might be interesting to examine the residuals at the state level to see
whether they suggest model deficiencies.
Ensor concluded by acknowledging how impressive the NASS
presentations and modeling work have been. She noted the job of the
committee and discussants was to come up with ideas for other approaches.
References
Busselberg, S. (2013). The use of signal filtering for hog inventory estima-
tion. Proceedings of the Federal Committee on Statistical Method-
ology (FCSM) Research Conference. Available: https://nces.ed.gov/
FCSM/pdf/G2_Busselberg_2013FCSM_AC.pdf.
Chandler, R.B., and Clark, J.D. (2014). Spatially explicit integrated popu-
lation models. Methods in Ecology and Evolution, 5, 1351–1360.
Corberan-Vallet, A. (2012). Prospective surveillance of multivariate spa-
tial disease data. Statistical Methods in Medical Research, 21(5),
457–477. doi: 10.1177/0962280212446319.
Corberan-Vallet, A., and Lawson, A.B. (2014). Prospective analysis
of infectious disease surveillance data using syndromic informa-
tion. Statistical Methods in Medical Research, 23(6), 572–594. doi:
10.1177/0962280214527385.
Datta, G.S., Ghosh, M., Steorts, R., and Maples, J. (2011). Bayesian
benchmarking with applications to small area estimation. TEST, 20,
574–588.
Fay, R.E., and Herriot, R.A. (1979). Estimates of income for small places:
An application of James-Stein procedures to census data. Journal of
the American Statistical Association, 74, 269–277.
Ghosh, M., Nangia, N., and Kim, D.H. (1996). Estimation of median
income of four-person families: A Bayesian time series approach.
Journal of the American Statistical Association, 91, 1423–1431.
Held, L., Hofmann, M., and Hohle, M. (2006). A two-component
model for counts of infectious diseases. Biostatistics, 7(3), 422–437.
doi:10.1093.
Kedem, B., and Pan, L. (2015). Time Series Prediction of Hog Inventory.
Unpublished internal document. U.S. Department of Agriculture
National Agricultural Statistics Service.
Lawson, A.B., Onicescu, G., and Ellerbe, C. (2011). Foot-and-mouth
disease revisited: Re-analysis using Bayesian spatial susceptible-
infectious-removed models. Spatial and Spatio-temporal Epidemiol-
ogy, 2, 185–194.
Pfeffermann, D., and Tiller, R. (2006). Small-area estimation with state-
space models subject to benchmark constraints. Journal of the Ameri-
can Statistical Association, 101, 1387–1397.
Rao, J.N.K., and Molina, I. (2015). Small Area Estimation (second ed.)
Hoboken, NJ: John Wiley & Sons.
Wang, X., Shojaie, A., and Zou, J. (2019). Bayesian hidden Markov models
for dependent large-scale multiple testing. Computational Statistics &
Data Analysis, 136, 123–136. doi: https://doi.org/10.1016/j.csda.
Zipkin, E.F., and Saunders, S.P. (2018). Synthesizing multiple data types
for biological conservation using integrated population models. Bio-
logical Conservation, 217, 240–250.
Appendix A
Agenda and List of Participants
STAFF
Nancy J. Kirkendall, Study Director
Anthony Mann, Program Associate
Brian Harris-Kojetin, Director, Committee on National Statistics
INVITED DISCUSSANTS
Matthew Branan, Animal and Plant Health Inspection Service, USDA
Gauri Datta, U.S. Census Bureau
Kamina Johnson, Animal and Plant Health Inspection Service, USDA
Andrew Lawson, Medical University of South Carolina
Michael Schweinberger, Rice University
NASS PRESENTERS
Emilola Abayomi, Research and Development Division, NASS, USDA
Gavin Corral, Research and Development Division, NASS, USDA
Seth Riggins, Statistics Division, NASS, USDA
Luca Sartore, National Institute of Statistical Science and Research and
Development Division, NASS, USDA
Nell Sedransk, National Institute of Statistical Science and Research and
Development Division, NASS, USDA
Yijun (Frank) Wei, National Institute of Statistical Science and Research
and Development Division, NASS, USDA
Linda Young, Director of Research, NASS, USDA
ATTENDEES
Travis Averill, Livestock Branch, NASS, USDA
Jeff Bailey, Branch Chief, Methodology Division, NASS, USDA
Valbona Bejleri, Research and Development Division, NASS, USDA
Lu Chen, Research and Development Division, NASS, USDA
Cynthia Clark, COPAFS; former Administrator of NASS, USDA
Nathan Cruze, Research and Development Division, NASS, USDA
Lindsay Drunasky, Methodology Division, NASS, USDA
Lori Harper, Methodology Division, NASS, USDA
Dan Kerestes, Director of Statistics Division, NASS, USDA
Kay Turner, Research and Development Division, NASS, USDA
Appendix B
Biographical Sketches of Planning Committee
Members and Speakers
Houston area. Her research interests include methods for dependent data
including time series/spatial and spatial-temporal; unique applications of
Bayesian hierarchical modeling and approximate Bayesian computation;
and stochastic process modeling and information integration. She is a
fellow of the American Statistical Association and the American Asso-
ciation for the Advancement of Science and has been recognized for her
leadership, scholarship, and mentoring. She serves as vice president of
the American Statistical Association and as a member of the National
Academies Committee on Applied and Theoretical Statistics. She has a
B.S.E. and an M.S. in mathematics from Arkansas State University and a
Ph.D. in statistics from Texas A&M University.
INVITED DISCUSSANTS
MATTHEW BRANAN is a mathematical statistician working for the
USDA-Animal and Plant Health Inspection Service-Veterinary Ser-
vices in the National Animal Health Monitoring System. He is currently
involved in all stages of implementing national studies related to animal
health in a variety of industries including aquaculture, swine, cattle,
sheep, goat, beef cow-calf, and dairy cattle, with particular focus on the
design of the studies and the analysis of study results.
NASS PRESENTERS
EMILOLA J. ABAYOMI is a mathematical statistician at USDA’s
National Agricultural Statistics Service. Her current research at the agency
focuses on elements of respondent burden and impacts of data quality.
She has an M.S. and a Ph.D. in biostatistics from Florida State University.
December 2016, he became the statistician leading the national Hogs and
Pigs Program. He holds an M.S. in agricultural economics from the Uni-
versity of Kentucky.