
The informational content of FOMC

meeting transcripts
A textual analysis of the relationship between the content of FOMC meetings and changes in the federal funds rate
I. Introduction
Textual analysis applied to economic research has undergone a renaissance of sorts in

recent years. With significant advancements in machine learning and computing power, formerly

qualitative data embedded in texts has now become available for economists to quantify,

classify, and incorporate into their models. These developments have created an opportunity to

answer questions at the level of individual actors and their reactions, rather than from summary

statistics describing their interactions with markets. Through applying new machine learning

methods, researchers can gain a more granular understanding of how individual reactions affect

different economic outcomes.

In this paper, we explore the reactions of a small set of actors with outsized influence on

economic policy: members of the Federal Open Market Committee (FOMC). We attempt to

understand what concerns are the most important in determining a change in Federal Reserve

policy regarding interest rates. Previous research analyzes quantitative measures, such as

unemployment and inflation, to understand what prompts policymakers to change interest rates.

By undertaking a textual analysis, we can answer this question by looking directly at qualitative

data from these policymakers.

The Federal Reserve releases qualitative information in the form of meeting minutes,

statements, press releases, reports and transcripts. Transcripts of meetings and conference calls

are released to the public with a five-year lag. As FOMC members’ comments are relatively

protected from scrutiny by this lag, we can use these transcripts to understand which issues drive

changes in interest rate policy.


Through textual analysis of transcripts of FOMC meetings, can we identify the primary

concerns that cause the Federal Reserve to change the target federal funds rate? Can we do this

by quantifying the content of the meeting? By answering these questions, we can take a closer

look at our traditional economic understanding of the factors that influence the federal funds rate.

We can attempt to understand if the FOMC members’ words reflect the actions of the market.

To achieve this, we first collect, clean and transform data with the goal of isolating

relevant word counts. Our outcome variable is the target federal funds rate. We focus on years

1982 – 2008, as pre-1982 there was no target rate and post-2008 the Federal Reserve switched to

a target range. We begin by scraping FOMC transcripts from this period, amounting in total
to 18,741 pages. Each document is stripped of formatting, converted into text, cleared of
stop words, and passed through the Porter algorithm to reduce words to stems. We then preselect a list of

relevant word stems to observe, chosen by examining descriptive statistics and economic

intuition. After pre-processing we count words, scale by size of document, and associate the

words with the relevant change in federal funds rate.

We then examine the relationship between these words and our outcome variable. We use

a linear model using least-squares penalty trained with an L1 prior (also known as Lasso), with

iterative fitting along a regularization path to derive our regression coefficients. We select the

best model using a 20-fold cross-validation scheme fitted with coordinate descent and least angle

regression. We also compile various summary statistics, examining how the weighted frequency

of these words changes over time, and how they vary across positive and negative changes in

interest rate.

We conduct two separate regressions: one on reductions in the federal funds rate and one
on increases in the federal funds rate. In both, we find no relevant word stems that are statistically


significant in determining the federal funds rate. We test our regressions with t-statistics to

confirm our model's result. We then conduct a textual analysis of our sample, revealing a lack of

variability in our outcome variables with regard to reductions in federal funds rate, and

additionally a lack of variability in proportions of word counts across transcripts associated with

increases in federal funds rate. We are able to deduce that FOMC transcripts in our sample

touch on roughly the same topics in similar proportions regardless of the outcome of the meeting.

This suggests that there may be unobserved factors that FOMC members do not reveal in

meetings that determine their vote, and that FOMC meetings are more of a discussion of

possibilities, with few subtleties revealing members' voting decisions.

The remainder of the paper is organized as follows. Section II outlines relevant work by

other researchers. In Section III we present our methodology and data collection process. We

present the results of our regression and textual analysis in Section IV, and interpret their

significance. Section V concludes.

II. Literature Review


Economic literature attempting to incorporate qualitative data through textual analysis

has evolved through developments in machine learning methods and open-source software

programs that implement them. Prior to these advancements, researchers were limited to pre-

scheduled public announcements with easily quantifiable impact. Methodology centered on

variations of difference-in-differences analysis or manual interpretation of texts. As a result,

analysis was conducted on small samples of data, with limited quantitative analysis.

Present day textual research uses classification and regression analysis, scaling to

thousands of data points. Post-2000, with algorithmic advancements granting the ability to


analyze large quantities of data on different types of informational events, the amount of

published textual analysis research has skyrocketed.

The data analyzed has extended to unscheduled announcements, with data from

supplementary texts, news and other media sources collected for analysis. Pre-processing of data

varies across papers, but the most common identification strategies include sentiment analysis

and word-frequency regressions.

This paper primarily builds on various methods used for pre-processing data to organize

word counts, and borrows heavily from advances in word-frequency regressions. These methods

vary by paper across the source of data examined, as each presents unique challenges to pre-

processing. As the methodology is fairly new, most research focuses on predicting the prices of various

securities.

Regarding stock price analysis, source data for textual analysis research generally

originates from internet stock boards or firm specific news stories. Among research using

internet stock board postings, there is a consensus that sentiment analysis using language-

processing algorithms provides the best measure of stock price (Antweiler and Frank 2004; Das,

Martinez-Jerez and Tufano 2005). There are a variety of modern day computational linguistics

methods that can measure both intensity and dispersion of sentiment. A particular problem in

classification is understanding which method works best. Recent research suggests a voting

scheme between classifiers reduces the number of false positives and results in higher sentiment

accuracy (Das and Chen 2007). This paper draws on those insights by running a 20-fold cross-

validation scheme, using two different algorithms, least angle regression and coordinate descent,

in each validation step and choosing the optimal.


Research using firm-specific news stories is similar in that sentiment analysis algorithms
are used, but these documents are less polarized, making classification harder and
limiting the number of algorithms available. Most make use of bag-of-words or n-gram encodings in pre-

processing their words and compare them to pre-defined negative and positive classifications

(Tetlock, Saar-Tsechansky, and Macskassy 2008; Fang and Peress 2009; Engelberg and Parsons

2011). A newer pre-processing strategy, developed in 2008 by Stanford’s Natural Language

Processing group, has recently been used to classify relationships between words in a tree

structure (Engelberg 2008). This method was far too complex and costly for the data we were using,
but applying this methodology could produce a more nuanced result for this research.

In research surrounding management statements to predict earnings reports, pre-defining

words with positive and negative sentiment becomes tougher. Management statements are

generally less sentimentally charged and use financial jargon. Therefore linguistic theory is

heavily employed to measure sentiment (Davis, Piger and Sedor 2012). Recent scrutiny of

existing papers’ pre-defined sentiment classifications is leading to better methods, but none have

emerged as the dominant method (Demers and Vega 2012; Loughran and McDonald 2011). In

this paper, we choose the words most relevant beforehand, looking at descriptive statistics to try

to optimize selection.

Textual analysis of macroeconomic announcements has not been explored in the same
depth, and has generally been used to create treasury or relevant asset price predictions (Hafez

and Xie 2013). However, a small subset of research examining FOMC minutes does exist using

sentiment analysis to predict treasury yields and future interest rates (Lucca and

Trebbi 2009; Danker and Luecke 2005). Most notable among them is a paper authored by

Federal Reserve Bank of New York economists Boukus and Rosenberg (2006), examining the


relationship between themes in FOMC minutes and US Treasury yields. Using a process called

latent semantic analysis, they categorize minutes into characteristic themes. Their methodology

is mathematically complex, and applying it to FOMC transcripts would be a possibility to further

explore. Our paper draws inspiration from their use of the Porter algorithm to stem words.

This paper adds to the existing corpus by considering FOMC transcript data. Most

literature surrounds private sector documents, and even analysis of Federal Reserve documents is

limited to FOMC minutes. Transcripts represent a considerably larger dataset. While meeting

minutes average 8-10 pages in length, transcripts from meetings and conference calls average

nearly 100 pages each and represent direct communication from economic actors, rather than

edited statements. Through this paper, we attempt to more directly understand the motivations

behind changes in the federal funds rate. Further research could endeavor to compare this

paper’s results with purely quantitative measures to understand which is a better predictor of

changes in federal funds rate.

III. Model, Methods and Data


III.I Model and Methodology
The method we use to examine the relationship between our word count data and the

change in federal funds rate is a Lasso linear model, using least-squares penalty trained with an

L1 prior, with iterative fitting along a regularization path to derive our regression coefficients. We
add several nuances to this general method, including cross-validation for selection of the

regularization parameter and different parameter fitting formulas.

Our final equation, shown below, predicts a change in federal funds rate given a vector of relevant word frequencies. $f_\theta(x)$ represents the basis point change of the federal funds rate. It is defined by a coefficient matrix $\theta$, which this paper estimates. We determine two functions, running our regression twice, separately considering decreases in federal funds rate and increases in federal funds rate. The function takes a vector of relevant word stem frequencies, $x$, defined below. Each $j$ represents a different word stem, $x_j$ represents the frequency associated with that word stem, and $\beta_j$ represents the derived coefficient parameter for that particular word stem:

$$f_\theta(x) = \theta \cdot x = \sum_j \beta_j x_j$$
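As a toy illustration of this prediction rule (with made-up numbers, not estimates from our data), the model reduces to a single dot product:

    import numpy as np

    theta = np.array([0.0005, -0.0004, 0.0001])  # hypothetical coefficients beta_j
    x = np.array([0.42, 0.31, 0.55])             # hypothetical stem frequencies x_j
    predicted_change = float(theta @ x)          # f_theta(x) = theta . x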

The relevant word stems that we use in our equation are as follows. These were chosen by

compiling descriptive statistics and using economic intuition on our processed word stem counts

per document:

inflat     save       continu    expect     stock      profit     gain       fund
resili     household  indic      bear       distribut  custom     incom      bull
particip   oil        suppli     employ     confid     bank       forecast   price
foreign    tax        stagnat    headwind   debt       wage       growth     unemploy
workforc   weak       geopolit   dramat     demand     labor      consum     job
produc     risk       polici     strong     rate       global     energi     corpor
deficit    supplier   exchang    commod     wealth     recover    condit     capit
market     lend       economi    abroad     mortgag    percent    lack       crisi
We find the value of our coefficient matrix $\hat{\theta}$ by minimizing the following objective function. $\hat{\theta}$ is selected as the loss-minimizing matrix over all possible coefficient matrix values $\theta$:

$$\hat{\theta} = \underset{\theta}{\operatorname{argmin}} \; \frac{1}{n} \sum_{i} \mathrm{Loss}\left(y_i, f_\theta(x_i)\right) + \lambda R_{\mathrm{Lasso}}(\theta)$$


In our loss function, $y_i$ corresponds to the change in federal funds rate for a particular row of data $i$. $f_\theta(x_i)$ is our estimate of the change in federal funds rate, defined by the coefficient matrix we are testing, $\theta$, given the input $x_i$ representing the word frequencies for row $i$ of our data:

$$\mathrm{Loss}\left(y_i, f_\theta(x_i)\right) = \left(y_i - f_\theta(x_i)\right)^2$$

Our objective function to minimize comprises two parts: least squares, shown
above, and the L1 prior as a regularizer, shown below. Least squares minimizes in typical
fashion, fitting with more sensitivity to outliers as the squared term emphasizes larger
deviations. The L1 prior is used to avoid overfitting, and is particularly useful in estimating
sparse coefficients:

$$R_{\mathrm{Lasso}}(\theta) = \sum_j |\theta_j|$$

We choose an L1 prior because it has the ability to set coefficients to 0 if the word stem is

not statistically significant in determining change in federal funds rate. We use it to select only

the most informative word stems, giving our regression the ability to exclude unnecessary terms.

As shown above, it is calculated as the sum of the absolute values of the weights $\theta_j$
corresponding to each relevant word stem $j$. Because this term is added to our loss function, our estimation
method is able to remove unnecessary weights.
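To make the full objective concrete, the following is a minimal sketch in Python of the quantity our estimation minimizes; theta, X, y, and lam are placeholder names, not our actual pipeline:

    import numpy as np

    def lasso_objective(theta, X, y, lam):
        """Mean squared error plus the weighted L1 prior, as defined above."""
        residuals = y - X @ theta              # y_i - f_theta(x_i) for every row i
        least_squares = np.mean(residuals**2)  # (1/n) * sum of squared losses
        l1_prior = np.sum(np.abs(theta))       # R_Lasso(theta)
        return least_squares + lam * l1_prior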

The Lasso estimate therefore solves the minimization of the least squares penalty with $\lambda R_{\mathrm{Lasso}}(\theta)$ added to it, where $\lambda$ is the regularization parameter, a constant defining the weight of the L1 prior. As the Lasso estimate does not have an analytical solution, we must use numerical methods to approximate it. Our software package, scikit-learn, uses both coordinate descent and least angle regression algorithms to fit the coefficient matrix $\theta$, and chooses the best estimate as $\hat{\theta}$.

Lastly, we must find the proper value to assign our regularization parameter $\lambda$, which determines the weight we give to our L1 prior. To compute the optimal value, we use a 20-fold cross-validation scheme on our training data. This means we randomly split our original training set into a new training set and validation set with a 70-30 split. We minimize our objective function on the new training set for a particular value of $\lambda$ using both coordinate descent and least angle regression. We then test our coefficient matrix on the validation set with that $\lambda$, graph the result, and repeat the process until we have generated a curve of mean squared error over multiple values of $\lambda$. We repeat this process 20 times, selecting the value that minimizes our averaged error across our curves.
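As a rough sketch of this selection step, the snippet below fits scikit-learn's coordinate descent and least angle regression solvers over a 20-fold scheme and keeps whichever achieves the lower averaged validation error; the training data here is synthetic placeholder data, not our word-frequency matrix:

    import numpy as np
    from sklearn.linear_model import LassoCV, LassoLarsCV

    rng = np.random.default_rng(0)
    X_train = rng.random((120, 64))                 # placeholder: 64 stem frequencies per row
    y_train = rng.choice([-0.5, -0.25, 0.25], 120)  # placeholder rate changes

    cd = LassoCV(cv=20).fit(X_train, y_train)        # coordinate descent path
    lars = LassoLarsCV(cv=20).fit(X_train, y_train)  # least angle regression path

    # Keep whichever algorithm achieved the lower averaged validation error.
    best = cd if cd.mse_path_.mean(axis=1).min() <= lars.mse_path_.mean(axis=1).min() else lars
    print("selected lambda:", best.alpha_)
    print("non-zero coefficients:", np.count_nonzero(best.coef_))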

A possible complication in running this pipeline could be rank deficiency in our samples. If we encounter linear dependence or numerical instability due to rounding in our sample matrix, our coordinate descent and least angle regression fitting methods will produce two dramatically different answers. Fortunately, our scikit-learn package automatically excludes the offending row of data when we encounter this problem.

Although incorporating regularization is intended as a robustness check, a potential weakness of this methodology is selection bias. Perhaps there are other word stems, not included here, that are more critical to determining a change in federal funds rate. To ensure the stems that are selected do contribute to determining basis point change, we test individual significance with a t-statistic on each selected coefficient. Further research could use our methodology but explore changing the relevant words; an interesting avenue would be to examine and improve the relevant word selection process.



In addition to our regressions, we perform a textual analysis by compiling various

summary statistics. This includes averages of word counts grouped by associated interest rate,

sample size grouped by interest rate, and overall orderings of word stems by the percentage of

the transcript that they make up. We plot the terms with most variability in a time series to

examine how the informational content of transcripts changes over the business cycle. We use this

analysis to gain insight on our regression results.

III.II Data

The underlying data used in this paper consists of transcripts from FOMC meetings,

changes in the federal funds rate, and corresponding unemployment, real GDP, and inflation

data. All the data was scraped using the BeautifulSoup, requests, and os Python packages, and

heavily pre-processed before applying it to the above model. The data collection and

transformation process, final processed documents, results of textual analysis, and Python

implementation of the above regression are open-sourced and available for further reference

here.

III.II.I Independent Variables

Our FOMC transcripts containing our independent variables, relevant word counts, were

taken from the Federal Reserve website. We conducted our study on transcripts from 1982 –
2008, as pre-1982 the Federal Reserve did not issue a target federal funds rate, and post-2008 the
Federal Reserve switched to a target range. In total, we scraped 660MB, amounting to 230

documents or 18,741 pages of transcript data. This included all communication of FOMC

members in conference calls before meetings and during the meetings. Each document averaged
81 pages.


We pre-processed each document by the following procedure. We first stripped
all the cover pages. We then passed each PDF through a text converter, using the PyPDF2 Python

package. We then removed extremely common functionally-neutral words that do not contribute

to our analysis, commonly referred to in machine learning literature as stop words. This includes

pronouns, articles, conjunctions and prepositions such as “I”, “the”, “he”, etc. The remaining

terms are further processed by using the Porter algorithm to stem. We remove the suffixes such

that words that share the same etymological root are mapped to the same stem. For example, our

stemmer maps the terms increase, increased, increases and increasing to the stem increas.

Finally, we count the relevant word stems in each document and weight them using term-

frequency inverse document-frequency (tf-idf).
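A minimal sketch of this per-document cleaning step, assuming NLTK's stop word list and Porter stemmer (the sample string stands in for converted transcript text):

    import re
    from nltk.corpus import stopwords   # requires nltk.download("stopwords")
    from nltk.stem import PorterStemmer

    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))

    def preprocess(text):
        """Lowercase, drop punctuation, remove stop words, and stem."""
        tokens = re.findall(r"[a-z]+", text.lower())
        return [stemmer.stem(tok) for tok in tokens if tok not in stops]

    print(preprocess("Inflation increased while increases in unemployment slowed."))
    # -> ['inflat', 'increas', 'increas', 'unemploy', 'slow']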

We choose tf-idf weighting as it is the most common, although a variety of different

weighting schemes are available as noted by Sebastiani (2002). Under tf-idf, the more a term

appears across all documents, the less important it is if it appears in a particular document. For

example, if the word “president” is common among all of the FOMC transcripts, while “labor” is

not, but in a particular document “president” is mentioned only slightly more than “labor”, tf-idf

weighting will correctly reweight the counts such that “labor” is given far more significance in

that particular document.

The tf-idf weighting of each term is done in accordance with the equations below. $tf(t, d)$ is the term frequency of the term $t$ in document $d$. It is multiplied by $idf(t)$, which roughly represents the inverse of the total number of mentions of the term $t$ across all documents. In our paper, we use the following formula, with $n_d$ representing the total number of documents and $df(t)$ representing the number of documents that contain term $t$:

$$\text{tf-idf}(t, d) = tf(t, d) \cdot idf(t)$$

$$idf(t) = \log\left(\frac{1 + n_d}{1 + df(t)}\right) + 1$$

We then normalize the resulting word vector for each document:

$$v(d) = \begin{bmatrix} \text{tf-idf}(t_1, d) \\ \text{tf-idf}(t_2, d) \\ \vdots \\ \text{tf-idf}(t_n, d) \end{bmatrix}$$

$$\mathrm{finalWordStemCounts}(d) = \frac{v(d)}{\lVert v(d) \rVert} = \frac{v(d)}{\sqrt{v_1^2 + v_2^2 + \cdots + v_n^2}}$$
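These equations match scikit-learn's default smoothed tf-idf with l2 normalization, so the weighting step can be sketched as follows; the documents and vocabulary below are placeholders:

    from sklearn.feature_extraction.text import TfidfVectorizer

    docs = ["inflat bank rate rate", "rate growth bank", "labor market rate"]
    vec = TfidfVectorizer(vocabulary=["inflat", "bank", "labor", "market", "rate"],
                          smooth_idf=True, norm="l2")  # idf(t) = log((1+n_d)/(1+df(t))) + 1
    weights = vec.fit_transform(docs)
    print(weights.toarray().round(3))  # each row has unit Euclidean norm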

Finally, we group together the final word stem counts from all documents leading up to a meeting that preceded a change in federal funds rate, along with the documents leading up to the prior meeting if, during that prior meeting, the FOMC voted to maintain interest rates. We add the vectors in the above normalized matrix together term by term, and renormalize the resulting vector. That final vector is associated with the appropriate change in federal funds rate.
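A small sketch of this aggregation step, assuming the per-document vectors are rows of a NumPy array (the numbers are illustrative):

    import numpy as np

    def combine_documents(doc_vectors):
        """Sum the normalized per-document vectors, then renormalize."""
        total = np.sum(doc_vectors, axis=0)
        return total / np.linalg.norm(total)

    # e.g. the two documents preceding one rate decision
    meeting_vector = combine_documents(np.array([[0.6, 0.8], [0.8, 0.6]]))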

III.II.II Outcome Variables

Our outcome data for the federal funds rate was taken directly from the Federal Reserve

Bank of St. Louis’ FRED portal. In addition to our outcome variables, we collected gross

domestic product, unemployment, and inflation rate data aligned with changes in federal funds

rate for cursory analysis and initial exploration. Our gross domestic product data was from the

US Bureau of Economic Analysis. Unemployment data and the inflation rate, calculated using the
consumer price index, were taken from the US Bureau of Labor Statistics.


We can identify potential shortcomings in the collection and pre-processing of our data. Our
PDF-to-text conversion appears highly accurate upon cursory observation, but without manual inspection it is
impossible to confirm. Considerable time was spent exploring different packages; many produced
messy output that obfuscated results. In addition, we examine only single words (in machine

learning parlance, Bag of Words encoding). Our analysis might be improved by using n-gram

encoding, where one examines the existence of combinations of words. For example, if choosing

bi-gram encoding, we would count every adjacent pair of words and continue our analysis.

This method might retain valuable context and give different results, which further research

could explore.
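For illustration, a bi-gram encoding of a sample phrase could be produced with scikit-learn's CountVectorizer (the sentence is made up):

    from sklearn.feature_extraction.text import CountVectorizer

    bigrams = CountVectorizer(ngram_range=(2, 2))  # count every adjacent word pair
    bigrams.fit(["raise the target federal funds rate"])
    print(bigrams.get_feature_names_out())
    # ['federal funds' 'funds rate' 'raise the' 'target federal' 'the target']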

IV. Results and Discussion


We explore the relationship between the change in federal funds rate and words said

during FOMC meetings by fitting our data to a Lasso linear model, using least-squares penalty

trained with an L1 prior. Our independent variables are frequencies of relevant word stems,

which we pre-selected based on descriptive statistics and economic intuition. Our outcome

variable was basis point change in federal funds rate. When solving for the relevant estimators,

we separated our sample by considering negative basis point changes (meetings that preceded a

reduction in federal funds rate) separately from positive basis point changes (meetings that

preceded an increase in federal funds rate). For each regression, we split our relevant data at

random into training and test samples at a roughly 75 – 25 ratio. We fitted our

model using coordinate descent and least angle regression on our training data, and evaluated the

accuracy of our model on the test set.
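The split-and-evaluate step can be sketched as below, again with synthetic placeholder data in place of our word-frequency matrix:

    import numpy as np
    from sklearn.linear_model import LassoCV
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(1)
    X = rng.random((80, 64))           # placeholder word-stem frequencies
    y = rng.choice([-0.5, -0.25], 80)  # placeholder rate reductions

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)
    model = LassoCV(cv=20).fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_test, model.predict(X_test)))
    print("test RMSE:", round(rmse, 5))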


Estimators for Reduction in Federal Funds Rate

Word Stem    Coefficient       T-Statistic    STD
bank         -0.0004629         2.747e-05      89.16
price         0.00048339011     3.085e-05      82.92
rate         -5.825e-05        -2.641e-06     116.72
percent       0.0001378         4.491e-06     162.35

Average Relevant Word STD: 35.26

Root Mean Squared Error: 0.17295

Here we see our regression results for estimators determining a reduction in federal funds rate.

Of the 64 relevant words, our regression used four as determinants of a change in federal funds

rate. However, as our t-statistics show, none are statistically significant and our root mean

squared error on our test set is relatively high. We do note that the relevant variables left by our

L1 prior do have a significantly higher standard deviation than the other relevant words.

Estimators for Increase in Federal Funds Rate

Word Stem    Coefficient    T-Statistic    STD
polici        7.766e-05      6.212e-06      76.04
inflat        2.626e-06      1.217e-07     131.27
percent      -5.544e-05     -2.165e-06     155.76

Average Relevant Word STD: 33.58

Root Mean Squared Error: 0.31653

In our regression results for estimators determining an increase in federal funds rate, we see a

similar result. Our regression used three relevant words as determinants of a change in federal

funds rate. Again, our t-statistics show that none of these variables are statistically significant.

Again, our model selected variables with a notably larger standard deviation than other relevant

words. We now use descriptive statistics and textual analysis of word frequencies to gain a

deeper understanding of our regression results.


The below graph gives us an idea of how our training data is distributed across outcome

variables. We see that in our sample, reductions in federal funds rate succeeding FOMC

meetings are rarely outside 0.25 or 0.5 percentage points (25 or 50 basis points). We are therefore limited in the number of

outcome variables considered in our reduction analysis. This suggests that our model did not

have enough variability in outcome data to create significant estimators for our word frequencies.

Figure 1: Variability in Words per Transcript Grouped by Interest Rate, corresponding to our reduction in federal funds rate
analysis.

We again see a similar result when considering increases in the federal funds rate in
the graph below. In our sample, increases in federal funds rate succeeding FOMC meetings
are almost entirely 0.5 percentage points (50 basis points). In fitting our regression, regardless of our input, our model
would optimally predict a change of approximately 0.5 percentage points, as we lack variability in

outcome data. We now take a closer look at our independent variables, word frequencies, and

how their distribution within transcripts changes over time.


Figure 2: Variability in Words per Transcript Grouped by Interest Rate, corresponding to our increase in federal funds rate
analysis.

Figure 3: A selection of relevant words and their change in proportion of transcript over time, for the reduction in rate analysis.

In the above graph, we observe several word stems and the relative proportions that they
take up in FOMC transcripts. Although we have 64 relevant word stems, we show only the stems with the
most variability. Upon close examination, we see that these proportions vary considerably over time.
Of all relevant word stems, ‘inflat’, ‘bank’, and ‘growth’ show the most variability. This


suggests that in our reduction analysis, the primary cause of statistical insignificance in our
model’s estimators is the lack of variability in our outcome variables.

Figure 4: A selection of relevant words and their change in proportion of transcript over time, for the increase in rate analysis.

In our analysis of increases in federal funds rate, we see a different result. We
observe that the proportions are surprisingly constant, with a few notable outliers,
including ‘polici’, ‘inflat’, and ‘percent’, which our model selected as determinants. This suggests that

in our analysis of increases in federal funds rate, the lack of variation in our data for independent

variables also contributed to statistical insignificance of our estimators. We also note that

increases and decreases in federal funds rate are concentrated around certain periods, as all data

points occur within close time proximity to each other. This reflects economic intuition, as

changes in federal funds rate cluster in accordance with business cycle patterns.

Finally, we will make general observations about our data by examining proportions of

word stems across federal funds rate changes.


-0.25            -0.50            0.5              0.25             0.125
1.054 rate       1.037 rate       0.865 rate       0.961 rate       0.996 rate
0.715 market     0.761 market     0.661 inflat     0.838 price      0.887 percent
0.527 growth     0.469 percent    0.612 market     0.778 inflat     0.5 growth
0.518 percent    0.404 risk       0.55 percent     0.713 market     0.413 market
0.491 inflat     0.402 growth     0.501 polici     0.599 percent    0.375 forecast
0.479 price      0.391 economi    0.423 growth     0.568 growth     0.348 price
0.428 polici     0.379 polici     0.371 price      0.48 expect      0.295 economi
0.419 economi    0.365 expect     0.337 economi    0.452 polici     0.26 inflat
0.411 expect     0.359 forecast   0.301 expect     0.385 forecast   0.203 expect
0.405 forecast   0.35 bank        0.293 forecast   0.324 economi    0.194 fund
0.383 risk       0.341 fund       0.293 continu    0.307 risk       0.188 polici
0.313 fund       0.308 inflat     0.266 bank       0.3 continu      0.173 bank
0.27 bank        0.279 price      0.258 risk       0.254 fund       0.166 continu
0.265 continu    0.221 continu    0.213 fund       0.192 demand     0.15 risk
0.154 weak       0.175 weak       0.179 strong     0.171 labor      0.128 strong
Figure 5: Average appearances of word stems in percentage terms, grouped by change in federal funds rate

In the above table, we observe the average percentage of a FOMC transcript that each
word stem makes up, grouped by the corresponding change in federal funds rate. We note the similarities in

proportions and differences between proportions across our outcome variables. Similar word

stems, regardless of decrease or increase in federal funds rate, appear in our transcript samples

with nearly no difference in ordering. Although we use this table to make any inferences

regarding our regression, as statistical significance relies on a variety of factors including

variability of word stems that this table does not include, this does suggest the topics covered in

FOMC meetings, regardless of basis point change, remain relatively constant over time. This

initial exploration informed our decision of relevant word stems, as we added the stems with the

highest proportion to our initial list generated using economic intuition, assuming that variability

in the word stems that appear the most might play a role in determining rate changes.


Our regression results and textual analysis reveal that the words spoken during
FOMC meetings are statistically insignificant in determining federal funds rate changes. The

implications of these results provide insight into the structure of FOMC meetings. As we observe

the same words being used in similar frequencies, regardless of the outcome of the meeting, this

suggests that FOMC meetings in this period could be more structured than we previously

assumed, with discussion centering on the same topics. Rather than helping, additional transcript
data simply showed similar word frequencies across outcomes, instead of uncovering subtleties that

could signal oncoming changes in federal funds rate.

In addition, statistical insignificance of relevant word stems could suggest that changes in

federal funds rate may be driven by internal bias or another unobserved factor inherent to FOMC

members’ backgrounds. Although our research assumed that transcripts would reveal these

motivations, further research on FOMC members’ backgrounds and their previous voting

history could prove useful in understanding these factors and their significance in determining

federal funds rate changes.

Alternative explanations of the statistical insignificance calculated in our regression

results could include complications in our data collection. The period considered, 1982 – 2008, is

unique in that Alan Greenspan presided as the Chairman of the Federal Reserve throughout most

of this period (1987–2006). This could cause the discussion during meetings to be similar, and
hence the content of FOMC transcripts to appear similar in word count: staff and other
members could be like-minded and therefore inclined to express similar opinions throughout

this period, or meetings could be structured in a consistent fashion, covering the same topics

without much deviation. In addition, as the economic climate was relatively favorable during this

period, many meetings that resulted in a change in federal funds rate were only associated with a


single transcript. Further research could increase the sample size by considering a larger range of

years, under different chairs. In particular, including the Volcker years could

increase the number of different changes in federal funds rate, providing our model with more

variable outcome data.

The scope of this research has been limited to FOMC transcript data and its relationship to

changes in federal funds rate. As changes in federal funds rate are not as variable as other

economic indicators, we cannot take the statistical insignificance that we calculated and

extrapolate it to other economic indicators. Our results also do not suggest that other FOMC texts

are insignificant in determining federal funds rate. Other research has shown high levels of

correlation between themes in FOMC meeting minutes, treasury yields, and federal funds
rates (Rosa 2013; Mizrach and Neely 2008).

Ultimately, our findings do not support our thesis that textual analysis of transcripts of

FOMC meetings can reveal the primary concerns that cause the Federal Reserve to change the

target federal funds rate. As none of our relevant words were statistically significant in predicting

either an increase or decrease in federal funds rate in our sample, we conclude that the primary

factors in determining these changes are not revealed during the meetings we observed. Instead,

through textual analysis, we uncover that the same general themes are discussed during all

meetings in our sample.

Related research analyzing the content of FOMC minutes suggests a higher correlation

between federal funds rates and prepared FOMC remarks. This shows that contrary to intuition

and despite transcripts having nearly ten times as many words as meeting minutes, prepared

remarks summarizing the meeting after making a rate change decision differ more in tone and

theme (Boukus and Rosenberg 2006). Our findings suggest that prepared post-decision remarks


may be a better indicator of economic variables such as the federal funds rate than the
meetings themselves, which discuss broader aspects of the economy and lead to the decision.

V. Conclusion
Understanding how economic indicators move with regard to FOMC communications

requires mapping complex, qualitative information into quantitative measures. Related research

examining FOMC texts has either centered on meeting minutes, which represent a significantly

smaller sample of data, or relied on indicator variables determined by manual analysis of

documents.

In this paper, we apply pre-processing methods and transformations that enable statistical
analysis of FOMC transcripts using machine learning methodology. Our model captures

speech content, translates it into related word stem counts, and associates it with the appropriate

change in federal funds rate. We interpret the impact of different frequencies of word stems by

fitting our Lasso linear model using coordinate descent and least angle regression, performing
20-fold cross-validation to determine the regularization parameter for our L1 prior, giving us the

ability to exclude words that are insignificant determinants of federal funds rate.

After separating our regressions into positive and negative changes in federal funds rate,

we discover no statistical significance. Limited by a lack of variability in our word stem

frequencies and in changes in federal funds rate, our model cannot determine consistent estimators.

However, our textual analysis reveals insights into the relative frequencies of topics discussed

during FOMC meetings and their change over time. Our research suggests that FOMC meetings

are consistent, covering similar topics regardless of associated basis point change. Our evidence

directs market participants to other forms of data or more comprehensive data collection,


suggesting that motivations behind changing the federal funds rate run deeper than the

information content of meeting transcripts.


Bibliography
Antweiler, Werner, and Murray Z. Frank. "Is all that talk just noise? The information content of
internet stock message boards." The Journal of Finance 59.3 (2004): 1259-1294.

Boukus, Ellyn, and Joshua V. Rosenberg. "The information content of FOMC minutes." (2006).

Danker, Deborah J., and Matthew M. Luecke. "Background on FOMC meeting minutes." Fed.
Res. Bull. 91 (2005): 175.

Das, Sanjiv, Asís Martínez-Jerez, and Peter Tufano. "eInformation: A clinical study of investor
discussion and sentiment." Financial Management 34.3 (2005): 103-137.

Das, Sanjiv R., and Mike Y. Chen. "Yahoo! for Amazon: Sentiment extraction from small talk
on the web." Management Science 53.9 (2007): 1375-1388.

Davis, Angela K., Jeremy M. Piger, and Lisa M. Sedor. "Beyond the numbers: Measuring the
information content of earnings press release language." Contemporary Accounting
Research 29.3 (2012): 845-868.

Demers, Elizabeth A., and Clara Vega. "Textual content in earnings press releases: News or
noise." SSRN eLibrary (2012).

Engelberg, Joseph. "Costly information processing: Evidence from earnings announcements."
(2008).

Engelberg, Joseph E., and Christopher A. Parsons. "The causal impact of media in financial
markets." The Journal of Finance 66.1 (2011): 67-97.

Fang, Lily, and Joel Peress. "Media coverage and the cross-section of stock returns." The Journal
of Finance 64.5 (2009): 2023-2052.

Hafez, Peter Ager, and Junqiang Xie. "Intraday Forex Trading Based on Sentiment Inflection
Points." (2013).

Loughran, Tim, and Bill McDonald. "When is a liability not a liability? Textual analysis,
dictionaries, and 10-Ks." The Journal of Finance 66.1 (2011): 35-65.

Lucca, David O., and Francesco Trebbi. Measuring central bank communication: an automated
approach with application to FOMC statements. No. w15367. National Bureau of Economic
Research, 2009.

Mizrach, Bruce, and Christopher J. Neely. "Information shares in the US Treasury
market." Journal of Banking & Finance 32.7 (2008): 1221-1233.


Rosa, Carlo. "The financial market effect of FOMC minutes." (2013).

Sebastiani, Fabrizio. "Machine learning in automated text categorization." ACM Computing
Surveys (CSUR) 34.1 (2002): 1-47.

Tetlock, Paul C., Maytal Saar-Tsechansky, and Sofus Macskassy. "More than words:
Quantifying language to measure firms' fundamentals." The Journal of Finance 63.3 (2008):
1437-1467.

