Can Natural Language Processing Unlock Signals in Central Bank Minutes?
Natural language processing is already reshaping equity research and macro analysis. But can it generate an edge in fixed income markets? Specifically, can algorithms that analyze central bank language help predict the next move in the yield curve?
For fixed income investors, anticipating changes in curve shape is central to duration positioning, curve trades, and key rate exposure. Even incremental improvements in forecasting whether the curve will steepen, flatten, or shift in parallel can affect portfolio outcomes.
Central bank minutes are not just summaries of past decisions. They are structured communications designed to guide expectations. If their language contains systematic patterns that precede particular yield curve movements, then NLP becomes more than a research tool. It becomes a potential source of predictive signal.
This analysis tests that proposition using Brazilian central bank minutes and yield curve data. I trained machine learning classifiers to map textual features to subsequent curve configurations, including parallel shifts, flattenings, steepenings, and other standard forms. The findings suggest that systematic text analysis can improve classification accuracy beyond discretionary interpretation.
How Important Are Yield Curve Movements?
Consider a five-year bond with a $1,000 face value and a 10% annual coupon. At purchase, the yield curve is upward sloping, rising from 15.5% at one year to 17.5% at five years. Discounting the cash flows at those rates produces a present value of $768.64.
One year later, if the yield curve remains unchanged, the bond has four years to maturity but is priced using the same term structure. Under this constant-curve assumption, its value rises to $799.41.
Now assume instead that the yield curve shifts upward in parallel. The bond’s credit risk and cash flows are unchanged, yet higher discount rates reduce its value to $776.62. Relative to the constant-curve scenario, the investor incurs a $22.79 loss solely because the yield curve moved higher.
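The arithmetic in this example can be checked with a short script. This is a minimal sketch, not the author's worksheet, and it rests on three assumptions the text does not state: the curve rises linearly by half a point per year; one year later, each remaining cash flow is discounted at the rate originally attached to its payment date (16.0% to 17.5%); and the parallel shift is 100 basis points. Under those assumptions the quoted figures are reproduced.

```python
def present_value(cash_flows, spot_rates):
    """Discount annual cash flows, the k-th one at its own annually compounded rate."""
    return sum(
        cf / (1.0 + rate) ** year
        for year, (cf, rate) in enumerate(zip(cash_flows, spot_rates), start=1)
    )

FACE, COUPON = 1000.0, 100.0  # $1,000 face value, 10% annual coupon

# Assumption: the curve rises linearly, 0.5 point per year, from 15.5% to 17.5%.
curve = [0.155, 0.160, 0.165, 0.170, 0.175]

# At purchase: five cash flows, each discounted at its own maturity's rate.
flows_5y = [COUPON] * 4 + [COUPON + FACE]
pv_purchase = present_value(flows_5y, curve)

# One year later: four cash flows remain. Assumption: each keeps the rate
# originally attached to its payment date (16.0% to 17.5%).
flows_4y = [COUPON] * 3 + [COUPON + FACE]
rates_4y = curve[1:]
pv_unchanged = present_value(flows_4y, rates_4y)

# Assumption: the parallel shift adds 100 basis points at every maturity.
SHIFT = 0.01
pv_shifted = present_value(flows_4y, [r + SHIFT for r in rates_4y])

loss = pv_unchanged - pv_shifted
print(f"{pv_purchase:.2f} {pv_unchanged:.2f} {pv_shifted:.2f} {loss:.2f}")
```

Run as written, the script prints 768.64 799.41 776.62 22.79, matching the values above. A different reading of the constant-curve scenario, or a different shift size, would give different numbers.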
The implication is straightforward. Bond returns depend not only on credit risk but also on changes in the level and shape of the yield curve. Upward shifts hurt bondholders; downward shifts benefit them. The magnitude of the effect depends on maturity exposure, which is captured by key rate (partial) duration.
Both the literature and the CFA curriculum identify 11 standard yield curve movements, including bear flattening, bear steepening, bull flattening, bull steepening, parallel shifts, and butterfly structures. If these movements can be forecast with reasonable accuracy, investors can adjust duration and curve positioning to improve portfolio outcomes.
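Several of the named movements can be told apart from just two numbers: the change in a short-maturity yield and the change in a long-maturity yield. The sketch below labels a move that way. It is a deliberate simplification and not the 11-category scheme used in the study: the function name, the one-basis-point tolerance, and the "twist" labels are hypothetical choices, and butterfly moves are left out because they require at least one intermediate maturity.

```python
def label_curve_move(short_change_bp, long_change_bp, tolerance_bp=1.0):
    """Label a curve move from the change (in bp) of one short and one long yield."""
    level = (short_change_bp + long_change_bp) / 2.0  # average move in yields
    slope = long_change_bp - short_change_bp          # change in long-minus-short spread

    # Slope roughly unchanged: the move is parallel (or there is no move at all).
    if abs(slope) <= tolerance_bp:
        if abs(level) <= tolerance_bp:
            return "unchanged"
        return "parallel shift up" if level > 0 else "parallel shift down"

    shape = "steepening" if slope > 0 else "flattening"

    # Slope changed but the average level did not: the two ends moved in opposite directions.
    if abs(level) <= tolerance_bp:
        return f"{shape} twist"

    # "Bear" means yields rose on average (prices fell); "bull" means yields fell.
    direction = "bear" if level > 0 else "bull"
    return f"{direction} {shape}"


print(label_curve_move(50, 10))    # short end rises more than long end
print(label_curve_move(-10, -50))  # long end falls more than short end
```

The first call returns "bear flattening" and the second "bull flattening". Using the average of the two changes as the level is one convention among several; a production classifier would work with the full set of vertices.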
Theories and Models of the Yield Curve
A wide range of economic theories and econometric models have attempted to explain and forecast yield curve movements. In economics, the unbiased expectations theory links the term structure to anticipated future short rates. Liquidity preference and preferred habitat theories introduce risk and term premiums. Segmented market theories emphasize supply and demand dynamics across maturities.
Econometric approaches turned these ideas into mathematical forecasts. Models such as Cox–Ingersoll–Ross (CIR), Vasicek, and later arbitrage-free frameworks attempt to describe the stochastic behavior of interest rates and calibrate the curve to observed market prices. These models focus on the dynamics of rates themselves.
This study takes a different perspective. Rather than modeling interest rate processes directly, it examines whether central bank communication contains measurable signals about subsequent yield curve movements. NLP allows policy minutes to be converted into structured inputs that can be tested statistically.

The Power of NLP
Before AI became widely discussed in public discourse, NLP was already in active development, mostly for translating text and correcting spelling and grammar. Combined with modern machine learning, NLP transforms unstructured text into structured, analyzable data.
So far, NLP has been applied mostly to economic analysis and equity research. Algorithms can “read” economists’ publications and equity research reports and evaluate whether those narratives were effective in anticipating inflation, GDP growth, or stock price movements.
This research extends NLP’s applications to fixed income markets. I used 4,000 days of Brazilian yield curve data, most with 16 vertices, along with 273 sets of Brazilian central bank minutes (“Atas do COPOM”) available since 2000. The objective is to build a machine learning model that reads each set of minutes, maps its most frequent words, compares them with past minutes, and estimates the probability that the next yield curve movement will be a butterfly, bear flattening, humpback, or another standard configuration.
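The step from word counts to movement probabilities can be illustrated with a from-scratch multinomial Naïve Bayes classifier, one of the algorithms tested in this study. This is a toy sketch rather than the study's implementation: the class name, the add-one smoothing, and the three tiny training documents are all hypothetical and serve only to show the mechanics.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial Naive Bayes over word counts, with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict_proba(self, tokens):
        n_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            score = math.log(n / n_docs)  # log prior
            for w in tokens:
                if w in self.vocab:       # words never seen in training are ignored
                    smoothed = (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                    score += math.log(smoothed)
            log_scores[label] = score
        # Normalize in a numerically stable way to obtain probabilities.
        peak = max(log_scores.values())
        weights = {k: math.exp(v - peak) for k, v in log_scores.items()}
        norm = sum(weights.values())
        return {k: v / norm for k, v in weights.items()}

# Invented token lists, for illustration only; not drawn from actual minutes.
docs = [["inflation", "rising"], ["inflation", "risks"], ["growth", "easing"]]
labels = ["bear flattening", "bear flattening", "bull flattening"]

model = TinyNaiveBayes().fit(docs, labels)
probs = model.predict_proba(["inflation"])
print({k: round(v, 3) for k, v in probs.items()})
```

With this toy data, a document containing only "inflation" receives a probability of 14/17 (about 0.824) for bear flattening. With real minutes, the inputs would be the cleaned tokens from each document and the labels would be the movement observed afterward.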
Empirical Findings from the Brazilian Case Study
The model produced several observable patterns in both market behavior and language structure. These findings illustrate how text-based signals align with subsequent yield curve movements.
Market Structure and Curve Dynamics
First, short-term volatility in the Brazilian fixed income market is higher than long-term volatility. This contrasts with traditional theory and suggests that, in emerging markets, investors react more strongly to short-term news and policy signals. Long-term instruments appear to trade with comparatively lower volatility, reflecting the dominance of institutional investors at longer maturities.
In addition, 84% of daily yield curve movements fall into four of the 11 standard configurations identified in the literature, with parallel upward and parallel downward shifts among the most frequent. This is consistent with the short-term volatility noted above. The concentration highlights the importance of correctly classifying a small set of dominant curve dynamics.
Extracting Signal from Language
To prepare the text data, common words such as “committee,” “scenario,” “billions,” and “prices” were removed as stop words, as they do not contribute to classification. Word frequencies were then mapped for each yield curve movement category, allowing comparison of language patterns across different curve configurations.
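The preparation step described above can be sketched with the Python standard library. The stop-word set contains only the four words named in the text, and the sample snippets are invented for illustration; the function names are hypothetical and the study's actual preprocessing may differ.

```python
import re
from collections import Counter, defaultdict

# The four stop words named in the text; a real list would be much longer.
STOP_WORDS = {"committee", "scenario", "billions", "prices"}

def tokenize(text):
    """Lowercase the text, keep runs of letters (accents included), drop stop words."""
    words = re.findall(r"[^\W\d_]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def frequencies_by_movement(labelled_minutes):
    """Accumulate word counts separately for each curve-movement label."""
    counts = defaultdict(Counter)
    for text, movement in labelled_minutes:
        counts[movement].update(tokenize(text))
    return counts

# Invented snippets, for illustration only; not quotes from actual minutes.
sample = [
    ("The Committee sees inflation risks rising in September.", "bear flattening"),
    ("Inflation expectations rose; the Committee raised the rate.", "bear flattening"),
    ("Prices eased in January and the scenario improved.", "bull flattening"),
]

freq = frequencies_by_movement(sample)
print(freq["bear flattening"].most_common(2))
```

In this toy sample the most frequent remaining word is "the", which shows why a practical stop-word list also needs ordinary function words in addition to domain terms.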
Seasonality in Curve Movements
When examining the language associated with specific movements, a seasonal pattern emerged. For example, bear flattening movements were frequently associated with references to August, September, and October, while bull flattening movements were more often linked to January, February, and March. A chi-squared test provided statistical evidence of seasonality across several yield curve movements.
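A chi-squared test of independence compares observed counts with the counts expected if movement type and time of year were unrelated. The sketch below computes the statistic by hand on made-up counts. The table layout, the numbers, and the choice of a 5% significance level are assumptions for illustration only and are not the study's data or its exact test design.

```python
def chi_squared_statistic(table):
    """Return (statistic, degrees_of_freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof

# Made-up counts for illustration only (not the study's data).
# Rows: bear flattening, bull flattening.
# Columns: January-March, August-October, all other months.
counts = [
    [10, 30, 20],
    [30, 10, 20],
]

stat, dof = chi_squared_statistic(counts)
CRITICAL_5PCT_DOF2 = 5.991  # chi-squared critical value for 2 degrees of freedom at 5%
print(f"chi2 = {stat:.2f}, dof = {dof}, reject independence: {stat > CRITICAL_5PCT_DOF2}")
```

For these invented counts the statistic is 20.00 with 2 degrees of freedom, well above the 5% critical value. In practice a library routine such as scipy.stats.chi2_contingency would also return the p-value.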
Model Performance
Four model configurations were tested: Naïve Bayes, Logistic Regression, and Random Forest with and without PCA. Model performance was evaluated using Accuracy, F1 score, Cohen’s Kappa, and Log Loss. Random Forest without PCA produced the strongest results. Its predictive accuracy was materially higher than that of discretionary interpretation, indicating that systematic text analysis can extract signal from central bank communication beyond subjective reading of the minutes.
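The four evaluation metrics can be written out in a few lines each, which makes clear what they measure. The following is a minimal pure-Python sketch on invented labels and probabilities; the macro averaging of F1 is an assumption, since the text does not say which averaging was used, and none of the numbers are the study's results.

```python
import math
from collections import Counter

def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def cohen_kappa(y_true, y_pred):
    """Agreement in excess of what the label frequencies alone would produce."""
    n = len(y_true)
    observed = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    return (observed - expected) / (1.0 - expected)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes present in y_true."""
    scores = []
    for c in set(y_true):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores.append(2 * tp / (2 * tp + fp + fn) if tp else 0.0)
    return sum(scores) / len(scores)

def log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = sum(math.log(max(p.get(t, 0.0), eps)) for t, p in zip(y_true, probabilities))
    return -total / len(y_true)

# Invented labels and predicted probabilities, for illustration only.
y_true = ["up", "up", "down", "down"]
y_pred = ["up", "down", "down", "down"]
y_prob = [
    {"up": 0.8, "down": 0.2},
    {"up": 0.4, "down": 0.6},
    {"up": 0.1, "down": 0.9},
    {"up": 0.3, "down": 0.7},
]

print(accuracy(y_true, y_pred), cohen_kappa(y_true, y_pred))
print(round(macro_f1(y_true, y_pred), 4), round(log_loss(y_true, y_prob), 4))
```

On this toy sample accuracy is 0.75 while Cohen's Kappa is 0.5, because half of the agreement would be expected by chance given the label frequencies. That gap is why Kappa and Log Loss are useful alongside accuracy when a few movement classes dominate. Equivalent functions are available in scikit-learn's metrics module.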
Extensions and Implications
The framework can be extended in several ways. Future work may explore improved class balancing techniques, alternative algorithms such as SVM or XGBoost, cross-validation procedures, or richer language embeddings including Word2Vec and BERT.
While these refinements may enhance predictive performance, the central finding remains: central bank communication contains quantifiable information about subsequent yield curve movements. In markets where policy signals materially influence expectations, systematic text analysis offers a structured complement to discretionary interpretation.
Data science does not replace judgment. It provides a disciplined way to extract meaning from complex and noisy information. The Brazilian case study illustrates how this approach can be applied to fixed income markets.


