Researchers using machine learning on corporate disclosures rely mostly on hand-crafted sentiment indices and conventional supervised models rather than embedding-based or end-to-end deep learning, while studies are concentrated geographically and lack common benchmarks — a fragmentation that constrains robust conclusions about how disclosure tone predicts firm financial outcomes.
Abstract
Machine learning methods have been widely used to predict stock prices using technical indicators and sentiment features, mostly extracted from social media and news. However, less attention has been given to how sentiment-based textual features obtained from corporate reports are integrated into machine learning pipelines to predict firms' financial outcomes. To examine this issue, we conducted a systematic review of 42 studies published between 2014 and 2025. The review examines how datasets are constructed, how sentiment representations are defined, and how predictive models combine textual features with financial variables. Most studies focus on the U.S. stock market and rely on feature-engineered sentiment indices derived from lexicons or sentence-level classification. Regression-based and other supervised learning approaches remain dominant, while embedding-based representations and end-to-end deep learning architectures appear only sporadically. The literature also reveals constraints, including challenges in processing long financial documents, limited availability of labeled datasets, and strong geographic and linguistic concentration. In addition, the review identifies highly heterogeneous modeling approaches with limited convergence toward shared benchmark tasks. These findings highlight research opportunities for machine learning applications in finance and for the development of sentiment-based corporate disclosure analytics.
Summary
Main Finding
Sentiment-based textual features extracted from corporate reports are underutilized and unevenly integrated into machine learning pipelines for predicting firm financial outcomes. Across 42 studies (2014–2025), most work relies on engineered sentiment indices (lexicons or sentence-level labels) combined with traditional supervised models, while embedding-based representations and end-to-end deep learning are rare. The literature is U.S.-centric, constrained by long-document processing, scarce labeled data, and a lack of shared benchmark tasks, producing heterogeneous approaches and limited convergence.
Key Points
- Scope: Systematic review of 42 studies published 2014–2025 investigating how corporate-report sentiment is used to predict financial outcomes.
- Geographic/language concentration: Predominantly focused on the U.S. stock market and English-language disclosures.
- Sentiment representations:
  - Dominant: Feature-engineered sentiment indices (lexicon counts, polarity scores) and sentence-level classification aggregated into features.
  - Rare: Embedding-based representations and end-to-end deep learning architectures.
- Modeling approaches:
  - Predominant use of regression and other supervised learning (e.g., tree-based models).
  - Limited use of deep learning architectures that directly consume raw text.
- Data issues and constraints:
  - Difficulty processing long financial documents (length, structure, and noise).
  - Limited availability of labeled datasets for supervised NLP tasks in corporate disclosures.
  - Heterogeneous datasets and tasks with few shared benchmarks, hindering comparability.
- Resulting landscape: Fragmented methods with weak standardization and limited methodological convergence.
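As a rough illustration of the lexicon-based polarity scores mentioned in the key points, the sketch below counts positive and negative words and normalizes the difference. The word lists, the function name `lexicon_polarity`, and the sample sentence are assumptions made for this example; they are not drawn from the review or from any published finance lexicon.

```python
import re

# Toy word lists for illustration only (hypothetical, not a published lexicon).
POSITIVE = {"growth", "improved", "strong", "gain"}
NEGATIVE = {"loss", "decline", "impairment", "weak"}


def lexicon_polarity(text: str) -> float:
    """Return (pos - neg) / (pos + neg), or 0.0 when no lexicon word occurs."""
    # Lowercase the text and keep alphabetic runs as tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    pos = sum(token in POSITIVE for token in tokens)
    neg = sum(token in NEGATIVE for token in tokens)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


sample = "Revenue growth was strong despite a decline in margins."
print(lexicon_polarity(sample))
```

The score lies between -1 and 1. Normalizing by the number of matched words rather than by document length is one of several possible conventions, and the choice affects comparability across documents of different lengths.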
Data & Methods
- Review design: Systematic literature review examining dataset construction, sentiment representation choices, and how textual features are integrated with financial variables for prediction.
- Sample: 42 empirical studies (2014–2025).
- Typical data sources in reviewed studies:
  - Corporate filings and reports (e.g., 10-Ks, annual reports), investor presentations, MD&A text.
  - Financial variables (returns, volatility, accounting outcomes) used alongside textual features.
- Feature engineering practices:
  - Lexicon-based scores (e.g., positive/negative word counts, bag-of-words polarity).
  - Sentence-level classifiers producing aggregated sentiment indices.
  - Sparse adoption of modern text embeddings (word/sentence/document vectors) and transformer-based encodings.
- Modeling pipelines:
  - Most pipelines combine engineered textual features with numeric financial covariates and feed these to regressions or tree-based models.
  - Few studies employ end-to-end models that jointly learn text representations and predictive mappings.
- Methodological gaps identified:
  - Few labeled corpora for disclosure sentiment and outcome-specific annotation.
  - Limited treatment of document structure (hierarchical/sectional modeling), temporal alignment, and cross-firm generalization.
  - Absence of widely adopted benchmark tasks, metrics, and datasets for disclosure-driven prediction.
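The typical pipeline described above (sentence-level labels aggregated into an index, combined with a financial covariate, and passed to a regression) might look roughly like the following. This is a minimal sketch, not a method taken from any reviewed study: the five firms, their sentence labels, the lagged-ROA covariate, and the outcome are synthetic, and the outcome is constructed to be exactly linear so that the fit is easy to check. All function names are hypothetical.

```python
from typing import List, Sequence


def sentiment_index(sentence_labels: Sequence[int]) -> float:
    """Aggregate sentence-level labels (+1, 0, -1) into a document-level mean."""
    return sum(sentence_labels) / len(sentence_labels) if sentence_labels else 0.0


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular; no singularity check is performed.
    """
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(rows: List[List[float]], y: List[float]) -> List[float]:
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    x = [[1.0] + list(r) for r in rows]
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic inputs (assumed): sentence labels per report and a lagged ROA.
labels_per_firm = [[1, 1, 0, -1], [-1, -1, 0, 0], [1, 1, 1, 0],
                   [0, 0, 0, 0], [1, -1, -1, -1]]
lagged_roa = [0.04, 0.02, 0.01, 0.06, 0.05]

# Feature rows: [engineered sentiment index, financial covariate].
features = [[sentiment_index(lab), roa]
            for lab, roa in zip(labels_per_firm, lagged_roa)]
# Synthetic outcome, linear by construction so the fit is easy to verify.
outcome = [0.01 + 0.05 * s + 0.5 * r for s, r in features]

coefficients = fit_ols(features, outcome)
print([round(c, 6) for c in coefficients])
```

In practice a study would substitute real filings, a trained sentence classifier, and a library regression or tree-based model. The pure-Python solver here only keeps the sketch dependency-free.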
Implications for AI Economics
- Research priorities:
  - Create labeled, multilingual corpora and widely shared benchmark tasks for corporate-disclosure sentiment and outcome prediction to enable comparability and reproducibility.
  - Develop methods for long-document processing tailored to financial reports (hierarchical transformers, selective attention/summary-first pipelines, retrieval-augmented models).
  - Explore and evaluate embedding-based and end-to-end architectures versus engineered features, including robustness and interpretability.
- Practical implications:
  - Better textual-sentiment integration could improve short- and medium-term firm outcome forecasting, risk assessment, and investor decision-support tools.
  - Tools that reliably extract disclosure sentiment could assist regulators and auditors in monitoring disclosure quality and detecting misreporting or risk signals.
- Policy and market-structure considerations:
  - U.S.-centric evidence limits generalizability; cross-country and multi-language work is needed to assess market-structure-dependent effects.
  - Standardized evaluation protocols would help quantify the marginal value of disclosure sentiment over traditional financial covariates.
- Broader AI-economics opportunities:
  - Combining causal inference with NLP to separate informative disclosure signals from managerial tone manipulation.
  - Investigating how disclosure-driven sentiment interacts with market efficiency, investor attention, and algorithmic trading strategies.
  - Examining fairness, manipulation risk, and regulatory implications of deploying automated disclosure-analytics in investment and compliance workflows.
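To make the long-document problem concrete, here is a simple chunk-then-aggregate baseline: the report is split into overlapping token windows, each window is scored separately, and the scores are combined with a length-weighted mean. This is a much simpler stand-in for the hierarchical or retrieval-based methods named in the research priorities, not an implementation of them. The window size, the overlap, and the toy scorer are assumptions for illustration.

```python
from typing import Callable, List


def split_into_windows(tokens: List[str], size: int, overlap: int) -> List[List[str]]:
    """Split tokens into windows of at most `size` tokens sharing `overlap` tokens."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    windows = []
    for start in range(0, len(tokens), step):
        windows.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return windows


def document_score(tokens: List[str],
                   score_window: Callable[[List[str]], float],
                   size: int = 4,
                   overlap: int = 2) -> float:
    """Length-weighted mean of per-window scores; 0.0 for an empty document."""
    windows = split_into_windows(tokens, size, overlap)
    total = sum(len(w) for w in windows)
    if total == 0:
        return 0.0
    return sum(score_window(w) * len(w) for w in windows) / total


def toy_score(window: List[str]) -> float:
    """Hypothetical scorer: share of 'gain' tokens minus share of 'loss' tokens."""
    return (window.count("gain") - window.count("loss")) / len(window)


report_tokens = "gain loss gain gain loss loss gain gain gain loss".split()
print(len(split_into_windows(report_tokens, 4, 2)))
print(document_score(report_tokens, toy_score))
```

Tokens inside an overlap contribute to two windows, so interior text is weighted more heavily than the first and last tokens. Whether that is acceptable depends on the task, and section-aware splitting may be preferable for structured filings.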
Assessment
Claims (10)
| Claim | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|
| Machine learning methods have been widely used to predict stock prices using technical indicators and sentiment features, mostly extracted from social media and news. (Other) | positive | medium | stock price prediction | n=42; many ML studies predict stock prices using technical indicators and sentiment (social media/news); 0.02 |
| Less attention has been given to how sentiment-based textual features obtained from corporate reports are integrated into machine learning pipelines to predict firms' financial outcomes. (Other) | negative | medium | prediction of firms' financial outcomes (e.g., stock returns, earnings) | n=42; few studies integrate corporate-report sentiment into ML pipelines for firm financial outcomes; 0.02 |
| We conducted a systematic review of 42 studies published between 2014 and 2025. (Other) | positive | high | characteristics of the reviewed study corpus (number and date-range of studies) | n=42; systematic review sample size: 42 studies (2014-2025); 0.04 |
| Most studies focus on the U.S. stock market. (Other) | positive | medium | geographic focus of empirical studies (U.S. market prevalence) | n=42; majority of reviewed studies focus on the U.S. stock market; 0.02 |
| The reviewed studies rely on feature-engineered sentiment indices derived from lexicons or sentence-level classification. (Other) | positive | medium | type of sentiment representation used (lexicon-based indices, sentence-level classification) | n=42; reviewed studies commonly use lexicon-based or sentence-level sentiment features; 0.02 |
| Regression-based and other supervised learning approaches remain dominant. (Other) | positive | medium | modeling approach prevalence (regression / supervised learning) | n=42; regression-based and other supervised learning approaches dominate the literature; 0.02 |
| Embedding-based representations and end-to-end deep learning architectures appear only sporadically. (Other) | negative | medium | use of embedding representations and end-to-end deep learning | n=42; embedding-based representations and end-to-end deep learning appear only sporadically; 0.02 |
| The literature reveals constraints, including challenges in processing long financial documents, limited availability of labeled datasets, and strong geographic and linguistic concentration. (Other) | negative | medium | reported methodological and data limitations (document processing difficulty, dataset labeling scarcity, geographic/linguistic concentration) | n=42; literature reports constraints: long-document processing challenges, limited labeled datasets, geographic/linguistic concentration; 0.02 |
| The review identifies highly heterogeneous modeling approaches with limited convergence toward shared benchmark tasks. (Other) | negative | medium | degree of methodological heterogeneity and convergence on benchmark tasks | n=42; high methodological heterogeneity and limited convergence on shared benchmark tasks; 0.02 |
| These findings highlight research opportunities for machine learning applications in finance and for the development of sentiment-based corporate disclosure analytics. (Other) | positive | medium | identification of research opportunities and directions (not an empirical outcome) | identified research opportunities for ML in finance and sentiment-based corporate disclosure analytics; 0.02 |