The Commonplace

Researchers using machine learning on corporate disclosures rely mostly on hand-crafted sentiment indices and conventional supervised models rather than embedding-based or end-to-end deep learning, while studies are concentrated geographically and lack common benchmarks — a fragmentation that constrains robust conclusions about how disclosure tone predicts firm financial outcomes.

Machine Learning for Sentiment-Based Corporate Disclosure Analytics: A Systematic Review of Data, Sentiment Representations, and Predictive Models
Ramon Abilio, Guilherme Palermo Coelho, Ana Estela Antunes da Silva · March 12, 2026
openalex · review_meta · evidence: n/a · relevance: 7/10
This PRISMA systematic review of 42 studies (2014–2025) finds that sentiment-based corporate disclosure analytics is dominated by lexicon- and sentence-level engineered sentiment indices and regression/supervised models, with limited use of embedding-based representations, end-to-end deep learning, long-document architectures, labeled datasets, and shared benchmarking across studies.

Abstract

Machine learning methods have been widely used to predict stock prices using technical indicators and sentiment features, mostly extracted from social media and news. However, less attention has been given to how sentiment-based textual features obtained from corporate reports are integrated into machine learning pipelines to predict firms' financial outcomes. To examine this issue, we conducted a systematic review of 42 studies published between 2014 and 2025. The review examines how datasets are constructed, how sentiment representations are defined, and how predictive models combine textual features with financial variables. Most studies focus on the U.S. stock market and rely on feature-engineered sentiment indices derived from lexicons or sentence-level classification. Regression-based and other supervised learning approaches remain dominant, while embedding-based representations and end-to-end deep learning architectures appear only sporadically. The literature also reveals constraints, including challenges in processing long financial documents, limited availability of labeled datasets, and strong geographic and linguistic concentration. In addition, the review identifies highly heterogeneous modeling approaches with limited convergence toward shared benchmark tasks. These findings highlight research opportunities for machine learning applications in finance and for the development of sentiment-based corporate disclosure analytics.

Summary

Main Finding

Sentiment-based textual features extracted from corporate reports are underutilized and unevenly integrated into machine learning pipelines for predicting firm financial outcomes. Across 42 studies (2014–2025), most work relies on engineered sentiment indices (lexicons or sentence-level labels) combined with traditional supervised models, while embedding-based representations and end-to-end deep learning are rare. The literature is U.S.-centric and constrained by the difficulty of processing long documents, scarce labeled data, and a lack of shared benchmark tasks, which leaves approaches heterogeneous and convergence limited.

Key Points

  • Scope: Systematic review of 42 studies published 2014–2025 investigating how corporate-report sentiment is used to predict financial outcomes.
  • Geographic/language concentration: Predominantly focused on the U.S. stock market and English-language disclosures.
  • Sentiment representations:
    • Dominant: Feature-engineered sentiment indices (lexicon counts, polarity scores) and sentence-level classification aggregated into features.
    • Rare: Embedding-based representations and end-to-end deep learning architectures.
  • Modeling approaches:
    • Predominant use of regression and other supervised learning (e.g., tree-based models).
    • Limited use of deep learning architectures that directly consume raw text.
  • Data issues and constraints:
    • Difficulty processing long financial documents (length, structure, and noise).
    • Limited availability of labeled datasets for supervised NLP tasks in corporate disclosures.
    • Heterogeneous datasets and tasks with few shared benchmarks, hindering comparability.
  • Resulting landscape: Fragmented methods with weak standardization and limited methodological convergence.
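
To make the two dominant sentiment representations concrete, here is a minimal sketch of a lexicon-based tone score and a sentence-level index aggregated into a single feature. The word lists, function names, and sample text are hypothetical and purely illustrative; the reviewed studies use curated finance lexicons or trained sentence classifiers, and the sign-of-tone rule below only stands in for such a classifier.

```python
import re

# Hypothetical toy lexicon; real studies rely on curated finance word lists.
POSITIVE = {"growth", "improved", "strong", "gain", "profit"}
NEGATIVE = {"loss", "decline", "weak", "impairment", "litigation"}


def tokenize(text):
    """Lowercase a text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def lexicon_tone(text):
    """Net tone (positive - negative) / (positive + negative), in [-1, 1]."""
    tokens = tokenize(text)
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def sentence_level_index(text):
    """Label each sentence by the sign of its tone, then average the labels."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    labels = []
    for sentence in sentences:
        tone = lexicon_tone(sentence)
        labels.append(1 if tone > 0 else -1 if tone < 0 else 0)
    return sum(labels) / len(labels) if labels else 0.0


sample = (
    "Revenue growth was strong this year. "
    "We recorded a loss in one segment. "
    "The board met four times."
)
print(lexicon_tone(sample))          # document-level word-count tone
print(sentence_level_index(sample))  # aggregated sentence-level labels
```

On the sample text the two measures disagree: the word-count tone is positive because two positive words outweigh one negative word, while the sentence-level index is neutral because one positive and one negative sentence cancel. This is one reason the choice of representation matters for comparability across studies.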

Data & Methods

  • Review design: Systematic literature review examining dataset construction, sentiment representation choices, and how textual features are integrated with financial variables for prediction.
  • Sample: 42 empirical studies (2014–2025).
  • Typical data sources in reviewed studies:
    • Corporate filings and reports (e.g., 10-Ks, annual reports), investor presentations, MD&A text.
    • Financial variables (returns, volatility, accounting outcomes) used alongside textual features.
  • Feature engineering practices:
    • Lexicon-based scores (e.g., positive/negative word counts, bag-of-words polarity).
    • Sentence-level classifiers producing aggregated sentiment indices.
    • Sparse adoption of modern text embeddings (word/sentence/document vectors) and transformer-based encodings.
  • Modeling pipelines:
    • Most pipelines combine engineered textual features with numeric financial covariates and feed these to regressions or tree-based models.
    • Few studies employ end-to-end models that jointly learn text representations and predictive mappings.
  • Methodological gaps identified:
    • Few labeled corpora for disclosure sentiment and outcome-specific annotation.
    • Limited treatment of document structure (hierarchical/sectional modeling), temporal alignment, and cross-firm generalization.
    • Absence of widely adopted benchmark tasks, metrics, and datasets for disclosure-driven prediction.
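
The typical pipeline described above (an engineered sentiment feature joined with numeric financial covariates and passed to a regression) can be sketched as follows. This is an illustration under stated assumptions, not a method taken from any reviewed study: the data are synthetic, the variable names (`tone`, `leverage`, `next_return`) and the coefficients used to generate the target are hypothetical, and the fit is a plain ordinary least squares solved through the normal equations.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    """Intercept plus the weighted sum of the features."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def mse(coefs, rows, targets):
    """Mean squared error of the fitted model on the given rows."""
    return sum((predict(coefs, r) - y) ** 2 for r, y in zip(rows, targets)) / len(targets)


# Synthetic firm-period observations (hypothetical values).
tone = [0.4, -0.2, 0.1, 0.6, -0.5, 0.0]      # engineered sentiment feature
leverage = [0.3, 0.5, 0.2, 0.4, 0.6, 0.1]    # numeric financial covariate
# Assumed data-generating process, chosen only so the example is checkable.
next_return = [0.02 + 0.05 * t - 0.03 * l for t, l in zip(tone, leverage)]

baseline_rows = [[l] for l in leverage]          # financial covariate only
augmented_rows = list(zip(tone, leverage))       # sentiment + covariate

baseline = fit_ols(baseline_rows, next_return)
augmented = fit_ols(augmented_rows, next_return)

print("baseline MSE :", mse(baseline, baseline_rows, next_return))
print("augmented MSE:", mse(augmented, augmented_rows, next_return))
print("augmented coefficients:", augmented)
```

Because the target was generated from the tone feature, the augmented model fits better by construction; the comparison only illustrates the mechanics. A real evaluation would compare the two models out of sample and respect temporal alignment between the disclosure date and the outcome window.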

Implications for AI Economics

  • Research priorities:
    • Create labeled, multilingual corpora and widely shared benchmark tasks for corporate-disclosure sentiment and outcome prediction to enable comparability and reproducibility.
    • Develop methods for long-document processing tailored to financial reports (hierarchical transformers, selective attention/summary-first pipelines, retrieval-augmented models).
    • Explore and evaluate embedding-based and end-to-end architectures versus engineered features, including robustness and interpretability.
  • Practical implications:
    • Better textual-sentiment integration could improve short- and medium-term firm outcome forecasting, risk assessment, and investor decision-support tools.
    • Tools that reliably extract disclosure sentiment could assist regulators and auditors in monitoring disclosure quality and detecting misreporting or risk signals.
  • Policy and market-structure considerations:
    • U.S.-centric evidence limits generalizability—cross-country and multi-language work is needed to assess market-structure-dependent effects.
    • Standardized evaluation protocols would help quantify the marginal value of disclosure sentiment over traditional financial covariates.
  • Broader AI-economics opportunities:
    • Combining causal inference with NLP to separate informative disclosure signals from managerial tone manipulation.
    • Investigating how disclosure-driven sentiment interacts with market efficiency, investor attention, and algorithmic trading strategies.
    • Examining fairness, manipulation risk, and regulatory implications of deploying automated disclosure-analytics in investment and compliance workflows.
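
One simple way to approach the long-document problem named in the research priorities is to split a report into overlapping windows, score each window, and aggregate the window scores. The sketch below shows only that chunk-then-aggregate pattern; it is not a hierarchical transformer or a retrieval-augmented model, and `score_chunk` is a hypothetical stand-in for whatever window-level model (lexicon scorer, sentence classifier, or encoder with a fixed context length) a study would plug in.

```python
def chunk_tokens(tokens, size, overlap):
    """Split a token list into windows of `size` tokens that overlap by `overlap`."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # the last window already reaches the end of the document
    return chunks


def aggregate(chunks, score_chunk):
    """Length-weighted mean of per-chunk scores (0.0 for an empty document)."""
    total = sum(len(c) for c in chunks)
    if total == 0:
        return 0.0
    return sum(score_chunk(c) * len(c) for c in chunks) / total


def toy_score(chunk):
    """Hypothetical scorer: share of 'good' tokens minus share of 'bad' tokens."""
    return (chunk.count("good") - chunk.count("bad")) / len(chunk)


document = "good results offset a bad quarter and good cash flow".split()
windows = chunk_tokens(document, size=4, overlap=2)
print(len(windows), "windows")
print(aggregate(windows, toy_score))
```

Tokens inside the overlap are counted in two windows, so they carry more weight in the aggregate than tokens at the document edges. Whether that is acceptable, or whether section-aware splitting (for example, scoring MD&A separately) is preferable, is a design choice the sketch leaves open.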

Assessment

Paper Type: review_meta

Evidence Strength: n/a. This is a systematic review synthesizing prior empirical studies rather than producing new causal estimates; it does not itself implement an identification strategy for causal inference.

Methods Rigor: medium. The review follows a PRISMA protocol, searches six major databases, and extracts detailed metadata (datasets, sentiment representations, inputs/outputs), which indicates systematic procedures; however, the search strategy and eligibility choices introduce limitations (English-only, specific keyword formulation that excluded some common phrases, potential database coverage gaps), and study heterogeneity limits the ability to aggregate findings.

Sample: Systematic review of 42 primary studies published 2014–2025, identified from 1,293 retrieved records (1,064 after deduplication) across six information sources (ACL Anthology, ACM Digital Library, IEEE Xplore, ScienceDirect, Scopus, Web of Science); extracted metadata include document types (annual/quarterly reports, MD&A, CEO letters, earnings press releases, transcripts), market coverage (mostly U.S., some China, Vietnam, India, Japan), 404 input variables, 43 financial targets, and 1,006 input–output modeling relationships documented; analysis limited to English-language, peer-reviewed articles and conference papers and excludes studies focused on social media/news/analyst reports or stock-price prediction.

Themes: innovation adoption governance

Generalizability:
  • Restricted to corporate disclosures (annual/quarterly reports, MD&A, press releases, transcripts) and publicly listed firms; does not cover social media, analyst reports, or news-based sentiment.
  • English-language only; may miss substantial non-English literature.
  • Strong geographic concentration (many studies focus on the U.S.; some on China and other single countries), limiting cross-country generalization.
  • Temporal scope limited to studies published 2014–2025 and datasets used by those studies; newer techniques or datasets post-2025 are not covered.
  • Heterogeneous modeling tasks, targets, and evaluation practices across included studies impede meta-analytic aggregation and transferability.
  • Potential publication and selection bias: limited databases and keyword choices could omit relevant work (e.g., papers using different terminology).
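
The record counts reported in the Sample description imply a few derived quantities that help gauge how selective the screening was. The short calculation below uses only the numbers stated above; the per-study average is a rough indicator that assumes the 1,006 relationships are a total across all 42 studies.

```python
retrieved = 1293        # records retrieved from the six sources
deduplicated = 1064     # records remaining after deduplication
included = 42           # studies in the final review
relationships = 1006    # documented input-output modeling relationships

duplicates_removed = retrieved - deduplicated
inclusion_share = included / deduplicated
relationships_per_study = relationships / included

print(f"duplicates removed: {duplicates_removed}")
print(f"share of deduplicated records included: {inclusion_share:.1%}")
print(f"average relationships per study: {relationships_per_study:.1f}")
```

Roughly one in twenty-five deduplicated records survived screening, which is consistent with the narrow eligibility criteria noted under Methods Rigor.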

Claims (10)

  • Machine learning methods have been widely used to predict stock prices using technical indicators and sentiment features, mostly extracted from social media and news.
    • Other · Direction: positive · Confidence: medium
    • Outcome: stock price prediction
    • Details: n=42 · many ML studies predict stock prices using technical indicators and sentiment (social media/news) · 0.02
  • Less attention has been given to how sentiment-based textual features obtained from corporate reports are integrated into machine learning pipelines to predict firms' financial outcomes.
    • Other · Direction: negative · Confidence: medium
    • Outcome: prediction of firms' financial outcomes (e.g., stock returns, earnings)
    • Details: n=42 · few studies integrate corporate-report sentiment into ML pipelines for firm financial outcomes · 0.02
  • We conducted a systematic review of 42 studies published between 2014 and 2025.
    • Other · Direction: positive · Confidence: high
    • Outcome: characteristics of the reviewed study corpus (number and date-range of studies)
    • Details: n=42 · systematic review sample size: 42 studies (2014-2025) · 0.04
  • Most studies focus on the U.S. stock market.
    • Other · Direction: positive · Confidence: medium
    • Outcome: geographic focus of empirical studies (U.S. market prevalence)
    • Details: n=42 · majority of reviewed studies focus on the U.S. stock market · 0.02
  • The reviewed studies rely on feature-engineered sentiment indices derived from lexicons or sentence-level classification.
    • Other · Direction: positive · Confidence: medium
    • Outcome: type of sentiment representation used (lexicon-based indices, sentence-level classification)
    • Details: n=42 · reviewed studies commonly use lexicon-based or sentence-level sentiment features · 0.02
  • Regression-based and other supervised learning approaches remain dominant.
    • Other · Direction: positive · Confidence: medium
    • Outcome: modeling approach prevalence (regression / supervised learning)
    • Details: n=42 · regression-based and other supervised learning approaches dominate the literature · 0.02
  • Embedding-based representations and end-to-end deep learning architectures appear only sporadically.
    • Other · Direction: negative · Confidence: medium
    • Outcome: use of embedding representations and end-to-end deep learning
    • Details: n=42 · embedding-based representations and end-to-end deep learning appear only sporadically · 0.02
  • The literature reveals constraints, including challenges in processing long financial documents, limited availability of labeled datasets, and strong geographic and linguistic concentration.
    • Other · Direction: negative · Confidence: medium
    • Outcome: reported methodological and data limitations (document processing difficulty, dataset labeling scarcity, geographic/linguistic concentration)
    • Details: n=42 · literature reports constraints: long-document processing challenges, limited labeled datasets, geographic/linguistic concentration · 0.02
  • The review identifies highly heterogeneous modeling approaches with limited convergence toward shared benchmark tasks.
    • Other · Direction: negative · Confidence: medium
    • Outcome: degree of methodological heterogeneity and convergence on benchmark tasks
    • Details: n=42 · high methodological heterogeneity and limited convergence on shared benchmark tasks · 0.02
  • These findings highlight research opportunities for machine learning applications in finance and for the development of sentiment-based corporate disclosure analytics.
    • Other · Direction: positive · Confidence: medium
    • Outcome: identification of research opportunities and directions (not an empirical outcome)
    • Details: identified research opportunities for ML in finance and sentiment-based corporate disclosure analytics · 0.02

Notes