The Commonplace

Reconstructed sentiment from sparse AI news reliably precedes stock-price moves by roughly three weeks across multiple pipeline designs; stable, market-relevant sentiment indices require careful aggregation and causal reconstruction rather than only better classifiers.

Causal Reconstruction of Sentiment Signals from Sparse News Data
Stefania Anca Stan, Marzio Lunghi, Vito Vargetto, Cláudio Ricci, Rolands Repetto, Brayden Leo, Shao-Hong Gan · March 24, 2026 · arXiv (Cornell University)
openalex · descriptive · medium evidence · 7/10 relevance · Source PDF
The authors develop a three-stage causal reconstruction pipeline that converts sparse probabilistic news-sentiment outputs into stable temporal sentiment series, and show that the reconstructed signals consistently lead firm stock prices by about three weeks across tested configurations.

Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem. Rather than treating this as a classification challenge, we propose to frame it as a causal signal reconstruction problem: given probabilistic sentiment outputs from a fixed classifier, recover a stable latent sentiment series that is robust to the structural pathologies of news data such as sparsity, redundancy, and classifier uncertainty. We present a modular three-stage pipeline that (i) aggregates article-level scores onto a regular temporal grid with uncertainty-aware and redundancy-aware weights, (ii) fills coverage gaps through strictly causal projection rules, and (iii) applies causal smoothing to reduce residual noise. Because ground-truth longitudinal sentiment labels are typically unavailable, we introduce a label-free evaluation framework based on signal stability diagnostics, information preservation lag proxies, and counterfactual tests for causality compliance and redundancy robustness. As a secondary external check, we evaluate the consistency of reconstructed signals against stock-price data for a multi-firm dataset of AI-related news titles (November 2024 to February 2026). The key empirical finding is a three-week lead–lag pattern between reconstructed sentiment and price that persists across all tested pipeline configurations and aggregation regimes, a structural regularity more informative than any single correlation coefficient. Overall, the results support the view that stable, deployable sentiment indicators require careful reconstruction, not only better classifiers.

Summary

Main Finding

Reconstructing a causal temporal sentiment series from sparse, redundant, and uncertain news requires a dedicated three-stage pipeline (aggregation → causal gap-filling → causal smoothing). Using this approach on AI-related news (Nov 2024–Feb 2026), the authors find a robust structural regularity: the reconstructed sentiment series exhibits a persistent ~three-week lead/lag relationship with stock prices across pipeline configurations. The paper argues that reliable, deployable sentiment indicators depend as much on careful reconstruction as on classifier improvements.

Key Points

  • Problem framing: treat transforming article-level probabilistic sentiment outputs into a temporal signal as a causal signal-reconstruction problem rather than a pure classification task.
  • Three core reconstruction stages (strictly causal at each step):
    • Aggregation onto a regular time grid (weekly/monthly) with configurable weights.
    • Causal gap-filling via forward carry with optional staleness decay.
    • Causal smoothing (EWMA, Kalman variants, Beta–Binomial) to reduce residual noise.
  • Aggregation choices and enhancements:
    • Global vs. local (per-category) aggregation.
    • Uncertainty-aware weights: normalized entropy (Went), top-two margin (Wtop), polarity-conflict (Wpol).
    • Redundancy control via embedding-based grouping; weighting families:
      • Deduplication: ϕded(n) = n^(-α) (α in [0,1]).
      • Corroboration: ϕcor(n) = 1 for n=1, log(n) for n>1.
    • Intra-bin recency weight: causal exponential-type decay within bin.
    • Composite weight wp,i = Wunc · Wdup · Wtime (multiplicative).
  • Causal gap-filling: forward carry of last observed bin scaled by non-increasing staleness function g(Δ), options include constant carry-forward or finite-horizon linear decay.
  • Causal smoothing: strictly using past+present only; methods include EWMA, Kalman filters (and weighted variants that use article counts tk as proxy for observation variance), Beta–Binomial conjugate smoother.
  • Label-free evaluation framework (because no longitudinal ground truth):
    • Internal diagnostics: stability (e.g., total variation), smoothing-induced lag, behavior under gap-filling.
    • Counterfactual tests: impulse causality test (enforce/verify strict causal compliance), duplicate-injection test (robustness to redundancy).
    • External plausibility: consistency checks versus stock-price dynamics (secondary plausibility filter, not proof of causal effect).
  • Empirical regularity: across pipeline variants and aggregation regimes, sentiment shows a three-week lead–lag pattern with stock prices—presented as a structural regularity rather than a single predictive correlation.
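The weighting scheme sketched in the bullets above can be made concrete. The following is a minimal Python illustration under assumptions: the helper names are ours, and the composite uses the entropy family for W_unc (the margin and polarity families would slot in the same way):

```python
import math

def entropy_weight(p):
    """W_ent = 1 - normalized entropy of a class-probability vector."""
    h = -sum(q * math.log(q) for q in p if q > 0.0)
    return 1.0 - h / math.log(len(p))

def margin_weight(p):
    """W_top = gap between the two largest class probabilities."""
    top2 = sorted(p, reverse=True)[:2]
    return top2[0] - top2[1]

def dedup_weight(n, alpha=0.5):
    """Deduplication family: phi_ded(n) = n**(-alpha), alpha in [0, 1]."""
    return n ** (-alpha)

def corroboration_weight(n):
    """Corroboration family: phi_cor(n) = 1 for n == 1, log(n) for n > 1."""
    return 1.0 if n == 1 else math.log(n)

def composite_weight(p, cluster_size, recency=1.0, alpha=0.5, corroborate=False):
    """Multiplicative composite w = W_unc * W_dup * W_time."""
    w_unc = entropy_weight(p)  # illustrative choice of uncertainty family
    w_dup = (corroboration_weight(cluster_size) if corroborate
             else dedup_weight(cluster_size, alpha))
    return w_unc * w_dup * recency
```

A confident article (low entropy) sitting in a large duplicate cluster is down-weighted under deduplication but up-weighted under corroboration; which family fits depends on whether duplicates are treated as noise or as independent confirmation.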

Data & Methods

  • Data: multi-firm dataset of AI-related news titles collected by vocabulary-based scraping; evaluation period reported is November 2024 to February 2026. Each article has a timestamp, category (vocabulary), and a fixed classifier output probability vector π = (pos, neg, neu).
  • Scoring: scalar sentiment sp,i = πpos − πneg ∈ [−1, 1].
  • Aggregation:
    • Time binned into regular grid {D1..DT}. For bin Dk, aggregated value is typically a weighted mean of sp,i with weights wp,i.
    • Optional local (per-category then combine) vs. global aggregation.
  • Uncertainty weighting:
    • Went = 1 − normalized entropy over the 3-class output.
    • Wtop = difference between top-two probabilities (margin confidence).
    • Wpol = 1 − Upol(π) where Upol penalizes polar-conflict (high mass on pos and neg but conflicted).
  • Redundancy control:
    • Compute embeddings ep,i and cluster per (bank, bin) by similarity (connected components, hierarchical, etc.).
    • Assign duplication weight Wdup = ϕ(n) where n = cluster size; choose dedup (n^-α) or corroboration (log n) families depending on whether duplicates are noise or informative.
  • Gap-filling:
    • Forward carry with staleness decay g(Δ) (constant or finite-horizon linear decay).
  • Smoothing:
    • EWMA: S_t = α F_t + (1−α) S_{t−1}.
    • Kalman-family filters and weighted Kalman that incorporate tk (article count) as proxy for observation reliability (heteroscedasticity).
    • Beta–Binomial conjugate smoother as an alternative probabilistic smoother.
  • Evaluation:
    • Internal metrics: total variation, lag induced by smoother (information-preservation lag proxies).
    • Counterfactual tests: inject impulse/noise or duplicate documents to test strict causality and robustness.
    • External check: compare lead/lag relationships with stock prices across firms. The observed ~3-week lead/lag persisted across pipeline settings.
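Under the definitions above, the causal stages and the external lead/lag scan can be sketched as follows. This is a simplified Python sketch, not the authors' code: the Kalman and Beta–Binomial smoothers are omitted, and the lag-scan details are our assumption:

```python
import numpy as np

def causal_gap_fill(series, horizon=4):
    """Forward-carry the last observed bin, scaled by a finite-horizon
    linear staleness decay g(d) = max(0, 1 - d/horizon); NaN marks gaps."""
    out = np.asarray(series, dtype=float).copy()
    last, stale = np.nan, 0
    for t in range(len(out)):
        if np.isnan(out[t]):
            stale += 1
            if not np.isnan(last):
                out[t] = last * max(0.0, 1.0 - stale / horizon)
        else:
            last, stale = out[t], 0
    return out

def ewma(series, alpha=0.3):
    """Strictly causal smoother: S_t = alpha * F_t + (1 - alpha) * S_{t-1}."""
    f = np.asarray(series, dtype=float)
    s = np.empty_like(f)
    s[0] = f[0]
    for t in range(1, len(f)):
        s[t] = alpha * f[t] + (1 - alpha) * s[t - 1]
    return s

def best_lead(sentiment, returns, max_lag=6):
    """Lag (in bins) at which lagged sentiment is most correlated with
    returns; a positive result means sentiment leads."""
    best_lag, best_corr = 0, -np.inf
    for lag in range(max_lag + 1):
        x = sentiment[:len(sentiment) - lag] if lag else sentiment
        y = returns[lag:]
        c = abs(np.corrcoef(x, y)[0, 1])
        if c > best_corr:
            best_lag, best_corr = lag, c
    return best_lag
```

With weekly bins, the reported regularity would correspond to `best_lead` returning roughly 3.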

Implications for AI Economics

  • Practical signal engineering matters: For applications in market surveillance, technology diffusion tracking, or portfolio overlays, transforming per-article classifier outputs into a causal, stable time series is crucial. Improving classifier accuracy alone is insufficient.
  • Robust monitoring: Embedding-based deduplication or corroboration and uncertainty weighting meaningfully change the effective signal; choice should be dataset- and task-specific (e.g., local aggregation + deduplication when thematic coverage uneven).
  • Deployment constraints: Strict causality (no lookahead) must be enforced in real-time systems; gap-filling and smoothing choices introduce trade-offs between stability and responsiveness (and add predictable lag).
  • Predictive structure (caveats): The observed three-week structural lead–lag vs. prices suggests informational content in reconstructed news sentiment, but external consistency is only a plausibility check—not proof of causal market impact. Economic interpretation requires further causal identification and control for confounders.
  • Label-free evaluation value: In domains lacking ground-truth longitudinal sentiment labels, the proposed diagnostics and counterfactual tests offer a practical evaluation toolkit for comparing pipeline configurations and guarding against design-induced artifacts.
  • Research & policy: For AI-economics studies that use textual signals to study technology adoption, competition, or asset pricing, this work highlights (i) the need to report reconstruction choices and their induced lag/stability, and (ii) that structural regularities (e.g., consistent lead/lag) can be more informative than single correlation statistics.
  • Limitations to note: No ground-truth latent sentiment series is available; the external stock-price check is a secondary plausibility test; dataset specifics (vocabulary scraping, news sourcing) and parameter choices (aggregation window, clustering threshold, decay horizon) materially affect outcomes and should be validated per application.

Summary takeaway: when using news-derived sentiment in AI economics, treat temporal signal construction as a causal reconstruction problem—use uncertainty weighting, redundancy-aware aggregation, causal gap-filling, and causal smoothing, and validate pipelines with label-free diagnostics and counterfactual tests before interpreting market associations.
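The counterfactual impulse test mentioned above admits a simple generic formulation: perturb a single input bin and require that all outputs strictly before that bin are unchanged. A sketch (our simplified version, not the paper's exact procedure; `check_causality` is a hypothetical helper):

```python
import numpy as np

def check_causality(pipeline, series, t_impulse, eps=1.0):
    """Impulse test: bump the input at one bin and verify that pipeline
    outputs strictly before that bin are unchanged (no lookahead)."""
    base = np.asarray(pipeline(series), dtype=float)
    bumped = np.array(series, dtype=float)
    bumped[t_impulse] += eps
    pert = np.asarray(pipeline(bumped), dtype=float)
    return bool(np.allclose(base[:t_impulse], pert[:t_impulse]))
```

A trailing moving average passes this check; a centered moving average fails it, because its output at t−1 already depends on the input at t.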

Assessment

Paper Type: descriptive
Evidence Strength: medium — The pipeline is validated through extensive model-based diagnostics and a consistent three-week lead/lag pattern with stock prices across configurations, which supports internal robustness; however, there is no ground-truth sentiment series, no exogenous variation or instrumental source to identify causal effects on prices, and findings rely on correlations and choices about aggregation/weights and a fixed classifier, limiting causal claims and external validation.
Methods Rigor: medium — The paper presents a carefully structured, modular pipeline (uncertainty-aware weighting, redundancy handling, causal projection, causal smoothing) and introduces sensible label-free diagnostics and counterfactual tests, showing attention to engineering and validation; but rigor is constrained by lack of ground-truth labels, potential sensitivity to classifier calibration and hyperparameters, limited information about robustness to alternative classifiers or source selection, and absence of stronger identification strategies.
Sample: Multi-firm dataset of AI-related news titles (November 2024–February 2026) scored by a fixed probabilistic sentiment classifier at the article level, aggregated onto a regular temporal grid and compared to matched firm-level stock-price time series; the paper does not report exhaustive sample size or firm selection criteria in the summary provided.
Themes: innovation adoption
Identification: No external causal identification for economic effects; identification of a latent sentiment series is achieved by modeling choices: uncertainty- and redundancy-aware weighted aggregation of classifier probability outputs onto a regular temporal grid, strictly causal projection rules to fill gaps, and causal smoothing to reduce noise; validation uses label-free diagnostics (stability, information-lag proxies) and counterfactual-style tests plus correlation/lead-lag comparison with firm stock-price data.
Generalizability:
  • Short and recent time window (Nov 2024–Feb 2026) may reflect specific market regimes.
  • Restricted to AI-related news titles (domain-specific) and likely English-language / particular news sources.
  • Relies on a single fixed classifier — results may depend on classifier calibration and architecture.
  • Covers firms with public stock-price data; not directly generalizable to private firms or non-financial outcomes.
  • Aggregation and weighting choices may not transfer to contexts with different sparsity/redundancy patterns.

Claims (8)

Claim 1. "Sentiment signals derived from sparse news are commonly used in financial analysis and technology monitoring, yet transforming raw article-level observations into reliable temporal series remains a largely unsolved engineering problem."
  Outcome: Other · Direction: negative · Confidence: high
  Details: reliability of temporal sentiment series reconstructed from article-level news (0.03)

Claim 2. "Rather than treating this as a classification challenge, we propose to frame it as a causal signal reconstruction problem: given probabilistic sentiment outputs from a fixed classifier, recover a stable latent sentiment series that is robust to the structural pathologies of news data such as sparsity, redundancy, and classifier uncertainty."
  Outcome: Other · Direction: positive · Confidence: high
  Details: quality/stability of reconstructed latent sentiment series from classifier outputs (0.03)

Claim 3. "We present a modular three-stage pipeline that (i) aggregates article-level scores onto a regular temporal grid with uncertainty-aware and redundancy-aware weights, (ii) fills coverage gaps through strictly causal projection rules, and (iii) applies causal smoothing to reduce residual noise."
  Outcome: Other · Direction: positive · Confidence: high
  Details: method for producing stable temporal sentiment series (0.03)

Claim 4. "Because ground-truth longitudinal sentiment labels are typically unavailable, we introduce a label-free evaluation framework based on signal stability diagnostics, information preservation lag proxies, and counterfactual tests for causality compliance and redundancy robustness."
  Outcome: Other · Direction: positive · Confidence: high
  Details: evaluation of reconstructed sentiment signals without labeled longitudinal sentiment (0.03)

Claim 5. "As a secondary external check, we evaluate the consistency of reconstructed signals against stock-price data for a multi-firm dataset of AI-related news titles (November 2024 to February 2026)."
  Outcome: Firm Revenue · Direction: positive · Confidence: high
  Details: consistency (relationship) between reconstructed sentiment signals and stock prices (0.18)

Claim 6. "The key empirical finding is a three-week lead–lag pattern between reconstructed sentiment and price that persists across all tested pipeline configurations and aggregation regimes."
  Outcome: Firm Revenue · Direction: positive · Confidence: high
  Details: lead/lag interval between reconstructed sentiment and stock price (sentiment leads price by three weeks); magnitude: three-week lead–lag (0.18)

Claim 7. "This three-week lead–lag is a structural regularity more informative than any single correlation coefficient."
  Outcome: Decision Quality · Direction: positive · Confidence: high
  Details: informativeness of lead–lag structural regularity versus single correlation coefficients (0.18)

Claim 8. "Overall, the results support the view that stable, deployable sentiment indicators require careful reconstruction, not only better classifiers."
  Outcome: Decision Quality · Direction: positive · Confidence: high
  Details: reliability/deployability of sentiment indicators as a function of reconstruction method versus classifier improvements (0.18)

Notes