The Commonplace

Deep-learning models (LSTM and Transformer) outperform traditional linear and tree-based methods in predicting A-share returns and produce more robust long–short portfolios, according to an out-of-sample 2013–2024 study; the gains show up both in a finance-specific weighted evaluation index (WEI) and in tail-risk metrics, though real-world trading frictions could narrow profits.

Optimizing stock market prediction and stock trading strategies with deep learning models enhanced by nonlinear feature identification and robust prediction evaluation
Haoyu Wang, Dejun Xie, Yuqing Duan, Wenze Xiong, D. Z. Chen · May 08, 2026 · Financial Innovation
Source: OpenAlex · Type: correlational · Evidence: medium · Relevance: 7/10 · DOI · Source · PDF
In China’s A‑share market (2013–2024), deep-learning models—particularly LSTM and Transformer—outperform linear and tree-based benchmarks in cross-sectional return prediction and produce stronger WEI scores and long–short portfolio performance with improved tail-risk control.

This study investigates cross-sectional stock return prediction in the A-share market from 2013 to 2024, using equity and firm-characteristic data from databases such as RESSET and CSMAR for more than 5,000 listed firms. We introduce the Diff-RMSE method for nonlinear factor identification and the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability. Based on 30 market, liquidity, valuation, profitability, technical and risk factors, we compare linear models, tree-based machine learning and deep learning architectures—including GRU, LSTM and Transformer—within a rolling-window forecasting framework. We further translate return forecasts into long–short portfolios to assess economic performance. Our results show that deep learning models, particularly LSTM and Transformer, deliver superior accuracy, more stable WEI scores and stronger tail-risk control than traditional benchmarks. These findings provide practical guidance for quantitative portfolio managers and enrich the literature on machine-learning-based stock-selection models.

Summary

Main Finding

Deep learning models—particularly LSTM and Transformer—outperform linear and tree-based benchmarks in cross-sectional stock-return prediction on the China A‑share market (2013–2024). They deliver higher predictive accuracy, more stable finance-specific performance (WEI), stronger long–short portfolio returns, and better tail-risk control. Two new tools introduced—Diff‑RMSE (nonlinear feature identification) and the weighted evaluation index (WEI)—improve factor screening and model evaluation for stock-selection tasks.

Key Points

  • Sample and scope: Firm-level A‑share data (RESSET, CSMAR), 2013–2024, >5,000 listed firms; 30 predictors spanning market, liquidity, valuation, profitability, technical and risk dimensions.
  • New methodological tools:
    • Diff‑RMSE: model‑agnostic recursive feature‑elimination measure to quantify marginal and interaction contributions of factors across rolling windows (nonlinear factor identification).
    • WEI (Weighted Evaluation Index): a finance‑specific metric combining error magnitude and directional accuracy, capturing cross‑sectional asymmetry and market adaptability; complements RMSE/MAE and directional losses (MADL/GMADL).
  • Models compared: linear regressions and regularized linear models, tree‑based ML (e.g., gradient boosting), and deep architectures (GRU, LSTM, Transformer). SHAP used alongside Diff‑RMSE for interpretability.
  • Experimental design: rolling‑window forecasting to mimic real‑time deployment and reduce look‑ahead bias; forecasts translated into ranked long–short portfolios; economic performance assessed with risk‑adjusted statistics and tail‑risk measures.
  • Main empirical results:
    • LSTM and Transformer produce superior out‑of‑sample accuracy and higher, more stable WEI scores than linear/tree models.
    • Deep models generate materially stronger long–short returns and show improved tail‑risk control.
    • Diff‑RMSE helps identify nonlinear and interaction effects among the 30 candidate factors that conventional linear screening may miss.
  • Practical emphasis: attention mechanisms (transformers) and Shapley analyses help partially address interpretability concerns of deep models in finance.
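The Diff-RMSE idea described above can be sketched as a rolling-window, leave-one-factor-out RMSE comparison: a factor scores high if dropping it raises out-of-window forecast error. This is an illustrative reconstruction, not the paper's exact algorithm; the OLS stand-in model, window sizes, and synthetic data are all assumptions.

```python
import numpy as np

def _ols_forecast(X_tr, y_tr, X_te):
    """Fit OLS with intercept on the training window, predict the test window."""
    A_tr = np.column_stack([np.ones(len(X_tr)), X_tr])
    beta, *_ = np.linalg.lstsq(A_tr, y_tr, rcond=None)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta

def _rmse(X_tr, y_tr, X_te, y_te):
    return float(np.sqrt(np.mean((_ols_forecast(X_tr, y_tr, X_te) - y_te) ** 2)))

def diff_rmse(X, y, window=200, step=100):
    """Average out-of-window RMSE increase when each factor is dropped."""
    n_factors = X.shape[1]
    scores = np.zeros(n_factors)
    n_windows = 0
    for start in range(0, len(y) - 2 * window + 1, step):
        tr = slice(start, start + window)            # training window
        te = slice(start + window, start + 2 * window)  # evaluation window
        base = _rmse(X[tr], y[tr], X[te], y[te])
        for j in range(n_factors):
            keep = [c for c in range(n_factors) if c != j]
            scores[j] += _rmse(X[tr][:, keep], y[tr], X[te][:, keep], y[te]) - base
        n_windows += 1
    return scores / n_windows

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 4))
# Synthetic returns: factor 0 matters (plus a 0-1 interaction); 2 and 3 are noise.
y = 0.5 * X[:, 0] + 0.3 * X[:, 0] * X[:, 1] + 0.05 * rng.normal(size=600)
print(np.round(diff_rmse(X, y), 3))
```

Averaging the per-factor RMSE change across windows is what distinguishes this from a single-split importance screen: a factor that matters only in some regimes still accumulates a nonzero score.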

Data & Methods

  • Data: RESSET and CSMAR firm-level panels, A‑share universe, monthly (or next‑period) return prediction horizon; 2013–2024.
  • Features: 30 characteristics across six categories (market, liquidity, valuation, profitability, technical, risk).
  • Preprocessing: standard cross‑sectional feature preprocessing (scaling, winsorization, etc.; specifics in full paper).
  • Forecasting framework:
    • Rolling‑window estimation (window length chosen empirically to balance bias/variance and model sample needs).
    • Models re‑trained each window; next‑period cross‑sectional returns forecasted.
  • Models:
    • Linear baselines (OLS, penalized regressions), tree models (gradient boosting), deep nets (GRU, LSTM, Transformer; attention‑based variants).
  • Interpretability and feature screening:
    • Diff‑RMSE: recursive elimination assessing marginal RMSE change when removing factors across windows to detect nonlinear importance and interactions.
    • SHAP (Shapley values) to decompose feature contributions per model.
  • Evaluation:
    • Statistical: RMSE, MAE.
    • Directional/asymmetric: MADL, GMADL.
    • New finance‑centric: WEI (combines magnitude and directional accuracy; sensitive to cross‑sectional imbalance).
    • Economic: long–short portfolio returns, risk‑adjusted metrics, tail‑risk statistics.
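The digest describes WEI only as a blend of error magnitude and directional accuracy; the exact formula is not given here. A minimal sketch, assuming a simple convex combination of a normalized error score and the cross-sectional hit rate (the weight `w` and the normalization are hypothetical choices, not the paper's):

```python
import numpy as np

def wei(y_true, y_pred, w=0.5):
    """Hypothetical WEI-style score in [0, 1], higher is better.

    Blends (a) an error score that tends to 1 as RMSE tends to 0, with RMSE
    normalized by the dispersion of realized returns, and (b) the share of
    correctly signed predictions (directional hit rate).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    scale = np.std(y_true) + 1e-12          # normalize by return dispersion
    error_score = 1.0 / (1.0 + rmse / scale)
    hit_rate = np.mean(np.sign(y_pred) == np.sign(y_true))
    return w * error_score + (1 - w) * hit_rate

rng = np.random.default_rng(1)
r = rng.normal(0, 0.05, size=500)           # realized cross-sectional returns
good = r + rng.normal(0, 0.01, size=500)    # informative forecasts
bad = rng.normal(0, 0.05, size=500)         # uninformative noise forecasts
print(round(wei(r, good), 3), round(wei(r, bad), 3))
```

Because the directional term enters separately, a model with small errors but frequent sign flips is penalized relative to plain RMSE, which is the behavior the digest attributes to WEI.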

Implications for AI Economics

  • For empirical asset‑pricing and factor research:
    • Nonlinear screening (Diff‑RMSE) can reduce false discoveries from the factor zoo by revealing interaction and state‑dependent contributions that linear screens miss.
    • Finance‑tailored evaluation (WEI) better aligns model selection with economic goals (directional accuracy and asymmetric losses) than generic loss metrics.
  • For practitioners and portfolio managers:
    • LSTM/Transformer architectures can meaningfully improve stock‑selection outcomes in an emerging‑market A‑share context, but require careful rolling re‑training, transaction‑cost-aware implementation, and interpretability tools.
    • Attention mechanisms and SHAP-style attributions increase operational transparency, aiding adoption despite deep models’ “black box” reputation.
  • For AI economics and market structure:
    • Demonstrated gains from flexible deep models suggest increasing potential for AI to compress exploitable cross‑sectional signals, which may affect persistency of anomalies and the evolution of market efficiency.
    • Adoption of these methods raises demands for robust out‑of‑sample evaluation protocols and regulatory scrutiny regarding model risk and explainability.
  • Directions for future work:
    • Generalizability tests across other geographies/periods, explicit incorporation of transaction costs and market impact, and exploration of alternative/unstructured inputs (text, alternative data) within the Diff‑RMSE/WEI framework.
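As an illustration of the transaction-cost point, the economic evaluation step can be extended with a per-unit-turnover charge: rank stocks by forecast each period, go long the top decile and short the bottom decile, and deduct a cost proportional to position changes. The cost level (`cost_bps`), decile cutoff, and synthetic data below are assumptions, not values from the paper.

```python
import numpy as np

def long_short_returns(forecasts, realized, cost_bps=20.0, frac=0.1):
    """Per-period long-short returns net of turnover costs.

    forecasts, realized: (T, N) arrays of predicted and realized returns.
    cost_bps: hypothetical one-way cost, in basis points, charged on the
    absolute change in portfolio weights between periods.
    """
    T, N = forecasts.shape
    k = max(1, int(frac * N))
    prev = np.zeros(N)
    net = []
    for t in range(T):
        order = np.argsort(forecasts[t])
        pos = np.zeros(N)
        pos[order[-k:]] = 1.0 / k      # long the top decile, equal weight
        pos[order[:k]] = -1.0 / k      # short the bottom decile
        turnover = np.abs(pos - prev).sum()
        net.append(pos @ realized[t] - turnover * cost_bps / 1e4)
        prev = pos
    return np.array(net)

rng = np.random.default_rng(2)
T, N = 24, 200
realized = rng.normal(0, 0.08, size=(T, N))
signal = realized + rng.normal(0, 0.08, size=(T, N))  # noisy but informative
net = long_short_returns(signal, realized)
print(round(float(net.mean()), 4))
```

Raising `cost_bps` or using a less persistent signal shrinks the net mean, which is exactly the frictions caveat raised in the generalizability notes.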

Assessment

Paper Type: correlational
Evidence Strength: medium — Uses a large universe (2013–2024 A-share, >5,000 firms) and a rolling-window out-of-sample framework with both statistical (RMSE/WEI) and economic (long–short portfolios, tail-risk metrics) evaluations, which provides credible predictive evidence; however, results are not causal, and common finance pitfalls (data-snooping, hyperparameter search bias, potential look-ahead, limited transaction-cost/timing realism) may inflate apparent performance.
Methods Rigor: medium — Appropriate modern methods (comparative baseline models, rolling-window holdouts, multiple architectures including GRU/LSTM/Transformer, and a domain-specific WEI metric) indicate careful implementation, but the description lacks detail on key rigor elements such as hyperparameter tuning and validation protocols, multiple-testing correction, handling of survivorship or listing bias, transaction-cost and liquidity-adjusted backtests, and robustness checks across market regimes.
Sample: Chinese A-share listed firms from 2013–2024 (over 5,000 firms) with equity and firm-characteristic data drawn from RESSET and CSMAR; 30 predictor variables spanning market, liquidity, valuation, profitability, technical and risk factors; rolling-window cross-sectional return forecasting; model families compared include linear models, tree-based machine learning, and deep-learning architectures (GRU, LSTM, Transformer); forecasts translated into long–short portfolios for economic evaluation.
Themes: innovation adoption
Generalizability:
  • Results are specific to the China A-share market and the 2013–2024 period and may not generalize to other countries or time periods.
  • Model performance may depend on the chosen frequency and exact factor definitions available in RESSET/CSMAR.
  • Backtests may not fully account for realistic transaction costs, market impact, shorting constraints, or capacity limits.
  • Potential survivorship or listing bias in the sample could overstate real-world returns.
  • Hyperparameter tuning, model selection, and data-snooping risks may reduce out-of-sample transferability to other datasets or regimes.

Claims (9)

Claim · Direction · Confidence · Outcome · Details

  • The study uses A-share market data from 2013 to 2024 with equity and firm-characteristic data available from databases such as RESSET and CSMAR for more than 5,000 listed firms. — Other · null_result · high · dataset coverage (time span and number of firms) · n=5000 · 0.5
  • We introduce the Diff-RMSE method for nonlinear factor identification. — Other · null_result · high · method for nonlinear factor identification · 0.5
  • We introduce the weighted evaluation index (WEI), a finance-specific performance metric that integrates prediction accuracy with market adaptability. — Other · null_result · high · performance evaluation metric (WEI) · 0.5
  • The analysis is based on 30 market, liquidity, valuation, profitability, technical and risk factors and compares linear models, tree-based machine learning and deep learning architectures (including GRU, LSTM and Transformer) within a rolling-window forecasting framework. — Other · null_result · high · model comparison across 30 factors within rolling-window forecasting · n=5000 · 0.5
  • Return forecasts are translated into long–short portfolios to assess economic performance. — Other · null_result · high · economic performance of long–short portfolios constructed from forecasts · n=5000 · 0.3
  • Deep learning models, particularly LSTM and Transformer, deliver superior prediction accuracy compared to traditional benchmarks (linear and tree-based models). — Output Quality · positive · high · prediction accuracy · n=5000 · 0.3
  • Deep learning models (especially LSTM and Transformer) produce more stable WEI scores than traditional benchmarks. — Output Quality · positive · high · WEI stability / WEI scores · n=5000 · 0.3
  • Deep learning models (particularly LSTM and Transformer) exhibit stronger tail-risk control than traditional benchmark models. — Output Quality · positive · high · tail-risk control (tail-risk metrics) · n=5000 · 0.3
  • These findings provide practical guidance for quantitative portfolio managers and enrich the literature on machine-learning-based stock-selection models. — Research Productivity · null_result · medium · practical guidance / contribution to literature · 0.03

Notes