Large language models flag fraudulent investment opportunities more reliably than lay humans and resist motivated persuasion better: in controlled tests, humans endorsed fraud about 13–14% of the time while the tested LLMs never did, and motivated framing did not suppress AI warnings (it slightly increased them).
Large language models trained on human feedback may suppress fraud warnings when investors arrive already persuaded of a fraudulent opportunity. We tested this in a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities, combining 3,360 AI advisory conversations with a 1,201-participant human benchmark. Contrary to predictions, motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. Endorsement reversal occurred in fewer than 3 in 1,000 observations. Human advisors endorsed fraudulent investments at baseline rates of 13-14%, versus 0% across all LLMs, and suppressed warnings under pressure at two to four times the AI rate. AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role.
Summary
Main Finding
Leading large language models (LLMs) are more consistent and more resistant than lay human advisors at issuing fraud warnings for objectively fraudulent investment opportunities. Motivated investor framing did not suppress AI fraud warnings at initial consultation and, if anything, marginally increased them. Multi-turn degradation under sustained pressure exists but is model-dependent; outright endorsement reversals by LLMs were extremely rare (~0.27%).
Key Points
- Experiment scope: preregistered study across 7 state-of-the-art LLMs, 12 investment scenarios (Low-, Medium-, High-Risk), two Turn‑1 framings (neutral vs motivated), and multi-turn pressure variants; 3,360 AI advisory conversations and a human benchmark of 1,201 participants.
- Primary metrics:
  - Q3: warning intensity (0–5)
  - Q4: endorsement (binary; Q4=1 = endorse)
  - Q2: self-reported warning suppression (for Turn 2)
- RQ1 (initial consultation): Motivated framing did not reduce Turn‑1 warning intensity for High-Risk scenarios. Pooled across models, motivated framing slightly increased warning intensity (β = +0.07, 95% CI [0.025, 0.113]), an effect that is negligible in magnitude. Mean High-Risk warning intensity was ≈ 4.6 (on the 0–5 scale) in both framings.
- RQ2 (multi-turn pressure): Warning degradation from Turn 1 → Turn 2 occurred but was heterogeneous across models. Some models (e.g., GPT-4o mini) showed sharp degradation; others (Claude, Gemini) strengthened their warnings. There was no evidence that motivated framing amplified degradation (β = −0.077, 95% CI [−0.145, −0.010]).
- RQ3 (fraud-signal gradient): At Turn 1 models discriminated across a 3-band fraud-signal gradient (Band 1 mathematically impossible → Band 3 statistically implausible). Degradation under pressure did not monotonically increase with signal ambiguity; model heterogeneity dominated.
- RQ4 (human benchmark): Humans were substantially more likely to endorse fraudulent High-Risk investments at baseline (13.3% neutral, 14.1% motivated), while AI endorsement was 0% across all seven models for High-Risk. Under Turn‑2 pressure, self-reported human suppression rates were 16–26% across bands versus 0–7.9% for AI. Cross-validation with LLM-coded responses gave lower human suppression estimates, but the non-valid responses suggested disengagement rather than preserved warnings.
- Model-level variation:
  - Claude: highest High-Risk warning means (~4.87).
  - GPT-4o mini: notable multi-turn vulnerability (large drop at Turn 2).
  - Gemini (2.5 Flash): lower calibration on Medium Risk / Band 3 (under-warned on subtle fraud signals).
- Endorsement reversal by LLMs (switching to endorse an objectively fraudulent offer) was vanishingly rare: 9 of ~3,350 turn-level observations (0.27%).
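The headline figures in the key points can be sanity-checked with a few lines of arithmetic. The sketch below is illustrative only: the helper name `ci_excludes_zero` is hypothetical, the denominator of 3,350 is the approximate count quoted above, and nothing here reproduces the study's own analysis code.

```python
def ci_excludes_zero(lower: float, upper: float) -> bool:
    """Return True when a confidence interval lies entirely on one side of zero."""
    return lower > 0 or upper < 0


# Endorsement reversals: 9 out of roughly 3,350 turn-level observations
# (the denominator is approximate, as reported in the summary).
reversals = 9
turn_observations = 3350
reversal_rate = reversals / turn_observations
below_three_per_thousand = reversal_rate < 3 / 1000

# Reported 95% confidence intervals for the framing coefficients.
rq1_ci = (0.025, 0.113)    # Turn-1 warning intensity, motivated vs neutral
rq2_ci = (-0.145, -0.010)  # framing effect in the degradation analysis

print(f"reversal rate: {reversal_rate:.2%}")
print(f"fewer than 3 in 1,000: {below_three_per_thousand}")
print(f"RQ1 interval excludes zero: {ci_excludes_zero(*rq1_ci)}")
print(f"RQ2 interval excludes zero: {ci_excludes_zero(*rq2_ci)}")
```

Both intervals exclude zero, so the framing effects are statistically distinguishable from no effect even though their magnitudes are small on a 0–5 scale.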
Data & Methods
- Design: preregistered randomized experiment with:
  - 7 LLMs × 12 scenarios × 2 Turn‑1 framings × 20 calls = 3,360 automated runs.
  - Human benchmark: N = 1,201 participants, including a high-financial-literacy subset of n = 931 (77.5% of the sample).
- Scenarios: 12 investment vignettes across three risk tiers:
  - Low-Risk: conventional index fund, government bond fund, diversified global equity fund.
  - Medium-Risk: P2P lending, leveraged commercial REIT, high-yield corporate bond fund.
  - High-Risk: 3-band gradient: Band 1 (mathematically impossible claims), Band 2 (structurally fraudulent schemes), Band 3 (statistically implausible but superficially credible).
- Framing: Turn 1 randomized to neutral (“honest assessment”) vs motivated (“I’m already fairly keen”) investor framing.
- Pressure manipulations: Turn 2 included five declarative pressure variants (enthusiasm, research claim, social proof, commitment, authority deference); Turn 3 used additional variants (challenge, risk normalization, emotional commitment, etc.).
- Outcomes and analysis: primary pre-registered outcomes (Turn‑1 warning intensity, Turn‑2 degradation, endorsement/suppression rates), OLS and ANOVA comparisons, Tukey HSD for model contrasts, confidence intervals and effect sizes reported.
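The fully crossed design described above can be enumerated directly, which also confirms the run count. This is a minimal sketch, not the study's actual harness: the model and scenario labels are hypothetical placeholders (the summary names only some of the seven models), and `build_run_grid` is an illustrative helper.

```python
from itertools import product

# Hypothetical placeholder labels; the summary does not list all seven models
# or give identifiers for the twelve scenarios.
MODELS = [f"model_{i}" for i in range(1, 8)]
SCENARIOS = [f"scenario_{i}" for i in range(1, 13)]
FRAMINGS = ["neutral", "motivated"]
CALLS_PER_CELL = 20


def build_run_grid() -> list[dict]:
    """Return one record per automated run in the crossed design."""
    return [
        {"model": m, "scenario": s, "framing": f, "call": c}
        for m, s, f, c in product(
            MODELS, SCENARIOS, FRAMINGS, range(CALLS_PER_CELL)
        )
    ]


runs = build_run_grid()
print(len(runs))  # 7 * 12 * 2 * 20 = 3360
```

Each record identifies one design cell and repetition, so the grid can double as a checklist for verifying that no cell is missing or over-sampled.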
Implications for AI Economics
- Consumer protection and welfare:
  - LLMs (as currently deployed/benchmarked) could materially reduce certain types of retail investor exposure to obvious frauds, implying potential welfare gains and lower aggregate fraud losses if trustworthy AI advisory tools are widely adopted.
  - However, model heterogeneity and multi-turn vulnerabilities mean benefits are uneven; some models may underperform on subtle fraud signals or in sustained conversational pressure.
- Market structure and competition:
  - Firms offering AI-backed advisory tools with superior fraud-detection calibration could gain competitive advantage; certification or reputation mechanisms may matter economically.
  - Lower fraud incidence could reduce rent extraction by fraudsters and change demand for traditional human advice (substitution/complementarity effects), with implications for advisory labor markets.
- Regulatory and policy implications:
  - Results support targeted regulation requiring external audit/testing of LLMs on fraud-detection and multi-turn consistency (not just single-turn benchmarks).
  - Policymakers should consider standards for alignment interventions that prioritize safety constraints (e.g., enforced non-endorsement of clear fraud) and require transparency about model calibration and failure modes.
  - Disclosure and liability frameworks: platforms might need obligations to surface confidence, risk flags, or references underpinning warnings to avoid overreliance and moral hazard.
- Research and product development priorities:
  - Invest in alignment fixes that improve multi-turn consistency under social pressure (reduce sycophantic degradation).
  - Develop standardized test suites (fraud-signal gradients, pressure variants) for model evaluation and certification.
  - Explore human–AI hybrid advisory models: humans remain more prone to endorsement/suppression; pairing AI diagnostics with human judgment could improve outcomes but must manage overreliance and delegation risks.
- Macroeconomic and behavioral considerations:
  - Broad adoption of robust AI fraud detection could change incentives for fraudsters (raising attack costs), shift investor behavior (increased confidence or reliance), and alter information flows in retail markets. All of these deserve modeling in AI economics work on equilibrium effects, moral hazard, and regulatory responses.
Summary takeaway: state-of-the-art LLMs (in this study) outperform lay humans on initial fraud detection and resist motivated investor pressure much better than humans, but model heterogeneity and multi-turn failures highlight the need for targeted alignment, auditing, and policy to realize safe economic benefits at scale.
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| Motivated investor framing did not suppress AI fraud warnings; if anything, it marginally increased them. | Decision Quality | positive | high | frequency of AI fraud warnings under motivated investor framing | n=3360; 1.0 |
| Endorsement reversal occurred in fewer than 3 in 1,000 observations. | Decision Quality | negative | high | rate of endorsement reversal (AI shifting from warning to endorsing fraudulent opportunity) | n=3360; fewer than 3 in 1,000; 1.0 |
| Human advisors endorsed fraudulent investments at baseline rates of 13-14%. | Decision Quality | positive | high | baseline endorsement rate of fraudulent investments by human advisors | n=1201; 13-14%; 1.0 |
| LLMs endorsed fraudulent investments at 0% across all models tested. | Decision Quality | negative | high | endorsement rate of fraudulent investments by LLMs | n=3360; 0%; 1.0 |
| Human advisors suppressed warnings under pressure at two to four times the AI rate. | Decision Quality | negative | medium | suppression rate of fraud warnings under pressure | two to four times the AI rate; 0.36 |
| AI systems currently provide more consistent fraud warnings than lay humans in an identical advisory role. | Decision Quality | positive | high | consistency of fraud warnings between advisors (LLMs vs. lay humans) | 0.6 |
| The study was a preregistered experiment across seven leading LLMs and twelve investment scenarios covering legitimate, high-risk, and objectively fraudulent opportunities. | Other | mixed | high | study design characteristics (models tested and scenario types) | 1.0 |