Learning to Trust: How Humans Mentally Recalibrate AI Confidence Signals

Productive human-AI collaboration requires appropriate reliance, yet contemporary AI systems are often miscalibrated, exhibiting systematic overconfidence or underconfidence. We investigate whether humans can learn to mentally recalibrate AI confidence signals through repeated experience. In a behavioral experiment (N = 200), participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. Results demonstrate robust learning across all conditions, with participants significantly improving their accuracy, discrimination, and calibration alignment over 50 trials. We present a computational model utilizing a linear-in-log-odds (LLO) transformation and a Rescorla-Wagner learning rule to explain these dynamics. The model reveals that humans adapt by updating their baseline trust and confidence sensitivity, using asymmetric learning rates to prioritize the most informative errors. While humans can compensate for monotonic miscalibration, we identify a significant boundary in the reverse confidence scenario, where a substantial proportion of participants struggled to override initial inductive biases. These findings provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience.

Summary

Main Finding

Across repeated interactions, people can learn to mentally recalibrate AI confidence signals and adapt their reliance accordingly. In a 50-trial behavioral task (N = 200), participants substantially improved accuracy, discrimination (hit and false-alarm rates, d′), and calibration alignment for AIs that were overconfident, underconfident, or standard. A simple cognitive model explains these dynamics: a linear-in-log-odds (LLO) mapping from reported confidence to perceived correctness, whose intercept and slope are updated via Rescorla–Wagner learning with asymmetric learning rates. However, a strong boundary appears for a counterintuitive “reverse confidence” mapping (higher reported confidence → lower actual accuracy): many participants failed to fully invert their prior inductive bias, producing large individual differences.

Key Points

Experiment
- N = 200 (Prolific), median completion ≈ 9 min, small financial incentives.
- Between-subjects: four AI calibration conditions (standard, overconfident, underconfident, reverse).
- AI overall accuracy fixed at 50%; confidence scores sampled from logit-normal distributions (σ = 0.5) with means separated by 2 logit units to make learning possible.
- 50 trials per participant; confidence shown rounded to nearest 10%; feedback given trial-by-trial.
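
The confidence-generation procedure can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the condition means listed under Data & Methods, and it assumes that the first number in each pair is the logit-scale mean for trials where the AI is correct and the second is the mean for trials where it is wrong. The names `CONDITION_MEANS` and `sample_trial` are hypothetical.

```python
import math
import random

# Logit-scale means per condition, as listed under Data & Methods.
# ASSUMPTION: pair order is (mean when AI is correct, mean when AI is wrong).
CONDITION_MEANS = {
    "standard": (1.0, -1.0),
    "overconfident": (2.0, 0.0),
    "underconfident": (0.0, -2.0),
    "reverse": (-1.0, 1.0),
}
SIGMA = 0.5  # standard deviation on the logit scale


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_trial(condition, rng):
    """Return (ai_correct, displayed_confidence_percent) for one simulated trial."""
    mu_correct, mu_wrong = CONDITION_MEANS[condition]
    ai_correct = rng.random() < 0.5              # overall AI accuracy fixed at 50%
    mu = mu_correct if ai_correct else mu_wrong
    confidence = sigmoid(rng.gauss(mu, SIGMA))   # logit-normal draw in (0, 1)
    displayed = int(math.floor(confidence * 10 + 0.5)) * 10  # nearest 10%
    return ai_correct, displayed


rng = random.Random(0)
for name in CONDITION_MEANS:
    print(name, [sample_trial(name, rng) for _ in range(3)])
```

Under this reading, the two means in every condition sit 2 logit units apart, so confidence always carries information about correctness; only the direction and offset of the mapping differ between conditions.
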
Behavioral results (early → late, trials 1–10 → 41–50)
- Accuracy rose substantially in all conditions: standard 62% → 86%; overconfidence 55% → 87%; underconfidence 62% → 88%; reverse 42% → 69%.
- Hit rate (HR) increased and false-alarm rate (FAR) decreased in all conditions → d′ increased (significant learning slopes p < 0.001).
- Expected calibration error (ECE) decreased significantly across conditions.
- Reverse condition: partial but incomplete learning; larger variance and many non-learners.
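
For readers unfamiliar with the metrics above, the sketch below shows the standard textbook definitions of d′ and binned expected calibration error. It assumes that a "hit" means judging the AI correct when it was correct and a "false alarm" means judging it correct when it was wrong; how the authors applied ECE to participants' responses is not specified in this summary, so `expected_calibration_error` is a generic version rather than their exact analysis.

```python
from statistics import NormalDist


def d_prime(hit_rate, false_alarm_rate, eps=1e-3):
    """Signal-detection sensitivity: z(HR) - z(FAR), with rates clipped away from 0 and 1."""
    z = NormalDist().inv_cdf
    hr = min(max(hit_rate, eps), 1 - eps)
    far = min(max(false_alarm_rate, eps), 1 - eps)
    return z(hr) - z(far)


def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between predicted probability and observed frequency per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))
    n = len(probs)
    ece = 0.0
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            ece += len(members) / n * abs(freq - mean_p)
    return ece


# A rising hit rate together with a falling false-alarm rate yields a larger d'.
print(d_prime(0.6, 0.4), d_prime(0.8, 0.2))
```
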
Individual differences in reverse condition
- 44% classified as non-learners (late-stage accuracy ≤ 60%) vs <15% non-learners in other conditions.
- Learners successfully inverted the slope (w50 ≈ −2.15); non-learners retained near-zero slope (w50 ≈ 0.01).
- Mechanism: learners had much larger learning rates for confidence sensitivity (αw), especially following wrong AI responses (e.g., αw,g=0 ≈ 0.50 for learners vs ≈ 0.015 for non-learners).
Cognitive model
- Perceived AI correctness vt modeled by: logit(vt) = bt + wt · logit(ct) (LLO transform).
- Parameters bt (baseline trust) and wt (confidence sensitivity) updated each trial by Rescorla–Wagner rule with separate asymmetric learning rates for g = correct vs wrong.
- Six participant-level parameters: b0, w0, and four αs (αb,g=1, αb,g=0, αw,g=1, αw,g=0). Fit with hierarchical Bayesian inference in Stan; model matches human choices on ~75% of trials (mean log-likelihood per trial = −0.38; McFadden’s pseudo-R^2 = 0.45).
- Priors show positive initial bias: mean w0 ≈ 0.69 (SD ≈ 1.8) and baseline trust mean ≈ 0.60.
- Asymmetric learning rates imply humans weigh certain prediction errors more heavily (learners update w much faster when AI is wrong in reverse condition).
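
The model described above can be sketched in a few lines. The LLO mapping follows the stated equation. The exact form of the update is an assumption, because this summary does not give it: here the prediction error (feedback minus perceived correctness) updates the baseline directly and updates the slope scaled by logit(confidence), in the style of a delta rule. All parameter values in the example are illustrative, not fitted estimates.

```python
import math


def logit(p, eps=1e-6):
    p = min(max(p, eps), 1 - eps)  # guard against displayed 0% or 100%
    return math.log(p / (1 - p))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def perceived_correctness(b, w, c):
    """LLO transform: logit(v) = b + w * logit(c)."""
    return sigmoid(b + w * logit(c))


def rw_update(b, w, c, g, alphas):
    """One trial of learning. g is 1 if the AI was correct, else 0.

    alphas maps ("b", g) and ("w", g) to learning rates, giving the four
    asymmetric rates. ASSUMPTION: the slope update is scaled by logit(c).
    """
    delta = g - perceived_correctness(b, w, c)  # prediction error
    b_new = b + alphas[("b", g)] * delta
    w_new = w + alphas[("w", g)] * delta * logit(c)
    return b_new, w_new


# Hypothetical reverse-condition learner: the AI shows low confidence when it
# is right and high confidence when it is wrong.
alphas = {("b", 1): 0.1, ("b", 0): 0.1, ("w", 1): 0.1, ("w", 0): 0.5}
b, w = 0.0, 0.69  # illustrative starting values with a positive initial slope
for t in range(50):
    c, g = (0.3, 1) if t % 2 == 0 else (0.7, 0)
    b, w = rw_update(b, w, c, g, alphas)
print(f"slope after 50 trials: {w:.2f}")
```

In this toy run the slope starts positive and ends negative, mirroring the sign flip reported for learners. Setting the slope learning rates close to zero leaves the slope near its starting value, which is the pattern the summary attributes to non-learners.
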
Limitations noted by authors
- Laboratory-style, minimal-cue task (dot counts were identical) with only 50 trials; results may not generalize to richer real-world tasks.
- Confidence was displayed in coarse 10% bins and participants received explicit feedback on every trial, conditions that are not always available in practice.
- Reverse mapping is an intentionally extreme and unnatural manipulation.

Data & Methods

Task: 50 trials; on each trial participants saw a 1s dot animation, received AI prediction + confidence (0–100%, binned to 10%), judged whether AI was correct, then received binary feedback.
AI confidence generation: logit-normal draws (σ = 0.5). Condition means (logit scale): standard (1, −1); overconfident (2, 0); underconfident (0, −2); reverse (−1, 1). 10,000 pre-generated confidence scores per condition.
Behavioral analyses: multilevel logistic/probit regressions for accuracy, HR, FAR, and d′ over trials; paired t-tests for ECE reductions.
Cognitive model: LLO mapping + Rescorla–Wagner updates for b and w with separate α for correct vs wrong. Hierarchical Bayesian estimation in Stan (4 chains × 2,000 samples), checked for convergence (R̂ < 1.01, ESS > 400).
Key quantitative outcomes: model explains ~75% of trial-level choices; non-learner fraction in reverse condition = 44%; learning rates for w much larger among learners (especially αw when AI is wrong).
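
The two fit statistics reported for the cognitive model (mean log-likelihood per trial = −0.38 and McFadden's pseudo-R² = 0.45) can be checked against each other with a short calculation. The sketch assumes the null model is chance-level responding on a binary choice, which this summary does not state explicitly; `mcfadden_pseudo_r2` is a hypothetical helper name.

```python
import math


def mcfadden_pseudo_r2(mean_ll_model, mean_ll_null):
    """McFadden's pseudo-R^2 = 1 - LL_model / LL_null (per-trial means give the same ratio)."""
    return 1.0 - mean_ll_model / mean_ll_null


mean_ll_model = -0.38          # reported mean log-likelihood per trial
mean_ll_null = math.log(0.5)   # ASSUMPTION: null model predicts 50/50 on every trial

r2 = mcfadden_pseudo_r2(mean_ll_model, mean_ll_null)
print(round(r2, 2))                        # 0.45
print(round(math.exp(mean_ll_model), 2))   # 0.68, geometric-mean probability per choice
```

Under that assumption the reported values agree: a per-trial log-likelihood of −0.38 corresponds to a pseudo-R² of about 0.45.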

Implications for AI Economics

Design trade-offs: calibration effort vs other objectives
- If users typically have repeated interactions and receive feedback, designers could tolerate some systematic miscalibration (e.g., over/underconfident outputs) because users can learn to compensate. This opens a potential economic trade-off: invest less in costly calibration tuning and more in other metrics (accuracy, compute efficiency, fairness), or invest in user-training and feedback mechanisms instead.
- But for counterintuitive, inverted mappings (the reverse case), many users cannot or will not recalibrate, which suggests that designers cannot safely rely on user learning where confidence signals conflict with strong prior beliefs. In such cases, the social cost (errors, liability) may require investing in proper AI calibration or clearer communication.
Product strategy and pricing
- Lower-cost models that are miscalibrated might be viable for contexts with frequent, low-stakes interactions and immediate feedback (users self-correct). Conversely, high-stakes or low-frequency contexts require better calibration (or explicit reliability guarantees), affecting willingness-to-pay and insurance/liability pricing.
- Segmented markets: firms could offer “lite” versions that require user learning (cheaper) and “calibrated” premium versions (higher price) for customers who cannot or will not adapt.
Human capital and training investments
- Firms deploying AI should consider investments in user training, onboarding, and feedback systems (logs, post-hoc labels, supervised correction) as substitutes for algorithmic calibration. The cognitive model suggests targeted feedback that emphasizes the most informative errors accelerates adaptation (as learners showed high αw following wrong AI outputs).
Regulation and risk management
- Regulators should distinguish contexts where user learning is plausible (repeated, feedback-rich) from those where it is not (one-shot, high-stakes). Policies requiring minimum calibration or disclosure may be more necessary in one-shot/critical domains (medicine, safety), while looser rules might be justifiable in iterative consumer apps.
Metrics & evaluation
- Relying solely on intrinsic calibration metrics (ECE) may be insufficient to assess downstream impact; evaluations should consider the human-in-the-loop dynamics (how fast users recalibrate, prevalence of non-learners). Cost–benefit analyses for model improvements should include user adaptation costs and heterogeneity.
Market and welfare considerations
- Heterogeneity in learning rates implies unequal downstream outcomes across user populations. Markets may evolve where better-calibrated AIs attract risk-averse users or those in high-stakes roles, while less-calibrated but cheaper systems serve others — potentially exacerbating welfare disparities.
Practical design prescriptions (economic framing)
- Provide trial-by-trial feedback (low-cost in many digital services) to accelerate user recalibration and reduce need for expensive model calibration.
- Make confidence signals interpretable, with higher reported confidence corresponding to higher actual accuracy where possible; avoid counterintuitive mappings.
- Track and report real-world calibration over user cohorts; use these signals to decide whether to invest in recalibration, user training, or product segmentation.

Suggested directions for applied research/evaluations in economics
- Quantify the cost trade-offs between investing in model calibration vs. user-training/feedback at scale.
- Model adoption and pricing dynamics when users differ in learning ability and prior trust.
- Field experiments in real-world, higher-stakes environments (healthcare triage, finance) to estimate external validity and welfare impacts.


Assessment

Paper Type: other
Evidence Strength: medium. The study uses an experimental manipulation with N = 200 and trial-level measurement, giving strong internal validity for the observed learning dynamics; however, evidence is limited to a laboratory-style online behavioral task with short-run exposure, simplified AI signals, and a non-representative sample, so external validity and claims about real-world economic impacts are limited.
Methods Rigor: medium. The experimental design and use of a principled computational model (linear-in-log-odds transform + Rescorla–Wagner learning rule with asymmetric rates) are appropriate and provide mechanistic insight, but the rigor rating is reduced by lack of reported pre-registration, unclear robustness checks and alternative model comparisons, potential demand effects, and limited ecological realism.
Sample: 200 adult participants recruited via Prolific; each completed 50 trials predicting an AI's correctness under one of four AI confidence-calibration conditions (standard, overconfident, underconfident, reverse confidence).
Themes: human_ai_collab, skills_training, adoption
Identification: Controlled behavioral experiment manipulating the AI confidence mapping across four between-subjects conditions and measuring trial-by-trial changes in participants' predictions over 50 trials to attribute changes in calibration and discrimination to exposure to specific AI signal properties.
Generalizability
- Lab/online behavioral task with simplified stimuli may not reflect real-world decision tasks or workplace settings.
- Short-term learning over 50 trials may not capture long-run adaptation or retention.
- Participant pool likely non-representative (e.g., online convenience sample rather than professionals interacting with deployed AI).
- Model AIs used (monotonic miscalibration and reverse mapping) may not match the complexity of deployed AI confidence signals.
- Findings on reverse-mapping failures may be specific to the experimental framing and not generalize to naturally occurring miscalibrations.

Claims (10)

Claim	Direction	Confidence	Outcome	Details
We ran a behavioral experiment (N = 200) in which participants predicted the AI's correctness across four AI calibration conditions: standard, overconfidence, underconfidence, and a counterintuitive "reverse confidence" mapping. Other	null_result	high	experimental conditions / task setup (participants predicting AI correctness)	n=200 0.2
Participants significantly improved their prediction accuracy of the AI's correctness over 50 trials. Decision Quality	positive	high	accuracy (participants' correctness in predicting AI correctness)	n=200 0.2
Participants significantly improved their discrimination (ability to distinguish correct vs. incorrect AI outputs) over 50 trials. Decision Quality	positive	high	discrimination (ability to separate correct from incorrect AI outputs)	n=200 0.2
Participants significantly improved their calibration alignment (alignment between their confidence predictions and actual AI correctness) over 50 trials. Decision Quality	positive	high	calibration alignment (match between predicted confidence and AI correctness)	n=200 0.2
Robust learning occurred across all calibration conditions (standard, overconfidence, underconfidence, reverse) with participants improving accuracy, discrimination, and calibration. Decision Quality	positive	high	learning (improvements in accuracy, discrimination, calibration) across conditions	n=200 0.2
Humans can compensate for monotonic miscalibration (overconfidence and underconfidence) through repeated experience. Decision Quality	positive	high	compensation for monotonic miscalibration (ability to adjust to over/underconfident AI)	n=200 0.2
There is a significant boundary in the reverse confidence scenario: a substantial proportion of participants struggled to override initial inductive biases and thus had difficulty learning in that condition. Decision Quality	negative	high	failure/struggle rate in reverse confidence condition (ability to learn mappings that invert confidence cues)	n=200 0.12
A computational model using a linear-in-log-odds (LLO) transformation combined with a Rescorla–Wagner learning rule explains the observed learning dynamics. Skill Acquisition	positive	high	model fit to behavioral learning dynamics	n=200 0.12
The model indicates that humans adapt by updating two components: baseline trust and confidence sensitivity, and they use asymmetric learning rates that prioritize the most informative errors. Skill Acquisition	positive	high	latent learning parameters (baseline trust, confidence sensitivity, asymmetric learning rates)	n=200 0.12
These results provide a mechanistic account of how humans adapt their trust in AI confidence signals through experience. Decision Quality	positive	high	mechanistic explanation of trust adaptation to AI confidence signals	n=200 0.12

People quickly learn to recalibrate trust in miscalibrated AI confidence signals, improving judgment over repeated trials, but a counterintuitive 'reverse confidence' mapping resists correction for many users.