AI decision-support modes do not directly boost novice teachers' task performance but shape it indirectly by changing users' confidence and trust; agreement between teachers and the AI substantially increases trust, confidence, and performance.
Artificial intelligence is playing an increasingly important role in supporting decision-making, particularly in educational contexts, where it serves as a critical tool to assist teacher judgment and optimize instructional decisions. However, limited research has examined how different AI-assisted decision-making paradigms influence the performance of human-AI collaboration, as well as the underlying psychological mechanisms and causal pathways. This study therefore investigated 59 pre-service teachers to examine how AI-assisted decision-making paradigms and human-AI consistency influenced their psychological states and task performance. Specifically, it employed a two-factor mixed experimental design, with the AI-assisted decision-making paradigm as the between-subjects factor and human-AI consistency as the within-subjects factor. Data were analyzed using Bayesian cumulative link mixed models and structural equation modeling. The results reveal that AI-assisted decision-making paradigms have no significant direct effect on task performance. However, once the moderating role of human-AI decision consistency is taken into account, the paradigms influence task performance indirectly through a sequential psychological pathway involving users' confidence and their trust in the AI. Consistency between human and AI decisions not only significantly enhances users' trust in AI, confidence, and task performance; the proportion of consistent decisions also significantly moderates the impact of the paradigms on users' confidence levels. Notably, our findings indicate that users maintain a moderate level of trust in AI even when their decisions diverge from the AI's. In summary, this study highlights the mediating mechanism by which AI-assisted decision-making paradigms influence task performance through psychological states and identifies the moderating role of human-AI consistency in this pathway. These findings advance the theoretical understanding of human-AI interaction models in educational contexts and offer mechanistic insights to guide the optimization of instructional AI systems.
Summary
Main Finding
AI-assisted decision-making paradigms (concurrent vs sequential) did not directly change task performance in an instructional student-performance prediction task. Instead, the paradigm influenced performance indirectly through a sequential psychological pathway: the paradigm affects users' confidence, which affects trust in the AI, and these psychological states jointly shape performance. Crucially, the proportion of human–AI decision consistency (alignment) not only improves trust, confidence, and performance but also moderates the paradigm → confidence link. Unexpectedly, higher self-reported confidence was associated with worse performance (larger absolute error).
Key Points
- Participants and task
- N = 59 pre-service teachers (29 concurrent, 30 sequential).
- Task: predict student final grade (25 trials) using 8 selected features from the UCI Student Performance dataset.
- AI support: identical random-forest predictions in both conditions; only the interaction paradigm differed (a training sketch appears after this list).
- Manipulated paradigms
- Concurrent: AI prediction shown before participants make their decision.
- Sequential: participants make an initial decision, then see AI suggestion and may revise it.
- Measures
- Trust in AI and confidence in one’s final decision: single-item, momentary ratings on 7-point Likert scales after each trial.
- Task performance: absolute error between the participant's prediction and the true grade (see the measures sketch after this list).
- Main statistical results (CLMM & SEM)
- CLMM: Human–AI decision consistency significantly reduced decision error (Estimate = 0.26, 95% CI [0.09, 0.42]) and increased trust (large main effect, Estimate ≈ −2.40 on the trust scale; signs reflect the model's contrast coding). The AI paradigm had no main effect on performance (Estimate = −0.04, 95% CI [−0.20, 0.13]).
- Confidence: the sequential paradigm yielded lower confidence than the concurrent paradigm (CLMM Estimate = −0.77, 95% CI [−1.46, −0.12]; SEM β = −0.613, p = 0.015), and inconsistency lowered confidence (CLMM).
- SEM: No direct effect of paradigm on performance (β = −0.239, p = 0.297). Confidence positively predicted decision error (β = 0.537, p < 0.001); that is, higher confidence went with larger errors. The model supports a sequential mediation (paradigm → confidence → trust → performance), with human–AI consistency moderating the paradigm → confidence path.
- Additional behavioral insight
- Participants maintained a moderate level of trust in AI even when their decisions disagreed with the AI, though trust was lower under inconsistency.
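The AI support described above is a random forest trained on the UCI Student Performance dataset. A minimal training sketch follows; the eight features chosen here are hypothetical stand-ins, since the summary does not list which features were selected.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Hypothetical local copy of the UCI Student Performance file (semicolon-separated).
df = pd.read_csv("student-por.csv", sep=";")

# Hypothetical stand-ins for the paper's eight selected features.
features = ["studytime", "failures", "absences", "G1", "G2",
            "Medu", "Fedu", "traveltime"]
X, y = df[features], df["G3"]  # G3: final grade (0-20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# One model served both conditions; only the presentation paradigm differed.
rf = RandomForestRegressor(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
ai_predictions = rf.predict(X_test)
```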
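The outcome and alignment measures reduce to per-trial arithmetic, as sketched below with hypothetical column names; treating consistency as exact numerical agreement is an assumption, since the summary does not say whether a tolerance band was used.

```python
import pandas as pd

# Hypothetical per-trial log: one row per (participant, trial).
trials = pd.DataFrame({
    "participant": [1, 1, 2, 2],
    "human_pred":  [12, 15, 9, 18],   # participant's final prediction
    "ai_pred":     [12, 13, 9, 14],   # random-forest suggestion
    "true_grade":  [11, 16, 9, 15],
})

# Task performance: absolute error between prediction and true grade.
trials["abs_error"] = (trials["human_pred"] - trials["true_grade"]).abs()

# Human-AI consistency: whether the decision matched the AI's suggestion.
trials["consistent"] = (trials["human_pred"] == trials["ai_pred"]).astype(int)

# Per-participant proportion of consistent decisions (the moderator).
consistency_rate = trials.groupby("participant")["consistent"].mean()
```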
Data & Methods
- Sample: 59 pre-service teachers, randomized between concurrent vs sequential paradigms.
- Task design: 25 repeated prediction trials per participant; no ground-truth feedback during task.
- AI: Random forest trained on UCI Student Performance data; identical accuracy across conditions.
- Measures: single-item momentary trust and confidence (7-pt Likert), absolute error as performance metric.
- Analyses:
- Bayesian cumulative link mixed models (CLMM) with random intercepts for participants and trials to examine main and interaction effects on ordinal trust/confidence ratings and discrete absolute-error outcomes (a hedged CLMM sketch follows this list).
- Structural equation modeling (SEM; SmartPLS) to test a moderated sequential mediation, paradigm → confidence → trust → performance, with human–AI decision consistency as a moderator on the paradigm → confidence path; bootstrapping (5,000 resamples) was used for inference (an SEM sketch follows this list).
- Limitations to note: modest sample size, a single domain (pre-service teachers), a lab-style task (no ground-truth feedback), single-item measures of psychological states, and fixed AI accuracy (limiting generalization to variable-quality AI).
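A minimal sketch of the CLMM analysis, assuming the Bambi library and its cumulative (ordinal) family as an open-source stand-in (the summary does not name the software used); column names are hypothetical.

```python
import arviz as az
import bambi as bmb

# trials: long-format DataFrame with one row per (participant, trial), an
# ordinal 1-7 "trust" rating, binary "paradigm" and "consistent" codes, and
# "participant"/"trial" identifiers. All column names are hypothetical.
clmm = bmb.Model(
    "trust ~ paradigm * consistent + (1|participant) + (1|trial)",
    trials,
    family="cumulative",  # cumulative-link (ordinal) likelihood
)
idata = clmm.fit(draws=2000, chains=4)  # NUTS sampling via PyMC
az.summary(idata, var_names=["paradigm", "consistent", "paradigm:consistent"])
```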
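The paper's SEM was fit in SmartPLS (PLS-SEM). As an open-source approximation only, a covariance-based SEM in semopy with the moderation encoded as a precomputed interaction term might look like this; column names are hypothetical and estimates would not match PLS-SEM exactly.

```python
from semopy import Model

# Hypothetical participant-level columns: paradigm (0/1), consistency_rate,
# their precomputed product par_x_cons, mean confidence and trust, abs_error.
desc = """
confidence ~ paradigm + consistency_rate + par_x_cons
trust ~ confidence
abs_error ~ trust + confidence + paradigm
"""
sem = Model(desc)
sem.fit(data)          # data: pandas DataFrame with the columns above
print(sem.inspect())   # path coefficients, standard errors, p-values
```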
Implications for AI Economics
- Product design and value capture
- Interaction paradigm matters economically via its psychological effects. Vendors should optimize how AI recommendations are presented (concurrent vs sequential) to shape user confidence and trust rather than assuming a direct productivity gain from any AI interface.
- Features that preserve or calibrate appropriate user confidence (e.g., explanations, uncertainty estimates) can have outsized returns because confidence mediates downstream performance and adoption.
- Adoption, pricing and procurement
- Human–AI consistency (alignment) is a key value-driver: higher alignment increases trust and perceived effectiveness. Procurement decisions and pricing models should account for design elements that increase perceived alignment (calibration, personalization, transparency).
- Buyers (schools, districts) should evaluate not only aggregate accuracy but also how AI outputs interact with human decision processes (i.e., probability of agreement and impact on confidence).
- Market outcomes and policy
- Overconfidence risk: higher confidence can correlate with worse outcomes. Economic evaluations (cost–benefit, ROI) should incorporate behavioral responses (miscalibrated confidence) that can reduce realized productivity gains from AI.
- Regulation and auditing: metrics for education-AI performance should include human-AI agreement rates and measures of user calibration (confidence vs accuracy), not only model accuracy; a sketch of such audit metrics appears after this list.
- Modeling and forecasting impacts
- Macroeconomic or microeconomic models of AI-driven productivity should include behavioral parameters (trust, confidence, agreement probability) and interaction-paradigm effects. Returns to AI investments will depend on these behavioral channels.
- Implementation and training investments
- Complementary investments (teacher training, interface design that surfaces alignment information, human-AI decision-reconciliation workflows) can increase the realized benefits of AI tools, affecting cost-effectiveness and adoption curves.
- Research & evaluation priorities
- Payoffs from different interaction paradigms will vary by context, task difficulty, and user population. Economic evaluations (pilots, RCTs) should be used to estimate heterogeneity in psychological channels before large-scale deployments.
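Following the auditing point above, agreement rates and confidence calibration can be computed from ordinary interaction logs. A minimal sketch under hypothetical column names; the Spearman correlation here is one reasonable calibration proxy, not the study's own metric.

```python
import pandas as pd

def audit_metrics(log: pd.DataFrame) -> dict:
    """Compute human-AI agreement rate and a confidence-calibration proxy.

    Expects per-trial columns: "consistent" (0/1), "confidence" (1-7),
    and "abs_error" (absolute prediction error). Names are hypothetical.
    """
    agreement_rate = log["consistent"].mean()
    # Well-calibrated users should show a negative correlation between
    # confidence and error; a positive value (as this study found)
    # signals overconfidence.
    calibration = log["confidence"].corr(log["abs_error"], method="spearman")
    return {"agreement_rate": agreement_rate,
            "confidence_error_corr": calibration}
```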
Suggestions for Follow-up Empirical Work (relevant to economists)
- Larger, more diverse samples and field deployments to estimate external validity and heterogeneity in behavioral responses.
- Vary AI accuracy and explainability features to quantify marginal returns to alignment/calibration interventions.
- Longitudinal studies to assess learning and calibration dynamics (does confidence bias attenuate with feedback?).
- Cost-effectiveness analyses comparing investment in model accuracy vs interface/UX and training that target trust/confidence alignment.
Assessment
Claims (7)
| Claim | Category | Direction | Confidence | Outcome | Details |
|---|---|---|---|---|---|
| AI-assisted decision-making paradigms do not have a significant direct effect on task performance. | Output Quality | null_result | high | task performance | n=59; 0.6 |
| When human-AI decision consistency is taken into account, AI-assisted decision-making paradigms influence task performance indirectly through a sequential psychological pathway involving users' confidence and their trust in the AI. | Output Quality | positive | high | task performance (mediated effect) | n=59; 0.6 |
| Consistency between human and AI decisions significantly enhances users' trust in AI. | AI Safety and Ethics | positive | high | trust in AI | n=59; 0.6 |
| Consistency between human and AI decisions significantly enhances users' confidence. | Decision Quality | positive | high | users' confidence | n=59; 0.6 |
| Consistency between human and AI decisions significantly enhances task performance. | Output Quality | positive | high | task performance | n=59; 0.6 |
| The proportion of consistent decisions significantly moderates the impact of AI-assisted decision-making paradigms on users' confidence levels. | Decision Quality | positive | high | users' confidence (moderation effect) | n=59; 0.6 |
| Users maintain a moderate level of trust in AI even when their decisions diverge from those of AI. | AI Safety and Ethics | positive | high | trust in AI under decision divergence | n=59; 0.6 |