The Commonplace

An AI that challenges users’ reasoning improves judgment: when the system generates counterfactual critiques of human rationales, people rely less on the AI and make more accurate house-price predictions, though they report higher cognitive effort; the gains are strongest for participants comfortable with AI.

Understanding the Effects of AI-Assisted Critical Thinking on Human-AI Decision Making
Harry Yizhou Tian, H. Amin, Ming Yin · Fetched June 06, 2026 · Proceedings of the 2026 CHI Conference on Human Factors in Computing Systems
Source: semantic_scholar · Paper type: quasi_experimental · Evidence: medium · Relevance: 7/10
AACT — an AI that critiques users' own decision rationales via counterfactual analysis — reduced over-reliance and improved accuracy in a house-price prediction case study but increased reported cognitive load and benefited some subgroups (e.g., users familiar with AI).

Despite the growing prevalence of human-AI decision making, the human-AI team’s decision performance often remains suboptimal, partially due to insufficient examination of humans’ own reasoning. In this paper, we explore designing AI systems that directly analyze humans’ decision rationales and encourage critical reflection of their own decisions. We introduce the AI-Assisted Critical Thinking (AACT) framework, which leverages a domain-specific AI model’s counterfactual analysis of human decision to help decision-makers identify potential flaws in their decision argument and support the correction of them. Through a case study on house price prediction, we find that AACT outperforms traditional AI-based decision-support in reducing over-reliance on AI, though also triggering higher cognitive load. Subgroup analysis reveals AACT can be particularly beneficial for some decision-makers such as those very familiar with AI technologies. We conclude by discussing the practical implications of our findings, use cases and design choices of AACT, and considerations for using AI to facilitate critical thinking.

Summary

Main Finding

AACT (AI-Assisted Critical Thinking), a framework that analyzes humans’ decision rationales and provides counterfactual feedback, improves human-AI team decision quality by reducing over-reliance on AI compared with standard AI decision-support, at the cost of increased cognitive load. Benefits are heterogeneous, with larger gains for users already familiar with AI.

Key Points

  • Purpose: Move beyond explaining model outputs to directly analyzing and critiquing humans’ own reasoning, prompting users to reflect and correct faulty arguments.
  • Mechanism: A domain-specific AI generates counterfactual analyses of a decision-maker’s rationale (i.e., “if X were different, your conclusion would change”), highlighting possible flaws and suggesting corrections.
  • Outcome (case study on house price prediction):
    • AACT reduced over-reliance on the AI’s prediction more effectively than traditional decision-support.
    • AACT also triggered higher cognitive load than traditional decision-support.
    • Subgroup analysis: users very familiar with AI technologies gained the most from AACT.
  • Design trade-offs: stronger engagement and calibration versus higher mental effort; effectiveness depends on user characteristics and how counterfactual feedback is presented.
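
The mechanism described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the linear price model, its coefficients, the `critique_rationale` helper, and the 5% threshold are hypothetical and are not taken from the paper, which the summary does not describe at this level of detail. The idea shown is only the general shape of a counterfactual check: change the feature a user cites in their rationale, see how much the model's prediction moves, and flag the rationale when the feature turns out to matter little.

```python
from dataclasses import dataclass

# Hypothetical linear price model; coefficients are invented for illustration.
COEFFS = {"sqft": 150.0, "bedrooms": 10_000.0, "age_years": -800.0}
INTERCEPT = 50_000.0


def predict_price(house: dict) -> float:
    """Predict a price from the toy linear model."""
    return INTERCEPT + sum(COEFFS[name] * house[name] for name in COEFFS)


@dataclass
class Critique:
    feature: str
    price_shift: float        # change in predicted price under the counterfactual
    supports_rationale: bool  # True if the cited feature moves the price enough


def critique_rationale(house: dict, cited_feature: str,
                       counterfactual_value: float,
                       min_relative_shift: float = 0.05) -> Critique:
    """Check whether the feature cited in a user's rationale matters to the model.

    The threshold (5% of the baseline prediction) is an arbitrary assumption.
    """
    baseline = predict_price(house)
    altered = dict(house, **{cited_feature: counterfactual_value})
    shift = predict_price(altered) - baseline
    supports = abs(shift) / baseline >= min_relative_shift
    return Critique(cited_feature, shift, supports)


house = {"sqft": 2000, "bedrooms": 3, "age_years": 20}

# User rationale: "the price is high mainly because it has three bedrooms".
bedroom_critique = critique_rationale(house, "bedrooms", 2)
# User rationale: "the price is high mainly because of the floor area".
sqft_critique = critique_rationale(house, "sqft", 1500)

for c in (bedroom_critique, sqft_critique):
    verdict = "consistent with the model" if c.supports_rationale else "worth revisiting"
    print(f"If {c.feature} were different, the prediction would shift by "
          f"{c.price_shift:,.0f}: rationale is {verdict}.")
```

In this toy setup the bedroom-based rationale is flagged because removing a bedroom shifts the prediction by under 3%, while the floor-area rationale is supported. A real system would need a way to extract the cited feature from free-text rationales, which this sketch skips.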

Data & Methods

  • Framework: AACT takes a human’s decision rationale as input and produces domain-specific counterfactual critiques to encourage reflection and revision.
  • Evaluation: Empirical case study in the domain of house price prediction comparing AACT to conventional AI decision-support tools.
  • Metrics reported: decision performance (accuracy/calibration), measures of over‑reliance on AI, cognitive load, and subgroup heterogeneity (familiarity with AI).
  • Notes on scope: The summary reflects reported results from the single-domain case study; details such as sample size, exact model architecture, counterfactual generation method, and statistical tests were not provided in the brief and are needed to assess external validity and robustness.
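
Since the summary does not say how over-reliance was measured, the sketch below shows one common way such a metric could be operationalized for a numeric prediction task. It is an assumption, not the paper's definition: the `Trial` record, the requirement that each participant gives an initial estimate before seeing the AI, and the "moved toward a worse AI estimate" rule are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Trial(NamedTuple):
    initial: float  # participant's estimate before seeing the AI (assumed to exist)
    ai: float       # AI's predicted price
    final: float    # participant's estimate after assistance
    truth: float    # ground-truth sale price


def over_reliance_rate(trials: Sequence[Trial]) -> Optional[float]:
    """Share of trials where the AI was worse than the participant's initial
    estimate, yet the final estimate still moved toward the AI.

    Returns None when there are no trials in which the AI was worse.
    """
    ai_worse = [t for t in trials
                if abs(t.ai - t.truth) > abs(t.initial - t.truth)]
    if not ai_worse:
        return None
    moved_toward_ai = sum(
        1 for t in ai_worse if abs(t.final - t.ai) < abs(t.initial - t.ai)
    )
    return moved_toward_ai / len(ai_worse)


def mean_abs_error(trials: Sequence[Trial]) -> float:
    """Mean absolute error of final estimates, as a simple accuracy measure."""
    return sum(abs(t.final - t.truth) for t in trials) / len(trials)


# Invented example data (prices in thousands).
trials = [
    Trial(initial=300, ai=400, final=380, truth=310),  # AI worse, user followed it
    Trial(initial=300, ai=400, final=300, truth=310),  # AI worse, user held firm
    Trial(initial=300, ai=320, final=315, truth=325),  # AI better, not counted
]

print(over_reliance_rate(trials))  # 0.5
print(mean_abs_error(trials))      # 30.0
```

Comparing these two numbers between interface conditions would mirror the kind of between-condition comparison the summary describes, though the actual study may use different measures.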

Implications for AI Economics

  • Productivity and value of AI complements:
    • AACT can increase the productive complementarity of AI and human judgment by improving calibration and reducing blind reliance on model outputs.
    • However, higher cognitive load may reduce throughput or increase time costs per decision, affecting net productivity gains.
  • Adoption and diffusion:
    • Stronger benefits for AI‑literate users suggest differential adoption returns; organizations with more AI-savvy employees may capture more value, potentially widening performance gaps across firms and workers.
    • UI/UX and training investments can change who benefits—firms may invest in familiarizing workers with AI to realize AACT gains.
  • Market for AI tools and product design:
    • Demand may grow for decision-support that critiques human rationales (not just explains model outputs), creating niches for domain-specific counterfactual feedback systems.
    • Vendors face a trade-off in design between maximizing accuracy gains and minimizing cognitive burden; pricing and product positioning should reflect this.
  • Welfare and regulation:
    • Improved calibration can reduce systemic errors (e.g., mispriced assets), but uneven benefits and increased cognitive costs raise equity and efficiency questions.
    • Regulators and standard-setters may consider guidelines for AI systems that influence human reasoning (transparency about critique methods, limits on cognitive load).
  • Future research priorities for AI economics:
    • Quantify time-cost vs. accuracy trade-offs and aggregate productivity effects.
    • Generalize across domains with varying complexity and stakes (finance, medicine, policy).
    • Study incentives for firms to adopt rationale‑focused AI and the labor-market implications of heterogeneous returns across worker skill groups.

Assessment

Paper Type: quasi_experimental

Evidence Strength: medium. The paper provides experimental-style evidence showing AACT improves decision behavior in a controlled case study, which supports causal interpretation, but the result is limited to one domain (house-price prediction), likely a modest sample and lab-style tasks, and faces external validity and demand-characteristic concerns.

Methods Rigor: medium. Design appears to be an experimental comparison with subgroup analysis and multiple outcome measures; however, the summary omits key methodological details (randomization procedure, sample size and representativeness, pre-registration, robustness checks, statistical controls, multiple hypothesis adjustments), so rigor cannot be judged high.

Sample: Human participants performed house-price prediction tasks assisted by a domain-specific AI that generates counterfactual analyses of participants' rationales (AACT) versus a conventional AI decision-support interface; housing feature data and ground-truth prices were used to evaluate accuracy. The summary does not specify sample size, recruitment source (e.g., students, online workers, professionals), or whether participants had relevant domain expertise.

Themes: human_ai_collab, productivity, skills_training

Identification: Between-condition comparison: participants completed house-price prediction tasks under the AACT interface versus a traditional AI decision-support interface; causal claims rest on the experimental manipulation (assignment to interface condition) and comparison of downstream outcomes (decision accuracy, over-reliance, cognitive load). (Paper does not report whether assignment was fully randomized or pre-registered in the provided summary.)

Generalizability:
  • Single domain (house price prediction); may not transfer to other decision domains or high-stakes settings.
  • Likely lab/short-term task with limited ecological validity compared with real-world, repeated decision contexts.
  • Participant pool likely non-representative (students or online workers), which limits applicability to professional decision-makers.
  • Relies on a particular AI model and UI design; effects may differ with other model quality, explanation formats, or deployment constraints.
  • Increased cognitive load tradeoff may reduce adoption in time- or attention-constrained real-world settings.

Claims (5)

Claim · Direction · Confidence · Outcome · Details

  • We introduce the AI-Assisted Critical Thinking (AACT) framework, which leverages a domain-specific AI model’s counterfactual analysis of human decision to help decision-makers identify potential flaws in their decision argument and support the correction of them. · Decision Quality · positive · high · ability to identify and correct flaws in decision arguments · 0.08
  • Through a case study on house price prediction, we find that AACT outperforms traditional AI-based decision-support in reducing over-reliance on AI. · Automation Exposure · positive · high · over-reliance on AI · 0.48
  • AACT also triggers higher cognitive load. · Worker Satisfaction · negative · high · cognitive load · 0.48
  • Subgroup analysis reveals AACT can be particularly beneficial for some decision-makers such as those very familiar with AI technologies. · Decision Quality · positive · medium · decision improvement for users familiar with AI (reduced over-reliance / improved decision performance) · 0.29
  • Despite the growing prevalence of human-AI decision making, the human-AI team’s decision performance often remains suboptimal, partially due to insufficient examination of humans’ own reasoning. · Decision Quality · negative · high · human-AI team decision performance · 0.08
